Hi all,

We've got a SolrCloud cluster set up on 6.3.0 with the BasicAuthentication plugin enabled. All of the hosts are time-synchronized using NTP and are on the same network switch.
We're periodically experiencing issues where follower replicas are put into a down state by the leader after inter-node requests fail due to invalid timestamps. To minimize the issue we've increased the pkiauth.ttl value to 10000, and that seems to have taken care of most of the occurrences.

As vague as the question is, is there anything specific to Solr that we could look into that would cause requests to have invalid keys? We are working on tracking NTP's performance in case there was some sort of lapse, but everything we've seen puts the hosts within around 20 milliseconds of each other at worst.

Possibly related, though we only noticed it yesterday: a request for recovery was sent from a leader to a follower replica, it didn't seem to have an authorization header, and the wrong user was chosen.

2017-12-19 23:10:44.764 INFO (qtp759156157-8224123) [ ] o.a.s.s.RuleBasedAuthorizationPlugin This resource is configured to have a permission { "name":"core-admin-edit", "role":"admin"}, The principal [principal: solrwriter] does not have the right role

2017-12-19 23:10:44.765 INFO (qtp759156157-8224123) [ ] o.a.s.s.HttpSolrCall USER_REQUIRED auth header null context : userPrincipal: [[principal: solrwriter]] type: [ADMIN], collections: [], Path: [/admin/cores] path : /admin/cores params :core=Feeds_shard11_replica2&action=REQUESTRECOVERY&wt=javabin&version=2

How does Solr determine which user/authentication to use for inter-node requests? Have we assigned any of the predefined permissions to a user that we shouldn't have, and could that be causing this?

Thanks,
Chris