Overseer could not get tags

Chris Ulicny Wed, 17 Oct 2018 05:34:06 -0700

Hi all,

Recently in a 7.4.0 test cluster, we ran into SOLR-12814
<https://issues.apache.org/jira/browse/SOLR-12814> which we fixed by
slightly increasing the request header size. However, some other log
messages appeared alongside the "URI size >8192" message. We thought they
were related, but they have not abated since we increased the header size.
A full shutdown of the Solr processes, followed by bringing them back up
one at a time, did not solve the issue.


The overseer node does not seem to be authenticating any of its requests to
/solr/admin/metrics on any node (including itself). Every minute, there are
two warnings per node:

10/17/2018, 7:53:45 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host1:port1_solr
10/17/2018, 7:53:45 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host1:port1_solr
10/17/2018, 7:53:46 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host2:port2_solr
10/17/2018, 7:53:46 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host2:port2_solr
10/17/2018, 7:53:46 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host3:port3_solr
10/17/2018, 7:53:46 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host3:port3_solr
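
To check the "two warnings per node" pattern over a longer log, the lines can be tallied per node. This is a minimal Python sketch that assumes the log format of the excerpt above. The function name and the sample text are illustrative only, and the pattern tolerates lines that were wrapped by a mail client.

```python
import re
from collections import Counter

# Sample lines taken from the excerpt above (wrapped lines rejoined).
LOG = """\
10/17/2018, 7:53:45 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host1:port1_solr
10/17/2018, 7:53:45 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host1:port1_solr
10/17/2018, 7:53:46 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host2:port2_solr
10/17/2018, 7:53:46 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host2:port2_solr
10/17/2018, 7:53:46 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host3:port3_solr
10/17/2018, 7:53:46 AM    WARN    SolrClientNodeStateProvider    could not get tags from node host3:port3_solr
"""

# \s+ between words so the match survives hard-wrapped log lines.
PATTERN = re.compile(
    r"WARN\s+SolrClientNodeStateProvider\s+"
    r"could\s+not\s+get\s+tags\s+from\s+node\s+(\S+)"
)


def count_tag_warnings(text):
    """Map each node name to its number of 'could not get tags' warnings."""
    return Counter(match.group(1) for match in PATTERN.finditer(text))


counts = count_tag_warnings(LOG)
for node, n in sorted(counts.items()):
    print(node, n)
```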

There are two slightly different stack traces that appear with each pair:
https://pastebin.com/2Z1C5rXr

The warning message appears to come from
solrj.impl.SolrClientNodeStateProvider.fetchMetrics, which shows up in the
stack traces of both attempted requests.

However, we already have a 7.4.0 production cluster running that also has
security enabled with similar replica density where we have not seen this
issue.

*Test:*
-- 10 collections (9 with 2 shards, 1 with 43 shards)
-- replication factor of 2 for all collections
-- 3 hosts with 40 or 41 replicas each

*Production:*
-- 9 collections with 14 shards
-- replication factor of 2 for all collections
-- 7 hosts with 36 replicas each
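
The replica densities quoted above follow from the shard counts and the replication factor. This is a small Python sketch of that arithmetic, assuming replicas are spread as evenly as possible across hosts. The helper names are illustrative and are not part of Solr.

```python
import math


def total_replicas(shard_counts, replication_factor):
    """Total replicas = sum of shards over all collections * replication factor."""
    return sum(shard_counts) * replication_factor


def per_host_range(total, hosts):
    """Smallest and largest replica count per host under an even spread."""
    return total // hosts, math.ceil(total / hosts)


# Test: 9 collections with 2 shards, 1 with 43 shards, replication factor 2.
test_total = total_replicas([2] * 9 + [43], 2)
# Production: 9 collections with 14 shards, replication factor 2.
prod_total = total_replicas([14] * 9, 2)

print(test_total, per_host_range(test_total, 3))  # 122 (40, 41)
print(prod_total, per_host_range(prod_total, 7))  # 252 (36, 36)
```

Under these assumptions the test cluster holds 122 replicas in total against 252 in production, with a slightly higher per-host density on test.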

I've enabled TRACE logging in our test environment for most loggers related
to metrics and authentication. So far the only new message I've gotten is
the challenge from the target server for the necessary credentials, right
before the warning and stack trace.

2018-10-17 12:20:46.368 DEBUG (MetricsHistoryHandler-8-thread-1) [   ] o.a.h.i.a.HttpAuthenticator Authentication required
2018-10-17 12:20:46.368 DEBUG (MetricsHistoryHandler-8-thread-1) [   ] o.a.h.i.a.HttpAuthenticator host3:port3 requested authentication
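
One way to separate a credentials problem from a problem in the overseer's internal request path is to call the metrics endpoint by hand. The Python sketch below assumes the cluster uses HTTP Basic authentication over plain http, which the message above does not state. The host, user, and password arguments are placeholders, and the function names are illustrative.

```python
import base64
import urllib.error
import urllib.request


def build_metrics_request(host_port, user, password):
    """Build a GET request for /solr/admin/metrics with a Basic auth header.

    Assumption: Basic authentication over http; adjust for https or Kerberos.
    """
    url = f"http://{host_port}/solr/admin/metrics?wt=json"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def probe(host_port, user, password, timeout=10):
    """Return the HTTP status code; a 401 would mirror the challenge logged above."""
    request = build_metrics_request(host_port, user, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling probe("host3:port3", user, password) with the real host, port, and credentials from each node in turn would show whether the endpoint accepts an authenticated request when it does not originate from the overseer.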

I suspect the creation and balancing of the large collection on test might
have something to do with it since the problem started happening after
that.

Are there any other specific log settings I should turn on that might
produce some useful information?

Thanks,
Chris
