Re: Collections API timeout

2019-06-10 Thread Софія Строчик
Yes, I've checked them and all nodes are pointing to the same IP and the
same port (2181). All of them are also visible in the SolrCloud Graph
section, so they should be part of the same cloud.
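If it helps to verify that mechanically, here is a minimal sketch (the zkHost strings are made-up examples; in practice they would come from each node's startup logs or solr.in.sh) that treats two zkHost values as the same ensemble regardless of server order:

```python
def normalize_zk_host(zk_host):
    """Reduce a zkHost string (comma-separated servers plus optional chroot)
    to a canonical form so that server order does not matter."""
    servers, _, chroot = zk_host.partition("/")
    return (tuple(sorted(servers.split(","))), "/" + chroot if chroot else "")

def same_ensemble(zk_hosts):
    """True if every zkHost string names the same ensemble and chroot."""
    return len({normalize_zk_host(h) for h in zk_hosts}) == 1

# Hypothetical values collected from each node's startup logs / solr.in.sh
hosts = [
    "zk1:2181,zk2:2181,zk3:2181/solr",
    "zk2:2181,zk3:2181,zk1:2181/solr",  # same ensemble, different order
]
print(same_ensemble(hosts))  # → True
```

A node running embedded ZooKeeper would show up here as a different single-server string and fail the comparison.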

The largest file is solrconfig, which is 58K, so this shouldn't be a problem
either.
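A quick sketch to double-check that, since ZooKeeper's default znode limit (jute.maxbuffer) is about 1 MB; the configset path is a hypothetical local copy of what was uploaded:

```python
import os

JUTE_MAX_BUFFER = 1024 * 1024  # ZooKeeper's default znode size limit (~1 MB)

def oversized_config_files(conf_dir, limit=JUTE_MAX_BUFFER):
    """Return (relative path, size) for each configset file at or over the limit."""
    flagged = []
    for root, _dirs, files in os.walk(conf_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size >= limit:
                flagged.append((os.path.relpath(path, conf_dir), size))
    return flagged

# Hypothetical local copy of the configset uploaded to ZooKeeper
print(oversized_config_files("/tmp/configs/collection2/conf"))
```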
The potential problem I do see is that one of the nodes loads the cloud info
(specifically the *Tree* section) slower than the others.
It reaches the admin interface timeout and displays the "Connection to Solr
lost" message when accessed from the UI.
But the same request
(http://ip:port/solr/admin/zookeeper?_=1560198364377&wt=json)
works when issued from the command line.
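For reference, the same request can be issued programmatically with a longer client-side timeout (the hostname is a placeholder; the `_` parameter appears to be just the admin UI's cache-buster, so it is dropped here):

```python
import json
import urllib.request

def zk_tree_url(host, port):
    """The admin ZooKeeper endpoint behind the UI's Tree section."""
    return f"http://{host}:{port}/solr/admin/zookeeper?wt=json"

def fetch_zk_tree(host, port, timeout=300):
    """Fetch the ZooKeeper tree with a generous read timeout, so a slow
    node can still answer instead of tripping the UI's shorter limit."""
    with urllib.request.urlopen(zk_tree_url(host, port), timeout=timeout) as resp:
        return json.load(resp)

print(zk_tree_url("solr-node-3", 8983))  # hypothetical host
# data = fetch_zk_tree("solr-node-3", 8983)
```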

I've checked the logs of the corresponding instance and can see entries
like this on startup:
2019-06-10 20:26:17.061 ERROR (qtp1335503880-18) [   ]
o.a.s.c.c.ZkStateReader Collection collection2 is not lazy or watched!
The other instances don't have these messages, so maybe it is related to the
loading issue, but I'm not sure about that either, because I can't find any
further information on this error.

On Mon, Jun 10, 2019 at 22:05 Erick Erickson wrote:

> Hmmm, I didn’t really look carefully at the end of your e-mail. There not
> being an /overseer znode _looks_ like one or more of your Solr nodes isn’t
> connecting to the proper ZooKeeper ensemble.
>
> bq. All of the instances are able to talk to zookeeper (they are
> displayed as active in the SolrCloud view, so they must be able to
> connect, right?).
>
> Well, maybe or maybe not. The particular Solr node that you’re working on
> can see ZK, true. But are all of them looking at the _same_ ensemble? Are
> any of the Solr nodes somehow running with embedded ZooKeeper through a
> typo or something? And since that’s in the ZooKeeper log, is the ensemble
> properly configured? For troubleshooting _only_, I might go back to a
> single ZK instance just long enough to eliminate that possibility.
>
> bq. o.a.s.s.SolrDispatchFilter Could not consume full client request
> org.eclipse.jetty.io.EofException: Early EOF
>
> This usually indicates either massive requests or a mis-configured jetty
> such that the request size exceeds the max allowed. There are a few
> settings that can be extended, but this is pretty unusual. Unless you have
> lots and lots and lots of nodes, the request size should be reasonably
> small.
>
> Hmmm, do you have any massive files in your config (schema, solrconfig,
> synonym files, etc.)? There is a 1M default limit on the size of files,
> perhaps you’re exceeding that. One test would be to use a minimal configset
> to see if that encounters the same issue.
>
> Best,
> Erick
>
>
> > On Jun 10, 2019, at 11:51 AM, Софія Строчик  wrote:
> >
> > Hi Erick, thanks for your reply!
> >
> > I didn't mention it, but we have tried async requests. Then it does not
> > time out, of course, but instead appears to run indefinitely, with
> > REQUESTSTATUS response like this:
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":1},
> >   "status":{
> >     "state":"submitted",
> >     "msg":"found [123] in submitted tasks"}}
> >
> > These requests then pile up in zookeeper's collection-queue-work without
> > ever moving to the completed or failed status.
> >
> > While I guess some operations are expensive and can run for a long time,
> > it doesn't seem likely that all of these have to take hours (without high
> > load on any of the servers!)
> >
> > Maybe you have some other suggestions, because this one doesn't seem to
> > be the cause :(
> >
> > On Mon, Jun 10, 2019 at 21:14 Erick Erickson wrote:
> >
> >> Certainly at times some things just take a long time. The 180
> >> second timeout is fairly arbitrary.
> >> GC pauses, creating a zillion replicas etc. can cause timeouts like
> >> this to be exceeded.
> >>
> >> Rather than rely on lengthening some magic timeout value and hoping, I
> >> suggest you use the async option, see:
> >> https://lucene.apache.org/solr/guide/7_3/collections-api.html
> >>
> >> Then you need to periodically check the status of that job to see the
> >> completion status.
> >>
> >> Do note this bit in particular:
> >>
> >> As of now, REQUESTSTATUS does not automatically clean up the tracking
> >> data structures...
> >>
> >> in the link above.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Jun 10, 2019 at 11:07 AM Софія Строчик wrote:

Re: Collections API timeout

2019-06-10 Thread Софія Строчик
Hi Erick, thanks for your reply!

I didn't mention it, but we have tried async requests. Then it does not time
out, of course, but instead appears to run indefinitely, with REQUESTSTATUS
response like this:
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "status":{
    "state":"submitted",
    "msg":"found [123] in submitted tasks"}}

These requests then pile up in zookeeper's collection-queue-work without
ever moving to the completed or failed status.
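For reference, a minimal sketch of polling REQUESTSTATUS until the task reaches a terminal state (the `check` callable standing in for the actual HTTP request is an assumption; the state names follow Solr's documented submitted/running/completed/failed/notfound):

```python
import time

TERMINAL_STATES = {"completed", "failed", "notfound"}

def task_state(response):
    """Pull the task state out of a parsed REQUESTSTATUS response body."""
    return response["status"]["state"]

def wait_for_task(check, request_id, interval=10, max_wait=3600):
    """Poll until the async task leaves submitted/running, or give up.
    `check(request_id)` is a placeholder for the real REQUESTSTATUS call."""
    deadline = time.time() + max_wait
    while time.time() < deadline:
        state = task_state(check(request_id))
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    return "timed-out"

# The stuck response quoted above parses to:
stuck = {"responseHeader": {"status": 0, "QTime": 1},
         "status": {"state": "submitted", "msg": "found [123] in submitted tasks"}}
print(task_state(stuck))  # → submitted
```

A task that never leaves "submitted" this way points at the Overseer not consuming its queue, rather than at a slow operation.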

While I guess some operations are expensive and can run for a long time, it
doesn't seem likely that all of these have to take hours (without high load
on any of the servers!)

Maybe you have some other suggestions, because this one doesn't seem to be
the cause :(

On Mon, Jun 10, 2019 at 21:14 Erick Erickson wrote:

> Certainly at times some things just take a long time. The 180
> second timeout is fairly arbitrary.
> GC pauses, creating a zillion replicas etc. can cause timeouts like
> this to be exceeded.
>
> Rather than rely on lengthening some magic timeout value and hoping, I
> suggest you use the async option, see:
> https://lucene.apache.org/solr/guide/7_3/collections-api.html
>
> Then you need to periodically check the status of that job to see the
> completion status.
>
> Do note this bit in particular:
>
> As of now, REQUESTSTATUS does not automatically clean up the tracking
> data structures...
>
> in the link above.
>
> Best,
> Erick
>
> On Mon, Jun 10, 2019 at 11:07 AM Софія Строчик  wrote:
> >
> > Hi everyone,
> >
> > recently, when trying to delete a collection, we have noticed that all
> > calls to the Collections API time out after 180s.
> > Something similar is described here
> > <http://lucene.472066.n3.nabble.com/Can-t-create-collection-td4314225.html>,
> > however restarting the instance or the server does not help.
> >
> > *This is what the response to the API call looks like:*
> > {
> >   "responseHeader":{
> >     "status":500,
> >     "QTime":180163},
> >   "error":{
> >     "metadata":[
> >       "error-class","org.apache.solr.common.SolrException",
> >       "root-error-class","org.apache.solr.common.SolrException"],
> >     "msg":"overseerstatus the collection time out:180s",
> >     "trace":"org.apache.solr.common.SolrException: overseerstatus the
> > collection time out:180s\n\tat
> >
> org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(CollectionsHandler.java:367)\n\tat
> >
> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:272)\n\tat
> >
> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
> >
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat
> >
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
> >
> org.eclipse.jetty.server.handler.ScopedHandler.ne

Collections API timeout

2019-06-10 Thread Софія Строчик
Hi everyone,

recently, when trying to delete a collection, we have noticed that all calls
to the Collections API time out after 180s.
Something similar is described here
<http://lucene.472066.n3.nabble.com/Can-t-create-collection-td4314225.html>,
however restarting the instance or the server does not help.
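For reference, the same delete can be submitted asynchronously so the HTTP call returns as soon as the task is queued, instead of waiting out the 180s timeout; this sketch only builds the request URL (host and async id are placeholders):

```python
import urllib.parse

def delete_collection_async(host, port, collection, async_id):
    """Build the async DELETE call; with `async` set, the Collections API
    returns once the task is queued rather than waiting on the Overseer."""
    params = urllib.parse.urlencode({
        "action": "DELETE",
        "name": collection,
        "async": async_id,
        "wt": "json",
    })
    return f"http://{host}:{port}/solr/admin/collections?{params}"

# Hypothetical node and request id; poll REQUESTSTATUS with the same id afterwards
print(delete_collection_async("solr-node-1", 8983, "collection2", "del-collection2-1"))
```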

*This is what the response to the API call looks like:*
{
  "responseHeader":{
    "status":500,
    "QTime":180163},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"overseerstatus the collection time out:180s",
    "trace":"org.apache.solr.common.SolrException: overseerstatus the
collection time out:180s\n\tat
org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(CollectionsHandler.java:367)\n\tat
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:272)\n\tat
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:531)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)\n\tat
java.lang.Thread.run(Thread.java:745)\n",
"code":500}}

*The errors look like this in the logs:*

2019-06-10 15:37:19.446 ERROR (qtp315932542-5748) [   ]
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: reload the
collection time out:180s
at
org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(CollectionsHandler.java:367)
at
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:272)
at
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:246)
at