Re: Job stuck at aborted/starting up

Karl Wright Mon, 14 May 2018 05:45:01 -0700

Hi Shashank,

Bashing the database is not the right way to do this, at all.  You will
cause carnage.

What you need to do is diagnose what is happening.

First of all, large jobs take a long time to abort or start up, because
they need to set document priorities for all the crawlable documents in the
queue.  If that's what is happening you will just have to wait until the
operation is complete.  It could take several hours if you have a very
large job (millions of documents).

Second, it could be related to the connector you are using.  If that's the
case, you should see the same errors repeating in the logs.

My suggestion: get a thread dump of the agents process, and post it here.
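
For reference, here is a minimal sketch of one way to capture such a dump.
It assumes a JDK whose standard jstack tool is on the PATH, and that you
already know the process id of the agents JVM (for example from jps -l).
The class and method names are illustrative only and are not part of
ManifoldCF.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical helper: shells out to the JDK's jstack tool to dump the
// threads of another JVM (here, the agents process) into a file.
class ThreadDumpSketch {

    // Builds the jstack invocation; -l adds lock information to the dump.
    static List<String> buildCommand(long pid) {
        return List.of("jstack", "-l", Long.toString(pid));
    }

    // Runs jstack against the given pid, writes its output (stdout and
    // stderr combined) to outFile, and returns jstack's exit code.
    static int dumpTo(long pid, File outFile) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pid));
        pb.redirectErrorStream(true);
        pb.redirectOutput(outFile);
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ThreadDumpSketch <agents-pid> <output-file>");
            System.exit(2);
        }
        int exit = dumpTo(Long.parseLong(args[0]), new File(args[1]));
        System.out.println("jstack exited with code " + exit);
    }
}
```

Running jstack -l <pid> by hand from a shell gives the same output. Taking
two or three dumps a few seconds apart usually makes it easier to tell
which threads are genuinely stuck.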

Karl

On Mon, May 14, 2018 at 7:17 AM Shashank Raj <[email protected]>
wrote:

> Hi Karl,
> We are working on a file-server-based project and have a multi-node
> ZooKeeper setup. We are using Manifold's Tika to parse the files and send
> them to Solr Cloud. We have multiple jobs in this setup, but some jobs get
> stuck and remain in either the "Aborting" or "Starting Up" state.
>
> We have earlier tried to change the job status through the API and/or a
> direct DB update, setting the job status to "N". But this has not helped:
> on clicking "start" or "start minimal", the job again goes into the stuck
> state. We have seen that restarting the agents process fixes the issue,
> but it is not feasible to keep restarting the agents process every time a
> job is stuck.
>
> Is there a possible workaround, or any API to trigger an agents restart?
>
> Thanks and regards,
> Shashank
>
