Great info, Koji, thank you very much. Will do.

patw

On Thu, Jul 25, 2019 at 9:40 PM Koji Kawamura <[email protected]>
wrote:

> Hi Pat,
>
> I recommend taking a thread dump the next time you encounter this
> situation. A thread dump shows what each thread is doing, including the
> stuck SelectHiveQL thread.
>
> You can get a thread dump by executing:
> ${NIFI_HOME}/bin/nifi.sh dump <file-name>
>
> The thread stack traces are then written to the specified file.
> Many of the entries will look like this:
> "Timer-Driven Process Thread-8" Id=71 WAITING  on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1b3abf12
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>
> Once you get the thread dump, please share it with us for further
> investigation.
>
> Thanks,
> Koji
>
> On Fri, Jul 26, 2019 at 1:57 AM Pat White <[email protected]>
> wrote:
> >
> > Hi Folks,
> >
> > I'd like to ask for suggestions on debugging SelectHiveQL processors.
> > We've seen a very odd failure mode twice now, where a SelectHiveQL
> > processor that had been running fine suddenly becomes "stuck". This is on
> > 1.6.0, which is a bit dated compared to 1.9.2, but I'm still very puzzled
> > by the lack of any error indication.
> >
> > Symptom: the processor is running fine and continues to report 'running'
> > on the canvas, but the input port begins to queue up and show a backlog.
> > Stopping the processor in the canvas reports success and shows 'stopped',
> > but trying to start it again gets the popup "No eligible components are
> > selected. Please select the components to be stopped." Making sure the
> > processor is clearly selected gives the same error. The only way to get
> > it unstuck is to restart the primary node; this appears to kill the
> > affected threads and lets the processor begin running again, after which
> > it is fine.
> >
> > The issue appears to be directly related to the processor itself, as
> > opposed to, say, the ConnectionPool. On that note, I tried restarting the
> > ConnectionPool being used: the stop attempt hangs on the affected
> > processor, to the point that the stop fails. Another oddity: when I tried
> > stopping objects upstream of the affected processor, they reported
> > "cannot be disabled because it is referenced by 1 components that are
> > currently running", even though the canvas clearly shows that processor
> > as stopped.
> >
> > What's really strange is the lack of error indications anywhere. I see
> > nothing in the logs at all regarding the affected processor until the
> > primary node is restarted. Then I see the start event as the processor
> > comes back online: "StandardProcessScheduler Starting SelectHiveQL id=".
> >
> > I'd appreciate any suggestions on additional logging or other resources
> > that would help with debugging. Thanks!
> >
> > patw
>
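Once a thread dump file has been written, one quick way to pick out the threads that are inside SelectHiveQL is to filter the stack traces for that class name. The shell function below is a minimal sketch and not part of NiFi: the name `find_threads` is hypothetical, and it assumes that each thread in the dump starts with its quoted thread name on its own line and that threads are separated by empty lines (check this against your own dump file before relying on it).

```shell
#!/bin/sh
# find_threads DUMP_FILE [PATTERN]
# Hypothetical helper: prints the header line of every thread in a thread
# dump whose stack trace contains PATTERN (default: SelectHiveQL).
# Assumption: threads are separated by empty lines, and the first line of
# each block is the thread header ("Thread name" Id=NN STATE ...).
find_threads() {
  dump_file=$1
  pattern=${2:-SelectHiveQL}
  # RS="" switches awk to paragraph mode: each block of lines separated by
  # empty lines (one thread header plus its stack frames) is one record.
  # index() does a literal substring match, so "." or "$" in PATTERN are
  # not treated as regex metacharacters.
  awk -v pat="$pattern" 'BEGIN { RS = "" }
    index($0, pat) > 0 {
      split($0, lines, "\n")  # lines[1] is the thread header
      print lines[1]
    }' "$dump_file"
}
```

For example, after writing a dump to a file such as `/tmp/nifi-dump.txt`, running `find_threads /tmp/nifi-dump.txt` prints the header of each thread whose stack mentions SelectHiveQL. Taking two dumps a minute or so apart and comparing them can help: a thread that shows the same frames in both is a good candidate for the stuck one.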
