Hi Pat,

I recommend taking a thread dump the next time you encounter this situation.
A thread dump shows what each thread is doing, including the stuck
SelectHiveQL thread.

You can get a thread dump by executing:
${NIFI_HOME}/bin/nifi.sh dump <dump-file-name>

The thread stack traces are then written to the specified file.
Most entries will look like the one below:
"Timer-Driven Process Thread-8" Id=71 WAITING  on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1b3abf12
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)

Once you get the thread dump, please share it with us for further investigation.
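
If the dump turns out to be large, something like the rough Python sketch
below may help narrow it down before sharing. It assumes each thread entry
starts with a header line of the form shown above ("name" Id=N STATE); the
function names are my own, and the keyword to search for (for example
"SelectHiveQL" or "hive") is only a guess at what the stuck thread's stack
frames will contain.

```python
import re

# Header lines are assumed to look like:
#   "Timer-Driven Process Thread-8" Id=71 WAITING  on ...
HEADER = re.compile(r'^"(?P<name>[^"]+)" Id=(?P<id>\d+) (?P<state>[A-Z_]+)')


def split_threads(dump_text):
    """Split dump text into (name, state, lines) tuples, one per thread."""
    threads = []
    current = None
    for line in dump_text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new header starts a new thread entry.
            current = (match.group("name"), match.group("state"), [line])
            threads.append(current)
        elif current is not None and line.strip():
            # Non-blank lines after a header belong to that thread.
            current[2].append(line)
    return threads


def threads_mentioning(dump_text, keyword):
    """Return the threads whose header or frames contain keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        thread
        for thread in split_threads(dump_text)
        if any(needle in line.lower() for line in thread[2])
    ]
```

For example, calling threads_mentioning(open("dump.txt").read(), "SelectHiveQL")
and printing the lines of each result would show only the matching stacks
("dump.txt" stands for whatever file name you passed to the dump command).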

Thanks,
Koji

On Fri, Jul 26, 2019 at 1:57 AM Pat White <[email protected]> wrote:
>
> Hi Folks,
>
> Would like to ask for suggestions on debugging SelectHiveQL processors. We've 
> seen a very odd failure mode twice now, where a SelectHiveQL processor which 
> had been running fine suddenly becomes "stuck". This is on 1.6.0, so a bit 
> dated compared to 1.9.2, but I'm still very puzzled at the lack of error 
> indications.
>
> Symptom: the processor is running fine and continues to report 'running' on 
> the canvas, but the input port begins to queue up and show backlogs. Stopping 
> the processor in the canvas reports success and shows 'stopped', but trying to 
> start it again gets the popup "No eligible components are selected. Please 
> select the components to be stopped.". Making sure the processor is clearly 
> selected reports the same error. The only way to get it unstuck is to restart 
> the primary; this appears to kill the affected threads and allow the processor 
> to begin running again, and at that point it's ok again.
>
> The issue appears directly related to the processor itself, as opposed to, say, 
> the ConnectionPool. On that, we tried restarting the ConnectionPool being used; 
> the stop attempt hangs on the affected processor, to the point that the stop 
> fails. Another oddity: we tried stopping upstream objects to the affected 
> processor, and they report "cannot be disabled because it is referenced by 1 
> components that are currently running", even though the canvas clearly shows 
> that processor as stopped.
>
> What's really strange is the lack of error indications anywhere. We see nothing 
> in the logs at all regarding the affected processor until the primary restart. 
> Then we see the start event when the processor is coming back online: 
> "StandardProcessScheduler Starting SelectHiveQL id=".
>
> Appreciate any suggestions on additional logging or other resources that 
> would help debug. Thanks!
>
> patw
