Great info Koji, thank you very much. Will do.

patw

On Thu, Jul 25, 2019 at 9:40 PM Koji Kawamura <[email protected]> wrote:

> Hi Pat,
>
> I recommend getting a thread-dump when you encounter the situation next time.
> Thread-dump shows what each thread is doing, including the stuck SelectHiveQL thread.
>
> You can get thread-dump by executing:
> ${NIFI_HOME}/bin/nifi.sh dump-file-name
>
> Then thread stack traces are logged to the specified file.
> Lots of logs look like below:
> "Timer-Driven Process Thread-8" Id=71 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1b3abf12
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>
> Once you get the thread dump, please share it with us for further investigation.
>
> Thanks,
> Koji
>
> On Fri, Jul 26, 2019 at 1:57 AM Pat White <[email protected]> wrote:
> >
> > Hi Folks,
> >
> > Would like to ask for suggestions on debugging SelectHiveQL processors, we've seen a very odd error mode twice now, where a SelectHiveQL processor which had been running fine suddenly becomes "stuck". This is on 1.6.0, so a bit dated compared to 1.9.2, but i'm still very puzzled at the lack of error indications.
> >
> > Symptom; processor is running fine, continues to report 'running' on canvas but the input port begins to queue up and show backlogs. Stopping the processor in the canvas reports success and shows 'stopped', but trying to start it again gets the popup "No eligible components are selected. Please select the components to be stopped.". Making sure the processor is clearly selected reports same error. Only way to get it unstuck is to restart the primary, this appears to kill the affected threads and allow the processor to begin running again, at that point it's ok again.
> >
> > Issue appears directly related to the processor itself, as opposed to say the ConnectionPool. On that, tried restarting the ConnectionPool being used, stop attempt hangs on the affected processor, to the point the stop fails. Another oddity, tried stopping upstream objects to the affected processor, they report "cannot be disabled because it is referenced by 1 components that are currently running", even though the canvas clearly shows that processor as stopped.
> >
> > What's really strange is the lack of error indications anywhere, see nothing in the logs at all regarding the affected processor, until primary restart. Then see the start event when the processor is coming back online "StandardProcessScheduler Starting SelectHiveQL id=".
> >
> > Appreciate any suggestions on additional logging or other resources that would help debug. Thanks!
> >
> > patw
