Hi Folks,

I'd like to ask for suggestions on debugging SelectHiveQL processors.
We've seen a very odd failure mode twice now, where a SelectHiveQL processor
that had been running fine suddenly becomes "stuck". This is on 1.6.0, so
a bit dated compared to 1.9.2, but I'm still very puzzled by the lack of
error indications.

Symptom: the processor is running fine and continues to report 'running' on
the canvas, but the input port begins to queue up and show backlogs. Stopping
the processor in the canvas reports success and shows 'stopped', but trying to
start it again gets the popup "No eligible components are selected. Please
select the components to be stopped.". Making sure the processor is clearly
selected gives the same error. The only way to get it unstuck is to restart
the primary; this appears to kill the affected threads and allow the processor
to begin running again, and at that point it's OK again.

The issue appears directly related to the processor itself, as opposed to,
say, the ConnectionPool. On that, I tried restarting the ConnectionPool being
used, but the stop attempt hangs on the affected processor, to the point that
the stop fails. Another oddity: I tried stopping upstream objects to the
affected processor, and they report "cannot be disabled because it is
referenced by 1 components that are currently running", even though the
canvas clearly shows that processor as stopped.
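
For what it's worth, here is a rough sketch of how the "stopped on canvas but
still counted as running" state might be checked from outside the UI. It
assumes the processor JSON returned by the REST API (GET
/nifi-api/processors/{id}) exposes component.state and
status.aggregateSnapshot.activeThreadCount; those field names are an
assumption and have not been verified against 1.6.0. The helper name and the
sample values are made up for illustration.

```python
def looks_stuck(entity):
    """Return True if a processor entity says STOPPED but still holds threads.

    `entity` is assumed to be the parsed JSON of a processor entity; the
    field names used here are assumptions, not confirmed for NiFi 1.6.0.
    """
    state = entity.get("component", {}).get("state")
    snapshot = entity.get("status", {}).get("aggregateSnapshot", {})
    active_threads = snapshot.get("activeThreadCount", 0) or 0
    return state == "STOPPED" and active_threads > 0


# Hypothetical sample, shaped like the assumed REST response.
sample = {
    "component": {"name": "SelectHiveQL", "state": "STOPPED"},
    "status": {"aggregateSnapshot": {"activeThreadCount": 1}},
}

if looks_stuck(sample):
    print("processor reports STOPPED but still has active threads")
```

If the check comes back true, that would at least be consistent with a
thread that never returned from the processor, rather than a display problem
on the canvas.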

What's really strange is the lack of error indications anywhere. I see
nothing in the logs at all regarding the affected processor until the primary
restart. Then I see the start event when the processor is coming back
online: "StandardProcessScheduler
Starting SelectHiveQL id=".

Appreciate any suggestions on additional logging or other resources that
would help debug. Thanks!

patw
