I think the behavior you saw with the queue going to 0 is expected...
the UI shows an aggregated view of all the nodes in the cluster, so if
one node has flow files in its queue and that node goes down while you
are viewing the UI from another node, those flow files are no longer
counted in the UI.
How many flow files are in the queue before ExecuteStreamCommand when you
see it not making any progress?
There was a bug in 1.0 where, if the number of flow files in a queue was
evenly divisible by the swap size, those flow files would be swapped out
and never swapped back in, leaving them sitting there.
The JIRA was: https://issues.apache.org/jira/browse/NIFI-2754
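As a quick sanity check for that pattern, you could see whether the count
shown on the stuck connection is an exact multiple of the swap size. A
minimal sketch, assuming the default swap threshold of 20000 (set by
nifi.queue.swap.threshold in nifi.properties; verify against your install)
and a hypothetical queued count:

```shell
# Hedged sketch, not from the thread above: NIFI-2754 can trip when the
# queued flow-file count is an exact multiple of the swap size.
queued=60000       # hypothetical: the count shown on the stuck connection
swap_size=20000    # assumed default of nifi.queue.swap.threshold

if [ $((queued % swap_size)) -eq 0 ]; then
  echo "queue depth is an exact multiple of the swap size; may match NIFI-2754"
fi
```

If the check fires, the stuck files are likely sitting in swap, and the
restart behavior you observed (count disappearing, then returning) would
be consistent with that.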
Additionally, you could try upgrading to the recently released 1.1 to see
whether the same behavior occurs.
On Thu, Dec 1, 2016 at 3:39 PM, A.B. Srinivasan <ab.sriniva...@gmail.com> wrote:
> I have NiFi 1.0 deployed in a non-secure cluster across 3 nodes.
> I have a flow pipeline that reads from a Kafka topic using ConsumeKafka
> and kicks off an ExecuteStreamCommand mediated job based on attributes
> included in the notification message.
> What I observe is that jobs are being kicked off and they complete
> successfully on 2 of the nodes. The 3rd node however never seems to make
> progress on any of the jobs scheduled on it.
> I do see the node receiving the notification messages (based on PutRiemann
> events posted when a message is received by ConsumeKafka), but thereafter
> there is no progress at all. The consequence is that the queue in front of
> the ExecuteStreamCommand processor keeps growing whenever a job is
> scheduled on the 'stuck' node.
> I don't see anything obvious to me in the nifi-app logs on any of the
> nodes that helps me get insight into what is afoot. I figured that some
> state might be out of sync on the stuck node and decided to restart it. When
> that node went down, the queue in front of the ExecuteStreamCommand
> immediately went to 0 (I happened to be watching the UI on one of the other
> nodes). When the node came back up, the queue was restored to the value it
> had prior to the restart.
> I am looking for debugging hints / ideas to help get insight into what is
> really going on.