Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-28 Thread dan young
I've converted over our flows based on your recommendation, will monitor
and report back if I see any issues

On Fri, Dec 28, 2018 at 8:43 AM Mark Payne  wrote:

> Dan, et al,
>
> Great news! I was able to replicate this issue finally, by creating a
> Load-Balanced connection between two Process Groups/Ports instead of
> between two processors. The fact that it's between two Ports does not, in
> and of itself, matter. But there is a race condition, and Ports do no
> actual Processing of the FlowFile (simply pull it from one queue and
> transfer it to another). As a result, because it is extremely fast, it is
> more likely to trigger the race condition.
>
> So I created a JIRA [1] and have submitted a PR for it.
>
> Interestingly, while there is no real workaround that is fool-proof, until
> this fix is in and released, you could choose to update your flow so that
> the connection between Process Groups is not load balanced and instead the
> connection between the Input Port and the first Processor is load
> balanced. Again, this is not fool-proof, because it could affect the Load
> Balanced Connection even if it is connected to a Processor, but it is less
> likely to do so, so you would likely see the issue occur far less often.
>
> Thank you so much for sticking with us all as we diagnose this and figure
> it all out - would not have been able to figure it out without you
> spending the time to debug the issue!
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-5919
>
>
> On Dec 26, 2018, at 10:31 PM, dan young  wrote:
>
> Hello Mark,
>
> I just stopped the destination processor, and then disconnected the node
> in question (nifi1-1). Once I disconnected the node, the flow file in the
> load balance connection disappeared from the queue.  After that, I
> reconnected the node (with the downstream processor disconnected) and once
> the node successfully rejoined the cluster, the flowfile showed up in the
> queue again. After this, I started the connected downstream processor, but
> the flowfile stays in the queue. The only way to clear the queue is if I
> actually restart the node.  If I disconnect the node, and then restart that
> node, the flowfile is no longer present in the queue.
>
> Regards,
>
> Dano
>
>
> On Wed, Dec 26, 2018 at 6:13 PM Mark Payne  wrote:
>
>> Ok, I just wanted to confirm that when you said “once it rejoins the
>> cluster that flow file is gone” that you mean “the flowfile did not exist
>> on the system” and NOT “the queue size was 0 by the time that I looked at
>> the UI.” I.e., is it possible that the FlowFile did exist, was restored,
>> and then was processed before you looked at the UI? Or the FlowFile
>> definitely did not exist after the node was restarted? That’s why I was
>> suggesting that you restart with the connection’s source and destination
>> stopped. Just to make sure that the FlowFile didn’t just get processed
>> quickly on restart.
>>
>> Sent from my iPhone
>>
>> On Dec 26, 2018, at 7:55 PM, dan young  wrote:
>>
>> Heya Mark,
>>
>> If we restart the node, that "stuck" flowfile will disappear. This is the
>> only way so far to clear out the flowfile. I usually disconnect the node,
>> then once it's disconnected I restart nifi, and then once it rejoins the
>> cluster that flow file is gone. If we try to empty the queue, it will just
>> say that there are no flow files in the queue.
>>
>>
>> On Wed, Dec 26, 2018, 5:22 PM Mark Payne wrote:
>>> Hey Dan,
>>>
>>> Thanks, this is super useful! So, the following section is the damning
>>> part of the JSON:
>>>
>>> {
>>>   "totalFlowFileCount": 1,
>>>   "totalByteCount": 975890,
>>>   "nodeIdentifier": "nifi1-1:9443",
>>>   "localQueuePartition": {
>>>     "totalFlowFileCount": 0,
>>>     "totalByteCount": 0,
>>>     "activeQueueFlowFileCount": 0,
>>>     "activeQueueByteCount": 0,
>>>     "swapFlowFileCount": 0,
>>>     "swapByteCount": 0,
>>>     "swapFiles": 0,
>>>     "inFlightFlowFileCount": 0,
>>>     "inFlightByteCount": 0,
>>>     "allActiveQueueFlowFilesPenalized": false,
>>>     "anyActiveQueueFlowFilesPenalized": false
>>>   },
>>>   "remoteQueuePartitions": [
>>>     {
>>>       "totalFlowFileCount": 0,
>>>       "totalByteCount": 0,
>>>       "activeQueueFlowFileCount": 0,
>>>       "activeQueueByteCount": 0,
>>>       "swapFlowFileCount": 0,
>>>       "swapByteCount": 0,
>>>       "swapFiles": 0,
>>>       "inFlightFlowFileCount": 0,
>>>       "inFlightByteCount": 0,
>>>       "nodeIdentifier": "nifi2-1:9443"
>>>     },
>>>     {
>>>       "totalFlowFileCount": 0,
>>>       "totalByteCount": 0,
>>>       "activeQueueFlowFileCount": 0,
>>> 

Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-28 Thread Boris Tyukin
Mark, you are a troubleshooting master! thanks for chasing this down as
this new feature is really awesome and we are about to start using it. Good
to know there is a semi-safe workaround.

Boris


Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-28 Thread dan young
Thank you Mark! This is great news and promising... I'll look into
refactoring our flows per your suggestion...

Regards

Dano


Re: flowfiles stuck in load balanced queue; nifi 1.8

2018-12-28 Thread Mark Payne
Dan, et al,

Great news! I was able to replicate this issue finally, by creating a
Load-Balanced connection between two Process Groups/Ports instead of between
two processors. The fact that it's between two Ports does not, in and of
itself, matter. But there is a race condition, and Ports do no actual
Processing of the FlowFile (simply pull it from one queue and transfer it to
another). As a result, because it is extremely fast, it is more likely to
trigger the race condition.

So I created a JIRA [1] and have submitted a PR for it.

Interestingly, while there is no real workaround that is fool-proof, until
this fix is in and released, you could choose to update your flow so that the
connection between Process Groups is not load balanced and instead the
connection between the Input Port and the first Processor is load balanced.
Again, this is not fool-proof, because it could affect the Load Balanced
Connection even if it is connected to a Processor, but it is less likely to do
so, so you would likely see the issue occur far less often.

Thank you so much for sticking with us all as we diagnose this and figure it
all out - would not have been able to figure it out without you spending the
time to debug the issue!

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-5919


On Dec 26, 2018, at 10:31 PM, dan young <danoyo...@gmail.com> wrote:

Hello Mark,

I just stopped the destination processor, and then disconnected the node in 
question (nifi1-1). Once I disconnected the node, the flow file in the load 
balance connection disappeared from the queue.  After that, I reconnected the 
node (with the downstream processor disconnected) and once the node 
successfully rejoined the cluster, the flowfile showed up in the queue again. 
After this, I started the connected downstream processor, but the flowfile 
stays in the queue. The only way to clear the queue is if I actually restart 
the node.  If I disconnect the node, and then restart that node, the flowfile 
is no longer present in the queue.

Regards,

Dano


On Wed, Dec 26, 2018 at 6:13 PM Mark Payne <marka...@hotmail.com> wrote:
Ok, I just wanted to confirm that when you said “once it rejoins the cluster 
that flow file is gone” that you mean “the flowfile did not exist on the 
system” and NOT “the queue size was 0 by the time that I looked at the UI.” 
I.e., is it possible that the FlowFile did exist, was restored, and then was 
processed before you looked at the UI? Or the FlowFile definitely did not exist 
after the node was restarted? That’s why I was suggesting that you restart with 
the connection’s source and destination stopped. Just to make sure that the 
FlowFile didn’t just get processed quickly on restart.

Sent from my iPhone

On Dec 26, 2018, at 7:55 PM, dan young <danoyo...@gmail.com> wrote:

Heya Mark,

If we restart the node, that "stuck" flowfile will disappear. This is the only 
way so far to clear out the flowfile. I usually disconnect the node, then once 
it's disconnected I restart nifi, and then once it rejoins the cluster that 
flow file is gone. If we try to empty the queue, it will just say that there
are no flow files in the queue.


On Wed, Dec 26, 2018, 5:22 PM Mark Payne <marka...@hotmail.com> wrote:
Hey Dan,

Thanks, this is super useful! So, the following section is the damning part of 
the JSON:

{
  "totalFlowFileCount": 1,
  "totalByteCount": 975890,
  "nodeIdentifier": "nifi1-1:9443",
  "localQueuePartition": {
    "totalFlowFileCount": 0,
    "totalByteCount": 0,
    "activeQueueFlowFileCount": 0,
    "activeQueueByteCount": 0,
    "swapFlowFileCount": 0,
    "swapByteCount": 0,
    "swapFiles": 0,
    "inFlightFlowFileCount": 0,
    "inFlightByteCount": 0,
    "allActiveQueueFlowFilesPenalized": false,
    "anyActiveQueueFlowFilesPenalized": false
  },
  "remoteQueuePartitions": [
    {
      "totalFlowFileCount": 0,
      "totalByteCount": 0,
      "activeQueueFlowFileCount": 0,
      "activeQueueByteCount": 0,
      "swapFlowFileCount": 0,
      "swapByteCount": 0,
      "swapFiles": 0,
      "inFlightFlowFileCount": 0,
      "inFlightByteCount": 0,
      "nodeIdentifier": "nifi2-1:9443"
    },
    {
      "totalFlowFileCount": 0,
      "totalByteCount": 0,
      "activeQueueFlowFileCount": 0,
      "activeQueueByteCount": 0,
      "swapFlowFileCount": 0,
      "swapByteCount": 0,
      "swapFiles": 0,
      "inFlightFlowFileCount": 0,
      "inFlightByteCount": 0,
      "nodeIdentifier": "nifi3-1:9443"
    }
  ]
}
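
For anyone who wants to check their own diagnostics output for the same
symptom, here is a minimal sketch in Python. It assumes the JSON has the same
shape as the snippet above and only uses the field names that appear there.
The helper name partition_mismatch and the trimmed-down sample are made up for
illustration and are not part of NiFi. The idea is to compare the reported
total for the node against the sum of its local and remote partitions.

```python
import json


def partition_mismatch(diag: dict) -> int:
    """Return the reported total minus the sum over all partitions.

    A non-zero result means the node reports FlowFiles that no
    partition accounts for (hypothetical helper, not a NiFi API).
    """
    local = diag["localQueuePartition"]["totalFlowFileCount"]
    remote = sum(
        p["totalFlowFileCount"] for p in diag["remoteQueuePartitions"]
    )
    return diag["totalFlowFileCount"] - (local + remote)


# Trimmed-down copy of the snippet above, keeping only the counts used here.
sample = json.loads("""
{
  "totalFlowFileCount": 1,
  "nodeIdentifier": "nifi1-1:9443",
  "localQueuePartition": {"totalFlowFileCount": 0},
  "remoteQueuePartitions": [
    {"totalFlowFileCount": 0, "nodeIdentifier": "nifi2-1:9443"},
    {"totalFlowFileCount": 0, "nodeIdentifier": "nifi3-1:9443"}
  ]
}
""")

mismatch = partition_mismatch(sample)
print(f"{sample['nodeIdentifier']}: unaccounted FlowFiles = {mismatch}")
```

With the values from the snippet, the node reports one FlowFile while every
partition reports zero, so the difference comes out to 1.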

It indicates that node