Re: RE: [EXT] Re: Polling Processors impact on Latency

2017-11-08 Thread Chirag Dewan
That's a great start, Andy and Peter. Thank you for such precise answers.
I will start tweaking the parameters you mentioned and try to reach an
optimal latency-resource configuration.
Thanks a lot for your help.
Chirag

RE: [EXT] Re: Polling Processors impact on Latency

2017-11-07 Thread Andy Christianson
Chirag,

Peter's note about bored yield duration is right on. Some additional things I'd 
like to point out are:

1) You might get the lowest latency in a configuration where the processor runs 
continuously (bored yield duration 0). This is because with the code executing 
continuously, CPU caches should stay hot. The trade-off is wasted CPU cycles 
when the consumer is waiting for input more often than processing it.

2) We do have an event driven scheduling mode. For NiFi this still appears to 
be experimental:

"Event driven: When this mode is selected, the Processor will be triggered to 
run by an event, and that event occurs when FlowFiles enter Connections feeding 
this Processor. This mode is currently considered experimental and is not 
supported by all Processors. When this mode is selected, the ‘Run schedule’ 
option is not configurable, as the Processor is not triggered to run 
periodically but as the result of an event. Additionally, this is the only mode 
for which the ‘Concurrent tasks’ option can be set to 0. In this case, the 
number of threads is limited only by the size of the Event-Driven Thread Pool 
that the administrator has configured."

https://nifi.apache.org/docs/nifi-docs/html/user-guide.html

This mode is also implemented in MiNiFi - C++ and is done using condition 
variables. This type of event-driven scheduling should put you near the lower 
limit of latency without sacrificing much CPU, assuming a workload where the 
consumer is waiting for input more often than processing it.
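
As a rough illustration of the two approaches described above, the Java sketch
below contrasts a consumer that polls and backs off when idle with one that
blocks until it is signalled. It is not NiFi or MiNiFi code: the class and
method names are hypothetical, and a LinkedBlockingQueue merely stands in for a
connection between two processors. Its take() call waits on a condition
internally, which is the same general mechanism as the condition variables
mentioned for MiNiFi - C++.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch only; names are hypothetical and this is not NiFi code. */
public class PollVersusWait {

    /** Timer-style consumer: checks the queue and backs off when it is empty. */
    static String pollWithYield(BlockingQueue<String> queue, long boredYieldMillis)
            throws InterruptedException {
        while (true) {
            String item = queue.poll();          // non-blocking check for work
            if (item != null) {
                return item;
            }
            if (boredYieldMillis > 0) {
                Thread.sleep(boredYieldMillis);  // "bored": wait before checking again
            } else {
                Thread.yield();                  // yield of 0: spin, burning CPU while idle
            }
        }
    }

    /** Event-style consumer: blocks until a producer signals that work is available. */
    static String waitForEvent(BlockingQueue<String> queue) throws InterruptedException {
        return queue.take();
    }

    /** Returns nanoseconds between the producer's put() and the consumer's receipt. */
    static long measureNanos(boolean eventDriven, long boredYieldMillis) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicLong receivedAt = new AtomicLong();
        Thread consumer = new Thread(() -> {
            try {
                if (eventDriven) {
                    waitForEvent(queue);
                } else {
                    pollWithYield(queue, boredYieldMillis);
                }
                receivedAt.set(System.nanoTime());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        Thread.sleep(50);                        // let the consumer go idle first
        long sentAt = System.nanoTime();
        queue.put("flowfile");
        consumer.join();
        return receivedAt.get() - sentAt;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("poll, yield 10 ms: " + measureNanos(false, 10) + " ns");
        System.out.println("poll, yield 0 ms:  " + measureNanos(false, 0) + " ns");
        System.out.println("event driven:      " + measureNanos(true, 0) + " ns");
    }
}
```

A single run like this is noisy, so any real comparison should repeat the
measurement many times against a representative workload.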

I would suggest trying out different configurations and taking measurements, as 
the ideal config will depend a lot on your workload.

Regards,

Andy I.C.



> Original Message ----
>Subject: RE: [EXT] Re: Polling Processors impact on Latency
>Local Time: November 7, 2017 7:29 AM
>UTC Time: November 7, 2017 12:29 PM
>From: pwi...@micron.com
>To: users@nifi.apache.org <users@nifi.apache.org>
>
>
>If you schedule the processor to run every 0 sec (the default) then in my 
>experience you won’t notice latency from polling at all. But I guess this 
>depends on your expectations,
> volume, and over all Flow processing time.
>
>
>
>Yes, event driven may help, but from what I’ve read it’s more about reducing 
>server resource consumption than latency (could be wrong).
>
>
>
>As for a hard set limit, there is a configuration entry in nifi.properties 
>that seems relevant:
>
>
>
># If a component has no work to do (is "bored"), how long should we wait 
>before checking again for work?
>
>nifi.bored.yield.duration=10 millis
>
>
>
>Thanks,
>
>  Peter
>
>
>
>From: Chirag Dewan [mailto:chirag.dewa...@yahoo.in] 
>Sent: Tuesday, November 07, 2017 8:02 PM
>To: apere...@gmail.com; users@nifi.apache.org
>Subject: [EXT] Re: Polling Processors impact on Latency
>
>
>Thanks Andrew for the quick response.
>
>
>I am more concerned about the processors polling for flow files on the 
>connection between the processors?
>
>Thanks,
>
>Chirag
>
>Sent from Yahoo Mail on Android
>
>
>>On Tue, 7 Nov 2017 at 5:24 PM, Andrew Grande
>><apere...@gmail.com> wrote:
>>Yes, polling increases latency in some cases. But no, NiFi is not just 
>>polling. It has all kinds of sources, and listening vs polling vs subscribing 
>>purely depends on the protocol of that given processor.
>>
>>Hope this helps,
>> Andrew
>>
>>
>>
>>On Tue, Nov 7, 2017, 1:39 AM Chirag Dewan <chirag.dewa...@yahoo.in> wrote:
>>>Hi All,
>>>
>>>I am a layman to NiFi. I am exploring NiFi as a data flow engine to be 
>>>integrated with my Flink processing engine. A brief history of our approach 
>>>: 
>>>
>>>We are trying to build a Streaming Data processing engine. We started off 
>>>with Flink as the sole core engine, which is responsible for 
>>>collection(through Flink Sources) as well as processing
>>> the data. 
>>>
>>>Soon we fumbled onto NiFi and the data flow world. 
>>>
>>>So far, my understanding is that the NiFi processors are poling processors 
>>>and not Pub-Sub processors. That makes me wonder, whats the impact of 
>>>polling on latency? I know I can configure
>>> my processors to tradeoff latency with throughput, but is there a hard set 
>>> limit on the latency I can achieve using NiFi? 
>>>
>>>As I said, I am layman as yet. Perhaps my understanding is short here. Any 
>>>leads would be much appreciated. 
>>>
>>>P.S - Not diving much into Event Driven Processors. They look like something 
>>>which might clear my thoughts. But since they are marked experimental, would 
>>>be more interested in understanding
>>> the timer driven processors.
>>>
>>>Thanks,
>>>
>>>Chirag
>>>
>>>

RE: [EXT] Re: Polling Processors impact on Latency

2017-11-07 Thread Peter Wicks (pwicks)
If you schedule the processor to run every 0 sec (the default) then in my 
experience you won’t notice latency from polling at all. But I guess this 
depends on your expectations, volume, and overall flow processing time.

Yes, event driven may help, but from what I’ve read it’s more about reducing 
server resource consumption than latency (could be wrong).

As for a hard set limit, there is a configuration entry in nifi.properties that 
seems relevant:

# If a component has no work to do (is "bored"), how long should we wait before
# checking again for work?
nifi.bored.yield.duration=10 millis
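
To put a rough number on that setting, here is a back-of-the-envelope Java
sketch. It rests on a simplified model that is my assumption, not something
stated in the NiFi documentation: each connection on the path adds at most one
bored-yield interval when the downstream processor found its queue empty just
before the flow file arrived, and processing time, thread scheduling and
contention are ignored. The class name and the five-connection flow are
hypothetical.

```java
/** Illustrative estimate only; the model and names are assumptions, not NiFi code. */
public class BoredYieldEstimate {

    /** Worst case: every hop has just gone idle when the flow file arrives. */
    static double worstCaseMillis(int connections, double boredYieldMillis) {
        return connections * boredYieldMillis;
    }

    /** Average case, assuming arrivals are uniformly spread over the yield interval. */
    static double averageMillis(int connections, double boredYieldMillis) {
        return connections * boredYieldMillis / 2.0;
    }

    public static void main(String[] args) {
        int connections = 5;            // hypothetical flow with five connections
        double boredYieldMillis = 10.0; // value of nifi.bored.yield.duration quoted above
        System.out.printf("worst case added latency: %.1f ms%n",
                worstCaseMillis(connections, boredYieldMillis));
        System.out.printf("average added latency:    %.1f ms%n",
                averageMillis(connections, boredYieldMillis));
    }
}
```

Under this model the added latency only applies to hops that are idle; a
processor that always has queued input never enters the bored state.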

Thanks,
  Peter

From: Chirag Dewan [mailto:chirag.dewa...@yahoo.in]
Sent: Tuesday, November 07, 2017 8:02 PM
To: apere...@gmail.com; users@nifi.apache.org
Subject: [EXT] Re: Polling Processors impact on Latency

Thanks Andrew for the quick response.

I am more concerned about processors polling for flow files on the 
connections between processors.

Thanks,

Chirag

On Tue, 7 Nov 2017 at 5:24 PM, Andrew Grande
<apere...@gmail.com> wrote:

Yes, polling increases latency in some cases. But no, NiFi is not just polling. 
It has all kinds of sources, and listening vs polling vs subscribing purely 
depends on the protocol of that given processor.

Hope this helps,
Andrew

On Tue, Nov 7, 2017, 1:39 AM Chirag Dewan 
<chirag.dewa...@yahoo.in> wrote:
Hi All,

I am new to NiFi. I am exploring NiFi as a data flow engine to be 
integrated with my Flink processing engine. A brief history of our approach:

We are trying to build a streaming data processing engine. We started off with 
Flink as the sole core engine, responsible for collection (through 
Flink Sources) as well as processing the data.

Soon we stumbled onto NiFi and the data flow world.

So far, my understanding is that NiFi processors are polling processors and 
not Pub-Sub processors. That makes me wonder, what's the impact of polling on 
latency? I know I can configure my processors to trade off latency against 
throughput, but is there a hard limit on the latency I can achieve using 
NiFi?

As I said, I am still new to this. Perhaps my understanding falls short here. Any 
leads would be much appreciated.

P.S. I have not dived much into Event Driven Processors. They look like something 
that might clear up my thoughts. But since they are marked experimental, I would be 
more interested in understanding the timer driven processors.

Thanks,

Chirag