Jim,

We saw 2k FlowFiles per second on HandleHttpRequest with 50 threads on the 
processor without issues; the bottleneck was in processors further down the 
flow, and it was primarily related to slow disk I/O.

Best,

Seb

> On 06.04.2017, at 12:00, James McMahon <[email protected]> wrote:
> 
> Intriguing. I'm one of those who have employed the "single flowfile" 
> approach. I'm certainly willing to test out this refinement.
> So to press your point: this is more efficient than setting the processor's 
> "Concurrent tasks" to 10 because it pays the ExecuteScript initialization 
> cost once per invocation, rather than relying on the configuration parameter 
> (which presumably pays that initialization cost ten times)?
> 
> I currently set "Concurrent tasks" to 50. The logjam I am seeing is not in 
> my ExecuteScript processor. My delay is definitely the non-steady, 
> "non-fast" stream of data arriving at my HandleHttpRequest processor, the 
> first processor in my workflow. Why that is the case is a mystery we've yet 
> to resolve.
> 
> One thing I'd welcome is some idea of what a reasonable expectation is for 
> requests handled by HandleHttpRequest in an hour. Is 1500 per hour low, 
> high, or entirely reasonable? We really have little insight. Any empirical 
> data from users' practical experience would be most welcome. 
> 
> Also, I added a second HandleHttpRequest fielding requests on a second port. 
> I did not see any improvement in throughput. Why might that be? My 
> expectation was that with two doors open rather than one, I'd see more 
> influx of data.
> 
> Thank you.
> - Jim
> 
>> On Wed, Apr 5, 2017 at 4:26 PM, Scott Wagner <[email protected]> 
>> wrote:
>> One of my experiences when using ExecuteScript with Python is that an 
>> ExecuteScript that works on an individual FlowFile is very inefficient 
>> when you have multiple FlowFiles in the input queue, even when you set the 
>> run schedule to a timer of 0 sec.
>> 
>> Instead, I have the following in all of my Python scripts:
>> 
>> flowFiles = session.get(10)  # grab up to 10 FlowFiles from the queue at once
>> for flowFile in flowFiles:
>>     if flowFile is None:
>>         continue
>>     # Do stuff here
>>     session.transfer(flowFile, REL_SUCCESS)  # every FlowFile must be transferred
>> 
>> That seems to improve the throughput of the ExecuteScript processor 
>> dramatically.
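>> 
>> For completeness, a fuller variant of that pattern might look like the 
>> sketch below; it adds explicit success/failure routing, and it assumes 
>> only the standard ExecuteScript bindings (session, REL_SUCCESS, 
>> REL_FAILURE):
>> 
>> flowFiles = session.get(10)  # pull up to 10 FlowFiles per invocation
>> for flowFile in flowFiles:
>>     if flowFile is None:
>>         continue
>>     try:
>>         # Do per-FlowFile work here
>>         session.transfer(flowFile, REL_SUCCESS)
>>     except Exception:
>>         # Route anything that fails to the failure relationship
>>         session.transfer(flowFile, REL_FAILURE)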
>> 
>> YMMV
>> 
>> - Scott
>>> On Wednesday, April 5, 2017 at 12:48 PM, James McMahon wrote:
>>> I am receiving POSTs from a Pentaho process, delivering files to the 
>>> HandleHttpRequest processor in my NiFi 0.7.x workflow. That processor 
>>> hands the flowfile off to an ExecuteScript processor that runs a Python 
>>> script. The script is very, very simple: it loads the incoming JSON 
>>> object into a Python dictionary and verifies the presence of required 
>>> fields using simple has_key checks on the dictionary. There are only 
>>> eight fields in the incoming JSON object.
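>>> 
>>> A minimal sketch of what such a validation script might look like in 
>>> Jython follows; the field names are hypothetical, and the standard 
>>> ExecuteScript bindings (session, REL_SUCCESS, REL_FAILURE) are assumed:
>>> 
>>> import json
>>> from org.apache.commons.io import IOUtils
>>> from java.nio.charset import StandardCharsets
>>> from org.apache.nifi.processor.io import InputStreamCallback
>>> 
>>> REQUIRED_FIELDS = ['sender', 'timestamp', 'payload']  # hypothetical names
>>> 
>>> class ReadJson(InputStreamCallback):
>>>     # Reads the FlowFile content and parses it as JSON
>>>     def __init__(self):
>>>         self.data = None
>>>     def process(self, inputStream):
>>>         self.data = json.loads(IOUtils.toString(inputStream, StandardCharsets.UTF_8))
>>> 
>>> flowFile = session.get()
>>> if flowFile is not None:
>>>     reader = ReadJson()
>>>     try:
>>>         session.read(flowFile, reader)
>>>         if all(reader.data.has_key(f) for f in REQUIRED_FIELDS):
>>>             session.transfer(flowFile, REL_SUCCESS)
>>>         else:
>>>             session.transfer(flowFile, REL_FAILURE)
>>>     except Exception:
>>>         # Unparseable JSON ends up here as well
>>>         session.transfer(flowFile, REL_FAILURE)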
>>> 
>>> The throughput for these two processes is not exceeding 100-150 files in 
>>> five minutes. It seems very slow in light of the minimal processing going 
>>> on in these two steps.
>>> 
>>> I notice that there are configuration options seemingly related to 
>>> optimizing performance. "Concurrent tasks", for example, is only set by 
>>> default to 1 for each processor.
>>> 
>>> What performance optimizations at the processor level do users recommend? 
>>> Is it advisable to crank up the concurrent tasks for a processor, and is 
>>> there a point beyond which raising that value no longer helps? Are there 
>>> trade-offs?
>>> 
>>> I am particularly interested in optimizations for HandleHttpRequest and 
>>> ExecuteScript processors.
>>> 
>>> Thanks in advance for your thoughts.
>>> 
>>> cheers,
>>> 
>>> Jim
>> 
> 
