RE: Requesting Obscene FlowFile Batch Sizes

Peter Wicks (pwicks) Tue, 20 Sep 2016 19:40:05 -0700

Andy/Bryan,

Thanks for all of the detail; it’s been helpful.
I actually did an experiment this morning where I modified the processor to 
force it to keep calling `get` until it had all 1 million FlowFiles.  Since I 
was calling it sequentially it was able to move files out of swap and into 
active on each request. I was able to retrieve them and process them through, 
which was great until… NiFi tried to move them through provenance.  At that 
point NiFi ran out of memory and fell over (stopped responding).  Right before 
NiFi ran out of memory I received several bulletins related to Provenance being 
written to too quickly, and that it was being slowed down.
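
For readers following along, the repeated-`get` loop described above can be sketched roughly as follows. This is not the modified processor's actual code. In NiFi the call would be `ProcessSession.get(int maxResults)`, while here `CappedQueue`, `activeLimit`, and `drain` are hypothetical stand-ins, so the example runs without NiFi on the classpath. It assumes each call returns at most the number of items currently active (not swapped out).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DrainInBatches {

    /** Hypothetical stand-in for a queue that hands out at most activeLimit items per call. */
    static class CappedQueue {
        private final Deque<Integer> items = new ArrayDeque<>();
        private final int activeLimit;

        CappedQueue(int total, int activeLimit) {
            for (int i = 0; i < total; i++) {
                items.add(i);
            }
            this.activeLimit = activeLimit;
        }

        /** Returns up to maxResults items, but never more than activeLimit in one call. */
        List<Integer> get(int maxResults) {
            int n = Math.min(Math.min(maxResults, activeLimit), items.size());
            List<Integer> out = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                out.add(items.poll());
            }
            return out;
        }
    }

    /** Keeps calling get until target items are collected or the queue is empty. */
    static List<Integer> drain(CappedQueue queue, int target) {
        List<Integer> all = new ArrayList<>();
        while (all.size() < target) {
            List<Integer> batch = queue.get(target - all.size());
            if (batch.isEmpty()) {
                break; // nothing left to fetch
            }
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        // One million queued items, at most 10k returned per call.
        CappedQueue queue = new CappedQueue(1_000_000, 10_000);
        List<Integer> all = drain(queue, 1_000_000);
        System.out.println("Collected " + all.size() + " items");
    }
}
```

Note that the loop holds every collected item in memory at once, which is the same pressure that showed up later at the provenance step.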


I found another solution to my mass insert and got it up and running. Using a 
Teradata JDBC proprietary flag called FastLoadCSV, and a new custom processor, 
I was able to pass in a CSV file to my JDBC driver and get the same result.  In 
this scenario there was just a single FlowFile and everything went smoothly.

Thanks again!

Peter Wicks



From: Bryan Bende [mailto:[email protected]]
Sent: Tuesday, September 20, 2016 3:38 PM
To: [email protected]
Subject: Re: Requesting Obscene FlowFile Batch Sizes

Andy,

That was my thinking. An easy test might be to bump the threshold up to 100k 
(increase heap if needed) and see if it starts grabbing 100k every time.

If it does, then I would think it is swapping related; you then need to figure 
out if you really want to get all 1 million in a single batch, and if there's 
enough heap to support that.

-Bryan

On Tue, Sep 20, 2016 at 5:29 PM, Andy LoPresto 
<[email protected]> wrote:
Bryan,

That’s a good point. Would running with a larger Java heap and higher swap 
threshold allow Peter to get larger batches out?

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Sep 20, 2016, at 1:41 PM, Bryan Bende 
<[email protected]> wrote:

Peter,

Does 10k happen to be your swap threshold in nifi.properties by any chance (it 
defaults to 20k I believe)?

I suspect the behavior you are seeing could be due to the way swapping works, 
but Mark or others could probably confirm.

I found this thread where Mark explained how swapping works with a background 
thread, and I believe it still works this way:
http://apache-nifi.1125220.n5.nabble.com/Nifi-amp-Spark-receiver-performance-configuration-td524.html

-Bryan

On Tue, Sep 20, 2016 at 10:22 AM, Peter Wicks (pwicks) 
<[email protected]> wrote:
I’m using JSONToSQL, followed by PutSQL.  I’m using Teradata, which supports a 
special JDBC mode called FastLoad, designed for a minimum of 100,000 rows of 
data per batch.

What I’m finding is that when PutSQL requests a new batch of FlowFiles from the 
queue, which has over 1 million rows in it, with a batch size of 1000000, it 
always returns a maximum of 10k.  How can I get my obscenely sized batch 
request to return all the FlowFiles I’m asking for?

Thanks,
  Peter
