Hi Peter,

Thanks for letting us know you found a solution, and for the additional context. Provenance performance is a key area of focus over the next couple of releases, so hopefully this will be fixed soon.
Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Sep 20, 2016, at 19:39, Peter Wicks (pwicks) <[email protected]> wrote:
>
> Andy/Bryan,
>
> Thanks for all of the detail, it’s been helpful.
>
> I actually did an experiment this morning where I modified the processor to
> force it to keep calling `get` until it had all 1 million FlowFiles. Since I
> was calling it sequentially, it was able to move files out of swap and into
> active on each request. I was able to retrieve them and process them through,
> which was great until… NiFi tried to move them through provenance. At that
> point NiFi ran out of memory and fell over (stopped responding). Right
> before it ran out of memory, I received several bulletins saying that
> provenance was being written to too quickly and was being throttled.
>
> I found another solution for my mass insert and got it up and running. Using
> a proprietary Teradata JDBC flag called FastLoadCSV, and a new custom
> processor, I was able to pass a CSV file to my JDBC driver and get the same
> result. In that scenario there was just a single FlowFile, and everything
> went smoothly.
>
> Thanks again!
>
> Peter Wicks
>
>
> From: Bryan Bende [mailto:[email protected]]
> Sent: Tuesday, September 20, 2016 3:38 PM
> To: [email protected]
> Subject: Re: Requesting Obscene FlowFile Batch Sizes
>
> Andy,
>
> That was my thinking. An easy test might be to bump the threshold up to 100k
> (increasing the heap if needed) and see if it starts grabbing 100k every time.
>
> If it does, then it is likely swapping-related, and the next questions are
> whether you really need all 1 million in a single batch, and whether there
> is enough heap to support that.
>
> -Bryan
>
> On Tue, Sep 20, 2016 at 5:29 PM, Andy LoPresto <[email protected]> wrote:
> Bryan,
>
> That’s a good point. Would running with a larger Java heap and a higher swap
> threshold allow Peter to get larger batches out?
> Andy LoPresto
> [email protected]
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> On Sep 20, 2016, at 1:41 PM, Bryan Bende <[email protected]> wrote:
>
> Peter,
>
> Does 10k happen to be your swap threshold in nifi.properties, by any chance
> (it defaults to 20k, I believe)?
>
> I suspect the behavior you are seeing is due to the way swapping works, but
> Mark or others could probably confirm.
>
> I found this thread where Mark explained how swapping works with a background
> thread, and I believe it still works this way:
> http://apache-nifi.1125220.n5.nabble.com/Nifi-amp-Spark-receiver-performance-configuration-td524.html
>
> -Bryan
>
> On Tue, Sep 20, 2016 at 10:22 AM, Peter Wicks (pwicks) <[email protected]> wrote:
> I’m using ConvertJSONToSQL, followed by PutSQL. I’m using Teradata, which
> supports a special JDBC mode called FastLoad, designed for a minimum of
> 100,000 rows of data per batch.
>
> What I’m finding is that when PutSQL requests a new batch of FlowFiles from
> the queue, which has over 1 million FlowFiles in it, with a batch size of
> 1000000, it always returns a maximum of 10k. How can I get my obscenely
> sized batch request to return all the FlowFiles I’m asking for?
>
> Thanks,
> Peter
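[Editor's note: a minimal sketch of the two settings discussed in the thread, assuming the property and argument names from the NiFi Administration Guide for the 1.x line; the values shown (100k threshold, 4 GB heap) are illustrative, not recommendations — check the guide for your version.]

```properties
# nifi.properties -- per-connection FlowFile count above which queued
# FlowFiles are swapped to disk (defaults to 20000). Raising it to 100k
# matches the test Bryan suggests above.
nifi.queue.swap.threshold=100000

# conf/bootstrap.conf -- raise the JVM heap to hold the larger in-memory
# queues (example values; size these for your own data volume).
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```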

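[Editor's note: Peter's experiment of looping on `get` until the whole backlog is retrieved can be sketched as a plain-Java loop. This is a self-contained stand-in, not his processor code and not the actual NiFi ProcessSession API (where the analogous call is `session.get(maxResults)`); the 10k per-call cap here mirrors the per-request limit he observed.]

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BatchDrain {

    /** Stand-in for a connection queue: returns at most `max` items per call. */
    static List<Integer> get(Deque<Integer> queue, int max) {
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < max && !queue.isEmpty()) {
            batch.add(queue.poll());
        }
        return batch;
    }

    /** Keep calling get(...) until `target` items are accumulated or the queue runs dry. */
    static List<Integer> drain(Deque<Integer> queue, int target, int perCallCap) {
        List<Integer> all = new ArrayList<>();
        while (all.size() < target) {
            List<Integer> batch = get(queue, Math.min(perCallCap, target - all.size()));
            if (batch.isEmpty()) {
                break; // queue exhausted before reaching the target
            }
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < 1_000_000; i++) {
            queue.add(i);
        }
        // Each call is capped at 10k, so it takes 100 sequential calls to
        // accumulate the full million -- the shape of Peter's experiment.
        List<Integer> all = drain(queue, 1_000_000, 10_000);
        System.out.println(all.size()); // 1000000
    }
}
```

As the thread shows, retrieving the full million this way only moves the bottleneck: the FlowFiles then hit provenance all at once, so the single-FlowFile CSV approach Peter settled on is the sounder design.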