Hi Peter,

Thanks for letting us know you found a solution, and for the additional context.
Provenance performance is a key area of focus in the next couple of releases, so
hopefully we will have that fixed soon.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Sep 20, 2016, at 19:39, Peter Wicks (pwicks) <pwi...@micron.com> wrote:
> 
> Andy/Bryan,
>  
> Thanks for all of the detail, it’s been helpful.
> I actually did an experiment this morning where I modified the processor to 
> force it to keep calling `get` until it had all 1 million FlowFiles.  Since I 
> was calling it sequentially it was able to move files out of swap and into 
> active on each request. I was able to retrieve them and process them through, 
> which was great until… NiFi tried to move them through provenance.  At that 
> point NiFi ran out of memory and fell over (stopped responding).  Right 
> before NiFi ran out of memory, I received several bulletins warning that 
> Provenance was being written to too quickly and was being slowed down.
>  
> I found another solution to my mass insert and got it up and running. Using a 
> Teradata JDBC proprietary flag called FastLoadCSV, and a new custom 
> processor, I was able to pass in a CSV file to my JDBC driver and get the 
> same result.  In this scenario there was just a single FlowFile and 
> everything went smoothly.
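[Editor's note: for readers hitting the same wall, the FastLoadCSV mode Peter mentions is enabled through the Teradata JDBC connection URL. A minimal illustration follows; the host, database, and table names are placeholders, not details from this thread.]

```
# Hypothetical Teradata JDBC URL enabling FastLoad CSV mode.
# In this mode the driver accepts a CSV stream for a batched INSERT,
# rather than one row per FlowFile.
jdbc:teradata://td-host/DATABASE=mydb,TYPE=FASTLOADCSV
```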
>  
> Thanks again!
>  
> Peter Wicks
>  
>  
>  
> From: Bryan Bende [mailto:bbe...@gmail.com] 
> Sent: Tuesday, September 20, 2016 3:38 PM
> To: users@nifi.apache.org
> Subject: Re: Requesting Obscene FlowFile Batch Sizes
>  
> Andy,
>  
> That was my thinking. An easy test might be to bump the threshold up to 100k 
> (increase heap if needed) and see if it starts grabbing 100k every time. 
>  
> If it does, then I would think it's swapping related. The next step would be to 
> figure out whether you really want to get all 1 million in a single batch, and 
> whether there's enough heap to support that.
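[Editor's note: the test Bryan suggests amounts to two configuration changes. The property names below are from NiFi's conf/nifi.properties and conf/bootstrap.conf; the values are illustrative, not recommendations. Both files require a restart to take effect.]

```
# conf/nifi.properties -- raise the per-queue swap threshold
# (the shipped default is 20000)
nifi.queue.swap.threshold=100000

# conf/bootstrap.conf -- give the JVM enough heap to hold the
# larger in-memory queue
java.arg.3=-Xmx4g
```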
>  
> -Bryan
>  
> On Tue, Sep 20, 2016 at 5:29 PM, Andy LoPresto <alopre...@apache.org> wrote:
> Bryan,
>  
> That’s a good point. Would running with a larger Java heap and higher swap 
> threshold allow Peter to get larger batches out?
>  
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>  
> On Sep 20, 2016, at 1:41 PM, Bryan Bende <bbe...@gmail.com> wrote:
>  
> Peter,
>  
> Does 10k happen to be your swap threshold in nifi.properties by any chance 
> (it defaults to 20k I believe)?
>  
> I suspect the behavior you are seeing could be due to the way swapping works, 
> but Mark or others could probably confirm.
>  
> I found this thread where Mark explained how swapping works with a background 
> thread, and I believe it still works this way:
> http://apache-nifi.1125220.n5.nabble.com/Nifi-amp-Spark-receiver-performance-configuration-td524.html
>  
> -Bryan
>  
> On Tue, Sep 20, 2016 at 10:22 AM, Peter Wicks (pwicks) <pwi...@micron.com> 
> wrote:
> I’m using ConvertJSONToSQL, followed by PutSQL.  I’m using Teradata, which supports 
> a special JDBC mode called FastLoad, designed for a minimum of 100,000 rows 
> of data per batch.
>  
> What I’m finding is that when PutSQL requests a new batch of FlowFiles from 
> the queue, which has over 1 million rows in it, with a batch size of 1000000, 
> it always returns a maximum of 10k.  How can I get my obscenely sized batch 
> request to return all the FlowFiles I’m asking for?
>  
> Thanks,
>   Peter
>  
>  
>  
