Thanks for all of the detail; it’s been helpful.
I actually did an experiment this morning where I modified the processor to
force it to keep calling `get` until it had all 1 million FlowFiles. Since I
was calling it sequentially it was able to move files out of swap and into
active on each request. I was able to retrieve them and process them through,
which was great until… NiFi tried to move them through provenance. At that
point NiFi ran out of memory and fell over (stopped responding). Right before
NiFi ran out of memory I received several bulletins saying that Provenance was
being written to too quickly and was being slowed down.
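For illustration, the "keep calling `get` until everything has arrived" loop can be modeled with a toy queue. This is only a sketch, not NiFi code: `SwappingQueueSketch`, its fields, and the `drain` helper are hypothetical names, and the assumption that a single `get` never returns more than the active (non-swapped) portion of the queue is taken from the behavior described in this thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model (not NiFi code) of a queue that only hands out its "active" items. */
class SwappingQueueSketch {
    private final Deque<Integer> active = new ArrayDeque<>();
    private final Deque<Integer> swapped = new ArrayDeque<>();
    private final int swapThreshold;

    SwappingQueueSketch(int total, int swapThreshold) {
        this.swapThreshold = swapThreshold;
        // Items beyond the threshold start out "swapped to disk".
        for (int i = 0; i < total; i++) {
            (active.size() < swapThreshold ? active : swapped).add(i);
        }
    }

    /** Returns at most maxResults items, and never more than are currently active. */
    List<Integer> get(int maxResults) {
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < maxResults && !active.isEmpty()) {
            batch.add(active.poll());
        }
        // Assumed swap-in step: refill the active set after the request is served.
        while (active.size() < swapThreshold && !swapped.isEmpty()) {
            active.add(swapped.poll());
        }
        return batch;
    }

    /** Keeps calling get until target items are collected or the queue runs dry. */
    static List<Integer> drain(SwappingQueueSketch queue, int target) {
        List<Integer> all = new ArrayList<>();
        while (all.size() < target) {
            List<Integer> batch = queue.get(target - all.size());
            if (batch.isEmpty()) {
                break; // nothing left to fetch
            }
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        SwappingQueueSketch queue = new SwappingQueueSketch(1_000_000, 10_000);
        System.out.println("single get: " + queue.get(1_000_000).size());
        System.out.println("drained:    " + drain(queue, 990_000).size());
    }
}
```

In this model a single request is capped at the threshold, while the sequential loop eventually collects everything, at the cost of holding the whole batch in memory at once.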
I found another solution to my mass insert and got it up and running. Using a
proprietary Teradata JDBC flag called FastLoadCSV, and a new custom processor,
I was able to pass a CSV file to my JDBC driver and get the same result. In
this scenario there was just a single FlowFile and everything went smoothly.
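A rough sketch of what handing a CSV stream to the driver might look like follows. It is not the custom processor from this message. The `TYPE=FASTLOADCSV` URL parameter and the convention of binding the whole CSV as a single ASCII stream with length -1 are assumptions based on my reading of the Teradata JDBC driver documentation, so check them against your driver version. The class name, helper names, host, database, and table are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;

/** Sketch only: streams one CSV file through a FastLoadCSV-style JDBC connection. */
class FastLoadCsvSketch {

    /** Builds a Teradata-style URL; TYPE=FASTLOADCSV is assumed from the driver docs. */
    static String fastLoadCsvUrl(String host, String database) {
        return "jdbc:teradata://" + host + "/DATABASE=" + database + ",TYPE=FASTLOADCSV";
    }

    /** Builds "INSERT INTO t VALUES (?,?,...)" with one marker per CSV column. */
    static String insertSql(String table, int columnCount) {
        String marks = String.join(",", Collections.nCopies(columnCount, "?"));
        return "INSERT INTO " + table + " VALUES (" + marks + ")";
    }

    /** Sends the whole CSV as one stream, then commits (assumed usage pattern). */
    static void load(String url, String user, String password,
                     String table, int columnCount, Path csv)
            throws SQLException, IOException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             InputStream data = Files.newInputStream(csv)) {
            conn.setAutoCommit(false);
            try (PreparedStatement stmt = conn.prepareStatement(insertSql(table, columnCount))) {
                // Assumption: the driver reads the entire CSV from parameter 1;
                // -1 is meant to signal "length unknown, read to end of stream".
                stmt.setAsciiStream(1, data, -1);
                stmt.executeUpdate();
            }
            conn.commit();
        }
    }
}
```

Because the driver consumes the stream itself, the flow only needs to carry one FlowFile containing the CSV, which matches the single-FlowFile scenario described above.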
From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Tuesday, September 20, 2016 3:38 PM
Subject: Re: Requesting Obscene FlowFile Batch Sizes
That was my thinking. An easy test might be to bump the threshold up to 100k
(increase heap if needed) and see if it starts grabbing 100k every time.
If it does, then I would think it is swapping related. You would then need to
figure out whether you really want to get all 1 million in a single batch, and
whether there's enough heap to support that.
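As a sketch of that test, the two settings involved would look roughly like this, assuming a stock install where the swap threshold lives in nifi.properties and the heap arguments in bootstrap.conf. The heap sizes are illustrative only, not a recommendation.

```
# conf/nifi.properties
nifi.queue.swap.threshold=100000

# conf/bootstrap.conf (illustrative values; size to the host)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```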
On Tue, Sep 20, 2016 at 5:29 PM, Andy LoPresto
That’s a good point. Would running with a larger Java heap and higher swap
threshold allow Peter to get larger batches out?
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
On Sep 20, 2016, at 1:41 PM, Bryan Bende
Does 10k happen to be your swap threshold in nifi.properties by any chance (it
defaults to 20k I believe)?
I suspect the behavior you are seeing could be due to the way swapping works,
but Mark or others could probably confirm.
I found this thread where Mark explained how swapping works with a background
thread, and I believe it still works this way:
On Tue, Sep 20, 2016 at 10:22 AM, Peter Wicks (pwicks)
I’m using JSONToSQL, followed by PutSQL. I’m using Teradata, which supports a
special JDBC mode called FastLoad, designed for a minimum of 100,000 rows of
data per batch.
What I’m finding is that when PutSQL requests a new batch of FlowFiles from the
queue, which has over 1 million rows in it, with a batch size of 1000000, it
always returns a maximum of 10k. How can I get my obscenely sized batch
request to return all the FlowFiles I’m asking for?