Hi Peter,

Thanks for letting us know you found a solution, and for the additional context. Provenance performance is a key area of focus over the next couple of releases, so hopefully this will be fixed soon.
Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Sep 20, 2016, at 19:39, Peter Wicks (pwicks) <[email protected]> wrote:
>
> Andy/Bryan,
>
> Thanks for all of the detail, it’s been helpful.
>
> I actually did an experiment this morning where I modified the processor to
> force it to keep calling `get` until it had all 1 million FlowFiles. Since I
> was calling it sequentially, it was able to move files out of swap and into
> active on each request. I was able to retrieve them and process them through,
> which was great until… NiFi tried to move them through provenance. At that
> point NiFi ran out of memory and fell over (stopped responding). Right
> before it ran out of memory, I received several bulletins saying that
> provenance was being written to too quickly and was being throttled.
>
> I found another solution for my mass insert and got it up and running. Using
> a proprietary Teradata JDBC flag called FastLoadCSV, and a new custom
> processor, I was able to pass a CSV file to my JDBC driver and get the same
> result. In that scenario there was just a single FlowFile, and everything
> went smoothly.
>
> Thanks again!
>
> Peter Wicks
>
>
> From: Bryan Bende [mailto:[email protected]]
> Sent: Tuesday, September 20, 2016 3:38 PM
> To: [email protected]
> Subject: Re: Requesting Obscene FlowFile Batch Sizes
>
> Andy,
>
> That was my thinking. An easy test might be to bump the threshold up to 100k
> (increasing the heap if needed) and see if it starts grabbing 100k every time.
>
> If it does, then it is likely swapping-related, and the next questions are
> whether you really need all 1 million in a single batch, and whether there
> is enough heap to support that.
>
> -Bryan
>
> On Tue, Sep 20, 2016 at 5:29 PM, Andy LoPresto <[email protected]> wrote:
> Bryan,
>
> That’s a good point. Would running with a larger Java heap and a higher swap
> threshold allow Peter to get larger batches out?
> Andy LoPresto
> [email protected]
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> On Sep 20, 2016, at 1:41 PM, Bryan Bende <[email protected]> wrote:
>
> Peter,
>
> Does 10k happen to be your swap threshold in nifi.properties, by any chance
> (it defaults to 20k, I believe)?
>
> I suspect the behavior you are seeing is due to the way swapping works, but
> Mark or others could probably confirm.
>
> I found this thread where Mark explained how swapping works with a background
> thread, and I believe it still works this way:
> http://apache-nifi.1125220.n5.nabble.com/Nifi-amp-Spark-receiver-performance-configuration-td524.html
>
> -Bryan
>
> On Tue, Sep 20, 2016 at 10:22 AM, Peter Wicks (pwicks) <[email protected]> wrote:
> I’m using ConvertJSONToSQL, followed by PutSQL. I’m using Teradata, which
> supports a special JDBC mode called FastLoad, designed for a minimum of
> 100,000 rows of data per batch.
>
> What I’m finding is that when PutSQL requests a new batch of FlowFiles from
> the queue, which has over 1 million FlowFiles in it, with a batch size of
> 1000000, it always returns a maximum of 10k. How can I get my obscenely
> sized batch request to return all the FlowFiles I’m asking for?
>
> Thanks,
> Peter
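[Editor's note: a minimal sketch of the two settings discussed in the thread, assuming the property and argument names from the NiFi Administration Guide for the 1.x line; the values shown (100k threshold, 4 GB heap) are illustrative, not recommendations — check the guide for your version.]

```properties
# nifi.properties -- per-connection FlowFile count above which queued
# FlowFiles are swapped to disk (defaults to 20000). Raising it to 100k
# matches the test Bryan suggests above.
nifi.queue.swap.threshold=100000

# conf/bootstrap.conf -- raise the JVM heap to hold the larger in-memory
# queues (example values; size these for your own data volume).
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```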

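[Editor's note: Peter's experiment of looping on `get` until the whole backlog is retrieved can be sketched as a plain-Java loop. This is a self-contained stand-in, not his processor code and not the actual NiFi ProcessSession API (where the analogous call is `session.get(maxResults)`); the 10k per-call cap here mirrors the per-request limit he observed.]

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BatchDrain {

    /** Stand-in for a connection queue: returns at most `max` items per call. */
    static List<Integer> get(Deque<Integer> queue, int max) {
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < max && !queue.isEmpty()) {
            batch.add(queue.poll());
        }
        return batch;
    }

    /** Keep calling get(...) until `target` items are accumulated or the queue runs dry. */
    static List<Integer> drain(Deque<Integer> queue, int target, int perCallCap) {
        List<Integer> all = new ArrayList<>();
        while (all.size() < target) {
            List<Integer> batch = get(queue, Math.min(perCallCap, target - all.size()));
            if (batch.isEmpty()) {
                break; // queue exhausted before reaching the target
            }
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < 1_000_000; i++) {
            queue.add(i);
        }
        // Each call is capped at 10k, so it takes 100 sequential calls to
        // accumulate the full million -- the shape of Peter's experiment.
        List<Integer> all = drain(queue, 1_000_000, 10_000);
        System.out.println(all.size()); // 1000000
    }
}
```

As the thread shows, retrieving the full million this way only moves the bottleneck: the FlowFiles then hit provenance all at once, so the single-FlowFile CSV approach Peter settled on is the sounder design.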