Re: Use of List in StandardProvenanceEventRecord.Builder

2016-04-15 Thread Richard Miskin
Mark, Joe, I’d spotted the ‘correlation attribute’ and I’d set that to a UUID relating to the incoming request; that much is fine. The problem I had was how to set things up so that MergeContent knows when it has got all the files. The options seemed to be batch size or time related, neither

Re: Use of List in StandardProvenanceEventRecord.Builder

2016-04-15 Thread Mark Payne
Just to build on what Joe said - the correlation attribute can be used to ensure that FlowFiles are bunched together appropriately. So you could, for instance, create an attribute named "batch.id" and set it to a UUID. So if you perform a query against your database, you can generate a new UUID
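
As a rough illustration of the idea only (plain Java, not NiFi's MergeContent code), grouping by a correlation attribute could be sketched as below. The class name `CorrelationSketch`, the helper `groupBy`, and the use of attribute maps to stand in for FlowFiles are assumptions made for this sketch; only the "batch.id" attribute name comes from the message above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in: each Map<String, String> plays the role of one
// FlowFile's attributes. This is not the NiFi API.
class CorrelationSketch {

    // Bins items by the value of the named correlation attribute,
    // keeping bins in the order their first item arrived.
    static Map<String, List<Map<String, String>>> groupBy(
            String attributeName, List<Map<String, String>> items) {
        Map<String, List<Map<String, String>>> bins = new LinkedHashMap<>();
        for (Map<String, String> item : items) {
            String key = item.get(attributeName);
            bins.computeIfAbsent(key, k -> new ArrayList<>()).add(item);
        }
        return bins;
    }

    // Builds one attribute map tagged with the given batch id.
    static Map<String, String> item(String batchId, String name) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("batch.id", batchId);
        attrs.put("filename", name);
        return attrs;
    }

    public static void main(String[] args) {
        // One fresh UUID per database query, shared by everything it produces.
        String first = UUID.randomUUID().toString();
        String second = UUID.randomUUID().toString();

        List<Map<String, String>> items = new ArrayList<>();
        items.add(item(first, "a.dat"));
        items.add(item(second, "b.dat"));
        items.add(item(first, "c.dat"));

        groupBy("batch.id", items).forEach((id, bin) ->
                System.out.println(id + " holds " + bin.size() + " item(s)"));
    }
}
```

Grouping alone does not say when a bin is complete, which is the separate question raised in the reply above.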

Re: Use of List in StandardProvenanceEventRecord.Builder

2016-04-15 Thread Joe Witt
Richard, MergeContent supports a concept called 'correlation attribute' which will merge things together based on a matching correlation attribute value. That might be useful for your case. Regarding the heap use you are observing, I'd be happy to work through that more with you. Thanks Joe On

Re: Use of List in StandardProvenanceEventRecord.Builder

2016-04-15 Thread Richard Miskin
Hi Mark, Thanks for the pointer, I’d not spotted I was losing my provenance information. I’d changed my code from transferring the temporary FlowFiles to a relationship that was auto-terminated to using session.remove() and had assumed that the provenance report was the same. I’ve just tested it

Re: Use of List in StandardProvenanceEventRecord.Builder

2016-04-15 Thread Mark Payne
Richard, So the order of the children may be important for some people. It certainly is reasonable to care about the order in which the children were created. The larger concern, though, would be that if we moved to a Set such as HashSet, the difference in the amount of heap consumed is pretty
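
To make the ordering point concrete, here is a small sketch using only standard `java.util` collections; it is not taken from StandardProvenanceEventRecord.Builder, and the class name `ChildOrderSketch` and the sample ids are made up for illustration. An `ArrayList` keeps insertion order and duplicates, a `HashSet` drops duplicates but makes no promise about iteration order, and a `LinkedHashSet` drops duplicates while keeping first-insertion order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; the ids stand in for child FlowFile UUIDs.
class ChildOrderSketch {

    // Keeps every id, in the order it was added.
    static List<String> asList(List<String> ids) {
        return new ArrayList<>(ids);
    }

    // Removes duplicates; iteration order is unspecified.
    static Set<String> asHashSet(List<String> ids) {
        return new HashSet<>(ids);
    }

    // Removes duplicates but iterates in first-insertion order.
    static List<String> asOrderedUnique(List<String> ids) {
        return new ArrayList<>(new LinkedHashSet<>(ids));
    }

    public static void main(String[] args) {
        List<String> created = Arrays.asList("child-c", "child-a", "child-b", "child-a");
        System.out.println("ArrayList:     " + asList(created));
        System.out.println("HashSet:       " + asHashSet(created));
        System.out.println("LinkedHashSet: " + asOrderedUnique(created));
    }
}
```

The hash-based sets also allocate a map entry per element on top of the element itself, so they generally use more heap per element than an `ArrayList` of the same size.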

Use of List in StandardProvenanceEventRecord.Builder

2016-04-15 Thread Richard Miskin
Hi, I’m trying to track down a performance problem that I’ve spotted with a custom NiFi processor that I’ve written. When triggered by an incoming FlowFile, the processor loads many (up to about 500,000) records from a database and produces an output file in a custom format. I’m trying to