Mark, Joe,
I’d spotted the ‘correlation attribute’ and I’d set that to a uuid relating to
the incoming request, that much is fine. The problem I had was how to set
things up so that MergeContent knows when it has got all the files. The options
seemed to be batch size or time related, neither
Just to build on what Joe said - the correlation attribute can be used to
ensure that FlowFiles are bunched together appropriately. So you could, for
instance, create an attribute named "batch.id" and set it to a UUID. So if you
perform a query against your database, you can generate a new UUID
Richard,
MergeContent supports a concept called 'correlation attribute' which
will merge things together based on a matching correlation attribute
value. That might be useful for your case.
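
As a rough illustration of the idea (not MergeContent's actual code), the
grouping can be sketched in plain Java. Everything here is hypothetical: the
class name, the use of a plain attribute map in place of a FlowFile, and the
choice to skip items that lack the attribute are assumptions made for the
sketch only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CorrelationSketch {

    // Groups items into bins keyed by the value of the correlation attribute.
    // Each item is a plain attribute map standing in for a FlowFile
    // (an assumption of this sketch, not a NiFi type).
    static Map<String, List<Map<String, String>>> binByAttribute(
            List<Map<String, String>> items, String correlationAttribute) {
        // LinkedHashMap keeps bins in the order their keys were first seen.
        Map<String, List<Map<String, String>>> bins = new LinkedHashMap<>();
        for (Map<String, String> attrs : items) {
            String key = attrs.get(correlationAttribute);
            if (key == null) {
                // Sketch-only choice: ignore items without the attribute.
                continue;
            }
            bins.computeIfAbsent(key, k -> new ArrayList<>()).add(attrs);
        }
        return bins;
    }

    public static void main(String[] args) {
        List<Map<String, String>> items = List.of(
                Map.of("batch.id", "a", "filename", "1.txt"),
                Map.of("batch.id", "b", "filename", "2.txt"),
                Map.of("batch.id", "a", "filename", "3.txt"));
        // Items sharing a "batch.id" value end up in the same bin.
        System.out.println(binByAttribute(items, "batch.id"));
    }
}
```

Note that the sketch only shows how items with a matching attribute value end
up together; it says nothing about how a bin is judged to be complete.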
Regarding the heap use you are observing, I'd be happy to work through that
more with you.
Thanks
Joe
On
Hi Mark,
Thanks for the pointer, I’d not spotted I was losing my provenance information. I’d
changed my code from transferring the temporary FlowFiles to a relationship
that was auto-terminated to using session.remove() and had assumed that the
provenance report was the same. I’ve just tested it
Richard,
So the order of the children may be important for some people. It certainly is
reasonable to care about the order in which the children were created.
The larger concern, though, would be that if we moved to a Set such as HashSet,
the difference in the amount of heap consumed is pretty
Hi,
I’m trying to track down a performance problem that I’ve spotted with a custom
NiFi processor that I’ve written. When triggered by an incoming FlowFile, the
processor loads many (up to about 500,000) records from a database and produces
an output file in a custom format. I’m trying to