Louis-Etienne,

My initial thought is that your idea of using MergeContent is the right one.
However, the issue there is not just combining the data but deciding
what merging truly means in that case, so what the next step should be
is somewhat undefined.  Merge the content?  If so, how?  What are the
format and schema of the objects before and after the merge?
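To make that question concrete for the attribute case only, here is a
minimal Python sketch (not NiFi code) of one possible answer: union the
two attribute maps, and when a key exists in both with different values,
keep both under source-specific prefixes instead of letting one silently
win. The function name and the "topic1."/"topic2." prefixes are
hypothetical; merging the content itself would need its own,
format-specific rule.

```python
from typing import Dict, Mapping


def merge_attributes(first: Mapping[str, str], second: Mapping[str, str],
                     first_prefix: str = "topic1.",
                     second_prefix: str = "topic2.") -> Dict[str, str]:
    """Hypothetical merge policy: union of two attribute maps.

    Keys present in both maps with equal values are kept once; keys present
    in both with different values are kept under prefixed names.
    """
    merged: Dict[str, str] = {}
    for key in first.keys() | second.keys():
        in_first, in_second = key in first, key in second
        if in_first and in_second and first[key] != second[key]:
            # Conflict: keep both values, tagged by where they came from.
            merged[first_prefix + key] = first[key]
            merged[second_prefix + key] = second[key]
        elif in_first:
            merged[key] = first[key]
        else:
            merged[key] = second[key]
    return merged


if __name__ == "__main__":
    a = {"id": "42", "customer": "acme", "status": "new"}
    b = {"id": "42", "status": "enriched", "region": "eu"}
    print(sorted(merge_attributes(a, b).items()))
    # [('customer', 'acme'), ('id', '42'), ('region', 'eu'),
    #  ('topic1.status', 'new'), ('topic2.status', 'enriched')]
```

Other policies (first wins, last wins, drop conflicts) are equally valid;
the point is that one has to be chosen explicitly.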

Another member of the community had an idea for a concept of a
HoldProcessor.  It would allow these sorts of multi-object gates to
occur.  The same question of what to do once the gate criteria are
met still exists, but at that point you'd have more control over it.
MergeContent is an already prescribed set of behaviors, whereas
HoldContent would let you choose the next gate.  We really should
help get that contribution in.
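The hold idea can be sketched outside NiFi as a small Python class.
Everything in it is hypothetical (HoldGate, the source names, the
timeout policy); it is not the proposed contribution and not a NiFi API,
just the shape of a multi-object gate: buffer items by a correlation key
until every required input has arrived, then hand the complete group to
whatever step comes next.

```python
import time
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple


class HoldGate:
    """Hypothetical hold gate: buffers items per correlation key until one
    item from every required source is present, then releases them together.
    What happens after release (merge, route, drop) is up to the caller."""

    def __init__(self, required_sources: Set[str], timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.required = frozenset(required_sources)
        self.timeout_s = timeout_s
        self.clock = clock
        # correlation key -> (time of first arrival, {source: item})
        self._held: Dict[Hashable, Tuple[float, Dict[str, dict]]] = {}

    def offer(self, key: Hashable, source: str,
              item: dict) -> Optional[Dict[str, dict]]:
        """Hold an item; return the full group once the gate criteria are met."""
        if source not in self.required:
            raise ValueError(f"unexpected source: {source}")
        _, group = self._held.setdefault(key, (self.clock(), {}))
        group[source] = item
        if self.required.issubset(group):  # every required source has arrived
            del self._held[key]
            return group
        return None

    def expire(self) -> List[Tuple[Hashable, Dict[str, dict]]]:
        """Remove and return incomplete groups held longer than the timeout."""
        now = self.clock()
        stale = [k for k, (t, _) in self._held.items() if now - t > self.timeout_s]
        return [(k, self._held.pop(k)[1]) for k in stale]


if __name__ == "__main__":
    gate = HoldGate({"topic1", "topic2"}, timeout_s=30.0)
    print(gate.offer("msg-42", "topic1", {"id": "42"}))  # None: waiting on topic2
    print(gate.offer("msg-42", "topic2", {"id": "42", "region": "eu"}))
```

The expire() hook is where a "next gate" decision for incomplete groups
(retry, route to failure, release partial data) would plug in.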

Thanks
Joe

On Sun, Dec 6, 2015 at 9:35 PM, Louis-Étienne Dorval <[email protected]> wrote:
> Hi everyone!
>
> I'm very excited to start using NiFi and I think that it will be very
> useful for some projects.
>
> I've been playing with it for some time and have built a few basic flows,
> but I'm having a hard time figuring out how to achieve part of my flow, or
> whether NiFi will be able to do it at all.
> I'm building a flow around existing systems, so NiFi would run in parallel
> with them and gather their output (everything is asynchronous) to take
> actions.
>
> Everything starts with a GetJMSTopic on Topic1, followed by 2-3 processors
> that do attribute extraction.
> Meanwhile, the existing system processes the same message, enriches it
> (but also removes some useful information), and publishes it on Topic2.
> I need the message from Topic2, so I've added another GetJMSTopic on Topic2.
> Then I need to somehow take the FlowFile from Topic1 and the one from Topic2
> and "merge" them together in order to have the attributes from both FlowFiles.
> After that I will probably need to use GetMongo to look up some
> information. This will probably create a new FlowFile that I also need to
> "merge" with the others.
> Then I'll put the result in HBase or something else, not sure yet.
>
> The part that I'm not sure how to solve is the "merge" of multiple
> FlowFiles. I'm hesitating between the MergeContent processor and
> DetectDuplicate:
>
> MergeContent seems like what I need, but the existing systems might add
> some latency (which will increase when many messages are published on
> Topic1), so I would need to increase 'Maximum number of Bins'.
> That will probably affect the performance of the system, but how badly?
> DetectDuplicate would feel awkward to use since this isn't really a
> duplicate, but it would be more lightweight (it only keeps a hash). But
> would I be able to find the previous FlowFile with
> "original.flowfile.description"?
>
>
> Let me know if there's another option that I didn't look into.
> Or maybe my problem is trivial and I just need to change my perspective
> on it...
>
> Best regards,
> Louis-Etienne
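On the bin-count question, a rough rule of thumb (Little's law) is that
the number of correlations waiting at once is about arrival rate times
latency: for example, 100 messages/s on Topic1 with 5 s of latency in the
existing system means roughly 500 open bins. The Python sketch below is a
toy simulation, not MergeContent's actual binning code; it assumes one
bin per message and that the oldest open bin is evicted when the limit
is reached, and counts how many pairs would be split as a result.

```python
from collections import OrderedDict
from typing import List, Tuple


def count_evictions(rate_per_s: int, latency_s: float,
                    duration_s: int, max_bins: int) -> int:
    """Toy model: message i opens a bin at tick i (one tick = 1/rate s) and
    its Topic2 partner closes it `lag` ticks later. When opening a bin would
    exceed max_bins, the oldest open bin is evicted (assumed policy).
    Returns the number of evicted bins, i.e. pairs never merged together."""
    lag = int(round(rate_per_s * latency_s))  # latency expressed in ticks
    events: List[Tuple[int, int, int]] = []   # (tick, kind, message id)
    for i in range(rate_per_s * duration_s):
        events.append((i, 1, i))        # kind 1: bin opened by Topic1 message
        events.append((i + lag, 0, i))  # kind 0: partner arrives; closes sort first
    events.sort()

    open_bins: "OrderedDict[int, None]" = OrderedDict()  # insertion order = age
    evicted = 0
    for _, kind, msg in events:
        if kind == 0:
            open_bins.pop(msg, None)  # no-op if this bin was already evicted
        else:
            if len(open_bins) >= max_bins:
                open_bins.popitem(last=False)  # evict the oldest open bin
                evicted += 1
            open_bins[msg] = None
    return evicted


if __name__ == "__main__":
    # 100 msg/s with 5 s latency needs about 500 bins.
    print(count_evictions(100, 5.0, 10, max_bins=500))  # 0
    print(count_evictions(100, 5.0, 10, max_bins=100))  # 900
```

In this model the memory cost grows linearly with the bin limit, while
setting the limit below rate times latency splits pairs continuously.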
