Hi Joe:
Thanks for the info.  
I believe one of the NiFi team members gave a webinar or presentation on this,
or on something very similar.  If you have a reference for it, please let me know.
Thanks again for your help. 

    On Friday, July 15, 2016 6:37 AM, Joe Witt <[email protected]> wrote:
 

 Mans,

The general pattern for something like this that works well is:
 - Capture
 - Split
 - Site-to-Site transfer back to same cluster which distributes the
partitioned/split data to all nodes
 - Do work on smaller chunks

We often do exactly this sort of thing, for example for larger-scale
geo enrichment:
- Receive a large batch of events on a given system (in a line-oriented
event model)
- Run SplitText to break out each event
- Use site-to-site to distribute them to the entire cluster
- On each node, receive the split events, then run geo enrichment
- Then send to Kafka as-is, or aggregate and send to HDFS
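
To make the split-then-distribute idea concrete, here is a rough sketch in
plain Python. It is not NiFi code and uses no NiFi API: split_events stands in
for SplitText, distribute stands in for the site-to-site hop, and enrich is a
placeholder for the per-node work. The node names, the sample batch, and the
round-robin assignment are all assumptions made for illustration; the actual
distribution NiFi performs may differ.

```python
from collections import defaultdict
from itertools import cycle


def split_events(batch):
    """Break a line-oriented batch into one event per line (stand-in for SplitText)."""
    return [line for line in batch.splitlines() if line.strip()]


def distribute(events, nodes):
    """Assign events to nodes round-robin (assumed stand-in for site-to-site)."""
    assignment = defaultdict(list)
    for node, event in zip(cycle(nodes), events):
        assignment[node].append(event)
    return dict(assignment)


def enrich(event):
    """Hypothetical per-event work done on each node (e.g. geo enrichment)."""
    return event + ",enriched"


# Hypothetical input: one large batch received on a single node.
batch = "e1\ne2\ne3\ne4\ne5\n"
nodes = ["node-1", "node-2", "node-3"]

per_node = distribute(split_events(batch), nodes)
results = {node: [enrich(e) for e in evs] for node, evs in per_node.items()}
```

The point is only that the expensive step (enrich) runs on small pieces spread
across every node, instead of on the one node that received the batch.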

Does that make sense/help for your scenario?

Thanks
Joe


On Fri, Jul 15, 2016 at 9:09 AM, M Singh <[email protected]> wrote:
> Hey Folks:
>
> I am looking for information on how to split/partition input in a generic
> way (say rows in a relational database, or lines in a file) and then process
> each split on a different node in parallel in a NiFi cluster.  I believe
> there is a webinar from the NiFi team on this but am not able to find it
> now.
>
> If someone has the documentation on this or a link to the webinar, please let me
> know.
>
> Thanks
>
> Mans


  
