Mans,

The general pattern for something like this that works well is:
- Capture
- Split
- Site-to-Site transfer back to the same cluster, which distributes the partitioned/split data to all nodes
- Do work on smaller chunks
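As a rough illustration of the data movement only (not NiFi code), here is a minimal Python model of capture, split, distribute, then per-node work. It assumes a line-oriented batch where each non-empty line is one event. The round-robin `distribute` function is a simplification standing in for Site-to-Site (real Site-to-Site balances by node load, not strict round-robin), and `enrich` is a hypothetical placeholder for whatever per-node work you run.

```python
from collections import defaultdict
from itertools import cycle


def split_text(batch: str) -> list[str]:
    # Roughly analogous to SplitText with one line per split:
    # every non-empty line becomes its own event.
    return [line for line in batch.splitlines() if line.strip()]


def distribute(events: list[str], nodes: list[str]) -> dict[str, list[str]]:
    # Simplified stand-in for Site-to-Site back to the same cluster:
    # hand events to nodes in round-robin order (assumption, not NiFi's algorithm).
    assignment: dict[str, list[str]] = defaultdict(list)
    for node, event in zip(cycle(nodes), events):
        assignment[node].append(event)
    return dict(assignment)


def enrich(event: str) -> str:
    # Hypothetical per-node work (e.g. geo enrichment); here it just uppercases.
    return event.upper()


def process_cluster(batch: str, nodes: list[str]) -> dict[str, list[str]]:
    # Capture -> split -> distribute -> do work on the smaller chunks on each node.
    per_node = distribute(split_text(batch), nodes)
    return {node: [enrich(e) for e in events] for node, events in per_node.items()}
```

For example, `process_cluster("a\nb\nc\nd\ne", ["node1", "node2", "node3"])` gives `node1` the events A and D, `node2` B and E, and `node3` C. The point is that no single node has to handle the whole batch once it has been split.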
We often do exactly this sort of thing for larger-scale geo enrichment, for example:
- Receive a large batch of events on a given system (in a line-oriented event model)
- Run SplitText to break out each event
- Use site-to-site to distribute them to the entire cluster
- On each node, receive the split events, then run geo enrichment
- Then send to Kafka as-is, or aggregate and send to HDFS

Does that make sense/help for your scenario?

Thanks
Joe

On Fri, Jul 15, 2016 at 9:09 AM, M Singh wrote:
> Hey Folks:
>
> I am looking for information on how to split/partition input in a generic
> way (say rows in a relational database, or lines in a file) and then process
> each split on a different node in parallel in a Nifi cluster. I believe
> there is a webinar from the Nifi team on this but am not able to find it
> now.
>
> If someone has the documentation on this or link the webinar, please let me
> know.
>
> Thanks
>
> Mans
