Hi Mans,

Not sure if this is what you are referring to, but there is a diagram in this article that shows how this would work for fetching from HDFS in parallel:

https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
It is more of a logical view than a step-by-step guide to configuring this in NiFi.

-Bryan

On Fri, Jul 15, 2016 at 12:42 PM, M Singh <[email protected]> wrote:

> Hi Joe:
>
> Thanks for the info.
>
> I believe one of the NiFi team members had a webinar/presentation on it or
> something very similar. If you have a reference for that, please let me
> know.
>
> Thanks again for your help.
>
>
> On Friday, July 15, 2016 6:37 AM, Joe Witt <[email protected]> wrote:
>
>
> Mans,
>
> The general pattern for something like this that works well is:
> - Capture
> - Split
> - Site-to-Site transfer back to the same cluster, which distributes the
>   partitioned/split data to all nodes
> - Do work on smaller chunks
>
> We often do exactly this sort of thing for larger-scale geo enrichment,
> for example.
> - Receive a large batch of events on a given system (in a line-oriented
>   event model)
> - Run SplitText to break out each event
> - Use site-to-site to distribute them to the entire cluster
> - On each node, receive split events, then run geo enrichment
> - Then send to Kafka as-is, or aggregate and send to HDFS
>
> Does that make sense/help for your scenario?
>
> Thanks
> Joe
>
>
> On Fri, Jul 15, 2016 at 9:09 AM, M Singh <[email protected]> wrote:
> > Hey Folks:
> >
> > I am looking for information on how to split/partition input in a generic
> > way (say rows in a relational database, or lines in a file) and then process
> > each split on a different node in parallel in a NiFi cluster. I believe
> > there is a webinar from the NiFi team on this but am not able to find it
> > now.
> >
> > If someone has the documentation on this or a link to the webinar, please
> > let me know.
> >
> > Thanks
> >
> > Mans
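
For what it's worth, the pattern Joe outlines (capture, split, distribute, work on smaller chunks) can be sketched in plain Python to show the logical flow. This is only an illustration and not NiFi code: the function names are made up, the round-robin assignment is an assumption standing in for site-to-site distribution, the thread pool stands in for cluster nodes, and `enrich` is a placeholder for real work such as geo enrichment.

```python
from concurrent.futures import ThreadPoolExecutor


def split_events(batch: str) -> list[str]:
    """Break a line-oriented batch into individual events (the 'Split' step)."""
    return [line for line in batch.splitlines() if line.strip()]


def distribute(events: list[str], node_count: int) -> list[list[str]]:
    """Assign events to nodes round-robin (assumed stand-in for site-to-site)."""
    partitions: list[list[str]] = [[] for _ in range(node_count)]
    for index, event in enumerate(events):
        partitions[index % node_count].append(event)
    return partitions


def enrich(event: str) -> str:
    """Placeholder for per-event work; here it just upper-cases the event."""
    return event.upper()


def process_on_node(partition: list[str]) -> list[str]:
    """Work done by one (simulated) node on its share of the events."""
    return [enrich(event) for event in partition]


def run_pipeline(batch: str, node_count: int = 3) -> list[list[str]]:
    """Capture -> split -> distribute -> process each partition in parallel."""
    events = split_events(batch)
    partitions = distribute(events, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        # map() returns results in the same order as the partitions.
        return list(pool.map(process_on_node, partitions))


if __name__ == "__main__":
    print(run_pipeline("a\nb\nc\nd\n", node_count=3))
```

In a real flow each of these steps would be a processor or a remote process group rather than a function, and the results would go on to Kafka or HDFS instead of being returned.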
