Hi Mans,

Not sure if this is what you are referring to, but there is a diagram in
this article that shows how this would work for fetching from HDFS in
parallel:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html

It describes the approach from a logical point of view, rather than giving
step-by-step instructions for configuring it in NiFi.
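
To make the logical flow concrete, here is a minimal Python sketch of the
capture / split / distribute / process pattern Joe describes in the quoted
message below. It is not NiFi code and does not use any NiFi API: the
round-robin assignment is only a stand-in for what site-to-site does across
the cluster, threads stand in for nodes, and the names split_events,
distribute, enrich, process_on_node and run are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_events(batch):
    """Break a line-oriented batch into single events (the SplitText step)."""
    return [line for line in batch.splitlines() if line.strip()]


def distribute(events, node_count):
    """Assign events to nodes round-robin.

    Assumption: this only imitates the site-to-site step; the real
    distribution across a NiFi cluster is decided by NiFi itself.
    """
    partitions = [[] for _ in range(node_count)]
    for index, event in enumerate(events):
        partitions[index % node_count].append(event)
    return partitions


def enrich(event):
    """Hypothetical per-event work, a placeholder for e.g. geo enrichment."""
    return event.upper()


def process_on_node(partition):
    """Work done by one node on the chunk of events it received."""
    return [enrich(event) for event in partition]


def run(batch, node_count=3):
    """Split a batch, spread it over the nodes, and process chunks in parallel."""
    events = split_events(batch)
    partitions = distribute(events, node_count)
    # Threads simulate cluster nodes; pool.map keeps results in node order.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        return list(pool.map(process_on_node, partitions))
```

For instance, run("a\nb\nc\nd", node_count=2) returns
[['A', 'C'], ['B', 'D']]: each simulated node only ever sees its own smaller
chunk, which is the point of the pattern.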

-Bryan

On Fri, Jul 15, 2016 at 12:42 PM, M Singh <[email protected]> wrote:

> Hi Joe:
>
> Thanks for the info.
>
> I believe one of the NiFi team members had a webinar/presentation on it or
> something very similar.  If you have a reference for that, please let me
> know.
>
> Thanks again for your help.
>
>
> On Friday, July 15, 2016 6:37 AM, Joe Witt <[email protected]> wrote:
>
>
> Mans,
>
> The general pattern for something like this that works well is:
> - Capture
> - Split
> - Site-to-Site transfer back to the same cluster, which distributes the
> partitioned/split data to all nodes
> - Do work on smaller chunks
>
> For example, we often do exactly this sort of thing for larger-scale geo
> enrichment:
> - Receive a large batch of events on a given system (in a line-oriented
> event model)
> - Run SplitText to break out each event
> - Use site-to-site to distribute them to the entire cluster
> - On each node, receive the split events and then run geo enrichment
> - Then send to Kafka as-is, or aggregate and send to HDFS
>
> Does that make sense/help for your scenario?
>
> Thanks
> Joe
>
>
> On Fri, Jul 15, 2016 at 9:09 AM, M Singh <[email protected]> wrote:
> > Hey Folks:
> >
> > I am looking for information on how to split/partition input in a generic
> > way (say rows in a relational database, or lines in a file) and then
> > process each split on a different node in parallel in a NiFi cluster.  I
> > believe there is a webinar from the NiFi team on this but am not able to
> > find it now.
> >
> > If someone has the documentation on this or a link to the webinar, please
> > let me know.
> >
> > Thanks
> >
> > Mans
>
>
>
