Thanks Bryan. I will check it.
On Friday, July 15, 2016 9:49 AM, Bryan Bende <[email protected]> wrote:
Hi Mans,
Not sure if this is what you are referring to, but there is a diagram in this
article that shows how this would work for fetching from HDFS in
parallel: https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
It covers the logical design rather than the step-by-step configuration
in NiFi.
-Bryan
On Fri, Jul 15, 2016 at 12:42 PM, M Singh <[email protected]> wrote:
Hi Joe:
Thanks for the info.
I believe one of the NiFi team members gave a webinar/presentation on this or
something very similar. If you have a reference for that, please let me know.
Thanks again for your help.
On Friday, July 15, 2016 6:37 AM, Joe Witt <[email protected]> wrote:
Mans,
The general pattern for something like this that works well is:
- Capture
- Split
- Site-to-Site transfer back to the same cluster, which distributes the
partitioned/split data to all nodes
- Do work on smaller chunks
We often do exactly this sort of thing for larger scale geo enrichment
for example.
- Receive large batch of events on a given system (in a line oriented
event model)
- Run SplitText to break out each event
- Use site-to-site to distribute them to the entire cluster
- On each node receive split events then run geo enrichment
- Then send to Kafka as-is or aggregate and send to HDFS
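
To make the flow above concrete, here is a rough Python sketch of the same
logic. It is not NiFi code and does not use any NiFi API; it only simulates
the split / distribute / per-node work steps. The node names, the enrich()
function, and the round-robin assignment are assumptions for illustration
(real site-to-site balances across nodes based on load, not strict
round-robin).

```python
from collections import defaultdict
from itertools import cycle


def split_events(batch: str) -> list[str]:
    """Break a line-oriented batch into single events (the SplitText step)."""
    return [line for line in batch.splitlines() if line.strip()]


def distribute(events: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Assign events to nodes round-robin.

    Stand-in for site-to-site back to the same cluster; the real
    distribution is load-based, so round-robin is only an assumption.
    """
    assignment = defaultdict(list)
    for node, event in zip(cycle(nodes), events):
        assignment[node].append(event)
    return dict(assignment)


def enrich(event: str) -> str:
    """Hypothetical per-event work (e.g. geo enrichment) done on each node."""
    return event + ",enriched"


# One large batch arrives on a single node.
batch = "e1\ne2\ne3\ne4\ne5\n"
nodes = ["node-1", "node-2", "node-3"]  # hypothetical cluster members

# Split, spread across the cluster, then do the work on the smaller chunks.
per_node = distribute(split_events(batch), nodes)
results = {node: [enrich(e) for e in evs] for node, evs in per_node.items()}
```

Each entry in results then corresponds to what one node would send on to
Kafka or aggregate for HDFS.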
Does that make sense/help for your scenario?
Thanks
Joe
On Fri, Jul 15, 2016 at 9:09 AM, M Singh <[email protected]> wrote:
> Hey Folks:
>
> I am looking for information on how to split/partition input in a generic
> way (say rows in a relational database, or lines in a file) and then process
> each split on a different node in parallel in a NiFi cluster. I believe
> there is a webinar from the NiFi team on this but am not able to find it
> now.
>
> If someone has the documentation on this or a link to the webinar, please let me
> know.
>
> Thanks
>
> Mans