Thanks, the flow is executing as expected now.

On Mon, Jul 2, 2018 at 10:09 AM, Matt Burgess <[email protected]> wrote:
> Joe,
>
> Only the first (source) processor needs to be set to Primary Node
> Only. Once that happens, the flow files will only proceed down the
> flow on the primary node, so step 5 will also only run on the primary
> node. In order to redistribute the flow files among the cluster,
> you'll want a Remote Process Group to point back to an Input Port on
> your cluster, between steps 4 & 5. From that point on, the flow files
> will be distributed among the nodes and the downstream flow (steps
> 5-7) will run on all the nodes.
>
> Regards,
> Matt
>
> On Mon, Jul 2, 2018 at 10:05 AM Joe Trite <[email protected]> wrote:
> >
> > I have a question/need confirmation about cluster execution. I have a
> > 3-node NiFi 1.6 cluster. My use case is extracting data from Hive and
> > depositing it into an RDBMS. Here is my flow.
> >
> > 1. SelectHiveQL - executes a "show partitions" command.
> > 2. SplitText - splits the returned partitions (7) into individual
> >    flowFiles
> > 3. ExtractText - populates a 'partition_info' attribute
> > 4. UpdateAttribute - reformats the 'partition_info' into SQL syntax
> > 5. SelectHiveQL - executes the "SELECT" against Hive with the provided
> >    'partition_info' as the WHERE clause.
> > 6. SplitAvro - chunks the data into bite-size pieces.
> > 7. PutDatabaseRecord - INSERT into the db.
> >
> > Processors 1-4 are set to 'Primary Node' only. 5-7 are set to 'All
> > Nodes'. All processors are set to 1 concurrent task.
> >
> > The question is around what happens in step 5. I see the 7
> > 'partition_info' flowFiles in the queue after step 4 completes and they
> > seem to get executed one at a time in step 5, at least from viewing the
> > queue drain. I would expect that step 5 would execute on each of the
> > nodes (3) and that I would see the queue drain in threes. Is this
> > assumption correct, and maybe I have something misconfigured?
> >
> > I do see in the provenance data that all 3 nodes did process a flowFile,
> > I am just expecting it to happen in parallel.
> >
> > I did see this article about distribution but don't think it is required
> > for this use case to work:
> > https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
> >
> > Thanks
> > Joe
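
For anyone reading this thread later, here is a toy Python model of the behavior Matt describes. It is not NiFi code: the node names, the `run_flow` function, and the strict round-robin hand-off are assumptions made purely for illustration (a real Remote Process Group balances according to its own settings and node load). It only shows that flow files created on the primary node stay queued there unless something redistributes them between steps 4 and 5.

```python
from collections import Counter
from itertools import cycle

NODES = ["node1", "node2", "node3"]  # hypothetical 3-node cluster
PRIMARY = "node1"                    # assumed to be the elected primary node


def run_flow(partitions, redistribute):
    """Count how many flow files each node would handle at step 5."""
    # Steps 1-4 run only on the primary node, so every flow file starts there.
    queued = [(PRIMARY, p) for p in partitions]
    if redistribute:
        # Stand-in for the Remote Process Group -> Input Port hop:
        # hand flow files to the nodes in round-robin order (a simplification).
        targets = cycle(NODES)
        queued = [(next(targets), p) for _, p in queued]
    return Counter(node for node, _ in queued)


# 7 partitions, as in the flow described in this thread.
partitions = [f"partition_{i}" for i in range(1, 8)]

without_rpg = run_flow(partitions, redistribute=False)
with_rpg = run_flow(partitions, redistribute=True)

print("without redistribution:", dict(without_rpg))
print("with redistribution:   ", dict(with_rpg))
```

In this model, setting step 5 to 'All Nodes' changes nothing by itself, because the other nodes have nothing in their queues; only the redistribution hop gives them work.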
