Joe,

Only the first (source) processor needs to be set to Primary Node
Only. Once that happens, the flow files will only proceed down the
flow on the primary node, so step 5 will also only run on the primary
node. To redistribute the flow files among the cluster,
you'll want a Remote Process Group that points back to an Input Port on
your cluster, placed between steps 4 and 5. From that point on, the flow files
will be distributed among the nodes and the downstream flow (steps
5-7) will run on all the nodes.
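
The effect of the Primary Node Only setting, and of routing the flow back through a Remote Process Group, can be illustrated with a toy Python model. It is not NiFi code. The node names are hypothetical, and the round-robin placement is an assumption that stands in for site-to-site delivery, which in practice can also take each node's load into account.

```python
from collections import Counter
from itertools import cycle

NODES = ["node1", "node2", "node3"]  # hypothetical node names
PRIMARY = "node1"                    # assumed to be the elected primary node


def place_without_rpg(flowfiles):
    """The Primary Node Only source creates every flow file on the primary
    node, and nothing moves them, so the other nodes' copies of steps 5-7
    never receive any work."""
    return {ff: PRIMARY for ff in flowfiles}


def place_with_rpg(flowfiles):
    """Toy stand-in for an RPG sending to an Input Port on the same cluster.
    Plain round-robin is an illustrative assumption, not NiFi's exact
    site-to-site balancing."""
    nodes = cycle(NODES)
    return {ff: next(nodes) for ff in flowfiles}


# The 7 partition flow files produced by SplitText in step 2.
partitions = [f"partition_{i}" for i in range(1, 8)]

print(Counter(place_without_rpg(partitions).values()))  # Counter({'node1': 7})
print(Counter(place_with_rpg(partitions).values()))     # Counter({'node1': 3, 'node2': 2, 'node3': 2})
```

Without redistribution, all 7 flow files queue on the primary node, and with 1 concurrent task they are processed one at a time. After redistribution, each node holds a share and can run step 5 in parallel.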

Regards,
Matt

On Mon, Jul 2, 2018 at 10:05 AM Joe Trite wrote:
>
> I have a question, or rather need confirmation, about cluster execution. I have a
> 3-node NiFi 1.6 cluster. My use case is extracting data from Hive and depositing
> it into an RDBMS. Here is my flow:
>
> 1. SelectHiveQL - executes a "show partitions" command.
> 2. SplitText - splits the returned partitions (7) into individual flowFiles.
> 3. ExtractText - populates a 'partition_info' attribute.
> 4. UpdateAttribute - reformats 'partition_info' into SQL syntax.
> 5. SelectHiveQL - executes the "SELECT" against Hive with the provided
> 'partition_info' as the WHERE clause.
> 6. SplitAvro - chunks the data into bite-size pieces.
> 7. PutDatabaseRecord - INSERTs into the DB.
>
> Processors 1-4 are set to 'Primary Node' only.  5-7 are set to 'All Nodes'.  
> All processors are set to 1 concurrent task.
>
> The question is about what happens in step 5. I see the 7 'partition_info'
> flowFiles in the queue after step 4 completes, and they seem to get executed
> one at a time in step 5, at least judging by how the queue drains. I would
> expect step 5 to execute on each of the 3 nodes and the queue to drain in
> threes. Is this assumption correct, or do I have something misconfigured?
>
> I do see in the provenance data that all 3 nodes did process a flowFile; I am
> just expecting it to happen in parallel.
>
> I did see this article about distribution, but I don't think it is required for
> this use case to work:
> https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
>
> Thanks
> Joe
>
>
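
The transformation in steps 3 and 4 (turning one "show partitions" line into a WHERE clause for step 5) can be sketched in Python for clarity. This is an illustration, not NiFi Expression Language. The function names and the table name `sales` are hypothetical, and it assumes Hive's usual `col=value/col=value` partition format, with values that contain no quotes or URL-escaped characters.

```python
def partition_to_where(partition_spec: str) -> str:
    """Turn a partition line such as 'dt=2018-07-01/region=us' into
    "dt='2018-07-01' AND region='us'".

    Assumes values contain no quotes; every value is quoted as a string."""
    clauses = []
    for part in partition_spec.strip().split("/"):
        column, _, value = part.partition("=")
        clauses.append(f"{column}='{value}'")
    return " AND ".join(clauses)


def build_select(table: str, partition_spec: str) -> str:
    """Build the query that step 5 would run for one partition flowFile."""
    return f"SELECT * FROM {table} WHERE {partition_to_where(partition_spec)}"


print(build_select("sales", "dt=2018-07-01/region=us"))
# SELECT * FROM sales WHERE dt='2018-07-01' AND region='us'
```

Because each flowFile carries its own partition's query, the seven SELECTs are independent of one another, which is why they can run in parallel once the flowFiles are spread across the nodes.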
