Thanks Joe.

On Sun, May 1, 2016 at 2:55 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Igor,
>
> There is no automatic failover of the node that is considered
> primary.  This has been addressed for the upcoming 1.x release, though:
> https://issues.apache.org/jira/browse/NIFI-483
>
> Thanks
> Joe
>
> On Sun, May 1, 2016 at 2:36 PM, Igor Kravzov <igork.ine...@gmail.com>
> wrote:
> > Thanks, Aldrin, for the response.
> > What I didn't fully understand from the documentation: is automatic
> > fail-over implemented? I would rather configure the entire workflow to
> > run "On primary node".
> >
> >
> > On Sun, May 1, 2016 at 1:31 PM, Aldrin Piri <aldrinp...@gmail.com>
> wrote:
> >>
> >> Igor,
> >>
> >> Your thoughts are correct: without any additional configuration, the
> >> GetTwitter processor would run on both nodes.  The way to avoid this
> >> is to select the "On primary node" scheduling strategy, which would
> >> have the processor run only on whichever node is currently primary.
> >>
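> >> As a minimal sketch, that setting looks like the following in the
> >> processor's configuration (written out here purely for illustration;
> >> the strategy name is the one from the thread above):
> >>
> >>     Scheduling Strategy: On primary node
> >>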
> >> PutHDFS has similar semantics, but here they would likely be desired.
> >> Consider the case where data is partitioned across each of the nodes:
> >> PutHDFS would then need to run on each node to ensure the data is
> >> delivered to HDFS.  The property you list is where the data should
> >> land on the configured HDFS instance.  Often this is done via
> >> Expression Language (EL) to get the familiar time slicing of resources
> >> when persisted, such as ${now():format('yyyy/MM/dd/HH')}.  You could
> >> additionally have a directory structure that mirrors the data, making
> >> use of attributes the files may have gained as they made their way
> >> through your flow, or use an UpdateAttribute to set a property, such
> >> as "hadoop.dest.dir", that the final PutHDFS directory property uses
> >> to give a dynamic location on a per-FlowFile basis.
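> >>
> >> As a rough sketch, a PutHDFS configuration along those lines might
> >> look like the following (the paths are purely illustrative, and
> >> "hadoop.dest.dir" is just the example attribute name from above):
> >>
> >>     Hadoop Configuration Resources: /etc/hadoop/conf/core-site.xml,
> >>                                     /etc/hadoop/conf/hdfs-site.xml
> >>     Directory: /data/tweets/${now():format('yyyy/MM/dd/HH')}
> >>
> >> or, if an upstream UpdateAttribute sets "hadoop.dest.dir" on each
> >> FlowFile:
> >>
> >>     Directory: ${hadoop.dest.dir}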
> >>
> >> Let us know if you have additional questions or if things are unclear.
> >>
> >> --aldrin
> >>
> >>
> >> On Sun, May 1, 2016 at 1:20 PM, Igor Kravzov <igork.ine...@gmail.com>
> >> wrote:
> >>>
> >>> If I understand correctly, in cluster mode the same dataflow runs on
> >>> all the nodes.
> >>> So let's say I have a simple dataflow with GetTwitter and PutHDFS
> >>> processors, and one NCM + 2 nodes.
> >>> Does that actually mean GetTwitter will be called independently, and
> >>> potentially simultaneously, on each node, so there may be duplicate
> >>> results?
> >>> How about the PutHDFS processor?  Where should "hadoop configuration
> >>> resources" and "parent HDFS directory" point on each node?
> >>
> >>
> >
>
