Load-balancing web api in cluster

2016-12-19 Thread Hart, Greg
Hi all,

What's the recommended way to communicate with the NiFi REST API in a
cluster? I see that NiFi uses ZooKeeper, so is it possible to get the
Cluster Coordinator hostname and API port from ZooKeeper, or should I use
something like HAProxy?

Thanks!
-Greg
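
One client-side alternative to a separate proxy is to keep a list of node URLs and use the first one that answers. The sketch below is only that, a sketch: it assumes every node in the cluster can accept REST requests, the hostnames are hypothetical, and the probe path is an assumption to be replaced with any cheap GET your NiFi version exposes.

```python
import urllib.request
from typing import Callable, Iterable, Optional


def first_responding_node(
    nodes: Iterable[str], probe: Callable[[str], bool]
) -> Optional[str]:
    """Return the first base URL for which probe() succeeds, or None."""
    for base_url in nodes:
        try:
            if probe(base_url):
                return base_url
        except OSError:
            # Connection refused, timeout, DNS failure: try the next node.
            continue
    return None


def http_probe(base_url: str, timeout: float = 3.0) -> bool:
    """Probe one node over HTTP. The path is an assumption, not confirmed here."""
    url = base_url + "/nifi-api/flow/cluster/summary"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200


# Hypothetical node list; replace with the hosts in your own cluster.
NODES = [
    "http://nifi-1.example.com:8080",
    "http://nifi-2.example.com:8080",
    "http://nifi-3.example.com:8080",
]

if __name__ == "__main__":
    print(first_responding_node(NODES, http_probe))
```

Passing the probe in as a function keeps the failover logic independent of the HTTP details, so the same loop works if the check is later replaced by a lookup against ZooKeeper or a health endpoint behind a proxy.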



Re: DetectDuplicate

2016-12-19 Thread Andrew Grande
Juan, nothing has changed yet from how you remember this processor. I
personally would love to have a more pluggable backend for it, too.

Andrew

On Mon, Dec 19, 2016, 2:35 PM Juan Sequeiros wrote:

> Hello,
>
> I am wondering whether DetectDuplicate still depends solely on the
> Distributed Cache Service.
> If so, can I assume that DetectDuplicate will fail if the Distributed Cache
> server is down?
>
>
> I want to replace our DetectDuplicate solution (an "external DB") with
> NiFi's, but the single-point reliance on the cache server is a blocker. I
> am not sure whether I am missing something; does it possibly use ZooKeeper
> now?
>
>
>


DetectDuplicate

2016-12-19 Thread Juan Sequeiros
Hello,

I am wondering whether DetectDuplicate still depends solely on the
Distributed Cache Service.
If so, can I assume that DetectDuplicate will fail if the Distributed Cache
server is down?


I want to replace our DetectDuplicate solution (an "external DB") with
NiFi's, but the single-point reliance on the cache server is a blocker. I am
not sure whether I am missing something; does it possibly use ZooKeeper now?


Re: merge flowfiles

2016-12-19 Thread Raf Huys
Yeah, I indeed have no clue as to when all flowfiles have landed. Somehow I
need to figure out when that attribute has changed, and act upon that event.

Currently looking at the FlowfileAggregationProcessor.

On Mon, Dec 19, 2016 at 6:29 PM, Lee Laim wrote:

> Raf,
>
> You might be able to use PutFile and 'merge' your flowfiles in a temporary
> batch directory.  Once you are confident that all the flow files have
> landed, you can pull the contents of the directory.
>
> In other words, when a new directory shows up, pull the contents of the
> older directory back into the NiFi flow, then delete the old directory.
> This method provides some merging versatility, but will interrupt
> provenance as the flowfiles will be given a new UUID when brought back into
> the flow.
>
> Lee
>
>
>
>
> On Mon, Dec 19, 2016 at 9:41 AM, Jeff wrote:
>
>> Hello Raf,
>>
>> MergeContent can merge based on a correlation ID (attribute).  However,
>> the merging currently operates in two modes: Defragment or Bin-Packing
>> Algorithm.  Defragment is completed by defragmenting based on the
>> correlation ID and a known number of fragments.  Bin-Packing Algorithm is
>> completed based on a min or max age of a "bin", and/or after a certain
>> number of flowfiles have been received.
>>
>> Based on your question, I'm assuming you will not know how many flowfiles
>> you'd be merging per attribute, so I'm not sure that MergeContent will work
>> for your use case.  Depending on how quickly you want those files merged
>> and sent downstream, a max bin age might work for you, though.  There is a
>> JIRA for implementing a more general-case aggregation processor [1].
>>
>> With some more details around your scenario we might be able to figure
>> out how to get it to work for you with the standard processors.
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1926
>>
>> On Mon, Dec 19, 2016 at 10:23 AM Raf Huys wrote:
>>
>>> I want to batch incoming flowfiles based on an attribute. As soon as
>>> this attribute's value changes, the current batch should be transferred
>>> downstream and be reset. So basically I'm looking for a tumbling window.
>>>
>>> Can this be done with the MergeContent processor (which strategy?) or
>>> should I write my own processor?
>>>
>>>
>>> --
>>> tx
>>>
>>>
>


-- 
Mvg,

Raf Huys


Re: merge flowfiles

2016-12-19 Thread Lee Laim
Raf,

You might be able to use PutFile and 'merge' your flowfiles in a temporary
batch directory.  Once you are confident that all the flow files have
landed, you can pull the contents of the directory.

In other words, when a new directory shows up, pull the contents of the
older directory back into the NiFi flow, then delete the old directory.
This method provides some merging versatility, but will interrupt
provenance as the flowfiles will be given a new UUID when brought back into
the flow.

Lee




On Mon, Dec 19, 2016 at 9:41 AM, Jeff wrote:

> Hello Raf,
>
> MergeContent can merge based on a correlation ID (attribute).  However,
> the merging currently operates in two modes: Defragment or Bin-Packing
> Algorithm.  Defragment is completed by defragmenting based on the
> correlation ID and a known number of fragments.  Bin-Packing Algorithm is
> completed based on a min or max age of a "bin", and/or after a certain
> number of flowfiles have been received.
>
> Based on your question, I'm assuming you will not know how many flowfiles
> you'd be merging per attribute, so I'm not sure that MergeContent will work
> for your use case.  Depending on how quickly you want those files merged
> and sent downstream, a max bin age might work for you, though.  There is a
> JIRA for implementing a more general-case aggregation processor [1].
>
> With some more details around your scenario we might be able to figure out
> how to get it to work for you with the standard processors.
>
> [1] https://issues.apache.org/jira/browse/NIFI-1926
>
> On Mon, Dec 19, 2016 at 10:23 AM Raf Huys wrote:
>
>> I want to batch incoming flowfiles based on an attribute. As soon as this
>> attribute's value changes, the current batch should be transferred
>> downstream and be reset. So basically I'm looking for a tumbling window.
>>
>> Can this be done with the MergeContent processor (which strategy?) or
>> should I write my own processor?
>>
>>
>> --
>> tx
>>
>>


Re: merge flowfiles

2016-12-19 Thread Jeff
Hello Raf,

MergeContent can merge based on a correlation ID (attribute).  However, the
merging currently operates in two modes: Defragment or Bin-Packing
Algorithm.  Defragment is completed by defragmenting based on the
correlation ID and a known number of fragments.  Bin-Packing Algorithm is
completed based on a min or max age of a "bin", and/or after a certain
number of flowfiles have been received.

Based on your question, I'm assuming you will not know how many flowfiles
you'd be merging per attribute, so I'm not sure that MergeContent will work
for your use case.  Depending on how quickly you want those files merged
and sent downstream, a max bin age might work for you, though.  There is a
JIRA for implementing a more general-case aggregation processor [1].

With some more details around your scenario we might be able to figure out
how to get it to work for you with the standard processors.

[1] https://issues.apache.org/jira/browse/NIFI-1926
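
To make the bin-closing conditions described above concrete, here is a toy model in Python. It is not MergeContent's implementation and mirrors none of its property names; `ToyBinner`, `max_entries`, and `max_age_seconds` are hypothetical. It only shows a per-correlation-value bin that is released when it reaches an entry count or when it has been open for a maximum age.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Bin:
    opened_at: float                      # time the first item arrived
    items: List[str] = field(default_factory=list)


class ToyBinner:
    """Toy model: one bin per correlation value, closed by count or age."""

    def __init__(self, max_entries: int, max_age_seconds: float) -> None:
        self.max_entries = max_entries
        self.max_age_seconds = max_age_seconds
        self.bins: Dict[str, Bin] = {}

    def expire(self, now: float) -> List[Tuple[str, List[str]]]:
        """Release every bin that has been open for at least the max age."""
        old = [
            key for key, b in self.bins.items()
            if now - b.opened_at >= self.max_age_seconds
        ]
        return [(key, self.bins.pop(key).items) for key in old]

    def offer(self, key: str, item: str, now: float) -> List[Tuple[str, List[str]]]:
        """Add an item to its bin; return all bins released by this call."""
        released = self.expire(now)
        bin_ = self.bins.setdefault(key, Bin(opened_at=now))
        bin_.items.append(item)
        if len(bin_.items) >= self.max_entries:
            released.append((key, self.bins.pop(key).items))
        return released
```

With an unknown number of flowfiles per attribute value, only the age condition can be relied on to close a bin, which is why the max bin age is the setting suggested above.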

On Mon, Dec 19, 2016 at 10:23 AM Raf Huys wrote:

> I want to batch incoming flowfiles based on an attribute. As soon as this
> attribute's value changes, the current batch should be transferred
> downstream and be reset. So basically I'm looking for a tumbling window.
>
> Can this be done with the MergeContent processor (which strategy?) or
> should I write my own processor?
>
>
> --
> tx
>
>


merge flowfiles

2016-12-19 Thread Raf Huys
I want to batch incoming flowfiles based on an attribute. As soon as this
attribute's value changes, the current batch should be transferred
downstream and be reset. So basically I'm looking for a tumbling window.

Can this be done with the MergeContent processor (which strategy?) or
should I write my own processor?


-- 
tx
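
The behaviour asked for above can be written down as a small Python sketch. It models flowfiles as plain dictionaries of attributes, which is an assumption made for illustration; `tumbling_batches` and the `batch` attribute name are hypothetical, and this is not a NiFi processor.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

FlowFile = Dict[str, str]  # stand-in for a flowfile's attribute map


def tumbling_batches(
    flowfiles: Iterable[FlowFile], attribute: str
) -> Iterator[Tuple[Optional[str], List[FlowFile]]]:
    """Yield (value, batch) each time the attribute's value changes."""
    current_value: Optional[str] = None
    batch: List[FlowFile] = []
    for flowfile in flowfiles:
        value = flowfile[attribute]
        if batch and value != current_value:
            # The value changed: hand the finished batch downstream and reset.
            yield current_value, batch
            batch = []
        current_value = value
        batch.append(flowfile)
    if batch:
        # The final batch is only released when the input ends.
        yield current_value, batch


if __name__ == "__main__":
    incoming = [
        {"batch": "x", "id": "1"},
        {"batch": "x", "id": "2"},
        {"batch": "y", "id": "3"},
    ]
    for value, group in tumbling_batches(incoming, "batch"):
        print(value, [f["id"] for f in group])
```

Note the last step: on an unbounded stream the final batch is never released, because no later flowfile arrives to signal the change. A real flow would need a timeout or an explicit end marker in addition to the change detection.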