Re: Dataflow architecture for multiple sources

2016-03-25 Thread Joe Witt
Aurélien,

As you progress here, please let us know how it goes. If you think there is
more the framework should do for such cases, let's talk through it.

Thanks
Joe

On Fri, Mar 25, 2016 at 4:42 AM, Aurélien DEHAY
<aurelien.de...@zorel.org> wrote:
>
> thanks for the answer.
>
>
> I understand that it's mostly up to me. I think I will go with one processor
> per source, to avoid backpressure measures being applied to all the flows.
>
>
> Aurélien.
>
>
>
> 
> From: Andrew Grande <agra...@hortonworks.com>
> Sent: Tuesday, March 22, 2016 22:15
> To: users@nifi.apache.org
> Subject: Re: Dataflow architecture for multiple sources
>
> Aurélien,
>
> The choice of a multiplexing channel or multiple dedicated ones really comes
> down to whatever constraints your environment may (or may not) have, e.g.
> whether you are able to expose every port required for a socket-based protocol.
>
> On the NiFi side, take a close look at Backpressure here; it will take care
> of data storms:
>
> https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
>
> Andrew
>
> From: Aurélien DEHAY <aurelien.de...@zorel.org> on behalf of
> "aurelien.de...@gmail.com" <aurelien.de...@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Tuesday, March 22, 2016 at 8:03 AM
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Dataflow architecture for multiple sources
>
> Hello.
>
>
> I have to build an architecture based on NiFi to collect & route data from
> sources to a Hadoop/ES cluster.
>
>
> Sources will have different constraints (from 50 msg/s to hundreds of
> thousands, different latency requirements, different protocols, etc.).
>
>
> I wonder whether we should have one processor per data source (e.g. one port
> for each data source), or whether I can send to one processor per protocol
> and route based on some attribute afterwards.
>
>
> If we use a single entry processor per protocol for all data sources, won't
> there be risks on the shared queue, for example in case of a data storm?
>
>
> Thanks for any pointer / answer.
>
>
> Aurélien.


Re: Dataflow architecture for multiple sources

2016-03-22 Thread Andrew Grande
Aurélien,

The choice of a multiplexing channel or multiple dedicated ones really comes 
down to whatever constraints your environment may (or may not) have, e.g. 
whether you are able to expose every port required for a socket-based protocol.

On the NiFi side, take a close look at Backpressure here; it will take care of 
data storms:

https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
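
To make that concrete, here is a minimal, NiFi-independent sketch of the 
mechanism: a bounded queue between a fast producer and a slower consumer blocks 
the producer once the queue fills, so a burst is throttled instead of growing 
without bound. The capacity and timing below are made up; in NiFi the 
equivalent knob is the per-connection backpressure threshold, not hand-written 
code.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackpressureSketch {

    public static void main(String[] args) throws InterruptedException {
        // Bounded queue standing in for a connection; its capacity plays the
        // role of a backpressure threshold (a capacity of 10 is arbitrary).
        BlockingQueue<String> connection = new ArrayBlockingQueue<>(10);

        // Fast "source": put() blocks whenever the queue is full, so a storm
        // of incoming messages is throttled instead of piling up unbounded.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    connection.put("msg-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Slow "downstream processor": drains one message every 20 ms.
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    String msg = connection.take();
                    Thread.sleep(20);
                    System.out.println("processed " + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}

The flip side, as discussed elsewhere in this thread, is that when several 
sources feed one shared entry processor, backpressure on its outbound 
connection slows ingest for all of them at once.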

Andrew

From: Aurélien DEHAY <aurelien.de...@zorel.org> on behalf of 
"aurelien.de...@gmail.com" <aurelien.de...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Tuesday, March 22, 2016 at 8:03 AM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Dataflow architecture for multiple sources
Subject: Dataflow architecture for multiple sources


Hello.


I have to build an architecture based on NiFi to collect & route data from 
sources to a Hadoop/ES cluster.


Sources will have different constraints (from 50 msg/s to hundreds of 
thousands, different latency requirements, different protocols, etc.).


I wonder whether we should have one processor per data source (e.g. one port 
for each data source), or whether I can send to one processor per protocol and 
route based on some attribute afterwards.


If we use a single entry processor per protocol for all data sources, won't 
there be risks on the shared queue, for example in case of a data storm?


Thanks for any pointer / answer.


Aurélien.


Dataflow architecture for multiple sources

2016-03-22 Thread aurelien.de...@gmail.com
Hello.


I have to build an architecture based on NiFi to collect & route data from 
sources to a Hadoop/ES cluster.


Sources will have different constraints (from 50 msg/s to hundreds of 
thousands, different latency requirements, different protocols, etc.).


I wonder whether we should have one processor per data source (e.g. one port 
for each data source), or whether I can send to one processor per protocol and 
route based on some attribute afterwards.
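
To make that second option concrete: each protocol listener would feed a 
routing step that fans FlowFiles out by an attribute, which NiFi's stock 
RouteOnAttribute processor can do with Expression Language rules. Purely as an 
illustration of the idea, a custom processor doing the same kind of 
attribute-based fan-out could look roughly like the sketch below; the class 
name, the "source.id" attribute and the "high-volume-sensor" value are 
hypothetical.

import java.util.HashSet;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

// Hypothetical processor: routes FlowFiles arriving from a single per-protocol
// listener to different downstream branches based on a "source.id" attribute.
public class RouteBySource extends AbstractProcessor {

    static final Relationship REL_HIGH_VOLUME = new Relationship.Builder()
            .name("high-volume")
            .description("FlowFiles from the high-volume source")
            .build();

    static final Relationship REL_DEFAULT = new Relationship.Builder()
            .name("default")
            .description("FlowFiles from all other sources")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        Set<Relationship> relationships = new HashSet<>();
        relationships.add(REL_HIGH_VOLUME);
        relationships.add(REL_DEFAULT);
        return relationships;
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session)
            throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }

        // "source.id" would be set at ingest time, e.g. from the listening
        // port or the sender's host; the attribute name is illustrative.
        String source = flowFile.getAttribute("source.id");
        if ("high-volume-sensor".equals(source)) {
            session.transfer(flowFile, REL_HIGH_VOLUME);
        } else {
            session.transfer(flowFile, REL_DEFAULT);
        }
    }
}

Each relationship then gets its own downstream connection with its own 
backpressure thresholds; the shared piece is only the single inbound connection 
in front of the router, which leads to the next question.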


If we use a single entry processor per protocol for all data sources, won't 
there be risks on the shared queue, for example in case of a data storm?


Thanks for any pointer / answer.


Aurélien.