RE: DF design question

2016-04-18 Thread aurelien.de...@gmail.com

Hello.


I would use an EvaluateJsonPath processor to extract the value into a flowfile 
attribute rather than parse the entire flowfile with a regexp.


Then I would use the RouteOnAttribute processor, which creates one route per value 
of the attribute, plus an "unmatched" route for everything else.
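
To make that concrete, here is a sketch of what the two processors' properties could 
look like, assuming the records have a top-level "type" field (the relationship 
names are illustrative, not prescribed):

```
EvaluateJsonPath
  Destination : flowfile-attribute
  type        : $.type            (dynamic property: attribute name -> JsonPath)

RouteOnAttribute
  Routing Strategy : Route to Property name
  smartphone : ${type:equals('smartphone')}
  PC         : ${type:equals('PC')}
  tablet     : ${type:equals('tablet')}
```

Each dynamic property of RouteOnAttribute becomes its own outgoing relationship, 
which you connect to the corresponding sink; flowfiles matching none of them go 
to the "unmatched" relationship.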


BTW, you can reach me by phone, mail, or communicator; just search for my name 
in the intranet.


Regards.

Aurélien DEHAY




De : philippe.gib...@orange.com 
Envoyé : lundi 18 avril 2016 16:51
À : users@nifi.apache.org
Objet : DF design question


Hello

I have this simple use case to implement (but it's not so clear to me which 
processors to put in the chain :)):



I have a JSON file with records identified by one "type" property: 
{ … "type": "smartphone" }, { … "type": "PC" }, { … "type": "tablet" }.

I want to route records to different sink destinations based on the "type" 
property.



Looking at the RouteText or RouteOnContent processors seems to be the right 
direction, but I do not see how to route to multiple sinks (3 in my example):

I want records of "type": "smartphone" routed to one sink (a first ElasticSearch 
processor with index1), "type": "PC" to another sink (a 2nd ES processor), and 
"type": "tablet" to a third (a 3rd ES processor).

A kind of demultiplexer to N sinks 
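
As a plain-Python illustration of the demultiplexing behavior being asked for 
(outside NiFi), the records and per-type buckets below are made up for the 
example; the buckets stand in for the three ES sinks:

```python
from collections import defaultdict

# Hypothetical sample records, mirroring the "type" property in the question.
records = [
    {"id": 1, "type": "smartphone"},
    {"id": 2, "type": "PC"},
    {"id": 3, "type": "tablet"},
    {"id": 4, "type": "PC"},
]

# Demultiplex: one bucket per "type" value, standing in for the per-type sinks.
# Records without a "type" fall into an "unmatched" bucket, like the
# unmatched relationship of RouteOnAttribute.
sinks = defaultdict(list)
for record in records:
    sinks[record.get("type", "unmatched")].append(record)

for sink_type, batch in sinks.items():
    print(sink_type, len(batch))
# → smartphone 1
#   PC 2
#   tablet 1
```

In NiFi the same fan-out comes for free: each relationship of the routing 
processor is wired to its own downstream processor.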



Is it the right design (and  processors ) to implement this DF , please? :)





Phil






Dataflow architecture for multiple sources

2016-03-22 Thread aurelien.de...@gmail.com
Hello.


I have to design an architecture based on NiFi to collect & route data from 
sources to some Hadoop/ES clusters.


Sources will have different constraints (from 50 msg/s to hundreds of thousands, 
different latency requirements, different protocols, etc.).


I wonder whether we should have one processor per data source (e.g. one port for 
each data source), or whether we can send to one processor per protocol and route 
based on some attribute afterwards.


If we use a single entry processor per protocol for all data sources, won't 
there be risks on the shared queue, for example in case of a data storm?


Thanks for any pointer / answer.


Aurélien.