RE: DF design question

2016-04-18 Thread aurelien.de...@gmail.com

Hello.


I would use an EvaluateJsonPath processor to extract the value into an attribute 
rather than parse the entire flowfile with a regexp.


Then I would use the RouteOnAttribute processor, which creates one route per value 
of the attribute, plus one "unmatched" route.
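For example, the two processors could be configured roughly like this (a sketch against NiFi 0.x; the attribute name device.type and the exact dynamic-property values are illustrative choices, not from the original mail):

```
EvaluateJsonPath
  Destination:       flowfile-attribute
  device.type:       $.type                  # dynamic property -> attribute

RouteOnAttribute
  Routing Strategy:  Route to Property name
  smartphone:        ${device.type:equals('smartphone')}
  PC:                ${device.type:equals('PC')}
  tablet:            ${device.type:equals('tablet')}
```

Each named relationship then connects to its own downstream ElasticSearch processor, and the "unmatched" relationship can go to a logging or failure path. If the incoming file holds several records in one flowfile, a SplitJson processor in front would first give one record per flowfile.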


BTW, you can reach me (by phone, mail or communicator), just search my name 
in the intranet.


Regards.

Aurélien DEHAY




From: philippe.gib...@orange.com 
Sent: Monday, 18 April 2016 16:51
To: users@nifi.apache.org
Subject: DF design question


Hello

I have this simple use case to implement (but it's not so clear to me which 
processors to put in the chain :)):



I have a JSON file with records identified by one "type" property: 
{... "type": "smartphone"}, {... "type": "PC"}, {... "type": "tablet"}.

I want to route records based on the "type" property to different sink 
destinations.



Looking at the RouteText or RouteOnContent processors seems to be the right direction, 
but I do not see how to route to multiple sinks (3 in my example):

I want records of "type": "smartphone" to be routed to one sink (first 
ElasticSearch processor with index1), "type": "PC" to another sink (2nd 
ES processor), and "type": "tablet" to a third (3rd ES processor).

A kind of demultiplexer to N sinks.



Is this the right design (and processors) to implement this DF, please? :)





Phil






DF design question

2016-04-18 Thread philippe.gibert
Hello
I have this simple use case to implement (but it's not so clear to me which 
processors to put in the chain :)):

I have a JSON file with records identified by one "type" property: 
{... "type": "smartphone"}, {... "type": "PC"}, {... "type": "tablet"}.
I want to route records based on the "type" property to different sink 
destinations.

Looking at the RouteText or RouteOnContent processors seems to be the right direction, 
but I do not see how to route to multiple sinks (3 in my example):
I want records of "type": "smartphone" to be routed to one sink (first 
ElasticSearch processor with index1), "type": "PC" to another sink (2nd 
ES processor), and "type": "tablet" to a third (3rd ES processor).
A kind of demultiplexer to N sinks.

Is this the right design (and processors) to implement this DF, please? :)


Phil



_


This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.



Re: howto dynamically change the PutHDFS target directory

2016-04-18 Thread Mike Harding
Awesome! Thanks for the heads-up, I'll give that a try.

Mike

On 18 April 2016 at 15:02, Bryan Bende  wrote:

> Mike,
>
> If I am understanding correctly I think this can be done today... The
> Directory property on PutHDFS supports expression language, so you could
> set it to a value like:
>
> /data/${now():format('dd-MM-yy')}/
>
> This could be set directly in PutHDFS, although it is also a common
> pattern to stick an UpdateAttribute processor in front of PutHDFS and set
> filename and hadoop.dir attributes, and then in PutHDFS reference those as
> ${filename} and ${hadoop.dir}
>
> The advantage to the UpdateAttribute approach is that you can have a
> single PutHDFS processor that actually writes to many different locations.
>
> Hope that helps.
>
> -Bryan
>
>
> On Mon, Apr 18, 2016 at 2:53 PM, Oleg Zhurakousky <
> ozhurakou...@hortonworks.com> wrote:
>
>> Mike
>>
>> Indeed a very common requirement and we should support it.
>> Would you mind raising a JIRA for it?
>> https://issues.apache.org/jira/browse/NIFI
>>
>> Cheers
>> Oleg
>>
>> On Apr 18, 2016, at 9:50 AM, Mike Harding  wrote:
>>
>> Hi All,
>>
>> I have a requirement to write a data stream into HDFS, where the
>> flowfiles received per day are grouped into a directory, e.g. so I would end
>> up with a folder structure as follows:
>>
>> data/18-04-16
>> data/19-04-16
>> data/20-04-16 ... etc
>>
>> Currently I can specify in the config for the putHDFS processor a target
>> directory but I want this to change and point to a new directory as each
>> day ends.
>>
>> So using NiFi I'd like to 1) be able to create new directories in HDFS
>> (although I could potentially write a bash script to do the directory
>> creation) and 2) change the target directory as the day changes.
>>
>> Any help much appreciated,
>>
>> Mike
>>
>>
>>
>


Re: howto dynamically change the PutHDFS target directory

2016-04-18 Thread Bryan Bende
Mike,

If I am understanding correctly I think this can be done today... The
Directory property on PutHDFS supports expression language, so you could
set it to a value like:

/data/${now():format('dd-MM-yy')}/

This could be set directly in PutHDFS, although it is also a common pattern
to stick an UpdateAttribute processor in front of PutHDFS and set filename
and hadoop.dir attributes, and then in PutHDFS reference those as
${filename} and ${hadoop.dir}

The advantage to the UpdateAttribute approach is that you can have a single
PutHDFS processor that actually writes to many different locations.
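As a concrete illustration of that UpdateAttribute pattern (a sketch; the hadoop.dir attribute name follows the mail above, and the property layout is abbreviated):

```
UpdateAttribute
  hadoop.dir:  /data/${now():format('dd-MM-yy')}   # dynamic property -> attribute

PutHDFS
  Directory:   ${hadoop.dir}                       # evaluated per flowfile at write time
```

Because the expression is evaluated for each flowfile, the target rolls over to a new dated directory automatically at midnight, and PutHDFS should create the directory if it does not already exist, so no separate mkdir step should be needed.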

Hope that helps.

-Bryan


On Mon, Apr 18, 2016 at 2:53 PM, Oleg Zhurakousky <
ozhurakou...@hortonworks.com> wrote:

> Mike
>
> Indeed a very common requirement and we should support it.
> Would you mind raising a JIRA for it?
> https://issues.apache.org/jira/browse/NIFI
>
> Cheers
> Oleg
>
> On Apr 18, 2016, at 9:50 AM, Mike Harding  wrote:
>
> Hi All,
>
> I have a requirement to write a data stream into HDFS, where the flowfiles
> received per day are grouped into a directory, e.g. so I would end up with a
> folder structure as follows:
>
> data/18-04-16
> data/19-04-16
> data/20-04-16 ... etc
>
> Currently I can specify in the config for the putHDFS processor a target
> directory but I want this to change and point to a new directory as each
> day ends.
>
> So using NiFi I'd like to 1) be able to create new directories in HDFS
> (although I could potentially write a bash script to do the directory
> creation) and 2) change the target directory as the day changes.
>
> Any help much appreciated,
>
> Mike
>
>
>


howto dynamically change the PutHDFS target directory

2016-04-18 Thread Mike Harding
Hi All,

I have a requirement to write a data stream into HDFS, where the flowfiles
received per day are grouped into a directory, e.g. so I would end up with a
folder structure as follows:

data/18-04-16
data/19-04-16
data/20-04-16 ... etc

Currently I can specify in the config for the PutHDFS processor a target
directory but I want this to change and point to a new directory as each
day ends.

So using NiFi I'd like to 1) be able to create new directories in HDFS
(although I could potentially write a bash script to do the directory
creation) and 2) change the target directory as the day changes.

Any help much appreciated,

Mike


[ANNOUNCE] Apache NiFi 0.6.1 release

2016-04-18 Thread Joe Witt
Hello

The Apache NiFi team would like to announce the release of Apache NiFi 0.6.1.

Apache NiFi is an easy to use, powerful, and reliable system to
process and distribute data.  Apache NiFi was made for dataflow.  It
supports highly configurable directed graphs of data routing,
transformation, and system mediation logic.

More details on Apache NiFi can be found here:
  http://nifi.apache.org/

The release artifacts can be downloaded from here:
  http://nifi.apache.org/download.html

Maven artifacts have been made available here:
  https://repository.apache.org/content/repositories/releases/org/apache/nifi/

Release note highlights can be found here:
  
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.6.1

Thank you
The Apache NiFi team