RE: DF design question
Hello. I would use an EvaluateJsonPath processor to extract the "type" value into a flowfile attribute, rather than parse the entire flowfile with a regexp. Then I would use the RouteOnAttribute processor, which creates one route per attribute value you configure as a property, plus an "unmatched" route for anything that does not match. By the way, you can reach me by phone, mail, or communicator; just search my name in the intranet.

Regards.

Aurélien DEHAY

From: philippe.gib...@orange.com
Sent: Monday 18 April 2016 16:51
To: users@nifi.apache.org
Subject: DF design question
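A minimal sketch of the processor settings Aurélien describes, assuming each flowfile carries a single JSON record; the attribute name device.type is a free choice (any name works):

```
# EvaluateJsonPath -- copy the "type" field into a flowfile attribute
Destination      : flowfile-attribute
device.type      : $.type        (dynamic property: attribute name -> JsonPath)

# RouteOnAttribute -- one dynamic property per sink; each becomes a relationship
Routing Strategy : Route to Property name
smartphone       : ${device.type:equals('smartphone')}
PC               : ${device.type:equals('PC')}
tablet           : ${device.type:equals('tablet')}
```

Connect the smartphone, PC, and tablet relationships to the three Elasticsearch processors, and unmatched to whatever error handling suits you. If the incoming file contains many records in one flowfile, a SplitJson processor upstream can break it into one-record flowfiles first.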
DF design question
Hello,

I have a simple use case to implement, but it's not so clear to me which processors to put in the chain :). I have a JSON file with records identified by a "type" property: {.. "type": "smartphone"}, {.. "type": "PC"}, {.. "type": "tablet"}. I want to route records to different sink destinations based on the "type" property.

Looking at the RouteText or RouteOnContent processors seems to be the right direction, but I do not see how to route to multiple sinks (3 in my example): I want records with "type": "smartphone" routed to one sink (a first ElasticSearch processor with index1), "type": "PC" to another sink (a 2nd ES processor), and "type": "tablet" to a third (a 3rd ES processor). A kind of demultiplexer to N sinks.

Is this the right design (and processors) to implement this DF, please? :)

Phil
Re: howto dynamically change the PutHDFS target directory
Awesome! Thanks for the heads-up. I'll give that a try.

Mike

On 18 April 2016 at 15:02, Bryan Bende wrote:
Re: howto dynamically change the PutHDFS target directory
Mike,

If I am understanding correctly, I think this can be done today. The Directory property on PutHDFS supports expression language, so you could set it to a value like:

/data/${now():format('dd-MM-yy')}/

This could be set directly in PutHDFS, although it is also a common pattern to stick an UpdateAttribute processor in front of PutHDFS to set filename and hadoop.dir attributes, and then in PutHDFS reference those as ${filename} and ${hadoop.dir}.

The advantage of the UpdateAttribute approach is that a single PutHDFS processor can actually write to many different locations.

Hope that helps.

-Bryan

On Mon, Apr 18, 2016 at 2:53 PM, Oleg Zhurakousky <ozhurakou...@hortonworks.com> wrote:

> Mike
>
> Indeed a very common requirement and we should support it.
> Would you mind raising a JIRA for it?
> https://issues.apache.org/jira/browse/NIFI
>
> Cheers
> Oleg
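A sketch of the two-processor pattern Bryan describes; the attribute name hadoop.dir is just a convention here (any attribute name works, as long as the PutHDFS reference matches it):

```
# UpdateAttribute -- compute today's directory as a flowfile attribute
hadoop.dir : /data/${now():format('dd-MM-yy')}

# PutHDFS -- reference the attribute in the Directory property
Directory  : ${hadoop.dir}
```

PutHDFS will create the target directory in HDFS if it does not already exist, so no separate mkdir step should be needed.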
howto dynamically change the PutHDFS target directory
Hi All,

I have a requirement to write a data stream into HDFS, where the flowfiles received per day are grouped into a directory, e.g. so I would end up with a folder structure as follows:

data/18-04-16
data/19-04-16
data/20-04-16
... etc.

Currently I can specify a target directory in the config for the PutHDFS processor, but I want this to change and point to a new directory as each day ends.

So using NiFi I'd like to 1) be able to create new directories in HDFS (although I could potentially write a bash script to do the directory creation) and 2) change the target directory as the day changes.

Any help much appreciated,

Mike
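For reference, the bash fallback Mike mentions could look like the following minimal sketch. The /data prefix and the dd-MM-yy date layout are taken from the example paths above; the hdfs CLI being on the PATH is an assumption.

```shell
#!/usr/bin/env bash
# Compute today's target directory using the dd-MM-yy layout from the example.
DIR="/data/$(date +%d-%m-%y)"
echo "target directory: ${DIR}"

# Create it in HDFS; -p makes this a no-op if the directory already exists.
# Uncomment on a machine with HDFS client access:
# hdfs dfs -mkdir -p "${DIR}"
```

Run daily from cron, this would pre-create each day's directory, though the expression-language approach inside NiFi avoids the external script entirely.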
[ANNOUNCE] Apache NiFi 0.6.1 release
Hello,

The Apache NiFi team would like to announce the release of Apache NiFi 0.6.1.

Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data. Apache NiFi was made for dataflow. It supports highly configurable directed graphs of data routing, transformation, and system mediation logic.

More details on Apache NiFi can be found here:
http://nifi.apache.org/

The release artifacts can be downloaded from here:
http://nifi.apache.org/download.html

Maven artifacts have been made available here:
https://repository.apache.org/content/repositories/releases/org/apache/nifi/

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.6.1

Thank you
The Apache NiFi team