Hi there, I'm considering implementing a lambda architecture using NiFi where, as usual, one data path goes into HDFS while another goes into Spark/Flink/whatever. However, before I get to the streaming section of the pipeline, I want to plan a decent file-saving strategy.
I've noticed that the filename used by PutHDFS isn't exposed as a property in the UI. However, as very well documented here (https://kisstechdocs.wordpress.com/2015/01/15/creating-a-limited-failure-loop-in-nifi/), I can change the filename attribute using other processors (e.g. UpdateAttribute) before the FlowFile reaches PutHDFS. This suggests to me that I could, for example, build a pipeline that looks pretty much like this:

1. ListenHTTP => captures the attribute LogSrc from the HTTP request header LogSrc
2. MergeContent => Correlation Attribute Name = LogSrc, Attribute Strategy = Keep Only Common Attributes
3. UpdateAttribute => updates the filename attribute to data-${now():format('yyyyMMdd')}.log (e.g. data-20151019.log)
4. PutHDFS => Directory = /${LogSrc}/${now():format('yyyy/MM/dd')} (e.g. /mydevice/2015/10/19/)

This, I believe, would result in a file named /mydevice/2015/10/19/data-20151019.log.

Now the question: I know I could skip step 3 if I accepted NiFi-determined filenames, but I wonder whether this is the best way to achieve the file naming defined above.

On a side note: could anyone point me to the section of the code that defines the current naming convention? :-)

I thank you in advance
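For illustration only, here is a rough Python model of what steps 2 to 4 are meant to produce. It is not NiFi code and not how MergeContent is implemented; the FlowFile-as-dict shape and the function names `merge_by_correlation` and `hdfs_target` are hypothetical. It bins FlowFiles by their LogSrc value, keeps only attributes whose values are identical across every member of a bin (a simplified reading of "Keep Only Common Attributes"), and builds the path from the same date patterns as the Expression Language above (`yyyyMMdd` maps to `%Y%m%d`). One thing the sketch makes visible: it uses a single `now` for both the directory and the filename, whereas steps 3 and 4 each evaluate `now()` on their own, so a FlowFile processed right around midnight could, in principle, land in one day's directory with the other day's filename.

```python
from datetime import datetime


def merge_by_correlation(flowfiles, correlation_attr):
    """Hypothetical model of MergeContent binning by a correlation attribute.

    Each flowfile is a dict: {"attributes": {...}, "content": bytes}.
    Members of a bin are concatenated in arrival order, and only attributes
    whose values are identical across all members are kept.
    """
    bins = {}
    for ff in flowfiles:
        key = ff["attributes"].get(correlation_attr)
        bins.setdefault(key, []).append(ff)

    merged = []
    for members in bins.values():
        common = dict(members[0]["attributes"])
        for ff in members[1:]:
            # Drop any attribute that is missing or differs in this member.
            common = {k: v for k, v in common.items()
                      if ff["attributes"].get(k) == v}
        content = b"".join(ff["content"] for ff in members)
        merged.append({"attributes": common, "content": content})
    return merged


def hdfs_target(attributes, now):
    """Hypothetical model of steps 3 and 4: build the full HDFS path.

    Directory mirrors /${LogSrc}/${now():format('yyyy/MM/dd')} and the
    filename mirrors data-${now():format('yyyyMMdd')}.log, but both use
    the same `now` value here.
    """
    directory = f"/{attributes['LogSrc']}/{now:%Y/%m/%d}"
    filename = f"data-{now:%Y%m%d}.log"
    return f"{directory}/{filename}"
```

Feeding it two FlowFiles with LogSrc = mydevice and `datetime(2015, 10, 19)` yields one merged FlowFile whose target path is /mydevice/2015/10/19/data-20151019.log, matching the expected result above.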
