For the specific issue of ListHDFS: if you are organizing all of your environments under /data, can you just set ListHDFS to list /data with Recurse Subdirectories set to true?
Then you could probably use an UpdateAttribute somewhere after ListHDFS, or maybe after FetchHDFS, to parse out the environment from the path that was fetched, and then you could make routing decisions based on the environment, if necessary.

Alternatively, if you have a fixed set of environments, you could probably just have a ListHDFS per environment.

On Thu, Aug 17, 2017 at 12:14 PM, Jeremy Farbota <[email protected]> wrote:
> Steve,
>
> We have dev and prod hdfs/NiFi/kafka/etc.
>
> For directory stuff in HDFS from NiFi, I feed data in raw based on the
> kafka topic and data type. I keep my directories in HDFS bucketed on
> static/feed then topic/feature then timeframe/granularity then if necessary
> type.
>
> e.g. static/mailer/201701/parquet
>
> Your directory structure in HDFS will be heavily dependent on your use
> case and the needs of the analytics/batch processes/whatever you plan to do
> with it. I recommend researching it and getting to know the use case more.
> You can always move stuff around if you need to change.
>
> Jeremy Farbota
> Software Engineer, Data
> Payoff, Inc.
>
> [email protected]
> (217) 898-8110
>
> On Thu, Aug 17, 2017 at 8:48 AM, Steve Champagne <[email protected]>
> wrote:
>
>> Hello,
>>
>> How would I handle environment separation in HDFS? My initial thought was
>> to use a directory structure like /data/<env>/<table-user>/<table-name>,
>> but I'm running into problems with reading the files back out of HDFS (for
>> example merging small files into larger files). For the ListHDFS processor,
>> it doesn't allow input connections, so I can't specify the environment with
>> an attribute. Would something like this require me to use two instances of
>> NiFi and some sort of environment system variable lookup in EL? Is it even
>> common practice to encode the environment information in the directory
>> structure, or do people generally have an HDFS instance per environment
>> instead? Sorry if this question sort of extends outside of the scope of
>> NiFi.
>>
>> Thanks!
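
To make the "parse out the environment from the path" idea concrete, here is a rough sketch of the logic in Python. In a real flow this would more likely be an Expression Language expression inside UpdateAttribute, so treat this only as an illustration of the parsing and routing decision. It assumes the /data/<env>/<table-user>/<table-name> layout from the original question and that the directory of each listed file is available as a string. The names BASE_DIR, KNOWN_ENVS, environment_from_path, and route are made up for this example, and the dev/prod set is an assumption.

```python
from typing import Optional

# Assumed layout, taken from the original question:
#   /data/<env>/<table-user>/<table-name>
BASE_DIR = "/data"
KNOWN_ENVS = {"dev", "prod"}  # hypothetical fixed set of environments


def environment_from_path(path: str, base_dir: str = BASE_DIR) -> Optional[str]:
    """Return the path segment directly under base_dir, or None if there is none."""
    base_parts = [p for p in base_dir.split("/") if p]
    parts = [p for p in path.split("/") if p]
    # The path must start with base_dir and have at least one more segment.
    if parts[:len(base_parts)] != base_parts or len(parts) <= len(base_parts):
        return None
    return parts[len(base_parts)]


def route(path: str) -> str:
    """Pick a route name from the environment, falling back to 'unmatched'."""
    env = environment_from_path(path)
    return env if env in KNOWN_ENVS else "unmatched"
```

For example, route("/data/dev/steve/orders") picks the dev route, while a path outside /data, or under an environment that is not in the known set, falls through to unmatched.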
