Steve,

We have dev and prod HDFS/NiFi/Kafka/etc.
For directories in HDFS fed from NiFi, I land the data raw, keyed on the Kafka topic and data type. I bucket my HDFS directories by static/feed, then topic/feature, then timeframe/granularity, then (if necessary) type, e.g. static/mailer/201701/parquet.

Your directory structure in HDFS will depend heavily on your use case and on what the analytics, batch processes, or whatever else you plan to run against the data will need. I recommend researching it and getting to know the use case better first. You can always move things around later if you need to change.

*Jeremy Farbota*
Software Engineer, Data
Payoff, Inc.
[email protected]
(217) 898-8110

On Thu, Aug 17, 2017 at 8:48 AM, Steve Champagne <[email protected]> wrote:

> Hello,
>
> How would I handle environment separation in HDFS? My initial thought was
> to use a directory structure like /data/<env>/<table-user>/<table-name>,
> but I'm running into problems reading the files back out of HDFS (for
> example, merging small files into larger ones). The ListHDFS processor
> doesn't allow input connections, so I can't specify the environment with
> an attribute. Would something like this require me to use two instances of
> NiFi and some sort of environment system variable lookup in EL? Is it even
> common practice to encode the environment information in the directory
> structure, or do people generally have an HDFS instance per environment
> instead? Sorry if this question extends somewhat outside the scope of
> NiFi.
>
> Thanks!
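For what it's worth, the static/feed > topic > timeframe > type convention described above can be sketched as a PutHDFS Directory property using NiFi Expression Language. This is only an illustration, assuming flow files carry a `kafka.topic` attribute (ConsumeKafka sets this) and a hypothetical `data.type` attribute set upstream (e.g. via UpdateAttribute):

```
# PutHDFS -> Directory property (NiFi Expression Language), illustrative only
/data/static/${kafka.topic}/${now():format('yyyyMM')}/${data.type}
```

With a topic of "mailer" and data.type=parquet, that would resolve to a path of the form /data/static/mailer/201701/parquet, matching the example layout above.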
