Steve,

We have separate dev and prod HDFS/NiFi/Kafka/etc.

For directory layout in HDFS from NiFi, I feed data in raw based on the
Kafka topic and data type. I keep my HDFS directories bucketed by
static/feed, then topic/feature, then timeframe/granularity, then (if
necessary) type.

e.g. static/mailer/201701/parquet
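As a rough sketch, that bucketing scheme might produce a tree like the one below. The topic names and dates besides mailer/201701 are made up for illustration, and plain mkdir stands in for the real `hdfs dfs -mkdir -p`:

```shell
# Hypothetical paths following static/feed -> topic -> granularity -> type.
# On a real cluster these would be created with `hdfs dfs -mkdir -p`.
mkdir -p static/mailer/201701/parquet
mkdir -p static/mailer/201702/parquet
mkdir -p static/clicks/201701/avro
```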


Your directory structure in HDFS will depend heavily on your use case
and the needs of the analytics/batch processes/whatever you plan to run
against it. I recommend researching it and getting to know the use case
better. You can always move things around if you need to change.

*Jeremy Farbota*
Software Engineer, Data
Payoff, Inc.

[email protected]
(217) 898-8110

On Thu, Aug 17, 2017 at 8:48 AM, Steve Champagne <[email protected]>
wrote:

> Hello,
>
> How would I handle environment separation in HDFS? My initial thought was
> to use a directory structure like /data/<env>/<table-user>/<table-name>,
> but I'm running into problems with reading the files back out of HDFS (for
> example merging small files into larger files). For the ListHDFS processor,
> it doesn't allow input connections, so I can't specify the environment with
> an attribute. Would something like this require me to use two instances of
> NiFi and some sort of environment system variable lookup in EL? Is it even
> common practice to encode the environment information in the directory
> structure, or do people generally have an HDFS instance per environment
> instead? Sorry if this question sort of extends outside of the scope of
> NiFi.
>
> Thanks!
>
