Steve,

We have separate dev and prod HDFS/NiFi/Kafka/etc.

For directory layout in HDFS from NiFi, I feed data in raw based on the
Kafka topic and data type. I keep my HDFS directories bucketed by
static/feed, then topic/feature, then timeframe/granularity, then (if
necessary) type.

e.g. static/mailer/201701/parquet
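As a rough sketch, that bucketing scheme might produce a tree like the one below. The topic names and dates besides mailer/201701 are made up for illustration, and plain mkdir stands in for the real `hdfs dfs -mkdir -p`:

```shell
# Hypothetical paths following static/feed -> topic -> granularity -> type.
# On a real cluster these would be created with `hdfs dfs -mkdir -p`.
mkdir -p static/mailer/201701/parquet
mkdir -p static/mailer/201702/parquet
mkdir -p static/clicks/201701/avro
```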


Your directory structure in HDFS will depend heavily on your use case
and the needs of the analytics/batch processes/whatever you plan to run
against it. I recommend researching it and getting to know the use case
better. You can always move things around if you need to change.

*Jeremy Farbota*
Software Engineer, Data
Payoff, Inc.

[email protected]
(217) 898-8110

On Thu, Aug 17, 2017 at 8:48 AM, Steve Champagne <[email protected]>
wrote:

> Hello,
>
> How would I handle environment separation in HDFS? My initial thought was
> to use a directory structure like /data/<env>/<table-user>/<table-name>,
> but I'm running into problems with reading the files back out of HDFS (for
> example merging small files into larger files). For the ListHDFS processor,
> it doesn't allow input connections, so I can't specify the environment with
> an attribute. Would something like this require me to use two instances of
> NiFi and some sort of environment system variable lookup in EL? Is it even
> common practice to encode the environment information in the directory
> structure, or do people generally have an HDFS instance per environment
> instead? Sorry if this question sort of extends outside of the scope of
> NiFi.
>
> Thanks!
>
