I've been doing a lot of this recently, into both Hive and Spark.

One thing that will make life a lot easier is to use the JSON record file 
format: essentially one JSON document per line of a text file. That means you 
can use NiFi's MergeContent processor to handle batching into HDFS. Avro also 
makes a lot of sense and can be generated directly out of NiFi.
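
For example, once MergeContent has written line-delimited JSON files into an 
HDFS directory, you can point an external Hive table at them with the 
hcatalog JSON SerDe. This is a minimal sketch; the table name, columns, and 
path are placeholder assumptions, not anything from your actual flow:

  -- one JSON document per line, so plain TEXTFILE works
  -- (you may need to ADD JAR hive-hcatalog-core first, depending on your setup)
  CREATE EXTERNAL TABLE raw_events (
    event_id   STRING,
    event_time STRING,
    payload    STRING
  )
  ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
  STORED AS TEXTFILE
  LOCATION '/data/raw_events';

If you go the Avro route instead, an external table with STORED AS AVRO over 
the same kind of directory works much the same way, with the schema carried 
in the files themselves.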

Simon

-
Simon Elliston Ball
Product Solutions Architect
+44 7930 424111
Hortonworks - Powering the future of data


On 2 Mar 2016, at 11:33, Mike Harding <[email protected]> wrote:

Hi All,

I currently have a small Hadoop cluster running with HDFS and Hive. My ultimate 
goal is to leverage NiFi's ingestion and flow capabilities to store real-time, 
JSON-formatted event data from external sources.

What I am unclear about is the best strategy/design for storing FlowFile data 
(i.e. JSON events, in my case) within HDFS so that it can then be accessed and 
analysed in Hive tables.

Is most of the storage design handled in the NiFi flow, or do I need to set 
something up external to NiFi to ensure I can query each JSON-formatted event 
as a record in, for example, a Hive log table?

Any examples or suggestions much appreciated,

Thanks,
M
