I am exploring the Kite processor to store data into Hadoop. I hope this 
lets me change the storage engine from HDFS to Hive to HBase later. Since my 
Hadoop distribution is MapR, I haven't had full success yet.
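If I recall the Kite dataset URI scheme correctly, the storage engine is picked 
purely by the target dataset URI you give the StoreInKiteDataset processor, 
roughly along these lines (the paths, namespace and ZooKeeper hosts below are 
just made-up examples):

    dataset:hdfs:/data/events
    dataset:hive:default/events
    dataset:hbase:zk1,zk2,zk3/events

So in principle the rest of the flow stays the same and only the URI changes.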
Sumo
 

Sent from my iPhone

> On Mar 2, 2016, at 2:54 AM, Mike Harding <[email protected]> wrote:
> 
> Hi Conrad,
> 
> Thanks for the heads up, I will investigate Apache Drill. I also forgot to 
> mention that I have downstream requirements around which tools the data 
> modellers are comfortable using - they want to use Hive and Spark as the 
> primary data access engines, so the data needs to be persisted in HDFS in a 
> way that can be easily accessed by those services.
> 
> But you're right - there are multiple ways of doing this, and I'm hoping NiFi 
> will help scope/simplify the pipeline design.
> 
> Cheers,
> M
> 
>> On 2 March 2016 at 10:38, Conrad Crampton <[email protected]> 
>> wrote:
>> Hi,
>> I am doing something similar, but having wrestled with Hive data population 
>> (not from NiFi) and its performance, I am currently looking at Apache Drill 
>> as my SQL abstraction layer over my Hadoop cluster (similar size to yours). 
>> To this end, I have chosen Avro as my ‘persistence’ format and am using a 
>> number of processors to get from raw data through mapping attributes to JSON 
>> to Avro (via schemas), ultimately storing in HDFS. Querying this with Drill 
>> is then a breeze, as the schema is already embedded in the data, which Drill 
>> understands. The schema can also be extended without impacting 
>> existing data.
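>> For illustration only, here is roughly what the schema-driven JSON-to-Avro 
>> step looks like outside NiFi, sketched in Python with fastavro (the schema 
>> and field names are made up; in the flow itself the JSON/Avro processors do 
>> this for you):
>> 
>>     import json
>>     from fastavro import parse_schema, writer, reader
>> 
>>     # Example event schema; new optional fields can be added later without
>>     # rewriting existing files (Avro schema evolution).
>>     schema = parse_schema({
>>         "type": "record",
>>         "name": "Event",
>>         "namespace": "example",
>>         "fields": [
>>             {"name": "id", "type": "string"},
>>             {"name": "ts", "type": "long"},
>>             {"name": "payload", "type": ["null", "string"], "default": None},
>>         ],
>>     })
>> 
>>     # Incoming JSON events (one per line), as they might arrive from NiFi.
>>     events = [json.loads(line) for line in [
>>         '{"id": "e1", "ts": 1456912380, "payload": "login"}',
>>         '{"id": "e2", "ts": 1456912395, "payload": null}',
>>     ]]
>> 
>>     # Write an Avro container file; the schema travels with the data,
>>     # which is why Drill (or Hive/Spark) can read it without extra DDL.
>>     with open("events.avro", "wb") as out:
>>         writer(out, schema, events)
>> 
>>     # Read the file back to confirm the records round-trip.
>>     with open("events.avro", "rb") as f:
>>         for record in reader(f):
>>             print(record)
>> 
>> Because the schema is carried inside the .avro files, adding a new optional 
>> field later leaves the already-written files readable.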
>> HTH – I’m sure there are a ton of other ways to skin this particular cat 
>> though,
>> Conrad
>> 
>> From: Mike Harding <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, 2 March 2016 at 10:33
>> To: "[email protected]" <[email protected]>
>> Subject: Nifi JSON event storage in HDFS
>> 
>> Hi All,
>> 
>> I currently have a small Hadoop cluster running with HDFS and Hive. My 
>> ultimate goal is to leverage NiFi's ingestion and flow capabilities to store 
>> real-time external JSON-formatted event data.
>> 
>> What I am unclear about is the best strategy/design for storing FlowFile 
>> data (i.e. JSON events in my case) within HDFS so that it can then be 
>> accessed and analysed in Hive tables.
>> 
>> Is much of the storage design handled in the NiFi flow, or do I need to set 
>> something up outside of NiFi to ensure I can query each JSON-formatted event 
>> as a record in, for example, a Hive log table?
>> 
>> Any examples or suggestions much appreciated,
>> 
>> Thanks,
>> M
>> 
> 
