I used the ConvertJSONToAvro and PutHDFS processors to land files in a Hive warehouse. Once you get the Avro schema right, it's easy. The avro-tools jar can help with the schema.
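
For illustration, here is a minimal sketch of what "getting the schema right" might look like for a JSON event. The event shape, the field names and the `conforms` helper are all hypothetical (nothing in this thread specifies them), and the check only covers the few Avro types used here; it is not a full Avro validator.

```python
import json

# Hypothetical Avro record schema for a JSON event.
# Field names and namespace are illustrative assumptions only.
EVENT_SCHEMA = {
    "type": "record",
    "name": "Event",
    "namespace": "example.events",
    "fields": [
        {"name": "event_id", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "source", "type": "string"},
        # Optional field: a union with null plus a default means records
        # that lack the field can still be read with this schema.
        {"name": "payload", "type": ["null", "string"], "default": None},
    ],
}

# Map the Avro primitive names used above to Python types.
_PRIMITIVES = {"string": str, "long": int, "null": type(None)}


def conforms(event: dict, schema: dict) -> bool:
    """Roughly check that a decoded JSON event matches the record schema.

    Handles only the primitives and simple unions used in this sketch.
    """
    for field in schema["fields"]:
        declared = field["type"]
        types = declared if isinstance(declared, list) else [declared]
        if field["name"] not in event:
            if "default" in field:
                continue  # a missing field is fine when a default exists
            return False
        value = event[field["name"]]
        if not any(isinstance(value, _PRIMITIVES[t]) for t in types):
            return False
    return True


def schema_json(schema: dict) -> str:
    """Serialise the schema to the JSON text stored in an .avsc file."""
    return json.dumps(schema, indent=2)


if __name__ == "__main__":
    sample = {"event_id": "a1", "timestamp": 1456916040000, "source": "sensor"}
    print(conforms(sample, EVENT_SCHEMA))
    print(schema_json(EVENT_SCHEMA))
```

Checking a handful of real events against a draft schema like this before wiring it into the flow tends to surface type mismatches (for example, a timestamp that arrives as a string) early.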

Chris

On Wed, Mar 2, 2016, 4:59 AM Conrad Crampton <[email protected]> wrote:

> Hi,
> I have similar specifications about SQL access – those specifying this
> keep saying Hive, but I don’t believe that is the requirement (typical
> developer knowing best eh?) - I think it is just SQL access that is
> required. Drill is more flexible (in my opinion – I am not affiliated to
> Drill in any way) and has drivers for tooling access too (in a similar way
> Hive has). There is Spark support for Avro too.
> I’ll be interested to follow your progress on this.
> Conrad
>
> From: Mike Harding <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, 2 March 2016 at 10:54
> To: "[email protected]" <[email protected]>
> Subject: Re: Nifi JSON event storage in HDFS
>
> Hi Conrad,
>
> Thanks for the heads up, I will investigate Apache Drill. I also forgot to
> mention that I have downstream requirements about which tools the data
> modellers are comfortable using - they want to use Hive and Spark as the
> primary data access engines, so the data needs to be persisted in HDFS in
> a way that it can be easily accessed by these services.
>
> But you're right - there are multiple ways of doing this and I'm hoping NiFi
> will help scope/simplify the pipeline design.
>
> Cheers,
> M
>
> On 2 March 2016 at 10:38, Conrad Crampton <[email protected]>
> wrote:
>
>> Hi,
>> I am doing something similar, but having wrestled with Hive data
>> population (not from NiFi) and its performance I am currently looking at
>> Apache Drill as my SQL abstraction layer over my Hadoop cluster (similar
>> size to yours). To this end, I have chosen Avro as my ‘persistence’ format
>> and am using a number of processors to get from raw data through mapping
>> attributes to JSON to Avro (via schemas) and ultimately storing in HDFS.
>> Querying this with Drill is a breeze then, as the schema is already
>> specified within the data, which Drill understands. The schema can also be
>> extended without impacting existing data.
>> HTH – I’m sure there are a ton of other ways to skin this particular cat
>> though,
>> Conrad
>>
>> From: Mike Harding <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, 2 March 2016 at 10:33
>> To: "[email protected]" <[email protected]>
>> Subject: Nifi JSON event storage in HDFS
>>
>> Hi All,
>>
>> I currently have a small Hadoop cluster running with HDFS and Hive. My
>> ultimate goal is to leverage NiFi's ingestion and flow capabilities to
>> store real-time external JSON formatted event data.
>>
>> What I am unclear about is what the best strategy/design is for storing
>> FlowFile data (i.e. JSON events in my case) within HDFS that can then be
>> accessed and analysed in Hive tables.
>>
>> Is much of the design in terms of storage handled in the NiFi flow, or do
>> I need to set something up external to NiFi to ensure I can query each JSON
>> formatted event as a record in a Hive log table, for example?
>>
>> Any examples or suggestions much appreciated,
>>
>> Thanks,
>> M
