I used the ConvertJSONToAvro and PutHDFS processors to land files into a
Hive warehouse. Once you get the Avro schema right, it's easy. Look at the
avro-tools jar file to help with the schema.
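
To make the schema step a bit more concrete, here is a minimal Python
sketch, not taken from the flow described above, that guesses an Avro record
schema from one sample JSON event. The sample event and the function names
`avro_type` and `record_schema` are made up for illustration, and the
inference is deliberately naive: every field becomes a nullable union,
integers map to long, and null values or empty lists are assumed to hold
strings. Treat the output as a starting point to edit by hand.

```python
import json


def avro_type(value, name):
    """Guess an Avro type for one decoded JSON value (naive, illustrative)."""
    if value is None:
        # Assumption: a null sample tells us nothing, so guess "string".
        return "string"
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: lists are homogeneous; empty lists default to strings.
        items = avro_type(value[0], name + "_item") if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        return record_schema(value, name)
    raise TypeError("unsupported JSON value for field %r" % name)


def record_schema(event, name):
    """Build an Avro record schema (as a dict) from one sample event.

    Assumes every JSON key is already a valid Avro name and that nested
    record names do not clash.
    """
    fields = []
    for key, value in event.items():
        # A ["null", T] union with default null keeps the field optional.
        fields.append({
            "name": key,
            "type": ["null", avro_type(value, key)],
            "default": None,
        })
    return {"type": "record", "name": name, "fields": fields}


# Hypothetical sample event; real events will have different fields.
sample = json.loads(
    '{"id": 42, "source": "sensor-a", "temp": 21.5, "ok": true}'
)
schema = record_schema(sample, "Event")
print(json.dumps(schema, indent=2))
```

If you already have an Avro data file, the `getschema` command in avro-tools
prints the schema embedded in it, which is handy for comparing against a
hand-written one.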

Chris

On Wed, Mar 2, 2016, 4:59 AM Conrad Crampton <[email protected]>
wrote:

> Hi,
> I have similar specifications about SQL access – those specifying this
> keep saying Hive, but I don’t believe that is the requirement (typical
> developer knowing best eh?) - I think it is just SQL access that is
> required. Drill is more flexible (in my opinion – I am not affiliated with
> Drill in any way) and has drivers for tooling access too (much as Hive
> does). There is Spark support for Avro too.
> I’ll be interested to follow your progress on this.
> Conrad
>
> From: Mike Harding <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, 2 March 2016 at 10:54
> To: "[email protected]" <[email protected]>
> Subject: Re: Nifi JSON event storage in HDFS
>
> Hi Conrad,
>
> Thanks for the heads up, I will investigate Apache Drill. I also forgot to
> mention that I have downstream requirements about which tools the data
> modellers are comfortable using - they want to use Hive and Spark as the
> data access engines primarily so the data needs to be persisted in HDFS in
> a way that it can be easily accessed by these services.
>
> But you're right - there are multiple ways of doing this and I'm hoping NiFi
> will help scope/simplify the pipeline design.
>
> Cheers,
> M
>
> On 2 March 2016 at 10:38, Conrad Crampton <[email protected]>
> wrote:
>
>> Hi,
>> I am doing something similar, but having wrestled with Hive data
>> population (not from NiFi) and its performance I am currently looking at
>> Apache Drill as my SQL abstraction layer over my Hadoop cluster (similar
>> size to yours). To this end, I have chosen Avro as my ‘persistence’ format
>> and am using a number of processors to get from raw data through mapping
>> attributes to JSON to Avro (via schemas) and ultimately storing in HDFS.
>> Querying this with Drill is a breeze then as the schema is already
>> specified within the data which Drill understands. The schema can also be
>> extended without impacting existing data too.
>> HTH – I’m sure there are a ton of other ways to skin this particular cat
>> though,
>> Conrad
>>
>> From: Mike Harding <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, 2 March 2016 at 10:33
>> To: "[email protected]" <[email protected]>
>> Subject: Nifi JSON event storage in HDFS
>>
>> Hi All,
>>
>> I currently have a small Hadoop cluster running with HDFS and Hive. My
>> ultimate goal is to leverage NiFi's ingestion and flow capabilities to
>> store real-time external JSON formatted event data.
>>
>> What I am unclear about is what the best strategy/design is for storing
>> FlowFile data (i.e. JSON events in my case) within HDFS that can then be
>> accessed and analysed in Hive tables.
>>
>> Is much of the storage design handled in the NiFi flow, or do I need to
>> set something up outside of NiFi to ensure I can query each JSON
>> formatted event as a record in a Hive log table, for example?
>>
>> Any examples or suggestions much appreciated,
>>
>> Thanks,
>> M
>>
>
>
