Hi Conrad,

Thanks for the heads-up; I will investigate Apache Drill. I also forgot to
mention that I have downstream requirements on which tools the data
modellers are comfortable using: they want to use Hive and Spark as their
primary data access engines, so the data needs to be persisted in HDFS in
a way that these services can easily access.

But you're right: there are multiple ways of doing this, and I'm hoping NiFi
will help scope and simplify the pipeline design.
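For Hive and Spark to treat each event as a row, one common layout (an assumption here, not something settled in this thread) is newline-delimited JSON, one complete object per line, written under Hive-style partition directories such as `dt=2016-03-02`. The plain-Python sketch below shows only that grouping and serialisation step. The base directory `/data/events`, the epoch-seconds `ts` field and the function names are hypothetical. In a NiFi flow, the equivalent work would typically be done by merging FlowFiles with a newline demarcator (MergeContent) before writing them with PutHDFS.

```python
import json
from datetime import datetime, timezone

# Hypothetical HDFS base directory for the event table.
BASE_DIR = "/data/events"


def partition_path(base_dir, event_time):
    """Return a Hive-style partition directory (dt=YYYY-MM-DD) for an event time."""
    return f"{base_dir}/dt={event_time.strftime('%Y-%m-%d')}"


def to_ndjson(events):
    """Serialise events as newline-delimited JSON: one complete object per line.

    json.dumps escapes any embedded newlines, so each record stays on one line;
    sort_keys gives deterministic output."""
    return "".join(
        json.dumps(e, separators=(",", ":"), sort_keys=True) + "\n" for e in events
    )


def batch_by_partition(events, base_dir=BASE_DIR, time_field="ts"):
    """Group events by partition directory and serialise each group as NDJSON.

    Assumes each event carries an epoch-seconds timestamp in `time_field`;
    partitions are computed in UTC."""
    batches = {}
    for event in events:
        ts = datetime.fromtimestamp(event[time_field], tz=timezone.utc)
        batches.setdefault(partition_path(base_dir, ts), []).append(event)
    return {path: to_ndjson(batch) for path, batch in batches.items()}


if __name__ == "__main__":
    sample = [{"id": 1, "ts": 1456914780}, {"id": 2, "ts": 1456790400}]
    for path, body in batch_by_partition(sample).items():
        print(path)
        print(body, end="")
```

Partitioning by date keeps each directory small and lets both Hive and Spark prune whole directories when a query filters on `dt`.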

Cheers,
M

On 2 March 2016 at 10:38, Conrad Crampton <[email protected]>
wrote:

> Hi,
> I am doing something similar, but having wrestled with Hive data
> population (not from NiFi) and its performance, I am currently looking at
> Apache Drill as my SQL abstraction layer over my Hadoop cluster (similar
> in size to yours). To this end, I have chosen Avro as my ‘persistence’
> format and am using a number of processors to get from raw data, through
> mapping attributes to JSON, to Avro (via schemas), ultimately storing it in
> HDFS. Querying this with Drill is then a breeze, as the schema is embedded
> in the data and Drill understands it. The schema can also be extended
> without impacting existing data.
> HTH – I’m sure there are a ton of other ways to skin this particular cat
> though,
> Conrad
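The point about extending the schema without impacting existing data can be illustrated with Avro's record-resolution rules. When data written with an older schema is read with a newer one, fields the reader does not declare are ignored, and fields missing from the data take the reader schema's default (an error if there is none). The pure-Python sketch below mimics only that rule and ignores type promotion, aliases and nested records; the `Event` schema and its fields are hypothetical. In practice, an Avro library applies these rules itself when it is given a reader schema.

```python
import json

# Version 1 of a hypothetical event schema, as originally written to HDFS.
SCHEMA_V1 = json.loads("""
{"type": "record", "name": "Event",
 "fields": [{"name": "id", "type": "long"},
            {"name": "source", "type": "string"}]}
""")

# Version 2 adds an optional field. The default (null, matching the first
# branch of the union) is what keeps V1 data readable under V2.
SCHEMA_V2 = {
    **SCHEMA_V1,
    "fields": SCHEMA_V1["fields"]
    + [{"name": "region", "type": ["null", "string"], "default": None}],
}


def resolve(record, reader_schema):
    """Project a record onto reader_schema using simplified Avro resolution:
    - fields the reader does not declare are dropped;
    - reader fields absent from the record take the reader's default;
    - a reader field with neither a value nor a default raises ValueError."""
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} missing from data and has no default")
    return out


if __name__ == "__main__":
    old_record = {"id": 7, "source": "sensor-a"}  # written before V2 existed
    print(resolve(old_record, SCHEMA_V2))
```

Adding a field without a default would break this: older files would have no value to supply, which is why new fields are usually declared as nullable unions with a `null` default.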
>
> From: Mike Harding <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, 2 March 2016 at 10:33
> To: "[email protected]" <[email protected]>
> Subject: Nifi JSON event storage in HDFS
>
> Hi All,
>
> I currently have a small Hadoop cluster running HDFS and Hive. My
> ultimate goal is to leverage NiFi's ingestion and flow capabilities to
> store real-time, JSON-formatted external event data.
>
> What I am unclear about is the best strategy/design for storing
> FlowFile data (i.e. JSON events in my case) in HDFS so that it can then be
> accessed and analysed through Hive tables.
>
> Is most of the storage design handled in the NiFi flow, or do I need to
> set something up outside NiFi to ensure I can query each JSON-formatted
> event as a record in, for example, a Hive log table?
>
> Any examples or suggestions would be much appreciated,
>
> Thanks,
> M
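One way to make each JSON event queryable as a Hive row, assuming events are stored one JSON object per line under `dt=` partition directories, is an external table using Hive's HCatalog JsonSerDe (`org.apache.hive.hcatalog.data.JsonSerDe`), with partitions registered as new directories appear. The helper below only renders that DDL as strings; the table name, location and column list are hypothetical. On some Hive versions the `hive-hcatalog-core` jar must be added to the session before the SerDe can be used, and `MSCK REPAIR TABLE` is an alternative to adding partitions one at a time.

```python
def hive_json_table_ddl(table, location, columns, partition_col="dt"):
    """Render CREATE EXTERNAL TABLE DDL over newline-delimited JSON files.

    `columns` is a list of (name, hive_type) pairs whose names match the JSON keys."""
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({partition_col} STRING)\n"
        "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


def add_partition_ddl(table, location, dt, partition_col="dt"):
    """Render the statement that registers one dt=YYYY-MM-DD directory as a partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({partition_col}='{dt}') "
        f"LOCATION '{location}/{partition_col}={dt}';"
    )


if __name__ == "__main__":
    print(hive_json_table_ddl("events_log", "/data/events", [("id", "BIGINT"), ("ts", "BIGINT")]))
    print(add_partition_ddl("events_log", "/data/events", "2016-03-02"))
```

Because the table is external, dropping it leaves the files in HDFS untouched, so the same directories remain readable by Spark independently of Hive.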