No worries Ron! And enjoy the ride :) Regards, Omar
On Thu, Feb 13, 2020 at 6:39 AM Ron Cecchini <roncecch...@comcast.net> wrote:

> Thanks, Omar.
>
> As it turns out, Parquet is not the way to go, since it looks like it is
> geared more toward data warehousing, whereas I need to persist streaming
> data - and from what I can gather, I would need the overhead of Spark or
> Hive to accomplish that with Parquet (appending to a growing Parquet file).
>
> *However*, it looks like Apache Kudu is exactly what we need. And not only
> does Camel already provide a Kudu component, but, as coincidence would have
> it, it looks like you co-authored it. Awesome!
>
> Moreover, Kudu takes just a Map as input, not an Avro-formatted message or
> the like, as Parquet does. So migrating this Kafka->Mongo route to
> Kafka->Kudu is almost trivial.
>
> Anyway, time to bump my Camel version up to 3.0.1 and give Kudu a whirl...
>
> Thanks again.
>
>
> On February 12, 2020 at 4:33 AM Omar Al-Safi <o...@oalsafi.com> wrote:
> >
> > Hi Ron,
> >
> > From reading some of the Apache Drill introduction, I'd say the File
> > component would be more suitable for writing Parquet files.
> > As for Parquet and Camel, we don't have an example for it, but the way I
> > see it, you are heading in the right direction by creating a Processor
> > to convert the data to Parquet format.
> > However, we do have an open feature request
> > <https://issues.apache.org/jira/browse/CAMEL-13573> to add a Parquet
> > data format; we would love to see some contributions to add this to
> > Camel :) .
> >
> > Regards,
> > Omar
> >
> >
> > On Tue, Feb 11, 2020 at 11:37 PM Ron Cecchini <roncecch...@comcast.net>
> > wrote:
> >
> > > Hi, all. I'm just looking for quick guidance or confirmation that I'm
> > > going in the right direction here:
> > >
> > > - There's a small Kotlin service that uses Camel to read from Kafka
> > > and write to Mongo.
> > > - I need to replace Mongo with Apache Drill and write Parquet files
> > > to the file system.
> > > (I know nothing about Parquet, but I know a little bit about Drill.)
> > >
> > > - This service isn't used to do any queries; it's just for persisting
> > > data. So, given that, and the fact that Drill is just a query engine,
> > > I really can't use the "Drill" component for anything.
> > >
> > > - But there is that "HDFS" component that I think I can use?
> > > Or maybe the "File" component is better here?
> > >
> > > So my thinking is that I just need to:
> > >
> > > 1. write a Processor to transform the JSON data into Parquet
> > > (and keep in mind that I know nothing about Parquet...)
> > >
> > > 2. use the HDFS (or File) component to write it to a file
> > > (I think there's some Parquet setup to do (?) outside the scope of
> > > this service, but that's another matter...)
> > >
> > > Seems pretty straightforward. Does that sound reasonable?
> > >
> > > Are there any Camel examples I can look at? The Google machine seems
> > > to not find anything related to Camel and Parquet...
> > >
> > > Thank you so much!
> > >
> > > Ron
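P.S. The Kafka->Kudu migration described at the top of the thread hinges on turning each JSON message into a plain Map before it reaches the Kudu endpoint. A minimal, dependency-free sketch of that conversion step (the toy parser below handles only flat JSON objects with string and numeric values; in a real Processor you would use a proper JSON library such as Jackson, and all class and field names here are illustrative, not taken from the actual service):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonToKuduRow {
    // Toy converter: flat JSON object -> Map<String, Object>, the input
    // shape the Camel Kudu component expects as the message body. This
    // hand-rolled parser exists only to keep the sketch dependency-free.
    static Map<String, Object> jsonToKuduRow(String json) {
        Map<String, Object> row = new LinkedHashMap<>();
        String body = json.trim();
        body = body.substring(1, body.length() - 1).trim();  // strip { }
        if (body.isEmpty()) {
            return row;
        }
        for (String field : body.split(",")) {
            String[] kv = field.split(":", 2);
            String key = stripQuotes(kv[0].trim());
            String raw = kv[1].trim();
            // Quoted values stay strings; anything else is treated as a number.
            Object value = raw.startsWith("\"")
                    ? stripQuotes(raw)
                    : Double.parseDouble(raw);
            row.put(key, value);
        }
        return row;
    }

    private static String stripQuotes(String s) {
        return s.replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        // A Processor in the route would do exactly this to the exchange body.
        System.out.println(jsonToKuduRow("{\"sensor\":\"temp-01\",\"reading\":21.5}"));
        // prints {sensor=temp-01, reading=21.5}
    }
}
```

In the route itself, this conversion would sit in a Processor between the Kafka consumer endpoint and the Kudu producer endpoint, so the Kudu component receives the Map it expects.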