No worries Ron! And enjoy the ride :) Regards, Omar
On Thu, Feb 13, 2020 at 6:39 AM Ron Cecchini <roncecch...@comcast.net> wrote:

> Thanks, Omar.
>
> As it turns out, Parquet is not the way to go, since it looks like it is
> geared more toward data warehousing, whereas I need to persist streaming
> data - and from what I can gather, I would need the overhead of Spark or
> Hive to accomplish that with Parquet (appending to a growing Parquet file).
>
> *However*, it looks like Apache Kudu is exactly what we need. And not only
> does Camel already provide a Kudu component, but, as coincidence would have
> it, it looks like you co-authored it. Awesome!
>
> Moreover, Kudu takes just a Map as input, not an Avro-formatted message or
> the like, as Parquet does. So migrating this Kafka->Mongo route to
> Kafka->Kudu is almost trivial.
>
> Anyway, time to bump my Camel version up to 3.0.1 and give Kudu a whirl...
>
> Thanks again.
>
>
> On February 12, 2020 at 4:33 AM Omar Al-Safi <o...@oalsafi.com> wrote:
> >
> > Hi Ron,
> >
> > From reading some of the Apache Drill introduction, I'd say the File
> > component would be more suitable for writing Parquet files.
> > As for Parquet and Camel, we don't have an example for it, but the way I
> > see it, you are heading in the right direction by creating a Processor
> > to convert the data to Parquet format.
> > However, we do have an open feature request
> > <https://issues.apache.org/jira/browse/CAMEL-13573> to add a Parquet
> > data format; we would love to see some contributions to add this to
> > Camel :) .
> >
> > Regards,
> > Omar
> >
> >
> > On Tue, Feb 11, 2020 at 11:37 PM Ron Cecchini <roncecch...@comcast.net>
> > wrote:
> >
> > > Hi, all. I'm just looking for quick guidance or confirmation that I'm
> > > going in the right direction here:
> > >
> > > - There's a small Kotlin service that uses Camel to read from Kafka
> > > and write to Mongo.
> > > - I need to replace Mongo with Apache Drill and write Parquet files
> > > to the file system.
> > > (I know nothing about Parquet, but I know a little bit about Drill.)
> > >
> > > - This service isn't used to do any queries; it's just for persisting
> > > data. So, given that, and the fact that Drill is just a query engine,
> > > I really can't use the "Drill" component for anything.
> > >
> > > - But there is that "HDFS" component that I think I can use?
> > > Or maybe the "File" component is better here?
> > >
> > > So my thinking is that I just need to:
> > >
> > > 1. write a Processor to transform the JSON data into Parquet
> > > (and keep in mind that I know nothing about Parquet...)
> > >
> > > 2. use the HDFS (or File) component to write it to a file
> > > (I think there's some Parquet setup to do (?) outside the scope of
> > > this service, but that's another matter...)
> > >
> > > Seems pretty straightforward. Does that sound reasonable?
> > >
> > > Are there any Camel examples I can look at? The Google machine seems
> > > to not find anything related to Camel and Parquet...
> > >
> > > Thank you so much!
> > >
> > > Ron
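P.S. The Kafka->Kudu migration described at the top of the thread hinges on turning each JSON message into a plain Map before it reaches the Kudu endpoint. A minimal, dependency-free sketch of that conversion step (the toy parser below handles only flat JSON objects with string and numeric values; in a real Processor you would use a proper JSON library such as Jackson, and all class and field names here are illustrative, not taken from the actual service):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonToKuduRow {
    // Toy converter: flat JSON object -> Map<String, Object>, the input
    // shape the Camel Kudu component expects as the message body. This
    // hand-rolled parser exists only to keep the sketch dependency-free.
    static Map<String, Object> jsonToKuduRow(String json) {
        Map<String, Object> row = new LinkedHashMap<>();
        String body = json.trim();
        body = body.substring(1, body.length() - 1).trim();  // strip { }
        if (body.isEmpty()) {
            return row;
        }
        for (String field : body.split(",")) {
            String[] kv = field.split(":", 2);
            String key = stripQuotes(kv[0].trim());
            String raw = kv[1].trim();
            // Quoted values stay strings; anything else is treated as a number.
            Object value = raw.startsWith("\"")
                    ? stripQuotes(raw)
                    : Double.parseDouble(raw);
            row.put(key, value);
        }
        return row;
    }

    private static String stripQuotes(String s) {
        return s.replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        // A Processor in the route would do exactly this to the exchange body.
        System.out.println(jsonToKuduRow("{\"sensor\":\"temp-01\",\"reading\":21.5}"));
        // prints {sensor=temp-01, reading=21.5}
    }
}
```

In the route itself, this conversion would sit in a Processor between the Kafka consumer endpoint and the Kudu producer endpoint, so the Kudu component receives the Map it expects.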