Re: How to deploy Hudi

2019-10-02 Thread Vinoth Chandar
Hi Qian, you are right on the choice of tools for 2 and 3. But for 1, if you want to do a one-time bulk load, you can look into the options in the migration guide http://hudi.apache.org/migration_guide.html (HiveSyncTool is orthogonal to this; it simply registers a Hudi dataset with the Hive metastore).

Re: Using Hudi to Pull multiple tables

2019-10-02 Thread Vinoth Chandar
https://issues.apache.org/jira/browse/HUDI-288 tracks this. On Tue, Oct 1, 2019 at 10:17 AM Vinoth Chandar wrote: > I think this has come up before. +1 to the point Pratyaksh mentioned. I would like to add a few more: - Schema could be fetched dynamically from a registry based on

Re: How to deploy Hudi

2019-10-02 Thread Qian Wang
Hi Kabeer, I plan to do an incremental query PoC. My use case includes: 1. Load one big Hive table located in HDFS into Hudi as a history table (I think I should use HiveSyncTool). 2. Sink streaming data from Kafka into Hudi as a real-time table (using HoodieDeltaStreamer?). 3. Join the two tables to get
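The core idea behind an incremental query is "give me only the records committed after the last instant I consumed, and advance my checkpoint." The sketch below models that idea in plain Python; it is not the Hudi API. It assumes each record carries a commit time like Hudi's `_hoodie_commit_time` metadata column, written as a fixed-width timestamp string (e.g. "20191002101500") so that string comparison orders instants correctly. `Record` and `incremental_pull` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    key: str
    # Hypothetical stand-in for Hudi's _hoodie_commit_time column (fixed-width timestamp string).
    commit_time: str
    value: int


def incremental_pull(records: Iterable[Record], begin_instant: str) -> Tuple[List[Record], str]:
    """Return records committed strictly after begin_instant, plus the next checkpoint.

    Plain-Python model of incremental-pull semantics, not a Hudi call.
    """
    changed = [r for r in records if r.commit_time > begin_instant]
    # The latest commit time seen becomes the next checkpoint; if nothing changed, keep the old one.
    checkpoint = max((r.commit_time for r in changed), default=begin_instant)
    return changed, checkpoint


if __name__ == "__main__":
    table = [
        Record("a", "20191001090000", 1),
        Record("b", "20191002101500", 2),
        Record("c", "20191002120000", 3),
    ]
    changed, checkpoint = incremental_pull(table, "20191002000000")
    print([r.key for r in changed], checkpoint)
```

Running it prints `['b', 'c'] 20191002120000`; calling `incremental_pull` again with that checkpoint returns no records until a newer commit arrives. In a real PoC, the incremental view would be read through Hudi's own query integration rather than a hand-written filter.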

Re: Kafka read exception when using HoodieDeltaStreamer

2019-10-02 Thread Vinoth Chandar
Awesome! On Wed, Oct 2, 2019 at 3:01 PM Gautam Nayak wrote: > Thanks Vinoth for the tip. We were able to fix the issue, as our Spark cluster (2.2.0) bundled both the spark-streaming-kafka-0-8 and spark-streaming-kafka-0-10 jars. Getting rid of the spark-streaming-kafka-0-10 jars from the cluster

Re: Kafka read exception when using HoodieDeltaStreamer

2019-10-02 Thread Gautam Nayak
Thanks, Vinoth, for the tip. We were able to fix the issue: our Spark cluster (2.2.0) bundled both the spark-streaming-kafka-0-8 and spark-streaming-kafka-0-10 jars. Removing the spark-streaming-kafka-0-10 jars from the cluster resolved the ClassCastException. On Oct 1, 2019, at 10:25 AM, Vinoth
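The root cause described here was both Kafka integration jar families (0-8 and 0-10) sitting on the same Spark classpath. A small pre-flight check can flag that before a job is launched. This is a minimal sketch under stated assumptions: the jar file names contain the `spark-streaming-kafka-0-8` or `spark-streaming-kafka-0-10` artifact prefix, and all cluster jars live in one directory (for example Spark's `jars/` folder). The helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed naming pattern, e.g. spark-streaming-kafka-0-8_2.11-2.2.0.jar
KAFKA_JAR = re.compile(r"spark-streaming-kafka-(0-8|0-10)")


def kafka_integration_jars(jar_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group jar file names by the Kafka integration version ("0-8" or "0-10") they belong to."""
    found: Dict[str, List[str]] = {}
    for name in jar_names:
        match = KAFKA_JAR.search(name)
        if match:
            found.setdefault(match.group(1), []).append(name)
    return found


def has_conflict(jar_names: Iterable[str]) -> bool:
    """True when jars from both the 0-8 and the 0-10 integrations are present."""
    return len(kafka_integration_jars(jar_names)) > 1


def scan_dir(jar_dir: str) -> Dict[str, List[str]]:
    """Apply the grouping to every *.jar file in a directory (hypothetical path supplied by the caller)."""
    return kafka_integration_jars(p.name for p in Path(jar_dir).glob("*.jar"))


if __name__ == "__main__":
    jars = [
        "spark-core_2.11-2.2.0.jar",
        "spark-streaming-kafka-0-8_2.11-2.2.0.jar",
        "spark-streaming-kafka-0-10_2.11-2.2.0.jar",
    ]
    print(has_conflict(jars))
```

With the example list above it prints `True`; dropping the 0-10 jar, which mirrors the fix in this thread, makes the check pass.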

Re: How to deploy Hudi

2019-10-02 Thread Kabeer Ahmed
Qian, welcome! Could you tell us a bit more about your use case? E.g., the type of project, the industry, and the complexity of the pipeline you plan to write (e.g., pulling data from external APIs such as the New York taxi dataset and writing it into Hive for analysis). This will give us a bit more

Re: How to deploy Hudi

2019-10-02 Thread Vinoth Chandar
Edit: the link is https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed? (with the ? at the end). On Wed, Oct 2, 2019 at 2:54 PM Vinoth Chandar wrote: > Hi Qian, welcome! Does

Re: How to deploy Hudi

2019-10-02 Thread Vinoth Chandar
Hi Qian, welcome! Does https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed? help? On Wed, Oct 2, 2019 at 10:18 AM Qian Wang wrote: > Hi, I am new to Apache Hudi. Currently I am working on a PoC using Hudi and

How to deploy Hudi

2019-10-02 Thread Qian Wang
Hi, I am new to Apache Hudi. I am currently working on a PoC using Hudi. Can anyone point me to some documentation on how to deploy Apache Hudi? Thanks. Best, Eric

Re: [VOTE] Release 0.5.0-incubating, release candidate #2

2019-10-02 Thread Luciano Resende
This week I have limited internet access and will not be able to help much. On Wed, Oct 2, 2019 at 13:26 Thomas Weise wrote: > I looked at the PR and I see a disturbing number of LICENSE file repetitions in it. There should be no need for that, as LICENSE can be included automatically by

Re: [DISCUSS] Decouple Hudi and Spark (in wiki design page)

2019-10-02 Thread Vinoth Chandar
Based on some conversations I had with Flink folks, including Hudi's very own mentor Thomas, it seems future-proof to look into supporting the Flink streaming APIs. IIUC, the batch APIs will move towards converging with the streaming APIs, which matches Hudi's model anyway. From Hudi's perspective,