[DISCUSS] [VOTE] JDBC incremental load with DeltaStreamer

2019-09-13 Thread Taher Koitawala
Hi All, Currently, we are trying to pull data incrementally from our RDBMS sources. The way we do this with Hudi is to create a Spark table on top of the JDBC source using [1], which writes raw data to an HDFS dir. We then use DeltaStreamer dfs-source to write that to a

[VOTE] Release 0.5.0-incubating, release candidate #1

2019-09-13 Thread vbal...@apache.org
Hi everyone, We have prepared the first Apache release candidate for Apache Hudi (incubating). The version is: 0.5.0-incubating-rc1. Please review and vote on the release candidate #1 for the version 0.5.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please

Merging schema's during Incremental load

2019-09-13 Thread Gautam Nayak
Hi, We have been evaluating Hudi, and there is one use case we are trying to solve, where incremental datasets can have fewer columns than the ones that have already been persisted in Hudi format. For example: in the initial batch, we have a total of 4 columns: val initial = Seq(("id1", "col1",

Re: ApacheCon NA 19 slides

2019-09-13 Thread Thomas Weise
Thanks for sharing. The presentation provides a nice intro to Hudi that will help make new folks more curious. It explains the journey and lessons learned along the path, which is equally valuable and no doubt will resonate with the many folks who have had to solve similar problems. Unfortunately

Re: ApacheCon NA 19 slides

2019-09-13 Thread Vinoth Chandar
Will do over the weekend! On Wed, Sep 11, 2019 at 5:59 PM vino yang wrote: > Hi Vinoth, > > Thanks for sharing the slides of the talk. > > and +1 to leesf's suggestion > > Best, > Vino > > vbal...@apache.org wrote on Thu, Sep 12, 2019 at 12:42 AM: > > > > > Thanks guys. The talk was primarily focussed on a

Re: [BUG] Exception when running HoodieDeltaStreamer

2019-09-13 Thread vbal...@apache.org
Hi Pratyaksh, For boolean flags, you don't need to pass true or false. It is implicit. Just pass "--enable-hive-sync" without additional true/false in the command line. Balaji.V On Friday, September 13, 2019, 03:06:38 AM PDT, Pratyaksh Sharma wrote: Hi, I am trying to run

Re: [BUG] Null Pointer Exception in SourceFormatAdapter

2019-09-13 Thread Balaji Varadarajan
Hi Pratyaksh, This is expected. You need to pass a schema-provider since you are using Avro Sources. For RowBased sources, DeltaStreamer can deduce the schema from the Row type information available from the Spark Dataset. Balaji.V On Friday, September 13, 2019, 02:57:37 AM PDT, Pratyaksh Sharma

[BUG] Exception when running HoodieDeltaStreamer

2019-09-13 Thread Pratyaksh Sharma
Hi, I am trying to run HoodieDeltaStreamer and am working on tag hoodie-0.4.7. I am using Spark version 2.3.2. I was trying to enable Hive sync along with running HoodieDeltaStreamer by passing the flag --enable-hive-sync as true. Here is the command I used - spark-submit --master local[1]

[BUG] Null Pointer Exception in SourceFormatAdapter

2019-09-13 Thread Pratyaksh Sharma
Hi, I am trying to build a CDC pipeline using Hudi working on tag hoodie-0.4.7. Here is the command I used for running DeltaStreamer - spark-submit --files jaas.conf --conf 'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' --conf