Re: Using hudi with pyspark

2019-09-11 Thread Vinoth Chandar
Awesome. Also you could try building off master 0.5.0-snapshot if you are having some trouble with the bundles. Greatly appreciate if you can share progress/feedback. On Wed, Sep 11, 2019 at 1:55 AM Rodrigo Dominguez wrote: > Hi Kabeer > > I was able to build a simple script on python, and

Re: Apache Pulsar component for Hudi

2019-09-11 Thread Vinoth Chandar
+1 Always welcome new sources. Any takers for a PulsarSource in DeltaStreamer? On Tue, Sep 10, 2019 at 4:33 AM taher koitawala wrote: > Hi Vinoth, > Apache Pulsar is a pub/sub messaging system like Kafka, > however, it has a few more functions which makes it different like >

Re: Dropping support for Spark 2.2 and lower

2019-09-11 Thread Vinoth Chandar
Oh wow.. This is like the most popular vote by a long shot. :) Will move forward. Thanks all! On Tue, Sep 10, 2019 at 3:16 PM leesf wrote: > +1 > > On Wed, Sep 11, 2019 at 3:06 AM Y. Ethan Guo wrote: > > > +1 we’re on Spark 2.4. > > > > On Tue, Sep 10, 2019 at 11:22 AM Minh Pham > wrote: > > > > > +1 we

Re: Apache Pulsar component for Hudi

2019-09-11 Thread Vinoth Chandar
yes JIRA would be great to scope out the work. On Wed, Sep 11, 2019 at 10:00 PM Bhavani Sudha Saktheeswaran wrote: > +1 for integrating Apache Pulsar. > > On Wed, Sep 11, 2019 at 8:58 PM taher koitawala > wrote: > > > Should we file a jira? If everyone agrees? > > > > On Thu, Sep 12, 2019,

Re: Apache Pulsar component for Hudi

2019-09-11 Thread taher koitawala
Should we file a jira? If everyone agrees? On Thu, Sep 12, 2019, 6:30 AM vino yang wrote: > +1 to welcome Pulsar connector > > On Thu, Sep 12, 2019 at 6:57 AM Vinoth Chandar wrote: > > > +1 Always welcome new sources. Any takers for a PulsarSource in > > DeltaStreamer? > > > > On Tue, Sep 10, 2019 at 4:33 AM

Re: Apache Pulsar component for Hudi

2019-09-11 Thread Bhavani Sudha Saktheeswaran
+1 for integrating Apache Pulsar. On Wed, Sep 11, 2019 at 8:58 PM taher koitawala wrote: > Should we file a jira? If everyone agrees? > > On Thu, Sep 12, 2019, 6:30 AM vino yang wrote: > > > +1 to welcome Pulsar connector > > > > On Thu, Sep 12, 2019 at 6:57 AM Vinoth Chandar wrote: > > > > > +1 Always

Re: ApacheCon NA 19 slides

2019-09-11 Thread vino yang
Hi Vinoth, Thanks for sharing the slides of the talk, and +1 to leesf's suggestion. Best, Vino On Thu, Sep 12, 2019 at 12:42 AM vbal...@apache.org wrote: > > Thanks guys. The talk was primarily focussed on a high level design around > building data lakes. Hence, we did not go too deep into lower level >

Re: Apache Pulsar component for Hudi

2019-09-11 Thread vino yang
+1 to welcome Pulsar connector On Thu, Sep 12, 2019 at 6:57 AM Vinoth Chandar wrote: > +1 Always welcome new sources. Any takers for a PulsarSource in > DeltaStreamer? > > On Tue, Sep 10, 2019 at 4:33 AM taher koitawala > wrote: > > > Hi Vinoth, > > Apache Pulsar is a pub/sub messaging system

Re: ApacheCon NA 19 slides

2019-09-11 Thread taher koitawala
Hi Vinoth, Slides look amazing to me. However, shouldn't we give out some more clarity on Hoodie Index, Compactions and also how we can do UDFs when pulling data to Hudi? Other than that, the slides and explanation are great. Regards, Taher Koitawala On Wed, Sep 11, 2019 at 12:44

ApacheCon NA 19 slides

2019-09-11 Thread Vinoth Chandar
Hi all, You might have noticed reduced responses this week. Reason was that Balaji and I were prepping for our talk at ApacheCon. Shared the slides here https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM Thanks Vinoth

Re: Using hudi with pyspark

2019-09-11 Thread Rodrigo Dominguez
Hi Kabeer I was able to build a simple script on python, and submit it with: spark-submit --jars $HUDI_SRC/packaging/hoodie-spark-bundle/target/hoodie-spark-bundle-0.4.7.jar --packages com.databricks:spark-avro_2.11:4.0.0 --conf spark.serializer=org.apache.spark.serializer.KryoSerializer

Re: ApacheCon NA 19 slides

2019-09-11 Thread leesf
Also, for easy access by others, we can link the talk and slides to Talk & Presentations on this page (https://hudi.apache.org/powered_by.html) Best Leesf On Wed, Sep 11, 2019 at 6:21 PM leesf wrote: > The slides are very detailed and can help others know better about hudi. > Thanks for sharing! > > Best >

Re: ApacheCon NA 19 slides

2019-09-11 Thread leesf
The slides are very detailed and can help others know better about hudi. Thanks for sharing! Best Leesf On Wed, Sep 11, 2019 at 3:26 PM taher koitawala wrote: > Hi Vinoth, > Slides look amazing to me. However, shouldn't we give out > some more clarity on Hoodie Index, Compactions and also how

Re: ApacheCon NA 19 slides

2019-09-11 Thread vbal...@apache.org
Thanks guys. The talk was primarily focussed on a high level design around building data lakes. Hence, we did not go too deep into lower level details. Not sure if/when ApacheCon is going to publish the talk video. We will add the slides meanwhile to the powered-by section. On Wednesday,