Re: Inline storage of parquet data in logs

2019-10-23 Thread Vinoth Chandar
quet. Please correct me if I am missing something. > > Thanks, > Jaimin > > On Wednesday, 23 October 2019, Vinoth Chandar wrote: > > > Sure. Take your time! Just to clarify, here log refers to the Hudi > append > > log, not user's log4j or such logs. yes that would be v

Re: Unable to run Integration tests

2019-10-23 Thread Vinoth Chandar
I saw someone else share the same experience. Can't think of anything that could have caused this to become flaky recently. I already created https://issues.apache.org/jira/browse/HUDI-312

Re: [DISCUSS] Rename HIP process to RFC

2019-10-23 Thread Vinoth Chandar
Thanks all for the constructive comments! Will change the name in cWiki On Tue, Oct 22, 2019 at 6:27 PM vino yang wrote: > agree Vinoth, +1 > > Vinoth Chandar wrote on Tue, Oct 22, 2019 at 8:31 PM: > > > Good point. Even for HIP we initially had gdoc as the starting point and > >

Re: Inline storage of parquet data in logs

2019-10-23 Thread Vinoth Chandar
> On Oct 21 2019, at 3:07 pm, Vinoth Chandar wrote: > > Any thoughts? :) anyone? > > > > On Wed, Oct 9, 2019 at 11:06 AM Vinoth Chandar > wrote: > > > Hi all, > > > Wanted to share some prototyping I was doing for HUDI-46. The idea > here is > >

Re: [DISCUSS] Rename HIP process to RFC

2019-10-22 Thread Vinoth Chandar
> > >> +1 on RFC. It's good to have a few pages of RFC to get a quick look > > of > > > an > > > >> idea. It doesn't have to be a full standard like some IETF RFCs. > > > >> > > > >> On Mon, Oct 21, 2019 at 5:31 AM Taher Koitawal

Re: Small size of data

2019-10-21 Thread Vinoth Chandar
+1 additionally this needs operation to be upsert or insert . On Mon, Oct 21, 2019 at 8:31 PM leesf wrote: > Hi Qian, > > Maybe you could set hoodie.parquet.max.file.size[1] > and hoodie.parquet.compression.ratio[2] larger to control data size. And > you could see the code snippet in
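The settings named in this thread can be gathered into one set of writer options. This is a minimal sketch that assumes a PySpark job writing through the Hudi datasource. `small_file_options` is a hypothetical helper, the sizes are illustrative rather than recommendations, and other required options (table name, record key, partition path) are left out.

```python
def small_file_options(max_file_bytes: int, compression_ratio: float,
                       operation: str = "upsert") -> dict:
    """Hypothetical helper collecting the Hudi writer options discussed above."""
    # Per the reply above, file sizing needs the operation to be insert or upsert.
    if operation not in ("insert", "upsert"):
        raise ValueError("file sizing needs operation to be insert or upsert")
    return {
        "hoodie.parquet.max.file.size": str(max_file_bytes),
        "hoodie.parquet.compression.ratio": str(compression_ratio),
        "hoodie.datasource.write.operation": operation,
    }


opts = small_file_options(max_file_bytes=256 * 1024 * 1024, compression_ratio=0.2)
# In a PySpark job these would be passed along with the table's other options, e.g.
# df.write.format("org.apache.hudi").options(**opts).mode("append").save(base_path)
```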

Re: Error while running Hive Sync (hoodie-0.4.7)

2019-10-21 Thread Vinoth Chandar
/packaging/hudi-hadoop-mr-bundle/pom.xml#L66 here. On Fri, Oct 18, 2019 at 8:50 AM Vinoth Chandar wrote: > Looks like this is similar to the Hive 1.x issue faced in another thread. > > Let me think through this and get back to you. We need to reverse trace > change we did to drop hive

Re: [DISCUSS][VOTE] DyanamoDB Streams support in Hudi

2019-10-21 Thread Vinoth Chandar
https://issues.apache.org/jira/browse/HUDI-310 tracks this. Love to get this into the next release as much as possible :) On Thu, Oct 17, 2019 at 10:16 PM Vinoth Chandar wrote: > No problem. Having kinesis will get us a compelling story for cloud data > ingestion > > On Thu, Oct 1

Re: Inline storage of parquet data in logs

2019-10-21 Thread Vinoth Chandar
Any thoughts? :) anyone? On Wed, Oct 9, 2019 at 11:06 AM Vinoth Chandar wrote: > Hi all, > > Wanted to share some prototyping I was doing for HUDI-46. The idea here is > to see if we can embed a parquet file "inline" into an outer file (our > log), so that if the user

[DISCUSS] Rename HIP process to RFC

2019-10-20 Thread Vinoth Chandar
Someone asked me this, and it made me think about it. While the HIP process covers concrete proposals to Hudi, sometimes we may need to just write up some ideas and solicit comments (e.g. HudiLink https://cwiki.apache.org/confluence/display/HUDI/Hudi+for+Continuous+Deep+Analytics)

Re: [DISCUSS][VOTE] DyanamoDB Streams support in Hudi

2019-10-17 Thread Vinoth Chandar
ill be actively working on > this. > > On Wed, 16 Oct 2019, 07:01 Vinoth Chandar, wrote: > > > Just wanted to bump this thread and see if anyone is actively working on > > kinesis support > > > > On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar > wrote: > >

Re: Parquet issue

2019-10-17 Thread Vinoth Chandar
o, with respect to Hive, I am aware of the same, as of now I have > > share the Hive URL of 1.x which is used by CDH in config. > > > > > > Again attaching the logs for reference. > > > > > > Regards, > > > Shahida R. Khan > > > > >

Re: [VOTE] Release 0.5.0-incubating, release candidate #6

2019-10-16 Thread Vinoth Chandar
+1 (Binding) https://gist.github.com/vinothchandar/b558d3a86ffe1e733c54d1305a44ec38 for checks On Wed, Oct 16, 2019 at 11:03 AM vbal...@apache.org wrote: > > Forgot to mention that this release candidate addresses the licensing > concerns that came up during voting in general@incubator. The

Re: [DISCUSS][VOTE] DyanamoDB Streams support in Hudi

2019-10-15 Thread Vinoth Chandar
Just wanted to bump this thread and see if anyone is actively working on kinesis support On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar wrote: > I think we are on the same page. Thanks for clarifying! > Note on implementation: it would be great if we can reuse the spark > streaming

Re: Error while running Hive Sync (hoodie-0.4.7)

2019-10-15 Thread Vinoth Chandar
custom changes? On Mon, Oct 14, 2019 at 10:11 PM Gurudatt Kulkarni wrote: > Hi Vinoth, > > Thank you for the quick response, but using the master branch would mean > building for Hive 2.X, but we are still working on Hive 1.1.0 :( > > > On Mon, Oct 14, 2019 at 7:57 PM

Re: Parquet issue

2019-10-15 Thread Vinoth Chandar
> I have added both the dependency and tried too. If you are trying to get the hudi-utilities bundle to include a jar, then you also need to whitelist it explicitly here https://github.com/apache/incubator-hudi/blob/master/packaging/hudi-utilities-bundle/pom.xml#L67 Heads up : you may hit issues

Re: Parquet issue

2019-10-15 Thread Vinoth Chandar
First off, thanks Kabeer for stepping up and answering a bunch of these questions! Really helps us scale community support! Keep 'em coming :) Hi Shahida, The utilities-bundle 0.5.1-SNAPSHOT (master) does not bundle parquet jars (except parquet-avro), but instead uses them from the Spark installation.

Re: Error while running Hive Sync (hoodie-0.4.7)

2019-10-14 Thread Vinoth Chandar
Hi Gurudatt, Thanks for reporting this. This seems like a class mismatch issue (the particular stack trace). master and the next org.apache.hudi release have tons of fixes around this. Could you give the master branch a shot by building it yourself?

Inline storage of parquet data in logs

2019-10-09 Thread Vinoth Chandar
Hi all, Wanted to share some prototyping I was doing for HUDI-46. The idea here is to see if we can embed a parquet file "inline" into an outer file (our log), so that if the user chooses to they can also get parquet data in the logs to speed up real-time view queries. We would be using the
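The core mechanic of this proposal, reading an embedded file through a byte-range view of the outer log, can be illustrated with a small Python sketch. `InlineFileView` is hypothetical and is not Hudi's implementation; it assumes the log writer records the offset and length of each embedded file so a reader (for example, a parquet reader) can treat that range as a standalone, seekable file.

```python
import io


class InlineFileView(io.RawIOBase):
    """Hypothetical read-only view over bytes [offset, offset + length) of an outer file."""

    def __init__(self, outer, offset: int, length: int):
        super().__init__()
        self._outer = outer      # seekable binary file object holding the log
        self._start = offset     # where the embedded file begins in the log
        self._length = length    # size of the embedded file
        self._pos = 0            # position relative to the embedded file

    def readable(self):
        return True

    def seekable(self):
        return True

    def seek(self, pos, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new = pos
        elif whence == io.SEEK_CUR:
            new = self._pos + pos
        elif whence == io.SEEK_END:
            new = self._length + pos  # end of the embedded file, not the log
        else:
            raise ValueError("invalid whence")
        self._pos = max(0, min(new, self._length))  # clamp to the embedded range
        return self._pos

    def tell(self):
        return self._pos

    def readinto(self, buf):
        n = min(len(buf), self._length - self._pos)
        if n <= 0:
            return 0  # end of the embedded file
        self._outer.seek(self._start + self._pos)
        data = self._outer.read(n)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)
```

Because the view exposes `seek`/`read` relative to the embedded range, a footer-first format such as parquet can seek to "end of file" and land on the embedded file's footer rather than the log's.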

Re: quickstart and use-cases pages for Chinese version of docs not rendering properly

2019-10-08 Thread Vinoth Chandar
Merged. Thanks for quick action, both! On Tue, Oct 8, 2019 at 6:44 PM leesf wrote: > Hi Sudha, > > Maybe it is caused by this PR[1], which makes the perlink start with /cn. > I open an another PR[2] to revert it. > > Best, > Leesf > > [1] https://github.com/apache/incubator-hudi/pull/900 > [2]

Re: [DISCUSS] cleaning up git history from Notice/License changes

2019-10-07 Thread Vinoth Chandar
> > > > On Thu, Oct 3, 2019 at 2:32 PM vbal...@apache.org > > wrote: > > > > > > > > +1 on both cleanup. This would keep the git history clean and > consistent > > > with contribution. > > > Balaji.V On Thursday, October 3, 2019, 09:53:46 AM PDT,

Re: [VOTE] Release 0.5.0-incubating, release candidate #5

2019-10-06 Thread Vinoth Chandar
key B4F1CCC4D3541808: "Suneel Marthi (CODE SIGNING KEY) < smar...@apache.org>" not changed gpg: key 24A499037262AAA4: "Balaji Varadarajan " not changed gpg: key BB57228BCF851CFC: "Anbu Cheeralan " not changed gpg: key 0CF177E7BD9D3924: "

[DISCUSS] cleaning up git history from Notice/License changes

2019-10-03 Thread Vinoth Chandar
Folks, As we iterate across the RCs, we have added to and removed from the NOTICE/LICENSE files a lot. Does anyone feel the need to clean up the history and do a one-time force push? There is also an issue with GitHub contribution stats not showing everyone's commits (due to email changes etc). We

Re: How to deploy Hudi

2019-10-02 Thread Vinoth Chandar
> > This will give us a bit more context. > > Thanks > > Kabeer. > > > > On Oct 2 2019, at 10:55 pm, Vinoth Chandar wrote: > > > edit: > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquest

Re: Using Hudi to Pull multiple tables

2019-10-02 Thread Vinoth Chandar
https://issues.apache.org/jira/browse/HUDI-288 tracks this On Tue, Oct 1, 2019 at 10:17 AM Vinoth Chandar wrote: > > I think this has come up before. > > +1 to the point pratyaksh mentioned. I would like to add a few more > > - Schema could be fetched dynamically fro

Re: Kafka read exception when using HoodieDeltaStreamer

2019-10-02 Thread Vinoth Chandar
the cluster resolved the ClasscastException. > > > On Oct 1, 2019, at 10:25 AM, Vinoth Chandar vin...@apache.org>> wrote: > > > > [External Email] > > > Thanks for the detailed notes. helps. > > Could you give a quick shot trying to override the version in

Re: How to deploy Hudi

2019-10-02 Thread Vinoth Chandar
edit: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed? with the ? at the end On Wed, Oct 2, 2019 at 2:54 PM Vinoth Chandar wrote: > Hi Qian, > > Welcome! Does > https://cwiki.apache.org/conf

Re: How to deploy Hudi

2019-10-02 Thread Vinoth Chandar
Hi Qian, Welcome! Does https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed? help ? On Wed, Oct 2, 2019 at 10:18 AM Qian Wang wrote: > Hi, > > I am new to Apache Hudi. Currently I am working on a PoC using Hudi and >

Re: [DISCUSS] Decouple Hudi and Spark (in wiki design page)

2019-10-02 Thread Vinoth Chandar
oss partitions. I have only early ideas. Need some help here as well coming to a solution. Both a & b, need some ground work/clean up. IIUC balaji is already working on some of it. If we can have volunteers for each of these areas, we can get underway. On Thu, Sep 26, 2019 at 10:13 AM Vinoth

Re: Kafka read exception when using HoodieDeltaStreamer

2019-10-01 Thread Vinoth Chandar
Thanks for the detailed notes. helps. Could you give a quick shot trying to override the version in a custom build ? Wondering if just upgrading Kafka would suffice for your scenario (without needing the 2.12 scala bundle) On Tue, Oct 1, 2019 at 10:14 AM Gautam Nayak wrote: > Thanks Nishith

Re: Using Hudi to Pull multiple tables

2019-10-01 Thread Vinoth Chandar
I think this has come up before. +1 to the point Pratyaksh mentioned. I would like to add a few more:
- Schema could be fetched dynamically from a registry based on topic/dataset name. Solvable
- The Hudi keys, partition fields and the inputs you need for configuring Hudi need to be

Re: Hudi Parquet Storage Basic Question

2019-09-29 Thread Vinoth Chandar
parquet files with only changed rows or it will create parquet files with > > duplicates plus charged rows and wasting storage on hadoop. > > > > On Sat, Sep 28, 2019, 9:20 PM Vinoth Chandar wrote: > > > >> +1 There is also a faq entry here > >> > &

Re: Hudi Parquet Storage Basic Question

2019-09-28 Thread Vinoth Chandar
+1 There is also a faq entry here https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-Whatisthedifferencebetweencopy-on-write(COW)vsmerge-on-read(MOR)storagetypes? On Sat, Sep 28, 2019 at 1:10 AM Bhavani Sudha Saktheeswaran wrote: > Hi

Re: [VOTE] Release 0.5.0-incubating, release candidate #2

2019-09-26 Thread Vinoth Chandar
@mentors Hopefully we are very close. Your eyes on this will significantly help us to get it right! On Thu, Sep 26, 2019 at 1:42 PM vbal...@apache.org wrote: > > Thanks Luciano for the comments. > I looked at other projects that are currently incubating to see how they > setup top-level

Re: [PROPOSAL] Hudi Web UI

2019-09-24 Thread Vinoth Chandar
ke a look. > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130027233 > Regards, Taher Koitawala On Mon, Sep 23, 2019 at 10:17 PM Taher Koitawala < > taher...@gmail.com> wrote: > Yup got it. Thanks Vinoth > > On Mon, Sep > 23, 2019, 10:02 PM Vinoth Chandar wro

Re: [DISCUSS] Hudi with Nifi

2019-09-24 Thread Vinoth Chandar
oyable it is. > > On Sun, Sep 22, 2019, 7:01 PM Vinoth Chandar wrote: > > > See a lot of Spark Streaming receiver based approach code there, which > > makes me a bit worried about scalability. > > > > Nonetheless. API wise can't we just do dstream.rdd.forEach? And issue >

Re: [PROPOSAL] Hudi Web UI

2019-09-23 Thread Vinoth Chandar
> > Regards, > Taher Koitawala > > On Sun, Sep 22, 2019 at 6:34 PM Vinoth Chandar wrote: > > > Taher, can we please move the HIP to the cWiki space as documented here > > > > > https://cwiki.apache.org/confluence/display/HUDI/Hudi+Improvement+Plan+Detail

Re: Apache Pulsar component for Hudi

2019-09-23 Thread Vinoth Chandar
No worries. done. Please claim the ticket. On Sun, Sep 22, 2019 at 8:58 AM Vinay Patil wrote: > HI Vinoth, > > My bad should have specified in the earlier mail :) > > Jira id : vinaypatil18 > > > Regards, > Vinay Patil > > > On Sun, Sep 22, 2019 at 7:34

Re: source connectors

2019-09-23 Thread Vinoth Chandar
Hi Thomas, Good point. I have pretty similar thoughts. The current approach is: if you get your data into some staging area (Kafka, files on DFS), then DeltaStreamer can ingest it incrementally. For instance, the example here uses Sqoop for the first leg and then DeltaStreamer/DataSource

Re: Apache Pulsar component for Hudi

2019-09-22 Thread Vinoth Chandar
e.org/jira/projects/HUDI/issues/HUDI-246 > > > > On Thu, Sep 12, 2019 at 10:48 AM Vinoth Chandar > wrote: > > > > > yes JIRA would be great to scope out the work. > > > > > > On Wed, Sep 11, 2019 at 10:00 PM Bhavani Sudha Saktheeswaran > > >

Re: [PROPOSAL] Hudi Web UI

2019-09-22 Thread Vinoth Chandar
> > > > > > On Sun, Sep 22, 2019, 9:52 AM Bhavani Sudha Saktheeswaran > > > wrote: > > > > > > > +1 for adding web ui. The web ui viz for table configs would be > pretty > > > > useful for easy debugging. > > > > > > >

Re: [DISCUSS][VOTE] DyanamoDB Streams support in Hudi

2019-09-22 Thread Vinoth Chandar
+1 For now we can keep this in hudi-utilities itself IMO. As for the connector (or DeltaStreamer Source, to be specific), should we just integrate with Kinesis? If DynamoDB will pump its changes into Kinesis anyway, why should we be aware of DynamoDB directly? Also we may need to rethink how we are going

Re: [PROPOSAL] Hudi Web UI

2019-09-21 Thread Vinoth Chandar
+1 will take a look at the doc for specifics in a few days. On Sat, Sep 21, 2019 at 7:18 PM vino yang wrote: > +1 to introduce Hudi web UI. Great suggestion! On 09/21/2019 12:24, Minh > Pham wrote: +1. I think an admin UI will help with reusability alot. On > Fri, Sep 20, 2019 at 8:32 PM Vinay

Re: [VOTE] Release 0.5.0-incubating, release candidate #2

2019-09-21 Thread Vinoth Chandar
ossible and let CI be the guard. > > While working through this, I would suggest to compile the release guide > and add it to the website. > > Thanks, > Thomas > > > On Sat, Sep 21, 2019 at 4:54 PM Vinoth Chandar wrote: > > > Good catch Luciano. The jar was a face

Re: [VOTE] Release 0.5.0-incubating, release candidate #2

2019-09-21 Thread Vinoth Chandar
Good catch Luciano. The jar was a facepalm moment. I remember vividly removing that (code does not use it anymore). Not sure what happened there. We will go over the other issues and maybe script the checks as well, if possible. On Sat, Sep 21, 2019 at 1:48 PM vbal...@apache.org wrote: >

Re: [UNVERIFIED SENDER] Re: [UNVERIFIED SENDER] Re: [UNVERIFIED SENDER] Fw: [VOTE] Release 0.5.0-incubating, release candidate #1

2019-09-18 Thread Vinoth Chandar
robably that’s why Balaji had sent a separate > thread to us, marked as *[UNVERIFIED SENDER]*. Just a guess. > > > > Thanks, > > Udit > > > > *From: *Vinoth Chandar > *Date: *Wednesday, September 18, 2019 at 3:12 PM > *To: *"Mehrotra, Udit" > *Subje

Re: [DISCUSS] Hudi with Nifi

2019-09-18 Thread Vinoth Chandar
Not too familiar with NiFi myself. Would this still target a use case like what Pratyaksh mentioned? For DeltaStreamer specifically, we are moving more and more towards continuous mode, where Hudi writing and compaction are managed by a single long-running Spark application. Would NiFi

Re: Field not found in record HoodieException

2019-09-17 Thread Vinoth Chandar
[Orthogonal comment] It's so awesome to see us troubleshooting together.. Thanks everyone on this thread! On Tue, Sep 17, 2019 at 8:04 PM Taher Koitawala wrote: > No there are no nulls in the data and I am getting the same error. > > On Wed, Sep 18, 2019, 3:33 AM Kabeer Ahmed wrote: > > >

Re: [VOTE] Release 0.5.0-incubating, release candidate #2

2019-09-17 Thread Vinoth Chandar
+1 binding
## CheckSum (OK)
$ shasum -a 512 hudi-0.5.0-incubating-rc2.src.tgz > sha512
$ diff sha512 hudi-0.5.0-incubating-rc2.src.tgz.sha512.txt | wc -l
0
## Tests (OK)
$ mvn clean install
# passed!
## Signature (OK)
$ gpg --import hudi-0.5.0-incubating-rc2/KEYS
...
gpg: Total number
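The checksum step in this vote (hash the tarball with `shasum -a 512`, then diff against the published file) can also be scripted, in the spirit of scripting the release checks. A minimal Python sketch; `sha512_of` and `verify_sha512` are hypothetical helpers, and they assume the checksum file uses the `<hex digest>  <file name>` layout that `shasum` writes (which the zero-line `diff` above suggests).

```python
import hashlib


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a file, read in chunks (same digest as `shasum -a 512`)."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(artifact: str, checksum_file: str) -> bool:
    """Compare the artifact's digest with the first token of the checksum file."""
    with open(checksum_file) as f:
        expected = f.read().split()[0].lower()
    return sha512_of(artifact) == expected
```

Usage would be `verify_sha512("hudi-0.5.0-incubating-rc2.src.tgz", "hudi-0.5.0-incubating-rc2.src.tgz.sha512.txt")`; signature checks still need `gpg --verify`.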

Re: Help unblocking PR 896 to update site

2019-09-17 Thread Vinoth Chandar
+1 ran into same issue as well. Manually replaced html for now. But would be good to fix at the framework level. On Tue, Sep 17, 2019 at 6:07 AM Bhavani Sudha Saktheeswaran wrote: > I am trying to update the hudi site to reflect the latest doc changes since > last update. Since the paths to css

Re: [DISCUSS] Decouple Hudi and Spark

2019-09-16 Thread Vinoth Chandar
> > > > > +1 This is a pretty large undertaking. While the community is getting > > their hands dirty and ramping up on Hudi internals, it would be > productive > > if Vinoth shepherds this > > Balaji.VOn Monday, September 16, 2019, 11:30:44 AM PDT, Vin

Re: [DISCUSS] Decouple Hudi and Spark

2019-09-16 Thread Vinoth Chandar
more work :) > > We want to reach a point where we can start scoping out addition of Flink > and Beam components within. Then I think will tremendous progress. > > On Mon, Sep 16, 2019, 11:43 PM Vinoth Chandar wrote: > > > I still feel the key thing here is reimplementing Hoodi

Re: [DISCUSS] Decouple Hudi and Spark

2019-09-16 Thread Vinoth Chandar
to do it! On Mon, Sep 16, 2019 at 10:23 AM Taher Koitawala wrote: > Guys I think we are slowing down on this again. We need to start planning > small small tasks towards this VC please can you help fast track this? > > Regards, > Taher Koitawala > > On Thu, Aug 15, 2019, 10

Re: [DISCUSS] [VOTE] JDBC incremental load with DeltaStreamer

2019-09-16 Thread Vinoth Chandar
and then call > this code continuously like how we are running HUDI in continuous mode? > > On Mon, Sep 16, 2019 at 9:09 PM Vinoth Chandar wrote: > > > Thanks, Taher! Any takers for driving this? This is something I would be > > very interested in getting involved with. Dont have the b

Re: [BUG] Null Pointer Exception in SourceFormatAdapter

2019-09-16 Thread Vinoth Chandar
Actually went ahead and created https://issues.apache.org/jira/browse/HUDI-253 . Question is just about the PR for this now ? :) On Mon, Sep 16, 2019 at 8:54 AM Vinoth Chandar wrote: > +1 DeltaStreamer can be much nicer in such cases.. Any interest in opening > a JIRA/PR for this? >

Re: [BUG] Null Pointer Exception in SourceFormatAdapter

2019-09-16 Thread Vinoth Chandar
+1 DeltaStreamer can be much nicer in such cases.. Any interest in opening a JIRA/PR for this? On Mon, Sep 16, 2019 at 2:02 AM vbal...@apache.org wrote: > Yes, It makes sense to add validations with descriptive messages. Please > open a ticket and send a PR for this. > Thanks,Balaji.VOn

Re: [DISCUSS] [VOTE] JDBC incremental load with DeltaStreamer

2019-09-16 Thread Vinoth Chandar
r is really valuable. > >> > >> Thanks, > >> Sudha > >> > >> On Sat, Sep 14, 2019 at 7:52 AM vino yang > wrote: > >> > >> > Hi Taher, > >> > > >> > IMO, it's a good supplement to Hudi. > >> > > >

Re: FAQ page

2019-09-15 Thread Vinoth Chandar
d developers to let them know Hudi > well. > > Best, > Vino > > > > Vinoth Chandar wrote on Wed, Sep 11, 2019 at 2:27 AM: > > > Hi all, > > > > I wrote a list of questions based on mailing list conversations and > issues. > > > https://cwiki.apache.or

Re: ApacheCon NA 19 slides

2019-09-14 Thread Vinoth Chandar
> > > > taher koitawala wrote on Wed, Sep 11, 2019 at 3:26 PM: > > > > > > > >> Hi Vinoth, > > > >> Slides look amazing to me. However, shouldn't we give > out > > > >> some more clarity on Hoodie Index, Compactions and also how we

Re: [VOTE] Release 0.5.0-incubating, release candidate #1

2019-09-14 Thread Vinoth Chandar
-1 (binding)
- Checksums & Signatures verify
- Built the branch & tests pass
- My own test jobs seem to work
- Checked pom for version
- NOTICE and LICENSE I think were updated right before RC was cut. Should be good to go
- Source files all have ASF license. Tested rat plugin fails build if

Re: [DISCUSS] [VOTE] JDBC incremental load with DeltaStreamer

2019-09-14 Thread Vinoth Chandar
Hi Taher, I am fully onboard on this. This is such a frequently asked question and having it all doable with a simple DeltaStreamer command would be really powerful. +1 - Vinoth On 2019/09/14 05:51:05, Taher Koitawala wrote: > Hi All, > Currently, we are trying to pull data
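The checkpointed incremental-pull idea discussed in this thread can be sketched in a few lines. This is only an illustration, not a DeltaStreamer source: it uses Python's built-in `sqlite3` as a stand-in for a JDBC database, and `pull_incremental`, the `orders` table and the `updated_at` column are hypothetical. Each run fetches rows newer than the last checkpoint and returns the new checkpoint to persist for the next run.

```python
import sqlite3
from typing import Any, List, Optional, Tuple


def pull_incremental(conn: sqlite3.Connection, table: str, column: str,
                     checkpoint: Optional[Any]) -> Tuple[List[tuple], Optional[Any]]:
    """Fetch rows whose `column` is greater than the last checkpoint.

    Returns the rows plus the new checkpoint (the largest value seen).
    `table` and `column` are interpolated into SQL, so they must be trusted
    identifiers; only the checkpoint value is bound as a parameter.
    """
    if checkpoint is None:  # first run: full pull
        cur = conn.execute(f"SELECT * FROM {table} ORDER BY {column}")
    else:
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {column} > ? ORDER BY {column}",
            (checkpoint,))
    names = [d[0] for d in cur.description]
    rows = cur.fetchall()
    idx = names.index(column)
    new_checkpoint = rows[-1][idx] if rows else checkpoint
    return rows, new_checkpoint


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20)])
rows, cp = pull_incremental(conn, "orders", "updated_at", None)   # both rows, cp = 20
conn.execute("INSERT INTO orders VALUES (3, 30)")
rows2, cp2 = pull_incremental(conn, "orders", "updated_at", cp)   # only the new row
```

A strict `>` on a single column will miss rows that commit late with a value at or below the checkpoint, so a real source would need a more careful watermark strategy.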

Re: ApacheCon NA 19 slides

2019-09-13 Thread Vinoth Chandar
awala wrote on Wed, Sep 11, 2019 at 3:26 PM: > > > > > >> Hi Vinoth, > > >> Slides look amazing to me. However, shouldn't we give out > > >> some more clarity on Hoodie Index, Compactions and also how we can do > > UDFs > > >> when pu

Re: Apache Pulsar component for Hudi

2019-09-11 Thread Vinoth Chandar
> > On Thu, Sep 12, 2019, 6:30 AM vino yang wrote: > > > > > +1 to welcome Pulsar connector > > > > > > Vinoth Chandar wrote on Thu, Sep 12, 2019 at 6:57 AM: > > > > > > > +1 Always welcome new sources. Any takers for a PulsarSource in > > >

Re: Apache Pulsar component for Hudi

2019-09-11 Thread Vinoth Chandar
+1 Always welcome new sources. Any takers for a PulsarSource in DeltaStreamer? On Tue, Sep 10, 2019 at 4:33 AM taher koitawala wrote: > Hi Vinoth, > Apache Pulsar is a pub/sub messaging system like Kafka, > however, it has a few more functions which makes it different like >

Re: Dropping support for Spark 2.2 and lower

2019-09-11 Thread Vinoth Chandar
; > > > > > > >> On Tue, Sep 10, 2019 at 4:45 PM Kabeer Ahmed > > > wrote: > > > >> > > > >> +1. > > > >> > > > >> I am on spark 2.3 but would love to move to Spark 2.4. > > > >>> On Sep

Re: Using hudi with pyspark

2019-09-11 Thread Vinoth Chandar
Awesome. Also you could try building off master 0.5.0-snapshot if you are having some trouble with the bundles. Greatly appreciate if you can share progress/feedback. On Wed, Sep 11, 2019 at 1:55 AM Rodrigo Dominguez wrote: > Hi Kabeer > > I was able to build a simple script on python, and

ApacheCon NA 19 slides

2019-09-11 Thread Vinoth Chandar
Hi all, You might have noticed reduced responses this week. Reason was that Balaji and I were prepping for our talk at ApacheCon. Shared the slides here https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM Thanks Vinoth

FAQ page

2019-09-10 Thread Vinoth Chandar
Hi all, I wrote a list of questions based on mailing list conversations and issues. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185 While I am still working through answers, I thought this can

Dropping support for Spark 2.2 and lower

2019-09-09 Thread Vinoth Chandar
Hello all, I am trying to gauge what spark version everyone is on. We would like to move the spark version to 2.4 and simplify a whole bunch of stuff. Any objections? As a best effort, we can try to make 2.3 work reliably. Any objections? Note that if you are using the RDD based hudi-client

Re: [For Mentors] Readiness for IP Clearance

2019-09-04 Thread Vinoth Chandar
Sep 4, 2019, 11:38 AM vbal...@apache.org > wrote: > > > > > Pinging to see if one of the mentors can update the xml page :) > > Thanks,Balaji.VOn Friday, August 30, 2019, 05:29:52 PM PDT, Thomas > > Weise wrote: > > > > The signed CCLA was recorded on 2019/05/0

Re: Help testing PR 873

2019-09-04 Thread Vinoth Chandar
> Best > Leesf > > Bhavani Sudha Saktheeswaran wrote on Wed, Sep 4, 2019 at 4:41 AM: > > > Thats really cool. Will update if I come across any issues. > > > > Thanks, > > Sudha > > > > On Mon, Sep 2, 2019 at 5:04 PM Vinoth Chandar wrote: >

Re: Upsert after Delete

2019-08-31 Thread Vinoth Chandar
> > > > > > I have dug into the issue in detail and it seems it is a bug. I have > > filed > > > > it at: https://github.com/apache/incubator-hudi/issues/859 ( > > > > > > > https://link.getmailspring.com/link/23c57df5-045c-4021-a880-93a1c46

Re: Upsert after Delete

2019-08-30 Thread Vinoth Chandar
om/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43 > > > > > > Thanks > > > > > > Kabeer. > > > > > > > > > > > > > /** > > > > > >

Re: [For Mentors] Readiness for IP Clearance

2019-08-30 Thread Vinoth Chandar
d" on https://whimsy.apache.org/roster/ppmc/hudi and we are not sure > whether the signed CCLA was submitted or not? > > On Fri, Aug 30, 2019 at 4:59 PM Vinoth Chandar wrote: > > > This is the confirmation > > > > > https://lists.apache.org/thread.html/49a42ca

Re: [For Mentors] Readiness for IP Clearance

2019-08-30 Thread Vinoth Chandar
g%3E > [2] https://incubator.apache.org/ip-clearance/ > > > On Fri, Aug 30, 2019 at 9:53 AM Suneel Marthi wrote: > > > I can do this later tonite after I get home from work - if any of the > other > > mentors get to it before me please go ahead and do the need

Re: Upsert after Delete

2019-08-30 Thread Vinoth Chandar
s%2F859=ZGV2QGh1ZGkuYXBhY2hlLm9yZw%3D%3D > ). > > Let me know if more information is required. > > Thank you, > > > > On Aug 23 2019, at 1:37 am, Vinoth Chandar wrote: > > > yes. I was asking about the HUDI storage type.. > > > > > > There is no

Re: [IMP] Understanding present state and planning ahead

2019-08-30 Thread Vinoth Chandar
arter/half-year. > Please also report usages in > https://github.com/apache/incubator-hudi/issues/661. We will add them to > our powered-by webpage. This would greatly help the community grow. > Thanks,Balaji.V > > > > On Wednesday, May 1, 2019, 9:12:55 PM PDT, Vino

Re: [Hudi Improvement]: Introduce secondary source-ordering-field for breaking ties while writing

2019-08-30 Thread Vinoth Chandar
Assigned to you, and also added you to the role for future tickets. On Thu, Aug 29, 2019 at 11:57 PM Pratyaksh Sharma wrote: > Hi Vinoth, > > The jira is HUDI-207 <https://issues.apache.org/jira/browse/HUDI-207>. > > On Thu, Aug 29, 2019 at 10:17 PM Vinoth Cha
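The tie-breaking idea in this thread's subject can be pictured as comparing a pair of ordering fields instead of one. A minimal sketch under stated assumptions: `dedupe_latest` is hypothetical, the field names `ts` and `offset` are made up for illustration, and Hudi's actual precombine logic lives in its payload classes and may differ.

```python
from typing import Any, Dict, List


def dedupe_latest(records: List[Dict[str, Any]], key: str,
                  primary: str, secondary: str) -> Dict[Any, Dict[str, Any]]:
    """Keep one record per key: the one with the largest (primary, secondary)
    pair, so `secondary` only decides when `primary` values are equal."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for rec in records:
        cur = latest.get(rec[key])
        if cur is None or (rec[primary], rec[secondary]) > (cur[primary], cur[secondary]):
            latest[rec[key]] = rec
    return latest


recs = [
    {"id": 1, "ts": 5, "offset": 100, "v": "a"},
    {"id": 1, "ts": 5, "offset": 101, "v": "b"},  # same ts, later offset wins the tie
    {"id": 1, "ts": 4, "offset": 200, "v": "c"},  # older ts loses despite larger offset
    {"id": 2, "ts": 1, "offset": 1, "v": "d"},
]
out = dedupe_latest(recs, "id", "ts", "offset")
```

Tuple comparison makes the priority explicit: the secondary field never overrides a strictly larger primary value.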

Re: Reg: Hudi Jira Ticket Conventions

2019-08-29 Thread Vinoth Chandar
> > I would like to take up this task. > > > > On Thu, Aug 29, 2019 at 8:49 AM Vinoth Chandar > wrote: > > > > > +1 can we add this to contributing/community pages. As well > > > > > > On Wed, Aug 28, 2019 at 2:33 PM vbal...@apache.org

Re: Reg: Hudi Jira Ticket Conventions

2019-08-28 Thread Vinoth Chandar
+1. Can we add this to the contributing/community pages as well? On Wed, Aug 28, 2019 at 2:33 PM vbal...@apache.org wrote: > To all contributors of Hudi: > Dear folks, > When filing or updating a JIRA for Apache Hudi, kindly make sure the issue > type and versions (when resolving the ticket) are set

Re: HoodieDeltaStreamer commits history

2019-08-26 Thread Vinoth Chandar
+1 One caveat here is that long queries (e.g Hive) may still be accessing these older files and may fail when cleaning very aggressively like this. On Mon, Aug 26, 2019 at 2:24 AM Gary Li wrote: > Hello, > > You can achieve this by changing the hudi config. >
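Why very aggressive cleaning can break long-running queries can be shown with a toy model. This is a simplified sketch, not Hudi's cleaner: it assumes each commit writes one new version of the same file group and that the cleaner keeps only the versions from the latest N commits (the kind of knob Hudi exposes as `hoodie.cleaner.commits.retained`). `versions_to_clean` and `query_survives` are hypothetical names.

```python
from typing import List


def versions_to_clean(commit_times: List[int], commits_retained: int) -> List[int]:
    """Simplified model: one file version per commit; keep the versions from the
    latest `commits_retained` commits and clean the older ones."""
    if commits_retained <= 0:
        raise ValueError("commits_retained must be positive")
    ordered = sorted(commit_times)
    return ordered[:-commits_retained]


def query_survives(snapshot: int, commit_times: List[int], commits_retained: int) -> bool:
    """A long query reading the version written at `snapshot` keeps working
    only if the cleaner has not deleted that version."""
    return snapshot not in versions_to_clean(commit_times, commits_retained)


commits = [1, 2, 3, 4, 5]
```

With `commits_retained=1`, a query that started against commit 4 loses its files once commit 5 lands and the cleaner runs; retaining 3 commits keeps it alive.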

Re: [DISCUSS] Promote Hudi Chinese Documentation into the official website

2019-08-24 Thread Vinoth Chandar
> [1]: https://issues.apache.org/jira/browse/HUDI-211 > > Nishith wrote on Sat, Aug 17, 2019 at 12:17 PM: > > > +1, that would be great > > > > Sent from my iPhone > > > > > On Aug 16, 2019, at 11:59 AM, Vinoth Chandar > wrote: > > > > > > +1 >

Re: Upsert after Delete

2019-08-22 Thread Vinoth Chandar
mbine(EmptyHoodieRecordPayload > another) { > > > return another; > > > } > > > @Override > > > public Optional combineAndGetUpdateValue(IndexedRecord > currentValue, > > > chema schema) { > > > return Optional.empty(); > > > }
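The quoted Java payload deletes a record by returning `Optional.empty()` from `combineAndGetUpdateValue`. A small Python model of that merge rule helps reason about "upsert after delete". It is not Hudi's API: `merge`, `delete_payload` and `overwrite_payload` are hypothetical, and `None` stands in for `Optional.empty()`.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, object]
# A payload maps the currently stored value (None if absent) to the merged
# value, or to None to signal a delete, loosely mirroring Optional.empty().
Payload = Callable[[Optional[Record]], Optional[Record]]


def delete_payload(current: Optional[Record]) -> Optional[Record]:
    return None  # like the quoted EmptyHoodieRecordPayload: always drop the record


def overwrite_payload(new: Record) -> Payload:
    return lambda current: new  # plain upsert: the incoming value wins


def merge(table: Dict[str, Record], updates: Dict[str, Payload]) -> Dict[str, Record]:
    out = dict(table)  # leave the input snapshot untouched
    for key, payload in updates.items():
        merged = payload(out.get(key))
        if merged is None:
            out.pop(key, None)  # empty result: the key is deleted
        else:
            out[key] = merged
    return out


t0 = {"a": {"v": 1}, "b": {"v": 2}}
t1 = merge(t0, {"a": delete_payload})                 # "a" is gone
t2 = merge(t1, {"a": overwrite_payload({"v": 3})})    # upsert after delete brings it back
```

In this model a later upsert after a delete simply re-inserts the key; if a real table behaves differently, the storage type and the view being queried are the first things to check.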

Re: [DISCUSS] Suggestion for Docs UI

2019-08-22 Thread Vinoth Chandar
+1 I was thinking along similar lines for the demo page Our doc theme should already support this https://idratherbewriting.com/documentation-theme-jekyll/mydoc_navtabs.html On Thu, Aug 22, 2019 at 12:04 PM Bhavani Sudha Saktheeswaran wrote: > Hi all, > > I was going through the

Re: Upsert after Delete

2019-08-22 Thread Vinoth Chandar
That’s interesting. Can you also share details on storage type and how you are issuing the deletes and also the table/view (ro, rt) that you are querying? On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed wrote: > Hudi experts and Users, > > Has anyone attempted an upsert after a delete? Here is a

Re: [DISCUSS] Hudi material and resources

2019-08-20 Thread Vinoth Chandar
+1 for creating a resources page. We have not invested enough into these aspects in the past. :) Summarizing where things are atm:
- Previous talks https://hudi.apache.org/powered_by.html#talks--presentations (it'd be great if you can add yours here, if applicable)
- the Google Drive (owned by

Re: Unable to subscribe to slack group

2019-08-19 Thread Vinoth Chandar
> wrote: > >> Hi Vinoth, >> >> I have commented my mail id on the mentioned github issue. >> >> Sure, I will update the documentation. >> >> On Tue, Aug 13, 2019 at 11:41 PM Vinoth Chandar >> wrote: >> >>> Hi Pratyaksh, >>

Re: [DISCUSS] Promote Hudi Chinese Documentation into the official website

2019-08-16 Thread Vinoth Chandar
nteers > > to > > > > help > > > > review PRs that come in this area? > > > > > > > > +1, yes, we really need a new component in JIRA. > > > > > > > > Best, > > > > Vino > > > > > > >

Re: [QUESTION] May I ask if the Hudi contributor JIRA group can receive the notification email.

2019-08-16 Thread Vinoth Chandar
https://issues.apache.org/jira/browse/INFRA-18889 is the ticket tracking this! On Fri, Aug 16, 2019 at 1:18 AM vino yang wrote: > Hi vinm, > > Thanks for your effort. Great job. > I can receive a JIRA notification email now when someone mentioned me. > > Best, > Vino > &

Re: [DISCUSS] Refactor the package name of Hudi

2019-08-15 Thread Vinoth Chandar
he prefix may not be >necessary, and make the length of the classes name too long. >- It would be better to keep one keywork: Hudi in the whole project. The >consistency of the name can make the newbie more clear. And it allows > us to >be consistent in publicity and cita

Re: [Hudi Improvement]: Modification of partition path format to support simplified queries

2019-08-14 Thread Vinoth Chandar
Hi, Do these hooks seem sufficient to support what you are looking for? On Tue, Aug 13, 2019 at 8:16 PM vbal...@apache.org wrote: > > Hi Pratyaksh, > The partitioning format is pluggable in Hudi. > 1. For Hudi Writing, you can simply use one of the several implementations > of
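Since the partitioning format is pluggable through the key generator, the change being asked for is mostly about how the partition path string is built. A minimal sketch, assuming a date-based partition field: `partition_path` is a hypothetical helper showing only the formatting a custom key generator might perform (Hudi key generators themselves are Java classes). Hive-style `name=value` segments let Hive and Spark infer partition columns from the directory layout.

```python
from datetime import date


def partition_path(d: date, hive_style: bool) -> str:
    """Hypothetical formatter for a date-based partition path."""
    if hive_style:
        # name=value segments, so query engines can discover partition columns
        return f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}"
    return f"{d.year:04d}/{d.month:02d}/{d.day:02d}"
```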

Re: [DISCUSS] Decouple Hudi and Spark

2019-08-14 Thread Vinoth Chandar
> > > >> On first focussing on decoupling of Spark and Hudi alone, yes a full > summary of how Spark is being used in a wiki page is a good start IMO. We > can then hash out what can be generalized and what cannot be and needs to > be left in hudi-client-spark vs hudi-client

Re: Contributing to Apache Hudi

2019-08-13 Thread Vinoth Chandar
Done! On Tue, Aug 13, 2019 at 6:44 AM leesf wrote: > Hi, > > I want to contribute to Apache Hudi. > Would you please give me the contributor permission? > My JIRA ID is xleesf. > > leesf wrote on Tue, Aug 13, 2019 at 9:42 PM: > > > Hi, > > > > I want to contribute to Apache Calcite. > > Would you please give

Re: Unable to subscribe to slack group

2019-08-13 Thread Vinoth Chandar
Hi Pratyaksh, We have pre-approved anyone with an @apache.org email and a few others. Typically, https://github.com/apache/incubator-hudi/issues/143 is used for reporting the email to be added. Can you provide your email there? We will add you in. P.S: I realize there is a documentation gap on

Re: [DISCUSS] Promote Hudi Chinese Documentation into the official website

2019-08-13 Thread Vinoth Chandar
+1 Thanks for starting this initiative, Vino. I also suggest we add a new component in JIRA with a few volunteers to help review PRs that come in this area? On Tue, Aug 13, 2019 at 9:02 AM Gary Li wrote: > +1 This is a great idea. I think there are also some room for improvement > for the

Re: [DISCUSS] Refactor the package name of Hudi

2019-08-13 Thread Vinoth Chandar
> Thanks for your effort! Good job! > > Another question that maybe not related to this thread: Shall we replace > all the keyword "Hoodie" to "Hudi"? > > What do you think? > > Best, > Vino > > Vinoth Chandar wrote on Mon, Aug 12, 2019 at 8:51 AM: > > > T

Re: [QUESTION] May I ask if the Hudi contributor JIRA group can receive the notification email.

2019-08-11 Thread Vinoth Chandar
Okay. I will then proceed with engaging INFRA on this issue. On Thu, Aug 8, 2019 at 5:01 AM Kabeer Ahmed wrote: > +1 > > On Aug 7 2019, at 10:24 pm, Vinoth Chandar wrote: > > Hi Luciano, > > > > > > please consider having a notifications list to avoi

Re: [QUESTION] May I ask if the Hudi contributor JIRA group can receive the notification email.

2019-08-07 Thread Vinoth Chandar
issues that are delivered to concerned contributors, who are watching the issue say. Hope that clarifies. On Wed, Aug 7, 2019 at 1:37 PM Luciano Resende wrote: > On Wed, Aug 7, 2019 at 4:42 AM Vinoth Chandar wrote: > > > > Alright.. May be give this two more days per > > h

Re: [QUESTION] May I ask if the Hudi contributor JIRA group can receive the notification email.

2019-08-07 Thread Vinoth Chandar
M PDT, Bhavani Sudha > > Saktheeswaran wrote: > > > > +1 I think it would be useful > > > > On Tue, Aug 6, 2019 at 9:45 AM Vinoth Chandar wrote: > > > > > This is what I see on the Notification settings . This sort of explains > > > it..

Re: Committership guidelines

2019-08-07 Thread Vinoth Chandar
Got a few reviews on the PR. So going ahead and merging. Please feel free to still leave comments on the PR, if you still have em. On Mon, Aug 5, 2019 at 9:10 AM Vinoth Chandar wrote: > Gentle reminder to review this PR! :) Have a great week! > > On Fri, Aug 2, 2019 at 5:29 AM Vinot
