[DISCUSS] Logos on project front page.

2020-05-12 Thread Vinoth Chandar
Hello all, This was raised during the graduation discussion. We have been referred to [1]. The doc ends by saying: "These best practices for linking to outside pages on project websites are meant as suggestions for projects. PMCs are free to adopt (or not) any of these suggestions for their sites."

Re: [DISCUSS] Why add unit tests for hudi-cli module

2020-05-12 Thread Vinoth Chandar
+1 People rely on CLI to operate on Hudi datasets. So having some tests there, would definitely be useful On Tue, May 12, 2020 at 12:39 PM Shiyan Xu wrote: > Hi, the tests in hudi-cli are more of functional tests. They are conducive > to verifying features in cli module are working. Though not c

[ANNOUNCE] Thread on Hudi graduation started on general@incubator

2020-05-11 Thread Vinoth Chandar
Hello all, Please follow along here https://lists.apache.org/thread.html/r10d8c2038f1131c4da3e0cc2759da3554fab846bd623830251327b27%40%3Cgeneral.incubator.apache.org%3E It would be great to have more hands to answer the feedback/comments that may be raised. If you are part of the IPMC already, pl

Re: [DISCUSS] Bug bash?

2020-05-10 Thread Vinoth Chandar
for the detailed pointers. Will work on it. > >> > >> On Thu, May 7, 2020 at 1:50 AM Vinoth Chandar > wrote: > >> > >> > siva, That would be great. Next step is to put together a bug list > >> > > >> > - Scour existing 0.6.0 tickets, n

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-05-10 Thread Vinoth Chandar
> On Tue, Apr 21, 2020 at 9:34 AM nishith agarwal > wrote: > > > +1, thanks for starting this effort Satish! > > > > -Nishith > > > > On Fri, Apr 17, 2020 at 2:26 PM Vinoth Chandar > wrote: > > > > > Thanks Satish! > > > > > >

[RESULT] [VOTE] Apache Hudi graduation to top level project

2020-05-09 Thread Vinoth Chandar
) * Shaofeng Li * Balaji Varadarajan * Thomas Weise (Mentor) * Vinoth Chandar *Non PPMC +1 Votes (10)* *** Gary Li * Sivabalan Narayanan * Shiyan (Raymond) Xu * wxhjsxz * Lamber-ken * cooper * Y Ethan Guo * Tison * Pratyaksh Sharma * Shaofeng Shi 21 +1 votes and zero -1 votes. Vote

Re: [VOTE] Apache Hudi graduation to top level project

2020-05-09 Thread Vinoth Chandar
group: user-subscr...@kylin.apache.org > > Join Kylin dev mail group: dev-subscr...@kylin.apache.org > > > > > > > > > > Pratyaksh Sharma wrote on Fri, May 8, 2020 at 3:12 AM: > > > > > +1 > > > > > > Would love to see Hudi as a TLP. > > >

Re: Use DataFrame in HoodieWriteClient

2020-05-08 Thread Vinoth Chandar
Hi Dongwook, You can already write a Spark DataFrame into a Hudi table.. Please see quickstart for examples.. Let us know if you meant something else. Thanks Vinoth On Thu, May 7, 2020 at 10:29 AM Dongwook Kwon wrote: > Hi, Hudi community. > > For write operations, in HoodieWriteClient, I wond

Re: Extracting a partition field from something like a timestamp using Deltastreamer / Configurations?

2020-05-08 Thread Vinoth Chandar
dn't work either, which leads me to believe > that the spark jobs were actually deserializing those variables to use > later on - it was a nice learning curve for me. > > And definitely not throwing stones on the documentation - documentation is > not easy... > > Thanks ag

Re: Extracting a partition field from something like a timestamp using Deltastreamer / Configurations?

2020-05-06 Thread Vinoth Chandar
nt approaches > but I wouldn't mind throwing some documentation together to help folks out. > > Let me know if you need anything else to help move this along - surely I > can't be the only one that needed it! :-) > > Allen > > On Tue, May 5, 2020 at 11:22 AM Vin

Re: [DISCUSS] Bug bash?

2020-05-06 Thread Vinoth Chandar
rote: > I could lend a hand if we need any help in organizing this. LMK. > > On Tue, Apr 28, 2020 at 3:19 AM Bhavani Sudha > wrote: > > > On Mon, Apr 27, 2020 at 10:21 PM Vinoth Chandar > wrote: > > > > > Great! I will prep the bugs and do unif

[VOTE] Apache Hudi graduation to top level project

2020-05-06 Thread Vinoth Chandar
* Nishith Agarwal * Prasanna Rajaperumal * Shaofeng Li * Steve Blackmon * Suneel Marthi * Thomas Weise * Vino Yang * Vinoth Chandar

Re: [DISCUSS] should we do a 0.5.3 patch set release ?

2020-05-06 Thread Vinoth Chandar
Hi Sudha, +1 on the overall idea.. I tried to pick out few of these PRs that are - Small enough to apply easily - Have limited scope, fixing pointed problems - Have high impact on performance or usability [HUDI-799] Use appropriate FS when loading configs https://github.com/apache/incubator-h

Weekly Sync 20200505

2020-05-05 Thread Vinoth Chandar
Really fancy date timestamp :) Anyways, here are the notes https://cwiki.apache.org/confluence/display/HUDI/20200505+Weekly+Sync+Minutes

Re: [DISCUSS] Return schema provider as optional?

2020-05-05 Thread Vinoth Chandar
I think discussions are going on in the PR itself.. Please chime in there as well if this suits you. On Sat, May 2, 2020 at 6:01 AM Shiyan Xu wrote: > Hi all, > > In case of reading schema-inferable source like parquet, when no new data > is found, then, if i understand correctly, no schema can

Re: Extracting a partition field from something like a timestamp using Deltastreamer / Configurations?

2020-05-05 Thread Vinoth Chandar
On Mon, May 4, 2020 at 8:40 PM Vinoth Chandar wrote: > >> Thanks both! >> >> @allen heard this many times :) hear you. You could write a small class >> yourself with your custom logic and throw it in there? >> >> If you think there is a way to fix the key genera

Re: Extracting a partition field from something like a timestamp using Deltastreamer / Configurations?

2020-05-04 Thread Vinoth Chandar
without > the fail and vice versa. Good times. > > After I figure this out I'll see if I can put this information somewhere > easy to find. > > On Mon, May 4, 2020 at 12:23 PM Vinoth Chandar wrote: > >> Hi Allen, >> >> You are able to configure the key

Re: [DISCUSS] moving blog from cwiki to website

2020-05-04 Thread Vinoth Chandar
posts so whichever method we > choose, it should be quick to move the posts ready once the infra is set > up. > > Please chime in with your suggestions and preferences. > > Thanks > Prashant > > > On Fri, May 1, 2020 at 9:06 AM Vinoth Chandar wrote: > > > Tha

Re: Extracting a partition field from something like a timestamp using Deltastreamer / Configurations?

2020-05-04 Thread Vinoth Chandar
Hi Allen, You are able to configure the key generator for deltastreamer using this property (either via a file or --config ) hoodie.datasource.write.keygenerator.class You might be interested in this built-in generator. https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/

Re: [DISCUSS] Readiness for graduation to TLP

2020-05-01 Thread Vinoth Chandar
sende > > wrote: > > > > > +1 > > > > > > On Mon, Apr 27, 2020 at 10:06 PM Vinoth Chandar > > wrote: > > > > > > > > Hello all, > > > > > > > > I would like to start a discussion on our readiness to

Re: [DISCUSS] moving blog from cwiki to website

2020-05-01 Thread Vinoth Chandar
That’d be awesome! Thanks! On Fri, May 1, 2020 at 9:06 AM Prashant Wason wrote: > Hi Vinoth, > > Sure, I will prioritize this. Hope to have something by this weekend. > > Thanks > Prashant > > > On Wed, Apr 29, 2020 at 8:31 PM Vinoth Chandar wrote: > > > H

Re: [DISCUSS] Add Github and Twitter Widget on Hudi's official website

2020-05-01 Thread Vinoth Chandar
+1 these are the little things that matter :) On Fri, May 1, 2020 at 2:28 AM wangxianghu wrote: > Hi vino, > That’s a good idea, Adding the Github to Hudi’s official website will make > it more convenient to get Hudi's source code, and since Twitter is one of > the most popular social tools,

Re: [DISCUSS] moving blog from cwiki to website

2020-04-29 Thread Vinoth Chandar
hting etc.. > > On Wed, Apr 22, 2020 at 10:08 AM Prashant Wason > wrote: > >> I can help drive this. Let me take a look at some other projects and >> suggest how to go about it. >> >> Thanks >> Prashant >> >> >> On Wed, Apr 22, 2020, 9:31

20200428 Weekly Sync

2020-04-28 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200428+Weekly+Sync+Minutes Minutes for your reference!

Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread Vinoth Chandar
>>> > >>>> +1 to pursue graduation. I certainly think we are ready. Will chime in > >> on > >>>> voting thread when you start it. > >>>> > >>>> Thanks, > >>>> Sudha > >>>> > >>>>

Re: [DISCUSS] Bug bash?

2020-04-27 Thread Vinoth Chandar
Sivabalan wrote on Thu, Apr 23, 2020 at 9:30 PM: > > > > > > > +1 > > > > > > > > On Wed, Apr 22, 2020 at 7:29 PM lamber-ken > wrote: > > > > > > > > > > > > > > > > > >

Re: Checking out the asf svn repo

2020-04-27 Thread Vinoth Chandar
Done.. Thanks lamber-ken for being watchful, as always! On Thu, Apr 23, 2020 at 9:52 PM Vinoth Chandar wrote: > Thanks!. Will wait for other feedback and the current changes to show up > and then fix on top. > > On Thu, Apr 23, 2020 at 12:33 PM lamberken wrote: > >> >

[DISCUSS] Readiness for graduation to TLP

2020-04-27 Thread Vinoth Chandar
Hello all, I would like to start a discussion on our readiness to pursue graduation to TLP and potentially follow up with a VOTE with a formal resolution. To seed the discussion, our community's achievements since entering the Incubator in early 2018 include the following: - Accepted > 500 patch

Re: [Discussion] Abstract common meta sync module support multiple meta service

2020-04-27 Thread Vinoth Chandar
+1 Will get around to reviewing this more closely this week. On Mon, Apr 27, 2020 at 11:11 AM Gary Li wrote: > Hi Wei, > > Thanks for the proposal. +1 from my side. This is definitely a very useful > feature. > > Best Regards, > Gary Li > > > On 4/27/20, 5:16 AM, "wei li" wrote: > > Curre

Re: [DISCUSS] Next Release timeline

2020-04-26 Thread Vinoth Chandar
Given enough time has passed, we can proceed this way, with Sudha as RM . Please respond if anyone has more to add On Sun, Apr 26, 2020 at 1:12 PM Balaji Varadarajan wrote: > +1 on Sudha being RM and targeting next release for mid may. > > Balaji.V > > On 2020/04/23 14:27:46,

Re: Generic Types of HoodieRecordPayload

2020-04-23 Thread Vinoth Chandar
it and the raw usage forever. > > Best, > tison. > > > Vinoth Chandar wrote on Thu, Apr 23, 2020 at 10:31 PM: > > > Thanks Tison! One consideration we need to have is that we cannot have a > > breaking non-backwards compatible change for existing users with custom > > payloads.

Re: Checking out the asf svn repo

2020-04-23 Thread Vinoth Chandar
https://svn.apache.org/repos/asf/incubator/public/trunk/content/projects/hudi.xml > > Best, > Lamber-Ken > > On 2020/04/23 19:04:19, Vinoth Chandar wrote: > > Good catch.. Fixed! > > > > On Thu, Apr 23, 2020 at 11:57 AM lamberken wrote: > > > > >

Re: Checking out the asf svn repo

2020-04-23 Thread Vinoth Chandar
Good catch.. Fixed! On Thu, Apr 23, 2020 at 11:57 AM lamberken wrote: > Hi Vinoth, > > The browser shows that hudi.xml contains a syntax error. > > https://svn.apache.org/repos/asf/incubator/public/trunk/content/projects/hudi.xml > > Best, > Lamber-Ken > > On 2020/04/23 1

Re: Checking out the asf svn repo

2020-04-23 Thread Vinoth Chandar
Finally figured out.. :/ Updated the status file now, to reflect latest information all, please take a look and spot any errors (if any) https://svn.apache.org/viewvc/incubator/public/trunk/content/projects/hudi.xml?revision=1876904&view=markup On Mon, Apr 20, 2020 at 5:03 PM Vinoth Cha

Re: Generic Types of HoodieRecordPayload

2020-04-23 Thread Vinoth Chandar
't bring a lot benefit. > > Best, > tison. > > [1] https://issues.apache.org/jira/browse/HUDI-834 > [2] org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java:114 > > > > Vinoth Chandar wrote on Thu, Apr 23, 2020 at 12:04 AM: > > > +1 raising a JIRA and summarizing some findings wo

Re: [DISCUSS] Next Release timeline

2020-04-23 Thread Vinoth Chandar
Thanks all. Encourage everyone to chime in more, so we can make a decision here! On Thu, Apr 23, 2020 at 6:29 AM Sivabalan wrote: > sounds good. We could go with a major by mid may. > > On Wed, Apr 22, 2020 at 12:58 PM Vinoth Chandar wrote: > > > +1 on Sudha being the RM >

Re: [Discussion] Abstraction for HoodieInputFormat and RecordReader

2020-04-22 Thread Vinoth Chandar
Hi Gary, On COW, today we already let the engines (Spark, Hive, Presto) use their own readers for parquet.. But as we embark on MOR snapshot query (aka realtime inputformat), it may make sense to have abstractions in our own code base to efficiently read base + log files (file slice) out in diffe

[DISCUSS] Bug bash?

2020-04-22 Thread Vinoth Chandar
Just floating a very random idea here. :) Would there be interest in doing a bug bash for a week, where we aggressively close out some pesky bugs that have been lingering around.. If enough committers and contributors are around, we can move the needle. We could time this a week before cutting RC

Re: [DISCUSS] moving blog from cwiki to website

2020-04-22 Thread Vinoth Chandar
:08 AM Prashant Wason wrote: > I can help drive this. Let me take a look at some other projects and > suggest how to go about it. > > Thanks > Prashant > > > On Wed, Apr 22, 2020, 9:31 AM Vinoth Chandar wrote: > > > Any volunteers to drive this? (also may be

Re: [DISCUSS] Next Release timeline

2020-04-22 Thread Vinoth Chandar
+1 on Sudha being the RM My preference would be to do a major release as well, targeting mid may (which means code freeze in 3 weeks?) This gives us enough time to land some major features as well as stabilize them as much as possible. On Wed, Apr 22, 2020 at 3:21 AM Pratyaksh Sharma wrote: > M

Re: [DISCUSS] moving blog from cwiki to website

2020-04-22 Thread Vinoth Chandar
> > > Hi Vinoth, > > > > > > > > > > +1 for moving blogs. > > > > > > > > > > cwiki looks belong to developer's scope and the first experience of > > > users > > > > > is more likely our website. >

Re: Generic Types of HoodieRecordPayload

2020-04-22 Thread Vinoth Chandar
> > > We don't actually use the type parameter heavily, so it is an alternative > > that we define HoodieRecordPayload just > > > > public class HoodieRecordPayload { } > > > > If the community think it is a worth effort, I'm glad to do mo

Re: [DISCUSS] Support popular metrics reporter

2020-04-21 Thread Vinoth Chandar
+1 from me as well On Mon, Apr 20, 2020 at 9:37 PM vino yang wrote: > Hi Raymond, > > Thanks for opening this discussion. > > IMHO, as Hudi's user base grows, we need to enhance our metrics reporter. > From an ecological point of view, this is also very important. > > So, +1 from my side. > > Be

Re: Generic Types of HoodieRecordPayload

2020-04-21 Thread Vinoth Chandar
Hi Tison, Thanks for raising this.. In most places doing a HoodieTable wildcard should be totally acceptable, since much of the code actually does not depend on the templatized type at all.. Def, worth taking another look holistically and see if we can address this.. My 2c. Vinoth On Mon, Apr

Re: [ATTN] JUnit 5 adoption

2020-04-21 Thread Vinoth Chandar
+1 Appreciate the efforts, Raymond! [Wondering if there is a way to stick a checkstyle rule to this effect. guess it won't check for new changes alone, rather complain about existing junit 4 tests?] On Tue, Apr 21, 2020 at 5:10 PM Shiyan Xu wrote: > Hi all, > > We're in progress with JUnit 5 mi

[DISCUSS] moving blog from cwiki to website

2020-04-21 Thread Vinoth Chandar
Hi community, What does everyone feel about moving blogs we have on cwiki now over to site so they are better discovered? Thanks Vinoth

20200421 Weekly Sync Minutes

2020-04-21 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200421+Weekly+Sync+Minutes

Re: Checking out the asf svn repo

2020-04-20 Thread Vinoth Chandar
[1] https://infra.apache.org/version-control.html > > > > > Best, > Lamber-Ken > > > > > > > > > > > > At 2020-04-17 08:24:01, "Vinoth Chandar" wrote: > >Hello all, > > > >Can anyone here (potentially from prior experienc

Re: apply for contributor permission

2020-04-20 Thread Vinoth Chandar
Done. Welcome to the community! On Mon, Apr 20, 2020 at 8:05 AM Lisheng Wang wrote: > Hi, > > I want to contribute to Apache Hudi. > > Would you please give me the contributor permission? > > My JIRA ID is wanglisheng. > > > Best, > Lisheng >

Re: Mail from lansane

2020-04-20 Thread Vinoth Chandar
Done! Welcome to the community! On Mon, Apr 20, 2020 at 8:06 AM lansane wrote: > hi, > i want to contribute to Apache Hudi. > would you please give me the contributor permission? > my JIRA ID is lansane.

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-17 Thread Vinoth Chandar
> file-groups which does not have any incoming records assigned. This is > for > > the case when we have fewer incoming records to fit into all existing > > file-groups. Existing file groups will be reused. > > Agree, on the magic part. > > Balaji.V On Thursday,

Re: Hudi concurrent writes

2020-04-17 Thread Vinoth Chandar
5 > > HoodieWriteClient constructor > > https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java#L120 > HoodieWriteClient rollbackPending Method > > https://github.com/apache/incubator-hudi/blob/master/hudi-clie

Checking out the asf svn repo

2020-04-16 Thread Vinoth Chandar
Hello all, Can anyone here (potentially from prior experience with other apache projects) point me, to how I can checkout the apache svn repo here? https://svn.apache.org/viewvc/incubator/public/trunk/content/projects/hudi.xml?view=log Would like to make some edits to our status file.. Specifica

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-16 Thread Vinoth Chandar
a > different operation type which can be achieved with same API (with flags). > Balaji.V > > On Thursday, April 16, 2020, 09:54:09 AM PDT, Vinoth Chandar < > vin...@apache.org> wrote: > > Hi Satish, > > Thanks for starting this.. Your use-cases do sound very

Re: [DISCUSS] Insert Overwrite with snapshot isolation

2020-04-16 Thread Vinoth Chandar
Hi Satish, Thanks for starting this.. Your use-cases do sound very valuable to support. So +1 from me. IIUC, you are implementing a partition level overwrite, where existing filegroups will be retained, but instead of merging, you will just reuse the file names and write the incoming records in

Re: [HELP WANTED] Codecov report skips JUnit 5 test cases

2020-04-14 Thread Vinoth Chandar
i, > > > > AFAIK codecov doesn't really work with the test framework. we only upload > > the Cobertura reports collected locally in the build system. > > > > Can you please verify that the local code coverage reports pick up JUnit5 > > changes ? > >

Re: [HELP WANTED] Codecov report skips JUnit 5 test cases

2020-04-14 Thread Vinoth Chandar
This one is probably worth flagging with codecov support as well? Does seem weird.. :/ On Mon, Apr 13, 2020 at 11:14 PM Shiyan Xu wrote: > Hi all, > > We're migrating all test cases to JUnit 5. > > This PR, as an initial step to enable JUnit 5, has migrated quite a few > test cases. The test cas

Re: Manual deletion of a parquet file

2020-04-14 Thread Vinoth Chandar
keys and rewrite the file. Wanted to check if > it would add any value to our code base and if I should raise a PR for the > same. If the community agrees, then we can work together to further improve > it and make it generic enough. > > On Mon, Apr 13, 2020 at 8:22 PM Vinoth Chandar

Re: Hudi concurrent writes

2020-04-14 Thread Vinoth Chandar
Hi Brandon, This is more of practical advice than sharing how to solve it using Hudi. By and large, this need can be mitigated by serializing your writes in an upstream message queue like Kafka.. For e.g , lets say you want to delete some records in a table, that is being currently ingested by del

Re: Manual deletion of a parquet file

2020-04-13 Thread Vinoth Chandar
Hi Pratyaksh, Your understanding is correct. There is a duplicate fix tool in the cli (I wrote this a while ago for cow, but did use it in production a few times for situations like these). Check that out? IIRC it will keep both the commits and their files, but simply get rid of the duplicate reco

Re: [ANNOUNCE] Hudi Weekly Update(2020-04-05 ~ 2020-04-12)

2020-04-12 Thread Vinoth Chandar
Thanks again for the updates!! Folks, from this week we are including jiras where help is wanted in this update. If you are looking to contribute code, this is a great way to discover impactful jiras On Sun, Apr 12, 2020 at 5:56 AM leesf wrote: > Dear community, > > Nice to share Hudi community

Re: Please add permission for assign to me

2020-04-10 Thread Vinoth Chandar
Done. and welcome to the apache hudi community! On Fri, Apr 10, 2020 at 7:37 AM JY F wrote: > Hi all, > > I want to contribute to Apache Hudi. > Would you give me the contributor permission? > My JIRA id is "Kotomi" > > Thanks & Regards >

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-04-09 Thread Vinoth Chandar
Thanks Raymond.. We can continue engaging on the ticket! On Thu, Apr 9, 2020 at 4:54 PM Shiyan Xu wrote: > Filed! > https://issues.apache.org/jira/browse/HUDI-779 > > On Wed, Apr 8, 2020 at 11:05 PM Vinoth Chandar wrote: > > > +1 on an umbrella task.. > > > >

Re: [DISSCUSS] Troubleshooting flow

2020-04-08 Thread Vinoth Chandar
Mon, 6 Apr 2020, 19:19 Balaji Varadarajan, > > > wrote: > > > > > Agree. The triaging process makes sense to me. > > > Balaji.V > > > On Monday, April 6, 2020, 09:54:24 AM PDT, Vinoth Chandar < > > > vin...@apache.org> wrote: > > >

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-04-08 Thread Vinoth Chandar
maybe could upgrade by module. > > > > vino yang wrote on Thu, Apr 2, 2020 at 4:38 PM: > > > > > Hi Shiyan, > > > > > > +1 from my side. > > > > > > Best, > > > Vino > > > > > > Vinoth Chandar wrote on Mon, Mar 30, 2020 at 11:00 PM: >

Re: Pyspark with hudi scripts

2020-04-08 Thread Vinoth Chandar
Thanks Udit! I also believe there will be a PR soon for pySpark and we should have formal support next release. On Wed, Apr 8, 2020 at 4:49 PM Mehrotra, Udit wrote: > Hi Yaswanth, > > PFA an example I prepared sometime back which can help you get started. > > Thanks, > Udit > > On 4/8/20, 3:

20200407 Weekly Sync Minutes

2020-04-07 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200407+Weekly+Sync+Minutes

New PPMC Member : Bhavani Sudha

2020-04-07 Thread Vinoth Chandar
Hello all, I am very excited to share that we have a new PPMC member - Sudha. She has been a great champion for the project for almost a couple of years now, driving a lot of Presto/query-engine-facing changes and most of all being the face of our community to new users on Slack, over the past few months

New Committer: lamber-ken

2020-04-07 Thread Vinoth Chandar
Hello Apache Hudi Community, The Podling Project Management Committee (PPMC) for Apache Hudi (Incubating) has invited lamber-ken (Xie Lei) to become a committer and we are pleased to announce that he has accepted. lamber-ken has had a large impact in Hudi, with some sustained efforts in the pa

Re: Re: [DISSCUSS] Troubleshooting flow

2020-04-06 Thread Vinoth Chandar
11:45 AM Bhavani Sudha > > >wrote: > > > > > >> Agree on using GH issues to post code snippets or debugging issues. > > >> > > >> Regarding mirroring slack to commits, the last time I checked there > was > > no > > >

Re: [DISSCUSS] Troubleshooting flow

2020-04-02 Thread Vinoth Chandar
Hello all, Actually that's how we have been using GH issues.. Both slack/ml are inconvenient for sharing code and having long threaded conversations. (same issues raised here). That said, we could definitely formalize this and look to move slack threads into GH issue for triaging (then follow up

Re: I want to contribute to Apache Hudi.

2020-04-02 Thread Vinoth Chandar
Thanks. Suneel! On Thu, Apr 2, 2020 at 5:19 AM Suneel Marthi wrote: > done. > > On Thu, Apr 2, 2020 at 8:18 AM 郭鹏 <18017339...@163.com> wrote: > > > Hi, > > > > I want to contribute to Apache Hudi. Would you please give me the > > contributor permission? My JIRA ID is GarudaGuo. > > > > > > >

20200331 Weekly Sync Minutes

2020-03-31 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200331+Weekly+Sync+Minutes Stay safe everyone!

Re: HoodieSnapshotExporter

2020-03-30 Thread Vinoth Chandar
Please spread the word :) https://twitter.com/apachehudi/status/1244670019079794688 On Sat, Mar 28, 2020 at 6:45 PM Shiyan Xu wrote: > Sure Vinoth, please feel free to make edits. > > On Sat, 28 Mar 2020, 15:33 Vinoth Chandar, wrote: > > > +1 great contribution everyone. >

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-03-30 Thread Vinoth Chandar
to be carried out in a long-running ongoing > fashion. > > Any thoughts or feedback? > > On Wed, Mar 25, 2020 at 7:52 AM Vinoth Chandar wrote: > > > +1 on Junit5. > > does seem nicer with support for lambdas. assuming we do a gradual > > rollout. At any point, we can

Re: HoodieSnapshotExporter

2020-03-28 Thread Vinoth Chandar
+1 great contribution everyone. Thanks for the blog raymond. Will make some minor edits if you dont mind and tweet from our handle.:) On Sat, Mar 28, 2020 at 12:50 AM vino yang wrote: > Hi Raymond, > > Thanks for driving this valuable feature! Having this tool, it would be > easier for backup p

Re: Get all deletes after a specific commit time

2020-03-26 Thread Vinoth Chandar
nally? On Thu, Mar 26, 2020 at 1:47 PM Vinoth Chandar wrote: > Currently, soft deletes will show up in the incremental stream, while hard > deletes will not.. > > We are debating how to add this feature, since it has come up a few times > recently.. > > May be this can be a go

Re: Get all deletes after a specific commit time

2020-03-26 Thread Vinoth Chandar
Currently, soft deletes will show up in the incremental stream, while hard deletes will not.. We are debating how to add this feature, since it has come up a few times recently.. Maybe this can be a good discuss thread for that? :) On Thu, Mar 26, 2020 at 1:42 PM Joaquim S wrote: > Folks, > >

Re: DMS - org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class

2020-03-25 Thread Vinoth Chandar
high caps case. I switched column names and table names to lower case and > it works perfectly. > > > > Vinoth Chandar wrote on Wednesday, 25/03/2020 at > 11:04: > > > Hi, > > > > That's surprising..Do you have --source-class > > org.apache.hud

Re: [DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

2020-03-25 Thread Vinoth Chandar
Hi Pratyaksh, Thanks for opening this. Will review and get back to you! Thanks Vinoth On Sat, Mar 21, 2020 at 2:35 AM Pratyaksh Sharma wrote: > @Balaji @Vinoth Chandar , > > Here is a small attempt to make this a generic one - > https://github.com/apache/incubator-hudi/pull/1433/f

Re: [ANNOUNCE] Apache Hudi (incubating) 0.5.2 released

2020-03-25 Thread Vinoth Chandar
This was an important release .. Thanks for driving this, vino! Congrats to everyone involved! Please spread the word https://twitter.com/apachehudi/status/1243035380586123265 On Wed, Mar 25, 2020 at 8:07 PM tison wrote: > Congrats! > > Best, > tison. > > > vino yang wrote on Thu, Mar 26, 2020 at 10:19 AM: >

Re: Sequence of Transformers

2020-03-25 Thread Vinoth Chandar
apply > all the way through the list. > The implementation can be minimal for this approach. > > On Mon, Mar 23, 2020 at 4:12 PM Vinoth Chandar wrote: > > > sg. Filed https://issues.apache.org/jira/browse/HUDI-731 > > > > Someone looking to pick this? :). Its an n

Re: DMS - org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class

2020-03-25 Thread Vinoth Chandar
Hi, That's surprising.. Do you have --source-class org.apache.hudi.utilities.sources.ParquetDFSSource? I ask since for Row based sources, the schema provider is auto configured as shown in the blog page.. Thanks Vinoth On Tue, Mar 24, 2020 at 11:07 AM Joaquim S wrote: > Team, > > When following t

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-03-25 Thread Vinoth Chandar
+1 on Junit5. does seem nicer with support for lambdas. assuming we do a gradual rollout. At any point, we cannot have any of the core tests disabled :) May be we can use the vintage framework for now, do minimal changes migrate and then proceed to redoing the tests On AssertJ type frameworks, I

Re: [NOTIFICATION] Auto generation asf-site feedback

2020-03-24 Thread Vinoth Chandar
Currently, the new site is published to a "test-content" folder. Our plan is to try this for 1 week and then actually cut over to "content" which is what powers the site. Kudos to lamber-ken for the perseverance in getting this done! On Tue, Mar 24, 2020 at 5:19 PM lamberken wrote: > Hi team,

Re: Sequence of Transformers

2020-03-23 Thread Vinoth Chandar
olks using DMS transformer and that need some kind > of transformation before the DMS transformer adds the op field for initial > load or when loading the CDC. In the meantime, I will create a custom > transformer. > > Thanks again, > -F. > > > Vinoth Chandar wrote

Re: Sequence of Transformers

2020-03-22 Thread Vinoth Chandar
Hi F, The Transformer interface allows you to basically plugin anything that takes a DataFrame and returns a transformed DataFrame. Does that help? If you are talking about implementing support for chained calling of multiple Transformers, within DeltaStreamer itself..It has been discussed before.
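The chained-Transformer idea mentioned in this message can be sketched roughly as follows. This is only an illustration in plain Python, not Hudi's actual Transformer API: a list of dicts stands in for a DataFrame, and the names `chain`, `lower_case_columns` and `add_op_field` are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List

# Stand-ins for illustration only: a "frame" is a list of row dicts,
# not a real Spark DataFrame.
Row = Dict[str, object]
Frame = List[Row]
Transformer = Callable[[Frame], Frame]


def chain(transformers: List[Transformer]) -> Transformer:
    """Compose transformers so each one receives the previous one's output."""
    def apply_all(frame: Frame) -> Frame:
        return reduce(lambda acc, t: t(acc), transformers, frame)
    return apply_all


def lower_case_columns(frame: Frame) -> Frame:
    # Hypothetical first step: normalize column names to lower case.
    return [{key.lower(): value for key, value in row.items()} for row in frame]


def add_op_field(frame: Frame) -> Frame:
    # Hypothetical second step: add an "op" column when it is missing.
    return [{**row, "op": row.get("op", "I")} for row in frame]


pipeline = chain([lower_case_columns, add_op_field])
result = pipeline([{"ID": 1, "NAME": "a"}])
```

The order of the list is the order of application, so a custom step can run before a later one adds its fields. An empty list leaves the input unchanged.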

Re: [Online Meetup] Apache Kylin × Apache Hudi Meetup, Mar. 14, 2020

2020-03-22 Thread Vinoth Chandar
gh-performance data warehouse based on > > > Apache > > > > Hudi and Apache Kylin > > > > https://drive.google.com/open?id=1Pk_WdFxfEZxMMfAOn0R8-m3ALkcN6G9e > > > > > > > > Vinoth Chandar wrote on Thu, Mar 12, 2020 at 2:31 PM: > > >

Re: Bring back support for spark 2.3?

2020-03-21 Thread Vinoth Chandar
ago but I guess the error was related to parquet-avro > mismatch only. Let me try reproducing it and will file a jira this time. > > On Tue, Feb 25, 2020 at 12:29 PM Vinoth Chandar wrote: > > > Hi Pratyaksh, > > > > Actually we voted on the mailing list here, to move

Re: Query regarding restoring HUDI tables to older commits

2020-03-21 Thread Vinoth Chandar
Hi all, Good discussion. let me try and tease this apart. Rollback. : Should only be used for rolling back an inflight write.. Nothing else IMO.. This is where we guarantee that there will be no impact to readers/query engines. Restore : It's an invasive maintenance operation, that will be disru

Re: [VOTE] Release 0.5.2-incubating, release candidate #2

2020-03-21 Thread Vinoth Chandar
+1 binding Repeated tests from RC1 On Sat, Mar 21, 2020 at 5:44 AM vino yang wrote: > +1 binding > > - checked signature & checksum > - maven clean package -DskipTests > - ran `release/validate_staged_release.sh` > - check RAT (OK) > > Best, > Vino > > Suneel Marthi wrote on Sat, Mar 21, 2020 at 8:33 PM: > >

Re: [NOTIFICATION] Hudi 0.5.2 Release Daily Report-20200319

2020-03-19 Thread Vinoth Chandar
Thanks for driving this crucial release, vino. On Thu, Mar 19, 2020 at 6:39 AM vino yang wrote: > Hi all, > > After clarifying our confusion with Justin, we finally found the problem we > needed to solve.[2] Currently, as long as we include the contents of the > NOTICE file of our bundle's proje

Re: [NOTIFICATION] Hudi 0.5.2 Release Daily Report-20200318

2020-03-19 Thread Vinoth Chandar
d this; > > > In short, our LICENSE contains a relatively uncommon description "*This > product includes code from *". In this regard, the main problem here is the > information asymmetry between us and the IPMC. I don't think it makes much > sense to refer to other

Re: Question

2020-03-18 Thread Vinoth Chandar
Hi Syed, Please join the mailing list, so your responses make it here without needing approval. I am sure there is something odd going on here. A few things to check - Hudi does use memory for caching inputs and computing heuristics. I have seen slowness being caused by insufficient executor memory

Re: Query regarding restoring HUDI tables to older commits

2020-03-18 Thread Vinoth Chandar
Hi Prashant, Not sure if there is a specific reason. Mostly, it's because until recently, the clean metadata was not actually used. Currently, incremental cleaning will use it, but even then, it only relies on the partition paths being touched there.. So should be fine.. +100 though on consistently

Re: deltastreamer group.id Noeffectaftersetting

2020-03-18 Thread Vinoth Chandar
DeltaStreamer actually just uses the same mechanism as Spark Streaming to manage offsets. So wondering if you see the same behavior with a plain spark streaming job. ? It manages the offset checkpoints manually by itself within the hoodie commit metadata, to do exactly once ingestion of data.. On

Re: Question on DeltaStreamer

2020-03-18 Thread Vinoth Chandar
>>Lets say if I have a source table in Oracle in the format below, will my avro schema for source and target will be same. yes. if you do any transformations in between, then DeltaStreamer can make the target schema automatically. In the upcoming 0.5.2 release, we also have org.apache.hudi.u

Re: [NOTIFICATION] Hudi 0.5.2 Release Daily Report-20200318

2020-03-18 Thread Vinoth Chandar
Thanks for the update, vino! here's the -1 vote feedback for everyone's context.. As you bundled several ASF projects that have NOTICE files, their NOTICE > files need to be examined and parts added to your NOTICE file. [1] > License is missing information for this file copyright Twitter [3] > Per

Weekly Sync 20200317

2020-03-17 Thread Vinoth Chandar
Hi folks, Understandably low turnout, so we cancelled today's meeting. Thanks, Vinoth

Re: Small Files

2020-03-16 Thread Vinoth Chandar
udi/20191201/11/a98de531-2581-4d91-b3ba-189d758a06f9-0_12-95-1403_20200316070822.parquet > > > > -rw-r--r-- 3 svchdc110p Hadoop_cdp 14.9 M 2020-03-16 07:12 > > > /projects/transaction_details_hourly_hudi/20191201/11/9d875b04-1536-4b5d-bdd5-4d301019ca67-0_17-95-1408_2020031
