Re: GitHub release display issue

2020-07-20 Thread Vinoth Chandar
https://github.com/apache/hudi/releases/tag/0.5.3 I did not seem to have perms to disable this. Probably need infra ticket for this. For now, made the info accurate. On Fri, Jul 17, 2020 at 10:10 PM Vinoth Chandar wrote: > Anyone else with strong views on this? If not, i will go ahead and d

Re: Handling delta

2020-07-19 Thread Vinoth Chandar
Thanks everyone for helping out prakash! On Thu, Jul 16, 2020 at 10:24 AM Sivaprakash wrote: > Great !! > > Got it working !! > > 'hoodie.datasource.write.recordkey.field': 'COL1,COL2', > 'hoodie.datasource.write.keygenerator.class': > 'org.apache.hudi.keygen.ComplexKeyGenerator', > > Thank you.
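The two settings quoted in this thread can be sketched as a small helper. This is a minimal illustration, not code from the thread: the function name `hudi_composite_key_options` is hypothetical, and only the two option keys and the `ComplexKeyGenerator` class name come from the message above.

```python
def hudi_composite_key_options(key_columns):
    """Build the two write options quoted in the thread for a multi-column record key.

    `key_columns` is a list of column names; the helper itself is hypothetical.
    """
    if not key_columns:
        raise ValueError("at least one record key column is required")
    return {
        # Comma-separated list of the columns that together form the record key.
        "hoodie.datasource.write.recordkey.field": ",".join(key_columns),
        # Key generator named in the thread for composite keys.
        "hoodie.datasource.write.keygenerator.class":
            "org.apache.hudi.keygen.ComplexKeyGenerator",
    }


options = hudi_composite_key_options(["COL1", "COL2"])
for name, value in sorted(options.items()):
    print(f"{name} = {value}")
```

In a PySpark job these would presumably be merged with the other write options the job already uses (table name, path and so on, none of which appear in the thread) and passed through `DataFrameWriter.options(**options)`.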

Re: Expose HUDI CLI as a Service

2020-07-19 Thread Vinoth Chandar
You should have access now. Please let me know if it doesn't work On Sun, Jul 19, 2020 at 12:40 PM tanu dua wrote: > It’s tanudua > > Thanks. > > On Sun, 19 Jul 2020 at 9:10 PM, Vinoth Chandar wrote: > > > Absolutely. Please share your cwiki id. > > > >

Re: Incremental Query missing Deletions

2020-07-19 Thread Vinoth Chandar
ts under that ticket, @Vinoth Chandar > has given a low-level solution. > > IMO, this is a good feature that should be supported. > > Best, > Vino > > [1]: https://issues.apache.org/jira/browse/HUDI-480 > > Adam Feldman wrote on Fri, Jul 17, 2020 at 2:58 AM: > >> Hi, >>

Re: Expose HUDI CLI as a Service

2020-07-19 Thread Vinoth Chandar
Absolutely. Please share your cwiki id. On Sat, Jul 18, 2020 at 11:21 PM tanu dua wrote: > Can I please have an access of Confluence to post RFC > > On Sun, Jul 19, 2020 at 6:05 AM Vinoth Chandar wrote: > > > Great. Please feel free to post more followup thoughts here or on

Re: DISCUSS code, config, design walk through sessions

2020-07-18 Thread Vinoth Chandar
Let's freeze July 30 8 AM PST! Will send further details in a separate email thread! Look forward to this! On Thu, Jul 16, 2020 at 3:32 AM Zijing Guo wrote: > +1 for the time. > > > Sent from Yahoo Mail for iPhone > > > On Wednesday, July 15, 2020, 11:42 PM, Vinoth C

Re: Illustration of how Hudi's file sizing/temporal layout help query performance

2020-07-18 Thread Vinoth Chandar
Btw, just to show that these principles are generally true for parquet. I used the vanilla spark.read.parquet() for illustration. On Sat, Jul 18, 2020 at 8:07 PM Vinoth Chandar wrote: > Hi all, > > You might have heard this repeatedly mentioned over tickets, when we talk > about

Illustration of how Hudi's file sizing/temporal layout help query performance

2020-07-18 Thread Vinoth Chandar
Hi all, You might have heard this repeatedly mentioned over tickets, when we talk about Hudi paying some "tax" during write time to ensure query performance is good. These are conscious decisions we made, designing Uber's data lake for scale, and sometimes these are not appreciated when trying to

Re: Expose HUDI CLI as a Service

2020-07-18 Thread Vinoth Chandar
Great. Please feel free to post more followup thoughts here or on an RFC, as you prefer. On Thu, Jul 16, 2020 at 9:46 PM tanu dua wrote: > Thanks Vinoth. I understand now. I would also look timeline server to > understand more how it works. > > On Fri, Jul 17, 2020 at 9:33 AM Vi

Re: GitHub release display issue

2020-07-17 Thread Vinoth Chandar
Anyone else with strong views on this? If not, i will go ahead and drop the older release. On Fri, Jul 17, 2020 at 9:59 AM Shiyan Xu wrote: > +1 to remove github releases > > On Fri, Jul 17, 2020 at 6:44 AM Vinoth Chandar wrote: > > > Thanks for flagging this, Raymond! >

Re: GitHub release display issue

2020-07-17 Thread Vinoth Chandar
Thanks for flagging this, Raymond! I think we can just remove the github releases or mark it as old. All apache releases are hosted on asf infrastructure. Anyone? On Mon, Jul 13, 2020 at 11:42 PM Shiyan Xu wrote: > The new GitHub UI displays 0.4.7 as the latest, which is misleading. > Guess ad

Re: Expose HUDI CLI as a Service

2020-07-16 Thread Vinoth Chandar
help there. > I noticed that we start a javalin server when we start Spark program but > honestly I don’t know where do we use it . Do we use it in hudi spark code > ? Is it a good idea to access rest services from spark code ? > > On Thu, 16 Jul 2020 at 9:11 AM, Vinoth Chandar

Multi engine support PR review

2020-07-16 Thread Vinoth Chandar
Hello all, We have the multi-engine support PR up here https://github.com/apache/hudi/pull/1827/files While I plan to spend good amount of time reviewing the same, I also encourage everyone to take a look and s

Re: DISCUSS code, config, design walk through sessions

2020-07-15 Thread Vinoth Chandar
Great! Moving on to date. Would July 23/30 Thursday 8 AM PST work for everyone? On Tue, Jul 14, 2020 at 12:17 PM Shiyan Xu wrote: > +1 > > On Tue, Jul 14, 2020, 11:34 AM Vinoth Chandar wrote: > > > Typo: date TBD (not data :)) > > > > On Tue, Jul 14, 2020 at

Re: Expose HUDI CLI as a Service

2020-07-15 Thread Vinoth Chandar
On Mon, Jul 6, 2020 at 6:31 AM tanu dua wrote: > > > > > Sure me and my team can think of in contributing here. May I know if > > > something has already kicked off and the technologies that are used to > > > build the services and UI ? > > > > > > On

20200714 Weekly Sync Minutes

2020-07-14 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200714+Weekly+Sync+Minutes

Re: DISCUSS code, config, design walk through sessions

2020-07-14 Thread Vinoth Chandar
Typo: date TBD (not data :)) On Tue, Jul 14, 2020 at 11:20 AM Adam Feldman wrote: > +1 > > On Tue, Jul 14, 2020, 14:09 Gary Li wrote: > > > +1. 8am works for me. > > > > On Tue, Jul 14, 2020 at 11:01 AM Vinoth Chandar > wrote: > > > > > Hello a

Re: DISCUSS code, config, design walk through sessions

2020-07-14 Thread Vinoth Chandar
> > Cheers > > > > On Mon, 13 Jul. 2020, 11:55 am Vinoth Chandar, > wrote: > > > > > Hi all, > > > > > > NO. time/date is not finalized yet until we resolve the time zone > issues. > > > let's > > > spend some time confi

[DISCUSS] Organizing ourselves for scale

2020-07-12 Thread Vinoth Chandar
Hi all, We have grown quite a bit as a community this year. I found myself personally, spread thin amongst too many roles and often prioritizing for the short term over long term. Given we all have limited time/resources, I think it may be pragmatic to assume not all of us have time for all the ro

Re: DISCUSS code, config, design walk through sessions

2020-07-12 Thread Vinoth Chandar
would be great, to fill the > timezone > > gap. > > > > On Friday, July 10, 2020, Pratyaksh Sharma > wrote: > > > @Vinoth Chandar Time zones are indeed tricky. > Maybe > > we > > > can do a poll again to decide on the time for these sessions given the

Re: Question on hoodie.deltastreamer.schemaprovider.[source|target].schema.file with Parquet sources

2020-07-12 Thread Vinoth Chandar
Hi, cc-ing users@ , where these questions can be directed to in the future. > I do have a sql transform in the mix but both input and output schemas are ignored. Is this expected? If we have a Dataset then yes, we just implicitly use that. Do you have a use case that we are not supporting today

Re: Keeping Hive in Sync

2020-07-11 Thread Vinoth Chandar
bal...@apache.org wrote: > I don't remember the root cause completely Vinoth. I guess it was due to > some protocol mismatch. > Balaji.V On Tuesday, July 7, 2020, 10:25:48 PM PDT, Vinoth Chandar < > vin...@apache.org> wrote: > > Hi, > > Yes. It can be an

Re: Hudi - Concurrent Writes

2020-07-09 Thread Vinoth Chandar
Also failure/corrupt > > data of one partition delta affects others if we have single write. So we > > wanted these writes to be independent per partition. > > > > Also any timeline when 0.6.0 will be released? > > > > Thanks, > > Shayan > > > > >

Re: DISCUSS code, config, design walk through sessions

2020-07-08 Thread Vinoth Chandar
Apologies. Should have been more detailed. It’s Tuesday. Please see here for details https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Community+Weekly+Sync On Wed, Jul 8, 2020 at 8:55 PM Adam Feldman wrote: > Hi, what day will this be? > > On Tue, Jul 7, 2020, 17:25 Vinot

Re: Hudi - Concurrent Writes

2020-07-08 Thread Vinoth Chandar
We are looking into adding support for parallel writers in 0.6.0. So that should help. I am curious to understand though why you prefer to have 1000 different writer jobs, as opposed to having just one writer. Typical use cases for parallel writing I have seen are related to backfills and such. +

Re: Keeping Hive in Sync

2020-07-07 Thread Vinoth Chandar
Hi, Yes. It can be an issue, probably good to get the table written using hive style partitioning. I will check on this more and get back to you Balaji, do you know off the top of your head? Thanks Vinoth On Sat, Jul 4, 2020 at 11:22 PM selvaraj periyasamy < selvaraj.periyasamy1...@gmail.com> wrote:

Re: Expose HUDI CLI as a Service

2020-07-07 Thread Vinoth Chandar
nd the technologies that are used to > build the services and UI ? > > On Mon, 6 Jul 2020 at 5:26 PM, Vinoth Chandar wrote: > > > Hi Tanuj, > > > > Good idea to have a service/UI.. There is an inactive proposal around > > this, if you want to revive and drive it forward.

Re: DISCUSS code, config, design walk through sessions

2020-07-07 Thread Vinoth Chandar
e: > > > > > > > > > > > > > This is a great idea and really helpful one. > > > > > > > > > > > > > > On Mon, Jul 6, 2020 at 1:09 PM wrote: > > > > > > > > > > > > > > > +1 > > > >

Re: Expose HUDI CLI as a Service

2020-07-06 Thread Vinoth Chandar
Hi Tanuj, Good idea to have a service/UI.. There is an inactive proposal around this, if you want to revive and drive it forward. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130027233 Thanks Vinoth On Sun, Jul 5, 2020 at 11:07 PM Tanuj wrote: > Hi all, > HUDI CLI is a gre

DISCUSS code, config, design walk through sessions

2020-07-05 Thread Vinoth Chandar
Hi all, As we scale the community, it's important that more of us are able to help users, users becoming contributors. In the past, we have drafted faqs, trouble shooting guides. But I feel sometimes, more hands on walk through sessions over video could help. I am happy to spend 2 hours each on c

Re: Prometheus Support

2020-06-30 Thread Vinoth Chandar
Hi Tanu, It's under work.. probably will make its way into 0.6.0 https://github.com/apache/hudi/pull/1726 Thanks Vinoth On Tue, Jun 30, 2020 at 7:10 AM Tanuj wrote: > Hi, > Do we have any plan to support Prometheus ? I can see Graphite and Datadog. > Thanks. >

Re: [DISCUSS] Make delete marker configurable?

2020-06-29 Thread Vinoth Chandar
+1 as well. (sorry , for jumping in late) On Sun, Jun 28, 2020 at 11:36 AM Shiyan Xu wrote: > Thanks for the +1. Filed https://issues.apache.org/jira/browse/HUDI-1058 > > On Sat, Jun 27, 2020 at 11:34 PM Pratyaksh Sharma > wrote: > > > The suggestion looks good to me as well. > > > > On Sun, Ju

Re: Is it possible to run to run compaction asynchronously while upserting via Spark DataSource writer

2020-06-25 Thread Vinoth Chandar
Hi Anton, https://github.com/apache/hudi/pull/1752 brings the self managed compaction to Spark Streaming as well. Would you be interested in testing this out? This is a highly requested feature, that we are trying to get into the next release Thanks Vinoth On Wed, Jun 24, 2020 at 3:56 PM Zuyeu,

Re: [DISCUSS] Introduce a write committed callback hook

2020-06-23 Thread Vinoth Chandar
This is a great discussion! thanks! On Mon, Jun 22, 2020 at 6:33 PM vino yang wrote: > Hi everyone, > > Thanks for sharing your thoughts. > > We have created a Jira issue to track this work.[1] > > Best, > Vino > > [1]: https://issues.apache.org/jira/browse/HUDI-10

Re: [DISCUSS] Introduce a write committed callback hook

2020-06-22 Thread Vinoth Chandar
Great, looks like a JIRA is in order? :), given we all agree enthusiastically On Sun, Jun 21, 2020 at 8:10 PM Gary Li wrote: > +1. > That would be great to have a communication mechanism between downstream > CDC applications chain. > e.g. A->B->C->D. Right now I am using the commit timestamp to

Re: Apache Hudi contributor permission request

2020-06-22 Thread Vinoth Chandar
Hi, Done.. You may also want to subscribe to the mailing list, so your messages are delivered immediately without moderator approval. Welcome aboard! On Mon, Jun 22, 2020 at 7:57 AM 3236164...@qq.com <3236164...@qq.com> wrote: > Hi, > > I want to contribute to Apache Hudi. Would you please give

Re: [DISCUSS] Publishing benchmarks for releases

2020-06-21 Thread Vinoth Chandar
Lucene has nightly runs even https://home.apache.org/~mikemccand/lucenebench/ We can do something like this? In any case, raising a Jira under performance component seems like a good idea? On Sun, Jun 21, 2020 at 6:41 PM vino yang wrote: > +1 as well, > > it would be helpful to measure the perf

Re: [DISCUSS] Introduce a write committed callback hook

2020-06-21 Thread Vinoth Chandar
havani Sudha > wrote: > > > +1 . I think this is a valid use case and would be useful in general. > > > > On Sun, Jun 21, 2020 at 7:11 AM Vinoth Chandar > wrote: > > > > > +1 as well > > > > > > > We expect to introduce a proacti

Re: [DISCUSS] Regarding nightly builds

2020-06-21 Thread Vinoth Chandar
Hi Sudha, Thanks for getting this kicked off.. +1 on a new nightly build process.. This will help us more easily make the bleeding edge testable.. My initial thoughts here are - Figure out a way to get Azure Pipelines enabled for Hudi - Setup the nightly there (this will also help us transition

Re: [DISCUSS] Introduce a write committed callback hook

2020-06-21 Thread Vinoth Chandar
+1 as well > We expect to introduce a proactive notification(event callback) mechanism. For example, a hook can be introduced after a successful commit. This would be very useful. We could write to a variety of event bus-es and notify new data arrival. On Sat, Jun 20, 2020 at 2:51 AM wangxianghu

Re: [ANNOUNCE] Apache Hudi 0.5.3 released

2020-06-18 Thread Vinoth Chandar
Thanks for all the great work! Onto 0.6.0 now! On Thu, Jun 18, 2020 at 4:06 AM leesf wrote: > Great, thanks siva and sudha! > > vino yang wrote on Thu, Jun 18, 2020 at 2:16 PM: > > > Great job! > > > > Thanks for your hard work, Siva and Sudha! > > > > Best, > > Vino > > > > nishith agarwal wrote on Thu, Jun 18, 2020 at 1

Re: [RESULT] [VOTE] Release 0.5.3, release candidate #2

2020-06-14 Thread Vinoth Chandar
Thanks for driving this release, siva! On Sat, Jun 13, 2020 at 4:54 PM Sivabalan wrote: > I'm happy to announce that we have unanimously approved this release. > > There are 13 approving votes, 7 of which are binding: > > * Sudha (Binding) > * Vinoth (Binding) > * Balaji (Binding) > * Nishith (B

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-11 Thread Vinoth Chandar
> > On Wed, Jun 3, 2020 at 9:31 PM Vinoth Chandar wrote: > > > Hi Raymond, > > > > I am not sure generalizing this to all metadata like - errors and > metrics - > > would be a good idea. We can certainly implement logging errors to a > common > > er

Fwd: Just released: "Trillions and Trillions Served" documentary on the ASF

2020-06-11 Thread Vinoth Chandar
Folks, we all love the ASF! Please help spread the word! -- Forwarded message - From: Sally Khudairi Date: Wed, Jun 10, 2020 at 3:55 PM Subject: Just released: "Trillions and Trillions Served" documentary on the ASF To: Hello ASF-ers! I hope you are all well. Per the subject

Re: S3 Performance Issue in finalizing the writes

2020-06-11 Thread Vinoth Chandar
Hi, Hudi is used extensively on top of S3. Do you want to give this a quick shot using the 0.5.3-RC2 we just put out? What you are describing sounds close to an issue that was fixed there.. Based on the results, we can proceed from there.. thanks vinoth On Tue, Jun 9, 2020 at 12:38 AM Tanuj wrot

Re: 0.5.3 release status update

2020-06-11 Thread Vinoth Chandar
Hello everyone, RC2 is out.. Kindly verify and cast your votes! Thanks Vinoth On Tue, Jun 9, 2020 at 4:29 PM Sivabalan wrote: > Hey folks, > Wanted to start a thread on the status update for 0.5.3 release. > > After first candidate was sent out for voting, we had to pull in 2 more > commit

Re: Re: [VOTE] Release 0.5.3, release candidate #2

2020-06-11 Thread Vinoth Chandar
+1 (binding) Ran tests locally.. Reviewed the changes verbatim against 0.5.2 07:43:48 [hudi-0.5.3]$ RC_NUM=rc2 08:05:17 [hudi-0.5.3]$ RC_VERSION=0.5.3 08:05:36 [hudi-0.5.3]$ # Checksums and Signatures OK 08:05:42 [hudi-0.5.3]$ shasum -a 512 hudi-${RC_VERSION}-${RC_NUM}.src.tgz > sha512 08:06:00
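The checksum step in this vote (`shasum -a 512` on the source tarball) can be sketched in Python as well. This is an illustrative equivalent under stated assumptions, not part of the release procedure: the helper names are hypothetical, and the expected value is assumed to be either a bare hex digest or a `shasum`-style line whose first field is the digest.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_line):
    """Compare a file's digest with an expected value.

    `expected_line` may be a bare hex digest or a "digest  filename" line;
    only the first whitespace-separated field is compared.
    """
    fields = expected_line.split()
    if not fields:
        return False
    return sha512_of_file(path) == fields[0].lower()


# Example call (file names are placeholders, not taken from the thread):
# checksum_matches("release.src.tgz", open("release.src.tgz.sha512").read())
```

Signature verification is a separate step and is not covered by this sketch.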

20200609 Weekly Sync Minutes

2020-06-09 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200609+Weekly+Sync+Minutes Thanks Vinoth

Re: How to extend the timeline server schema to accommodate business metadata

2020-06-08 Thread Vinoth Chandar
ing clear... If that is a work in progress would you > have a jira I could follow up and contribute to ? If not , what is the > module name you suggest me looking at? > > Regards, > > Mario. > > On Fri, 5 Jun 2020, 02:12 Vinoth Chandar, wrote: > > > Sorry did not

Re: [DISSCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-06-08 Thread Vinoth Chandar
1706 > https://issues.apache.org/jira/browse/HUDI-998 > > Thanks, > Lamber-Ken > > On 2020/06/01 15:14:32, Vinoth Chandar wrote: > > Great! I left some comment on the PR. around licensing and maintenance > > overhead. > > > > On Sun, May 31, 2020 at 11:51 P

Re: Apply for Confluence

2020-06-08 Thread Vinoth Chandar
Done. Welcome aboard! On Mon, Jun 8, 2020 at 1:00 AM 李 天烨 wrote: > Hi, > > I want to contribute to Apache Hudi. > Would you please give me the contributor permission? > My Confluence ID is litianye , email is litiany...@outlook.com. >

Re: How to extend the timeline server schema to accommodate business metadata

2020-06-04 Thread Vinoth Chandar
> please. > > Thanks > > > On Thu, 4 Jun 2020, 05:34 Vinoth Chandar, wrote: > > > Hi Mario, > > > > We actually started with the idea of making the timeline server, a long > > running service. We have a module if you notice that builds out a bundle > >

TLP Announcement

2020-06-04 Thread Vinoth Chandar
Hello all, The ASF press release announcing Apache Hudi as TLP is live! Thanks for all your contributions! We could not have achieved that without such a great community effort! Please help spread the word! - GlobeNewswire http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/Th

Re: [DISCUSS] Write failed records

2020-06-03 Thread Vinoth Chandar
andle+failed+records > > On Sun, May 24, 2020 at 12:06 AM Vinoth Chandar wrote: > > > Hi Raymond, > > > > Thanks for starting this discussion. > > > > Agree on 1.. (we may also need some CLI support for inspecting bad/record > > and also code sam

Re: Suggestion needed - Hudi performance wrt no. and depth of partitions

2020-06-03 Thread Vinoth Chandar
> >> > very difficult to reduce the 1st partition as that is the basic > primary > >> key > >> > of our domain model on which analysts and developers need to query > >> almost > >> > 90% of time and its an integer primary key and can’t be d

Re: How to extend the timeline server schema to accommodate business metadata

2020-06-03 Thread Vinoth Chandar
ding both operational (ie Hudi) and > business metadata. > > would you guys have any opinion on that ? would that be easy as I do not > seem to see a way yet , except reading about RocksDB but that is still not > quite clear. > > best regards, > > Mario. > > Em seg.,

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-03 Thread Vinoth Chandar
hese are really rough > high-level thoughts, and may have sign of over-engineering. Would like to > hear some feedbacks. Thanks. > > > > > On Mon, Jun 1, 2020 at 9:28 PM Satish Kotha > wrote: > > > Got it. I'll look into implementation choices for creating a new data

20200602 Weekly Sync Minutes

2020-06-02 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200602+Weekly+Sync+Minutes Thanks Vinoth

Re: Suggestion needed - Hudi performance wrt no. and depth of partitions

2020-06-02 Thread Vinoth Chandar
Hi tanu, For good query performance, it's recommended to write optimally sized files. Hudi already ensures that. Generally speaking, if you have too many partitions, then it also means too many files. Mostly people limit to 1000s of partitions in their datasets, since queries typically crunch data

CI/Master tests failing

2020-06-02 Thread Vinoth Chandar
Hi all, This is a PSA.. We are observing some flakiness with master and the last three PR merges have failed. Balaji is looking at the fix/issue.. But in the meantime, I'd ask committers to temporarily not merge more PRs until this is resolved. It will help us fix this early. Error looks something

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-01 Thread Vinoth Chandar
metadata access? > > Are you looking for similar functionality as HoodieDatasourceHelpers? > > > This class seems like a list of static methods, I'm not seeing where these > are accessed from. But, I need a way to query metadata details easily > in pyspark. > > > On Mon,

Re: [DISSCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-06-01 Thread Vinoth Chandar
Great! I left some comment on the PR. around licensing and maintenance overhead. On Sun, May 31, 2020 at 11:51 PM Lamber Ken wrote: > Hi forks, > > Learned from travis and github actions api docs these days, I used my > project as a demo[1], > the demo pull request will always fail, please use

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-01 Thread Vinoth Chandar
sing a separate datasource relation (option 1) to > query timeline. It is elegant and fits well with spark APIs. > Thanks.Balaji.VOn Saturday, May 30, 2020, 01:18:45 PM PDT, Vinoth > Chandar wrote: > > Hi satish, > > Are you looking for similar functionality as HoodieDat

Re: How to extend the timeline server schema to accommodate business metadata

2020-06-01 Thread Vinoth Chandar
Hi Mario, Thanks for the detailed explanation. Hudi already allows extra metadata to be written atomically with each commit i.e write operation. In fact, that is how we track checkpoints for our delta streamer tool.. It may not solve the need for querying the data together with this information. b

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-05-30 Thread Vinoth Chandar
Hi satish, Are you looking for similar functionality as HoodieDatasourceHelpers? We have historically relied on cli to inspect the table, which does not lend itself well to programmatic access.. overall I like option 1 - allowing the timeline to be queryable with a standard schema does seem way

Re: [Discussion]Hudi support more complete concurrency control when write data

2020-05-29 Thread Vinoth Chandar
Hi Wei Li, Left some detailed comments on the JIRA https://issues.apache.org/jira/browse/HUDI-944?focusedCommentId=17119668&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17119668 This is a great discussion.. thanks for getting this kicked off Thanks Vinoth On M

Re: delete the test module hudi-integ-test

2020-05-28 Thread Vinoth Chandar
Hi Cooper, hudi-integ-tests are very essential and run on every single PR... It works fine for me locally on mac, as long as docker is installed and working properly. Can you please file a JIRA with what issues you are facing and how to reproduce. We can start looking into it then Thanks Vinoth

Re: Progress on 0.5.3 release

2020-05-27 Thread Vinoth Chandar
s are lower :) On Wed, May 27, 2020 at 9:07 AM leesf wrote: > I am glad to provide some help if needed. > > Vinoth Chandar wrote on Tue, May 26, 2020 at 1:30 PM: > > > Hi Siva, > > > > Thanks for the update.. On the release guide, since this release is going > > to happe

Re: hudi dependency conflicts for test

2020-05-27 Thread Vinoth Chandar
Thanks Lian! Will work it in! On Tue, May 26, 2020 at 9:02 AM Lian Jiang wrote: > I added a comment in this wiki. Hope this works. Thanks. > > On Sun, May 24, 2020 at 2:32 AM Vinoth Chandar wrote: > > > Great team work everyone! > > > > Anything wor

Re: [DISSCUSS] Trigger a Travis-CI rebuild without pushing a commit

2020-05-27 Thread Vinoth Chandar
git commit --amend/git push --force has worked well for me. I am happy with it, personally - no extra commits, just changes the commit sha.. I don't have strong opinions on this one. How easy is it to add "retest this please" or some automation like that.. On Wed, May 27, 2020 at 2:49 PM Lamber K

20200526 Sync Meeting

2020-05-26 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200526+Weekly+Sync+Minutes Thanks Vinoth

Re: Progress on 0.5.3 release

2020-05-25 Thread Vinoth Chandar
Hi Siva, Thanks for the update.. On the release guide, since this release is going to happen as a TLP, we need to get the process re-calibrated again.. Ideally, someone who has done the release before, can help Siva? Anyone wants to volunteer? Thanks Vinoth On Mon, May 25, 2020 at 5:48 AM Siva

Re: [Discussion] hudi support log append scenario with better write and asynchronous compaction

2020-05-24 Thread Vinoth Chandar
oups are worth about 450MB, requiring two file groups instead of 1. (this introduces few limitations as we will see) On Tue, May 19, 2020 at 6:21 PM leesf wrote: > +1 from me, also I updated the RFC-19, please take another look when you > get a chance. > > Vinoth Chandar 于2

Re: hudi dependency conflicts for test

2020-05-24 Thread Vinoth Chandar
> >> > at > >> > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213) > >> > at > >> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210) > >> > at > >> com.zillow.dataf

Re: [DISCUSS] Write failed records

2020-05-24 Thread Vinoth Chandar
Hi Raymond, Thanks for starting this discussion. Agree on 1.. (we may also need some CLI support for inspecting bad/record and also code samples to consume them etc?) On 2, these places seem appropriate. We can figure it out, in more detail when we get to implementation? On 3. +1 on logs.. We sh

Re: hudi roadmap and user feedback

2020-05-23 Thread Vinoth Chandar
https://hudi.apache.org/docs/powered_by.html has a lot of use-cases, talks, decks on the project, that you might find useful.. On the roadmap, we do have a public facing one here (which would still be valid in large parts) https://cwiki.apache.org/confluence/display/HUDI#ApacheHudi-Roadmap It's d

Re: COW Commit Files Limitations

2020-05-23 Thread Vinoth Chandar
Hi, Right now, archive folder is just that.. an archive.. It's there so that you can use the CLI and trace audit history if needed.. Hudi operations all use the active timeline only (i.e the files you see on .hoodie directly) and an archival process keeps that bounded so we can keep scaling with d

Re: Merge on Read table is recreating affected parquet file on every write

2020-05-23 Thread Vinoth Chandar
afa886/hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java#L100 > > > https://github.com/apache/incubator-hudi/commit/605af8a82f2cb0c5ea92ba4a12d0684571a17599 > > On Fri, May 22, 2020 at 11:07 AM Vinoth Chandar wrote: > > > Hi, > > >

Re: Merge on Read table is recreating affected parquet file on every write

2020-05-22 Thread Vinoth Chandar
Hi, Sorry, this slipped through the cracks. By default, the compaction policy would run every 10 delta commits or so. https://hudi.apache.org/docs/configurations.html#withMaxNumDeltaCommitsBeforeCompaction >>but in addition to new log file, i also see that corresponding parquet file is also rew

Re: Apache Hudi Graduation vote on general@incubator

2020-05-22 Thread Vinoth Chandar
or apache hudi project. > > > > > > > > > > Best, > > Lamber-Ken > > > > At 2020-05-19 13:35:11, "Vinoth Chandar" wrote: > > >Folks, > > > > > >the vote has passed! > > > > > > https://lists.apache.

Re: hudi dependency conflicts for test

2020-05-21 Thread Vinoth Chandar
> > Caused by: java.lang.ClassNotFoundException: > > org.apache.spark.sql.avro.SchemaConverters$ > > at > > > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583) > > at > > > java.base/jdk.internal.loader.Class

Re: hudi dependency conflicts for test

2020-05-21 Thread Vinoth Chandar
...@gmail.com> wrote: > > > > Thanks Vinoth. > > > > Below dependency has no conflict: > > > > compile group: 'org.apache.spark', name: 'spark-core_2.11', version: > > '2.3.0' > > compile group: 'org.apache.spark'

Adding Usages to powered by page

2020-05-21 Thread Vinoth Chandar
Hello all, If you are using Apache Hudi and interested in featuring on the powered_by page, please let us know by leaving a comment here https://github.com/apache/incubator-hudi/issues/661 As you know, we put in a lot of effort into the community.. Having a well maintained list, is a great way t

Re: Rollback to previous version for COW

2020-05-21 Thread Vinoth Chandar
Great! On Thu, May 21, 2020 at 6:15 AM tanu dua wrote: > Thanks it worked. > > On Thu, 21 May 2020 at 2:08 AM, Vinoth Chandar wrote: > > > Hi Tanu, > > > > You should be able to use the CLI > > https://hudi.apache.org/docs/deployment.html#cli > >

Re: Rollback to previous version for COW

2020-05-20 Thread Vinoth Chandar
Hi Tanu, You should be able to use the CLI https://hudi.apache.org/docs/deployment.html#cli and perform the rollback/restore. Have you given this a shot? On Wed, May 20, 2020 at 6:43 AM tanujdua wrote: > I have provided wrong schema while appending data and that has corrupted > my Hudi table. >

Re: hudi dependency conflicts for test

2020-05-20 Thread Vinoth Chandar
Hi Leon, Sorry for the late reply. Seems like a version mismatch for mockito.. I see you are already trying to exclude it though.. Could you share the full stack trace? On Mon, May 18, 2020 at 1:12 PM Lian Jiang wrote: > Hi, > > I am using hudi in a scala gradle project: > > dependencies {

20200519 Weekly Sync

2020-05-19 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200519+Weekly+Sync+Minutes

Re: Bug Bash 0.6.0

2020-05-19 Thread Vinoth Chandar
Hi siva, Can we do a quick update on progress so far? So everyone gets a sense of how this is going :) On Fri, May 15, 2020 at 5:19 PM Sivabalan wrote: > Sure Ethan. Will keep you posted if something comes up. > > > On Fri, May 15, 2020 at 1:09 PM Y Ethan Guo > wrote: > > > Thanks for puttin

Re: [Discussion] hudi support log append scenario with better write and asynchronous compaction

2020-05-19 Thread Vinoth Chandar
es. 2 > is to support upsert on demand. This seems to be a different table type > (neither COW nor MOR. Sounds like Merge-on-demand?) > > > > On Sun, May 17, 2020 at 10:10 AM wei li wrote: > > > Thanks, Vinoth Chandar > > Just like https://issues.apache.org/jira/pro

Re: Apache Hudi Graduation vote on general@incubator

2020-05-18 Thread Vinoth Chandar
> > On Fri, May 15, 2020 at 7:06 PM Vinoth Chandar wrote: > > Hello all, > > Just started the VOTE on the IPMC general list [1] > > If you are an IPMC member, you do a *binding* vote > If you are not, you can still do a *non-binding* vote > > Please take a

Re: [DISCUSS] should we do a 0.5.3 patch set release ?

2020-05-18 Thread Vinoth Chandar
> > > > On Thu, May 7, 2020 at 4:16 PM Minjeong Noh > > > wrote: > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-hudi/commit/dbc9acd23a4eb208c7cd458bb3adaf54731d4145 > > > > < > >

Apache Hudi Graduation vote on general@incubator

2020-05-15 Thread Vinoth Chandar
Hello all, Just started the VOTE on the IPMC general list [1] If you are an IPMC member, you do a *binding* vote If you are not, you can still do a *non-binding* vote Please take a moment to vote. [1] https://lists.apache.org/thread.html/r8039c8eece636df8c81a24c26965f5c1556a3c6404de02912d6455b4

Re: preCombine API enhancement for Mongo Oplog integration

2020-05-14 Thread Vinoth Chandar
Hi Yixue, Thanks for starting this thread! I have actually been thinking if we should just deprecate preCombine() and simply use combineAndGetUpdateValue() there as well. But, it boiled down to implementation efficiency.. Having the entire payload during preCombine() helps us keep the actual data

Re: PR backlog

2020-05-14 Thread Vinoth Chandar
Vinoth Chandar wrote: > Hello all, > > As we approach 0.5.3 code freeze, there is a lot of PR backlog. If you are > assigned PRs as reviewer, please take a look at them and drive them towards > completion. > > Thanks > Vinoth >

PR backlog

2020-05-14 Thread Vinoth Chandar
Hello all, As we approach 0.5.3 code freeze, there is a lot of PR backlog. If you are assigned PRs as reviewer, please take a look at them and drive them towards completion. Thanks Vinoth

Re: [Discussion] hudi support log append scenario with better write and asynchronous compaction

2020-05-14 Thread Vinoth Chandar
Hi Wei, Thanks for starting this thread. I am trying to understand your concern - which seems to be that for inserts, we write parquet files instead of logging? FWIW Hudi already supports asynchronous compaction... and a record reader flag that can avoid merging for cases where there are only ins

Re: [DISCUSS] Logos on project front page.

2020-05-13 Thread Vinoth Chandar
r help. > > > On Wed, May 13, 2020 at 10:59 AM Vinoth Chandar wrote: > > > https://github.com/apache/incubator-hudi/pull/1628 Fixed most of the > > issues > > raised (what I mentioned in the previous email) > > > > We need to upload the logo, which I assume Siva

Re: [DISCUSS] Logos on project front page.

2020-05-13 Thread Vinoth Chandar
https://github.com/apache/incubator-hudi/pull/1628 Fixed most of the issues raised (what I mentioned in the previous email) We need to upload the logo, which I assume Siva you are taking care of ? On Wed, May 13, 2020 at 7:45 AM Vinoth Chandar wrote: > https://whimsy.apache.org/pods/proj

Re: [DISCUSS] Logos on project front page.

2020-05-13 Thread Vinoth Chandar
> Thanks, > > > Sudha > > > > > > On Tue, May 12, 2020 at 11:56 PM vino yang > > wrote: > > > > > > > +1 to follow the best practices. > > > > > > > > vbal...@apache.org wrote on Wed, May 13, 2020 at 10:31 AM: >

Re: Checking out the asf svn repo

2020-05-13 Thread Vinoth Chandar
v/project-logos/originals/>. May I > know whats the credential do I need to pass to commit using svn? > > > On Tue, Apr 28, 2020 at 1:17 AM Vinoth Chandar wrote: > > > Done.. Thanks lamber-ken for being watchful, as always! > > > > On Thu, Apr 23, 2020 at 9:52
