20201103 Bi Weekly Sync Minutes

2020-11-03 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20201103+Bi+Weekly+Sync+Minutes Next meeting date: Nov 17, 2020, 8 PM PST

Re: Reg weekly sync meeting

2020-11-03 Thread Vinoth Chandar
be better. > > > On Mon, Nov 2, 2020 at 7:56 PM Vinoth Chandar wrote: > > > Looks like a lot of us are in favor. > > Anyone objecting? > > > > On Mon, Nov 2, 2020 at 9:11 AM Mani Jindal wrote: > > > > > +1 > > > > > > On Mon

Re: Reg weekly sync meeting

2020-11-02 Thread Vinoth Chandar
> +1 > > > > > > On Mon, Nov 2, 2020 at 11:57 AM Vinoth Chandar > > wrote: > > > > > > > +1 > > > > > > > > On Mon, Nov 2, 2020 at 8:44 AM Balaji Varadarajan > > > > wrote: > > > > > > > >

Re: Reg weekly sync meeting

2020-11-02 Thread Vinoth Chandar
+1 On Mon, Nov 2, 2020 at 8:44 AM Balaji Varadarajan wrote: > +1 > On Sunday, November 1, 2020, 09:13:44 PM PST, Gary Li < > garyli1...@outlook.com> wrote: > > +1 for biweekly meeting. > Gary Li From: Vinoth Chandar > Sent: Friday, October 30, 2020 2:01:22 P

Re: Reg weekly sync meeting

2020-10-29 Thread Vinoth Chandar
+ users list as well. On Thu, Oct 29, 2020 at 10:59 PM Bhavani Sudha wrote: > Hello all, > I was wondering if it would make sense to move the weekly sync meeting to > bi-weekly to amortize time and be efficient, especially since people across > different time zones attend. We could still retain

20201027 Weekly Sync Minutes

2020-10-27 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20201027+Weekly+Sync+Minutes Thanks Vinoth

Re: HUDI Table Primary Key - UUID or Custom For Better Performance

2020-10-23 Thread Vinoth Chandar
On Wed, 21 Oct 2020 at 7:45 AM, Vinoth Chandar < > mail.vinoth.chan...@gmail.com> wrote: > > > For now, bloom filters are not actually leveraged in the read/query path > > but only by the writer performing the index lookup for upserting. Hudi is > > write optimized like a

20201020 Weekly Sync Minutes

2020-10-20 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20201020+Weekly+Sync+Minutes Thanks Vinoth

Re: HUDI Table Primary Key - UUID or Custom For Better Performance

2020-10-20 Thread Vinoth Chandar
For now, bloom filters are not actually leveraged in the read/query path but only by the writer performing the index lookup for upserting. Hudi is write optimized like an OLTP store and read optimized like OLAP, if that makes sense. As for bloom index performance, our tuning guide and FAQ talk abo
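To illustrate the point that bloom filters only help the writer's index lookup during upserts, here is a minimal, hypothetical sketch of the idea: one filter per data file lets the writer skip files that definitely do not hold an incoming key, and an exact check on the remaining candidates removes false positives. The class and function names are made up for illustration, and this does not reflect how Hudi actually stores or evaluates its bloom filters.

```python
import hashlib
from typing import Dict, Iterable, Optional, Set


class BloomFilter:
    """Toy bloom filter: k salted hash positions over an m-slot bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per "bit", for simplicity

    def _positions(self, key: str) -> Iterable[int]:
        # Derive k positions by hashing the key with k different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def build_filters(file_keys: Dict[str, Set[str]]) -> Dict[str, BloomFilter]:
    """Build one filter per (hypothetical) data file from the keys it holds."""
    filters = {}
    for name, keys in file_keys.items():
        bf = BloomFilter()
        for key in keys:
            bf.add(key)
        filters[name] = bf
    return filters


def tag_upserts(incoming: Iterable[str],
                filters: Dict[str, BloomFilter],
                file_keys: Dict[str, Set[str]]) -> Dict[str, Optional[str]]:
    """Map each incoming key to the file holding it, or None for a new insert.

    file_keys stands in for actually reading keys from a candidate file;
    files whose filter rules the key out are never "opened".
    """
    tagged: Dict[str, Optional[str]] = {}
    for key in incoming:
        tagged[key] = None
        for name, bf in filters.items():
            # Exact membership check filters out bloom false positives.
            if bf.might_contain(key) and key in file_keys[name]:
                tagged[key] = name
                break
    return tagged
```

Usage, with two tiny hypothetical files:

```python
files = {"file-1": {"a", "b"}, "file-2": {"c"}}
print(tag_upserts(["a", "c", "z"], build_filters(files), files))
# → {'a': 'file-1', 'c': 'file-2', 'z': None}
```

Queries never touch these filters in this model, which matches the "write optimized like OLTP" framing above: the filter only narrows which files the writer must check before deciding between an update and an insert.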

Adding recent talks to site

2020-10-08 Thread Vinoth Chandar
Hello all, can you please add any recent talks to our powered by/talks page? I know we had an ApacheCon talk and maybe one more? I opened https://github.com/apache/hudi/pull/2155 for the PrestoCon panel. Thanks Vinoth

20201006 Weekly Sync Cancelled due to low attendance

2020-10-06 Thread Vinoth Chandar

20200929 Weekly Sync Minutes

2020-09-29 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200929+Weekly+Sync+Minutes Thanks Vinoth

Re: [DISCUSS] Planning for Releases 0.6.1 and 0.7.0

2020-09-24 Thread Vinoth Chandar
> > > > > You have mentioned Spark3.0 support in next release. We were actually > > > > > thinking of moving to Spark 3.0 but thought it’s too early with 0.6 > > > > > release. Is 0.6 not fully tested with Spark 3.0 ? > > >

Re: Enforcing Dataset Schema before pushing to HUDI

2020-09-24 Thread Vinoth Chandar
> > On Sat, 19 Sep 2020 at 11:35 PM, Vinoth Chandar wrote: > > > We could add support to validate the data frame against a schema string > > > > passed > > > > to the data source writer. I guess you want the dataframe to be also > > > > converted int

20200922 Weekly Sync Minutes

2020-09-22 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200922+Weekly+Sync+Minutes Thanks Vinoth

[DISCUSS] Planning for Releases 0.6.1 and 0.7.0

2020-09-22 Thread Vinoth Chandar
Hello all, Pursuant to our conversation around release planning, I am happy to share the initial set of proposals for the next minor/major releases (the minor release can of course go out based on time). *Next Minor version 0.6.1 (with stuff that did not make it to 0.6.0)* Flink/Writer common refactoring

Re: Enforcing Dataset Schema before pushing to HUDI

2020-09-19 Thread Vinoth Chandar
ach dataset as each > > dataset is unique in our case and hence the need to validate the schema for > > each dataset while writing. > > > > On Tue, 15 Sep 2020 at 2:53 AM, Vinoth Chandar wrote: > > > > > Hi, > > > > > > > > > > >

Re: [DISCUSS] Standardizing Java date time APIs

2020-09-19 Thread Vinoth Chandar
I don't have deep, well-formed thoughts on this. Uniformity is good, and being less dependent on other libraries is good. As long as we can implement the same functionality and take care of concurrency etc., I am a +1. On Mon, Sep 14, 2020 at 9:26 AM Pratyaksh Sharma wrote: > Hi Raymond, > > > > I was actually loo

Re: [DISCUSS] Formalizing the release process

2020-09-19 Thread Vinoth Chandar
it feasible. Or it may be easier to > > hit by limiting the minor version to bug fix and docs update. > > > > On Tue, Sep 8, 2020 at 10:41 PM Pratyaksh Sharma > > wrote: > > > > > Missed this thread, the plan looks good to me as well. > > >

Re: [DISCUSS] New Community Weekly Sync up Time

2020-09-18 Thread Vinoth Chandar
Wiki page updated with new timing! https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Community+Weekly+Sync On Thu, Sep 17, 2020 at 3:41 PM Gary Li wrote: > I am okay with both :) > > Gary Li > ____ > From: Vinoth Chandar > Sent: Thur

Re: [DISCUSS] New Community Weekly Sync up Time

2020-09-17 Thread Vinoth Chandar
n kid care duties some days. We can go ahead with the change > if > > timing is fine with everyone. > > > > Thanks, > > Sudha > > > > On Tue, Sep 15, 2020 at 7:08 AM Vinoth Chandar > wrote: > > > > > Folks, > > > > >

20200915 Weekly Sync Minutes

2020-09-15 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200915+Weekly+Sync+Minutes Thanks Vinoth

Re: [DISCUSS] New Community Weekly Sync up Time

2020-09-15 Thread Vinoth Chandar
Folks, please chime in with your opinions. I can still see some regulars (e.g. Nishith, Sudha, Gary) who have not chimed in. On Tue, Sep 15, 2020 at 12:22 AM Pratyaksh Sharma wrote: > Hi, > > Just wanted to confirm the time for this week's sync up. @Vinoth Chandar > > >

Re: [DISCUSS] Introduce incremental processing API in Hudi

2020-09-14 Thread Vinoth Chandar
landing with the ACID semantic. > Yes, through the scheduler engine like Apache Airflow, we can read these > data from the storage then process them. But the key difference, we can > avoid re-reading these data from the persistent storage again. > > [1] https://github.com/linkedin/

Re: Enforcing Dataset Schema before pushing to HUDI

2020-09-14 Thread Vinoth Chandar
nks Vinoth. Yes that’s always an option with me to validate myself. I > just wanted to confirm if Spark does it for me for all my datasets and I > wonder why they haven’t provided it for write but provided it for read. > > On Sat, 12 Sep 2020 at 9:02 PM, Vinoth Chandar

Re: Hadoop & hive 3.1 Support

2020-09-14 Thread Vinoth Chandar
Actually, we merged this PR, which should make this possible: https://github.com/apache/hudi/pull/1638#issuecomment-629990534 Udit/Wenning, can you comment on where we are on this? On Thu, Sep 10, 2020 at 11:28 AM Pratyaksh Sharma wrote: > Hi Selvaraj, > > Currently Hudi works with Hadoop 2.7.

Re: Enforcing Dataset Schema before pushing to HUDI

2020-09-12 Thread Vinoth Chandar
Hi, IIUC, you want to be able to pass in a schema to the write? AFAIK, Spark Datasource V1 at least does not allow for passing in the schema; Hudi writing will just use the schema of the df you pass in. Just throwing it out there: can you write a step to drop all unnecessary columns before issuing th
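The suggestion above is to enforce the schema yourself in a step before the write, since the writer simply uses whatever schema the DataFrame has. A minimal, engine-agnostic sketch of such a step follows, assuming rows are plain Python dicts and the expected schema is a hypothetical mapping of column name to Python type. With Spark, the same idea would be a select of the expected columns plus casts or checks; this is not a Hudi or Spark API.

```python
from typing import Any, Dict, List


def conform_to_schema(rows: List[Dict[str, Any]],
                      expected: Dict[str, type]) -> List[Dict[str, Any]]:
    """Keep only the expected columns, failing fast on missing or mistyped fields."""
    conformed = []
    for i, row in enumerate(rows):
        missing = [field for field in expected if field not in row]
        if missing:
            raise ValueError(f"row {i} is missing fields {missing}")
        out = {}
        for field, field_type in expected.items():
            value = row[field]
            # Nulls are allowed; anything else must match the declared type.
            if value is not None and not isinstance(value, field_type):
                raise TypeError(
                    f"row {i}: {field!r} should be {field_type.__name__}, "
                    f"got {type(value).__name__}")
            out[field] = value  # columns not in `expected` are dropped here
        conformed.append(out)
    return conformed
```

Usage, with a hypothetical two-column schema:

```python
schema = {"id": str, "ts": int}
print(conform_to_schema([{"id": "k1", "ts": 5, "debug": "x"}], schema))
# → [{'id': 'k1', 'ts': 5}]
```

Failing before the write keeps a bad batch from ever producing a commit, which is usually cheaper than cleaning up afterwards.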

Re: [DISCUSS] enable cross AZ consistency and quality checks of hudi datasets

2020-09-09 Thread Vinoth Chandar
Hi Sanjay, Overall the two proposals sound reasonable to me. Thanks for describing them so well. General comment: it seems like you are implementing multi-AZ replication by matching commit times across AZs? I do want to name these properties to be consistent with other Hudi terminology, but we ca

Re: Hudi CLI AWS Glue & S3 Tables

2020-09-09 Thread Vinoth Chandar
It's possible that some of the commands are not erroring gracefully for missing parameters? For example, hudi:tablename->savepoint create would need a commit time for creating the savepoint. If you are able to connect to the dataset, then it should all be working. On Wed, Sep 9, 2020 at 3:27 AM Prat

Re: [DISCUSS] Support for `_hoodie_record_key` as a virtual column

2020-09-09 Thread Vinoth Chandar
aking progress. Just some questions pop into my > head. I think my questions are all solvable. Happy to discuss more in the > RFC if we move forward :) > > Best, > Gary > > Gary Li > > From: Vinoth Chandar > Sent: Wednesday, September 2

Re: [DISCUSS] Introduce incremental processing API in Hudi

2020-09-09 Thread Vinoth Chandar
n. > > So, in short, this proposal tries to bring something: > >- performance: better performance when processing after data ingestion; >- focus and fluent: inline ingestion and processing logic in some >scenarios; >- boundary: at a high level, introduce more abili

20200908 Weekly Sync Minutes

2020-09-08 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200908+Weekly+Sync+Minutes Please find this week's sync notes

Re: [DISCUSS] Formalizing the release process

2020-09-08 Thread Vinoth Chandar
can give it a try. > > On Tue, Sep 8, 2020 at 5:55 PM Mehrotra, Udit > wrote: > > > +1 on the process. > > > > On 9/8/20, 5:11 PM, "Vinoth Chandar" wrote: > > > > CAUTION: This email originated from outside of the organization. Do > > n

Re: [DISCUSS] Formalizing the release process

2020-09-08 Thread Vinoth Chandar
r 1, 2020, 04:56:55 PM PDT, Gary Li < > >> garyli1...@outlook.com> wrote: > >> > >> +1 > >> Gary Li From: Bhavani Sudha > >> Sent: Wednesday, September 2, 2020 3:11:06 AM > >> To: us...@hudi.apache.org > >> Cc: dev@hudi.apache.org >

Re: [DISCUSS] New Community Weekly Sync up Time

2020-09-08 Thread Vinoth Chandar
Does anyone else want to chime in on a new time that works for everyone? Personally, I can do this time. Would love to hear more input. On Wed, Sep 2, 2020 at 10:16 AM Pratyaksh Sharma wrote: > Hi everyone, > > Currently we are having weekly sync ups between 9 PM - 10 PM PST on > tuesdays. Since I h

Re: schema compatibility check and change column type

2020-09-06 Thread Vinoth Chandar
That does sound like a backwards-compatible change. @prashant, any ideas here (since you have the best context on the schema validation checks)? On Thu, Sep 3, 2020 at 8:12 PM cadl wrote: > Hi All, > > I want to change the type of one column in my COW table, from int to long. > When I set “hood
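One way to reason about why int to long looks backwards compatible is Avro's schema-resolution promotion rules: data written as int can be read as long, float or double; long as float or double; float as double; and string and bytes as each other. The sketch below encodes only those primitive promotions from the Avro specification over hypothetical flat `{column: type}` schemas. It is an illustration, not Hudi's actual compatibility check, and it ignores records, unions, defaults and added or dropped columns.

```python
# Primitive promotions permitted by Avro schema resolution:
# data written with the key type may be read with any of the listed types.
AVRO_PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def can_read_as(writer_type: str, reader_type: str) -> bool:
    """True if values written as writer_type can be read as reader_type."""
    return (writer_type == reader_type
            or reader_type in AVRO_PROMOTIONS.get(writer_type, set()))


def incompatible_type_changes(old_schema: dict, new_schema: dict) -> list:
    """Columns present in both schemas whose new type cannot read old data."""
    return [column for column, old_type in old_schema.items()
            if column in new_schema
            and not can_read_as(old_type, new_schema[column])]
```

Usage, for the int to long change asked about above and its reverse:

```python
old = {"id": "string", "count": "int"}
print(incompatible_type_changes(old, {"id": "string", "count": "long"}))  # → []
print(incompatible_type_changes({"count": "long"}, {"count": "int"}))  # → ['count']
```

The direction matters: older files written with int can be read with a long reader schema, but narrowing long back to int is not a permitted promotion.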

Congrats to our newest committers!

2020-09-03 Thread Vinoth Chandar
Hi all, I am really excited to share the good news about our new committers on the project! *Udit Mehrotra*: Udit has travelled with the project since Sept/Oct last year and has immensely helped us make Hudi work well with the AWS ecosystem. His most notable contributions are towards driving large

Re: [DISCUSS] Support for `_hoodie_record_key` as a virtual column

2020-09-02 Thread Vinoth Chandar
wrote: > > Aah, yes. That’s right. > > On Sat, Aug 22, 2020 at 2:43 AM Vinoth Chandar > wrote: > > > All of the remaining meta fields compress very very nicely. They have > > > > almost no overhead. > > > > >

Coding guidelines

2020-09-01 Thread Vinoth Chandar
Hello all, I put together a list to formalize the things we follow in the code review process today. Please chime in with comments on the PR review. https://github.com/apache/hudi/pull/2061 Thanks Vinoth

[DISCUSS] Formalizing the release process

2020-09-01 Thread Vinoth Chandar
Hi all, I'd love to start a discussion around how we can formalize the release process and timelines, so that we can ensure timely, quality releases. Below is an outline of an idea that was discussed in the last community sync (also in the weekly sync notes). - We will do a "feature driven" majo

Re: [DISCUSS] Introduce incremental processing API in Hudi

2020-09-01 Thread Vinoth Chandar
Hi, While I agree with bringing more of these capabilities to Hudi natively, I have a few questions/concerns about the specific approach. > And these calculation functions should be engine independent. Therefore, I plan to introduce some new APIs that allow users to directly define Today, if I am a Spa

Re: Multilevel Partitioning in Hudi with Pyspark

2020-09-01 Thread Vinoth Chandar
Great! More docs here. https://hudi.apache.org/docs/writing_data.html#key-generation On Tue, Sep 1, 2020 at 3:26 AM Raghvendra Dhar Dubey wrote: > I got it working by adding an option > hoodie.datasource.write.keygenerator.class = > org.apache.hudi.keygen.ComplexKeyGenerator > > On Tue, Sep 1,
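For reference, a PySpark write using that key generator might be set up as in the sketch below. The option keys are standard Hudi datasource write options, but the table name, column names (`uuid`, `ts`, `year`, `month`, `day`) and base path are hypothetical. As we understand it, listing several comma-separated partition fields is what lets ComplexKeyGenerator build a multi-level partition path; the helper functions themselves are illustrative, not part of Hudi.

```python
def hudi_write_options(table_name, record_keys, partition_fields, precombine_field):
    """Build Hudi datasource write options for a multi-level partitioned table."""
    return {
        "hoodie.table.name": table_name,
        # Multiple record key / partition path fields are comma-separated.
        "hoodie.datasource.write.recordkey.field": ",".join(record_keys),
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_fields),
        # ComplexKeyGenerator supports more than one key / partition field.
        "hoodie.datasource.write.keygenerator.class":
            "org.apache.hudi.keygen.ComplexKeyGenerator",
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_to_hudi(df, base_path, options):
    """Write a PySpark DataFrame (requires Spark with the Hudi bundle on the classpath)."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)
```

Usage, building the options without needing Spark:

```python
opts = hudi_write_options("events", ["uuid"], ["year", "month", "day"], "ts")
print(opts["hoodie.datasource.write.partitionpath.field"])  # → year,month,day
```

Keeping the options in a plain dict makes it easy to reuse the same settings across batch jobs and to check them in tests before any Spark session exists.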

Re: DevX, Test infra Rgdn

2020-08-31 Thread Vinoth Chandar
+1, this is also a great way to ramp up on the code base. On Sun, Aug 30, 2020 at 8:00 AM Sivabalan wrote: > As Hudi matures as a project, we need to get our devX and test infra rock > solid. Availability of test utils and base classes for ease of writing more > tests, stable integration tests, ease

20200825 Weekly Sync Minutes

2020-08-25 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200825+Weekly+Sync+Minutes

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-25 Thread Vinoth Chandar
ting will make it more convenient for > developers > > > to deal with code styles. On the other hand, it will also make the > > > community more complicated when considering related conventions and weigh > > > more factors. > > > > > > Best, > >

Re: Incremental query on partition column

2020-08-25 Thread Vinoth Chandar
> We have tried a couple of solutions, but so far without success : > > > - replay the data omitting the data of the persons who have requested to > > > be forgotten. We wanted to manipulate the commit times to rebuild the > > > history. > > > We found that we co

Flaky CI Tests

2020-08-25 Thread Vinoth Chandar
Hello all, I am looking for help squashing the following flaky tests. https://api.travis-ci.org/v3/job/719453775/log.txt [INFO] [ERROR] Failures: [ERROR] TestKeyRangeLookupTree.testFileGroupLookUpManyEntriesWithSameStartValue:76->testRangeOfInputs:154 expected: <[580cf7f7-9269-4670-a11a-66ce66e6f

Re: [ANNOUNCE] Apache Hudi 0.6.0 released

2020-08-24 Thread Vinoth Chandar
(-announce) Folks, please keep follow-ups to dev@ and users@. On Mon, Aug 24, 2020 at 9:26 PM vino yang wrote: > Great news! > > Thanks to Bhavani Sudha for driving the release! And thanks to every one of > the whole community! > > Best, > Vino > > On Tue, Aug 25, 2020 at 11:37 AM Bhavani Sudha wrote: >

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-24 Thread Vinoth Chandar
Hi folks, As you may have noticed, the 0.6.0 release is out. Huge shoutout to our RM, Sudha, for pulling this off! As always, thanks to all our users/contributors. Congrats, everyone! Onwards and upwards to the next one. Thanks Vinoth On Thu, Aug 20, 2020 at 11:32 AM Vinoth Chandar wrote

Re: [VOTE] Release 0.6.0, release candidate #1

2020-08-22 Thread Vinoth Chandar
+1 (binding) - Ran the RC checks I typically do - Ran a smoke test on both COW and MOR tables - by running lots of commits over a longer period of time - verifying the state of the dataset - count validation matches. On Sat, Aug 22, 2020 at 6:08 AM leesf wrote: > +1 (binding) > - mvn clean

Re: [DISCUSS] Support for `_hoodie_record_key` as a virtual column

2020-08-21 Thread Vinoth Chandar
Sharma > > wrote: > > > > > This is a good option to have. :) > > > > > > On Thu, Aug 20, 2020 at 11:25 PM Vinoth Chandar > > wrote: > > > > > > > IIRC _hoodie_record_key was supposed to this standardized key field. > :) > >

Re: Incremental query on partition column

2020-08-21 Thread Vinoth Chandar
story) and save storage space > using Hudi. > > Can anyone see a way to achieve this? > > Kind Regards, > David Rosalia > > > Get Outlook for Android<https://aka.ms/ghei36> > > > From: Vinoth Chandar > Sent: Friday, August

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-21 Thread Vinoth Chandar
has been keeping checkstyle, IDE and spotless > >>> agreeing > >>>> on the same thing. > >>>> > >>>> Yes, it's the key thing. But, IMO, we can ignore the IDE here, if it > >>> breaks > >>>> the code style, chec

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-20 Thread Vinoth Chandar
cumented here. but > this > > >> is > > >> > > >> > > >> the ticket AFSIK: https://issues.apache.org/jira/browse/HUDI-1177 > > >> > > >> > > >> > > >> > > >> > > >> On Wed, Aug 1

Re: [DISCUSS] Support Spark Structured Streaming read from Hudi table

2020-08-20 Thread Vinoth Chandar
I would like for all these new things to be revamped on top of Spark 3's newer APIs (it's kind of frustrating that the datasource APIs don't stabilize easily in Spark). I am thinking we can implement a "hudi3" format using Spark 3, with support for SQL Merges, existing functionality and a redone Spark S

Re: [DISCUSS] Support for `_hoodie_record_key` as a virtual column

2020-08-20 Thread Vinoth Chandar
IIRC _hoodie_record_key was supposed to be this standardized key field. :) Anyway, it's good to provide this option to the user, so +1 for an RFC/further discussion. To level set, I also want to share some of the benefits of having an explicit key column. a) if you build your data lake using a bunch o

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-18 Thread Vinoth Chandar
; > On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar wrote: > > > Thanks Sudha! This is means master is now open for regular PRs. Thanks > for > > your patience, everyone. > > > > On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha > > wrote: > > > > > Hel

Weekly Sync Minutes 20200818

2020-08-18 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200818+Weekly+Sync+Minutes

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-18 Thread Vinoth Chandar
> > +1 on standardizing code formatting. On Tuesday, August 18, 2020, > > 03:58:42 PM PDT, Vinoth Chandar wrote: > > > > can more people please chime in? This will affect all of us on a daily > > basis :) > > > > On Thu, Aug 13, 2020 at

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-18 Thread Vinoth Chandar
Can more people please chime in? This will affect all of us on a daily basis :) On Thu, Aug 13, 2020 at 8:25 AM Gary Li wrote: > Vote for mvn spotless:apply to do the auto fix. > > On Thu, Aug 13, 2020 at 1:13 AM Vinoth Chandar wrote: > > > Hi, > > > > Anyone has

Re: Recommendation to load HUDI data across partitions

2020-08-18 Thread Vinoth Chandar
but I am now able to remove > flatMap and could use Dataset joins. > > Thanks again for all your help as always !! > > > > > On Thu, Aug 13, 2020 at 1:42 PM Vinoth Chandar wrote: > > > Hi Tanuj, > > > > From this example, it appears as if you are tryin

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-14 Thread Vinoth Chandar
tabilization and will cut the release > > once we stabilize the builds hopefully tonight/tomorrow. > > Thanks,Balaji.V > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar < > > vin...@apache.org> wrote: > > > > Hello all, > > > >

Re: [ANNOUNCE] Hudi Community Weekly Update(2020-08-02 ~ 2020-08-09)

2020-08-13 Thread Vinoth Chandar
+1, thanks leesf. I actually find these very useful when composing the reports too. :) On Sun, Aug 9, 2020 at 5:32 PM vino yang wrote: > Thanks to leesf for continuously updating Hudi weekly. > > It is great to see that more and more improvements are being proposed in > the community. > > Best, >

Re: Incremental query on partition column

2020-08-13 Thread Vinoth Chandar
Hi, On re-ingesting, do you mean to say you want to overwrite the table while not getting the changes in the incremental query? This has not come up before. As you can imagine, it'd be a tricky scenario, where we would need to introduce some special handling/action type. Yes, yes on the next two questions.

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-13 Thread Vinoth Chandar
Hi, does anyone have thoughts on this? Especially leesf/vinoyang, given you both drove much of the initial cleanups. On Mon, Aug 10, 2020 at 7:16 PM Shiyan Xu wrote: > in that case, yes, all for automation. > > On Mon, Aug 10, 2020 at 7:12 PM Vinoth Chandar wrote: > > > Overall,

Re: Recommendation to load HUDI data across partitions

2020-08-13 Thread Vinoth Chandar
Hi Tanuj, From this example, it appears as if you are trying to use sparkSession from within the executor? This will be problematic. Can you please open a support ticket with the full stack trace? I think what you are describing is a join between Kafka and Hudi tables. So I'd read from Kafka fir

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-11 Thread Vinoth Chandar
for this is tomorrow night PST (Aug 12, PST). We will keep this thread posted! Thanks Vinoth On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar wrote: > Small correction: > > >> Vinoth working on code review, tests for PR 1876, > This is landed! > > > On Tue, Aug 4, 20

20200811 Sync Cancelled due to low attendance

2020-08-11 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200811+Weekly+Sync+Minutes Cheers Vinoth

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-10 Thread Vinoth Chandar
Overall, I think we should standardize this across the project. But most importantly, maybe we should first revive the long-dormant spotless effort to enable auto-fixing of checkstyle issues, before we add more checks? On Mon, Aug 10, 2020 at 7:04 PM Shiyan Xu wrote: > Hi all, > > I noticed that through

Re: Pushing changes to PRs

2020-08-08 Thread Vinoth Chandar
rticle, I just tried to use Intellij IDEA to access > Github features. > > > On Sat, Aug 8, 2020 at 5:54 PM leesf wrote: > > > helpful and thanks for writing up. > > > > On Sat, Aug 8, 2020 at 12:53 PM Vinoth Chandar wrote: > > > > > Hello all, > > > > > > Few p

Pushing changes to PRs

2020-08-07 Thread Vinoth Chandar
Hello all, A few people have asked me this on separate occasions, so I thought I'd add a wiki page on how to check out and push changes to PRs. It should be useful for all committers. https://cwiki.apache.org/confluence/display/HUDI/Resources#Resources-PushingChangesToPRs Thanks Vinoth

Re: GDPR - Time Travel Query

2020-08-05 Thread Vinoth Chandar
Hi, IIUC, what you want is for the deletes to be applied to different versions of the data, so that no time travel query can read the deleted field again? I am afraid this cannot be achieved as-is today; it would need these deletes to be logged against older base files. That might be one way to achieve th

Re: Review Hudi MOR support from PrestoDB

2020-08-05 Thread Vinoth Chandar
Great job, Sudha/Brandon! This is great. Now we can keep improving the performance from here on. On Wed, Aug 5, 2020 at 10:12 PM Bhavani Sudha wrote: > This PR is landed today and will be available in the next Presto release. > Thanks to Brandan for the Presto fixes. > > - Sudha > > > On Tue, Jul 14,

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-04 Thread Vinoth Chandar
DI-845 : Parallel writing i.e allow multiple writers (Pushed out of >0.6.0) >- HUDI-860 : Small File Handling without memory caching (Pushed out of >0.6.0) > > > Thanks, > Sudha > > On Mon, Aug 3, 2020 at 3:41 PM Vinoth Chandar wrote: > > > +1 (w

20200804 Weekly Sync Minutes

2020-08-04 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200804+Weekly+Sync+Minutes Thanks Vinoth

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-03 Thread Vinoth Chandar
nd bump it's priority. > > Please share your thoughts or concerns. > > Thanks, > Sudha > > > On Mon, Aug 3, 2020 at 8:19 AM Vinoth Chandar wrote: > > > Given enough time has passed, Sudha can be our RM for 0.6.0. > > > > On the release blocker progress,

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-03 Thread Vinoth Chandar
Given enough time has passed, Sudha can be our RM for 0.6.0. On release blocker progress, we landed a few blockers over the weekend, with some almost ready to land. Will send out a status update again tomorrow night PST! On Mon, Aug 3, 2020 at 8:17 AM Vinoth Chandar wrote: > Hi an

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-03 Thread Vinoth Chandar
> released? Can't find any dates on Hudi related pages. > > On Thu, Jul 30, 2020 at 10:36 AM Vinoth Chandar wrote: > > > Is anyone able to help with the at risk items? :) > > > > On Thu, Jul 30, 2020 at 7:07 AM leesf wrote: > > > > > @Vinoth Chandar

Re: DISCUSS code, config, design walk through sessions

2020-08-02 Thread Vinoth Chandar
020 at 11:52 PM, Zijing Guo > > > > > > > wrote: > > > > > > > > > Thanks for the great session Vinoth! Can we have those session > in a > > > > > regular basis? I personally find today's session are super helpful! > > >

Re: PSA: master integ-tests failing

2020-08-02 Thread Vinoth Chandar
from cursory looks of the parent pom.xml I > > couldn't find anything wrong. > > > > Thanks, > > Nishith > > > > On Fri, Jul 31, 2020 at 8:23 AM Vinoth Chandar > wrote: > > > > > Hello all, > > > > > > integ-tests are currently fa

PSA: master integ-tests failing

2020-07-31 Thread Vinoth Chandar
Hello all, integ-tests are currently failing on the master branch due to exceeding the log limit. Nishith is actively debugging what's going on. Please hold off on merging more PRs until we resolve this. @nishith, please update this thread when master is stable again than

Re: [DISCUSS] Release 0.6.0 timelines

2020-07-30 Thread Vinoth Chandar
Is anyone able to help with the at-risk items? :) On Thu, Jul 30, 2020 at 7:07 AM leesf wrote: > @Vinoth Chandar Thanks for the reminder, marked to > blocker, and next week would be ok to me. > > On Thu, Jul 30, 2020 at 11:35 AM Vinoth Chandar wrote: > > > @leesf can we please mark

Re: DISCUSS code, config, design walk through sessions

2020-07-30 Thread Vinoth Chandar
Thanks to everyone who joined! I am hanging out in #general on Slack if you want to finish off any remaining questions. Please @vc me with questions. On Thu, Jul 30, 2020 at 8:00 AM Vinoth Chandar wrote: > yes! Please join > > On Thu, Jul 30, 2020 at 7:35 AM Pratyaksh Sharma > wr

Re: DISCUSS code, config, design walk through sessions

2020-07-30 Thread Vinoth Chandar
Yes! Please join. On Thu, Jul 30, 2020 at 7:35 AM Pratyaksh Sharma wrote: > Hi Vinoth, > > Is this happening now? > > On Mon, Jul 27, 2020 at 3:50 AM Vinoth Chandar wrote: > > > Hi all, > > > > We will be using the conference link we use for

Re: [DISCUSS] Release 0.6.0 timelines

2020-07-29 Thread Vinoth Chandar
Q2 > > > > has been really hard with COVID and everything going on. Given that > we > > > are > > > > at this point, I feel by delaying the RC by a week or so more if we > can > > > get > > > > some of the 'At risk' items i

[DISCUSS] Release 0.6.0 timelines

2020-07-28 Thread Vinoth Chandar
Hello all, Just wanted to kickstart a thread to firm up the RC cut date for 0.6.0 and pick an RM (any volunteers? If not, I nominate myself). Here's an update on where we are with the remaining release blockers. I have marked items as "At risk" assuming we cut the RC sometime next week. Please

Weekly Sync Minutes 20200728

2020-07-28 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/20200728+Weekly+Sync+Minutes Thanks! Vinoth

Re: Unit tests in hudi-client module fail due to SparkContext

2020-07-28 Thread Vinoth Chandar
Thanks for being so awesome, Raymond! On Tue, Jul 28, 2020 at 4:23 PM Shiyan Xu wrote: > yup i can make a PR for this. > > On Tue, Jul 28, 2020 at 2:30 PM Vinoth Chandar wrote: > > > Makes sense. Can we update some docs with this IDE setup? > > > > On Tue, Jul

Re: Implement an asynchronous write commit callback

2020-07-28 Thread Vinoth Chandar
+1, we could support a built-in Kafka-based notification mechanism. Could we keep that in hudi-utilities instead of hudi-client? On Thu, Jul 23, 2020 at 11:18 PM wangxianghu wrote: > Hi Gray > Thanks for reply. It is the latter. We can use the callback to publish > write commit message to extern

Re: Unit tests in hudi-client module fail due to SparkContext

2020-07-28 Thread Vinoth Chandar
Makes sense. Can we update some docs with this IDE setup? On Tue, Jul 28, 2020 at 10:32 AM Shiyan Xu wrote: > Sure... here it is > https://gist.github.com/xushiyan/db4d4067657abe6b8872ef12473b7087 > > On Tue, Jul 28, 2020 at 9:53 AM Vinoth Chandar wrote: > > > Unfortunate

Re: [DISCUSS] Hyperspace + Hudi

2020-07-28 Thread Vinoth Chandar
; - Plugs in at the time of spark query planning to allow for automatic > > > indexing optimizations based on the created index (something I found > > > interesting and worth exploring especially for RFC-08) > > > > > > +1 on stepping the gas on RFC-08/15 for record level +

Re: Unit tests in hudi-client module fail due to SparkContext

2020-07-28 Thread Vinoth Chandar
>> On Mon, Jul 27, 2020 at 11:11 PM Y Ethan Guo > >> wrote: > >> > >>> I see. I'll check the travis CI setup later. I'm unblocked now for > >>> running the unit tests locally. > >>> > >>> Thanks, > >>> - Ethan

Re: [DISCUSS] Adding Metrics to Hudi Common

2020-07-28 Thread Vinoth Chandar
> > +1 > > > > Having the metrics flexibly in common will help in building observability > > in other modules. > > > > Thanks, > > Nishith > > > > > On Jul 28, 2020, at 7:28 AM, Vinoth Chandar wrote: > > > > > > +1 as well.

Re: [DISCUSS] Adding Metrics to Hudi Common

2020-07-28 Thread Vinoth Chandar
+1 as well, given we support many reporters now. Could you please retain and further improve modularity? On Mon, Jul 27, 2020 at 6:30 PM vino yang wrote: > Hi Modi, > > +1 for this proposal. > > I agree with your opinion that the metric report should not only report the > client's metrics. > > And

Re: Unit tests in hudi-client module fail due to SparkContext

2020-07-27 Thread Vinoth Chandar
this error, set spark.driver.allowMultipleContexts = true. The > currently running SparkContext was created at: > > - "mvn test -Punit-tests -pl hudi-client -B" from CI: It encounters > "java.lang.OutOfMemoryError: GC overhead limit exceeded". > > What I do now

Re: [DISCUSS] Hyperspace + Hudi

2020-07-27 Thread Vinoth Chandar
ng roadmap and would be > good to see some collaboration here as well. > > Thanks, > Nishith > > On Sun, Jul 26, 2020 at 3:28 PM Vinoth Chandar wrote: > > > Hello all, > > > > In case you have not followed Hyperspace is a new indexing subsystem for > > S

Re: Unit tests in hudi-client module fail due to SparkContext

2020-07-26 Thread Vinoth Chandar
Hi Ethan, To unblock yourself, can you try running them locally with the mvn command from a terminal? Thanks Vinoth On Sun, Jul 26, 2020 at 4:12 PM Y Ethan Guo wrote: > Hi, > > I'm working on hudi-client module and I notice that if I run all unit tests > under hudi-client locally in In

[DISCUSS] Hyperspace + Hudi

2020-07-26 Thread Vinoth Chandar
Hello all, In case you have not followed it, Hyperspace is a new indexing subsystem for Spark from Microsoft. It seemed like a very interesting project, and I tried to explore whether it can help us with an indexing option inside Hudi. TL;DR: - Was exploring whether Hyperspace can be used as an alternative fo

Re: DISCUSS code, config, design walk through sessions

2020-07-26 Thread Vinoth Chandar
7:53 AM Adam Feldman wrote: > Great! Thank you > > On Thu, Jul 23, 2020, 10:49 Vinoth Chandar wrote: > > > Hi Adam, > > > > Next week. July 30th 8AM PST. > > > > I will be sending dial in information over the weekend. > > > > > > > >

Re: DISCUSS code, config, design walk through sessions

2020-07-23 Thread Vinoth Chandar
; > > > > > Sent from Yahoo Mail for iPhone > > > > > > On Wednesday, July 15, 2020, 11:42 PM, Vinoth Chandar > > > wrote: > > > > Great! Moving on to date. Would July 23/30 Thursday 8 AM PST work for > > everyone? > > > > O

<    1   2   3   4   5   6   7   8   9   10   >