Re: [VOTE] Release 1.0.0-beta1, release candidate #1

2023-11-13 Thread Bhavani Sudha
+1 (binding)

- compile ok
- checksum ok
- ran some ide tests - ok
- ran quickstart examples with generated keys in Spark SQL on Spark 3.4 - ok
- release validation script - ok

Thanks,
Sudha

On Mon, Nov 13, 2023 at 6:10 PM sagar sumit  wrote:

> +1 (binding)
> - Ran Spark (PySpark) quickstart on Spark 3.3.1
> - Ran Flink SQL quickstart on 1.16.2
> - Ran async clustering and compaction
>
> On Tue, Nov 14, 2023 at 3:55 AM Vinoth Chandar  wrote:
>
> > +1 (binding)
> >
> > On Sun, Nov 12, 2023 at 10:07 PM Y Ethan Guo  wrote:
> >
> > > +1 (binding)
> > >
> > > - Source, bundle validation pass
> > > - Ran Spark Quickstart (Datasource in Scala, SQL) on Spark 3.3
> > > - Ran long-running Hudi streamer jobs writing COW and MOR tables
> > >
> > > On Sat, Nov 11, 2023 at 12:24 AM sagar sumit 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Please review and vote on the release candidate #1 for the version
> > > > 1.0.0-beta1,
> > > > as follows:
> > > >
> > > > [ ] +1, Approve the release
> > > >
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > >
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > >
> > > > * JIRA release notes [1],
> > > >
> > > > * the official Apache source release and binary convenience releases
> to
> > > be
> > > > deployed to dist.apache.org [2], which are signed with the key with
> > > > fingerprint 888A9341E600EB8550AACD5EFB1B7504F7F770C9 [3],
> > > >
> > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > >
> > > > * source code tag "1.0.0-beta-rc1" [5]
> > > >
> > > >
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > approval, with at least 3 PMC affirmative votes.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Sagar Sumit
> > > > Release Manager
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12351210
> > > >
> > > > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-1.0.0-beta1-rc1
> > > >
> > > > [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
> > > >
> > > > [4]
> > > https://repository.apache.org/content/repositories/orgapachehudi-1129
> > > >
> > > > [5]
> > https://github.com/apache/hudi/releases/tag/release-1.0.0-beta1-rc1
> > > >
> > >
> >
>
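
The adoption rule quoted in the vote email above (open for at least 72 hours, majority approval, at least 3 PMC affirmative votes) can be sketched as a small check. This is only an illustration, not project tooling: the `Vote` type and `is_adopted` helper are hypothetical names, counting only binding votes toward the majority is an assumption, and the opening timestamp is taken from the quoted header without any time zone handling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Vote:
    voter: str
    value: int      # +1 or -1
    binding: bool   # True for PMC votes


def is_adopted(votes, opened_at, now, min_open=timedelta(hours=72), min_binding=3):
    """Return True if the vote passes under the rule quoted in the thread."""
    if now - opened_at < min_open:
        return False  # the vote must stay open for at least 72 hours
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    # At least three binding +1 votes, and more +1 than -1 (assumed reading
    # of "majority approval").
    return plus >= min_binding and plus > minus


# The four binding +1 votes recorded in this thread.
votes = [
    Vote("Y Ethan Guo", +1, True),
    Vote("Vinoth Chandar", +1, True),
    Vote("sagar sumit", +1, True),
    Vote("Bhavani Sudha", +1, True),
]
opened = datetime(2023, 11, 11, 0, 24)  # illustrative, naive timestamp
print(is_adopted(votes, opened, opened + timedelta(hours=72)))
```

With the same votes, the check returns False if it is evaluated before the 72-hour window has elapsed.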


Re: Calling for 0.12.4 release

2023-09-22 Thread Bhavani Sudha
+1 Thanks Yue!

On Fri, Sep 22, 2023 at 6:26 AM Vinoth Chandar  wrote:

> +1 thanks Yue!
>
> On Thu, Sep 21, 2023 at 18:19 Danny Chan  wrote:
>
> > Thanks Yue Zhang for the contribution ~
> >
> > Best,
> > Danny
> >
> > > On Sat, Sep 2, 2023 at 00:24, Y Ethan Guo wrote:
> > >
> > > Thanks, Yue Zhang, for volunteering to be the RM!
> > >
> > > On Thu, Aug 31, 2023 at 4:38 PM Yue Zhang 
> > wrote:
> > >
> > > > Hi Hudiers,
> > > > > I volunteer to be the RM for the next 0.12.4 release, if you don’t mind.
> > > > > YueZhang
> > > > > ---- Replied Message ----
> > > > | From | Y Ethan Guo |
> > > > | Date | 09/01/2023 07:34 |
> > > > | To | dev |
> > > > | Subject | Calling for 0.12.4 release |
> > > > Hi folks,
> > > >
> > > > It's been 4+ months since Hudi 0.12.3 was released.  As we want to
> > maintain
> > > > 0.12.x LTS releases, shall we, as a community, follow up with 0.12.4
> > > > release to pick up recent bug fixes and improvements?  Any volunteer
> > for
> > > > 0.12.4 Release Manager is welcome.
> > > >
> > > > Thanks,
> > > > - Ethan
> > > >
> >
>


Re: [VOTE] Release 0.14.0, release candidate #3

2023-09-22 Thread Bhavani Sudha
+1 (binding)

- compile ok
- checksum ok
- ran some ide tests - ok
- ran quickstart examples with generated keys - ok
- release validation script - ok

/tmp/validation_scratch_dir_001 ~/hudi/scripts

Current directory: /tmp/validation_scratch_dir_001/local_svn_dir

Downloading from svn co
https://dist.apache.org/repos/dist/dev/hudi/hudi-0.14.0-rc3

Validating hudi-0.14.0-rc3 with release type "dev"

Checking Checksum of Source Release

Checksum Check of Source Release - [OK]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 76359  100 76359    0     0   145k      0 --:--:-- --:--:-- --:--:--  147k

Checking Signature

Signature Check - [OK]


Checking for binary files in the source files

No Binary Files in the source files? - [OK]


Checking for DISCLAIMER

DISCLAIMER file exists ? [OK]


Checking for LICENSE and NOTICE

License file exists ? [OK]

Notice file exists ? [OK]


Performing custom Licensing Check

Licensing Check Passed [OK]


Running RAT Check

.

[INFO]


[INFO] BUILD SUCCESS

[INFO]


[INFO] Total time:  9.163 s

[INFO] Finished at: 2023-09-22T10:49:03-07:00

[INFO]


RAT Check Passed [OK]


Thanks,
Sudha

On Fri, Sep 22, 2023 at 10:29 AM Jonathan Vexler  wrote:

> +1 (non-binding)
> - Tested Spark Datasource and Spark SQL core flow tests
> - Tested reading from bootstrap tables
>
>
> On Fri, Sep 22, 2023 at 12:39 PM sagar sumit  wrote:
>
> > +1 (non-binding)
> >
> > - Long-running deltastreamer [OK]
> > - Hive metastore sync [OK]
> > - Query using Presto and Trino [OK]
> >
> > Regards,
> > Sagar
> >
> > On Fri, Sep 22, 2023 at 9:53 PM Aditya Goenka 
> wrote:
> >
> > > +1 (non-binding)
> > >
> > > > - Tested Spark SQL workflows, Delta Streamer, and Spark Structured
> > > > Streaming for both table types, with and without a record key.
> > > - Meta Sync tests
> > > - Tests for data-skipping with both Column stats and RLI.
> > >
> > > On Fri, Sep 22, 2023 at 9:38 PM Vinoth Chandar 
> > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > >
> > > >- Ran rc checks on RC2 only, but nothing has seemed to change.
> > > >- Tested Spark Datasource/SQL flows around new features like auto
> > key
> > > >generation. This is a simpler SQL experience.
> > > >
> > > >Thanks to all the contributors !
> > > >
> > > >
> > > > On Tue, Sep 19, 2023 at 11:56 AM Prashant Wason
> >  > > >
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Please review and vote on the *release candidate #3* for the
> version
> > > > > 0.14.0, as follows:
> > > > >
> > > > > [ ] +1, Approve the release
> > > > >
> > > > > [ ] -1, Do not approve the release (please provide specific
> comments)
> > > > >
> > > > >
> > > > >
> > > > > The complete staging area is available for your review, which
> > includes:
> > > > >
> > > > > * JIRA release notes [1],
> > > > >
> > > > > > * the official Apache source release and binary convenience releases to be
> > > > > > deployed to dist.apache.org [2], which are signed with the key with
> > > > > > fingerprint 75C5744E9E5CD5C48E19C082C4D858D73B9DB1B8 [3],
> > > > >
> > > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > >
> > > > > * source code tag "0.14.0-rc3" [5],
> > > > >
> > > > >
> > > > >
> > > > > The vote will be open for at least 72 hours. It is adopted by
> > majority
> > > > > approval, with at least 3 PMC affirmative votes.
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Prashant Wason
> > > > >
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12352700
> > > > >
> > > > > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.14.0-rc3/
> > > > >
> > > > > [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
> > > > >
> > > > > [4]
> > > >
> https://repository.apache.org/content/repositories/orgapachehudi-1127/
> > > > >
> > > > > [5] https://github.com/apache/hudi/releases/tag/release-0.14.0-rc3
> > > > > 
> > > > >
> > > >
> > >
> >
>
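
The "Checking Checksum of Source Release" step in the validation output above can be illustrated with a short sketch. This is not the logic of the release validation script, which is not shown in this thread. It assumes a SHA-512 digest stored in a sidecar file whose first token is the hex digest, and the file names are hypothetical stand-ins.

```python
import hashlib
import tempfile
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Stream the file so large release tarballs are not loaded into memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_ok(artifact, checksum_file):
    """Compare the artifact's digest with the first token of the checksum file."""
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected


# Self-contained demo with a stand-in file instead of a real release tarball.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "example.src.tgz"          # hypothetical name
    artifact.write_bytes(b"not a real release")
    sidecar = Path(tmp) / "example.src.tgz.sha512"    # hypothetical name
    sidecar.write_text(f"{sha512_of(artifact)}  {artifact.name}\n")
    print("Checksum Check -", "[OK]" if checksum_ok(artifact, sidecar) else "[FAILED]")
```

A matching digest only shows the download is intact; the separate signature check is what ties the artifact to the release manager's key.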


Re: Link for Deltastreamer

2023-05-16 Thread Bhavani Sudha
Hi Selva,

I noticed that you had raised a similar slack thread. I responded with
additional info there. Please let us know if you have any other questions.

Thanks,
Sudha

On Tue, May 2, 2023 at 9:40 PM selvaraj periyasamy <
selvaraj.periyasamy1...@gmail.com> wrote:

> Team,
>
> We are interested in implementing Deltastream to consume kafka events and
> write them on hdfs and then do hivesync. Could you share the example link
> for this ?


> 1. And also we need to configure keystore and truststore file and passwords
> for kafka connection . Can they be configured ? Could you share some
> examples ?
>
> 2. Is there a way to feed keystore passwords from java/scala code during
> run time, as our passwords are encrypted and can’t be stored in plaintext format
> in the properties file?
>
> Thanks in advance,
> Selva
>
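
For the second question above (supplying keystore passwords at run time instead of from a plaintext properties file), one possible approach is to assemble the Kafka SSL client properties from the environment. The sketch below is only an illustration under assumptions: the property keys are the standard Kafka client SSL settings, but the environment variable names and paths are hypothetical, the decryption step is left out, and how the resulting properties are handed to Deltastreamer is not settled in this thread (the answer was given on Slack).

```python
import os


def kafka_ssl_properties(env=os.environ):
    """Build Kafka SSL client properties, reading secrets from the environment."""
    return {
        "security.protocol": "SSL",
        # The variable names below are hypothetical.
        "ssl.keystore.location": env["KAFKA_KEYSTORE_PATH"],
        "ssl.keystore.password": env["KAFKA_KEYSTORE_PASSWORD"],
        "ssl.truststore.location": env["KAFKA_TRUSTSTORE_PATH"],
        "ssl.truststore.password": env["KAFKA_TRUSTSTORE_PASSWORD"],
    }


def masked(props):
    """Hide password values so the properties can be logged safely."""
    return {k: ("****" if k.endswith(".password") else v) for k, v in props.items()}


# Demo with stand-in values; a real job would get these from a secret store.
demo_env = {
    "KAFKA_KEYSTORE_PATH": "/path/to/keystore.jks",
    "KAFKA_KEYSTORE_PASSWORD": "example-only",
    "KAFKA_TRUSTSTORE_PATH": "/path/to/truststore.jks",
    "KAFKA_TRUSTSTORE_PASSWORD": "example-only",
}
for key, value in masked(kafka_ssl_properties(demo_env)).items():
    print(f"{key}={value}")
```

Keeping the secrets out of the properties file and masking them in logs addresses the plaintext concern raised in the question.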


Re: [VOTE] Release 0.12.2, release candidate #1

2022-12-23 Thread Bhavani Sudha
+1 binding

[OK] Built successfully with multiple supported Spark versions

[OK] Ran validation script

[OK] Ran QuickStart on spark 3.2


./release/validate_staged_release.sh --release=0.12.2 --rc_num=1

/tmp/validation_scratch_dir_001 ~/hudi/scripts

Downloading from svn co https://dist.apache.org/repos/dist/dev/hudi

Validating hudi-0.12.2-rc1 with release type "dev"

Checking Checksum of Source Release

Checksum Check of Source Release - [OK]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 69274  100 69274    0     0  97810      0 --:--:-- --:--:-- --:--:-- 98962

Checking Signature

Signature Check - [OK]


Checking for binary files in source release

No Binary Files in Source Release? - [OK]


Checking for DISCLAIMER

DISCLAIMER file exists ? [OK]


Checking for LICENSE and NOTICE

License file exists ? [OK]

Notice file exists ? [OK]


Performing custom Licensing Check

Licensing Check Passed [OK]


Running RAT Check

RAT Check Passed [OK]


~/hudi/scripts



On Thu, Dec 22, 2022 at 8:18 PM sagar sumit  wrote:

> +1 (non-binding)
>
> Ran long-running deltastreamer.
> Validated meta sync and queried tables through Presto/Trino.
>
> On Fri, Dec 23, 2022 at 5:14 AM Sivabalan  wrote:
>
> > +1 binding.
> >
> > release Validation script succeeded.
> > Ran tests w/ diff spark versions for spark-ds writers and deltastreamer
> for
> > few hours.
> > Ran multi-writer tests.
> >
> >
> > On Thu, 22 Dec 2022 at 08:04, Satish Kotha 
> wrote:
> >
> > > > We discussed on Slack. Because the commits below didn’t meet the code
> > > > freeze date, we are skipping them in the 0.12.2 release.
> > > >
> > > > Please test the release; feedback is appreciated.
> > >
> > > On Tue, Dec 20, 2022 at 10:41 PM Danny Chan 
> > wrote:
> > >
> > > > > Hi, there are two more fixes that I want to include:
> > > >
> > > >
> > >
> >
> https://github.com/apache/hudi/commit/c288a506d4c0b7c1272538d95928df118e4d79ac
> > > >
> > > >
> > >
> >
> https://github.com/apache/hudi/commit/211af1a4fd76ce84ce80f4d1b2befe5fc9954888
> > > >
> > > > Best,
> > > > Danny
> > > >
> > > > > On Tue, Dec 20, 2022 at 11:50, Satish Kotha wrote:
> > > > >
> > > > > small correction in the first line: Please review and vote on
> > > > > the release candidate #1 for the version 0.12.2,
> > > > >
> > > > >
> > > > > On Mon, Dec 19, 2022 at 6:37 PM Satish Kotha <
> satish.ko...@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Please review and vote on the release candidate #1 for the
> version
> > > > 0.12.1,
> > > > > > as follows:
> > > > > >
> > > > > > [ ] +1, Approve the release
> > > > > > [ ] -1, Do not approve the release (please provide specific
> > comments)
> > > > > >
> > > > > > The complete staging area is available for your review, which
> > > includes:
> > > > > >
> > > > > > * JIRA release notes [1],
> > > > > > * the official Apache source release and binary convenience
> > releases
> > > > to be
> > > > > > deployed to dist.apache.org [2], which are signed with the key
> > with
> > > > > > fingerprint 6DA0B39A13C2658D22AE7D14D08C4B6BD98EA659 [3],
> > > > > > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > > > > > * source code tag "release-0.12.2-rc1" [5],
> > > > > >
> > > > > > The vote will be open for at least 72 hours. It is adopted by
> > > majority
> > > > > > approval, with at least 3 PMC affirmative votes.
> > > > > >
> > > > > > Thanks to Sagar, Raymond and Shiva and others from the community
> > for
> > > > > > helping me through the release process.
> > > > > >
> > > > > > Thanks,
> > > > > > Release Manager
> > > > > >
> > > > > > [1]
> > > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12352249=Html=12322822=Create_token=A5KQ-2QAV-T4JA-FDED_88b472602a0f3c72f949e98ae8087a47c815053b_lin
> > > > > >
> > > > > > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.12.2-rc1/
> > > > > > [3] https://dist.apache.org/repos/dist/dev/hudi/KEYS
> > > > > > [4]
> > > >
> https://repository.apache.org/content/repositories/orgapachehudi-1106
> > > > > > [5]
> https://github.com/apache/hudi/releases/tag/release-0.12.2-rc1
> > > > > >
> > > >
> > > --
> > > -
> > >
> >
> >
> > --
> > Regards,
> > -Sivabalan
> >
>


Re: [DISCUSS] Merging Nov and Dec community sync calls

2022-11-18 Thread Bhavani Sudha
Thank you all. I'll update the calendar invites and the website for
necessary changes.

-Sudha


On Thu, Nov 17, 2022 at 9:01 AM Pratyaksh Sharma 
wrote:

> +1 as well.
>
> On Thu, Nov 17, 2022 at 9:57 PM sagar sumit 
> wrote:
>
> > +1
> >
> > On Thu, Nov 17, 2022 at 9:44 AM Sivabalan  wrote:
> >
> > > +1 makes sense.
> > >
> > > On Wed, 16 Nov 2022 at 17:40, Y Ethan Guo  wrote:
> > >
> > > > > +1 on having a single community sync call on Dec 14 during the holiday
> > > > season.
> > > >
> > > > On Wed, Nov 16, 2022 at 5:12 PM Bhavani Sudha <
> bhavanisud...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hello Hudi community,
> > > > >
> > > > > > We have monthly community sync calls on the last Wednesday of every
> > > > > > month. In November and December these collide with public holidays,
> > > > > > which limits attendance. For this reason, I am proposing to merge the
> > > > > > Nov and Dec sync calls into one and hold it in the 2nd week of
> > > > > > December. I am thinking of December 14th for the community sync call.
> > > > > > Please let me know your thoughts.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Sudha
> > > > >
> > > >
> > >
> > >
> > > --
> > > Regards,
> > > -Sivabalan
> > >
> >
>


[DISCUSS] Merging Nov and Dec community sync calls

2022-11-16 Thread Bhavani Sudha
Hello Hudi community,

We have monthly community sync calls on the last Wednesday of every month.
In November and December these collide with public holidays, which limits
attendance. For this reason, I am proposing to merge the Nov and Dec sync
calls into one and hold it in the 2nd week of December. I am thinking of
December 14th for the community sync call. Please let me know your thoughts.


Thanks,
Sudha


Re: [VOTE] Release 0.12.1, release candidate #2

2022-10-13 Thread Bhavani Sudha
+1 (binding)


[OK] Built successfully with multiple supported Spark versions

[OK] Ran validation script

[OK] Ran some IDE tests


sudha[21:59:36] scripts % ./release/validate_staged_release.sh
--release=0.12.1 --rc_num=2
/tmp/validation_scratch_dir_001 ~/hudi/scripts
Downloading from svn co https://dist.apache.org/repos/dist/dev/hudi
Validating hudi-0.12.1-rc2 with release type "dev"
Checking Checksum of Source Release
Checksum Check of Source Release - [OK]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 65803  100 65803    0     0   135k      0 --:--:-- --:--:-- --:--:--  138k
Checking Signature
Signature Check - [OK]

Checking for binary files in source release
No Binary Files in Source Release? - [OK]

Checking for DISCLAIMER
DISCLAIMER file exists ? [OK]

Checking for LICENSE and NOTICE
License file exists ? [OK]
Notice file exists ? [OK]

Performing custom Licensing Check
Licensing Check Passed [OK]

Running RAT Check
RAT Check Passed [OK]

~/hudi/scripts


Thanks,

Sudha

On Thu, Oct 13, 2022 at 9:22 PM Danny Chan  wrote:

> +1 (binding)
>
> Flink quickstart OK
> Long-running Flink SQL Job OK
> Flink Hive Sync OK
> Flink compaction and cleaning OK
> Compile the source code OK
>
> Regards,
> Danny
>
> On Fri, Oct 14, 2022 at 02:46, Rahil C wrote:
> >
> > +1 (non-binding)
> >
> > Ran hudi-spark bundle against EMR integration tests
> >
> >
> >
> > On Thu, Oct 13, 2022 at 11:09 AM Shiyan Xu 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > > Primary key fingerprint: B430 5519 F36D D7E8 B7E6  A684 58B8 5B81 4778 3CE2
> > > Signature Check - [OK]
> > >
> > > Checking for binary files in source release
> > > No Binary Files in Source Release? - [OK]
> > >
> > > Checking for DISCLAIMER
> > > DISCLAIMER file exists ? [OK]
> > >
> > > Checking for LICENSE and NOTICE
> > > License file exists ? [OK]
> > > Notice file exists ? [OK]
> > >
> > > Performing custom Licensing Check
> > > Licensing Check Passed [OK]
> > >
> > >   RAT Check Passed [OK]
> > >
> > > On Fri, Oct 14, 2022 at 12:43 AM Sivabalan  wrote:
> > >
> > > > +1 binding.
> > > >
> > > > > Ran a suite of integration tests spanning different Spark versions,
> > > > > deltastreamer, and the Spark datasource writer for both table types,
> > > > > with and without the metadata table.
> > > >
> > > > On Thu, 13 Oct 2022 at 05:16, sagar sumit  wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Spark quickstart OK
> > > > > Long-running delta streamer OK
> > > > > Hive Sync OK
> > > > > Async table services OK
> > > > >
> > > > > Regards,
> > > > > Sagar
> > > > >
> > > > > On Tue, Oct 11, 2022 at 8:20 PM zhaojing yu 
> > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Please review and vote on the release candidate #2 for the
> version
> > > > > 0.12.1,
> > > > > > as follows:
> > > > > >
> > > > > > [ ] +1, Approve the release
> > > > > > [ ] -1, Do not approve the release (please provide specific
> comments)
> > > > > >
> > > > > > The complete staging area is available for your review, which
> > > includes:
> > > > > >
> > > > > > * JIRA release notes [1],
> > > > > > * the official Apache source release and binary convenience
> releases
> > > to
> > > > > be
> > > > > > deployed to dist.apache.org [2], which are signed with the key
> with
> > > > > > fingerprint B4305519F36DD7E8B7E6A68458B85B8147783CE2 [3],
> > > > > > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > > > > > * source code tag "release-0.12.1-rc2" [5],
> > > > > >
> > > > > > The vote will be open for at least 72 hours. It is adopted by
> > > majority
> > > > > > approval, with at least 3 PMC affirmative votes.
> > > > > >
> > > > > > Thanks,
> > > > > > Release Manager
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12352182
> > > > > > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.12.1-rc2/
> > > > > > [3] https://dist.apache.org/repos/dist/dev/hudi/KEYS
> > > > > > [4]
> > > > >
> https://repository.apache.org/content/repositories/orgapachehudi-1101/
> > > > > > [5]
> https://github.com/apache/hudi/releases/tag/release-0.12.1-rc2
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > -Sivabalan
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Shiyan
> > >
>
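
The signature check output in Shiyan's reply prints the key fingerprint in groups of four characters, while the vote email announces it as one unbroken string. A minimal sketch of comparing the two forms is below; `normalize_fingerprint` is a hypothetical helper name, and both values are copied from this thread.

```python
def normalize_fingerprint(text):
    """Drop whitespace and upper-case so both display styles compare equal."""
    return "".join(text.split()).upper()


# As printed by the signature check in the reply (grouped in blocks of four).
from_gpg_output = "B430 5519 F36D D7E8 B7E6  A684 58B8 5B81 4778 3CE2"
# As announced in the vote email.
from_vote_email = "B4305519F36DD7E8B7E6A68458B85B8147783CE2"

same_key = normalize_fingerprint(from_gpg_output) == normalize_fingerprint(from_vote_email)
print("fingerprints match:", same_key)
```

Comparing the full 40-character fingerprint, rather than a short key ID, is the safer habit when confirming which key signed a release.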


Re: [VOTE] Release 0.12.1, release candidate #1

2022-10-06 Thread Bhavani Sudha
+1 (binding)


[OK] Built successfully with multiple supported Spark versions

[OK] Ran validation script

[OK] Ran quickstart tests with spark 2.4

[OK] Ran some IDE tests


sudha[10:02:57] scripts % ./release/validate_staged_release.sh
--release=0.12.1 --rc_num=1

/tmp/validation_scratch_dir_001 ~/hudi/scripts

Downloading from svn co https://dist.apache.org/repos/dist/dev/hudi

Validating hudi-0.12.1-rc1 with release type "dev"

Checking Checksum of Source Release

Checksum Check of Source Release - [OK]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 65803  100 65803    0     0   141k      0 --:--:-- --:--:-- --:--:--  143k

Checking Signature

Signature Check - [OK]


Checking for binary files in source release

No Binary Files in Source Release? - [OK]


Checking for DISCLAIMER

DISCLAIMER file exists ? [OK]


Checking for LICENSE and NOTICE

License file exists ? [OK]

Notice file exists ? [OK]


Performing custom Licensing Check

Licensing Check Passed [OK]


Running RAT Check

RAT Check Passed [OK]


~/hudi/scripts

sudha[10:05:01] scripts %


Thanks,

Sudha

On Thu, Oct 6, 2022 at 8:50 AM Prasanna Rajaperumal 
wrote:

> +1 from my side too.
>
> On Thu, Oct 6, 2022 at 2:41 PM Shiyan Xu 
> wrote:
>
> > +1 (binding) Ran some examples with different spark bundles.
> >
> > On Thu, Oct 6, 2022 at 10:26 AM zhaojing yu  wrote:
> >
> > > Since the temporary library has been closed, the code was redeployed to
> > > https://repository.apache.org/content/repositories/orgapachehudi-1095;
> > > please review it again.
> > >
> >
> >
> > --
> > Best,
> > Shiyan
> >
>


Re: [VOTE] Release 0.12.0, release candidate #2

2022-08-14 Thread Bhavani Sudha
+1 (binding)


[OK] Built successfully with all supported Spark versions

[OK] Ran validation script

[OK] Ran quickstart tests with spark 2.4

[OK] Ran some IDE tests


sudha[9:33:26] scripts % ./release/validate_staged_release.sh
--release=0.12.0 --rc_num=2

/tmp/validation_scratch_dir_001 ~/hudi/scripts

Downloading from svn co https://dist.apache.org/repos/dist/dev/hudi

Validating hudi-0.12.0-rc2 with release type "dev"

Checking Checksum of Source Release

Checksum Check of Source Release - [OK]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 62287  100 62287    0     0  39174      0  0:00:01  0:00:01 --:--:-- 39149

Checking Signature

Signature Check - [OK]


Checking for binary files in source release

No Binary Files in Source Release? - [OK]


Checking for DISCLAIMER

DISCLAIMER file exists ? [OK]


Checking for LICENSE and NOTICE

License file exists ? [OK]

Notice file exists ? [OK]


Performing custom Licensing Check

Licensing Check Passed [OK]


Running RAT Check

RAT Check Passed [OK]


Thanks,

Sudha

On Sun, Aug 14, 2022 at 11:16 AM Y Ethan Guo  wrote:

> +1 (non-binding)
>
> - [OK] checksums and signatures
> - [OK] ran release validation script
> - [OK] built successfully (Spark 2.4, 3.2, 3.3)
> - [OK] ran Spark quickstart with Spark 3.3.0
> - [OK] ran a few tests on schema evolution
> - [OK] Presto connector performance
>
> Best,
> - Ethan
>
> On Thu, Aug 11, 2022 at 5:22 AM sagar sumit  wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #2 for the version
> 0.12.0,
> > as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> >
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint FD215342E3199419ADFBF41DD4623E3AA16D75B0 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-0.12.0-rc2" [5],
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12351209
> > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.12.0-rc2/
> > [3] https://dist.apache.org/repos/dist/dev/hudi/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachehudi-1090/
> > [5] https://github.com/apache/hudi/releases/tag/release-0.12.0-rc2
> >
>


Apache Hudi products available for purchase in Redbubble store

2022-06-27 Thread Bhavani Sudha
Dear Hudi users,

Recently we added our logo to the Redbubble store. Check out the link
https://www.redbubble.com/shop/ap/113207590 to shop for Apache Hudi
merchandise if you are interested :)

Thanks,
Sudha


Re: [VOTE] Release 0.11.1, release candidate #2

2022-06-15 Thread Bhavani Sudha
+1 (binding)

- [OK] checksums and signatures
- [OK] ran validation script
- [OK] built successfully
- [OK] ran spark quickstart
- [OK] Ran few tests in IDE

sudha[14:58:21] scripts % ./release/validate_staged_release.sh
--release=0.11.1 --rc_num=2

/tmp/validation_scratch_dir_001 ~/hudi/scripts

Downloading from svn co https://dist.apache.org/repos/dist/dev/hudi

Validating hudi-0.11.1-rc2 with release type "dev"

Checking Checksum of Source Release

Checksum Check of Source Release - [OK]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 55975  100 55975    0     0   103k      0 --:--:-- --:--:-- --:--:--  103k

Checking Signature

Signature Check - [OK]


Checking for binary files in source release

No Binary Files in Source Release? - [OK]


Checking for DISCLAIMER

DISCLAIMER file exists ? [OK]


Checking for LICENSE and NOTICE

License file exists ? [OK]

Notice file exists ? [OK]


Performing custom Licensing Check

Licensing Check Passed [OK]


Running RAT Check

RAT Check Passed [OK]


~/hudi/scripts

sudha[14:59:28] scripts %


Thanks,

Sudha

On Tue, Jun 14, 2022 at 9:08 PM Danny Chan  wrote:

> Thanks Ethan, would appreciate it if
>
> https://issues.apache.org/jira/browse/HUDI-4255
>
> can be included; the bug may cause the Flink bucket index to throw
> FileNotFoundException in some cases.
>
> Best,
> Danny
>
> On Mon, Jun 13, 2022 at 07:17, Y Ethan Guo wrote:
> >
> > Hi everyone,
> >
> > Please review and vote on the release candidate #2 for the version
> 0.11.1,
> > as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> >
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint 888A9341E600EB8550AACD5EFB1B7504F7F770C9 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-0.11.1-rc2" [5],
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12351597
> > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.11.1-rc2/
> > [3] https://dist.apache.org/repos/dist/dev/hudi/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachehudi-1084/
> > [5] https://github.com/apache/hudi/releases/tag/release-0.11.1-rc2
>


Re: [VOTE] Monthly Community Sync Time

2022-05-20 Thread Bhavani Sudha
Vote is now closed.

We had 5 votes with 4 of 5 preferring 9 am.

I will proceed with updating the time to 9 am pacific time on the website.

Thanks all!

On Fri, May 20, 2022 at 11:17 AM Bhavani Sudha 
wrote:

> My preference is 9 am as well.
>
> On Wed, May 18, 2022 at 3:14 PM Y Ethan Guo 
> wrote:
>
>> +1 for new time.  I prefer 9AM PT.
>>
>> On Tue, May 17, 2022 at 11:40 PM Pratyaksh Sharma 
>> wrote:
>>
>> > I would go with 8 AM PT.
>> >
>> > If that is not feasible, then 8.30 AM.
>> >
>> > On Wed, May 18, 2022 at 7:14 AM Vinoth Govindarajan <
>> > vinoth.govindara...@gmail.com> wrote:
>> >
>> > > +1
>> > >
>> > > I vote for 9 am as well.
>> > >
>> > >
>> > >
>> > > On Tue, May 17, 2022, 1:31 PM Vinoth Chandar <
>> > > mail.vinoth.chan...@gmail.com>
>> > > wrote:
>> > >
>> > > > +1 for changing.
>> > > >
>> > > > 9AM is my preference
>> > > >
>> > > > On Tue, May 17, 2022 at 1:20 PM Bhavani Sudha <
>> bhavanisud...@gmail.com
>> > >
>> > > > wrote:
>> > > >
>> > > > > Hi everyone,
>> > > > >
>> > > > > The Community sync happens last Wednesday of every month.
>> Currently
>> > it
>> > > is
>> > > > > scheduled at 7 am which is way too early for a lot of folks.
>> > Following
>> > > > are
>> > > > > proposed times for the meeting. Please find the
>> > > > > corresponding discuss thread here
>> > > > > <https://lists.apache.org/thread/thwj6ro3y1j9p13kft3319p18tm6s50m
>> >
>> > for
>> > > > > more
>> > > > > context.
>> > > > >
>> > > > > - 8:00 AM pacific time
>> > > > > - 8:30 AM pacific time
>> > > > > - 9:00 AM pacific time
>> > > > >
>> > > > > Please indicate your preferred time. This thread will be open for
>> 72
>> > > > hours.
>> > > > > And will be adopted by majority approval after that.
>> > > > >
>> > > > > Thanks,
>> > > > > Sudha
>> > > > >
>> > > >
>> > >
>> >
>>
>
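
The closing tally above ("5 votes with 4 of 5 preferring 9 am") can be reproduced from the replies with a short sketch. The names and first choices are taken from the thread; treating each voter's first-stated time as their vote is an assumption.

```python
from collections import Counter

# First-choice times as stated in the replies to the vote thread.
preferences = {
    "Vinoth Chandar": "9:00 AM",
    "Vinoth Govindarajan": "9:00 AM",
    "Pratyaksh Sharma": "8:00 AM",   # 8:30 AM was given as a fallback
    "Y Ethan Guo": "9:00 AM",
    "Bhavani Sudha": "9:00 AM",
}

tally = Counter(preferences.values())
winner, count = tally.most_common(1)[0]
has_majority = count > len(preferences) / 2
print(f"{winner}: {count} of {len(preferences)} votes, majority={has_majority}")
```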


Re: [VOTE] Monthly Community Sync Time

2022-05-20 Thread Bhavani Sudha
My preference is 9 am as well.

On Wed, May 18, 2022 at 3:14 PM Y Ethan Guo 
wrote:

> +1 for new time.  I prefer 9AM PT.
>
> On Tue, May 17, 2022 at 11:40 PM Pratyaksh Sharma 
> wrote:
>
> > I would go with 8 AM PT.
> >
> > If that is not feasible, then 8.30 AM.
> >
> > On Wed, May 18, 2022 at 7:14 AM Vinoth Govindarajan <
> > vinoth.govindara...@gmail.com> wrote:
> >
> > > +1
> > >
> > > I vote for 9 am as well.
> > >
> > >
> > >
> > > On Tue, May 17, 2022, 1:31 PM Vinoth Chandar <
> > > mail.vinoth.chan...@gmail.com>
> > > wrote:
> > >
> > > > +1 for changing.
> > > >
> > > > 9AM is my preference
> > > >
> > > > On Tue, May 17, 2022 at 1:20 PM Bhavani Sudha <
> bhavanisud...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > The Community sync happens last Wednesday of every month. Currently
> > it
> > > is
> > > > > scheduled at 7 am which is way too early for a lot of folks.
> > Following
> > > > are
> > > > > proposed times for the meeting. Please find the
> > > > > corresponding discuss thread here
> > > > > <https://lists.apache.org/thread/thwj6ro3y1j9p13kft3319p18tm6s50m>
> > for
> > > > > more
> > > > > context.
> > > > >
> > > > > - 8:00 AM pacific time
> > > > > - 8:30 AM pacific time
> > > > > - 9:00 AM pacific time
> > > > >
> > > > > Please indicate your preferred time. This thread will be open for
> 72
> > > > hours.
> > > > > And will be adopted by majority approval after that.
> > > > >
> > > > > Thanks,
> > > > > Sudha
> > > > >
> > > >
> > >
> >
>


[VOTE] Monthly Community Sync Time

2022-05-17 Thread Bhavani Sudha
Hi everyone,

The community sync happens on the last Wednesday of every month. Currently it is
scheduled at 7 am, which is way too early for a lot of folks. The following are the
proposed times for the meeting. Please find the corresponding discuss thread here
<https://lists.apache.org/thread/thwj6ro3y1j9p13kft3319p18tm6s50m> for more
context.

- 8:00 AM pacific time
- 8:30 AM pacific time
- 9:00 AM pacific time

Please indicate your preferred time. This thread will be open for 72 hours.
And will be adopted by majority approval after that.

Thanks,
Sudha


Re: [DISCUSS] Hudi community sync time

2022-05-17 Thread Bhavani Sudha
Sounds good. Thank you all for chiming in. Based on the responses we have
had here, we can move the existing community sync to a later time. I will
send a separate voting thread to finalize the exact time.

Thanks,
Sudha

On Thu, Apr 28, 2022 at 1:55 AM Pratyaksh Sharma 
wrote:

> I would propose 8 AM or 8.30 AM PST though since 9 AM PST will clash with
> my other meetings.
> But happy to go with time that suits most of the folks.
>
> On Thu, Apr 28, 2022 at 3:31 AM Vinoth Govindarajan <
> vinoth.govindara...@gmail.com> wrote:
>
> > +1 for 9 am PST call, the current time is super early hence I missed one
> of
> > the meetings in the past.
> >
> > Best,
> > Vinoth
> >
> >
> > On Tue, Apr 26, 2022 at 8:01 PM Vinoth Chandar 
> wrote:
> >
> > > +1 as well. Current PST times are pretty hard for many folks.
> > >
> > > On Sat, Apr 16, 2022 at 6:20 AM Gary Li 
> > wrote:
> > >
> > > > +1 for splitting into two sessions. The current schedule is
> challenging
> > > for
> > > > both US and Chinese folks. We can organize another session for the
> > > Chinese
> > > > timezone.
> > > >
> > > > Calling out for folks living in the Chinese timezone, please reply to
> > > this
> > > > email thread if you are interested to join a sync meeting. We can
> > > schedule
> > > > one if we have enough interest.
> > > >
> > > > Best,
> > > > Gary
> > > >
> > > > On Sat, Apr 16, 2022 at 2:36 AM Bhavani Sudha <
> bhavanisud...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > > Our current monthly community syncs happen around 7 am Pacific time
> > > > > > on the last Wednesday of each month. It is already 10 pm in China,
> > > > > > so we don't get to see Chinese folks in the community sync call. We
> > > > > > have users from different time zones and finding an overlap is
> > > > > > challenging as it is. In this context I am proposing the following:
> > > > > >
> > > > > > - We can split the community syncs into two: one catered towards
> > > > > > Chinese time and the other one that happens currently for the rest
> > > > > > of the folks.
> > > > > > - If we split it into two different syncs, then we can move the 7 am
> > > > > > Pacific time slot to 8 am or 9 am as well.
> > > > >
> > > > > Please share your thoughts on this proposal.
> > > > >
> > > > > Thanks,
> > > > > Sudha
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Release 0.11.0, release candidate #3

2022-04-27 Thread Bhavani Sudha
+1 (binding)

- [OK] checksums and signatures
- [OK] ran validation script
- [OK] built successfully
- [OK] ran spark quickstart
- [OK] Ran few tests in IDE

sudha[23:09:51] scripts % ./release/validate_staged_release.sh --release=0.11.0 --rc_num=3
/tmp/validation_scratch_dir_001 ~/hudi/scripts
Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
Validating hudi-0.11.0-rc3 with release type "dev"
Checking Checksum of Source Release
Checksum Check of Source Release - [OK]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 55975  100 55975    0     0   109k      0 --:--:-- --:--:-- --:--:--  108k
Checking Signature
Signature Check - [OK]

Checking for binary files in source release
No Binary Files in Source Release? - [OK]

Checking for DISCLAIMER
DISCLAIMER file exists ? [OK]

Checking for LICENSE and NOTICE
License file exists ? [OK]
Notice file exists ? [OK]

Performing custom Licensing Check
Licensing Check Passed [OK]

Running RAT Check
RAT Check Passed [OK]

~/hudi/scripts
sudha[23:10:56] scripts %
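
The "Checksum Check of Source Release" step in the output above can be
illustrated with a minimal sketch. This is not the logic of
validate_staged_release.sh, which is not shown in this thread. The helper
names are hypothetical, and the sketch assumes the .sha512 file holds
"<hex digest>  <file name>"; other layouts would need different parsing.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path, checksum_text):
    """Compare the artifact's digest with the first token of `checksum_text`.

    Assumption: the checksum file looks like "<hex digest>  <file name>".
    """
    expected = checksum_text.split()[0].lower()
    return sha512_of(artifact_path) == expected
```

A caller would read the .sha512 file as text and pass it in together with the
path of the downloaded source tarball. Signature verification is a separate
step and is not covered here.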


Thanks,
Sudha

On Tue, Apr 26, 2022 at 8:24 PM Vinoth Chandar  wrote:

> +1 (binding)
>
> Ran RC checks. Passed
>
> On Sun, Apr 24, 2022 at 6:18 AM Shiyan Xu 
> wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #3 for the version
> 0.11.0,
> > as follows:
> >
> > [ ] +1, Approve the release
> >
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> >
> > The complete staging area is available for your review, which includes:
> >
> > * JIRA release notes [1],
> >
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint E1FACC15B67B2C5149224052D3B314F3B6E9C746 [3],
> >
> > * all artifacts to be deployed to the Maven Central Repository [4],
> >
> > * source code tag "0.11.0-rc3" [5],
> >
> >
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> >
> >
> > Thanks,
> >
> > Release Manager
> >
> >
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12350673
> >
> > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.11.0-rc3/
> >
> > [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
> >
> > [4]
> https://repository.apache.org/content/repositories/orgapachehudi-1078/
> >
> > [5] https://github.com/apache/hudi/releases/tag/release-0.11.0-rc3
> >
>
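
The approval rule quoted in this thread (open for at least 72 hours, adopted
by majority approval, with at least 3 PMC affirmative votes) can be written
down as a small check. This is only a sketch under stated assumptions:
"majority" is read here as more binding +1s than binding -1s, non-binding
votes are ignored for the outcome, and the function name and vote tuples are
hypothetical.

```python
from datetime import datetime, timedelta


def vote_passes(votes, opened_at, now, min_hours=72, min_binding_plus_ones=3):
    """Decide a release vote.

    votes: iterable of (value, is_binding), where value is +1 or -1.
    Assumption: only binding (PMC) votes count towards the outcome.
    """
    if now - opened_at < timedelta(hours=min_hours):
        return False  # the minimum voting window has not elapsed yet
    binding = [value for value, is_binding in votes if is_binding]
    plus = sum(1 for value in binding if value > 0)
    minus = sum(1 for value in binding if value < 0)
    return plus >= min_binding_plus_ones and plus > minus
```

The vote data passed in would come from the replies on the thread; nothing
here tallies the actual 0.11.0 vote.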


[DISCUSS] Hudi community sync time

2022-04-15 Thread Bhavani Sudha
Hello all,

Our current monthly community syncs happen around 7 am Pacific time on the
last Wednesday of each month. It is already 10 pm in China, so we don't get
to see Chinese folks in the community sync call. We have users from
different time zones and finding an overlap is challenging as it is. In
this context I am proposing the following:

- We can split the community syncs into two: one catered towards Chinese
time and the other one that happens currently for the rest of the folks.
- If we split it into two different syncs, then we can move the 7 am
Pacific time slot to 8 am or 9 am as well.

Please share your thoughts on this proposal.

Thanks,
Sudha
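
The time zone arithmetic behind the proposal above (7 am Pacific is already
10 pm in China) can be checked with a short sketch. It uses fixed UTC offsets
rather than a time zone database, and assumes US daylight time (UTC-7) is in
effect, as it is in late April; during standard time (UTC-8) every result
shifts one hour later. The function name and the sample date are illustrative
only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Daylight Time is in effect (UTC-7).
PDT = timezone(timedelta(hours=-7), "PDT")
# China Standard Time is UTC+8 all year.
CST = timezone(timedelta(hours=8), "CST")


def to_china_time(year, month, day, hour, minute=0):
    """Convert a Pacific (daylight) wall-clock time to China Standard Time."""
    pacific = datetime(year, month, day, hour, minute, tzinfo=PDT)
    return pacific.astimezone(CST)


if __name__ == "__main__":
    # Candidate start times from the proposal, on a sample Wednesday.
    for hour in (7, 8, 9):
        local = to_china_time(2022, 4, 27, hour)
        print(f"{hour}:00 PDT -> {local:%a %H:%M} in China")
```

Under these assumptions, moving the call from 7 am to 9 am Pacific pushes it
from 10 pm to midnight in China, which is the motivation for a separate
session.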


Re: [VOTE] Release 0.10.1, release candidate #2

2022-01-24 Thread Bhavani Sudha
+1 binding

Ran RC check, quickstart and some IDE tests.

Thanks,
Sudha

On Mon, Jan 24, 2022 at 9:23 AM sagar sumit  wrote:

> +1
>
> - Builds for Spark2/3 [OK]
> - Spark quickstart [OK]
> - Docker Demo (Hive/Presto querying) [OK]
> - Long-running deltastreamer continuous mode with async
> compaction/clustering [OK]
>
> Regards,
> Sagar
>
> On Mon, Jan 24, 2022 at 10:23 PM Sivabalan  wrote:
>
>> Hey folks,
>>  Can we get some attention on this? I expect participation from PMCs
>> and committers at least. Would appreciate it if you folks can spare some
>> time on RC testing and voting.
>>
>>
>> On Mon, 24 Jan 2022 at 07:54, Pratyaksh Sharma 
>> wrote:
>>
>> > +1
>> >
>> > - Compilation OK
>> > - Validation script OK
>> >
>> > On Sun, Jan 23, 2022 at 8:09 PM Nishith  wrote:
>> >
>> > > +1 binding
>> > >
>> > > -Nishith
>> > >
>> > > > On Jan 22, 2022, at 7:49 PM, Vinoth Chandar 
>> wrote:
>> > > >
>> > > > +1 (binding)
>> > > >
>> > > > Ran my rc checks on updated link and changing my vote to a +1
>> > > >
>> > > >> On Sat, Jan 22, 2022 at 4:10 AM Sivabalan 
>> wrote:
>> > > >>
>> > > >> my bad, the link([2]) was wrong. It is
>> > > >> https://dist.apache.org/repos/dist/dev/hudi/hudi-0.10.1-rc2/.
>> > > >> Can you take a look please?
>> > > >>
>> > > >>> On Sat, 22 Jan 2022 at 00:08, Vinoth Chandar 
>> > > wrote:
>> > > >>>
>> > > >>> -1
>> > > >>>
>> > > >>> The artifact version is wrong! It should be 0.10.*1*
>> > > >>>
>> > > >>>
>> > > >>>   - hudi-0.10.0-rc2.src.tgz
>> > > >>>   <
>> > > >>>
>> > > >>
>> > >
>> >
>> https://dist.apache.org/repos/dist/dev/hudi/hudi-0.10.0-rc2/hudi-0.10.0-rc2.src.tgz
>> > > 
>> > > >>>   - hudi-0.10.0-rc2.src.tgz.asc
>> > > >>>   <
>> > > >>>
>> > > >>
>> > >
>> >
>> https://dist.apache.org/repos/dist/dev/hudi/hudi-0.10.0-rc2/hudi-0.10.0-rc2.src.tgz.asc
>> > > 
>> > > >>>   - hudi-0.10.0-rc2.src.tgz.sha512
>> > > >>>   <
>> > > >>>
>> > > >>
>> > >
>> >
>> https://dist.apache.org/repos/dist/dev/hudi/hudi-0.10.0-rc2/hudi-0.10.0-rc2.src.tgz.sha512
>> > > 
>> > > >>>
>> > > >>> grep version hudi-0.10.0-rc2/pom.xml | grep rc2
>> > > >>>  0.10.0-rc2
>> > > >>>
>> > > >>>
>> > > >>> Why are all the arc
>> > > >>>
>> > >  On Thu, Jan 20, 2022 at 3:53 AM Sivabalan 
>> > wrote:
>> > > >>>
>> > >  Hi everyone,
>> > > 
>> > >  Please review and vote on the release candidate #2 for the
>> version
>> > > >>> 0.10.1,
>> > >  as follows:
>> > > 
>> > >  [ ] +1, Approve the release
>> > > 
>> > >  [ ] -1, Do not approve the release (please provide specific
>> > comments)
>> > > 
>> > > 
>> > >  The complete staging area is available for your review, which
>> > > includes:
>> > > 
>> > >  * JIRA release notes [1],
>> > > 
>> > >  * the official Apache source release and binary convenience
>> releases
>> > > to
>> > > >>> be
>> > >  deployed to dist.apache.org [2], which are signed with the key
>> with
>> > >  fingerprint ACD52A06633DB3B2C7D0EA5642CA2D3ED5895122 [3],
>> > > 
>> > >  * all artifacts to be deployed to the Maven Central Repository
>> [4],
>> > > 
>> > >  * source code tag "release-0.10.1-rc2" [5],
>> > > 
>> > > 
>> > >  The vote will be open for at least 72 hours. It is adopted by
>> > majority
>> > >  approval, with at least 3 PMC affirmative votes.
>> > > 
>> > > 
>> > >  Thanks,
>> > >  Release Manager
>> > > 
>> > > 
>> > >  [1]
>> > > 
>> > > 
>> > > >>>
>> > > >>
>> > >
>> >
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12351135
>> > > 
>> > >  [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.10.0-rc2
>> > > 
>> > >  [3] https://dist.apache.org/repos/dist/dev/hudi/KEYS
>> > > 
>> > >  [4]
>> > > 
>> > > 
>> > > >>>
>> > > >>
>> > >
>> >
>> https://repository.apache.org/content/repositories/orgapachehudi-1052/org/apache/hudi/
>> > > 
>> > >  [5] https://github.com/apache/hudi/tree/release-0.10.1-rc2
>> > > 
>> > >  --
>> > >  Regards,
>> > >  -Sivabalan
>> > > 
>> > > >>>
>> > > >>
>> > > >>
>> > > >> --
>> > > >> Regards,
>> > > >> -Sivabalan
>> > > >>
>> > >
>> >
>>
>>
>> --
>> Regards,
>> -Sivabalan
>>
>
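
The -1 earlier in this thread came from staged files named hudi-0.10.0-rc2
while the release under vote was 0.10.1. A name check along these lines could
catch that kind of mismatch. It is a sketch only: the naming pattern is
inferred from the file names listed in the thread, and both function names
are hypothetical rather than part of any Hudi release script.

```python
def expected_source_artifacts(release, rc_num):
    """File names expected for a source release candidate.

    Assumption: the pattern hudi-<release>-rc<N>.src.tgz plus .asc and
    .sha512 companions, as seen in the staged listing quoted above.
    """
    base = f"hudi-{release}-rc{rc_num}.src.tgz"
    return [base, base + ".asc", base + ".sha512"]


def misnamed_artifacts(release, rc_num, staged_names):
    """Return the staged names that do not match the expected set."""
    expected = set(expected_source_artifacts(release, rc_num))
    return sorted(name for name in staged_names if name not in expected)
```

For the listing quoted in the -1 message, calling misnamed_artifacts("0.10.1",
2, ...) would flag all three files.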


Re: [VOTE] Release 0.10.0, release candidate #3

2021-12-04 Thread Bhavani Sudha
+1 (binding)

- [OK] checksums and signatures
- [OK] ran validation script
- [OK] built successfully
- [OK] ran spark quickstart
- [OK] Ran few tests in IDE



bsaktheeswaran@Bhavanis-MacBook-Pro scripts % ./release/validate_staged_release.sh --release=0.10.0 --rc_num=3
/tmp/validation_scratch_dir_001 ~/Sudha/hudi/scripts
Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
Validating hudi-0.10.0-rc3 with release type "dev"
Checking Checksum of Source Release
Checksum Check of Source Release - [OK]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 45904  100 45904    0     0  85323      0 --:--:-- --:--:-- --:--:-- 85165
Checking Signature
Signature Check - [OK]

Checking for binary files in source release
No Binary Files in Source Release? - [OK]

Checking for DISCLAIMER
DISCLAIMER file exists ? [OK]

Checking for LICENSE and NOTICE
License file exists ? [OK]
Notice file exists ? [OK]

Performing custom Licensing Check
Licensing Check Passed [OK]

Running RAT Check
RAT Check Passed [OK]

Thanks,
Sudha

On Sat, Dec 4, 2021 at 6:59 AM Vinoth Chandar  wrote:

> +1 (binding)
>
> Ran the RC checks in [1] . This is a huge release, thanks everyone for all
> the hard work!
>
> [1] https://gist.github.com/vinothchandar/68b34f3051e41752ebffd6a3edeb042b
>
> On Sat, Dec 4, 2021 at 5:20 AM Danny Chan  wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #3 for the version
> 0.10.0,
> > as follows:
> >
> > [ ] +1, Approve the release
> >
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> >
> > * JIRA release notes [1],
> >
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint 9A48922F682AB05D1AE4A3E7C2931E4BDB03D5AE [3],
> >
> > * all artifacts to be deployed to the Maven Central Repository [4],
> >
> > * source code tag "release-0.10.0-rc3" [5],
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> >
> > Release Manager
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12350285
> >
> > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.10.0-rc3/
> >
> > [3] https://dist.apache.org/repos/dist/dev/hudi/KEYS
> >
> > [4]
> >
> >
> https://repository.apache.org/content/repositories/orgapachehudi-1048/org/apache/hudi/
> >
> > [5] https://github.com/apache/hudi/tree/release-0.10.0-rc3
> >
>


Upcoming monthly community call

2021-11-09 Thread Bhavani Sudha
Dear community,

We are starting monthly community calls from this month. The first call is
scheduled for Nov 24th 7am pacific time. We will be having project updates,
user presentations followed by open floor Q&A. If you would like to present
your use case please sign up here - https://forms.gle/aMkb93ViHhzRRXqV9 .
One of the PMC members will follow up with you.

More details and future community call dates here -
https://hudi.apache.org/community/syncs

Thanks,
Sudha
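
The first call above falls on Nov 24th, which is the last Wednesday of
November 2021. Assuming later calls keep to "the last Wednesday of the month"
(an assumption here; the syncs page linked above is the authoritative
schedule), the date can be computed with the standard library. The function
name is illustrative.

```python
import calendar
from datetime import date


def last_wednesday(year, month):
    """Return the date of the last Wednesday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    end_of_month = date(year, month, last_day)
    # Days to step back from the month's final day to reach a Wednesday.
    offset = (end_of_month.weekday() - calendar.WEDNESDAY) % 7
    return end_of_month.replace(day=last_day - offset)


if __name__ == "__main__":
    print(last_wednesday(2021, 11))
```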


Re: [VOTE] Release 0.9.0, release candidate #2

2021-08-23 Thread Bhavani Sudha
+1 (binding)

On Mon, Aug 23, 2021 at 3:23 PM Sivabalan  wrote:

> +1 (binding)
>
> 1. Release validation succeeded
> 2. Ran quick start for two variants (spark2, scala11 and spark3, scala12)
> for all operations.
> 3. Ran docker demo and verified all 3 query engines (spark sql, hive,
> presto) and 3 query types(snapshot, read optimized, incremental) across two
> tables.
>
> ./release/validate_staged_release.sh --release=0.9.0 --rc_num=2
> /tmp/validation_scratch_dir_001
> ~/Documents/personal/projects/a_hudi/hudi/scripts
> local dir local_svn_dir
> Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
> Validating hudi-0.9.0-rc2 with release type "dev"
> Checking Checksum of Source Release
> Checksum Check of Source Release - [OK]
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 42380  100 42380    0     0   156k      0 --:--:-- --:--:-- --:--:--  156k
> Checking Signature
> Signature Check - [OK]
>
> Checking for binary files in source release
> No Binary Files in Source Release? - [OK]
>
> Checking for DISCLAIMER
> DISCLAIMER file exists ? [OK]
>
> Checking for LICENSE and NOTICE
> License file exists ? [OK]
> Notice file exists ? [OK]
>
> Performing custom Licensing Check
> Licensing Check Passed [OK]
>
> Running RAT Check
> RAT Check Passed [OK]
>
>
>
>
>
>
>
> On Sun, Aug 22, 2021 at 9:09 PM Vinoth Chandar  wrote:
>
> > +1 (binding)
> >
> > RC check [1] passed
> >
> > [1]
> https://gist.github.com/vinothchandar/68b34f3051e41752ebffd6a3edeb042b
> >
> >
> > On Sun, Aug 22, 2021 at 1:28 PM Sivabalan  wrote:
> >
> > > We can keep the specific discussion out of this voting thread. Have
> > started
> > > a new thread here
> > > <
> > >
> >
> https://lists.apache.org/thread.html/r3bae7622904b04c7d1fb2ddaf5226e37166d5fbb1721f403b1b04545%40%3Cdev.hudi.apache.org%3E
> > > >
> > > to
> > > continue this discussion. We can keep this thread just for voting.
> > Thanks.
> > >
> > > On Sun, Aug 22, 2021 at 2:13 AM Danny Chan 
> wrote:
> > >
> > > > It's not a surprise that 0.9 has a longer release process: Spark SQL
> > > > support was added, along with many improvements to the Flink engine.
> > > > We need more patience for this release IMO.
> > > >
> > > > Having another minor release like 0.9.1 is a solution but not a good
> > > > one. People expect much more from a major release, and it carries
> > > > many expectations. If people report problems during the release
> > > > process, just accept the fix if it is not a big PR, and there are only
> > > > a few of those up to now. It would not take too much time.
> > > >
> > > > I know that it has been about 4 months since the last release, but
> > > > people want a complete release version, not a defective one.
> > > >
> > > > Best,
> > > > Danny
> > > >
> > > > > On Sun, Aug 22, 2021 at 11:50 AM Sivabalan wrote:
> > > >
> > > > > I would like to share my thoughts on the release process in
> general.
> > I
> > > > will
> > > > > read more about what exactly qualifies for -1 and will look into
> what
> > > > Peng
> > > > > and Danny has put up. But some thoughts on the release in general.
> > > > >
> > > > > Every release process is very tedious and time consuming and RM
> does
> > > put
> > > > in
> > > > > non-trivial amount of work in getting the release out. To make the
> > > > process
> > > > > smooth, RM started an email thread by Aug 3, calling for any
> release
> > > > > blockers. Would like to understand, if these were surfaced in that
> > > > thread?
> > > > > What I am afraid of is, we might keep delaying our release by
> adding
> > > more
> > > > > patches/bug fixes with every candidate. For instance, if we
> consider
> > > > these
> > > > > and RM works on RC3 and puts up a vote in 5 days and what if
> someone
> > > else
> > > > > wants to add a couple of more fixes or improvements to the release?
> > If
> > > > it's
> > > > > a very serious bug that one cannot do basic operations like
> > > insert/upsert
> > > > > in any of the engines or some serious regression, yeah we can
> > > definitely
> > > > > block the release. But if there are corner case bugs, or any
> > > improvements
> > > > > in general, we can always have another release immediately
> following
> > > > this.
> > > > > This is my humble opinion having gone through the release process
> > > myself
> > > > in
> > > > > the past and have helped others in doing the release in Hudi. It's
> > been
> > > > > more than 4 months we have had a release. Would be good for us to
> be
> > > > > mindful of that as well. Maybe this is common in other projects,
> but
> > I
> > > am
> > > > > not aware of that. Please enlighten me if you have experience with
> > > other
> > > > > projects.
> > > > >
> > > > > I would like to hear from other PMCs and experts who are more
> > > > > knowledgeable about the release process.
> > > > > And 

Re: [DISCUSS] Enable Github Discussions

2021-08-11 Thread Bhavani Sudha
+1

Thanks,
Sudha

On Wed, Aug 11, 2021 at 7:08 PM vino yang  wrote:

> +1
>
> Best,
> Vino
>
> On Thu, Aug 12, 2021 at 2:16 AM Pratyaksh Sharma wrote:
>
> > +1
> >
> > I have never used it, but we can try this out. :)
> >
> > On Thu, Jul 15, 2021 at 9:43 AM Vinoth Chandar 
> wrote:
> >
> > > Hi all,
> > >
> > > I would like to propose that we explore the use of github discussions.
> > Few
> > > other apache projects have also been trying this out.
> > >
> > > Please chime in
> > >
> > > Thanks
> > > Vinoth
> > >
> >
>


Re: [VOTE] Move content off cWiki

2021-07-19 Thread Bhavani Sudha
+1  - Approve the move

On Mon, Jul 19, 2021 at 5:16 PM vbal...@apache.org 
wrote:

>
> +1 - Approve the move
>
>
> On Monday, July 19, 2021, 04:04:39 PM PDT, Prashant Wason
>  wrote:
>
>  +1 - Approve the move
>
> On Mon, Jul 19, 2021 at 3:44 PM Vinoth Chandar  wrote:
>
> > Hi all,
> >
> > Starting a vote based on the DISCUSS thread here [1], to consolidate
> > content from cWiki into Github wiki and project's master branch (for
> design
> > docs)
> >
> > Please chime with a
> >
> > +1 - Approve the move
> > -1  - Disapprove the move (please state your reasoning)
> >
> > The vote will use lazy consensus, needing three +1s to pass, remaining
> open
> > for 72 hours.
> >
> > Thanks
> > Vinoth
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/rb0a96bc10788c9635cc1a35ade7d5d42997a5c9591a5ec5d5a99adf0%40%3Cdev.hudi.apache.org%3E
> >
>


Re: Welcome our PMC Member, Raymond Xu

2021-07-17 Thread Bhavani Sudha
Huge congratulations Raymond! well deserved.

On Sat, Jul 17, 2021 at 2:35 PM Sivabalan  wrote:

> Congratz and well deserved 
>
> On Sat, Jul 17, 2021 at 4:29 PM Raymond Xu 
> wrote:
>
>> Thank you all!
>>
>> On Fri, Jul 16, 2021 at 9:32 PM Navinder Brar
>>  wrote:
>>
>> > Congratulations, Raymond!
>> >
>> >
>> > -Navinder
>> >
>> > On Saturday, July 17, 2021, 9:37 AM, Nishith 
>> wrote:
>> >
>> > Congratulations Raymond! Huge shout out for your valuable contributions!
>> >
>> > > On Jul 16, 2021, at 5:28 PM, Vinoth Chandar 
>> wrote:
>> > >
>> > > Folks,
>> > >
>> > > I am incredibly happy to share the addition of Raymond Xu to the Hudi
>> > > PMC. Raymond has been a valuable member of our community over the past
>> > > few years now. Always hustlin and taking on the most underappreciated,
>> > > but extremely valuable aspects of the project, most recently with
>> > > getting our tests working smoothly on Azure CI!
>> > >
>> > > Please join me in congratulating Raymond!
>> > >
>> > > Onwards,
>> > > Vinoth
>> >
>> >
>> >
>> >
>>
> --
> Regards,
> -Sivabalan
>


Re: Amazon Athena expands Apache Hudi Support

2021-07-16 Thread Bhavani Sudha
That's awesome news. Thanks for sharing, Udit.

- Sudha

On Fri, Jul 16, 2021 at 7:07 PM Udit Mehrotra  wrote:

> Hi Folks,
>
> Happy to announce that Amazon Athena has now upgraded to the latest Hudi
> 0.8.0 release. In addition, Athena now supports two additional features:
>
>- Snapshot/Real time query support for Merge on Read tables
>- Query support for tables created with *BOOTSTRAP* operation
>
> Following are the public documentation for the new supports:
>
>- What’s new:
>
> https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-athena-expands-apache-hudi-support/
>- Updated Athena Hudi usage AWS doc:
>https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html
>
> Thanks,
> Udit Mehrotra
> SDE | AWS EMR
>


Re: Welcome New Committers: Pengzhiwei and DannyChan

2021-07-16 Thread Bhavani Sudha
Big congratulations to both of you. Very well deserved!

Cheers,
Sudha

On Fri, Jul 16, 2021 at 8:56 AM Sivabalan  wrote:

> Not to hijack the limelight from *Pengzhiwei *and* DannyChan.* btw, Big
> Kudos to the Chinese community at large. Great adoption and good going :)
> Really excited for the future of Hudi across the globe ! :) btw, fyi, I
> don't get to see the image you attached leesf.
>
>
> On Fri, Jul 16, 2021 at 11:29 AM Sivabalan  wrote:
>
> > Congrats guys! Well deserved.
> >
> > On Fri, Jul 16, 2021 at 9:12 AM Gary Li  wrote:
> >
> >> Congrats Zhiwei and Danny! It's awesome to work with you guys.
> >>
> >> Best,
> >> Gary
> >>
> >>
> >> On Fri, Jul 16, 2021 at 7:55 PM wangxianghu  wrote:
> >>
> >> > Congratulations! Well deserved!
> >> >
> >> > > On Jul 16, 2021, at 6:52 PM, vino yang wrote:
> >> > >
> >> > > Congratulation
> >> >
> >> >
> >>
> > --
> > Regards,
> > -Sivabalan
> >
>
>
> --
> Regards,
> -Sivabalan
>


Re: [DISCUSS] Consolidate all dev collaboration to Github

2021-07-15 Thread Bhavani Sudha
Completely agree on B. On A, I feel the need to centralize everything in one
place, but without losing the capabilities of Jira. I think we will have to
explore tools either way.

Thanks,
Sudha

On Thu, Jul 15, 2021 at 10:42 PM vino yang  wrote:

> +1 for option B.
>
> Best,
> Vino
>
> On Fri, Jul 16, 2021 at 10:35 AM Sivabalan wrote:
>
> > +1 on B. Not sure on A though. I understand the intent to have all in
> > one place. but not very sure if we can get all functionality (version,
> > type, component, status, parent- child relation), etc ported over to
> > github. I assume labels are the only option we have to achieve these.
> > Probably, we should also document the labels in detail so that anyone
> > looking to take a look at untriaged issues should know how/where to look
> > at. If we plan to use GH issues for all, I am sure there will be a lot of
> > proliferation of issues.
> >
> > On Fri, Jul 9, 2021 at 12:29 PM Vinoth Chandar 
> wrote:
> >
> > > Based on this, I will start consolidating more of the cWiki content to
> > > github wiki and master branch?
> > >
> > > JIRA vs GH Issue still probably needs more feedback. I do see the
> > tradeoffs
> > > there.
> > >
> > > On Fri, Jul 9, 2021 at 2:39 AM wei li  wrote:
> > >
> > > > +1
> > > >
> > > > On 2021/07/02 03:40:51, Vinoth Chandar  wrote:
> > > > > Hi all,
> > > > >
> > > > > > When we incubated Hudi, we made some initial choices around
> > > > > > collaboration tools. I am wondering if these are still optimal,
> > > > > > given the scale of the community at this point.
> > > > >
> > > > > Specifically, two points.
> > > > >
> > > > > A) Our issue tracker is JIRA, while we just use Github Issues for
> > > support
> > > > > triage. While JIRA is pretty advanced and gives us the ability to
> > track
> > > > > releases, versions and kanban boards, there are few practical
> > > operational
> > > > > problems.
> > > > >
> > > > > - Developers often open bug fixes/PR which all need to be
> > continuously
> > > > > tagged against a release version (fix version)
> > > > > - Referencing JIRAs from Pull Requests is great (we cannot do
> things
> > > like
> > > > > `fixes #1234` to close issues when PR lands, not an easy way to
> click
> > > and
> > > > > get to the JIRA)
> > > > > - Many more developers have a github account, to contribute to Hudi
> > > > though,
> > > > > they need an additional sign-up on jira.
> > > > >
> > > > > So wondering if we should just use one thing - Github Issues, and
> > build
> > > > > scripts/hubot or something to get the missing project management
> from
> > > > > boards.
> > > > >
> > > > > B) Our design docs are on cWiki. Even though we link it off the
> site,
> > > > from
> > > > > my experience, many do not discover them.
> > > > > For large PRs, we need to manually enforce that design and code are
> > in
> > > > sync
> > > > > before we land. If we can, I would love to make RFC being in good
> > > shape a
> > > > > pre-requisite for landing the PR.
> > > > > Once again, separate signup is needed to write design docs or
> comment
> > > on
> > > > > them.
> > > > >
> > > > > So, wondering if we can move our process docs etc into Github Wiki
> > and
> > > > RFCs
> > > > > to the master branch in a rfc folder, and we just use github PRs to
> > > raise
> > > > > RFCs and discuss them.
> > > > >
> > > > > This all also makes it easy for us to measure community activity
> and
> > > keep
> > > > > streamlining our processes.
> > > > >
> > > > > personally, these different channels are overwhelming to me
> at-least
> > :)
> > > > >
> > > > > Love to hear thoughts. Please specify if you are for,against each
> of
> > A
> > > > and
> > > > > B.
> > > > >
> > > > >
> > > > > Thanks
> > > > > Vinoth
> > > > >
> > > >
> > >
> >
> >
> > --
> > Regards,
> > -Sivabalan
> >
>


Re: Welcome new committers and PMC Members!

2021-05-11 Thread Bhavani Sudha
Congratulations @Gary Li and @Wenning Ding!

On Tue, May 11, 2021 at 12:42 PM Vinoth Chandar  wrote:

> Hello all,
>
> Please join me in congratulating our newest set of committers and PMCs.
>
> *Wenning Ding (Committer) *
> Wenning has been a consistent contributor to Hudi, over the past year or
> so. He has added some critical bug fixes, lots of good contributions
> around Spark!
>
> *Gary Li (PMC Member) *
> Gary is a regular feature on all our support channels. He has contributed
> numerous features to Hudi, and evangelized across many companies including
> Bosch/Bytedance. Most of all, he is a solid team player and an asset to the
> project.
>
> Thanks so much for your continued contributions, to make Hudi better and
> better!
>
> Thanks
> Vinoth
>
>


Re: [DISCUSS] Hudi is the data lake platform

2021-04-12 Thread Bhavani Sudha
+1 . Cannot agree more. I think this makes total sense and will provide for
a much better representation of the project.

On Mon, Apr 12, 2021 at 10:30 PM Vinoth Chandar  wrote:

> Hello all,
>
> Reading one more article today, positioning Hudi, as just a table format,
> made me wonder, if we have done enough justice in explaining what we have
> built together here.
> I tend to think of Hudi as the data lake platform, which has the following
> components, of which one is a table format and one is a transactional
> storage layer.
> But the whole stack we have is definitely worth more than the sum of all
> the parts IMO (speaking from my own experience from the past 10+ years of
> open source software dev).
>
> Here's what we have built so far.
>
> a) *table format* : something that stores table schema, a metadata table
> that stores file listing today, and being extended to store column ranges
> and more in the future (RFC-27)
> b) *aux metadata* : bloom filters, external record level indexes today,
> bitmaps/interval trees and other advanced on-disk data structures tomorrow
> c) *concurrency control* : we always supported MVCC based log based
> concurrency (serialize writes into a time ordered log), and we now also
> have OCC for batch merge workloads with 0.8.0. We will have multi-table and
> fully non-blocking writers soon (see future work section of RFC-22)
> d) *updates/deletes* : this is the bread-and-butter use-case for Hudi, but
> we support primary/unique key constraints and we could add foreign keys as
> an extension, once our transactions can span tables.
> e) *table services*: a hudi pipeline today is self-managing - sizes files,
> cleans, compacts, clusters data, bootstraps existing data - all these
> actions working off each other without blocking one another. (for most
> parts).
> f) *data services*: we also have higher level functionality with
> deltastreamer sources (scalable DFS listing source, Kafka, Pulsar is
> coming, ...and more), incremental ETL support, de-duplication, commit
> callbacks, pre-commit validations are coming, error tables have been
> proposed. I could also envision us building towards streaming egress, data
> monitoring.
>
> I also think we should build the following (subject to separate
> DISCUSS/RFCs)
>
> g) *caching service*: Hudi specific caching service that can hold mutable
> data and serve oft-queried data across engines.
> h) *timeline metaserver:* We already run a metaserver in spark
> writer/drivers, backed by rocksDB & even Hudi's metadata table. Let's turn
> it into a scalable, sharded metastore, that all engines can use to obtain
> any metadata.
>
> To this end, I propose we rebrand to "*Data Lake Platform*" as opposed to
> "ingests & manages storage of large analytical datasets over DFS (hdfs or
> cloud stores)." and convey the scope of our vision,
> given we have already been building towards that. It would also provide new
> contributors a good lens to look at the project from.
>
> (This is very similar to for e.g, the evolution of Kafka from a pub-sub
> system, to an event streaming platform - with addition of
> MirrorMaker/Connect etc. )
>
> Please share your thoughts!
>
> Thanks
> Vinoth
>


Re: Apache Hudi 0.8.0 Released

2021-04-09 Thread Bhavani Sudha
Thanks Gary. Great job!

On Fri, Apr 9, 2021 at 10:12 PM vino yang  wrote:

> Thanks Gary, great work!
>
> Best,
> Vino
>
> On Sat, Apr 10, 2021 at 10:27 AM Danny Chan wrote:
>
> > Cheers ~
> >
> > Best,
> > Danny Chan
> >
> > On Sat, Apr 10, 2021 at 12:43 AM Vinoth Chandar wrote:
> >
> > > Thanks Gary! +1 fantastic job with the release!
> > >
> > > Please also announce on Slack (if not done already)
> > >
> > > I shared some tweets at https://twitter.com/apachehudi
> > >
> > > On Fri, Apr 9, 2021 at 7:44 AM leesf  wrote:
> > >
> > > > Thanks gary for driving the release, great job.
> > > >
> > > > On Fri, Apr 9, 2021 at 10:40 PM Pratyaksh Sharma wrote:
> > > >
> > > > > Great news!
> > > > >
> > > > > On Fri, Apr 9, 2021 at 11:42 AM Sivabalan 
> > wrote:
> > > > >
> > > > > > Awesome! Great job Gary on the release work!
> > > > > >
> > > > > > On Fri, Apr 9, 2021 at 1:59 AM Gary Li  >
> > > > wrote:
> > > > > >
> > > > > > > Thanks Vinoth.
> > > > > > >
> > > > > > > The page for 0.8.0 is ready
> > > > > > >
> https://hudi.apache.org/docs/0.8.0-spark_quick-start-guide.html.
> > > > > > > The release note could be found here
> > > > > > https://hudi.apache.org/releases.html
> > > > > > >
> > > > > > > Best,
> > > > > > > Gary Li
> > > > > > >
> > > > > > > On Thu, Apr 8, 2021 at 12:15 AM Vinoth Chandar <
> > vin...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > This is awesome! Thanks for sharing, Gary!
> > > > > > > >
> > > > > > > > Are we waiting for the site to be rendered with 0.8.0 release
> > > info
> > > > > and
> > > > > > > > homepage update?
> > > > > > > >
> > > > > > > > On Wed, Apr 7, 2021 at 7:54 AM Gary Li <
> > yanjia.gary...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > > We are excited to share that Apache Hudi 0.8.0 was released.
> > > > > > > > > > Since the 0.7.0 release, we resolved 97 JIRA tickets and made
> > > > > > > > > > 120 code commits. We implemented many new features, bug fixes,
> > > > > > > > > > and performance improvements. Thanks to all the contributors
> > > > > > > > > > who made this happen.
> > > > > > > > >
> > > > > > > > > *Release Highlights*
> > > > > > > > >
> > > > > > > > > *Flink Integration*
> > > > > > > > > > Since the initial support for the Hudi Flink Writer in the 0.7.0 release,
> > > > > > > > > > the Hudi community made great progress on improving the Flink/Hudi
> > > > > > > > > > integration, including redesigning the Flink writer pipeline with better
> > > > > > > > > > performance and scalability, state-backed indexing with bootstrap support,
> > > > > > > > > > Flink writer for MOR table, batch reader for COW table, streaming
> > > > > > > > > > reader for MOR table, and Flink SQL connector for both source and sink. In
> > > > > > > > > > the 0.8.0 release, the user is able to use all those features with Flink
> > > > > > > > > > 1.11+.
> > > > > > > > >
> > > > > > > > > > Please see [RFC-24](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal)
> > > > > > > > > > for more implementation details of the Flink writer and follow this
> > > > > > > > > > [page](https://hudi.apache.org/docs/flink-quick-start-guide.html) to get
> > > > > > > > > > started with Flink!
> > > > > > > > >
> > > > > > > > > *Parallel Writers Support*
> > > > > > > > > > As many users requested, now Hudi supports multiple ingestion writers to
> > > > > > > > > > the same Hudi Table with optimistic concurrency control. Hudi supports file
> > > > > > > > > > level OCC, i.e., for any 2 commits (or writers) happening to the same
> > > > > > > > > > table, if they do not have writes to overlapping files being changed, both
> > > > > > > > > > writers are allowed to succeed. This feature is currently experimental and
> > > > > > > > > > requires either Zookeeper or HiveMetastore to acquire locks.
> > > > > > > > >
> > > > > > > > > > Please see [RFC-22](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+22+%3A+Snapshot+Isolation+using+Optimistic+Concurrency+Control+for+multi-writers)
> > > > > > > > > > for more implementation details and follow this
> > > > > > > > > > [page](https://hudi.apache.org/docs/concurrency_control.html) to get started
> > > > > > > > > > with concurrency control!
> > > > > > > > >
> > > > > > > > > *Writer side improvements*
> > > > > > > > > - 

Re: [VOTE] Release 0.8.0, release candidate #1

2021-03-30 Thread Bhavani Sudha
+1 (binding)

- compile ok
- quickstart ok
- checksum ok
- ran some ide tests - ok
- release validation script - ok
/tmp/validation_scratch_dir_001 ~/Downloads/hudi-0.8.0-rc1/scripts
Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
Validating hudi-0.8.0-rc1 with release type "dev"
Checking Checksum of Source Release
Checksum Check of Source Release - [OK]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38466  100 38466    0     0  77709      0 --:--:-- --:--:-- --:--:-- 77709
Checking Signature
Signature Check - [OK]

Checking for binary files in source release
No Binary Files in Source Release? - [OK]

Checking for DISCLAIMER
DISCLAIMER file exists ? [OK]

Checking for LICENSE and NOTICE
License file exists ? [OK]
Notice file exists ? [OK]

Performing custom Licensing Check
Licensing Check Passed [OK]

Running RAT Check
RAT Check Passed [OK]
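
For readers unfamiliar with what a checksum step like the one logged above involves, here is a minimal sketch. It is not the logic of the release validation script; it assumes the published checksum is a SHA-512 hex digest, optionally followed by a file name, and the helper names `sha512_hex` and `checksum_ok` are hypothetical.

```python
import hashlib
import os
import tempfile


def sha512_hex(path, chunk_size=1 << 20):
    """Hash the file in chunks so large release tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_ok(path, published):
    """Compare against the first whitespace-separated token of the published
    checksum line (assumed format: '<hex digest>  <file name>')."""
    expected = published.split()[0].lower()
    return sha512_hex(path) == expected


# Demo on a throwaway file standing in for a downloaded source release.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example release bytes")
    artifact = tmp.name

published_line = hashlib.sha512(b"example release bytes").hexdigest() + "  artifact.tgz"
status = "[OK]" if checksum_ok(artifact, published_line) else "[FAILED]"
print("Checksum Check of Source Release - " + status)
os.remove(artifact)
```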



On Mon, Mar 29, 2021 at 9:35 AM Gary Li  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 0.8.0,
> as follows:
>
> [ ] +1, Approve the release
>
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
>
> The complete staging area is available for your review, which includes:
>
> * JIRA release notes [1],
>
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint E2A9714E0FBA3A087BDEE655E72873D765D6C406 [3],
>
> * all artifacts to be deployed to the Maven Central Repository [4],
>
> * source code tag "release-0.8.0-rc1" [5],
>
>
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
>
>
> Thanks,
>
> Release Manager
>
>
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12349423
>
> [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.8.0-rc1/
>
> [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
>
> [4]
>
> https://repository.apache.org/content/repositories/orgapachehudi-1032/org/apache/hudi/
>
> [5] https://github.com/apache/hudi/tree/release-0.8.0-rc1
>
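
The adoption rule quoted above ("majority approval, with at least 3 PMC affirmative votes") can be written down as a short sketch. The function name `vote_passes` is hypothetical, and counting only binding votes toward the majority is an assumption of this sketch, not something the thread spells out.

```python
def vote_passes(votes, min_binding_plus_ones=3):
    """votes: iterable of (value, is_binding) pairs, value being +1, 0 or -1.

    Assumption: only binding (PMC) votes count toward the result.
    """
    binding = [value for value, is_binding in votes if is_binding]
    plus = sum(1 for value in binding if value == 1)
    minus = sum(1 for value in binding if value == -1)
    return plus >= min_binding_plus_ones and plus > minus


# Three binding +1s and one non-binding +1.
votes = [(1, True), (1, True), (1, True), (1, False)]
print(vote_passes(votes))
```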


Re: Congrats to our newest committers!

2021-01-27 Thread Bhavani Sudha
Congratulations to both of you!

On Wed, Jan 27, 2021 at 2:39 PM Prashant Wason 
wrote:

> Congratulations to both of you!
>
> On Wed, Jan 27, 2021 at 2:01 PM Udit Mehrotra  wrote:
>
> > Congratulations to both ! Well deserved..
> >
> > - Udit
> >
> > On Wed, Jan 27, 2021 at 1:18 PM nishith agarwal 
> > wrote:
> >
> > > Congratulations to both!
> > >
> > > -Nishith
> > >
> > > On Wed, Jan 27, 2021 at 11:49 AM Sivabalan  wrote:
> > >
> > > > Congratulations folks !
> > > >
> > > > On Wed, Jan 27, 2021 at 12:48 PM Pratyaksh Sharma <
> > pratyaks...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Congratulations both of you!
> > > > >
> > > > > On Wed, Jan 27, 2021 at 8:43 PM Vinoth Chandar 
> > > > wrote:
> > > > >
> > > > > > Congrats both! Well deserved indeed! Glad to have you on the
> > > community.
> > > > > >
> > > > > > On Wed, Jan 27, 2021 at 7:00 AM Shi ShaoFeng <
> > shaofeng...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Congratulations, Wang Xianghu and Li Wei!
> > > > > > >
> > > > > > > > On 2021/1/27 at 9:17 PM, "vino yang" wrote:
> > > > > > >
> > > > > > > Congrats to both of them!
> > > > > > > Well deserved!
> > > > > > >
> > > > > > > Best,
> > > > > > > Vino
> > > > > > >
> > > > > > > > On Wed, Jan 27, 2021 at 7:20 PM Trevor Zhang wrote:
> > > > > > >
> > > > > > > > Congratulations to  Wang Xianghu and  Li Wei.
> > > > > > > >
> > > > > > > > Best ,
> > > > > > > >
> > > > > > > > Trevor
> > > > > > > >
> > > > > > > > > On Wed, Jan 27, 2021 at 7:16 PM leesf wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I am very happy to announce our newest committers.
> > > > > > > > >
> > > > > > > > > > Wang Xianghu: Xianghu has done a great job in decoupling Hudi from
> > > > > > > > > > Spark, implemented the first version of Flink support and contributed
> > > > > > > > > > bug fixes. He is also very active in answering users' questions in the
> > > > > > > > > > China WeChat group.
> > > > > > > > > >
> > > > > > > > > > Li Wei: Li Wei has also done a great job in driving major features like
> > > > > > > > > > RFC-19 together with Satish, and has contributed many features and bug
> > > > > > > > > > fixes in core modules.
> > > > > > > > >
> > > > > > > > > Please join me in congratulating them!
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Leesf
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > -Sivabalan
> > > >
> > >
> >
>


Re: [VOTE] Release 0.7.0, release candidate #2

2021-01-23 Thread Bhavani Sudha
+1 (binding)

- compile ok
- quickstart ok
- checksum ok
- ran some ide tests - ok
- release validation script - ok
./release/validate_staged_release.sh --release=0.7.0 --rc_num=2
/tmp/validation_scratch_dir_001 ~/Sudha/hudi/scripts
Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
Validating hudi-0.7.0-rc2 with release type "dev"
Checking Checksum of Source Release
Checksum Check of Source Release - [OK]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 34972  100 34972    0     0  76026      0 --:--:-- --:--:-- --:--:-- 76026
Checking Signature
Signature Check - [OK]

Checking for binary files in source release
No Binary Files in Source Release? - [OK]

Checking for DISCLAIMER
DISCLAIMER file exists ? [OK]

Checking for LICENSE and NOTICE
License file exists ? [OK]
Notice file exists ? [OK]

Performing custom Licensing Check
Licensing Check Passed [OK]

Running RAT Check
RAT Check Passed [OK]


On Sat, Jan 23, 2021 at 8:58 PM Gary Li  wrote:

> +1 (non-binding)
>
>   1.  build succeeded
>   2.  validation script succeeded
>   3.  ran tests locally
>
> 
> From: vbal...@apache.org 
> Sent: Sunday, January 24, 2021 6:18 AM
> To: dev@hudi.apache.org 
> Subject: Re: [VOTE] Release 0.7.0, release candidate #2
>
>
> +1 (binding)
> ```
> 1. Ran release validation script successfully.
> 2. Build successful
> 3. Quickstart succeeded.
>
> Checking Checksum of Source Release
> Checksum Check of Source Release - [OK]
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 34972  100 34972    0     0  88987      0 --:--:-- --:--:-- --:--:-- 88987
> Checking Signature
> Signature Check - [OK]
>
> Checking for binary files in source release
> No Binary Files in Source Release? - [OK]
>
> Checking for DISCLAIMER
> DISCLAIMER file exists ? [OK]
>
> Checking for LICENSE and NOTICE
> License file exists ? [OK]
> Notice file exists ? [OK]
>
> Performing custom Licensing Check
> Licensing Check Passed [OK]
>
> Running RAT Check
> RAT Check Passed [OK]
> ~/code/oss/upstream_hudi/scripts
> ```
>
> On Saturday, January 23, 2021, 05:55:10 AM PST, Sivabalan wrote:
>
>  Got it, I didn't do -1, but just wanted to remind you, so that you don't
> miss it when you redo the steps again to promote the final one.
>
> +1 binding.
> But do ensure when you release, the staged repo (promoted candidate) has
> only one set of artifacts and it's a new repo.
>
>
> On Sat, Jan 23, 2021 at 2:03 AM nishith agarwal 
> wrote:
>
> > +1 binding
> >
> > - Build Successful
> > - Release validation script Successful
> > - Quick start runs Successfully
> >
> > Checking Checksum of Source Release
> > Checksum Check of Source Release - [OK]
> >
> >   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> >                                  Dload  Upload   Total   Spent    Left  Speed
> > 100 34972  100 34972    0     0  96076      0 --:--:-- --:--:-- --:--:-- 96076
> > Checking Signature
> > Signature Check - [OK]
> >
> > Checking for binary files in source release
> > No Binary Files in Source Release? - [OK]
> >
> > Checking for DISCLAIMER
> > DISCLAIMER file exists ? [OK]
> >
> > Checking for LICENSE and NOTICE
> > License file exists ? [OK]
> > Notice file exists ? [OK]
> >
> > Performing custom Licensing Check
> > Licensing Check Passed [OK]
> >
> > Running RAT Check
> > RAT Check Passed [OK]
> >
> > Thanks,
> > Nishith
> >
> > On Fri, Jan 22, 2021 at 9:28 PM Vinoth Chandar 
> wrote:
> >
> > > Thanks Siva! I am not sure if that's a required aspect for the binding vote.
> > > It's a minor aspect that does not interfere with testing/validation in any
> > > way. The actual release artifact needs to be rebuilt and repushed anyway
> > > from a separate repo. Like I noted, I found the wiki instructions a bit
> > > ambiguous and I intend to make them clearer going forward so we can avoid
> > > this in future.
> > >
> > > I request everyone to consider this explanation when casting your vote.
> > >
> > > Thanks
> > > Vinoth
> > >
> > >
> > > On Fri, Jan 22, 2021 at 8:35 PM Sivabalan  wrote:
> > >
> > > > - checksums and signatures [OK]
> > > > - successfully built [OK]
> > > > - ran quick start guide [OK]
> > > > - Ran release validation guide [OK]
> > > > - Ran test suite job w/ inserts, upserts, deletes and validation (spark sql
> > > > and hive). Also same job w/ metadata enabled as well [OK]
> > > >
> > > > - Artifacts in staging repo: should be in a separate repo where only rc2 is
> > > > present. Right now, I see both rc1 and rc2 are present in the same repo.
> > > >
> > > > Will add my binding vote once artifacts are fixed.
> > > >
> > > >
> > > >
> > > > On Fri, Jan 22, 2021 at 9:17 PM Udit Mehrotra 
> > wrote:
> > > >
> > > > > +1
> > > > > - Build successful
> > 

Re: [VOTE] Release 0.7.0, release candidate #1

2021-01-22 Thread Bhavani Sudha
+1 (binding)

- compile ok
- quickstart ok
- checksum ok
- ran some ide tests - ok
- release validation script - ok
./release/validate_staged_release.sh --release=0.7.0 --rc_num=1
/tmp/validation_scratch_dir_001 ~/Sudha/hudi/scripts
Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
Validating hudi-0.7.0-rc1 with release type "dev"
Checking Checksum of Source Release
Checksum Check of Source Release - [OK]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 34972  100 34972    0     0  78237      0 --:--:-- --:--:-- --:--:-- 78237
Checking Signature
Signature Check - [OK]

Checking for binary files in source release
No Binary Files in Source Release? - [OK]

Checking for DISCLAIMER
DISCLAIMER file exists ? [OK]

Checking for LICENSE and NOTICE
License file exists ? [OK]
Notice file exists ? [OK]

Performing custom Licensing Check
Licensing Check Passed [OK]

Running RAT Check
RAT Check Passed [OK]



On Thu, Jan 21, 2021 at 8:46 PM Sivabalan  wrote:

> +1 binding
>
> - checksums and signatures [OK]
> - successfully built [OK]
> - ran quick start guide [OK]
> - Ran release validation guide [OK]
> - Verified artifacts in staging repo [OK]
> - Ran test suite job w/ inserts, upserts, deletes and validation(spark sql
> and hive). Also same job w/ metadata enabled as well [OK]
>
>
> ./release/validate_staged_release.sh --release=0.7.0 --rc_num=1
> /tmp/validation_scratch_dir_001
> ~/Documents/personal/projects/siva_hudi/hudi_070_rc1/hudi-0.7.0-rc1/scripts
> Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
> Validating hudi-0.7.0-rc1 with release type "dev"
> Checking Checksum of Source Release
> Checksum Check of Source Release - [OK]
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 34972  100 34972    0     0   105k      0 --:--:-- --:--:-- --:--:--  104k
> Checking Signature
> Signature Check - [OK]
>
> Checking for binary files in source release
> No Binary Files in Source Release? - [OK]
>
> Checking for DISCLAIMER
> DISCLAIMER file exists ? [OK]
>
> Checking for LICENSE and NOTICE
> License file exists ? [OK]
> Notice file exists ? [OK]
>
> Performing custom Licensing Check
> Licensing Check Passed [OK]
>
> Running RAT Check
> RAT Check Passed [OK]
>
>
> On Thu, Jan 21, 2021 at 8:21 PM Satish Kotha  >
> wrote:
>
> > +1,
> >
> > 1) Able to build
> > 2) Integration tests pass
> > 3) Unit tests pass locally
> > 4) Successfully ran clustering on a small dataset (metadata table not
> > enabled)
> > 5) Verified insert, upsert, insert_overwrite works using QuickStart
> > commands on COW table (metadata table not enabled)
> >
> >
> >
> > On Thu, Jan 21, 2021 at 12:44 AM Vinoth Chandar 
> wrote:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #1 for the version
> 0.7.0,
> > > as follows:
> > >
> > > [ ] +1, Approve the release
> > >
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > >
> > >
> > > The complete staging area is available for your review, which includes:
> > >
> > > * JIRA release notes [1],
> > >
> > > * the official Apache source release and binary convenience releases to
> > be
> > > deployed to dist.apache.org [2], which are signed with the key with
> > > fingerprint 7F2A3BEB922181B06ACB1AA45F7D09E581D2BCB6 [3],
> > >
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > >
> > > * source code tag "release-0.7.0-rc1" [5],
> > >
> > >
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Release Manager
> > >
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12348721
> > >
> > >
> > > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.7.0-rc1/
> > >
> > > [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
> > >
> > > [4]
> > https://repository.apache.org/content/repositories/orgapachehudi-1027/
> > >
> > > [5] https://github.com/apache/hudi/tree/release-0.7.0-rc1
> > >
> >
>
>
> --
> Regards,
> -Sivabalan
>


Re: Congrats to our newest committers!

2020-12-03 Thread Bhavani Sudha
Congratulations Satish and Prashant!

On Thu, Dec 3, 2020 at 11:03 AM Pratyaksh Sharma 
wrote:

> Congratulations Satish and Prashant!
>
> On Fri, Dec 4, 2020 at 12:22 AM Vinoth Chandar  wrote:
>
> > Hi all,
> >
> > I am really happy to announce our newest set of committers.
> >
> > *Satish Kotha*: Satish has ramped very quickly across our entire code base
> > and contributed bug fixes and also drove large, unique features like
> > clustering, replace/overwrite which are about to go out in the 0.7.0
> > release. These efforts largely complete parts of our vision and it could
> > not have happened without Satish.
> >
> > *Prashant Wason*: In addition to a number of patches, Prashant has been
> > shouldering massive responsibility on RFC-15, and thanks to his efforts, we
> > have a simplified design, very solid implementation right now, that is
> > being tested now for 0.7.0 release again.
> >
> > Please join me in congratulating them on this great milestone!
> >
> > Thanks,
> > Vinoth
> >
>


Re: [DISCUSS] 0.7.0 release timelines

2020-12-01 Thread Bhavani Sudha
I vote for option 2 too.

On Tue, Dec 1, 2020 at 7:36 PM Sivabalan  wrote:

> I would vote for Option 2 given that the features are already being tested. If
> they were only halfway through development, I might have given it a thought.
> But let's hear from the community.
>
>
> On Mon, Nov 30, 2020 at 8:15 PM Vinoth Chandar  wrote:
>
> > Hello all,
> >
> > We still have a few features to land for the 0.7.0 release. Specifically,
> > RFC-15 and Clustering have PRs, undergoing test/production validation at
> > the moment.
> >
> > Based on the JIRAs, I see two options
> >
> > Option 1:  Cut RC by next week or so, and push out the larger features to a
> > (hopefully quick) 0.8.0. We already have a few large features in
> > master/pending PRs (spark3, flink, replace/overwrite etc..)
> > Option 2:  Wait till December end to cut RC, with all the originally
> > planned feature set.
> >
> > Please chime in with your thoughts.
> >
> > Thanks
> > Vinoth
> >
>
>
> --
> Regards,
> -Sivabalan
>


Reg weekly sync meeting

2020-10-29 Thread Bhavani Sudha
Hello all,
I was wondering if it would make sense to move the weekly sync meeting to
bi-weekly to amortize time and be efficient, especially since people across
different time zones attend. We could still retain the same time but change
the cadence to once every two weeks instead. What do you think?

Thanks,
Sudha


Re: [DISCUSS] New Community Weekly Sync up Time

2020-09-15 Thread Bhavani Sudha
The current time suits me well personally.
Moving it 1 hour earlier should mostly be okay. I might be a little late
depending on kid care duties some days. We can go ahead with the change if
the timing is fine with everyone.

Thanks,
Sudha

On Tue, Sep 15, 2020 at 7:08 AM Vinoth Chandar  wrote:

> Folks,
>
> Please chime in with your opinions. I still can see some regulars (e.g
> Nishith, Sudha, Gary) who have not chimed in
>
> On Tue, Sep 15, 2020 at 12:22 AM Pratyaksh Sharma 
> wrote:
>
> > Hi,
> >
> > Just wanted to confirm the time for this week's sync up. @Vinoth Chandar
> > 
> >
> > On Thu, Sep 10, 2020 at 1:58 AM Pratyaksh Sharma 
> > wrote:
> >
> > > Great. I request others to also please chime in so that we can finalise
> > > the time for sync up.
> > >
> > > On Wed, Sep 9, 2020 at 9:00 AM Balaji Varadarajan
> > >  wrote:
> > >
> > >>  +1
> > >> On Tuesday, September 8, 2020, 05:54:52 PM PDT, Mehrotra, Udit
> > >>  wrote:
> > >>
> > >>  I am okay with this too.
> > >>
> > >> On 9/8/20, 5:33 PM, "Raymond Xu" 
> wrote:
> > >>
> > >> I'm ok with 1 hr earlier.
> > >>
> > >> On Tue, Sep 8, 2020, 5:09 PM Vinoth Chandar 
> > >> wrote:
> > >>
> > >> > Anyone else wants to chime in for a new time, that works for
> > >> everyone?
> > >> >
> > >> > Personally, I can do this time.
> > >> >
> > >> >  love to hear more inputs.
> > >> >
> > >> > On Wed, Sep 2, 2020 at 10:16 AM Pratyaksh Sharma <
> > >> pratyaks...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi everyone,
> > >> > >
> > >> > > Currently we are having weekly sync ups between 9 PM - 10 PM PST on
> > >> > > Tuesdays. Since I switched jobs two months ago (in India), this
> > >> > > time clashes exactly with the daily standup time at my current org.
> > >> > > This is the reason I have not been able to attend the sync ups for
> > >> > > quite some time.
> > >> > >
> > >> > > Hence just wanted to check with everyone if we could move the sync up
> > >> > > time to 1 hour earlier, i.e. have it from 8 PM - 9 PM every Tuesday?
> > >> > > Please let me know if this is suitable.
> > >> > >
> > >> >
> > >>
> > >>
> > >
> > >
> >
>


Re: Congrats to our newest committers!

2020-09-04 Thread Bhavani Sudha
Congratulations everyone, well deserved!

-Sudha

On Fri, Sep 4, 2020 at 8:29 AM Nishith  wrote:

> Congratulations all!
>
> Sent from my iPhone
>
> > On Sep 4, 2020, at 6:10 AM, Sivabalan  wrote:
> >
> > my bad. all 4.
> >
> >> On Fri, Sep 4, 2020 at 8:48 AM vino yang  wrote:
> >>
> >> Congrats to all 3!
> >>
> >> Best,
> >> Vino
> >>
> >> On Fri, Sep 4, 2020 at 10:25 AM Balaji Varadarajan wrote:
> >>
> >>> Udit, Gary, Raymond and Pratyaksh,
> >>> Many congratulations :) Well deserved. Looking forward to your continued
> >>> contributions.
> >>> Balaji.V
> >>> On Thursday, September 3, 2020, 07:19:45 PM PDT, Sivabalan <
> >>> n.siv...@gmail.com> wrote:
> >>>
> >>> Congrats to all 3. Much deserved and really excited to see more committers
> >>>
> >>>> On Thu, Sep 3, 2020 at 9:23 PM leesf wrote:
> >>>>
> >>>> Congrats everyone, well deserved!
> >>>>
> >>>> On Fri, Sep 4, 2020 at 5:05 AM selvaraj periyasamy wrote:
> >>>>
> >>>>> Congrats everyone!
> >>>>>
> >>>>> On Thu, Sep 3, 2020 at 1:59 PM Vinoth Chandar wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I am really excited to share the good news about our new committers on
> >>>>>> the project!
> >>>>>>
> >>>>>> *Udit Mehrotra*: Udit has travelled with the project since Sept/Oct last
> >>>>>> year and immensely helped us make Hudi work well with the AWS ecosystem.
> >>>>>> His most notable contributions are towards driving large parts of the
> >>>>>> implementation of RFC-12, Hive/Spark integration points. He has also
> >>>>>> helped our users with various tricky issues.
> >>>>>>
> >>>>>> *Gary Li:* Gary is a great success story for the project, starting out as
> >>>>>> an early user and steadily growing into a strong contributor, who has
> >>>>>> demonstrated the ability to take up challenging implementations (e.g.
> >>>>>> Impala support, MOR snapshot query impl on Spark), as well as patiently
> >>>>>> iterate through feedback and evolve the design/code. He has also been
> >>>>>> helping users on Slack and mailing lists.
> >>>>>>
> >>>>>> *Raymond Xu:* Raymond has also been a consistent feature on our mailing
> >>>>>> lists, Slack and GitHub. He has been proposing immensely valuable
> >>>>>> test/tooling improvements. He has contributed a great deal of code as
> >>>>>> well, towards the same. Many many users thank Raymond for the generous
> >>>>>> help on Slack.
> >>>>>>
> >>>>>> *Pratyaksh Sharma:* This is yet another great example of user ->
> >>>>>> contributor -> committer. Pratyaksh has been a great champion for the
> >>>>>> project, over the past year or so, steadily contributing many
> >>>>>> improvements around the Delta Streamer tool.
> >>>>>>
> >>>>>> Please join me in congratulating them on this well deserved milestone!
> >>>>>>
> >>>>>> Onwards and upwards,
> >>>>>> Vinoth
> >>>>>>
> >>>>>
> >>>> --
> >>> Regards,
> >>> -Sivabalan
> >>
> >
> >
> > --
> > Regards,
> > -Sivabalan
>


Re: [DISCUSS] Formalizing the release process

2020-09-01 Thread Bhavani Sudha
+1 on the release process formalization.

On Tue, Sep 1, 2020 at 10:21 AM Vinoth Chandar  wrote:

> Hi all,
>
> Love to start a discussion around how we can formalize the release
> process, timelines more so that we can ensure timely and quality releases.
>
> Below is an outline of an idea that was discussed in the last community
> sync (also in the weekly sync notes).
>
> - We will do a "feature driven" major version release, every 3 months or
> so. i.e going from version x.y to x.y+1. The idea here is this ships once
> all the committed features are code complete, tested and verified.
> - We keep doing patches, bug fixes and usability improvements to the
> project always. So, we will also do a "time driven" minor version release
> x.y.z → x.y.z+1 every month or so
> - We will always be releasing from master and thus major release features
> need to be guarded by flags, on minor versions.
> - We will try to avoid patch releases. i.e cherry-picking a few commits
> onto an earlier release version. (during 0.5.3 we actually found the
> cherry-picking of master onto 0.5.2 pretty tricky and even error-prone).
> Some cases, we may have to just make patch releases. But only extenuating
> circumstances. Over time, with better tooling and a larger community, we
> might be able to do this.
>
> As for the major release planning process.
>
> - PMC/Committers can come up with an initial list sourced from
> user asks and support issues
> - The list is shared with the community for feedback. The community can
> suggest new items, re-prioritizations
> - Contributors are welcome to commit to more features/asks (with due
> process)
>
> I would love to hear +1s, -1s and also any new, completely different ideas
> as well. Let's use this thread to align ourselves.
>
> Once we align ourselves, there are some release certification tools that
> need to be built out. Hopefully, we can do this together. :)
>
>
> Thanks
> Vinoth
>
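
The two release cadences proposed above (x.y to x.y+1 for feature-driven releases, x.y.z to x.y.z+1 for time-driven ones) amount to simple version arithmetic. The sketch below is only an illustration of that numbering scheme; the function names are hypothetical and it assumes plain dotted numeric versions with at most three parts.

```python
def parse_version(text):
    """Split a dotted version such as '0.5.3' into three integers,
    padding with zeros so that '0.6' is treated as '0.6.0'."""
    parts = [int(piece) for piece in text.split(".")]
    if len(parts) > 3:
        raise ValueError("expected at most three version parts: " + text)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def next_feature_release(text):
    """Feature-driven release: x.y -> x.y+1, patch counter reset to 0."""
    x, y, _ = parse_version(text)
    return f"{x}.{y + 1}.0"


def next_time_driven_release(text):
    """Time-driven release: x.y.z -> x.y.z+1."""
    x, y, z = parse_version(text)
    return f"{x}.{y}.{z + 1}"


print(next_feature_release("0.6.0"))      # 0.7.0
print(next_time_driven_release("0.5.2"))  # 0.5.3
```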


Re: DevX, Test infra Rgdn

2020-09-01 Thread Bhavani Sudha
+1 This will definitely reduce time to capture regressions and pave way for
frequent release cycles.



On Mon, Aug 31, 2020 at 9:59 PM Balaji Varadarajan
 wrote:

>  +1. This would be a great contribution as all developers will benefit
> from this work.
> On Monday, August 31, 2020, 08:07:08 AM PDT, Vinoth Chandar <
> vin...@apache.org> wrote:
>
>  +1 this is a great way to also ramp on the code base
>
> On Sun, Aug 30, 2020 at 8:00 AM Sivabalan  wrote:
>
> > As Hudi matures as a project, we need to get our devX and test infra rock
> > solid: availability of test utils and base classes for ease of writing more
> > tests, stable integration tests, ease of debuggability, micro benchmarks,
> > performance test infra, automating checkstyle formatting, nightly snapshot
> > builds and so on.
> >
> > We have identified and categorized these into different areas as below.
> >
> > - Test fixes and some clean up. // There are a lot of jira tickets
> > lying around in this section.
> > - Test refactoring. // For ease of development, and to reduce clutter, we
> > need to work on refactoring test infra like having more test utils, base
> > classes etc.
> > - More tests to improve coverage in some areas.
> > - CI stability and ease of debugging integration tests.
> > - Checkstyle, slf4j, warnings, spotless, etc.
> > - Micro benchmarks. // Add a benchmarking framework to hudi, and then
> > identify regressions on any key paths.
> > - Long running test suite
> > - Config clean ups in hudi client
> > - Perf test environment
> > - Nightly builds
> >
> > As we plan out work in each of these sections, we are looking for help from
> > the community in getting these done. The plan is to put together a few
> > umbrella tickets for each of these areas, each with a coordinator. The
> > coordinator will be someone who has expertise in the area of interest, will
> > plan out the work in their respective area, and will help drive the
> > initiative with help from the community depending on who volunteers to help
> > out.
> >
> > I understand the list is huge. Some work areas will be well defined and
> > should be able to get done if we allocate enough time and resources. But
> > some are exploratory in nature and need some initial push to get the ball
> > rolling.
> >
> > Very likely some of the work items in these would be well defined and
> > should be easy for new folks to contribute. We do not really have any
> > target timeframe in mind (as we had 1 month for bug bash), but would like to
> > get concrete work items done in decent time and have others ready by the
> > next major release (e.g. perf test env) depending on resources.
> >
> > Let us know if you would be interested to help our community in this
> > regard.
> >
> > --
> > Regards,
> > -Sivabalan
> >
>


Re: [ANNOUNCE] Apache Hudi 0.6.0 released

2020-08-24 Thread Bhavani Sudha
Moving announce@ to bcc to avoid disruptions.

On Mon, Aug 24, 2020 at 8:36 PM Bhavani Sudha 
wrote:

> The Apache Hudi team is pleased to announce the release of Apache Hudi
> 0.6.0.
>
> Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and
> Incrementals. Apache Hudi manages storage of large analytical datasets on
> DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage) and
> provides the ability to query them.
>
> This release comes 2 months after 0.5.3. It includes more than 200
> resolved issues, comprising new features, perf improvements, as well as
> general improvements and bug-fixes. Hudi 0.6.0 introduces mechanisms to
> efficiently bootstrap large datasets into Hudi without having to copy the
> data (experimental feature), via both Spark datasource writer and
> DeltaStreamer tool. A new index (HoodieSimpleIndex) is added that can be
> faster than bloom index for cases where updates/deletes spread across a
> large portion of the table. With this version, rollbacks are done using
> marker files and a supporting upgrade and downgrade infrastructure is
> provided to users for smooth transition. HoodieMultiDeltaStreamer tool
> (experimental feature) is added in this version to support ingesting
> multiple kafka streams in a single DeltaStreamer deployment for enhancing
> operational experience. Bulk inserts are further improved by avoiding any
> dataframe-rdd conversions, accompanied with configurable sorting modes.
> While this conversion of dataframe to rdd is not a bottleneck for
> upsert/deletes, subsequent releases will expand this to other write
> operations. Other performance improvements include supporting async
> compaction for spark streaming writes.
>
> For details on how to use Hudi, please look at the quick start page
> located at:
> https://hudi.apache.org/docs/quick-start-guide.html
>
> If you'd like to download the source release, you can find it here:
> https://github.com/apache/hudi/releases/tag/release-0.6.0
>
> You can read more about the release (including release notes) here:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12346663
>
> We would like to thank all contributors, the community, and the Apache
> Software Foundation for enabling this release and we look forward to
> continued collaboration. We welcome your help and feedback. For more
> information on how to report problems, and to get involved, visit the
> project website at:
> http://hudi.apache.org/
>
> Thanks to everyone involved!
> - Bhavani Sudha
>


Re: [ANNOUNCE] Apache Hudi 0.6.0 released

2020-08-24 Thread Bhavani Sudha
Moving announce@ to bcc to avoid disruptions.

On Mon, Aug 24, 2020 at 8:36 PM Bhavani Sudha 
wrote:

> The Apache Hudi team is pleased to announce the release of Apache Hudi
> 0.6.0.
>
> Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and
> Incrementals. Apache Hudi manages storage of large analytical datasets on
> DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage) and
> provides the ability to query them.
>
> This release comes 2 months after 0.5.3. It includes more than 200
> resolved issues, comprising new features, perf improvements, as well as
> general improvements and bug-fixes. Hudi 0.6.0 introduces mechanisms to
> efficiently bootstrap large datasets into Hudi without having to copy the
> data (experimental feature), via both Spark datasource writer and
> DeltaStreamer tool. A new index (HoodieSimpleIndex) is added that can be
> faster than bloom index for cases where updates/deletes spread across a
> large portion of the table. With this version, rollbacks are done using
> marker files and a supporting upgrade and downgrade infrastructure is
> provided to users for smooth transition. HoodieMultiDeltaStreamer tool
> (experimental feature) is added in this version to support ingesting
> multiple kafka streams in a single DeltaStreamer deployment for enhancing
> operational experience. Bulk inserts are further improved by avoiding any
> dataframe-rdd conversions, accompanied by configurable sorting modes.
> While this conversion of dataframe to rdd is not a bottleneck for
> upsert/deletes, subsequent releases will expand this to other write
> operations. Other performance improvements include supporting async
> compaction for spark streaming writes.
>
> For details on how to use Hudi, please look at the quick start page
> located at:
> https://hudi.apache.org/docs/quick-start-guide.html
>
> If you'd like to download the source release, you can find it here:
> https://github.com/apache/hudi/releases/tag/release-0.6.0
>
> You can read more about the release (including release notes) here:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663
>
> We would like to thank all contributors, the community, and the Apache
> Software Foundation for enabling this release and we look forward to
> continued collaboration. We welcome your help and feedback. For more
> information on how to report problems, and to get involved, visit the
> project website at:
> http://hudi.apache.org/
>
> Thanks to everyone involved!
> - Bhavani Sudha
>


[ANNOUNCE] Apache Hudi 0.6.0 released

2020-08-24 Thread Bhavani Sudha
The Apache Hudi team is pleased to announce the release of Apache Hudi
0.6.0.

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and
Incrementals. Apache Hudi manages storage of large analytical datasets on
DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage) and
provides the ability to query them.

This release comes 2 months after 0.5.3. It includes more than 200 resolved
issues, comprising new features, perf improvements, as well as general
improvements and bug-fixes. Hudi 0.6.0 introduces mechanisms to efficiently
bootstrap large datasets into Hudi without having to copy the data
(experimental feature), via both Spark datasource writer and
DeltaStreamer tool. A new index (HoodieSimpleIndex) is added that can be
faster than bloom index for cases where updates/deletes spread across a
large portion of the table. With this version, rollbacks are done using
marker files and a supporting upgrade and downgrade infrastructure is
provided to users for smooth transition. HoodieMultiDeltaStreamer tool
(experimental feature) is added in this version to support ingesting
multiple kafka streams in a single DeltaStreamer deployment for enhancing
operational experience. Bulk inserts are further improved by avoiding any
dataframe-rdd conversions, accompanied by configurable sorting modes.
While this conversion of dataframe to rdd is not a bottleneck for
upsert/deletes, subsequent releases will expand this to other write
operations. Other performance improvements include supporting async
compaction for spark streaming writes.

For details on how to use Hudi, please look at the quick start page located
at:
https://hudi.apache.org/docs/quick-start-guide.html

If you'd like to download the source release, you can find it here:
https://github.com/apache/hudi/releases/tag/release-0.6.0

You can read more about the release (including release notes) here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663

We would like to thank all contributors, the community, and the Apache
Software Foundation for enabling this release and we look forward to
continued collaboration. We welcome your help and feedback. For more
information on how to report problems, and to get involved, visit the
project website at:
http://hudi.apache.org/

Thanks to everyone involved!
- Bhavani Sudha
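
For readers who want to try the new index and the bulk insert sorting modes mentioned above, here is a minimal sketch of the write options involved. The option keys are written from memory of the Hudi configuration reference and should be checked against the 0.6.0 docs; the helper name bulk_insert_options and its parameters are hypothetical and are not part of Hudi.

```python
# Sketch only: option keys are recalled from the Hudi configuration reference;
# verify them against the 0.6.0 docs. The helper itself is hypothetical.
SORT_MODES = {"GLOBAL_SORT", "PARTITION_SORT", "NONE"}


def bulk_insert_options(table_name, record_key, partition_field,
                        precombine_field, sort_mode="GLOBAL_SORT",
                        use_simple_index=True, row_writer=False):
    """Build a dict of Hudi datasource options for a bulk insert."""
    if sort_mode not in SORT_MODES:
        raise ValueError(f"unknown sort mode: {sort_mode!r}")
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.bulkinsert.sort.mode": sort_mode,
    }
    if use_simple_index:
        # SIMPLE selects the new index instead of the default bloom index.
        options["hoodie.index.type"] = "SIMPLE"
    if row_writer:
        # Opt in to the row-based bulk insert path.
        options["hoodie.datasource.write.row.writer.enable"] = "true"
    return options


opts = bulk_insert_options("trips", "uuid", "partitionpath", "ts",
                           sort_mode="PARTITION_SORT")
```

With PySpark, a dict like this would typically be passed as df.write.format("hudi").options(**opts).mode("append").save(base_path), where df and base_path are whatever your job already has.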


[RESULT] [VOTE] Release 0.6.0, release candidate #1

2020-08-23 Thread Bhavani Sudha
Hello all,

I'm happy to announce that we have unanimously approved this release.

There are 9 approving votes, 5 of which are binding:

* Vino Yang (Binding)
* Leesf (Binding)
* Vinoth (Binding)
* Balaji (Binding)
* Sudha (Binding)
* Trevor Zhang
* Sivabalan
* Gary Li
* Raymond


There are no disapproving votes.

Link to Voting thread:
https://lists.apache.org/thread.html/r43ebf459a0c4f99cc918b7ef0110de3f5785346b57bf3f8ba2a64378%40%3Cdev.hudi.apache.org%3E


Thanks everyone!
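
The tally above can be reproduced with a small sketch. It reads the vote rule ("majority approval, with at least 3 PMC affirmative votes") as: at least three binding +1 votes and more binding +1 than binding -1 votes. That reading, and the Vote and tally names, are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1 to approve, -1 to disapprove
    binding: bool   # True for PMC (binding) votes


def tally(votes, min_binding=3):
    """Count votes and apply the assumed pass rule described above."""
    approving = sum(1 for v in votes if v.value > 0)
    disapproving = sum(1 for v in votes if v.value < 0)
    binding_plus = sum(1 for v in votes if v.binding and v.value > 0)
    binding_minus = sum(1 for v in votes if v.binding and v.value < 0)
    return {
        "approving": approving,
        "disapproving": disapproving,
        "binding_approving": binding_plus,
        "passed": binding_plus >= min_binding and binding_plus > binding_minus,
    }


# Votes as listed in the result email above.
binding = ["Vino Yang", "Leesf", "Vinoth", "Balaji", "Sudha"]
non_binding = ["Trevor Zhang", "Sivabalan", "Gary Li", "Raymond"]
votes = [Vote(name, +1, True) for name in binding]
votes += [Vote(name, +1, False) for name in non_binding]

result = tally(votes)
print(result)
```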


Re: [VOTE] Release 0.6.0, release candidate #1

2020-08-22 Thread Bhavani Sudha
Thank you all. Closing the vote as we have sufficient votes. I will send
out the tally in a separate email.


On Sat, Aug 22, 2020 at 8:34 PM Shiyan Xu 
wrote:

> Submitted the PR to update testing commands
> https://github.com/apache/hudi/pull/2010
>
> +1 (non-binding)
>
> - packaging ok
> - local tests ok
>
>
>
> On Sat, Aug 22, 2020 at 5:07 PM Udit Mehrotra 
> wrote:
>
> > +1 (non-binding)
> >
> > - Compiles successfully
> > - Ran tests on EMR with bunch of upserts/delete commits, and verified
> query
> > results through spark datasource, spark-sql, hive and presto for COW/MOR
> > tables
> > - Ran insert/bulk insert/upserts on 100GB tpcds table
> > - Ran release validation scripts successfully
> > - Unit tests fail for me locally, as I observe similar behavior as Siva
> > mentioned in https://issues.apache.org/jira/browse/HUDI-1211
> > - Docker integration tests run successfully.
> >
> > Thanks,
> > Udit
> >
> > On Sat, Aug 22, 2020 at 4:16 PM Bhavani Sudha 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Downloaded tar and verified compile [OK]
> > >
> > > Run integration test locally. [OK]
> > >
> > > Run a few tests in IDE. [OK]
> > >
> > > Run quickstart [OK]
> > >
> > > Verify NOTICE and LICENSE exists [OK]
> > >
> > > Check Checksum [OK]
> > >
> > > Check no Binary files in source release [OK]
> > >
> > > Rat Check Passed [OK]
> > >
> > > On Sat, Aug 22, 2020 at 1:41 PM Balaji Varadarajan <
> varadar...@gmail.com
> > >
> > > wrote:
> > >
> > > > +1(binding)
> > > > 1. Ran long running structured streaming writes on fake data and
> > verified
> > > > compactions and ingestion is happening without errors.
> > > > 2. Ran both scala and python based quickstart without any errors.
> There
> > > was
> > > > an issue in the documented quickstart steps (not in hudi) for python
> > > > example. Will send a doc PR shortly.
> > > > 3. Release Validation script passed locally.
> > > >
> > > > ```
> > > > MacBook-Pro:scripts balaji.varadarajan$
> > > > ./release/validate_staged_release.sh --release=0.6.0 --rc_num=1
> > > > /tmp/validation_scratch_dir_001 ~/code/oss/upstream_hudi/scripts
> > > > Checking Checksum of Source Release
> > > > Checksum Check of Source Release - [OK]
> > > >
> > > > >   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> > > > >                                  Dload  Upload   Total   Spent    Left  Speed
> > > > > 100 30225  100 30225    0     0  46215      0 --:--:-- --:--:-- --:--:-- 46145
> > > > Checking Signature
> > > > Signature Check - [OK]
> > > >
> > > > Checking for binary files in source release
> > > > No Binary Files in Source Release? - [OK]
> > > >
> > > > Checking for DISCLAIMER
> > > > DISCLAIMER file exists ? [OK]
> > > >
> > > > Checking for LICENSE and NOTICE
> > > > License file exists ? [OK]
> > > > Notice file exists ? [OK]
> > > >
> > > > Performing custom Licensing Check
> > > > Licensing Check Passed [OK]
> > > >
> > > > Running RAT Check
> > > > RAT Check Passed [OK]
> > > >
> > > > ~/code/oss/upstream_hudi/scripts
> > > > MacBook-Pro:scripts balaji.varadarajan$ echo $?
> > > > 0
> > > > MacBook-Pro:scripts balaji.varadarajan$
> > > > ```
> > > >
> > > >
> > > > On Sat, Aug 22, 2020 at 8:27 AM Vinoth Chandar 
> > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > - Ran the rc checks, I typically do
> > > > > - Tested a smoke test on both cow, mor tables
> > > > > - by running lot commits over longer period of time,
> > > > > - verifying the state of the dataset
> > > > >- count validation match.
> > > > >
> > > > > On Sat, Aug 22, 2020 at 6:08 AM leesf  wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > > - mvn clean package -DskipTests OK
> > > > > > - ran quickstart guide OK (still get the exception ERROR
> > 

Re: [VOTE] Release 0.6.0, release candidate #1

2020-08-22 Thread Bhavani Sudha
+1 (binding)

Downloaded tar and verified compile [OK]

Run integration test locally. [OK]

Run a few tests in IDE. [OK]

Run quickstart [OK]

Verify NOTICE and LICENSE exists [OK]

Check Checksum [OK]

Check no Binary files in source release [OK]

Rat Check Passed [OK]
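
For anyone repeating the checksum step by hand, here is a minimal sketch. It assumes the staged artifact ships with a SHA-512 checksum file whose first whitespace-separated token is the hex digest (sha512sum style); the function names and file layout are hypothetical, and the project's own validation script remains the reference.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare an artifact against the digest stored in its checksum file.

    Assumes the digest is the first whitespace-separated token in the file.
    """
    tokens = Path(checksum_file).read_text().split()
    if not tokens:
        return False
    return sha512_of(artifact) == tokens[0].lower()
```

A call would look like checksum_matches("release.src.tgz", "release.src.tgz.sha512"), with both paths pointing at files downloaded from the staging area.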

On Sat, Aug 22, 2020 at 1:41 PM Balaji Varadarajan 
wrote:

> +1(binding)
> 1. Ran long running structured streaming writes on fake data and verified
> compactions and ingestion is happening without errors.
> 2. Ran both scala and python based quickstart without any errors. There was
> an issue in the documented quickstart steps (not in hudi) for python
> example. Will send a doc PR shortly.
> 3. Release Validation script passed locally.
>
> ```
> MacBook-Pro:scripts balaji.varadarajan$
> ./release/validate_staged_release.sh --release=0.6.0 --rc_num=1
> /tmp/validation_scratch_dir_001 ~/code/oss/upstream_hudi/scripts
> Checking Checksum of Source Release
> Checksum Check of Source Release - [OK]
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 30225  100 30225    0     0  46215      0 --:--:-- --:--:-- --:--:-- 46145
> Checking Signature
> Signature Check - [OK]
>
> Checking for binary files in source release
> No Binary Files in Source Release? - [OK]
>
> Checking for DISCLAIMER
> DISCLAIMER file exists ? [OK]
>
> Checking for LICENSE and NOTICE
> License file exists ? [OK]
> Notice file exists ? [OK]
>
> Performing custom Licensing Check
> Licensing Check Passed [OK]
>
> Running RAT Check
> RAT Check Passed [OK]
>
> ~/code/oss/upstream_hudi/scripts
> MacBook-Pro:scripts balaji.varadarajan$ echo $?
> 0
> MacBook-Pro:scripts balaji.varadarajan$
> ```
>
>
> On Sat, Aug 22, 2020 at 8:27 AM Vinoth Chandar  wrote:
>
> > +1 (binding)
> >
> > - Ran the rc checks, I typically do
> > - Tested a smoke test on both cow, mor tables
> > - by running lot commits over longer period of time,
> > - verifying the state of the dataset
> >- count validation match.
> >
> > On Sat, Aug 22, 2020 at 6:08 AM leesf  wrote:
> >
> > > +1 (binding)
> > > - mvn clean package -DskipTests OK
> > > - ran quickstart guide OK (still get the exception ERROR
> > > view.PriorityBasedFileSystemView: Got error running preferred function.
> > > Trying secondary
> > > org.apache.hudi.exception.HoodieRemoteException: 192.168.1.102:56544
> > > failed
> > > to respond
> > > at
> > >
> > >
> >
> org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestBaseFile(RemoteHoodieTableFileSystemView.java:426)
> > > at
> > >
> > >
> >
> org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:96)
> > > at
> > >
> > >
> >
> org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestBaseFile(PriorityBasedFileSystemView.java:139),
> > > but still ran successfully)
> > > - writing demos to sync to hive & dla OK
> > >
> > > Sivabalan  于2020年8月22日周六 上午5:29写道:
> > >
> > > > +1 (non binding)
> > > > - Compilation successful
> > > > - Ran validation script which verifies checksum, keys, license, etc.
> > > > - Ran quick start
> > > > - Ran some tests from intellij.
> > > >
> > > > JFYI: when I ran mvn test, encountered some test failures due to
> > multiple
> > > > spark contexts. Have raised a ticket here
> > > > <https://issues.apache.org/jira/browse/HUDI-1211>. But all tests are
> > > > succeeding in CI and I could run from within intellij. So, not
> blocking
> > > the
> > > > RC.
> > > >
> > > > > Checking Checksum of Source Release
> > > > > -e Checksum Check of Source Release - [OK]
> > > > >   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> > > > >                                  Dload  Upload   Total   Spent    Left  Speed
> > > > > 100 30225  100 30225    0     0   106k      0 --:--:-- --:--:-- --:--:--  106k
> > > > Checking Signature
> > > > -e Signature Check - [OK]
> > > > Checking for binary files in source release
> > > > -e No Binary Files in Source Release? - [OK]
> > > > Checking for DISCLAIMER
> > > >

Re: [VOTE] Release 0.6.0, release candidate #1

2020-08-21 Thread Bhavani Sudha
Vino yang,

I am working on the release blog. While the RC is in progress, the doc and
site updates are happening this week.

Thanks,
Sudha

On Fri, Aug 21, 2020 at 4:23 AM vino yang  wrote:

> +1 from my side
>
> I checked:
>
> - ran `mvn clean package` [OK]
> - ran `mvn test` in my local [OK]
> - signature [OK]
>
> BTW, where is the link to the release blog?
>
> Best,
> Vino
>
> Bhavani Sudha  于2020年8月20日周四 下午12:03写道:
>
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version 0.6.0,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint 7F66CD4CE990983A284672293224F200E1FC2172 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-0.6.0-rc1" [5],
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Release Manager
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663
> > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.6.0-rc1/
> > [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachehudi-1025/
> > [5] https://github.com/apache/hudi/tree/release-0.6.0-rc1
> >
>


Re: contribution permission apply

2020-08-20 Thread Bhavani Sudha
Your Jira username is already in the contributor role. Please let me know if
you are not able to assign tickets to yourself.

On Thu, Aug 20, 2020 at 11:07 AM wowtua...@gmail.com 
wrote:

>
>
> I want to contribute to Apache Hudi.
>
> Would you please give me the permission as a contributor ?
>
> My JIRA username is Trevorzhang.
>
>
> wowtua...@gmail.com
>


[VOTE] Release 0.6.0, release candidate #1

2020-08-19 Thread Bhavani Sudha
Hi everyone,
Please review and vote on the release candidate #1 for the version 0.6.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint 7F66CD4CE990983A284672293224F200E1FC2172 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-0.6.0-rc1" [5],

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663
[2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.6.0-rc1/
[3] https://dist.apache.org/repos/dist/release/hudi/KEYS
[4] https://repository.apache.org/content/repositories/orgapachehudi-1025/
[5] https://github.com/apache/hudi/tree/release-0.6.0-rc1


Re: [DISCUSS] Release 0.6.0 timelines

2020-08-18 Thread Bhavani Sudha
Quick update on the RC.

Found a build issue when building scala 2.12 and sent a PR for that -
https://github.com/apache/hudi/pull/1976 . Working on resolving this in the
release branch and updating RC. Will update soon.

Thanks,
Sudha

On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar  wrote:

> Thanks Sudha! This means master is now open for regular PRs. Thanks for
> your patience, everyone.
>
> On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha 
> wrote:
>
> > Hello all,
> >
> > We have cut the release branch -
> > https://github.com/apache/hudi/tree/release-0.6.0 . Since it is already
> > Friday, we will be sending the release candidate early next week (after
> > some testing).
> >
> > Happy Friday!
> >
> > Thanks,
> > Sudha
> >
> > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org 
> > wrote:
> >
> > >
> > > Hi Folks,
> > > We are continuing to work on CI stabilization and will cut the release
> > > once we stabilize the builds hopefully tonight/tomorrow.
> > > Thanks,Balaji.V
> > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar <
> > > vin...@apache.org> wrote:
> > >
> > >  Hello all,
> > >
> > > Update on this. We have landed most of the blockers for the 0.6.0
> release
> > > and I am currently working on the last major blocker, HUDI-1013.
> > > We are working through some unexpected CI flakiness. We hope to
> stabilize
> > > master, cut the RC, and then open up master for regular PR merges.
> > > ETA for this is tomorrow night PST (Aug 12, PST).
> > >
> > > We will keep this thread posted!
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar 
> wrote:
> > >
> > > > Small correction:
> > > >
> > > > >> Vinoth working on code review, tests for PR 1876,
> > > > This is landed!
> > > >
> > > >
> > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha <
> bhavanisud...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hello all,
> > > >>
> > > >> We are targeting the end of this week to cut RC. Here is an update
> of
> > > >> where
> > > >> we are at release blockers.
> > > >>
> > > >> 0.6.0 Release blocker status (board
> > > >> <
> > > >>
> > >
> >
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397=HUDI=detail=HUDI-69
> > > >> >)
> > > >> ,
> > > >>
> > > >>- Spark Datasource/MOR https://github.com/apache/hudi/pull/1848
> > > needs
> > > >> to
> > > >>be tested by gary/balaji (About to land)
> > > >>- Hive Sync restructuring (Review done, about to land)
> > > >>- Bootstrap
> > > >>  - Vinoth working on code review, tests for PR 1876,
> > > >>  - then udit will rework PR 1702 (In Code review)
> > > >>  - then we will review, land PR 1870, 1869
> > > >>- Bulk insert V2 PR 1834, lower risk, independent PR, well tested
> > > >> already
> > > >>  - Dependent PR 1149 to be landed,
> > > >>  - and modes to be respected in V2 impl as well (At risk)
> > > >>- Upgrade Downgrade Hooks, PR 1858 : (In Code review)
> > > >>- HUDI-1054- Marker list perf improvement, Udit has a PR out
> > > >>- HUDI-115 : Overwrite with... ordering issue, Sudha has a PR
> > nearing
> > > >>landing
> > > >>- HUDI-1098 : Marker file issue with non-existent files. (In Code
> > > >> review)
> > > >>- Spark Streaming + Async Compaction , test complete, code review
> > > >>comments and land PR 1752 (About to land)
> > > >>- Spark DataSource/Hive MOR Incremental Query HUDI-920 (At risk)
> > > >>- Flink/Multi Engine refactor, will need a large rebase and
> rework,
> > > >>review, land (At risk for 0.6.0)
> > > >>- BloomIndex V2 - Global index implementation. (At risk)
> > > >>- HUDI-845 : Parallel writing i.e allow multiple writers (Pushed
> > out
> > > of
> > > >>0.6.0)
> > > >>- HUDI-860 : Small File Handling without memory caching (Pushed
> out
> > > of
> > > >>   

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-14 Thread Bhavani Sudha
Hello all,

We have cut the release branch -
https://github.com/apache/hudi/tree/release-0.6.0 . Since it is already
Friday, we will be sending the release candidate early next week (after
some testing).

Happy Friday!

Thanks,
Sudha

On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org 
wrote:

>
> Hi Folks,
> We are continuing to work on CI stabilization and will cut the release
> once we stabilize the builds hopefully tonight/tomorrow.
> Thanks,Balaji.V
> On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar <
> vin...@apache.org> wrote:
>
>  Hello all,
>
> Update on this. We have landed most of the blockers for the 0.6.0 release
> and I am currently working on the last major blocker, HUDI-1013.
> We are working through some unexpected CI flakiness. We hope to stabilize
> master, cut the RC, and then open up master for regular PR merges.
> ETA for this is tomorrow night PST (Aug 12, PST).
>
> We will keep this thread posted!
>
> Thanks
> Vinoth
>
> On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar  wrote:
>
> > Small correction:
> >
> > >> Vinoth working on code review, tests for PR 1876,
> > This is landed!
> >
> >
> > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha 
> > wrote:
> >
> >> Hello all,
> >>
> >> We are targeting the end of this week to cut RC. Here is an update of
> >> where
> >> we are at release blockers.
> >>
> >> 0.6.0 Release blocker status (board
> >> <
> >>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397=HUDI=detail=HUDI-69
> >> >)
> >> ,
> >>
> >>- Spark Datasource/MOR https://github.com/apache/hudi/pull/1848
> needs
> >> to
> >>be tested by gary/balaji (About to land)
> >>- Hive Sync restructuring (Review done, about to land)
> >>- Bootstrap
> >>  - Vinoth working on code review, tests for PR 1876,
> >>  - then udit will rework PR 1702 (In Code review)
> >>  - then we will review, land PR 1870, 1869
> >>- Bulk insert V2 PR 1834, lower risk, independent PR, well tested
> >> already
> >>  - Dependent PR 1149 to be landed,
> >>  - and modes to be respected in V2 impl as well (At risk)
> >>- Upgrade Downgrade Hooks, PR 1858 : (In Code review)
> >>- HUDI-1054- Marker list perf improvement, Udit has a PR out
> >>- HUDI-115 : Overwrite with... ordering issue, Sudha has a PR nearing
> >>landing
> >>- HUDI-1098 : Marker file issue with non-existent files. (In Code
> >> review)
> >>- Spark Streaming + Async Compaction , test complete, code review
> >>comments and land PR 1752 (About to land)
> >>- Spark DataSource/Hive MOR Incremental Query HUDI-920 (At risk)
> >>- Flink/Multi Engine refactor, will need a large rebase and rework,
> >>review, land (At risk for 0.6.0)
> >>- BloomIndex V2 - Global index implementation. (At risk)
> >>- HUDI-845 : Parallel writing i.e allow multiple writers (Pushed out
> of
> >>0.6.0)
> >>- HUDI-860 : Small File Handling without memory caching (Pushed out
> of
> >>0.6.0)
> >>
> >>
> >> Thanks,
> >> Sudha
> >>
> >> On Mon, Aug 3, 2020 at 3:41 PM Vinoth Chandar 
> wrote:
> >>
> >> > +1 (we need to formalize this well)
> >> > But having just blockers land first, would help not just with
> rebasing,
> >> but
> >> > also wind down towards cutting an RC by end of week.
> >> >
> >> >
> >> > On Mon, Aug 3, 2020 at 2:53 PM Bhavani Sudha  >
> >> > wrote:
> >> >
> >> > > Hello all,
> >> > >
> >> > > As we are all hustling towards getting the blockers in, I wanted to
> >> > propose
> >> > > a code/merge freeze until we cut a release for 0.6.0  and restrict
> it
> >> to
> >> > > only merging blockers identified for this release. It would reduce
> >> > rebasing
> >> > > time for blockers in progress. If we feel some issue is a serious
> >> blocker
> >> > > we can discuss it here and bump its priority.
> >> > >
> >> > > Please share your thoughts or concerns.
> >> > >
> >> > > Thanks,
> >> > > Sudha
> >> > >
> >> > >
> >> > > On Mon, Aug 3, 2020 at 8:19 AM Vinoth Chandar 
> >> wrote:
> >&

Re: Review Hudi MOR support from PrestoDB

2020-08-05 Thread Bhavani Sudha
This PR is landed today and will be available in the next Presto release.
Thanks to Brandan for the Presto fixes.

- Sudha


On Tue, Jul 14, 2020 at 9:18 PM Bhavani Sudha 
wrote:

> Hello all,
>
> Brandon has opened a PR for adding support for MOR tables from Presto -
> https://github.com/prestodb/presto/pull/14795 . Please provide feedback
> if you are interested.
>
> Thanks,
> Sudha
>


Re: [DISCUSS] Release 0.6.0 timelines

2020-08-04 Thread Bhavani Sudha
Hello all,

We are targeting the end of this week to cut RC. Here is an update of where
we are at release blockers.

0.6.0 Release blocker status (board
<https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397=HUDI=detail=HUDI-69>)
,

   - Spark Datasource/MOR https://github.com/apache/hudi/pull/1848 needs to
   be tested by gary/balaji (About to land)
   - Hive Sync restructuring (Review done, about to land)
   - Bootstrap
      - Vinoth working on code review, tests for PR 1876,
      - then udit will rework PR 1702 (In Code review)
      - then we will review, land PR 1870, 1869
   - Bulk insert V2 PR 1834, lower risk, independent PR, well tested already
      - Dependent PR 1149 to be landed,
      - and modes to be respected in V2 impl as well (At risk)
   - Upgrade Downgrade Hooks, PR 1858 : (In Code review)
   - HUDI-1054- Marker list perf improvement, Udit has a PR out
   - HUDI-115 : Overwrite with... ordering issue, Sudha has a PR nearing
   landing
   - HUDI-1098 : Marker file issue with non-existent files. (In Code review)
   - Spark Streaming + Async Compaction , test complete, code review
   comments and land PR 1752 (About to land)
   - Spark DataSource/Hive MOR Incremental Query HUDI-920 (At risk)
   - Flink/Multi Engine refactor, will need a large rebase and rework,
   review, land (At risk for 0.6.0)
   - BloomIndex V2 - Global index implementation. (At risk)
   - HUDI-845 : Parallel writing i.e allow multiple writers (Pushed out of
   0.6.0)
   - HUDI-860 : Small File Handling without memory caching (Pushed out of
   0.6.0)


Thanks,
Sudha

On Mon, Aug 3, 2020 at 3:41 PM Vinoth Chandar  wrote:

> +1 (we need to formalize this well)
> But having just blockers land first, would help not just with rebasing, but
> also wind down towards cutting an RC by end of week.
>
>
> On Mon, Aug 3, 2020 at 2:53 PM Bhavani Sudha 
> wrote:
>
> > Hello all,
> >
> > As we are all hustling towards getting the blockers in, I wanted to
> propose
> > a code/merge freeze until we cut a release for 0.6.0  and restrict it to
> > only merging blockers identified for this release. It would reduce
> rebasing
> > time for blockers in progress. If we feel some issue is a serious blocker
> > we can discuss it here and bump its priority.
> >
> > Please share your thoughts or concerns.
> >
> > Thanks,
> > Sudha
> >
> >
> > On Mon, Aug 3, 2020 at 8:19 AM Vinoth Chandar  wrote:
> >
> > > Given enough time has passed, Sudha can be our RM for 0.6.0.
> > >
> > > On the release blocker progress, we landed few blockers over the
> weekend,
> > > with some almost ready for landing
> > >
> > > Will send out a status update again tomorrow night PST!
> > >
> > > On Mon, Aug 3, 2020 at 8:17 AM Vinoth Chandar 
> wrote:
> > >
> > > > Hi anton.
> > > >
> > > > We were hoping to cut a release by last weekend. New target is this
> > > > weekend!
> > > > (tbh we were thrown off a bit due to COVID in Q2, given a lot of
> > > > PMC/Committers had additional kid care duties. Now we are back to
> > normal
> > > > cadence)
> > > >
> > > > Going forward, I plan to start a discussion around planning,
> > prioritizing
> > > > and other release processes after 0.6.0. Would be great to have the
> > > > community weigh in even more in these things upfront.
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Fri, Jul 31, 2020 at 6:49 PM Anton Zuyeu 
> > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> I apologize for possibly dumb question but when was 0.6.0 planned to
> > be
> > > >> released? Can't find any dates on Hudi related pages.
> > > >>
> > > >> On Thu, Jul 30, 2020 at 10:36 AM Vinoth Chandar 
> > > >> wrote:
> > > >>
> > > >> > Is anyone able to help with the at risk items? :)
> > > >> >
> > > >> > On Thu, Jul 30, 2020 at 7:07 AM leesf 
> wrote:
> > > >> >
> > > >> > > @Vinoth Chandar  Thanks for the reminder,
> > marked
> > > >> to
> > > >> > > blocker, and next week would be ok to me.
> > > >> > >
> > > >> > > Vinoth Chandar  于2020年7月30日周四 上午11:35写道:
> > > >> > >
> > > >> > > > @leesf   can we please mark the relevant
> > > >> > ticket(s)
> > > >> > > > with blocker priority, so it's ea

Re: [DISCUSS] Release 0.6.0 timelines

2020-08-03 Thread Bhavani Sudha
Hello all,

As we are all hustling towards getting the blockers in, I wanted to propose
a code/merge freeze until we cut a release for 0.6.0, restricting merges to
only the blockers identified for this release. It would reduce rebasing
time for blockers in progress. If we feel some issue is a serious blocker,
we can discuss it here and bump its priority.

Please share your thoughts or concerns.

Thanks,
Sudha


On Mon, Aug 3, 2020 at 8:19 AM Vinoth Chandar  wrote:

> Given enough time has passed, Sudha can be our RM for 0.6.0.
>
> On the release blocker progress, we landed a few blockers over the weekend,
> with some more almost ready for landing.
>
> Will send out a status update again tomorrow night PST!
>
> On Mon, Aug 3, 2020 at 8:17 AM Vinoth Chandar  wrote:
>
> > Hi anton.
> >
> > We were hoping to cut a release by last weekend. New target is this
> > weekend!
> > (tbh we were thrown off a bit due to COVID in Q2, given a lot of
> > PMC/Committers had additional kid care duties. Now we are back to normal
> > cadence)
> >
> > Going forward, I plan to start a discussion around planning, prioritizing
> > and other release processes after 0.6.0. Would be great to have the
> > community weigh in even more in these things upfront.
> >
> > Thanks
> > Vinoth
> >
> > On Fri, Jul 31, 2020 at 6:49 PM Anton Zuyeu 
> wrote:
> >
> >> Hi All,
> >>
> >> I apologize for possibly dumb question but when was 0.6.0 planned to be
> >> released? Can't find any dates on Hudi related pages.
> >>
> >> On Thu, Jul 30, 2020 at 10:36 AM Vinoth Chandar 
> >> wrote:
> >>
> >> > Is anyone able to help with the at risk items? :)
> >> >
> >> > On Thu, Jul 30, 2020 at 7:07 AM leesf  wrote:
> >> >
> >> > > @Vinoth Chandar  Thanks for the reminder, marked
> >> to
> >> > > blocker, and next week would be ok to me.
> >> > >
> >> > > Vinoth Chandar  于2020年7月30日周四 上午11:35写道:
> >> > >
> >> > > > @leesf   can we please mark the relevant
> >> > ticket(s)
> >> > > > with blocker priority, so it's easier to track?
> >> > > >
> >> > > > Looks like we are nearing a choice for RM.
> >> > > > Any more thoughts on timelines? Looks like everyone so far is
> >> leaning
> >> > > > towards completeness of the release over doing it sooner?
> >> > > >
> >> > > > On Wed, Jul 29, 2020 at 6:36 PM vino yang 
> >> > wrote:
> >> > > >
> >> > > > > +1 on Sudha being RM for the release. And looking forward to
> >> 0.6.0.
> >> > > > >
> >> > > > > Best,
> >> > > > > Vino
> >> > > > >
> >> > > > > On Thu, Jul 30, 2020 at 9:15 AM leesf wrote:
> >> > > > >
> >> > > > > > +1 on Sudha being RM, and PR#1810
> >> > > > > > https://github.com/apache/hudi/pull/1810 (abstract hive sync
> >> > > > > > module) would also go into this release.
> >> > > > > >
> >> > > > > > On Thu, Jul 30, 2020 at 2:18 AM Sivabalan wrote:
> >> > > > > >
> >> > > > > > > +1 on Sudha being RM for the release. Makes sense to push
> the
> >> > > release
> >> > > > > by
> >> > > > > > a
> >> > > > > > > week.
> >> > > > > > >
> >> > > > > > > On Wed, Jul 29, 2020 at 1:35 AM vbal...@apache.org <
> >> > > > vbal...@apache.org
> >> > > > > >
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > >  +1 on Sudha on being RM for this release. Also agree on
> >> > pushing
> >> > > > the
> >> > > > > > > > release date by a week.
> >> > > > > > > > Balaji.V
> >> > > > > > > > On Tuesday, July 28, 2020, 10:08:41 PM PDT, Bhavani
> >> Sudha <
> >> > > > > > > > bhavanisud...@gmail.com> wrote:
> >> > > > > > > >
> >> > > > > > > >  Thanks Vinoth for the update. I can volunteer to RM this
> >> > > release.
> >> > > > > &

Re: [DISCUSS] Release 0.6.0 timelines

2020-07-28 Thread Bhavani Sudha
Thanks Vinoth for the update. I can volunteer to RM this release.

I understand the 0.6.0 release is later than what we originally discussed; Q2
has been really hard with COVID and everything going on. Given where we are,
if delaying the RC by another week or so lets us get some of the 'At risk'
items in, I would vote for that. That is just my personal opinion; I'll let
others chime in.

Thanks,
Sudha

On Tue, Jul 28, 2020 at 9:48 PM Vinoth Chandar  wrote:

> Hello all,
>
> Just wanted to kickstart a thread to firm up the RC cut date for 0.6.0 and
> pick an RM (any volunteers? If not, I nominate myself).
>
> Here's an update on where we are at with the remaining release blockers. I
> have marked items as "At risk" assuming we cut RC sometime next week.
> Please chime in with your thoughts. Ideally, we don't take any more
> blockers. If we also want to knock off the at risk items, then we would
> at least push dates by another week (my guess).
>
> 0.6.0 release blocker status (board:
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397=HUDI=detail=HUDI-69
> ):
>
>    - Spark Datasource/MOR https://github.com/apache/hudi/pull/1848 needs to
>      be tested by gary/balaji
>    - Bootstrap
>       - Vinoth working on code review, tests for PR 1876,
>       - then udit will rework PR 1702
>       - then we will review, land PR 1870, 1869
>       - Also need to fix HUDI-999, HUDI-1021
>    - Bulk insert V2 PR 1834, lower risk, independent PR, well tested already
>       - Dependent PR 1149 to be landed,
>       - and modes to be respected in V2 impl as well (At risk)
>    - Upgrade Downgrade Hooks, PR 1858 : Siva has a PR out, code completing
>      this week
>    - HUDI-1054 : Marker list perf improvement, Udit has a PR out
>    - HUDI-115 : Overwrite with... ordering issue, Sudha has a PR nearing
>      landing
>    - HUDI-1098 : Marker file issue with non-existent files. Siva to begin
>      impl
>    - Spark Streaming + Async Compaction, test complete, code review
>      comments and land PR 1752
>    - Spark DataSource/Hive MOR Incremental Query HUDI-920 (At risk)
>    - Flink/Multi Engine refactor, will need a large rebase and rework,
>      review, land (At risk for 0.6.0, high scope, may not have enough time)
>    - BloomIndex V2 - Global index implementation. (At risk)
>    - HUDI-845 : Parallel writing i.e. allow multiple writers (At risk)
>    - HUDI-860 : Small File Handling without memory caching (At risk)
>
>
> Thanks
> Vinoth
>


Review Hudi MOR support from PrestoDB

2020-07-14 Thread Bhavani Sudha
Hello all,

Brandon has opened a PR for adding support for MOR tables from Presto -
https://github.com/prestodb/presto/pull/14795 . Please provide feedback if
you are interested.

Thanks,
Sudha


Re: [DISCUSS] Make delete marker configurable?

2020-07-06 Thread Bhavani Sudha
+1 as well. Thanks Raymond for explaining.


On Sun, Jun 28, 2020 at 11:35 AM Shiyan Xu 
wrote:

> Hi Sudha, making the delete marker configurable gives users more
> flexibility when processing delete events; they can use any boolean field
> they may have in their own schema.
>
> On Sat, Jun 27, 2020 at 5:32 PM Bhavani Sudha 
> wrote:
>
> > Hi Raymond,
> >
> > I am trying to understand the use case. Can you please provide more
> > context on what problem this addresses?
> >
> >
> > Thanks,
> > Sudha
> >
> > On Fri, Jun 26, 2020 at 9:02 PM Shiyan Xu 
> > wrote:
> >
> > > Hi all,
> > >
> > > A small suggestion: as delta streamer relies on `_hoodie_is_deleted` to
> > > do hard deletes, can we make it configurable? That is, users could
> > > specify any boolean field as the delete marker, and `_hoodie_is_deleted`
> > > would remain the default.
> > >
> > > Regards,
> > > Raymond
> > >
> >
>
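
For readers following the delete-marker discussion, here is a minimal Python
sketch of the proposal: the boolean field that marks a record as deleted is
read from a setting and falls back to `_hoodie_is_deleted`. The function name
`split_deletes`, the parameter `delete_marker_field`, and the record layout
are hypothetical and are not Hudi APIs; this only illustrates the idea and is
not how delta streamer implements it.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Default marker field named in the thread.
DEFAULT_DELETE_MARKER = "_hoodie_is_deleted"

Record = Dict[str, Any]


def split_deletes(
    records: Iterable[Record],
    delete_marker_field: str = DEFAULT_DELETE_MARKER,
) -> Tuple[List[Record], List[Record]]:
    """Split records into (upserts, deletes) using a configurable boolean field.

    A record counts as a delete only when the marker field is present and is
    exactly True; a missing or False marker means the record is an upsert.
    """
    upserts: List[Record] = []
    deletes: List[Record] = []
    for record in records:
        if record.get(delete_marker_field) is True:
            deletes.append(record)
        else:
            upserts.append(record)
    return upserts, deletes
```

With the default, a record such as {"id": 1, "_hoodie_is_deleted": True} is
treated as a delete; passing delete_marker_field="is_removed" would let a
user reuse a boolean column that already exists in their schema.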


Re: DISCUSS code, config, design walk through sessions

2020-07-06 Thread Bhavani Sudha
+1 this is a great idea!

On Mon, Jul 6, 2020 at 7:54 AM vino yang  wrote:

> +1
>
> On Mon, Jul 6, 2020 at 9:55 PM Adam Feldman wrote:
>
> > Interested
> >
> > On Mon, Jul 6, 2020, 08:29 Sivabalan  wrote:
> >
> > > +1 for sure
> > >
> > > On Mon, Jul 6, 2020 at 4:42 AM Gurudatt Kulkarni 
> > > wrote:
> > >
> > > > +1
> > > > Really a great idea. Will help in understanding the project better.
> > > >
> > > > On Mon, Jul 6, 2020 at 1:35 PM Pratyaksh Sharma <
> pratyaks...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > This is a great idea and really helpful one.
> > > > >
> > > > > On Mon, Jul 6, 2020 at 1:09 PM  wrote:
> > > > >
> > > > > > +1
> > > > > > It can also attract more partners to join us.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 07/06/2020 15:34, Ranganath Tirumala wrote:
> > > > > > +1
> > > > > >
> > > > > > On Mon, 6 Jul 2020 at 16:59, David Sheard <
> > > > > > david.she...@datarefactory.com.au>
> > > > > > wrote:
> > > > > >
> > > > > > > Perfect
> > > > > > >
> > > > > > > On Mon, 6 Jul. 2020, 1:30 pm Vinoth Chandar, <
> vin...@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > > As we scale the community, it's important that more of us are
> > > > > > > > > able to help users and that users become contributors.
> > > > > > > > >
> > > > > > > > > In the past, we have drafted FAQs and troubleshooting guides.
> > > > > > > > > But I feel that sometimes, more hands-on walk-through sessions
> > > > > > > > > over video could help.
> > > > > > > > >
> > > > > > > > > I am happy to spend 2 hours each on code/configs and
> > > > > > > > > design/perf/architecture.
> > > > > > > > > We can have the sessions recorded as well for future reference.
> > > > > > > >
> > > > > > > > What does everyone think?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Vinoth
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > >
> > > > > > Ranganath Tirumala
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Regards,
> > > -Sivabalan
> > >
> >
>


Re: [DISCUSS] Make delete marker configurable?

2020-06-27 Thread Bhavani Sudha
Hi Raymond,

I am trying to understand the use case. Can you please provide more
context on what problem this addresses?


Thanks,
Sudha

On Fri, Jun 26, 2020 at 9:02 PM Shiyan Xu 
wrote:

> Hi all,
>
> A small suggestion: as delta streamer relies on `_hoodie_is_deleted` to do
> hard deletes, can we make it configurable? That is, users could specify any
> boolean field as the delete marker, and `_hoodie_is_deleted` would remain
> the default.
>
> Regards,
> Raymond
>


Re: [DISCUSS] Publishing benchmarks for releases

2020-06-22 Thread Bhavani Sudha
+1 as well

On Mon, Jun 22, 2020 at 4:19 AM David Sheard <
david.she...@datarefactory.com.au> wrote:

> Like the idea
>
> On Mon, 22 Jun. 2020, 9:56 am Sivabalan,  wrote:
>
> > Hey folks,
> > Is it a common practice to publish benchmarks for releases? I have put
> > up an initial PR to add JMH
> > benchmark support to a couple of Hudi operations. If the community feels
> > positive about publishing benchmarks, we can add support for more
> > operations, and for every release we could publish some benchmark numbers.
> >
> > --
> > Regards,
> > -Sivabalan
> >
>


Re: IllegalStateException: Hudi File Id (...) has more than 1 pending compactions. Hudi 0.5.3 + S3

2020-06-21 Thread Bhavani Sudha
If you are running inline compaction, it should not cause two pending
compactions on the same file group. Along with the above details, can you
please open a [SUPPORT] GitHub issue with the full stack trace and also an
`ls` of your .hoodie folder if possible?

Thanks,
Sudha

On Thu, Jun 18, 2020 at 9:57 PM Zuyeu, Anton 
wrote:

> Hi Team,
>
> We are trying to run incremental updates to our MoR Hudi table on S3, and
> it looks like the table inevitably gets corrupted after 20-30 commits. We do
> an initial data import and enable incremental upserts, then we verify that
> the tables are readable by running:
> hive> select * from table_name _ro limit 1;
>
> but after letting incremental upserts run for several hours, the
> select query above starts throwing exceptions like:
> Failed with exception java.io.IOException:java.lang.IllegalStateException:
> Hudi File Id (HoodieFileGroupId{partitionPath='983',
> fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending
> compactions.
>
> Checking the compactions mentioned in the exception message via hudi-cli
> does indeed verify that the file ID is present in both compactions. The
> upsert settings that we use are:
> hudiOptions = Map[String,String](
>   HoodieWriteConfig.TABLE_NAME -> inputTableName,
>   "hoodie.consistency.check.enabled" -> "true",
>   "hoodie.compact.inline.max.delta.commits" -> "30",
>   "hoodie.compact.inline" -> "true",
>   "hoodie.clean.automatic" -> "true",
>   "hoodie.cleaner.commits.retained" -> "1000",
>   "hoodie.keep.min.commits" -> "1001",
>   "hoodie.keep.max.commits" -> "1050",
>   DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
>   DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
>   DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
>   DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
>   DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
>   DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
>   DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
>   DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
>   DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
>   DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:1"
> )
>
> Any suggestions on what can cause or how to possibly debug this issue
> would help a lot.
>
> Thank you,
> Anton Zuyeu
>
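
As a debugging aid for the situation described in this thread, the check
"does any file group appear in more than one pending compaction?" can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the input is a plain dictionary that someone has filled in by
hand (for example from hudi-cli output), the function name
`find_conflicting_file_groups` is hypothetical, and nothing here reads a
.hoodie folder or calls a Hudi API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (partition_path, file_id), e.g. ("983", "8e9fde92-...-0")
FileGroup = Tuple[str, str]


def find_conflicting_file_groups(
    pending_plans: Dict[str, Iterable[FileGroup]],
) -> Dict[FileGroup, List[str]]:
    """Return file groups that are listed in more than one pending compaction.

    pending_plans maps a compaction instant time to the file groups that the
    plan for that instant covers. The result maps each conflicting file group
    to the sorted list of instants that claim it.
    """
    owners: Dict[FileGroup, Set[str]] = defaultdict(set)
    for instant, file_groups in pending_plans.items():
        for file_group in file_groups:
            # A set ignores the same file group repeated within one plan.
            owners[file_group].add(instant)
    return {
        file_group: sorted(instants)
        for file_group, instants in owners.items()
        if len(instants) > 1
    }
```

An empty result means no file group is claimed twice; a non-empty result
names the instants worth attaching to a [SUPPORT] issue.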


Re: [DISCUSS] Introduce a write committed callback hook

2020-06-21 Thread Bhavani Sudha
+1 . I think this is a valid use case and would be useful in general.

On Sun, Jun 21, 2020 at 7:11 AM Vinoth Chandar  wrote:

> +1 as well
>
> > We expect to introduce a proactive notification(event callback)
> mechanism. For example, a hook can be introduced after a successful commit.
>
> This would be very useful. We could write to a variety of event buses and
> notify consumers of new data arrival.
>
> On Sat, Jun 20, 2020 at 2:51 AM wangxianghu  wrote:
>
> > +1 for this, I think this is a feature worth doing.
> > Consider offline computing, where data changes happen hourly or daily.
> > If there is no notification mechanism to inform the downstream, the
> > downstream tasks will keep running all day long even though the time
> > actually spent processing data may be very short, which will surely
> > waste resources.
> > > On Jun 20, 2020, at 8:13 AM, vino yang wrote:
> > >
> > > Hi all,
> > >
> > > Currently, we have a need to incrementally process and build a new
> table
> > > based on an original hoodie table. We expect that after a new commit is
> > > completed on the original hoodie table, it could be retrieved ASAP, so
> > that
> > > it can be used for incremental view queries. Based on the existing
> > > capabilities, one approach we can use is to continuously poll Hoodie's
> > > Timeline to check for new commits. This is a very common processing
> way,
> > > but it will cause unnecessary waste of resources.
> > >
> > > We expect to introduce a proactive notification(event callback)
> > mechanism.
> > > For example, a hook can be introduced after a successful commit.
> External
> > > processors interested in the commit, such as scheduling systems, can
> use
> > > the hook as their own trigger. When a certain commit is completed, the
> > > scheduling system can pull up the task of obtaining incremental data
> > > through the API in the callback. Thereby completing the processing of
> > > incremental data.
> > >
> > > There is currently a `postCommit` method in Hudi's client module, and
> > > the existing implementation is mainly used for compaction and cleanup
> > > after commit. However, it is triggered a little too early: not
> > > everything has been processed at that point, and we found that an
> > > exception may still cause the commit to be rolled back. We need to find
> > > a new location to trigger this hook to ensure that the commit is final.
> > > This is one of our scenario requirements. Combined with incremental
> > > queries, it would be a very useful feature that makes incremental
> > > processing more timely.
> > >
> > > We hope to hear what the community thinks of this proposal. Any
> comments
> > > and opinions are appreciated.
> > >
> > > Best,
> > > Vino
> >
> >
>
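
The difference between polling the timeline and being notified can be
sketched as follows. This is a minimal Python illustration of the proposal in
this thread, assuming a writer that runs its write and post-commit steps
first and only then notifies listeners. The class `CommitNotifier` and its
methods are hypothetical names for illustration, not Hudi classes, and the
real hook location was still under discussion here.

```python
from typing import Callable, List

# A callback receives the instant time of the commit that just completed.
CommitCallback = Callable[[str], None]


class CommitNotifier:
    """Invoke registered callbacks only after a commit has fully succeeded."""

    def __init__(self) -> None:
        self._callbacks: List[CommitCallback] = []

    def register(self, callback: CommitCallback) -> None:
        self._callbacks.append(callback)

    def commit(
        self,
        instant_time: str,
        write_fn: Callable[[str], None],
        post_commit_fn: Callable[[str], None],
    ) -> None:
        # If either step raises, the exception propagates and no callback
        # runs, so listeners never see a commit that may still be rolled back.
        write_fn(instant_time)
        post_commit_fn(instant_time)
        for callback in self._callbacks:
            callback(instant_time)
```

A scheduling system would register a callback that launches its incremental
job for the given instant, instead of repeatedly checking for new commits.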


[DISCUSS] Regarding nightly builds

2020-06-18 Thread Bhavani Sudha
Hello all,

Should we have nightly builds? That way we can point users to those builds
for the latest features instead of having them blocked on the next
release. This would also give early feedback on new features or fixes,
in case any further improvements are needed. Does anyone know if and how other
Apache projects handle nightly builds?

Thanks,
Sudha


Re: [ANNOUNCE] Apache Hudi 0.5.3 released

2020-06-18 Thread Bhavani Sudha
Great job. Thanks Siva for driving this to completion.

-Sudha

On Thu, Jun 18, 2020 at 4:36 AM Vinoth Chandar  wrote:

> Thanks for all the great work!
> Onto 0.6.0 now!
>
> On Thu, Jun 18, 2020 at 4:06 AM leesf  wrote:
>
> > Great, thanks siva and sudha!
> >
> > On Thu, Jun 18, 2020 at 2:16 PM vino yang wrote:
> >
> > > Great job!
> > >
> > > Thanks for your hard work, Siva and Sudha!
> > >
> > > Best,
> > > Vino
> > >
> > > On Thu, Jun 18, 2020 at 11:09 AM nishith agarwal wrote:
> > >
> > > > Great job Siva and Sudha, thanks for driving this!
> > > >
> > > > -Nishith
> > > >
> > > > On Wed, Jun 17, 2020 at 7:16 PM  wrote:
> > > >
> > > > > Super news :)  The very first release after graduation. Awesome job
> > > Siva
> > > > > and Sudha for spearheading the release of 0.5.3.
> > > > > Balaji.V
> > > > >
> > > > >
> > > > >
> > > > > On Wednesday, June 17, 2020, 5:50 PM, Sivabalan <
> n.siv...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > The Apache Hudi community is pleased to announce the release of
> > Apache
> > > > Hudi
> > > > > 0.5.3.
> > > > >
> > > > >
> > > > >
> > > > > Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes
> and
> > > > > Incrementals. Apache Hudi manages storage of large analytical
> > datasets
> > > on
> > > > > DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible
> storage)
> > > and
> > > > > provides the ability to update/delete records as well as capture
> > changes.
> > > > >
> > > > >
> > > > >
> > > > > 0.5.3 is a bug fix release and is the first release after
> graduating
> > as
> > > > > TLP. It includes more than 35 resolved issues, comprising general
> > > > > improvements and bug-fixes. Hudi 0.5.3 enables Embedded Timeline
> > Server
> > > > and
> > > > > Incremental Cleaning by default for both delta-streamer and spark
> > > > > datasource writes. Apart from multiple bug fixes, this release also
> > > > > improves write performance like avoiding unnecessary loading of
> data
> > > > after
> > > > > writes and improving parallelism while searching for existing files
> > for
> > > > > writing new records.
> > > > >
> > > > >
> > > > >
> > > > > For details on how to use Hudi, please look at the quick start page
> > > > located
> > > > > at https://hudi.apache.org/docs/quick-start-guide.html
> > > > >
> > > > > If you'd like to download the source release, you can find it here:
> > > > >
> > > > > https://github.com/apache/hudi/releases/tag/release-0.5.3
> > > > >
> > > > > You can read more about the release (including release notes) here:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12348256
> > > > >
> > > > >
> > > > >
> > > > > We would like to thank all contributors, the community, and the
> > Apache
> > > > > Software Foundation for enabling this release and we look forward
> to
> > > > > continued collaboration. We welcome your help and feedback. For
> more
> > > > > information on how to report problems, and to get involved, visit
> the
> > > > > project website at: http://hudi.apache.org/
> > > > >
> > > > >
> > > > > Kind regards,
> > > > >
> > > > > Sivabalan Narayanan (Hudi 0.5.3 Release Manager)
> > > > >
> > > > > On behalf of the Apache Hudi
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>


20200616 Weekly Sync Minutes

2020-06-16 Thread Bhavani Sudha
https://cwiki.apache.org/confluence/display/HUDI/20200616+Weekly+Sync+Minutes


Thanks,
Sudha


Re: [VOTE] Release 0.5.3, release candidate #2

2020-06-11 Thread Bhavani Sudha
+1 (binding)

Downloaded tar and verified compile [OK]

Run integration test locally. [OK]

Run a few tests in IDE. [OK]

Run quickstart [OK]

Verify NOTICE and LICENSE exists [OK]

Check Checksum [OK]

Check no Binary files in source release [OK]

Rat Check Passed [OK]


Thanks,

Sudha

On Wed, Jun 10, 2020 at 2:57 PM Sivabalan  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #2 for the version 0.5.3,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>  The complete staging area is available for your review, which includes:
>
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint 001B66FA2B2543C151872CCC29A4FD82F1508833 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-0.5.3-rc2" [5],
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
>
> Thanks,
> Release Manager
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12348256
>
> [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.5.3-rc2/
>
> [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
>
> [4] https://repository.apache.org/content/repositories/orgapachehudi-1023/
>
> [5] https://github.com/apache/hudi/tree/release-0.5.3-rc2
>
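
The "Check Checksum" step in the vote above can be sketched in Python. This
is a rough illustration, assuming the expected value is available as a bare
hexadecimal SHA-512 digest; the actual layout of the checksum files in the
staging area is not shown in this thread, and the function names below are
hypothetical. It does not replace the project's release validation script or
the signature check against the KEYS file.

```python
import hashlib


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-512 digest with an expected bare hex digest.

    Assumption: expected_hex holds only the digest (surrounding whitespace
    and letter case are ignored), not a "digest  filename" line.
    """
    return sha512_of(artifact_path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat for large source or bundle archives.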


Re: How to extend the timeline server schema to accommodate business metadata

2020-06-10 Thread Bhavani Sudha
Ah okay. Thanks for letting us know. I created a Jira here to capture this
thread - https://issues.apache.org/jira/browse/HUDI-1020. Feel free to add
to the jira.

Thanks,
Sudha

On Wed, Jun 10, 2020 at 11:03 AM Mario de Sá Vera 
wrote:

> Sure Sudha. Unfortunately, I am afraid I am not allowed to become a Hudi
> contributor; I have to restrict myself to being an enthusiast, as my current
> employer applies some severe restrictions.
>
> I would be more than happy to contribute by specifying the requirements, but
> from a code developer perspective I will have to pass on that for now...
>
> On Wed, Jun 10, 2020 at 6:40 PM Bhavani Sudha <
> bhavanisud...@gmail.com>
> wrote:
>
> > Definitely. I was trying to add you to the Hudi contributors so you can
> > create a Jira . For that I need a jira id. If you have not already signed
> > up, please sign up for Jira and let me know your jira id.
> >
> > Thanks,
> > Sudha
> >
> > On Wed, Jun 10, 2020 at 12:17 AM Mario de Sá Vera 
> > wrote:
> >
> > > Hi Sudha,
> > >
> > > Can you or Vinoth help me with this? How can we create a JIRA for that
> ?
> > >
> > > I can collaborate bringing the description and definition of done.
> > >
> > > Thanks,
> > >
> > > Mario.
> > >
> > > On Tue, 9 Jun 2020, 23:46 Bhavani Sudha, 
> > wrote:
> > >
> > > > Hi Mario,
> > > >
> > > > Can you please share your jira id ?
> > > >
> > > > Thanks,
> > > > Sudha
> > > >
> > > > On Tue, Jun 9, 2020 at 3:29 AM Mario de Sá Vera 
> > > > wrote:
> > > >
> > > > > hey Vinoth, I noticed you added this suggestion to the weekly log
> ..
> > > that
> > > > > is great ! just let me know if I am able to create a JIRA , as I
> > tried
> > > to
> > > > > go to HUDI project in Apache and did not find a way to do it. I can
> > > bring
> > > > > in a good description of the benefits etc...
> > > > >
> > > > > thanks, Mario.
> > > > >
> > > > > On Mon, Jun 8, 2020 at 12:46 PM Vinoth Chandar <
> > vin...@apache.org
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We can probably make a new JIRA. Not sure if there is an existing
> > > JIRA
> > > > to
> > > > > > re-use.
> > > > > > The Following modules are good to look at.
> > > > > >
> > > > > > hudi-timeline-service
> > > > > > packaging/hudi-timeline-server-bundle
> > > > > >
> > > > > > Thanks
> > > > > > Vinoth
> > > > > >
> > > > > > On Fri, Jun 5, 2020 at 12:56 AM Mario de Sá Vera <
> > desav...@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Sorry Vinoth for not being clear... If that is a work in
> progress
> > > > would
> > > > > > you
> > > > > > > have a jira I could follow up and contribute to ? If not , what
> > is
> > > > the
> > > > > > > module name you suggest me looking at?
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Mario.
> > > > > > >
> > > > > > > On Fri, 5 Jun 2020, 02:12 Vinoth Chandar, 
> > > wrote:
> > > > > > >
> > > > > > > > Sorry did not understand the last part. :) are you suggesting
> > we
> > > > > > create a
> > > > > > > > jira
> > > > > > > >
> > > > > > > > On Thu, Jun 4, 2020 at 1:08 AM Mario de Sá Vera <
> > > > desav...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > That sounds great ! Will check that and keep an eye on the
> > long
> > > > > > running
> > > > > > > > > server approach... once it gets a ticket I could watch for
> > just
> > > > let
> > > > > > me
> > > > > > > > know
> > > > > > > > > please.
> > > > > > > > >
> > > > > &g

Re: How to extend the timeline server schema to accommodate business metadata

2020-06-10 Thread Bhavani Sudha
Definitely. I was trying to add you to the Hudi contributors so you can
create a Jira. For that I need a Jira ID. If you have not already signed
up, please sign up for Jira and let me know your Jira ID.

Thanks,
Sudha

On Wed, Jun 10, 2020 at 12:17 AM Mario de Sá Vera 
wrote:

> Hi Sudha,
>
> Can you or Vinoth help me with this? How can we create a JIRA for that ?
>
> I can collaborate bringing the description and definition of done.
>
> Thanks,
>
> Mario.
>
> On Tue, 9 Jun 2020, 23:46 Bhavani Sudha,  wrote:
>
> > Hi Mario,
> >
> > Can you please share your jira id ?
> >
> > Thanks,
> > Sudha
> >
> > On Tue, Jun 9, 2020 at 3:29 AM Mario de Sá Vera 
> > wrote:
> >
> > > hey Vinoth, I noticed you added this suggestion to the weekly log ..
> that
> > > is great ! just let me know if I am able to create a JIRA , as I tried
> to
> > > go to HUDI project in Apache and did not find a way to do it. I can
> bring
> > > in a good description of the benefits etc...
> > >
> > > thanks, Mario.
> > >
> > > On Mon, Jun 8, 2020 at 12:46 PM Vinoth Chandar
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > We can probably make a new JIRA. Not sure if there is an existing
> JIRA
> > to
> > > > re-use.
> > > > The Following modules are good to look at.
> > > >
> > > > hudi-timeline-service
> > > > packaging/hudi-timeline-server-bundle
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Fri, Jun 5, 2020 at 12:56 AM Mario de Sá Vera  >
> > > > wrote:
> > > >
> > > > > Sorry Vinoth for not being clear... If that is a work in progress
> > would
> > > > you
> > > > > have a jira I could follow up and contribute to ? If not , what is
> > the
> > > > > module name you suggest me looking at?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Mario.
> > > > >
> > > > > On Fri, 5 Jun 2020, 02:12 Vinoth Chandar, 
> wrote:
> > > > >
> > > > > > Sorry did not understand the last part. :) are you suggesting we
> > > > create a
> > > > > > jira
> > > > > >
> > > > > > On Thu, Jun 4, 2020 at 1:08 AM Mario de Sá Vera <
> > desav...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > That sounds great ! Will check that and keep an eye on the long
> > > > running
> > > > > > > server approach... once it gets a ticket I could watch for just
> > let
> > > > me
> > > > > > know
> > > > > > > please.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > On Thu, 4 Jun 2020, 05:34 Vinoth Chandar, 
> > > wrote:
> > > > > > >
> > > > > > > > Hi Mario,
> > > > > > > >
> > > > > > > > We actually started with the idea of making the timeline
> > > > > > > > server a long-running service. If you notice, we have a
> > > > > > > > module that builds a bundle that you could deploy. Maybe you
> > > > > > > > can play with it and see if that sounds interesting to you.
> > > > > > > > It will definitely have some rough edges, given it's not
> > > > > > > > been widely used.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Vinoth
> > > > > > > >
> > > > > > > > On Wed, Jun 3, 2020 at 2:33 AM Mario de Sá Vera <
> > > > desav...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Vinoth, thanks for your comments on this. I spent
> sometime
> > > > > > thinking
> > > > > > > > over
> > > > > > > > > another possibility which would be externalising the Hudi
> > > > timeline
> > > > > >

Re: Wish to contribute

2020-06-09 Thread Bhavani Sudha
Done! Welcome to Hudi :)

On Tue, Jun 9, 2020 at 3:45 PM Alan Chu  wrote:

> My mistake, it's chualan. Thanks!
>
>
> On Tue, Jun 9, 2020 at 5:44 PM Bhavani Sudha 
> wrote:
>
> > Hi Alan,
> >
> > Please share your jira id.
> >
> > Thanks,
> > Sudha
> >
> > On Tue, Jun 9, 2020 at 3:13 PM Alan Chu  wrote:
> >
> > > Hi,
> > >
> > > I'd love to contribute to Hudi, please add me to the contributor list
> if
> > > possible, thanks!
> > >
> > >
> > > Best,
> > > Alan Chu
> > >
> >
>


Re: How to extend the timeline server schema to accommodate business metadata

2020-06-09 Thread Bhavani Sudha
Hi Mario,

Can you please share your Jira ID?

Thanks,
Sudha

On Tue, Jun 9, 2020 at 3:29 AM Mario de Sá Vera  wrote:

> hey Vinoth, I noticed you added this suggestion to the weekly log .. that
> is great ! just let me know if I am able to create a JIRA , as I tried to
> go to HUDI project in Apache and did not find a way to do it. I can bring
> in a good description of the benefits etc...
>
> thanks, Mario.
>
> On Mon, Jun 8, 2020 at 12:46 PM Vinoth Chandar
> wrote:
>
> > Hi,
> >
> > We can probably make a new JIRA. Not sure if there is an existing JIRA to
> > re-use.
> > The Following modules are good to look at.
> >
> > hudi-timeline-service
> > packaging/hudi-timeline-server-bundle
> >
> > Thanks
> > Vinoth
> >
> > On Fri, Jun 5, 2020 at 12:56 AM Mario de Sá Vera 
> > wrote:
> >
> > > Sorry Vinoth for not being clear... If that is a work in progress would
> > you
> > > have a jira I could follow up and contribute to ? If not , what is the
> > > module name you suggest me looking at?
> > >
> > > Regards,
> > >
> > > Mario.
> > >
> > > On Fri, 5 Jun 2020, 02:12 Vinoth Chandar,  wrote:
> > >
> > > > Sorry did not understand the last part. :) are you suggesting we
> > create a
> > > > jira
> > > >
> > > > On Thu, Jun 4, 2020 at 1:08 AM Mario de Sá Vera 
> > > > wrote:
> > > >
> > > > > That sounds great ! Will check that and keep an eye on the long
> > running
> > > > > server approach... once it gets a ticket I could watch for just let
> > me
> > > > know
> > > > > please.
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > On Thu, 4 Jun 2020, 05:34 Vinoth Chandar, 
> wrote:
> > > > >
> > > > > > Hi Mario,
> > > > > >
> > > > > > We actually started with the idea of making the timeline server a
> > > > > > long-running service. If you notice, we have a module that builds
> > > > > > a bundle that you could deploy. Maybe you can play with it and see
> > > > > > if that sounds interesting to you. It will definitely have some
> > > > > > rough edges, given it's not been widely used.
> > > > > >
> > > > > > Thanks
> > > > > > Vinoth
> > > > > >
> > > > > > On Wed, Jun 3, 2020 at 2:33 AM Mario de Sá Vera <
> > desav...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Vinoth, thanks for your comments on this. I spent some time
> > > > > > > thinking over another possibility, which would be externalising
> > > > > > > the Hudi timeline service itself to an external server holding
> > > > > > > both operational (i.e. Hudi) and business metadata.
> > > > > > >
> > > > > > > Would you guys have any opinion on that? Would that be easy? I
> > > > > > > do not see a way yet, except from reading about RocksDB, but
> > > > > > > that is still not quite clear.
> > > > > > >
> > > > > > > best regards,
> > > > > > >
> > > > > > > Mario.
> > > > > > >
> > > > > > > On Mon, Jun 1, 2020 at 4:01 PM Vinoth Chandar <
> > > > > > > mail.vinoth.chan...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Mario,
> > > > > > > >
> > > > > > > > Thanks for the detailed explanation. Hudi already allows
> extra
> > > > > metadata
> > > > > > > to
> > > > > > > > be written atomically with each commit i.e write operation.
> In
> > > > fact,
> > > > > > that
> > > > > > > > is how we track checkpoints for our delta streamer tool.. It
> > may
> > > > not
> > > > > > > solve
> > > > > > > > the need for querying the data together with this
> information.
> > > but
> > > > > > gives
> > > > > > > > you ability to do some basic tagging.. if thats useful
> > > > > > > >
> > > > > > > > >>If we enable the timeline service metadata model to be
> > extended
> > > > we
> > > > > > > could
> > > > > > > > use the service instance itself to support specialised
> queries
> > > that
> > > > > > > involve
> > > > > > > > business qualifiers in order to return a proper set of
> metadata
> > > > > > pointing
> > > > > > > to
> > > > > > > > the related commits
> > > > > > > >
> > > > > > > > This is a good idea actually.. There is another active
> discuss
> > > > thread
> > > > > > on
> > > > > > > > making the metadata queryable.. there is also
> > > > > > > > https://issues.apache.org/jira/browse/HUDI-309 which we
> paused
> > > for
> > > > > > now..
> > > > > > > > But that's more in line with what you are thinking IIUC
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > vinoth
> > > > > > > >
> > > > > > > > On Mon, Jun 1, 2020 at 4:41 AM Mario de Sá Vera <
> > > > desav...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Balaji,
> > > > > > > > >
> > > > > > > > > > business metadata are all types of info related to the business
> > > > > > > > > > where the Hudi solution is being used... from a COB (ie close of
> > > > > > > > > > business date) related to that commit to any qualifier related
> > > > > > > > > > to that

Re: Wish to contribute

2020-06-09 Thread Bhavani Sudha
Hi Alan,

Please share your jira id.

Thanks,
Sudha

On Tue, Jun 9, 2020 at 3:13 PM Alan Chu  wrote:

> Hi,
>
> I'd love to contribute to Hudi, please add me to the contributor list if
> possible, thanks!
>
>
> Best,
> Alan Chu
>


Re: TLP Announcement

2020-06-04 Thread Bhavani Sudha
Thank you all and congratulations. This is a big milestone!

-Sudha

On Thu, Jun 4, 2020 at 9:21 AM vino yang  wrote:

> Great news!
>
> Thanks for the whole community!
>
> Best,
> Vino
>
> Pratyaksh Sharma wrote on Thu, Jun 4, 2020 at 11:23 PM:
>
> > That is great news.
> >
> > On Thu, Jun 4, 2020 at 7:58 PM Vinoth Chandar  wrote:
> >
> > > Hello all,
> > >
> > > The ASF press release announcing Apache Hudi as TLP is live! Thanks for
> > > all your contributions! We could not have achieved that without such a
> > > great community effort!
> > >
> > > Please help spread the word!
> > >
> > >  - GlobeNewswire
> > > http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
> > >  - ASF "Foundation" blog https://s.apache.org/odtwv
> > >  - @TheASF twitter feed
> > > https://twitter.com/TheASF/status/1268528110959497217
> > >  - The ASF on LinkedIn
> > > https://www.linkedin.com/company/the-apache-software-foundation
> > >
> > > Thanks
> > > Vinoth
> > >
> >
>


Re: [DISCUSS] should we do a 0.5.3 patch set release ?

2020-05-17 Thread Bhavani Sudha
Hello all,

I wanted to send a quick update on 0.5.3 readiness and code freeze.
There were a few more candidates requested for inclusion in 0.5.3. Of those, we
are still waiting on the following 5 PRs to be reviewed and landed.


   - #1633 HUDI-858 <https://jira.apache.org/jira/browse/HUDI-858> Allow
   multiple operations to be executed within a single commit


   - #1634 HUDI-846 <https://jira.apache.org/jira/browse/HUDI-846> Enable
   Incremental cleaning and embedded timeline-server by default


   - #1596 HUDI-863 <https://jira.apache.org/jira/browse/HUDI-863> get
   decimal properties from derived spark DataType


   - #1602 HUDI-494 <https://jira.apache.org/jira/browse/HUDI-494> fix
   incorrect record size estimation


   - #1636 HUDI-895 <https://jira.apache.org/jira/browse/HUDI-895> Remove
   unnecessary listing .hoodie folder when using timeline server


Given these are partially reviewed or about to land, I suggest we
wait for two or three days before code freeze and prioritize these PRs for
0.5.3.

Please follow the updates on https://jira.apache.org/jira/browse/HUDI-890 for
the entire list of PRs that are going in 0.5.3

Thanks,
Sudha




On Wed, May 13, 2020 at 5:00 AM cooper  wrote:

> +1
>
> Bhavani Sudha wrote on Wed, May 13, 2020 at 1:19 PM:
>
> > Thank you all. I created a jira here -
> > https://jira.apache.org/jira/browse/HUDI-890 that tracks the list of
> > patches going into this release. So far I am able to cherry pick these
> > commits and get a successful local run of tests.
> >
> > Let us aim to code freeze by end of this week (May 16th EOD) for more
> > patches that can go into 0.5.3 release.
> >
> > Please respond if you think there are more candidate PRs (criteria: perf
> > improvements/bug fixes) that can be included in 0.5.3.
> >
> > Thanks,
> > Sudha
> >
> > On Thu, May 7, 2020 at 4:16 PM Minjeong Noh 
> > wrote:
> >
> > >
> > >
> >
> https://github.com/apache/incubator-hudi/commit/dbc9acd23a4eb208c7cd458bb3adaf54731d4145
> > > <
> > >
> >
> https://github.com/apache/incubator-hudi/commit/dbc9acd23a4eb208c7cd458bb3adaf54731d4145
> > >
> > >
> > >
> > > On 2020/05/06 20:31:00, Vinoth Chandar  wrote:
> > > > Hi Sudha,>
> > > >
> > > > +1 on the overall idea.. I tried to pick out few of these PRs that
> are>
> > > >
> > > >  - Small enough to apply easily>
> > > >  - Have limited scope, fixing pointed problems>
> > > >  - Have high impact on performance or usability>
> > > >
> > > > [HUDI-799] Use appropriate FS when loading configs>
> > > >
> > >
> >
> https://github.com/apache/incubator-hudi/commit/acb1ada2f756b49d9f9a0aa152f99fcc9e86dde7
> > >
> > >
> > > >
> > > > [HUDI-713] Fix conversion of Spark array of struct type to Avro
> schema>
> > > >
> > >
> >
> https://github.com/apache/incubator-hudi/commit/ce0a4c64d07d6eea926d1bfb92b69ae387b88f50
> > >
> > >
> > > >
> > > > [HUDI-656][Performance] Return a dummy Spark relation after writing>
> > > >
> > >
> >
> https://github.com/apache/incubator-hudi/commit/c40a0d4e91896dece51969f5308016ecb3aa635c
> > >
> > >
> > > >
> > > > [HUDI-850] Avoid unnecessary listings in incremental cleaning mode>
> > > >
> > >
> >
> https://github.com/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97
> > >
> > >
> > > >
> > > > [HUDI-724] Parallelize getSmallFiles for partitions>
> > > >
> > >
> >
> https://github.com/apache/incubator-hudi/commit/1f5b0c77d6c87a936f2d34287ec6a1df1cb18b33
> > >
> > >
> > > >
> > > > [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned>
> > > >
> > >
> >
> https://github.com/apache/incubator-hudi/commit/2d040145810b8b14c59c5882f9115698351039d1
> > >
> > >
> > > >
> > > > Add constructor to HoodieROTablePathFilter>
> > > >
> > >
> >
> https://github.com/apache/incubator-hudi/commit/418f9bb2e91ed6c02077d36e49a47f0c8d08303a
> > >
> > >
> > > >
> > > > [HUDI-539] Make ROPathFilter conf member serializable>
> > > >
> > >
> >
> https://github.com/apache/incubator-hudi/commit/e3019031d8fff60df4fec82eac3fd5c044011635
> > >
> > >
> > > >
> > > >

Re: Apache Hudi Graduation vote on general@incubator

2020-05-16 Thread Bhavani Sudha
Hello all,

Just wanted to clarify that the voting is happening on the
gene...@incubator.apache.org list. To register your vote, please click the link
that Vinoth shared previously (
https://lists.apache.org/thread.html/r8039c8eece636df8c81a24c26965f5c1556a3c6404de02912d6455b4%40%3Cgeneral.incubator.apache.org%3E)
and reply there so it goes to the same voting thread.

Thanks,
Sudha

On Fri, May 15, 2020 at 7:06 PM Vinoth Chandar  wrote:

> Hello all,
>
> Just started the VOTE on the IPMC general list [1]
>
> If you are an IPMC member, you can cast a *binding* vote
> If you are not, you can still cast a *non-binding* vote
>
> Please take a moment to vote.
>
> [1]
>
> https://lists.apache.org/thread.html/r8039c8eece636df8c81a24c26965f5c1556a3c6404de02912d6455b4%40%3Cgeneral.incubator.apache.org%3E
>
> Thanks
> Vinoth
>


Re: [DISCUSS] Logos on project front page.

2020-05-13 Thread Bhavani Sudha
+1

From the doc, it seems like we can probably move them to a secondary page.

Thanks,
Sudha

On Tue, May 12, 2020 at 11:56 PM vino yang  wrote:

> +1 to follow the best practices.
>
> vbal...@apache.org wrote on Wed, May 13, 2020 at 10:31 AM:
>
> >
> > I agree on following the best practices.
> > Balaji.V
> > On Tuesday, May 12, 2020, 06:52:59 PM PDT, Vinoth Chandar <
> > vin...@apache.org> wrote:
> >
> >  Hello all,
> >
> > This was raised during the graduation discussion. We have been referred
> > to [1]. The doc ends by saying: "These best practices for linking to
> > outside pages on project websites are meant as suggestions for projects.
> > PMCs are free to adopt (or not) any of these suggestions for their sites."
> >
> > But I would prefer to play by the best practices if we can..
> >
> > Can you all chime in with your thoughts?
> >
> >
> >
> > [1] https://www.apache.org/foundation/marks/linking
> >
>


Re: [DISCUSS] should we do a 0.5.3 patch set release ?

2020-05-12 Thread Bhavani Sudha
Thank you all. I created a jira here -
https://jira.apache.org/jira/browse/HUDI-890 that tracks the list of
patches going into this release. So far I am able to cherry pick these
commits and get a successful local run of tests.

Let us aim to code freeze by end of this week (May 16th EOD) for more
patches that can go into 0.5.3 release.

Please respond if you think there are more candidate PRs (criteria: perf
improvements/bug fixes) that can be included in 0.5.3.

Thanks,
Sudha

On Thu, May 7, 2020 at 4:16 PM Minjeong Noh  wrote:

>
> https://github.com/apache/incubator-hudi/commit/dbc9acd23a4eb208c7cd458bb3adaf54731d4145
> <
> https://github.com/apache/incubator-hudi/commit/dbc9acd23a4eb208c7cd458bb3adaf54731d4145>
>
>
> On 2020/05/06 20:31:00, Vinoth Chandar  wrote:
> > Hi Sudha,>
> >
> > +1 on the overall idea.. I tried to pick out few of these PRs that are>
> >
> >  - Small enough to apply easily>
> >  - Have limited scope, fixing pointed problems>
> >  - Have high impact on performance or usability>
> >
> > [HUDI-799] Use appropriate FS when loading configs>
> >
> https://github.com/apache/incubator-hudi/commit/acb1ada2f756b49d9f9a0aa152f99fcc9e86dde7>
>
> >
> > [HUDI-713] Fix conversion of Spark array of struct type to Avro schema>
> >
> https://github.com/apache/incubator-hudi/commit/ce0a4c64d07d6eea926d1bfb92b69ae387b88f50>
>
> >
> > [HUDI-656][Performance] Return a dummy Spark relation after writing>
> >
> https://github.com/apache/incubator-hudi/commit/c40a0d4e91896dece51969f5308016ecb3aa635c>
>
> >
> > [HUDI-850] Avoid unnecessary listings in incremental cleaning mode>
> >
> https://github.com/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97>
>
> >
> > [HUDI-724] Parallelize getSmallFiles for partitions>
> >
> https://github.com/apache/incubator-hudi/commit/1f5b0c77d6c87a936f2d34287ec6a1df1cb18b33>
>
> >
> > [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned>
> >
> https://github.com/apache/incubator-hudi/commit/2d040145810b8b14c59c5882f9115698351039d1>
>
> >
> > Add constructor to HoodieROTablePathFilter>
> >
> https://github.com/apache/incubator-hudi/commit/418f9bb2e91ed6c02077d36e49a47f0c8d08303a>
>
> >
> > [HUDI-539] Make ROPathFilter conf member serializable>
> >
> https://github.com/apache/incubator-hudi/commit/e3019031d8fff60df4fec82eac3fd5c044011635>
>
> >
> > Add changes for presto mor queries>
> >
> https://github.com/apache/incubator-hudi/commit/e21441ad8317f302fed947c414e059a332e4d1ef>
>
> >
> > [HUDI-782] Add support of Aliyun object storage service.>
> >
> https://github.com/apache/incubator-hudi/commit/5d717a28f45137bea71dffa31b0ae7ccbf1bda00>
>
> >
> >
> > Please chime in with your thoughts, as well.>
> >
> > I think there are some bug fixes in the pending PRs as well. esp from
> Alex>
> > and Pratyaksh .>
> >
> > Thanks>
> > Vinoth>
> >
> >
> > On Tue, May 5, 2020 at 9:33 PM Bhavani Sudha >
> > wrote:>
> >
> > > Hello all,>
> > >>
> > > I am wondering if we should do a 0.5.3 release by backporting all
> minor to>
> > > medium bug fixes (that are in master already) to 0.5.2 and do a minor>
> > > release ? That way we can use some time to reserve 0.6.0 release for
> all>
> > > major features that are upcoming and/or almost there. Please share
> your>
> > > thoughts. If you agree also please share the list of fixes that you
> know of>
> > > that can go into 0.5.3.>
> > >>
> > > Thanks,>
> > > Sudha>
> > >>
> >


Re: [DISCUSS] should we do a 0.5.3 patch set release ?

2020-05-07 Thread Bhavani Sudha
Thanks for the responses. We will go ahead with 0.5.3 efforts.

I was going to RM the 0.6.0 release. I'll focus on this one instead. If
anyone else wants to drive the 0.6.0 release, please feel free to do so.

Thanks,
Sudha


On Wed, May 6, 2020 at 10:55 PM vbal...@apache.org 
wrote:

>  +1 for releasing 0.5.3.
> Balaji.V
> On Wednesday, May 6, 2020, 10:36:54 PM PDT, Y Ethan Guo <
> ethan.guoyi...@gmail.com> wrote:
>
>  +1
>
> On Wed, May 6, 2020 at 6:29 PM vino yang  wrote:
>
> > +1 for 0.5.3 as well
> >
> > Nishith wrote on Thu, May 7, 2020 at 8:16 AM:
> >
> > > +1 on the idea
> > >
> > > Sent from my iPhone
> > >
> > > > On May 6, 2020, at 3:09 PM, Shiyan Xu 
> > > wrote:
> > > >
> > >
> >


Re: [VOTE] Apache Hudi graduation to top level project

2020-05-06 Thread Bhavani Sudha
+1



On Wed, May 6, 2020 at 1:58 PM Vinoth Chandar  wrote:

> Hello all,
>
> Per our discussion on the dev mailing list (
>
> https://lists.apache.org/thread.html/rc98303d9f09665af90ab517ea0baeb7c374e9a5478d8424311e285cd%40%3Cdev.hudi.apache.org%3E
> )
>
> I would like to call a VOTE for Apache Hudi graduating as a top level
> project.
>
> If this vote passes, the next step would be to submit the resolution below
> to the Incubator PMC, who would vote on sending it on to the Apache Board.
>
> Vote:
> [ ] +1 - Recommend graduation of Apache Hudi as a TLP
> [ ] -1 - Do not recommend graduation of Apache Hudi because...
>
> The VOTE is open for a minimum of 72 hours.
>
> Establish the Apache Hudi Project
>
> WHEREAS, the Board of Directors deems it to be in the best interests of the
> Foundation and consistent with the Foundation's purpose to establish a
> Project Management Committee charged with the creation and maintenance of
> open-source software, for distribution at no charge to the public, related
> to providing atomic upserts and incremental data streams on Big Data.
>
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC),
> to be known as the "Apache Hudi Project", be and hereby is established
> pursuant to Bylaws of the Foundation; and be it further
>
> RESOLVED, that the Apache Hudi Project be and hereby is responsible for the
> creation and maintenance of software related to providing atomic upserts
> and incremental data streams on Big Data; and be it further
>
> RESOLVED, that the office of "Vice President, Apache Hudi" be and hereby is
> created, the person holding such office to serve at the direction of the
> Board of Directors as the chair of the Apache Hudi Project, and to have
> primary responsibility for management of the projects within the scope of
> responsibility of the Apache Hudi Project; and be it further
>
> RESOLVED, that the persons listed immediately below be and hereby are
> appointed
> to serve as the initial members of the Apache Hudi Project:
>
>  * Anbu Cheeralan   
>
>  * Balaji Varadarajan
>
>  * Bhavani Sudha Saktheeswaran   
>
>  * Luciano Resende 
>
>  * Nishith Agarwal
>
>  * Prasanna Rajaperumal
>
>  * Shaofeng Li   
>
>  * Steve Blackmon  
>
>  * Suneel Marthi  
>
>  * Thomas Weise
>
>  * Vino Yang   
>
>  * Vinoth Chandar
>
> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Vinoth Chandar be appointed to
> the office of Vice President, Apache Hudi, to serve in accordance with and
> subject to the direction of the Board of Directors and the Bylaws of the
> Foundation until death, resignation, retirement, removal or
> disqualification, or until a successor is appointed; and
>
> be it further
>
> RESOLVED, that the Apache Hudi Project be and hereby is tasked with the
> migration and rationalization of the Apache Incubator Hudi podling; and
>
> be it further
>
> RESOLVED, that all responsibilities pertaining to the Apache Incubator Hudi
> podling encumbered upon the Apache Incubator PMC are hereafter discharged.
>


[DISCUSS] should we do a 0.5.3 patch set release ?

2020-05-05 Thread Bhavani Sudha
Hello all,

I am wondering if we should do a 0.5.3 release by backporting all minor to
medium bug fixes (that are in master already) to 0.5.2 and do a minor
release ? That way we can use some time to reserve 0.6.0 release for all
major features that are upcoming and/or almost there. Please share your
thoughts. If you agree also please share the list of fixes that you know of
that can go into 0.5.3.

Thanks,
Sudha


Re: Table name is not respected while inserting record with different table name with Append mode

2020-04-30 Thread Bhavani Sudha
Thanks for reporting this Akash. I created the Jira to track this -
https://jira.apache.org/jira/browse/HUDI-852 . Feel free to take a stab if
interested. Let me know so I can re-assign it to you.

Thanks,
Sudha

On Wed, Apr 29, 2020 at 10:31 PM aakash aakash 
wrote:

> Hi,
>
> While running commands from the Hudi quick start guide, I found that the
> library does not check the table name in the request against the table
> name in the metadata available in the base path. I think it should throw
> TableAlreadyExist. In the case of save mode *Overwrite*, it warns.
>
> *spark-2.4.4-bin-hadoop2.7/bin/spark-shell   --packages
>
> org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4
>  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'*
>
> scala> df.write.format("hudi").
>  | options(getQuickstartWriteConfigs).
>  | option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>  | option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>  | option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
> * | option(TABLE_NAME, "test_table").*
>  | mode(*Append*).
>  | save(basePath)
> 20/04/29 17:23:42 WARN DefaultSource: Snapshot view not supported yet via
> data source, for MERGE_ON_READ tables. Please query the Hive table
> registered using Spark SQL.
>
> scala>
>
> No exception is thrown if we run this
>
> scala> df.write.format("hudi").
>  | options(getQuickstartWriteConfigs).
>  | option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>  | option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>  | option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
> * | option(TABLE_NAME, "foo_table").*
>  | mode(*Append*).
>  | save(basePath)
> 20/04/29 17:24:37 WARN DefaultSource: Snapshot view not supported yet via
> data source, for MERGE_ON_READ tables. Please query the Hive table
> registered using Spark SQL.
>
> scala>
>
>
> scala> df.write.format("hudi").
>  |   options(getQuickstartWriteConfigs).
>  |   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>  |   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>  |   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
>  |   option(TABLE_NAME, *tableName*).
>  |   mode(*Overwrite*).
>  |   save(basePath)
> *20/04/29 22:25:16 WARN HoodieSparkSqlWriter$: hoodie table at
> file:/tmp/hudi_trips_cow already exists. Deleting existing data &
> overwriting with new data.*
> 20/04/29 22:25:18 WARN DefaultSource: Snapshot view not supported yet via
> data source, for MERGE_ON_READ tables. Please query the Hive table
> registered using Spark SQL.
>
> scala>
>
>
> Regards,
> Aakash
>
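
A note on the check proposed above: it can be sketched in plain Scala without
touching Hudi at all. Everything in this sketch is hypothetical (the names
`TableNameMismatch` and `validateTableName`, and the assumption that the
existing table name has already been read from the metadata under the base
path); it is not Hudi's API and not the fix tracked in HUDI-852.

```scala
// Hypothetical sketch, not Hudi's API: models the proposed rule that an
// Append write must target the table already initialized at the base path.
final case class TableNameMismatch(requested: String, existing: String)
    extends Exception(
      s"Table '$existing' already exists at this base path; refusing to append as '$requested'")

// existingName is None when no table has been initialized at the base path yet.
def validateTableName(requested: String,
                      existingName: Option[String]): Either[TableNameMismatch, String] =
  existingName match {
    // A table exists under a different name: reject the write.
    case Some(existing) if existing != requested =>
      Left(TableNameMismatch(requested, existing))
    // Same name, or nothing at the base path yet: allow the write.
    case _ =>
      Right(requested)
  }
```

With the two Append writes from the report, the first call (`test_table`
against an existing `test_table`) would pass, and the second (`foo_table`)
would be rejected instead of silently appending.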


Re: [DISCUSS] Bug bash?

2020-04-28 Thread Bhavani Sudha
On Mon, Apr 27, 2020 at 10:21 PM Vinoth Chandar  wrote:

> Great!  I will prep the bugs and do uniform assignment across people in
> this thread :)
>
> Sudha (RM for next release), please co-ordinate the timing of this based on
> the release timeline.
>

>> Sounds good.

>
> if this works well, we can make this a Hudi release tradition :)
>
> On Thu, Apr 23, 2020 at 6:45 PM Mehrotra, Udit 
> wrote:
>
> > +1 Happy to participate
> >
> > On 4/23/20, 6:32 PM, "vino yang"  wrote:
> >
> >
> > +1
> >
> > Shiyan Xu wrote on Fri, Apr 24, 2020 at 9:11 AM:
> >
> > > +1 would like to participate
> > >
> > > On Thu, Apr 23, 2020 at 5:51 PM Dongdong Hong <hongdd2...@gmail.com>
> > > wrote:
> > >
> > > > +1 sounds great!
> > > >
> > > > Sivabalan wrote on Thu, Apr 23, 2020 at 9:30 PM:
> > > >
> > > > > +1
> > > > >
> > > > > On Wed, Apr 22, 2020 at 7:29 PM lamber-ken 
> > wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Wow, challenging job, +1
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Lamber-Ken
> > > > > >
> > > > > > At 2020-04-23 04:51:01, "Vinoth Chandar" 
> > wrote:
> > > > > > >Just floating a very random idea here. :)
> > > > > > >
> > > > > > > >Would there be interest in doing a bug bash for a week, where we
> > > > > > > >aggressively close out some pesky bugs that have been lingering
> > > > > > > >around.. If enough committers and contributors are around, we can
> > > > > > > >move the needle. We could time this a week before cutting RC for
> > > > > > > >next release.
> > > > > > >Thanks
> > > > > > >Vinoth
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > -Sivabalan
> > > > >
> > > >
> > >
> >
> >
> >
>


Re: [DISCUSS] Readiness for graduation to TLP

2020-04-28 Thread Bhavani Sudha
+1 to pursue graduation. I certainly think we are ready. Will chime in on
voting thread when you start it.

Thanks,
Sudha

On Mon, Apr 27, 2020 at 10:06 PM Vinoth Chandar  wrote:

> Hello all,
>
> I would like to start a discussion on our readiness to pursue graduation to
> TLP and potentially follow up with a VOTE with a formal resolution. To seed
> the discussion, our  community's achievements since entering the Incubator
> in early 2018 include the following:
>
> - Accepted > 500 patches from 90 contributors, including 15+ new  design
> proposals
> - Performed 3 releases with 3 different release managers
> - Invited 5 new committers (all of them accepted)
> - Invited 3 of those new committers to join the PMC (all of them accepted)
> - Migrated our web site to ASF infrastructure [1]
> - Migrated developer conversations to the list at dev@hudi.apache.org
> - Migrated all issue tracking to JIRA [2]
> - Apache Hudi name search has been approved [3]
> - We have built a meritocratic, open collaborative process, the Apache way
> - Our PMC is diverse and consists of members from ~10 organizations
>
> Please chime in with your thoughts.
>
> Thanks
> Vinoth
>


Re: [DISCUSS] Bug bash?

2020-04-22 Thread Bhavani Sudha
+1 Sounds like a good idea

On Wed, Apr 22, 2020 at 1:51 PM Vinoth Chandar  wrote:

> Just floating a very random idea here. :)
>
> Would there be interest in doing a bug bash for a week, where we
> aggressively close out some pesky bugs that have been lingering around.. If
> enough committers and contributors are around, we can move the needle. We
> could time this a week before cutting RC for next release.
>
> Thanks
> Vinoth
>


[DISCUSS] Next Release timeline

2020-04-22 Thread Bhavani Sudha
Hello all,

I wanted to kick start the discussion on timeline and logistics for the
next release. Here are a couple of things we need to figure out.

   1. Should the next release be a minor or major release?
   2. If it's a minor release, do we move the master back to 0.5.3
   (currently the master is at 0.6.0-SNAPSHOT)?
   3. Depending on minor or major release what is the timeline we should
   target?

Here is my opinion:
In addition to bug fixes, we have major features (bootstrap, new indexes
and bulk insert mode), either ready or almost ready. Hence, I propose we
go with a major release. Assuming it's a major release, maybe mid-May
would be a good timeline?


I can volunteer to be the release manager if the community is okay with it.
Please share your thoughts.

Thanks,
Sudha


Re: [ATTN] JUnit 5 adoption

2020-04-22 Thread Bhavani Sudha
+1. Thanks for the update Raymond and great work on the migration.

-Sudha

On Tue, Apr 21, 2020 at 10:39 PM Vinoth Chandar  wrote:

> +1 Appreciate the efforts, Raymond!
>
> [Wondering if there is a way to stick a checkstyle rule to this effect.
> guess it won't check for new changes alone, rather complain about existing
> junit 4 tests?]
>
> On Tue, Apr 21, 2020 at 5:10 PM Shiyan Xu 
> wrote:
>
> > Hi all,
> >
> > We're in progress with JUnit 5 migration for all test classes. So far the
> > JUnit 5 dependencies (including Mockito) have been added to all modules.
> > The APIs/modules migration status is shown here
> > https://github.com/apache/incubator-hudi/pull/1530#issue-405575235
> >
> > I would like to kindly ask for support from the community in these 2
> > aspects
> >
> > - To PR submitters: for newly added test classes, please start using
> > JUnit 5 APIs (org.junit.jupiter.*)
> > - To PR reviewers: please help look out for JUnit 5 adoption in the new
> > test classes
> >
> > Really appreciate the coordination efforts on this matter.
> >
> > Thank you.
> >
> > Regards,
> > Raymond
> >
>


Re: [DISCUSS] moving blog from cwiki to website

2020-04-21 Thread Bhavani Sudha Saktheeswaran
+1

On Tue, Apr 21, 2020 at 10:23 PM tison  wrote:

> Hi Vinoth,
>
> +1 for moving blogs.
>
> cwiki seems to belong to the developers' scope, while a user's first
> experience is more likely our website.
>
> Best,
> tison.
>
>
> Vinoth Chandar wrote on Wed, Apr 22, 2020 at 1:09 PM:
>
> > Hi community,
> >
> > What does everyone feel about moving blogs we have on cwiki now over to
> > site so they are better discovered?
> >
> > Thanks
> > Vinoth
> >
>


Re: run example

2020-04-14 Thread Bhavani Sudha
Hi @cooper,

Can you please copy paste the issue or create a github issue? Mailing list
does not work well for images attached.

Thanks,
Sudha



On Tue, Apr 14, 2020 at 4:45 AM cooper  wrote:

> Hi all,
> I tried to run the demo HoodieClientExample. I find that
> "hoodie.keep.min.commits" > "hoodie.cleaner.commits.retained", so the
> program fails. I don't know whether the logic is correct.
> [image: 微信图片_20200414192957.png]
> [image: 微信图片_20200414193042.png]
>


Re: how to understand incremental query?

2020-04-14 Thread Bhavani Sudha
Hi there,

Thanks for trying the quickstart. You might want to reload the data in step
5 before trying the incremental query. Since the tempview is not refreshed
it would still keep serving old data. We have used only spark in the
quickstart to make it easier for users to try Hudi APIs without external
dependencies. However, ideally if the incremental query is against a table
registered with Hive or in S3, the last step should work without need for
reload.

Hope that helps!

Thanks,
Sudha

On Tue, Apr 14, 2020 at 8:59 AM crazymb  wrote:

> Hello, everyone,
>
>  I am a new user of Hudi. After reading quickstart, I started
> experimenting, but I am a little confused about incremental query. Please
> help me. My question: When I insert new data every time, I execute the
> previous incremental query. Why is there no new insert data?
>
>
> My test environment is 0.5.2
>
>
> 1. Create table according to quickstart, insert 10 data, table type:
> COPY_ON_WRITE
> 2. Generate 10 new data and append to the table
> 3. According to quickstart, perform an incremental query and return 10
> results
> 4. Generate 10 new results again and append to the table
> 5. Only execute the SQL of incremental query (do not execute
> spark.read.load and createOrReplaceTempView), but still return 10 results?
> Should 20 results be returned?
>
>
> Thanks a lot!


Re: New PPMC Member : Bhavani Sudha

2020-04-08 Thread Bhavani Sudha
Thank you all :) I am excited to be part of Hudi PPMC.

-Sudha

On Wed, Apr 8, 2020 at 12:48 PM Shiyan Xu 
wrote:

> Congrats Sudha! Well deserved!
>
> On Tue, Apr 7, 2020 at 8:46 PM vino yang  wrote:
>
> > Congrats sudha, well deserved!
> >
> > Best,
> > Vino
> >
> > leesf wrote on Wed, Apr 8, 2020 at 9:31 AM:
> >
> > > Congrats sudha, well deserved!
> > >
> > > Balaji Varadarajan wrote on Wed, Apr 8, 2020 at 6:55 AM:
> > >
> > > >  Congratulations Sudha :) Well deserved.  Welcome to PPMC.
> > > > Balaji.V
> > > >
> > > > On Tuesday, April 7, 2020, 03:04:37 PM PDT, Gary Li <
> > > > yanjia.gary...@gmail.com> wrote:
> > > >
> > > >  Congrats Sudha! Appreciated all the work you have done!
> > > >
> > > > On Tue, Apr 7, 2020 at 2:57 PM Y Ethan Guo  >
> > > > wrote:
> > > >
> > > > > Congrats!!!
> > > > >
> > > > > On Tue, Apr 7, 2020 at 2:55 PM Vinoth Chandar 
> > > wrote:
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > > I am very excited to share that we have a new PPMC member - Sudha.
> > > > > > > She has been a great champion for the project for almost a couple
> > > > > > > of years now, driving a lot of presto/query engine facing changes
> > > > > > > and, most of all, being the face of our community to new users on
> > > > > > > Slack over the past few months.
> > > > > >
> > > > > > Please join me in congratulating her!
> > > > > >
> > > > > > On behalf of Hudi PPMC,
> > > > > > Vinoth
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: New Committer: lamber-ken

2020-04-07 Thread Bhavani Sudha
Congratulations Lamber. Well deserved!

-Sudha

On Tue, Apr 7, 2020 at 2:22 PM Gary Li  wrote:

> Congrats lamber! Well deserved!
>
> On Tue, Apr 7, 2020 at 2:18 PM Vinoth Chandar  wrote:
>
> > Hello Apache Hudi Community,
> >
> > The Podling Project Management Committee (PPMC) for Apache
> > Hudi (Incubating) has invited lamber-ken (Xie Lei) to become a committer
> > and we are pleased to announce that he has accepted.
> >
> > lamber-ken has had a large impact on Hudi, with sustained efforts
> > in the past several months. He has rebuilt our site from the ground up,
> > automated doc workflows, helped fix a lot of bugs and also been super
> > helpful for the community at large.
> >
> > Congratulations lamber-ken !! Please join me in recognizing his efforts!
> >
> > On behalf of PPMC,
> > Vinoth
> >
>


Re: [DISSCUSS] Troubleshooting flow

2020-04-02 Thread Bhavani Sudha
Also one thing I wanted to note. I feel it should be okay to answer simple
`what does this mean` type of questions in slack and move debugging type of
questions to GH issues. What do you all think?

Thanks,
Sudha

On Thu, Apr 2, 2020 at 11:45 AM Bhavani Sudha 
wrote:

> Agree on using GH issues to post code snippets or debugging issues.
>
> Regarding mirroring slack to commits, the last time I checked there were no
> options readily available (there were one or two paid products).
> It looked like we can possibly develop our own IFTTT/web hook on slack. Not
> sure how much work that is.
>
>
> Thanks,
> Sudha
>
>
> On Thu, Apr 2, 2020 at 8:40 AM Vinoth Chandar  wrote:
>
>> Hello all,
>>
>> Actually that's how we have been using GH issues.. Both slack/ml are
>> inconvenient for sharing code and having long threaded conversations
>> (same issues raised here).
>>
>> That said, we could definitely formalize this and look to move slack
>> threads into GH issue for triaging (then follow up with JIRA, if real bug)
>> before they get too long.
>>
>> >>slack has some answerbot to auto reply and promote users to create GH
>> issues.
>> Worth looking into.. There was also a conversation around mirroring
>> #general into commits or something for indexing/searching.. ?
>>
>>
>> On Thu, Apr 2, 2020 at 1:36 AM vino yang  wrote:
>>
>> > Hi Lamber-Ken,
>> >
>> > Thanks for rasing this problem.
>> >
>> > >> 3. threads cann't be indexed by search engines
>> >
>> > Yes, I always thought that it would be better to have a "users" ML, but
>> it
>> > is not clear whether only the Top-Level Project can have this ML.
>> >
>> > Best,
>> > Vino
>> >
>> >
>> > Shiyan Xu wrote on Wed, Apr 1, 2020 at 4:54 AM:
>> >
>> > > Good idea to use GH issues as triage.
>> > >
>> > > Not sure if slack has some answerbot to auto reply and promote users
>> > > to create GH issues. If it can be configured that way, that'd be great
>> > > for this purpose :)
>> > >
>> > > On Tue, 31 Mar 2020, 10:03 lamberken,  wrote:
>> > >
>> > > > Hi team,
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > Many users currently ask for support on Slack when they hit bugs /
>> > > > problems.
>> > > >
>> > > > but there are some disadvantages we need to consider:
>> > > >
>> > > > 1. code snippet display is not friendly.
>> > > >
>> > > > 2. we may miss some questions when questions come up at the same
>> > > > time.
>> > > >
>> > > > 3. threads can't be indexed by search engines
>> > > >
>> > > > ...
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > So, I suggest we should guide users to use GitHub issues as much as
>> > > > we can.
>> > > >
>> > > > step1: guide users to use GitHub issues to report their questions
>> > > >
>> > > > step2: developers can pick up some issues which they are interested
>> > > > in.
>> > > >
>> > > > step3: raise a related JIRA if needed
>> > > >
>> > > > step4: add some useful notes to troubleshooting guide
>> > > >
>> > > >
>> > > >
>> > > > Any thoughts are welcome, thanks : )
>> > > >
>> > > >
>> > > > Best,
>> > > > Lamber-Ken
>> > >
>> >
>>
>


Re: [DISSCUSS] Troubleshooting flow

2020-04-02 Thread Bhavani Sudha
Agree on using GH issues to post code snippets or debugging issues.

Regarding mirroring slack to commits, the last time I checked there were no
options readily available (there were one or two paid products).
It looked like we can possibly develop our own IFTTT/web hook on slack. Not
sure how much work that is.


Thanks,
Sudha


On Thu, Apr 2, 2020 at 8:40 AM Vinoth Chandar  wrote:

> Hello all,
>
> Actually that's how we have been using GH issues.. Both slack/ml are
> inconvenient for sharing code and having long threaded conversations. (same
> issues raised here).
>
> That said, we could definitely formalize this and look to move slack
> threads into GH issue for triaging (then follow up with JIRA, if real bug)
> before they get too long.
>
> >>slack has some answerbot to auto reply and prompt users to create GH
> issues.
> Worth looking into.. There was also a conversation around mirroring
> #general into commits or something for indexing/searching.. ?
>
>
> On Thu, Apr 2, 2020 at 1:36 AM vino yang  wrote:
>
> > Hi Lamber-Ken,
> >
> > Thanks for raising this problem.
> >
> > >> 3. threads can't be indexed by search engines
> >
> > Yes, I always thought that it would be better to have a "users" ML, but
> it
> > is not clear whether only the Top-Level Project can have this ML.
> >
> > Best,
> > Vino
> >
> >
> > On Wed, Apr 1, 2020 at 4:54 AM Shiyan Xu wrote:
> >
> > > Good idea to use GH issues as triage.
> > >
> > > Not sure if Slack has some answerbot to auto-reply and prompt users to
> > > create GH issues. If it can be configured that way, that'd be great for
> > > this purpose :)
> > >
> > > On Tue, 31 Mar 2020, 10:03 lamberken,  wrote:
> > >
> > > > Hi team,
> > > >
> > > >
> > > >
> > > >
> > > > Many users currently ask for support on Slack when they hit bugs or
> > > > problems.
> > > >
> > > > But there are some disadvantages we need to consider:
> > > >
> > > > 1. Code snippets do not display well.
> > > >
> > > > 2. We may miss some questions when several come up at the same time.
> > > >
> > > > 3. Threads can't be indexed by search engines.
> > > >
> > > > ...
> > > >
> > > >
> > > >
> > > >
> > > > So, I suggest we should guide users to use GitHub issues as much as
> we
> > > can.
> > > >
> > > > step1: guide users to use GitHub issues to report their questions
> > > >
> > > > step2: developers can pick up the issues they are interested in
> > > >
> > > > step3: raise a related JIRA if needed
> > > >
> > > > step4: add useful notes to the troubleshooting guide
> > > >
> > > >
> > > >
> > > > Any thoughts are welcome, thanks : )
> > > >
> > > >
> > > > Best,
> > > > Lamber-Ken
> > >
> >
>


Re: [NOTIFICATION] Auto generation asf-site feedback

2020-03-25 Thread Bhavani Sudha
This is really cool. Thanks for doing this!

-Sudha

On Wed, Mar 25, 2020 at 5:19 AM lamberken  wrote:

>
>
>
> Thanks  : )
>
>
>
>
> At 2020-03-25 09:50:59, "vino yang"  wrote:
> >Great job!
> >
> >Thanks to lamber-ken for driving and getting this done!
> >
> >Best,
> >Vino
> >
> >On Wed, Mar 25, 2020 at 8:34 AM Vinoth Chandar wrote:
> >
> >> Currently, the new site is published to a "test-content" folder.  Our
> plan
> >> is to try this for 1 week and then actually cut over to "content" which
> is
> >> what powers the site.
> >>
> >> Kudos to lamber-ken for the perseverance in getting this done!
> >>
> >> On Tue, Mar 24, 2020 at 5:19 PM lamberken  wrote:
> >>
> >> > Hi team,
> >> >
> >> >
> >> >
> >> >
> >> > Now that HUDI-504 [1] has landed, Travis builds the asf-site branch and
> >> > updates the site automatically.
> >> >
> >> > Developers can focus on adding, editing, and removing *.md files and no
> >> > longer need to learn how to build the site.
> >> >
> >> >
> >> >
> >> >
> >> > Feel free to report any issues you see, thanks very much.
> >> >
> >> >
> >> >
> >> >
> >> > [1] https://github.com/apache/incubator-hudi/pull/1412
> >>
>


Re: [VOTE] Release 0.5.2-incubating, release candidate #2

2020-03-21 Thread Bhavani Sudha
+1 (non-binding)

Did same tests as previous RC:
- Verified checksums and signatures [OK]
- Verified NOTICE, DISCLAIMER, LICENSE files exist [OK]
- Downloaded and compiled successfully [OK]
- Verified quickstart [OK]
- Ran some tests in IDE [OK]
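
For anyone repeating the checksum step by hand, here is a minimal sketch. It
assumes the checksum file holds the digest as its first whitespace-separated
field (the layout that sha512sum and shasum -a 512 emit). verify_sha512 is a
hypothetical helper name, not part of the Hudi release scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Hudi release scripts): succeed only if
# the artifact's SHA-512 digest matches the digest stored in the checksum file.
verify_sha512() {
  local artifact="$1" checksum_file="$2"
  local expected actual
  # Assumption: the digest is the first whitespace-separated field.
  expected="$(awk '{print $1; exit}' "$checksum_file")"
  # sha512sum ships with GNU coreutils; on macOS use `shasum -a 512` instead.
  actual="$(sha512sum "$artifact" | awk '{print $1}')"
  # An empty expected digest (empty or missing checksum file) counts as failure.
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Called as verify_sha512 <artifact> <checksum-file>, it returns 0 on a match,
so it can be chained with && echo "[OK]".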

Thanks,
Sudha

On Sat, Mar 21, 2020 at 7:43 PM Prasanna Rajaperumal 
wrote:

> +1 binding
>
> (Hope we get past license/notice issues this time)
>
> On 2020/03/21 16:47:57, "vbal...@apache.org"  wrote:
> >  +1 (binding)
> >
> > Ran following checks:
> > 1. Checked out RC candidate source code and compiled successfully
> > 2. Ran Apache Hudi quickstart steps successfully on 0.5.2-incubating-rc2
> > 3. Ran release validation script successfully.
> > (base) varadarb-C02SH0P1G8WL:scripts varadarb$
> ./release/validate_staged_release.sh --release=0.5.2 --rc_num=2
> >
> > Checking Checksum of Source Release
> >
> >
> > Checksum Check of Source Release - [OK]
> >
> >
> >
> > Checking Signature
> >
> > Signature Check - [OK]
> >
> > Checking for binary files in source release
> >
> >
> > No Binary Files in Source Release? - [OK]
> >
> > Checking for DISCLAIMER
> >
> > DISCLAIMER file exists ? [OK]
> >
> > Checking for LICENSE and NOTICE
> >
> > License file exists ? [OK]
> >
> > Notice file exists ? [OK]
> >
> > Performing custom Licensing Check
> >
> > Licensing Check Passed [OK]
> >
> > Running RAT Check
> > RAT Check Passed [OK]
> > Balaji.V
> >
> > On Saturday, March 21, 2020, 08:23:44 AM PDT, Vinoth Chandar <
> vin...@apache.org> wrote:
> >
> >  +1 binding
> >
> > Repeated tests from RC1
> >
> > On Sat, Mar 21, 2020 at 5:44 AM vino yang  wrote:
> >
> > > +1 binding
> > >
> > > - checked signature & checksum
> > > - maven clean package -DskipTests
> > > - ran `release/validate_staged_release.sh`
> > > - check RAT (OK)
> > >
> > > Best,
> > > Vino
> > >
> > > On Sat, Mar 21, 2020 at 8:33 PM Suneel Marthi wrote:
> > >
> > > > +1 binding
> > > >
> > > > - checked NOTICE and LICENSE
> > > > - verified checksum and signature
> > > > - mvn clean install
> > > >
> > > >
> > > > On Sat, Mar 21, 2020 at 7:01 AM leesf  wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > - verified checksum and signature [OK]
> > > > > - mvn clean install -DskipTests [OK]
> > > > > - checked the modules in
> > > > >
> > > > >
> > > >
> > >
> https://repository.apache.org/content/repositories/orgapachehudi-1019/org/apache/hudi/
> > > > >  [OK]
> > > > >
> > > > > Best,
> > > > > Leesf
> > > > >
> > > > > On Sat, Mar 21, 2020 at 5:20 PM vino yang wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > >
> > > > > > We have prepared the third apache release candidate for Apache
> Hudi
> > > > > > (incubating). The version is: 0.5.2-incubating-rc2. Please
> review and
> > > > > vote
> > > > > > on the release candidate #2 for the version 0.5.2, as follows:
> > > > > >
> > > > > > [ ] +1, Approve the release
> > > > > >
> > > > > > [ ] -1, Do not approve the release (please provide specific
> comments)
> > > > > >
> > > > > > The complete staging area is available for your review, which
> > > includes:
> > > > > >
> > > > > > * JIRA release notes [1],
> > > > > > * the official Apache source release and binary convenience
> releases
> > > to
> > > > > be
> > > > > > deployed to dist.apache.org [2], which are signed with the key
> with
> > > > > > fingerprint C3A96EC77149571AE89F82764C86684D047DE03C [3],
> > > > > >
> > > > > > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > > > > > * source code tag "release-0.5.2-incubating-rc2" [5],
> > > > > >
> > > > > > The vote will be open for at least 72 hours. It is adopted by
> > > majority
> > > > > > approval, with at least 3 PMC affirmative votes.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Vino
> > > > > >
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12346606
> > > > > >
> > > > > > [2]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://dist.apache.org/repos/dist/dev/incubator/hudi/hudi-0.5.2-incubating-rc2/
> > > > > >
> > > > > > [3]
> https://dist.apache.org/repos/dist/release/incubator/hudi/KEYS
> > > > > >
> > > > > > [4]
> > > > >
> https://repository.apache.org/content/repositories/orgapachehudi-1019
> > > > > >
> > > > > > [5]
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/incubator-hudi/tree/release-0.5.2-incubating-rc2
> > > > > >
> > > > >
> > > >
> > >
>


Re: [VOTE] Release 0.5.2-incubating, release candidate #1

2020-03-13 Thread Bhavani Sudha
+1 (non-binding)

- Verified checksums and signatures [OK]
- Verified NOTICE, DISCLAIMER, LICENSE files exist [OK]
- Downloaded and compiled successfully [OK]
- Verified quickstart [OK]
- Ran some tests in IDE [OK]
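
The file-existence and no-binaries checks in the list above can be sketched
roughly as follows. This is an illustration under stated assumptions, not the
project's validate_staged_release.sh: check_source_tree is a hypothetical
helper name, and the file list assumes an incubating release that ships a
DISCLAIMER at the top level of the unpacked source tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check an unpacked source release for the required
# legal files and for stray binary jars. Returns non-zero on any problem.
check_source_tree() {
  local dir="$1" status=0 name
  for name in NOTICE LICENSE DISCLAIMER; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $name"
      status=1
    fi
  done
  # Quote the pattern so the shell does not expand it before find sees it.
  if [ -n "$(find "$dir" -type f -name '*.jar' | head -n 1)" ]; then
    echo "binary jar found in source release"
    status=1
  fi
  return "$status"
}
```

Called with the path of the unpacked source directory, it prints one line per
problem and stays silent when everything is in place.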

Thanks,
Sudha

On Fri, Mar 13, 2020 at 12:09 PM vbal...@apache.org 
wrote:

>
> +1 (binding)
>
> 1. Checked out RC candidate source code and compiled successfully
> 2. Ran Apache Hudi quickstart steps successfully on 0.5.2-rc1
> 3. Ran release validation script successfully.
> (base) varadarb-C02SH0P1G8WL:scripts varadarb$
> ./release/validate_staged_release.sh --release=0.5.2 --rc_num=1
>
> Checking Checksum of Source Release
>
>
>   Checksum Check of Source Release - [OK]
>
>
>
> Checking Signature
>
>   Signature Check - [OK]
>
> Checking for binary files in source release
>
>
>   No Binary Files in Source Release? - [OK]
>
> Checking for DISCLAIMER
>
>   DISCLAIMER file exists ? [OK]
>
> Checking for LICENSE and NOTICE
>
>   License file exists ? [OK]
>
>   Notice file exists ? [OK]
>
> Performing custom Licensing Check
>
>   Licensing Check Passed [OK]
>
> Running RAT Check
> RAT Check Passed [OK]
> Balaji.V
>
> On Thursday, March 12, 2020, 10:50:39 PM PDT, Vinoth Chandar <
> vin...@apache.org> wrote:
>
>  +1 binding
>
> 10:05:53 [hudi-0.5.2]$ shasum -a 512 hudi-${RC_VERSION}-${RC_NUM}.src.tgz >
> sha512
> 10:06:01 [hudi-0.5.2]$ diff sha512
> hudi-${RC_VERSION}-${RC_NUM}.src.tgz.sha512.txt | wc -l
>   0
> 10:06:14 [hudi-0.5.2]
>
>
> 10:17:11 [hudi-0.5.2]$ gpg --verify
> hudi-${RC_VERSION}-${RC_NUM}.src.tgz.asc.txt
> hudi-${RC_VERSION}-${RC_NUM}.src.tgz
> gpg: Signature made Thu Mar 12 01:24:37 2020 PDT
> gpg:                using RSA key C3A96EC77149571AE89F82764C86684D047DE03C
> gpg: Good signature from "vinoyang (apache gpg) "
> [unknown]
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:  There is no indication that the signature belongs to the
> owner.
> Primary key fingerprint: C3A9 6EC7 7149 571A E89F  8276 4C86 684D 047D E03C
> 10:21:43 [hudi-0.5.2]$
>
> 10:22:04 [hudi-0.5.2]$ tar -zxvf hudi-${RC_VERSION}-${RC_NUM}.src.tgz
> 10:22:22 [hudi-0.5.2]$ # Notice, DISCLAIMER-WIP, LICENSE
> 10:22:24 [hudi-0.5.2]$ ls hudi-${RC_VERSION}-${RC_NUM}/NOTICE
> hudi-0.5.2-incubating-rc1/NOTICE
> 10:22:31 [hudi-0.5.2]$ ls hudi-${RC_VERSION}-${RC_NUM}/DISC*
> hudi-0.5.2-incubating-rc1/DISCLAIMER
> 10:22:36 [hudi-0.5.2]$ ls hudi-${RC_VERSION}-${RC_NUM}/LICENSE
> hudi-0.5.2-incubating-rc1/LICENSE
>
> 10:23:00 [hudi-0.5.2]$ find hudi-${RC_VERSION}-${RC_NUM}/ -name *.jar | wc
> -l
>   0
> 10:23:09 [hudi-0.5.2]$
>
> 10:23:09 [hudi-0.5.2]$ grep -LR "Licensed to the Apache Software
> Foundation" hudi-${RC_VERSION}-${RC_NUM}
> hudi-0.5.2-incubating-rc1/docker/demo/data/batch_2.json
> hudi-0.5.2-incubating-rc1/docker/demo/data/batch_1.json
> hudi-0.5.2-incubating-rc1/DISCLAIMER
> hudi-0.5.2-incubating-rc1/NOTICE
> hudi-0.5.2-incubating-rc1/hudi-common/src/test/resources/sample.data
>
> hudi-0.5.2-incubating-rc1/hudi-common/src/main/java/org/apache/hudi/common/util/ObjectSizeCalculator.java
>
> hudi-0.5.2-incubating-rc1/hudi-utilities/src/test/resources/IncrementalPull.sqltemplate
> 10:23:30 [hudi-0.5.2]$ grep -e "AvroConversionHelper" -e
> "ObjectSizeCalculator"  hudi-${RC_VERSION}-${RC_NUM}/LICENSE
> This product includes code from
>
> https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/objectsize/ObjectSizeCalculator.java
> with the following license
> * org.apache.hudi.AvroConversionHelper copied from classes in
> org/apache/spark/sql/avro package
> 10:23:48 [hudi-0.5.2]$
>
> 10:24:37 [scripts]$ ./release/validate_staged_release.sh --release=0.5.2
> --rc_num=1
> /tmp/validation_scratch_dir_001
> ~/Cache/hudi-0.5.2/hudi-0.5.2-incubating-rc1/scripts
> Checking Checksum of Source Release
> Checksum Check of Source Release - [OK]
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 21027  100 21027    0     0  50297      0 --:--:-- --:--:-- --:--:-- 50303
> Checking Signature
> Signature Check - [OK]
>
> Checking for binary files in source release
> No Binary Files in Source Release? - [OK]
>
> Checking for DISCLAIMER
> DISCLAIMER file exists ? [OK]
>
> Checking for LICENSE and NOTICE
> License file exists ? [OK]
> Notice file exists ? [OK]
>
> Performing custom Licensing Check
> Licensing Check Passed [OK]
>
> Running RAT Check
> RAT Check Passed [OK]
>
> 10:26:15 [scripts]$
>
>
> On Thu, Mar 12, 2020 at 7:15 PM Suneel Marthi  wrote:
>
> > +1 binding
> >
> > 1. Verified Sigs and hashes
> > 2. Downloaded tar and ran a maven compile
> > 3. Verified the NOTICE and License files.
> > 4. Ran thru the Quickstart guide.
> >
> >
> >
> > On Thu, Mar 12, 2020 at 9:01 PM vino yang  wrote:
> >
> > > Hi everyone,
> > >
> > >
> > > We have prepared the third apache release candidate for Apache Hudi
> > > (incubating). The version is: 

Re: Test coverage is now integrated to codecov.io

2020-03-01 Thread Bhavani Sudha
This is super useful. Thanks Ramachandran!

-Sudha

On Sat, Feb 29, 2020 at 7:42 PM leesf  wrote:

> Great job, thanks for your work.
>
> On Sat, Feb 29, 2020 at 12:02 PM Sivabalan wrote:
>
> > Good job! thanks for adding.
> >
> > On Fri, Feb 28, 2020 at 5:41 PM vino yang  wrote:
> >
> > >  Hi Ram,
> > >
> > > Thanks for your great work to make the code coverage clear.
> > >
> > > Best,
> > > Vino
> > >
> > > On Sat, Feb 29, 2020 at 4:39 AM Vinoth Chandar wrote:
> > >
> > > > Thanks Ram! This will definitely help improve the code quality over
> > time!
> > > >
> > > > On Fri, Feb 28, 2020 at 9:45 AM Ramachandran Madras Subramaniam
> > > >  wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Diff 1347  was
> > > > merged
> > > > > into master yesterday. This enables visibility into code coverage
> of
> > > hudi
> > > > > in general and also provides insights into differential coverage
> > during
> > > > > peer reviews.
> > > > >
> > > > > Since this is very recent and is getting integrated, you might see
> > some
> > > > > partial results in your diff. There can be 2 scenarios here,
> > > > >
> > > > > 1. Your diff is not rebased with latest master and hence the code
> > > > coverage
> > > > > report was not generated. To solve this issue, you just have to
> > rebase
> > > to
> > > > > latest master.
> > > > > > 2. Code coverage ran but reported as zero. There was one diff
> (#1350)
> > > > where
> > > > > we saw this issue yesterday. This in general shouldn't happen.
> Could
> > > have
> > > > > been due to an outage in codecov website. I will be monitoring
> > upcoming
> > > > > diffs for the near future to see if this problem persists. Please
> > ping
> > > me
> > > > > in the diff if you have any questions/concerns regarding code
> > coverage.
> > > > >
> > > > > Thanks,
> > > > > Ram
> > > > >
> > > >
> > >
> >
> >
> > --
> > Regards,
> > -Sivabalan
> >
>


Re: Apache Hudi on AWS EMR

2020-02-19 Thread Bhavani Sudha
Hi Udit,

Just a quick question on Presto EMR. Does EMR Presto support Hudi jars in
its classpath ?
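
For reference, the two cleaner settings from Gary's workaround quoted further
down can be captured in a properties file. A minimal sketch, assuming the
writer job accepts such a file; write_cleaner_props is a hypothetical helper
name, and how the file is handed to the job depends on the writer in use.

```shell
#!/usr/bin/env bash
# Sketch only: write the two cleaner settings from the quoted workaround into
# the properties file given as the first argument (helper name is hypothetical).
write_cleaner_props() {
  local props_file="$1"
  cat > "$props_file" <<'EOF'
hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS
hoodie.cleaner.fileversions.retained=1
EOF
}
```

With these two settings only the latest version of each file is retained,
which is what lets the quoted workaround treat the table as plain parquet.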

On Tue, Feb 18, 2020 at 12:03 PM Mehrotra, Udit 
wrote:

> Workaround provided by Gary can help querying Hudi tables through Athena
> for Copy On Write tables by basically querying only the latest commit files
> as standard parquet. It would definitely be worth documenting, as several
> people have asked for it and I remember providing the same suggestion on
> slack earlier. I can add if I have the perms.
>
> >> if I connect to the Hive catalog on EMR, which is able to provide the
> Hudi views correctly, I should be able to get correct results on Athena
>
> As Vinoth mentioned, just connecting to metastore is not enough. Athena
> would still use its own Presto which does not support Hudi.
>
> As for Hudi support for Athena:
> Athena does use Presto, but it's their own custom version and I don't
> think they yet have the code that Hudi guys contributed to presto i.e. the
> split annotations etc. Also they don’t have Hudi jars in presto classpath.
> We are not sure of any timelines for this support, but I have heard that
> work should start soon.
>
> Thanks,
> Udit
>
> On 2/18/20, 11:27 AM, "Vinoth Chandar"  wrote:
>
> Thanks everyone for chiming in. Esp Gary for the detailed workaround..
> (should we FAQ this workaround.. food for thought)
>
> >> if I connect to the Hive catalog on EMR, which is able to provide
> the
> Hudi views correctly, I should be able to get correct results on Athena
>
> Knowing how the Presto/Hudi integration works, simply being able to
> read
> from Hive metastore is not enough. Presto has code to specially
> recognize
> Hudi tables and does an additional filtering step, which lets it query
> the
> data in there correctly. (Gary's workaround above keeps just 1 version
> around for a given file (group))..
>
> On Mon, Feb 17, 2020 at 11:28 PM Gary Li 
> wrote:
>
> > Hello, I don't have any experience working with Athena but I can
> share my
> > experience working with Impala. There is a workaround.
> > By setting Hudi config:
> >
> >- hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS
> >- hoodie.cleaner.fileversions.retained=1
> >
> > You will have your Hudi dataset as same as plain parquet files. You
> can
> > create a table just like regular parquet. Hudi will write a new
> commit
> > first then delete the older files that have two versions. You need to
> > refresh the table metadata store as soon as the Hudi Upsert job
> finishes.
> > For impala, it's simply REFRESH TABLE xxx. After Hudi vacuumed the
> older
> > files and before refresh the table metastore, the table will be
> unavailable
> > for query(1-5 mins in my case).
> >
> > How can we process S3 parquet files(hourly partitioned) through
> Apache
> > Hudi? Is there any streaming layer we need to introduce?
> > ---
> > Hudi Delta streamer support parquet file. You can do a bulkInsert
> for the
> > first job then use delta streamer for the Upsert job.
> >
> > 3 - What should be the parquet file size and row group size for
> better
> > performance on querying Hudi Dataset?
> > --
> > That depends on the query engine you are using and it should be
> documented
> > somewhere. For impala, the optimal size for query performance is
> 256MB, but
> > the larger file size will make upsert more expensive. The size I
> personally
> > choose is 100MB to 128MB.
> >
> > Thanks,
> > Gary
> >
> >
> >
> > On Mon, Feb 17, 2020 at 9:46 PM Dubey, Raghu
> 
> > wrote:
> >
> > > Athena is indeed Presto inside, but there is lot of custom code
> which has
> > > gone on top of Presto there.
> > > Couple months back I tried running a glue crawler to catalog a
> Hudi data
> > > set and then query it from Athena. The results were not same as
> what I
> > > would get with running the same query using spark SQL on EMR. Did
> not try
> > > Presto on EMR, but assuming it will work fine on EMR.
> > >
> > > Athena integration with Hudi data set is planned shortly, but not
> sure of
> > > the date yet.
> > >
> > > However, recently Athena started supporting integration to a Hive
> catalog
> > > apart from Glue. What that means is in Athena, if I connect to the
> Hive
> > > catalog on EMR, which is able to provide the Hudi views correctly,
> I
> > should
> > > be able to get correct results on Athena. Have not tested it
> though. The
> > > feature is in Preview already.
> > >
> > > Thanks
> > > Raghu
> > > -Original Message-
> > > From: Shiyan Xu 
> > > Sent: Tuesday, February 18, 2020 6:20 AM
> > > To: dev@hudi.apache.org
> > > Cc: Mehrotra, Udit ; Raghvendra Dhar Dubey
> > > 
> > > Subject: Re: Apache Hudi on AWS EMR
> > >
> > > For 

Re: Please welcome our new PPMCs and Committer

2020-02-14 Thread Bhavani Sudha
Hearty congratulations to all of you: @leesf, @vinoyang and @Sivabalan.
Very well deserved.

Thanks,
Sudha

On Fri, Feb 14, 2020 at 11:58 AM Vinoth Chandar  wrote:

> Hello all,
>
> I am incredibly excited to share that we have two new PPMC members :
> *leesf*
> and *vinoyang*, who have been doing such sustained, great work on the
> project over a good part of the last year! The rest of the PPMC and I do hope
> there are bigger and better things to come!
>
> We also have a new committer : *Sivabalan*, who has stepped up to own the
> indexing component in the past few months, and has already delivered
> several key contributions and currently driving some foundational work on
> record level indexing.
>
> Please join me in congratulating them!
>
> Thanks
> Vinoth
>

