Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-27 Thread Danny Chan
Hi Community,

Glad to see that all the blockers are resolved and we can cut an RC now!

If you have any other blockers that you would like to
surface for Hudi 0.10.0, feel free to reach out.

Thanks,
Danny

Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-26 Thread Manoj Govindassamy
Hi Danny,

All the planned tickets have landed in master and we are good for cutting
0.10 RC. Please let us know if you see any CI issues with the latest master
and we can jump in to do the needful. Thanks for your patience.

thanks,
Manoj

Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-26 Thread Manoj Govindassamy
Hi Danny,

We have one last PR, https://github.com/apache/hudi/pull/4114, to land in
master. We are seeing one flaky test with this last pending PR, although the
same test passes consistently in the local setup. We are waiting for the CI
to finish before merging to master. After this PR we are good to cut the
0.10 RC. Will keep you posted on the status.

thanks,
Manoj

Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-20 Thread Raymond Xu
Hi Danny, I'm good with the timeline.

Cheers,
Raymond

Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread sagar sumit
Hi Danny,

I've added one more blocker: HUDI-2742

I am also good with the timelines.

Regards,
Sagar

Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread Sivabalan
Hi Danny,
I am good with the timelines. All my JIRAs should be completed by then.


Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread Y Ethan Guo
Hi Danny,

Thanks for summarizing the current progress towards the 0.10.0 release.
I'm good with Nov 26th cutoff.

Regarding my blockers:
- [HUDI-2332] Implement scheduling of compaction/ clustering for Kafka
   Connect (Owner: Ethan Guo)
PR is up.  I'm addressing comments.

- [HUDI-2737] Use earliest instant by default for compaction and
   clustering job (Owner: Ethan Guo)
PR is up and approved.  It's near-landing after fixing CI failures.

- [HUDI-2745] Record count does not match input after compaction is
   scheduled when running Hudi Kafka Connect sink (Owner: Ethan Guo)
HUDI-2745 is going to be blocked on HUDI-2480, which is going to resolve
this issue once done.

- [HUDI-2735] Fix archival of commits in Java client for Kafka Connect
   (Owner: Ethan Guo)
This is pending and requires investigation into the archival logic which is
not Kafka-connect specific.

Best,
- Ethan


On Fri, Nov 19, 2021 at 4:41 PM Rajesh Mahindra  wrote:

> Hi Danny,
>
> I have the following blockers that have a PR up. I am working on a PR for
> the Debezium Source. I am fine with Nov 26th as cut off.
>
> - [HUDI-2325] Implement and test Hive Sync support for Kafka Connect
>   (Owner: Rajesh Mahindra)
> - [HUDI-2671] Fix record offset handling in Kafka connect transaction
>   participant (Owner: Rajesh Mahindra)
> - [HUDI-2672] Avoid empty commits and rollbacks when there is no event
>   from the topic (Owner: Rajesh Mahindra)
>
> ** Pending
> - [HUDI-1290] Implement Debezium avro source for Delta Streamer
>
> Thanks
> Rajesh
>
>
> On Fri, Nov 19, 2021 at 4:01 PM Udit Mehrotra  wrote:
>
> > Hi Danny,
> >
> > I have a blocker as well
> > https://issues.apache.org/jira/browse/HUDI-2802. Nov 26th cut off date
> > works fine for me.
> >
> > Also, just an update on the above list: HUDI-2641, HUDI-2314,
> > HUDI-2362 are unblocked/merged. HUDI-2314 and HUDI-2362 can be marked
> > in the highlights section. We will work on getting some doc updates
> > for the same by next week.
> >
> > Thanks,
> > Udit
> >
> > On Fri, Nov 19, 2021 at 3:49 PM Vinoth Chandar wrote:
> > >
> > > Hi Danny,
> > >
> > > I have one blocker. I plan to complete it by end of next week. I am good
> > > with the prior Nov 26 cutoff.
> > > Does that work for everyone?
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Fri, Nov 19, 2021 at 12:12 AM Danny Chan wrote:
> > >
> > > > Hi Community,
> > > >
> > > > As we draw close to doing Hudi 0.10.0 release, I am happy to share a
> > > > summary of the key features/improvements that would be going in the release
> > > > and the current blockers for everyone's visibility.
> > > >
> > > > *Highlights*
> > > >
> > > >- [HUDI-1290] Implement Debezium avro source for Delta Streamer
> > > >- [HUDI-1491] Support partition pruning for MOR snapshot query
> > > >- [HUDI-1763] DefaultHoodieRecordPayload does not honor ordering
> > value
> > > >when records within multiple log files are merged
> > > >- [HUDI-1827] Add ORC support in Bootstrap Op
> > > >- [HUDI-1869] Upgrading Spark3 To 3.1
> > > >- [HUDI-2101] support z-order for hudi
> > > >- [HUDI-2276] Enable Metadata Table by default for both writers
> and
> > > >readers
> > > >- [HUDI-2581] Analyze metadata size estimate in hudi with Hfile
> for
> > col
> > > >stats partition
> > > >- [HUDI-2634] Improve bootstrap performance for very large tables
> > > >- [HUDI-2086] redo the logical of mor_incremental_view for hive
> > > >- [HUDI-2191] Bump flink version to 1.13.1
> > > >- [HUDI-2285] Metadata Table Synchronous Design
> > > >- [HUDI-2316] Support Flink batch upsert
> > > >- [HUDI-2371] Improve flink streaming reader
> > > >- [HUDI-2394] [Kafka Connect Mileston 1] Implement kafka connect
> for
> > > >immutable data
> > > >- [HUDI-2449] Incremental read for Flink
> > > >- [HUDI-2562] Embedded timeline server on JobManager
> > > >
> > > > *Current Blockers*
> > > >
> > > >- [HUDI-1856] Upstream changes made in PrestoDB to eliminate file
> > > >listing to Trino (Owner: Sagar Sumit)
> > > >- [HUDI-1912] Presto defaults to GenericHiveRecordCursor for all
> > Hudi
> > > >tables (Owner: Sagar Sumit)
> > > >- [HUDI-1932] Hive Sync should not always update
> > last_commit_time_sync
> > > >(Owner: Raymond Xu)
> > > >- [HUDI-1937] When clustering fail, generating unfinished
> > replacecommit
> > > >timeline. (Owner: Sagar Sumit)
> > > >- [HUDI-2077] Flaky test: TestHoodieDeltaStreamer (Owner: Sagar
> > Sumit)
> > > >- [HUDI-2314] Add DynamoDb based lock provider (Owner: Wenning
> Ding)
> > > >- [HUDI-2325] Implement and test Hive Sync support for Kafka
> Connect
> > > >(Owner: Rajesh Mahindra)
> > > >- [HUDI-2332] Implement scheduling of compaction/ clustering for
> > Kafka
> > > >Connect (Owner: Ethan Guo)
> > > >- [HUDI-2362] Hudi external configuration file support (Owner:
> > Wenning
> > > >Ding)
> > > >-

Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread Rajesh Mahindra
Hi Danny,

I have the following blockers, each with a PR up. I am also working on a PR
for the Debezium Source. I am fine with Nov 26th as the cutoff.

   - [HUDI-2325] Implement and test Hive Sync support for Kafka Connect
   (Owner: Rajesh Mahindra)
   - [HUDI-2671] Fix record offset handling in Kafka connect transaction
   participant (Owner: Rajesh Mahindra)
   - [HUDI-2672] Avoid empty commits and rollbacks when there is no event
   from the topic (Owner: Rajesh Mahindra)

** Pending
   - [HUDI-1290] Implement Debezium avro source for Delta Streamer

Thanks
Rajesh


Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread Udit Mehrotra
Hi Danny,

I have a blocker as well:
https://issues.apache.org/jira/browse/HUDI-2802. The Nov 26th cutoff date
works fine for me.

Also, just an update on the above list: HUDI-2641, HUDI-2314,
HUDI-2362 are unblocked/merged. HUDI-2314 and HUDI-2362 can be marked
in the highlights section. We will work on getting some doc updates
for the same by next week.

Thanks,
Udit


Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread Manoj Govindassamy
Hi Danny,

I am good with the Nov 26th cutoff as well. I am working on the in-progress
items below and have one other item pending. For everything else on the
list, PRs are out or have landed. Thanks for compiling the list.

*InProgress:*
 - [HUDI-2763] Avoid persisting redundant key field in the Metadata table
   record payload (Owner: Manoj Govindassamy)
 - [HUDI-2475] Rolling Upgrade downgrade story for 0.10 & enabling
   metadata (Owner: Manoj Govindassamy)

*Pending:*
 - [HUDI-2590] Validate Diff key gen w/ and w/o glob path with and w/o
   metadata enabled

*Completed:*
 - [HUDI-2716] Fix InLineFS path conversions for S3FS paths (Owner: Manoj
   Govindassamy)
 - [HUDI-2593] Virtual keys support for metadata table (Owner: Manoj
   Govindassamy)
 - [HUDI-2472] Tests failure follow up when metadata is enabled by
   default (Owner: Manoj Govindassamy)
 - [HUDI-2666] async compaction failing with timeline mismatches between
   server and client when metadata is enabled (Owner: Manoj Govindassamy)
 - [HUDI-2764] Address test failures after enabling virtual keys support
   for the metadata table (Owner: Manoj Govindassamy)


Re: [DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread Vinoth Chandar
Hi Danny,

I have one blocker. I plan to complete it by end of next week. I am good
with the prior Nov 26 cutoff.
Does that work for everyone?

Thanks
Vinoth


[DISCUSS] Hudi 0.10.0 Release

2021-11-19 Thread Danny Chan
Hi Community,

As we draw close to the Hudi 0.10.0 release, I am happy to share a summary
of the key features/improvements going into the release, along with the
current blockers, for everyone's visibility.

*Highlights*

   - [HUDI-1290] Implement Debezium avro source for Delta Streamer
   - [HUDI-1491] Support partition pruning for MOR snapshot query
   - [HUDI-1763] DefaultHoodieRecordPayload does not honor ordering value
   when records within multiple log files are merged
   - [HUDI-1827] Add ORC support in Bootstrap Op
   - [HUDI-1869] Upgrading Spark3 To 3.1
   - [HUDI-2101] support z-order for hudi
   - [HUDI-2276] Enable Metadata Table by default for both writers and
   readers
   - [HUDI-2581] Analyze metadata size estimate in hudi with Hfile for col
   stats partition
   - [HUDI-2634] Improve bootstrap performance for very large tables
   - [HUDI-2086] redo the logic of mor_incremental_view for hive
   - [HUDI-2191] Bump flink version to 1.13.1
   - [HUDI-2285] Metadata Table Synchronous Design
   - [HUDI-2316] Support Flink batch upsert
   - [HUDI-2371] Improve flink streaming reader
   - [HUDI-2394] [Kafka Connect Milestone 1] Implement kafka connect for
   immutable data
   - [HUDI-2449] Incremental read for Flink
   - [HUDI-2562] Embedded timeline server on JobManager

*Current Blockers*

   - [HUDI-1856] Upstream changes made in PrestoDB to eliminate file
   listing to Trino (Owner: Sagar Sumit)
   - [HUDI-1912] Presto defaults to GenericHiveRecordCursor for all Hudi
   tables (Owner: Sagar Sumit)
   - [HUDI-1932] Hive Sync should not always update last_commit_time_sync
   (Owner: Raymond Xu)
   - [HUDI-1937] When clustering fail, generating unfinished replacecommit
   timeline. (Owner: Sagar Sumit)
   - [HUDI-2077] Flaky test: TestHoodieDeltaStreamer (Owner: Sagar Sumit)
   - [HUDI-2314] Add DynamoDb based lock provider (Owner: Wenning Ding)
   - [HUDI-2325] Implement and test Hive Sync support for Kafka Connect
   (Owner: Rajesh Mahindra)
   - [HUDI-2332] Implement scheduling of compaction/ clustering for Kafka
   Connect (Owner: Ethan Guo)
   - [HUDI-2362] Hudi external configuration file support (Owner: Wenning
   Ding)
   - [HUDI-2409] Using HBase shaded jars in Hudi presto bundle (Owner:
   Sagar Sumit)
   - [HUDI-2443] KVComparator in HFile for metadata table is tied to HBase
   version and shading (Owner: Sagar Sumit)
   - [HUDI-2472] Tests failure follow up when metadata is enabled by
   default (Owner: Manoj Govindassamy)
   - [HUDI-2475] Rolling Upgrade downgrade story for 0.10 & enabling
   metadata (Owner: Manoj Govindassamy)
   - [HUDI-2478] Handle failure mid-way during init buckets (Owner: Vinoth
   Chandar)
   - [HUDI-2480] FileSlice after pending compaction-requested instant-time
   is ignored by MOR snapshot reader (Owner: Danny Chen)
   - [HUDI-2488] Support bootstrapping a single or more partitions in
   metadata table while regular writers and table services are in progress
   (Owner: Vinoth Chandar)
   - [HUDI-2527] Flaky test:
   TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
   (Owner: sivabalan narayanan)
   - [HUDI-2559] Ensure unique timestamps are generated for commit times
   with concurrent writers (Owner: sivabalan narayanan)
   - [HUDI-2593] Virtual keys support for metadata table (Owner: Manoj
   Govindassamy)
   - [HUDI-2599] [Performance] Lower parallelism with snapshot query on COW
   tables in Presto (Owner: Sagar Sumit)
   - [HUDI-2628] Fix Chinese Docs (Owner: Kyle Weller)
   - [HUDI-2636] Make release notes discoverable (Owner: Kyle Weller)
   - [HUDI-2637] Triage all bugs around Multi-writer and certify the tested
   flows (Owner: sivabalan narayanan)
   - [HUDI-2641] One inflight commit rolling back other concurrent inflight
   commits causing them to fail (Owner: Udit Mehrotra)
   - [HUDI-2649] Kick off all the Hive query issues for 0.10.0 (Owner:
   Sagar Sumit)
   - [HUDI-2666] async compaction failing with timeline mismatches between
   server and client when metadata is enabled (Owner: Manoj Govindassamy)
   - [HUDI-2667] Avoid fs.exists() and fs.mkdirs() call to partitions in
   AbstractTablefileSystemView (Owner: Sagar Sumit)
   - [HUDI-2671] Fix record offset handling in Kafka connect transaction
   participant (Owner: Rajesh Mahindra)
   - [HUDI-2672] Avoid empty commits and rollbacks when there is no event
   from the topic (Owner: Rajesh Mahindra)
   - [HUDI-2716] Fix InLineFS path conversions for S3FS paths (Owner: Manoj
   Govindassamy)
   - [HUDI-2725] Add precommit validators doc (Owner: Kyle Weller)
   - [HUDI-2731] Clustering should work regardless of whether there are
   base files (Owner: Sagar Sumit)
   - [HUDI-2734] Disable metadata by default for flink and java (Owner:
   sivabalan narayanan)
   - [HUDI-2735] Fix archival of commits in Java client for Kafka Connect
   (Owner: Ethan Guo)
   - [HUDI-2737] Use earliest instant by default for compaction and
   clustering job (Owner: Ethan Guo)