date:20211105

[DISCUSS] RFC-27 for Data skipping/column stats index rewrite -> github RFC

2021-11-05 Thread Sivabalan

Hey folks,
We have already put up RFC-27 for the data skipping/column stats index.
We have done more analysis on this and are looking to add and fix details
in the RFC. Since we have moved to GitHub PRs for the RFC process, I will
rewrite the RFC using the new process and fix the design and implementation
details along the way. The high-level intent and purpose remain the same;
only some specifics of the design and implementation will be added or fixed.

-- 
Regards,
-Sivabalan

Re: Limitations of non unique keys

2021-11-05 Thread Sivabalan

Got you. Thanks for the clarification.

On Fri, Nov 5, 2021 at 3:53 PM Vinoth Chandar 
wrote:

> Hi Siva,
>
> I think this is more about bloom filters and record level index, which is
> different from RFC-27.
>
> RFC-08 talks about record level indexing. Bloom filter indexes have a
> discuss thread just kicked off.
>
> Main thing we are trying to solidify in 0.10.0 is foundational
> metadata table and concurrency mechanisms to be able to add an index in the
> background say.
>
> Thanks
> Vinoth
>
> On Fri, Nov 5, 2021 at 8:47 AM Sivabalan  wrote:
>
> > Thanks for bringing this up. We have a RFC-27 on data skipping
> > <
> >
> https://cwiki.apache.org/confluence/display/HUDI/RFC-27+Data+skipping+index+to+improve+query+performance
> > >
> > which is the secondary indexing being discussed here. We are flushing out
> > few more details on this end and will put up patches once we figure out
> > the unknowns. We have a WIP patch here
> > , but needs some refactoring
> and
> > updates before we its ready for review.
> > And we are also thinking of moving the existing bloom filters (from data
> > files) into metadata table and re-use them instead of reading from all
> data
> > files with the expectation to boost performance for index lookup. We will
> > start a discussion thread around this and go from there.
> >
> >
> >
> > On Wed, Nov 3, 2021 at 5:36 PM Nicolas Paris 
> > wrote:
> >
> > >
> > > > In another words, we are generalizing this so hudi feels more like
> > > > MySQL and not HBase/Cassandra (key value store). Thats the direction
> > > > we are approaching.
> > >
> > > wow this is amazing. I haven't found yet RFC about this, nor ready to
> > > test PR.
> > >
> > > This answer my initial question: with the secondary indexes options
> > > comming, the hudi key shall be a primary key (if exists). There is no
> > > reason to choose anything else.
> > >
> > > On Wed Nov 3, 2021 at 9:03 PM CET, Vinoth Chandar wrote:
> > > > Hi.
> > > >
> > > > With the indexing approach we are taking, you should be able to add
> > > > secondary indexes on any column. not just the key.
> > > > In another words, we are generalizing this so hudi feels more like
> > MySQL
> > > > and not HBase/Cassandra (key value store). Thats the direction we are
> > > > approaching.
> > > >
> > > > love to hear more feedback.
> > > >
> > > > On Tue, Nov 2, 2021 at 2:29 AM Nicolas Paris <
> nicolas.pa...@riseup.net
> > >
> > > > wrote:
> > > >
> > > > > for example does the move of blooms into hfiles (0.10.0 feature)
> > makes
> > > > > unique bloom keys mandatory ?
> > > > >
> > > > >
> > > > >
> > > > > On Thu Oct 28, 2021 at 7:00 PM CEST, Nicolas Paris wrote:
> > > > > >
> > > > > > > Are you asking if there are advantages to allowing duplicates
> or
> > > not
> > > > > having keys in your table?
> > > > > > it's all about allowing duplicates
> > > > > >
> > > > > > use case is say an Order table and choosing key = customer_id
> > > > > > then being able to do indexed delete without need of prescanning
> > the
> > > > > > dataset
> > > > > >
> > > > > > I wonder if there will be trouble I am unaware of with such trick
> > > > > >
> > > > > > On Thu Oct 28, 2021 at 2:33 PM CEST, Vinoth Chandar wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Are you asking if there are advantages to allowing duplicates
> or
> > > not
> > > > > > > having
> > > > > > > keys in your table?
> > > > > > >
> > > > > > > Having keys, helps with othe practical scenarios, in addition
> to
> > > what
> > > > > > > you
> > > > > > > called out.
> > > > > > > e.g: Oftentimes, you would want to backfill an insert-only
> table
> > > and
> > > > > you
> > > > > > > don't want to introduce duplicates when doing so.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Vinoth
> > > > > > >
> > > > > > > On Tue, Oct 26, 2021 at 1:37 AM Nicolas Paris <
> > > > > nicolas.pa...@riseup.net>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi devs,
> > > > > > > >
> > > > > > > > AFAIK, hudi has been designed to have primary keys in the
> > hudi's
> > > key.
> > > > > > > > However it is possible to also choose a non unique field. I
> > have
> > > > > listed
> > > > > > > > several trouble with such design:
> > > > > > > >
> > > > > > > > Non unique key yield to :
> > > > > > > > - cannot delete / update a unique record
> > > > > > > > - cannot apply primary key for new sql tables feature
> > > > > > > >
> > > > > > > > Is there other downsides to choose a non unique key you have
> in
> > > mind
> > > > > ?
> > > > > > > >
> > > > > > > > In my case, having user_id as a hudi key will help to apply
> > > deletion
> > > > > on
> > > > > > > > the user level in any user table. The table are insert only,
> so
> > > the
> > > > > > > > drawbacks listed above do not really apply. In case of error
> in
> > > the
> > > > > > > > tables I have several options:
> > > > > > > >
> > > > > > > > - rollback to a previous commit
> > > > > > > > - read

Re: [DISCUSS] Trino Plugin for Hudi

2021-11-05 Thread Vinoth Chandar

Could we please kick off an RFC for this?

On Thu, Nov 4, 2021 at 8:58 PM sagar sumit  wrote:

> I have created an umbrella JIRA to track this story:
> https://issues.apache.org/jira/browse/HUDI-2687
> Please also join #trino-hudi-connector channel in Hudi Slack for more
> discussion.
>
> Regards,
> Sagar
>
> On Thu, Oct 21, 2021 at 5:38 PM sagar sumit 
> wrote:
>
> > This patch supports snapshot queries on MOR table:
> > https://github.com/trinodb/trino/pull/9641
> > That works with the existing hive connector.
> >
> > Right now, I have only prototyped snapshot queries on COW table with the
> > new hudi connector in https://github.com/codope/trino/tree/hudi-plugin
> > I will be working on supporting the MOR table as well.
> >
> > Regards,
> > Sagar
> >
> > On Wed, Oct 20, 2021 at 4:48 PM Jian Feng  wrote:
> >
> >> When can Trino support snapshot queries on the Merge-on-read table?
> >>
> >> On Mon, Oct 18, 2021 at 9:06 PM 周康  wrote:
> >>
> >> > +1 i have send a message on trino slack, really appreciate for the new
> >> > trino plugin/connector.
> >> > https://trinodb.slack.com/archives/CP1MUNEUX/p1623838591370200
> >> >
> >> > looking forward to the RFC and more discussion
> >> >
> >> > On 2021/10/17 06:06:09 sagar sumit wrote:
> >> > > Dear Hudi Community,
> >> > >
> >> > > I would like to propose the development of a new Trino
> >> plugin/connector
> >> > for
> >> > > Hudi.
> >> > >
> >> > > Today, Hudi supports snapshot queries on Copy-On-Write (COW) tables
> >> and
> >> > > read-optimized queries on Merge-On-Read tables with Trino, through
> the
> >> > > input format based integration in the Hive connector [1
> >> > > ].
> >> This
> >> > > approach has known performance limitations with very large tables,
> >> which
> >> > > has been since fixed on PrestoDB [2
> >> > > ]. We are
> >> > working on
> >> > > replicating the same fixes on Trino as well [3
> >> > > ].
> >> > >
> >> > > However, as Hudi keeps getting better, a new plugin to provide
> access
> >> to
> >> > > Hudi data and metadata will help in unlocking those capabilities for
> >> the
> >> > > Trino users. Just to name a few benefits, metadata-based listing,
> full
> >> > > schema evolution, etc [4
> >> > > <
> >> >
> >>
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution
> >> > >].
> >> > > Moreover, a separate Hudi connector would allow its independent
> >> evolution
> >> > > without having to worry about hacking/breaking the Hive connector.
> >> > >
> >> > > A separate connector also falls in line with our vision [5
> >> > > <
> >> >
> >>
> https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform#timeline-metaserver
> >> > >]
> >> > > when we think of a standalone timeline server or a lake cache to
> >> balance
> >> > > the tradeoff between writing and querying. Imagine users having read
> >> and
> >> > > write access to data and metadata in Hudi directly through Trino.
> >> > >
> >> > > I did some prototyping to get the snapshot queries on a Hudi COW
> table
> >> > > working with a new plugin [6
> >> > > ], and I feel the
> >> > effort
> >> > > is worth it. High-level approach is to implement the connector SPI
> [7
> >> > > ] provided
> by
> >> > Trino
> >> > > such as:
> >> > > a) HudiMetadata implements ConnectorMetadata to fetch table
> metadata.
> >> > > b) HudiSplit and HudiSplitManager implement ConnectorSplit and
> >> > > ConnectorSplitManager to produce logical units of data partitioning,
> >> so
> >> > > that Trino can parallelize reads and writes.
> >> > >
> >> > > Let me know your thoughts on the proposal. I can draft an RFC for
> the
> >> > > detailed design discussion once we have consensus.
> >> > >
> >> > > Regards,
> >> > > Sagar
> >> > >
> >> > > References:
> >> > > [1] https://github.com/prestodb/presto/commits?author=vinothchandar
> >> > > [2] https://prestodb.io/blog/2020/08/04/prestodb-and-hudi
> >> > > [3] https://github.com/trinodb/trino/pull/9641
> >> > > [4]
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution
> >> > > [5]
> >> > >
> >> >
> >>
> https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform#timeline-metaserver
> >> > > [6] https://github.com/codope/trino/tree/hudi-plugin
> >> > > [7] https://trino.io/docs/current/develop/connectors.html
> >> > >
> >> >
> >>
> >>
> >> --
> >> *Jian Feng,冯健*
> >> Shopee | Engineer | Data Infrastructure
> >>
> >
>

Re: Limitations of non unique keys

2021-11-05 Thread Vinoth Chandar

Hi Siva,

I think this is more about bloom filters and record level index, which is
different from RFC-27.

RFC-08 talks about record level indexing. A discussion thread on bloom
filter indexes has just been kicked off.

The main thing we are trying to solidify in 0.10.0 is the foundational
metadata table and the concurrency mechanisms needed to, say, add an index
in the background.

Thanks
Vinoth

On Fri, Nov 5, 2021 at 8:47 AM Sivabalan  wrote:

> Thanks for bringing this up. We have a RFC-27 on data skipping
> <
> https://cwiki.apache.org/confluence/display/HUDI/RFC-27+Data+skipping+index+to+improve+query+performance
> >
> which is the secondary indexing being discussed here. We are flushing out
> few more details on this end and will put up patches once we figure out
> the unknowns. We have a WIP patch here
> , but needs some refactoring and
> updates before we its ready for review.
> And we are also thinking of moving the existing bloom filters (from data
> files) into metadata table and re-use them instead of reading from all data
> files with the expectation to boost performance for index lookup. We will
> start a discussion thread around this and go from there.
>
>
>
> On Wed, Nov 3, 2021 at 5:36 PM Nicolas Paris 
> wrote:
>
> >
> > > In another words, we are generalizing this so hudi feels more like
> > > MySQL and not HBase/Cassandra (key value store). Thats the direction
> > > we are approaching.
> >
> > wow this is amazing. I haven't found yet RFC about this, nor ready to
> > test PR.
> >
> > This answer my initial question: with the secondary indexes options
> > comming, the hudi key shall be a primary key (if exists). There is no
> > reason to choose anything else.
> >
> > On Wed Nov 3, 2021 at 9:03 PM CET, Vinoth Chandar wrote:
> > > Hi.
> > >
> > > With the indexing approach we are taking, you should be able to add
> > > secondary indexes on any column. not just the key.
> > > In another words, we are generalizing this so hudi feels more like
> MySQL
> > > and not HBase/Cassandra (key value store). Thats the direction we are
> > > approaching.
> > >
> > > love to hear more feedback.
> > >
> > > > On Tue, Nov 2, 2021 at 2:29 AM Nicolas Paris
> > > wrote:
> > >
> > > > for example does the move of blooms into hfiles (0.10.0 feature)
> makes
> > > > unique bloom keys mandatory ?
> > > >
> > > >
> > > >
> > > > On Thu Oct 28, 2021 at 7:00 PM CEST, Nicolas Paris wrote:
> > > > >
> > > > > > Are you asking if there are advantages to allowing duplicates or
> > not
> > > > having keys in your table?
> > > > > it's all about allowing duplicates
> > > > >
> > > > > use case is say an Order table and choosing key = customer_id
> > > > > then being able to do indexed delete without need of prescanning
> the
> > > > > dataset
> > > > >
> > > > > I wonder if there will be trouble I am unaware of with such trick
> > > > >
> > > > > On Thu Oct 28, 2021 at 2:33 PM CEST, Vinoth Chandar wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Are you asking if there are advantages to allowing duplicates or
> > not
> > > > > > having
> > > > > > keys in your table?
> > > > > >
> > > > > > Having keys, helps with othe practical scenarios, in addition to
> > what
> > > > > > you
> > > > > > called out.
> > > > > > e.g: Oftentimes, you would want to backfill an insert-only table
> > and
> > > > you
> > > > > > don't want to introduce duplicates when doing so.
> > > > > >
> > > > > > Thanks
> > > > > > Vinoth
> > > > > >
> > > > > > On Tue, Oct 26, 2021 at 1:37 AM Nicolas Paris <
> > > > nicolas.pa...@riseup.net>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi devs,
> > > > > > >
> > > > > > > AFAIK, hudi has been designed to have primary keys in the
> hudi's
> > key.
> > > > > > > However it is possible to also choose a non unique field. I
> have
> > > > listed
> > > > > > > several trouble with such design:
> > > > > > >
> > > > > > > Non unique key yield to :
> > > > > > > - cannot delete / update a unique record
> > > > > > > - cannot apply primary key for new sql tables feature
> > > > > > >
> > > > > > > Is there other downsides to choose a non unique key you have in
> > mind
> > > > ?
> > > > > > >
> > > > > > > In my case, having user_id as a hudi key will help to apply
> > deletion
> > > > on
> > > > > > > the user level in any user table. The table are insert only, so
> > the
> > > > > > > drawbacks listed above do not really apply. In case of error in
> > the
> > > > > > > tables I have several options:
> > > > > > >
> > > > > > > - rollback to a previous commit
> > > > > > > - read partition/filter overwrite partition
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > >
> > > >
> >
> >
>
> --
> Regards,
> -Sivabalan
>

Re: [DISCUSS] Metadata based bloom index

2021-11-05 Thread Vinoth Chandar

+1 on this. I think cloud storage throttling is more of an issue, causing
degradation when tables are enormous, but this approach should handle that
nicely as well.

On Fri, Nov 5, 2021 at 9:31 AM Manoj Govindassamy <
manoj.govindass...@gmail.com> wrote:

> Hi Hudi Community,
>
> Hudi has several indices to help lookup records. The most commonly used one
> is the BloomFilter based index. This index today works by loading the bloom
> filter from all the data files of interested partitions. This is a time
> consuming operation. Better would be if can leverage the metadata table
> infrastructure of the Hudi tables. That is, if all the bloom filters can be
> loaded directly from a single metadata table partition, it would greatly
> speed up the entire record key lookup process.
>
> Let me know your thoughts on this high level idea. Planning to start a RFC
> on this and I can share more details on the design and implementation.
>
> Regards,
> Manoj
>

[DISCUSS] Metadata based bloom index

2021-11-05 Thread Manoj Govindassamy

Hi Hudi Community,

Hudi has several indexes to help look up records. The most commonly used one
is the bloom filter based index. Today this index works by loading the bloom
filter from every data file in the partitions of interest, which is a
time-consuming operation. It would be better to leverage the metadata table
infrastructure of Hudi tables: if all the bloom filters could be loaded
directly from a single metadata table partition, the entire record key
lookup process would be much faster.
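
To make the idea concrete, here is a minimal sketch in Python. It is not
Hudi's implementation: the BloomFilter class, build_filter, candidate_files
and the dict standing in for a metadata table partition are all hypothetical
names chosen for illustration. It only shows the lookup shape, where one
in-memory map of per-file filters is consulted instead of opening every data
file.

```python
import hashlib
from typing import Dict, Iterable, List


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key: str) -> Iterable[int]:
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_filter(keys: Iterable[str]) -> BloomFilter:
    """Build the filter that would be stored for one data file."""
    bloom = BloomFilter()
    for key in keys:
        bloom.add(key)
    return bloom


def candidate_files(metadata_partition: Dict[str, BloomFilter],
                    record_key: str) -> List[str]:
    """Return the files that may hold record_key.

    metadata_partition is a hypothetical stand-in for a single metadata
    table partition that maps each data file name to its bloom filter.
    """
    return sorted(name for name, bloom in metadata_partition.items()
                  if bloom.might_contain(record_key))
```

Files returned by candidate_files still have to be checked for the actual
key, since a bloom filter can report false positives; the saving in this
sketch is that no data file footer is read just to obtain its filter.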

Let me know your thoughts on this high-level idea. I am planning to start an
RFC on this and can share more details on the design and implementation.

Regards,
Manoj

[DISCUSS] RFC for Synchronous Metadata table for File listing

2021-11-05 Thread Sivabalan

RFC-15 made an attempt to boost the performance of file listing by storing
all file information in the metadata table. As we are looking to build more
infrastructure around the metadata table (RFC-27 for data skipping, etc.),
we felt that a synchronous design would make it tighter and avoid some of
the corner cases of the async approach.

So, we will write up a new RFC for file listing based on the metadata table
with synchronous updates.
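
As a rough illustration of what "synchronous" means here, the Python sketch
below updates a file listing in the same commit step that changes the data
files, so readers can list a partition from the metadata alone. This is an
assumption-laden toy, not the RFC's design: the Table class, its storage set
and its metadata dict are hypothetical names.

```python
from typing import Dict, List, Set


class Table:
    """Toy table whose file listing is served from a metadata map."""

    def __init__(self) -> None:
        # Stands in for the files physically present on storage.
        self.storage: Set[str] = set()
        # Stands in for the metadata table: partition -> file names.
        self.metadata: Dict[str, Set[str]] = {}

    def commit(self, partition: str, added: List[str],
               removed: List[str]) -> None:
        """Apply a write and update the listing in the same commit."""
        # Stage both new states first, then publish them together.
        staged_listing = set(self.metadata.get(partition, set()))
        staged_listing.update(added)
        staged_listing.difference_update(removed)

        added_paths = {f"{partition}/{name}" for name in added}
        removed_paths = {f"{partition}/{name}" for name in removed}
        staged_storage = (self.storage | added_paths) - removed_paths

        self.storage = staged_storage
        self.metadata[partition] = staged_listing

    def list_files(self, partition: str) -> List[str]:
        """List a partition from metadata, without scanning storage."""
        return sorted(self.metadata.get(partition, set()))
```

Because the listing changes only inside commit, it cannot lag behind the data
files in this sketch, which is the kind of corner case an asynchronous
updater would have to handle separately.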

-- 
Regards,
-Sivabalan

Re: Limitations of non unique keys

2021-11-05 Thread Sivabalan

Thanks for bringing this up. We have RFC-27 on data skipping, which is the
secondary indexing being discussed here. We are fleshing out a few more
details on this end and will put up patches once we figure out the unknowns.
We have a WIP patch, but it needs some refactoring and updates before it is
ready for review.
We are also thinking of moving the existing bloom filters (from data files)
into the metadata table and reusing them there instead of reading them from
all data files, with the expectation of boosting index lookup performance.
We will start a discussion thread around this and go from there.



On Wed, Nov 3, 2021 at 5:36 PM Nicolas Paris 
wrote:

>
> > In another words, we are generalizing this so hudi feels more like
> > MySQL and not HBase/Cassandra (key value store). Thats the direction
> > we are approaching.
>
> wow this is amazing. I haven't found yet RFC about this, nor ready to
> test PR.
>
> This answer my initial question: with the secondary indexes options
> comming, the hudi key shall be a primary key (if exists). There is no
> reason to choose anything else.
>
> On Wed Nov 3, 2021 at 9:03 PM CET, Vinoth Chandar wrote:
> > Hi.
> >
> > With the indexing approach we are taking, you should be able to add
> > secondary indexes on any column. not just the key.
> > In another words, we are generalizing this so hudi feels more like MySQL
> > and not HBase/Cassandra (key value store). Thats the direction we are
> > approaching.
> >
> > love to hear more feedback.
> >
> > On Tue, Nov 2, 2021 at 2:29 AM Nicolas Paris 
> > wrote:
> >
> > > for example does the move of blooms into hfiles (0.10.0 feature) makes
> > > unique bloom keys mandatory ?
> > >
> > >
> > >
> > > On Thu Oct 28, 2021 at 7:00 PM CEST, Nicolas Paris wrote:
> > > >
> > > > > Are you asking if there are advantages to allowing duplicates or
> not
> > > having keys in your table?
> > > > it's all about allowing duplicates
> > > >
> > > > use case is say an Order table and choosing key = customer_id
> > > > then being able to do indexed delete without need of prescanning the
> > > > dataset
> > > >
> > > > I wonder if there will be trouble I am unaware of with such trick
> > > >
> > > > On Thu Oct 28, 2021 at 2:33 PM CEST, Vinoth Chandar wrote:
> > > > > Hi,
> > > > >
> > > > > Are you asking if there are advantages to allowing duplicates or
> not
> > > > > having
> > > > > keys in your table?
> > > > >
> > > > > Having keys, helps with othe practical scenarios, in addition to
> what
> > > > > you
> > > > > called out.
> > > > > e.g: Oftentimes, you would want to backfill an insert-only table
> and
> > > you
> > > > > don't want to introduce duplicates when doing so.
> > > > >
> > > > > Thanks
> > > > > Vinoth
> > > > >
> > > > > On Tue, Oct 26, 2021 at 1:37 AM Nicolas Paris <
> > > nicolas.pa...@riseup.net>
> > > > > wrote:
> > > > >
> > > > > > Hi devs,
> > > > > >
> > > > > > AFAIK, hudi has been designed to have primary keys in the hudi's
> key.
> > > > > > However it is possible to also choose a non unique field. I have
> > > listed
> > > > > > several trouble with such design:
> > > > > >
> > > > > > Non unique key yield to :
> > > > > > - cannot delete / update a unique record
> > > > > > - cannot apply primary key for new sql tables feature
> > > > > >
> > > > > > Is there other downsides to choose a non unique key you have in
> mind
> > > ?
> > > > > >
> > > > > > In my case, having user_id as a hudi key will help to apply
> deletion
> > > on
> > > > > > the user level in any user table. The table are insert only, so
> the
> > > > > > drawbacks listed above do not really apply. In case of error in
> the
> > > > > > tables I have several options:
> > > > > >
> > > > > > - rollback to a previous commit
> > > > > > - read partition/filter overwrite partition
> > > > > >
> > > > > > Thanks
> > > > > >
> > >
> > >
>
>

-- 
Regards,
-Sivabalan

Re: Release 0.10.0 planning

2021-11-05 Thread Vinoth Chandar

Let's lock it in, unless someone objects by Monday PST.

On Fri, Nov 5, 2021 at 12:59 AM Gary Li  wrote:

> Nov 26 looks good to me.
>
> Gary
>
> On Wed, Nov 3, 2021 at 10:34 PM Sivabalan  wrote:
>
> > sounds fine by me. Can others (especially PMC and committers) chime in
> here
> > for the proposed date.
> >
> >
> >
> > On Wed, Nov 3, 2021 at 4:11 AM Vinoth Chandar  wrote:
> >
> > > Folks, may be good to push it by a week. Nov 26 can be the RC cut date.
> > >
> > > On Mon, Nov 1, 2021 at 7:41 PM Vinoth Chandar 
> wrote:
> > >
> > > >
> > > > Great! Is everyone good with the nov 19 date? Love to atleast do this
> > > > before nov 26, before holidays kick in!
> > > >
> > > >
> > > > On Mon, Nov 1, 2021 at 7:36 PM Danny Chan 
> > wrote:
> > > >
> > > >> I can take that.
> > > >>
> > > >> Best,
> > > >> Danny
> > > >>
> > > >> On Sat, Oct 30, 2021 at 6:07 AM Vinoth Chandar wrote:
> > > >>
> > > >> > Hi all,
> > > >> >
> > > >> > I propose we cut the RC for 0.10.0 by Nov 19.
> > > >> >
> > > >> > Any volunteers for release manager?
> > > >> >
> > > >> > Thanks
> > > >> > Vinoth
> > > >> >
> > > >> > On Sun, Oct 17, 2021 at 10:45 AM Sivabalan 
> > > wrote:
> > > >> >
> > > >> > > This release has a lot of exciting features lined up. Eagerly
> > > looking
> > > >> > > forward to it.
> > > >> > >
> > > >> > > On Thu, Oct 14, 2021 at 1:17 PM Vinoth Chandar <
> vin...@apache.org
> > >
> > > >> > wrote:
> > > >> > >
> > > >> > > > Hi all,
> > > >> > > >
> > > >> > > > It's time for our next release again!
> > > >> > > >
> > > >> > > > I have marked out some blockers here on JIRA.
> > > >> > > >
> > > >> > > >
> https://issues.apache.org/jira/projects/HUDI/versions/12350285
> > > >> > > >
> > > >> > > >
> > > >> > > > Quick highlights:
> > > >> > > > - Metadata table v2, which is synchronously updated
> > > >> > > > - Row writing (Spark) for all write operations
> > > >> > > > - Kafka Connect for append only data model
> > > >> > > > - New indexing schemes moving bloom filters and file range
> > footers
> > > >> into
> > > >> > > > metadata table to improve upsert/delete performance.
> > > >> > > > - Fixes needed for Trino/Presto support.
> > > >> > > > - Most of the "big-needle-mover" PRs that are up already.
> > > >> > > > - Revamp of docs to match our vision.
> > > >> > > >
> > > >> > > > May need some help understanding all the Flink related
> changes.
> > > >> > > >
> > > >> > > > Kindly review and let's use this thread to ratify and discuss
> > > >> > timelines.
> > > >> > > >
> > > >> > > >
> > > >> > > > Thanks
> > > >> > > > Vinoth
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Regards,
> > > >> > > -Sivabalan
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
> >
> > --
> > Regards,
> > -Sivabalan
> >
>

Re: Release 0.10.0 planning

2021-11-05 Thread Gary Li

Nov 26 looks good to me.

Gary

On Wed, Nov 3, 2021 at 10:34 PM Sivabalan  wrote:

> sounds fine by me. Can others (especially PMC and committers) chime in here
> for the proposed date.
>
>
>
> On Wed, Nov 3, 2021 at 4:11 AM Vinoth Chandar  wrote:
>
> > Folks, may be good to push it by a week. Nov 26 can be the RC cut date.
> >
> > On Mon, Nov 1, 2021 at 7:41 PM Vinoth Chandar  wrote:
> >
> > >
> > > Great! Is everyone good with the nov 19 date? Love to atleast do this
> > > before nov 26, before holidays kick in!
> > >
> > >
> > > On Mon, Nov 1, 2021 at 7:36 PM Danny Chan 
> wrote:
> > >
> > >> I can take that.
> > >>
> > >> Best,
> > >> Danny
> > >>
> > >> On Sat, Oct 30, 2021 at 6:07 AM Vinoth Chandar wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > I propose we cut the RC for 0.10.0 by Nov 19.
> > >> >
> > >> > Any volunteers for release manager?
> > >> >
> > >> > Thanks
> > >> > Vinoth
> > >> >
> > >> > On Sun, Oct 17, 2021 at 10:45 AM Sivabalan 
> > wrote:
> > >> >
> > >> > > This release has a lot of exciting features lined up. Eagerly
> > looking
> > >> > > forward to it.
> > >> > >
> > >> > > On Thu, Oct 14, 2021 at 1:17 PM Vinoth Chandar
> > >> > wrote:
> > >> > >
> > >> > > > Hi all,
> > >> > > >
> > >> > > > It's time for our next release again!
> > >> > > >
> > >> > > > I have marked out some blockers here on JIRA.
> > >> > > >
> > >> > > > https://issues.apache.org/jira/projects/HUDI/versions/12350285
> > >> > > >
> > >> > > >
> > >> > > > Quick highlights:
> > >> > > > - Metadata table v2, which is synchronously updated
> > >> > > > - Row writing (Spark) for all write operations
> > >> > > > - Kafka Connect for append only data model
> > >> > > > - New indexing schemes moving bloom filters and file range
> footers
> > >> into
> > >> > > > metadata table to improve upsert/delete performance.
> > >> > > > - Fixes needed for Trino/Presto support.
> > >> > > > - Most of the "big-needle-mover" PRs that are up already.
> > >> > > > - Revamp of docs to match our vision.
> > >> > > >
> > >> > > > May need some help understanding all the Flink related changes.
> > >> > > >
> > >> > > > Kindly review and let's use this thread to ratify and discuss
> > >> > timelines.
> > >> > > >
> > >> > > >
> > >> > > > Thanks
> > >> > > > Vinoth
> > >> > > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Regards,
> > >> > > -Sivabalan
> > >> > >
> > >> >
> > >>
> > >
> >
>
>
> --
> Regards,
> -Sivabalan
>

