Re: spark 3.2.1 built-in bloom filters

2022-05-19 Thread Nicolas Paris
Now that we have Hudi 0.11 with multi-column bloom indexes through `hoodie.metadata.index.bloom.filter.column.list`, the question is whether those bloom filters are used by the query planner, e.g. for `id=19`. The Spark built-in blooms are used in this case; maybe that's also the purpose of the Hudi multi-column blooms.
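The general idea behind using bloom filters for a point lookup such as `id=19` can be sketched in plain Python. This is a toy illustration only. It assumes a simple bit-array bloom filter with double hashing over `hashlib.sha256` and one filter per file. It is not Hudi's or Parquet's actual implementation (Parquet, for instance, uses a split-block design and applies filters per row group), and the file names and helper functions are hypothetical.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical): m bits, k positions via double hashing."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive two 64-bit hashes from one SHA-256 digest and combine them
        # as h1 + i*h2 (Kirsch-Mitzenmacher double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def build_file_filters(files: dict, num_bits: int = 1024, num_hashes: int = 5) -> dict:
    """Hypothetical helper: one bloom filter per file over its `id` values."""
    filters = {}
    for name, ids in files.items():
        bf = BloomFilter(num_bits, num_hashes)
        for value in ids:
            bf.add(str(value))
        filters[name] = bf
    return filters


def prune_files(filters: dict, value) -> list:
    """Keep only the files whose filter says `value` may be present."""
    return [name for name, bf in filters.items() if bf.might_contain(str(value))]


if __name__ == "__main__":
    # Hypothetical files and the values of their `id` column.
    files = {
        "file-a.parquet": range(0, 10),
        "file-b.parquet": range(10, 20),
        "file-c.parquet": range(20, 30),
    }
    filters = build_file_filters(files)
    # Only the files that may contain id=19 would need to be scanned.
    print(prune_files(filters, 19))
```

A bloom filter never produces false negatives, so the file holding `id=19` is always kept; other files are skipped unless they hit a (rare) false positive.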

Re: spark 3.2.1 built-in bloom filters

2022-04-04 Thread Vinoth Chandar
By all means. That would be great. We're always looking for a helping hand in improving the docs.

Re: spark 3.2.1 built-in bloom filters

2022-04-02 Thread Nicolas Paris
Hi Vinoth, thanks for your in-depth explanations. I think those details would be of interest in the documentation. I can work on this if you agree.

Re: spark 3.2.1 built-in bloom filters

2022-03-30 Thread Vinoth Chandar
Hi, I noticed that it finally landed. We actually began tracking that JIRA while initially writing Hudi at Uber. Parquet + bloom filters has taken just a few years :) I think we could switch to reading the built-in bloom filters as well. That could potentially make footer reading lighter.

spark 3.2.1 built-in bloom filters

2022-03-28 Thread Nicolas Paris
Hi,

Spark 3.2 ships Parquet 1.12, which provides built-in bloom filters on arbitrary columns. I wonder:
- Can Hudi benefit from them? (likely in 0.11, but not with MOR tables)
- Would it make sense to replace the Hudi blooms with them?
- What would be the advantage of storing our blooms in
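Enabling blooms on arbitrary columns has a per-column storage cost that grows with the number of distinct values and shrinks as the tolerated false-positive rate rises. As a rough sketch, the textbook sizing for a classic bloom filter is m = -n ln p / (ln 2)^2 bits with k = (m/n) ln 2 hash functions. Parquet's split-block filters use a different sizing rule, so treat these numbers as ballpark estimates only; the column size below is hypothetical.

```python
import math


def bloom_size(num_items: int, fpp: float):
    """Textbook classic bloom filter sizing.

    Returns (bits, hash_count) for `num_items` distinct values at
    false-positive probability `fpp`.
    """
    if num_items <= 0 or not 0.0 < fpp < 1.0:
        raise ValueError("num_items must be > 0 and 0 < fpp < 1")
    bits = math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2))
    hashes = max(1, round(bits / num_items * math.log(2)))
    return bits, hashes


if __name__ == "__main__":
    # Estimated cost for a hypothetical column with one million distinct values.
    for fpp in (0.1, 0.01, 0.001):
        bits, hashes = bloom_size(1_000_000, fpp)
        print(f"fpp={fpp}: {bits / 8 / 1024:.0f} KiB, {hashes} hashes")
```

At 1% false positives this works out to roughly 9.6 bits per distinct value, which is why enabling filters on every column is rarely free.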