Re: [DISCUSS] Metadata based bloom index

2021-11-05 Thread Vinoth Chandar
+1 on this. I think cloud storage throttling is more of an issue that causes degradations when tables are enormous, but this approach should nicely handle that as well.

On Fri, Nov 5, 2021 at 9:31 AM Manoj Govindassamy <manoj.govindass...@gmail.com> wrote:
> Hi Hudi Community,
>
> Hudi has

[DISCUSS] Metadata based bloom index

2021-11-05 Thread Manoj Govindassamy
Hi Hudi Community,

Hudi has several indices to help look up records. The most commonly used one is the BloomFilter-based index. Today, this index works by loading the bloom filter from every data file in the partitions of interest, which is a time-consuming operation. It would be better if we could leverage
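To make the cost of the current lookup concrete, here is a minimal Python sketch of the pattern described above. It is a toy model, not Hudi's code: the `BloomFilter` class, the `build_file_blooms` and `lookup_candidates` helpers, and the `{partition: {file_id: [record keys]}}` layout are all hypothetical. What it shows is that one bloom filter read is needed per data file in the partitions of interest, so the number of reads grows with the file count. That growth is what makes the lookup slow (and, on cloud storage, prone to throttling) for large tables.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter (illustrative only, not Hudi's implementation)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # No false negatives; false positives are possible.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_file_blooms(data_files):
    """Hypothetical layout {partition: {file_id: [keys]}} -> one bloom per file."""
    blooms = {}
    for partition, files in data_files.items():
        for file_id, keys in files.items():
            bf = BloomFilter()
            for key in keys:
                bf.add(key)
            blooms[(partition, file_id)] = bf
    return blooms


def lookup_candidates(blooms, partitions, keys):
    """Current pattern: load the bloom filter of every data file in the
    partitions of interest (one read per file), then test each key."""
    reads = 0
    candidates = {key: [] for key in keys}
    for (partition, file_id), bf in blooms.items():
        if partition not in partitions:
            continue
        reads += 1  # stands in for reading the bloom filter out of the data file
        for key in keys:
            if bf.might_contain(key):
                candidates[key].append(file_id)
    return candidates, reads


data_files = {
    "2021/11/01": {"f1": ["k1", "k2"], "f2": ["k3"]},
    "2021/11/02": {"f3": ["k4"], "f4": ["k5", "k6"]},
}
blooms = build_file_blooms(data_files)
candidates, reads = lookup_candidates(blooms, {"2021/11/01", "2021/11/02"}, ["k1", "k5"])
print(reads)  # 4: one bloom filter read per data file scanned
```

With four files the read count is trivial, but it scales linearly with the number of files in the partitions being looked up, which is the overhead a metadata-based bloom index would aim to avoid.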