Re: [DISCUSS] hudi index improve

2022-04-27 Thread Vinoth Chandar
Hi all, This is a great discussion and nice to see how all of this is coming together. Penning down my thoughts. A) +1 on exposing INDEX syntax, we can start with Spark/Flink where we have full control on connectors and iterate faster. B) Do we need a manual refresh mode? Almost all databases

Re: [DISCUSS] hudi index improve

2022-04-18 Thread Danny Chan
In general, it seems that the INDEX commands mainly serve batch scenarios. Some cases need clarification here: 1. When a user creates an index with manual refresh and then inserts a batch of data (named d1) into the table, does the created index take effect on d1? 2. If a user

Re: [DISCUSS] hudi index improve

2022-04-18 Thread Y Ethan Guo
+1, it would be great to make Hudi's index support all query engines. Given that we already have a multi-modal index (column stats index, bloom filter index) in the metadata table and there is a proposal to have a metastore server, is the ultimate goal to serve the index from the metastore leveraging

Re: [DISCUSS] hudi index improve

2022-04-18 Thread wangxianghu
+1 on index improvement. Index optimization is very valuable for Hudi. Looking forward to the design doc. At 2022-04-18 11:18:35, "Forward Xu" wrote: >Hi All, > >I want to improve hudi's index. There are four main steps to achieve this > >1. Implement index syntax >a. Implement

Re: [DISCUSS] hudi index improve

2022-04-18 Thread Shiyan Xu
+1, great initiative. Please also support Trino. Todd Gao is working on Trino/Presto native connectors. We should align the plan going from there. Looking forward to the RFC. On Mon, Apr 18, 2022 at 11:41 AM 孟涛 wrote: > +1, it will be a great feature for hudi > index is very important to boost