Re: Using indexes rather than table scans with Calcite

2020-06-03 Thread Roman Kondakov
Hi Haisheng, > If I didn't get it wrong, joining with scan_A means it has to read all the > tuples from table A, this will make index meaningless, because the purpose of > using index in that example is to avoid reading all the tuples from table A. I meant 'rowId' as a physical address (ctid

Re: Using indexes rather than table scans with Calcite

2020-06-02 Thread Haisheng Yuan
> using this approach, bitmap indexes may be represented as set operations > (union, intersect) over idx_a and idx_b followed by joining with scan_A > using 'rowId'. If I didn't get it wrong, joining with scan_A means it has to read all the tuples from table A, this will make index meaningless,

Re: Using indexes rather than table scans with Calcite

2020-06-02 Thread Roman Kondakov
Vladimir, Haisheng, thank you for sharing your thoughts. And special thanks to Haisheng for awesome examples. I found Julian's reasoning about representing bitmap indexes as joins very deep and interesting. As I understand this idea right, for table A(a,b,c) indexes over columns, say, 'a' and

Re: Using indexes rather than table scans with Calcite

2020-06-02 Thread Julian Hyde
Vladimir, I feel the same way. MVs are more powerful and general, and with some effort they could be just as efficient as other approaches. One problem that needs to be solved is the “registration problem”. If you have a lot of MVs they all have to be registered in the planner’s search space,

Re: Using indexes rather than table scans with Calcite

2020-06-02 Thread Vladimir Ozerov
Hi Roman, To me, there is no principal difference between materialized views and rule-based approaches - they are both "rules". In the materialized views approach the rule is to create all alternative access paths unconditionally, this rule is always fired first during the optimization process. A

Re: Using indexes rather than table scans with Calcite

2020-06-01 Thread Haisheng Yuan
Hi Roman, There are some potential advantages. Let's still use this as example. SELECT * FROM foo WHERE a > 100 or b < 1000; Both column a and b have B-Tree indexes, note that it doesn't have to be bitmap index. Bitmap Table Scan on foo -> BitmapOr -> Bitmap Index Scan on

Re: Using indexes rather than table scans with Calcite

2020-06-01 Thread Julian Hyde
I'm pleased there are a variety of approaches. People should use whichever works for them and their use case. The so-called "rule-based" approach is definitely useful for OLTP-style queries where you are accessing a few rows and you need to plan quickly. Haisheng mentioned bitmap indexes a while

Re: Using indexes rather than table scans with Calcite

2020-06-01 Thread Roman Kondakov
Hi Xiening, the example was synthetic. What is still not clear for me is how to exploit index sortedness with a rule based approach. As far as I understand with this approach we need to write complex rules (for example [1]) that should decide which index is useful and which is not useful. These

Re: Using indexes rather than table scans with Calcite

2020-06-01 Thread Xiening Dai
Hi Roman, The example you mentioned is an advanced scenario. Note that there are different types of index, such as clustered index, secondary index, covered and non-covered index. In your case, typical OLTP/OLAP optimizer would create an index-based join on top of the range table scan (or

Re: Using indexes rather than table scans with Calcite

2020-05-31 Thread Vladimir Ozerov
Hi Roman, This heavily depends on the architecture of the planner. In Hazelcast we have separate logical and physical phases. The goal of logical phase is normalization of a relational tree. In this case your example is converted to: LogicalJoin LogicalConstrainedScan(A, c>100)

Re: Using indexes rather than table scans with Calcite

2020-05-31 Thread Roman Kondakov
Hi Vladimir, thank you for sharing your point. Could you please clarify some details with a rulse-based index selection? You said > the fundamental problem with "indexes as materialized > views" approach is that you have to register them beforehand, instead of > using them only when needed. I

Re: Using indexes rather than table scans with Calcite

2020-05-31 Thread Vladimir Ozerov
As already mentioned, the fundamental problem with "indexes as materialized views" approach is that you have to register them beforehand, instead of using them only when needed. On the other hand, the complexity of index planning comes from cost estimation and predicate splitting. Materializations

Re: Using indexes rather than table scans with Calcite

2020-05-31 Thread xu
Hi Tim, I am working on MySQL InnoDB adapter and trying to introduce this to Calcite, currently it is only in early stage, and not approved/reviewed by committers yet. Anyway, we are facing the same problem like what index to use, how to push down order by operation, etc. I have developed a

Re: Using indexes rather than table scans with Calcite

2020-05-31 Thread Haisheng Yuan
Hi Roman, Thank you for sharing your thoughts. > It can be very tricky because the rule should consider not > only filters, but also collations. This leads to increasing the > complexity of such rules. Logical transformation rules like FilterTableScan2IndexTableScanRule should not consider

Re: Using indexes rather than table scans with Calcite

2020-05-31 Thread Roman Kondakov
Hi Haisheng, The basic rationale behind the using materialized views for secondary index representation instead of special rules like mentioned FilterTableScan2IndexTableScanRule is the simplicity of implementation. You are absolutely right that materialized views approach has an obvious

Re: Using indexes rather than table scans with Calcite

2020-05-30 Thread Haisheng Yuan
Thanks Julian and Roman for sharing the experiences for modeling indexes. Besides using materialized views, which is already proven by Phoenix and Ignite, there is another approach, as mentioned by Vladimir, define your own rules and indexscan operators. FilterTableScan2IndexScanRule and its

Re: Using indexes rather than table scans with Calcite

2020-05-29 Thread Danny Chan
Calcite does support table hint now, it's syntax is Oracle style[1], i saw that many engines support a INDEX hint to force a index scan on table[2] [3], maybe you can have a try also. [1] https://calcite.apache.org/docs/reference.html#sql-hints [2]

Re: Using indexes rather than table scans with Calcite

2020-05-29 Thread Julian Hyde
Materialized views are not a hack, as Vladimir claims. Materialized views are a fundamental concept in relational algebra, and they are an elegant way - in my opinion the correct way - to model indexes (and many other structures). In Calcite materialized views are a feature in the planner that

Re: Using indexes rather than table scans with Calcite

2020-05-29 Thread Roman Kondakov
Hi Tim, In Apache Ignite we've already faced this challenge. We solved it using materialized views and FilterableTable. Let's consider your example: > select * from users where country='UK' and some_other_column='foo'; with a primary index and a sorted secondary index (B+Tree?) over the

Re: Using indexes rather than table scans with Calcite

2020-05-29 Thread Vladimir Ozerov
Hi, Products that use Apache Calcite typically implement index handling on their own, since Apache Calcite has only limited support for physical optimization. You may implement your own index scan operator and rules that use this operator. For example, take a look how index planning is done in

Using indexes rather than table scans with Calcite

2020-05-29 Thread Tim Fox
Hi, I'm building a query engine with Calcite - really enjoying working with Calcite so far! When creating a plan, it seems Calcite always creates a plan where the sources are table scans, however in my implementation the tables can have indexes on them so a table scan is not always the right