Hi Haisheng,
> If I didn't get it wrong, joining with scan_A means it has to read all the
> tuples from table A, this will make index meaningless, because the purpose of
> using index in that example is to avoid reading all the tuples from table A.
I meant 'rowId' as a physical address (ctid
> using this approach, bitmap indexes may be represented as set operations
> (union, intersect) over idx_a and idx_b followed by joining with scan_A
> using 'rowId'.
If I didn't get it wrong, joining with scan_A means it has to read all the
tuples from table A, this will make index meaningless,
Vladimir, Haisheng,
thank you for sharing your thoughts. And special thanks to Haisheng for
awesome examples.
I found Julian's reasoning about representing bitmap indexes as joins
very deep and interesting. As I understand this idea right, for table
A(a,b,c) indexes over columns, say, 'a' and
Vladimir,
I feel the same way. MVs are more powerful and general, and with some effort
they could be just as efficient as other approaches.
One problem that needs to be solved is the “registration problem”. If you have
a lot of MVs they all have to be registered in the planner’s search space,
Hi Roman,
To me, there is no principal difference between materialized views and
rule-based approaches - they are both "rules". In the materialized views
approach the rule is to create all alternative access paths
unconditionally, this rule is always fired first during the optimization
process. A
Hi Roman,
There are some potential advantages.
Let's still use this as example.
SELECT * FROM foo WHERE a > 100 or b < 1000;
Both column a and b have B-Tree indexes, note that it doesn't have to be bitmap
index.
Bitmap Table Scan on foo
-> BitmapOr
-> Bitmap Index Scan on
I'm pleased there are a variety of approaches. People should use
whichever works for them and their use case.
The so-called "rule-based" approach is definitely useful for
OLTP-style queries where you are accessing a few rows and you need to
plan quickly.
Haisheng mentioned bitmap indexes a while
Hi Xiening,
the example was synthetic. What is still not clear for me is how to
exploit index sortedness with a rule based approach. As far as I
understand with this approach we need to write complex rules (for
example [1]) that should decide which index is useful and which is not
useful. These
Hi Roman,
The example you mentioned is an advanced scenario. Note that there are
different types of index, such as clustered index, secondary index, covered and
non-covered index. In your case, typical OLTP/OLAP optimizer would create an
index-based join on top of the range table scan (or
Hi Roman,
This heavily depends on the architecture of the planner. In Hazelcast we
have separate logical and physical phases. The goal of logical phase is
normalization of a relational tree. In this case your example is converted
to:
LogicalJoin
LogicalConstrainedScan(A, c>100)
Hi Vladimir,
thank you for sharing your point. Could you please clarify some details
with a rulse-based index selection? You said
> the fundamental problem with "indexes as materialized
> views" approach is that you have to register them beforehand, instead of
> using them only when needed.
I
As already mentioned, the fundamental problem with "indexes as materialized
views" approach is that you have to register them beforehand, instead of
using them only when needed. On the other hand, the complexity of index
planning comes from cost estimation and predicate splitting.
Materializations
Hi Tim,
I am working on MySQL InnoDB adapter and trying to introduce this to
Calcite, currently it is only in early stage, and not approved/reviewed by
committers yet. Anyway, we are facing the same problem like what index to
use, how to push down order by operation, etc. I have developed a
Hi Roman,
Thank you for sharing your thoughts.
> It can be very tricky because the rule should consider not
> only filters, but also collations. This leads to increasing the
> complexity of such rules.
Logical transformation rules like FilterTableScan2IndexTableScanRule should not
consider
Hi Haisheng,
The basic rationale behind the using materialized views for secondary
index representation instead of special rules like mentioned
FilterTableScan2IndexTableScanRule is the simplicity of implementation.
You are absolutely right that materialized views approach has an obvious
Thanks Julian and Roman for sharing the experiences for modeling indexes.
Besides using materialized views, which is already proven by Phoenix and
Ignite, there is another approach, as mentioned by Vladimir, define your own
rules and indexscan operators.
FilterTableScan2IndexScanRule and its
Calcite does support table hint now, it's syntax is Oracle style[1], i saw that
many engines support a INDEX hint to force a index scan on table[2] [3], maybe
you can have a try also.
[1] https://calcite.apache.org/docs/reference.html#sql-hints
[2]
Materialized views are not a hack, as Vladimir claims. Materialized views are a
fundamental concept in relational algebra, and they are an elegant way - in my
opinion the correct way - to model indexes (and many other structures).
In Calcite materialized views are a feature in the planner that
Hi Tim,
In Apache Ignite we've already faced this challenge. We solved it using
materialized views and FilterableTable. Let's consider your example:
> select * from users where country='UK' and some_other_column='foo';
with a primary index and a sorted secondary index (B+Tree?) over the
Hi,
Products that use Apache Calcite typically implement index handling on
their own, since Apache Calcite has only limited support for physical
optimization. You may implement your own index scan operator and rules that
use this operator. For example, take a look how index planning is done in
Hi,
I'm building a query engine with Calcite - really enjoying working with
Calcite so far!
When creating a plan, it seems Calcite always creates a plan where the
sources are table scans, however in my implementation the tables can have
indexes on them so a table scan is not always the right
21 matches
Mail list logo