Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

Alena Rybakina Mon, 31 Jul 2023 09:25:16 -0700

Hi, all!

  CNF -> DNF conversion
=====================


Like many great papers, the MDAM paper takes one core idea, and finds
ways to leverage it to the hilt. Here the core idea is to take
predicates in conjunctive normal form (an "AND of ORs"), and convert
them into disjunctive normal form (an "OR of ANDs"). DNF quals are
logically equivalent to CNF quals, but ideally suited to SAOP-array
style processing by an ordered B-Tree index scan -- they reduce
everything to a series of non-overlapping primitive index scans, that
can be processed in keyspace order. We already do this today in the
case of SAOPs, in effect. The nbtree "next array keys" state machine
already materializes values that can be seen as MDAM style DNF single
value predicates. The state machine works by outputting the cartesian
product of each array as a multi-column index is scanned, but that
could be taken a lot further in the future. We can use essentially the
same kind of state machine to do everything described in the paper --
ultimately, it just needs to output a list of disjuncts, like the DNF
clauses that the paper shows in "Table 3".

In theory, anything can be supported via a sufficiently complete CNF
-> DNF conversion framework. There will likely always be the potential
for unsafe/unsupported clauses and/or types in an extensible system
like Postgres, though. So we will probably need to retain some notion
of safety. It seems like more of a job for nbtree preprocessing (or
some suitably index-AM-agnostic version of the same idea) than the
optimizer, in any case. But that's not entirely true, either (that
would be far too easy).

The optimizer still needs to optimize. It can't very well do that
without having some kind of advanced notice of what is and is not
supported by the index AM. And, the index AM cannot just unilaterally
decide that index quals actually should be treated as filter/qpquals,
after all -- it doesn't get a veto. So there is a mutual dependency
that needs to be resolved. I suspect that there needs to be a two way
conversation between the optimizer and nbtree code to break the
dependency -- a callback that does some of the preprocessing work
during planning. Tom said something along the same lines in passing,
when discussing the MDAM paper last year [2]. Much work remains here.

Honestly, I'm just reading and delving into this thread and other topicsrelated to it, so excuse me if I ask you a few obvious questions.

I noticed that you are going to add CNF->DNF transformation at the indexconstruction stage. If I understand correctly, you will rewriterestrictinfo node,change boolean "AND" expressions to "OR" expressions, but would it bepossible to apply such a procedure earlier? Otherwise I suppose youcould face the problem ofincorrect selectivity of the calculation and, consequently, thecardinality calculation?I can't clearly understand at what stage it is clear that the such atransformation needs to be applied?


--
Regards,
Alena Rybakina
Postgres Professional

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

Reply via email to