On 13/2/2024 07:00, jian he wrote:
+ newa = makeNode(ArrayExpr);
+ /* array_collid will be set by parse_collate.c */
+ newa->element_typeid = scalar_type;
+ newa->array_typeid = array_type;
+ newa->multidims = false;
+ newa->elements = aexprs;
+ newa->location = -1;

I am confused by the comment `array_collid will be set by
parse_collate.c`; can you explain it further?
I wonder whether the second paragraph of the comment in commit b310b6e is enough to dive into the details.
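
For what it's worth, the effect of that later pass can also be seen from plain SQL. A tiny, hypothetical illustration (not from the patch or the thread): the collation of an array expression is derived from its elements by the usual collation-assignment rules, which is exactly what ends up in array_collid:

select pg_collation_for(array['a'::text, 'b'::text collate "C"]);
-- expected result: "C" (the explicit collation wins over the implicit one)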

If the right arm of an OR expression is not a plain Const but carries a
collation specification, e.g.
`where a  = 'a' collate "C" or a = 'b' collate "C";`

then the rightop is not a Const but a CollateExpr, and it will not be
used in the transformation.
Yes, it is done for simplicity right now. I'm not sure about corner cases of merging such expressions.
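
To illustrate (a hypothetical example, not taken from the thread; the table t and its text column a are made up), the difference should show up in the plans once enable_or_transformation is on:

set enable_or_transformation to on;

explain (costs off)
select * from t where a = 'a' or a = 'b';
-- expected: the two arms are merged into a single a = ANY('{a,b}') filter

explain (costs off)
select * from t where a = 'a' collate "C" or a = 'b' collate "C";
-- expected: the OR of two comparisons is kept as-is, because the
-- right-hand sides are CollateExpr nodes rather than plain Const nodes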


set enable_or_transformation to on;
explain(timing off, analyze, costs off)
select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
= 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
35.376 ms

The time shown is the last result of the 10 iterations.
The reason here is parallel workers.
If you look into the plans, you will see parallel workers in the case without the optimization and their absence in the case with it:

Gather  (cost=1000.00..28685.37 rows=87037 width=12)
        (actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
         Filter: ((x = 1) OR (x = 2) OR (x = 3) OR (x = 4) OR (x = 5) OR (x = 6) OR (x = 7) OR (x = 8) OR (x = 9))

Seq Scan on test  (cost=0.02..20440.02 rows=90600 width=12)
                  (actual rows=90363 loops=1)
   Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))

With 90363 tuples actually returned, the estimate is 87037 (less precise) without the transformation and 90600 (more precise) with it. But if you play with parallel_tuple_cost and parallel_setup_cost, you will end up with the parallel workers back:

 Gather  (cost=0.12..11691.03 rows=90600 width=12)
         (actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
         Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
         Rows Removed by Filter: 303212

And the gain is about 25% on my laptop.
I'm not sure about the origin of this behaviour, but it looks like an issue with parallel workers, not with this specific optimization.
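
For reference, one possible (hypothetical) way to bring the workers back for this query; the exact values I played with are not listed above:

set parallel_setup_cost to 0;
set parallel_tuple_cost to 0;
set enable_or_transformation to on;
explain (timing off, analyze, costs off)
select count(*) from test
where x = 1 or x = 2 or x = 3 or x = 4 or x = 5
   or x = 6 or x = 7 or x = 8 or x = 9;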

--
regards,
Andrei Lepikhov
Postgres Professional


