Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-03-29 Thread David Steele

On 3/25/16 11:13 AM, David Steele wrote:


Time is growing short and there seem to be some serious concerns with
this patch.  Can you provide a new patch soon?  If not, I think it might
be be time to mark this "returned with feedback".


I have marked this patch "returned with feedback".  Please feel free to 
resubmit for 9.7!


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-03-25 Thread David Steele

Hi Teador,

On 3/19/16 8:44 PM, Tomas Vondra wrote:


Sadly the v4 does not work for me - I do get assertion failures.


Time is growing short and there seem to be some serious concerns with 
this patch.  Can you provide a new patch soon?  If not, I think it might 
be be time to mark this "returned with feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-03-19 Thread Tomas Vondra

Hi Teodor,

Sadly the v4 does not work for me - I do get assertion failures. For 
example with the example Andreas Karlsson posted in this thread:


CREATE EXTENSION btree_gin;
CREATE TABLE test (a int, b int, c int);
CREATE INDEX ON test USING gin (a, b, c);
INSERT INTO test SELECT i % 7, i % 9, i % 11 FROM generate_series(1, 
100) i;

EXPLAIN ANALYZE SELECT * FROM test WHERE (a = 3 OR b = 5) AND c = 2;

It seems working, but only until I run ANALYZE on the table. Once I do 
that, I start getting crashes at this line


*qualcols = list_concat(*qualcols,
list_copy(idx_path->indexqualcols));

in convert_bitmap_path_to_index_clause. Apparently one of the lists is 
T_List while the other one is T_IntList, so list_concat() errors out.


My guess is that the T_BitmapOrPath branch should do

oredqualcols = list_concat(oredqualcols, li_qualcols);
...
*qualcols = list_concat(qualcols, oredqualcols);

instead of

oredqualcols = lappend(oredqualcols, li_qualcols);
...
*qualcols = lappend(*qualcols, oredqualcols);

but once I fixed that I got some other assert failures further down, 
that I haven't tried to fix.


So the patch seems to be broken, and I suspect this might be related to 
the broken index condition reported by Andreas (although I don't see 
that - I either see correct condition or assertion failures).



On 03/17/2016 06:19 PM, Teodor Sigaev wrote:
...


7) I find it rather ugly that the paths are built by converting BitmapOr
paths. Firstly, it means indexes without amgetbitmap can't benefit from
this change. Maybe that's reasonable limitation, though?

I based on following thoughts:
1 code which tries to find OR-index path will be very similar to existing
  generate_or_bitmap code. Obviously, it should not be duplicated.
2 all existsing indexes have amgetbitmap method, only a few don't.
amgetbitmap
  interface is simpler. Anyway, I can add an option for generate_or_bitmap
  to use any index, but, in current state it will just repeat all work.


I agree that the code should not be duplicated, but is this really a 
good solution. Perhaps a refactoring that'd allow sharing most of the 
code would be more appropriate.




But more importantly, this design already has a bunch of unintended
consequences. For example, the current code completely ignores
enable_indexscan setting, because it merely copies the costs from the
bitmap path.

>

I'd like to add separate enable_indexorscan


That may be useful, but why shouldn't enable_indexscan=off also disable 
indexorscan? I would find it rather surprising if after setting 
enable_indexscan=off I'd still get index scans for OR-clauses.





That's pretty dubious, I guess. So this code probably needs to be made
aware of enable_indexscan - right now it entirely ignores startup_cost
in convert_bitmap_path_to_index_clause(). But of course if there are
multiple IndexPaths, the  enable_indexscan=off will be included multiple
times.


... and it does not address this at all.

I really doubt a costing derived from the bitmap index scan nodes will 
make much sense - you essentially need to revert unknown parts of the 
costing to only include building the bitmap once, etc.




regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-03-19 Thread Teodor Sigaev

I also wonder whether the patch should add explanation of OR-clauses
handling into the READMEs in src/backend/access/*

Not yet, but will


The patch would probably benefit from transforming it into a patch
series - one patch for the infrastructure shared by all the indexes,
then one patch per index type. That should make it easier to review, and
I seriously doubt we'd want to commit this in one huge chunk anyway.

I splitted to two:
1 0001-idx_or_core - only planner and executor changes
2 0002-idx_or_indexes - BRIN/GIN/GiST changes with tests

I don't think that splitting of second patch adds readability but increase 
management diffculties, but if your insist I will split.



4) scanGetItem is a prime example of the "badly needs comments" issue,
particularly because the previous version of the function actually had
quite a lot of them while the new function has none.

added


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


0001-idx_or_core-v4.patch.gz
Description: GNU Zip compressed data


0002-idx_or_indexes-v4.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-03-18 Thread Teodor Sigaev




I also wonder whether the patch should add explanation of OR-clauses
handling into the READMEs in src/backend/access/*


Oops, will add shortly.


The patch would probably benefit from transforming it into a patch
series - one patch for the infrastructure shared by all the indexes,
then one patch per index type. That should make it easier to review, and
I seriously doubt we'd want to commit this in one huge chunk anyway.

Ok, will do it.


1) fields in BrinOpaque are not following the naming convention (all the
existing fields start with bo_)

fixed



2) there's plenty of places violating the usual code style (e.g. for
single-command if branches) - not a big deal for WIP patch, but needs to
get fixed eventually

hope, fixed



3) I wonder whether we really need both SK_OR and SK_AND, considering
they are mutually exclusive. Why not to assume SK_AND by default, and
only use SK_OR? If we really need them, perhaps an assert making sure
they are not set at the same time would be appropriate.

In short: possible ambiguity and increasing stack machine complexity.
Let we have follow expression in reversed polish notation (letters represent a 
condtion, | - OR, & - AND logical operation, ANDs are omitted):

a b c |

Is it ((a & b)| c) or (a & (b | c)) ?

Also, using both SK_ makes code more readable.



4) scanGetItem is a prime example of the "badly needs comments" issue,
particularly because the previous version of the function actually had
quite a lot of them while the new function has none.

Will add soon



5) scanGetItem() may end up using uninitialized 'cmp' - it only gets
initialized when (!leftFinished && !rightFinished), but then gets used
when either part of the condition evaluates to true. Probably should be

 if (!leftFinished || !rightFinished)
 cmp = ...

fixed



6) the code in nodeIndexscan.c should not include call to abort()

 {
 abort();
 elog(ERROR, "unsupported indexqual type: %d",
 (int) nodeTag(clause));
 }

fixed, just forgot to remove



7) I find it rather ugly that the paths are built by converting BitmapOr
paths. Firstly, it means indexes without amgetbitmap can't benefit from
this change. Maybe that's reasonable limitation, though?

I based on following thoughts:
1 code which tries to find OR-index path will be very similar to existing
  generate_or_bitmap code. Obviously, it should not be duplicated.
2 all existsing indexes have amgetbitmap method, only a few don't. amgetbitmap
  interface is simpler. Anyway, I can add an option for generate_or_bitmap
  to use any index, but, in current state it will just repeat all work.



But more importantly, this design already has a bunch of unintended
consequences. For example, the current code completely ignores
enable_indexscan setting, because it merely copies the costs from the
bitmap path.

I'd like to add separate enable_indexorscan


That's pretty dubious, I guess. So this code probably needs to be made
aware of enable_indexscan - right now it entirely ignores startup_cost
in convert_bitmap_path_to_index_clause(). But of course if there are
multiple IndexPaths, the  enable_indexscan=off will be included multiple
times.

9) This already breaks estimation for some reason. Consider this

...

So the OR-clause is estimated to match 0 rows, less than each of the
clauses independently. Needless to say that without the patch this works
just fine.

fixed



10) Also, this already breaks some regression tests, apparently because
it changes how 'width' is computed.

fixed too



So I think this way of building the index path from a BitmapOr path is
pretty much a dead-end.
I don't think so because separate code path to support OR-clause in index will 
significanlty duplicate BitmapOr generator.


Will send next version as soon as possible. Thank you for your attention!

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


index_or-3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-03-18 Thread Andreas Karlsson

I gave this patch a quick spin and noticed a strange query plan.

CREATE TABLE test (a int, b int, c int);
CREATE INDEX ON test USING gin (a, b, c);
INSERT INTO test SELECT i % 7, i % 9, i % 11 FROM generate_series(1, 
100) i;

EXPLAIN ANALYZE SELECT * FROM test WHERE (a = 3 OR b = 5) AND c = 2;

QUERY PLAN 


--
 Bitmap Heap Scan on test  (cost=829.45..4892.10 rows=21819 width=12) 
(actual time=66.494..76.234 rows=21645 loops=1)
   Recheck Cond: a = 3) AND (c = 2)) OR ((b = 5) AND (c = 2))) AND 
(c = 2))

   Heap Blocks: exact=5406
   ->  Bitmap Index Scan on test_a_b_c_idx  (cost=0.00..824.00 
rows=2100 width=0) (actual time=65.272..65.272 rows=21645 loops=1)
 Index Cond: a = 3) AND (c = 2)) OR ((b = 5) AND (c = 2))) 
AND (c = 2))

 Planning time: 0.200 ms
 Execution time: 77.206 ms
(7 rows)

Shouldn't the index condition just be "((a = 3) AND (c = 2)) OR ((b = 5) 
AND (c = 2))"?


Also when applying and reading the patch I noticed some minor 
issues/nitpick.


- I get whitespace warnings from git apply when I apply the patches.
- You have any insconstent style for casts: I think "(Node*)clause" 
should be "(Node *) clause".

- Same with pointers. "List* quals" should be "List *quals"
- I am personally not a fan of seeing the "isorderby == false && 
index->rd_amroutine->amcanorclause" clause twice. Feels like a risk for 
diverging code paths. But it could be that there is no clean alternative.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-03-10 Thread Tomas Vondra
Hi Teodor,


I've looked into v2 of the patch you sent a few days ago. Firstly, I
definitely agree that being able to use OR conditions with an index is
definitely a cool idea.

I do however agree with David that the patch would definitely benefit
from comments documenting various bits that are less obvious to mere
mortals like me, with limited knowledge of the index internals.

I also wonder whether the patch should add explanation of OR-clauses
handling into the READMEs in src/backend/access/*

The patch would probably benefit from transforming it into a patch
series - one patch for the infrastructure shared by all the indexes,
then one patch per index type. That should make it easier to review, and
I seriously doubt we'd want to commit this in one huge chunk anyway.

Now, some review comments from eyeballing the patch. Some of those are
nitpicking, but well ...

1) fields in BrinOpaque are not following the naming convention (all the
existing fields start with bo_)

2) there's plenty of places violating the usual code style (e.g. for
single-command if branches) - not a big deal for WIP patch, but needs to
get fixed eventually

3) I wonder whether we really need both SK_OR and SK_AND, considering
they are mutually exclusive. Why not to assume SK_AND by default, and
only use SK_OR? If we really need them, perhaps an assert making sure
they are not set at the same time would be appropriate.

4) scanGetItem is a prime example of the "badly needs comments" issue,
particularly because the previous version of the function actually had
quite a lot of them while the new function has none.

5) scanGetItem() may end up using uninitialized 'cmp' - it only gets
initialized when (!leftFinished && !rightFinished), but then gets used
when either part of the condition evaluates to true. Probably should be

if (!leftFinished || !rightFinished)
cmp = ...

6) the code in nodeIndexscan.c should not include call to abort()

{
abort();
elog(ERROR, "unsupported indexqual type: %d",
(int) nodeTag(clause));
}

7) I find it rather ugly that the paths are built by converting BitmapOr
paths. Firstly, it means indexes without amgetbitmap can't benefit from
this change. Maybe that's reasonable limitation, though?

But more importantly, this design already has a bunch of unintended
consequences. For example, the current code completely ignores
enable_indexscan setting, because it merely copies the costs from the
bitmap path.

SET enable_indexscan = off;
EXPLAIN SELECT * FROM t WHERE (c && ARRAY[1] OR c && ARRAY[2]);

 QUERY PLAN
---
 Index Scan using t_c_idx on t  (cost=0.00..4.29 rows=0 width=33)
   Index Cond: ((c && '{1}'::integer[]) OR (c && '{2}'::integer[]))
(2 rows)

That's pretty dubious, I guess. So this code probably needs to be made
aware of enable_indexscan - right now it entirely ignores startup_cost
in convert_bitmap_path_to_index_clause(). But of course if there are
multiple IndexPaths, the  enable_indexscan=off will be included multiple
times.

9) This already breaks estimation for some reason. Consider this
example, using a table with int[] column, with gist index built using
intarray:

EXPLAIN SELECT * FROM t WHERE (c && ARRAY[1,2,3,4,5,6,7]);

   QUERY PLAN

 Index Scan using t_c_idx on t  (cost=0.28..52.48 rows=12 width=33)
   Index Cond: (c && '{1,2,3,4,5,6,7}'::integer[])
(2 rows)

EXPLAIN SELECT * FROM t WHERE (c && ARRAY[8,9,10,11,12,13,14]);

   QUERY PLAN

 Index Scan using t_c_idx on t  (cost=0.28..44.45 rows=10 width=33)
   Index Cond: (c && '{8,9,10,11,12,13,14}'::integer[])
(2 rows)

EXPLAIN SELECT * FROM t WHERE (c && ARRAY[1,2,3,4,5,6,7])
   OR (c && ARRAY[8,9,10,11,12,13,14]);

   QUERY PLAN

 Index Scan using t_c_idx on t  (cost=0.00..4.37 rows=0 width=33)
   Index Cond: ((c && '{1,2,3,4,5,6,7}'::integer[])
 OR (c && '{8,9,10,11,12,13,14}'::integer[]))
(2 rows)

So the OR-clause is estimated to match 0 rows, less than each of the
clauses independently. Needless to say that without the patch this works
just fine.

10) Also, this already breaks some regression tests, apparently because
it changes how 'width' is computed.

So I think this way of building the index path from a BitmapOr path is
pretty much a dead-end.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-02-29 Thread Teodor Sigaev

Thank you for review!


I'd like to see comments too! but more so in the code. :) I've had a look over
this, and it seems like a great area in which we could improve on, and your
reported performance improvements are certainly very interesting too. However
I'm finding the code rather hard to follow, which might be a combination of my
lack of familiarity with the index code, but more likely it's the lack of

I've added comments, fixed a found bugs.



comments to explain what's going on. Let's just take 1 function as an example:

Here there's not a single comment, so I'm just going to try to work out what's
going on based on the code.

+static void
+compileScanKeys(IndexScanDesc scan)
+{
+GISTScanOpaqueso = (GISTScanOpaque) scan->opaque;
+int*stack,
+stackPos = -1,
+i;
+
+if (scan->numberOfKeys <= 1 || so->useExec == false)
+return;
+
+Assert(scan->numberOfKeys >=3);

Why can numberOfKeys never be 2? I looked at what calls this and I can't really
Because here they are actually an expression, expression could contain 1 or tree 
or more nodes but could not two (operation AND/OR plus two arguments)



work it out. I'm really also not sure what useExec means as there's no comment
fixed. If useExec == false then SkanKeys are implicitly ANDed and stored in just 
array.



in that struct member, and what if numberOfKeys == 1 and useExec == false, won't
this Assert() fail? If that's not a possible situation then why not?

fixed





+ScanKey key = scan->keyData + i;
Is there a reason not to use keyData[i]; ?

That's the same ScanKey key = >keyData[i];
I prefer first form as more clear but I could be wrong - but there are other 
places in code where pointer arithmetic is used.



+if (stackPos >= 0 && (key->sk_flags & (SK_OR | SK_AND)))
+{
+Assert(stackPos >= 1 && stackPos < scan->numberOfKeys);
stackPos >= 1? This seems unnecessary and confusing as the if test surely makes
that impossible.




+
+so->leftArgs[i] = stack[stackPos - 1];
Something is broken here as stackPos can be 0 (going by the if() not the
Assert()), therefore that's stack[-1].

fixed


stackPos is initialised to -1, so this appears to always skip the first element
of the keyData array. If that's really the intention, then wouldn't it be better
to just make the initial condition of the for() look i = 1 ?

done


I'd like to review more, but it feels like a job that's more difficult than it
needs to be due to lack of comments.

Would it be possible to update the patch to try and explain things a little 
better?

Hope, I made cleaner..


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


index_or-2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-01-28 Thread Alvaro Herrera
I think this is very exciting stuff, but since you didn't submit an
updated patch after David's review, I'm closing it for now as
returned-with-feedback.  Please submit a new version once you have it.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2016-01-10 Thread David Rowley
On 27 December 2015 at 07:04, Teodor Sigaev  wrote:

> I'd like to present OR-clause support for indexes. Although OR-clauses
> could be supported by bitmapOR index scan it isn't very effective and such
> scan lost any order existing in index. We (with Alexander Korotkov)
> presented results on Vienna's conference this year. In short, it provides
> performance improvement:
>
> EXPLAIN ANALYZE
> SELECT count(*) FROM tst WHERE id = 5 OR id = 500 OR id = 5000;
> me=0.080..0.267 rows=173 loops=1)
>  Recheck Cond: ((id = 5) OR (id = 500) OR (id = 5000))
>  Heap Blocks: exact=172
>  ->  Bitmap Index Scan on idx_gin  (cost=0.00..57.50 rows=15000
> width=0) (actual time=0.059..0.059 rows=147 loops=1)
>Index Cond: ((id = 5) OR (id = 500) OR (id = 5000))
>  Planning time: 0.077 ms
>  Execution time: 0.308 ms   <---
> QUERY PLAN
>
> ---
>  Aggregate  (cost=51180.53..51180.54 rows=1 width=0) (actual
> time=796.766..796.766 rows=1 loops=1)
>->  Index Only Scan using idx_btree on tst  (cost=0.42..51180.40
> rows=55 width=0) (actual time=0.444..796.736 rows=173 loops=1)
>  Filter: ((id = 5) OR (id = 500) OR (id = 5000))
>  Rows Removed by Filter: 999829
>  Heap Fetches: 102
>  Planning time: 0.087 ms
>  Execution time: 796.798 ms  <--
> QUERY PLAN
>
> -
>  Aggregate  (cost=21925.63..21925.64 rows=1 width=0) (actual
> time=160.412..160.412 rows=1 loops=1)
>->  Seq Scan on tst  (cost=0.00..21925.03 rows=237 width=0) (actual
> time=0.535..160.362 rows=175 loops=1)
>  Filter: ((id = 5) OR (id = 500) OR (id = 5000))
>  Rows Removed by Filter: 999827
>  Planning time: 0.459 ms
>  Execution time: 160.451 ms
>
>
> It also could work together with KNN feature of GiST and in this case
> performance improvement could be up to several orders of magnitude, in
> artificial example it was 37000 times faster.
>
> Not all  indexes can support oR-clause, patch adds support to GIN, GiST
> and BRIN indexes. pg_am table is extended for adding amcanorclause column
> which indicates possibility of executing of OR-clause by index.
>
>  indexqual and indexqualorig doesn't contain implicitly-ANDed list of
> index qual expressions, now that lists could contain OR RestrictionInfo.
> Actually, the patch just tries to convert BitmapOr node to IndexScan or
> IndexOnlyScan. Thats significantly simplifies logic to find possible
> clause's list for index.
> Index always gets a array of ScanKey but for indexes which support
> OR-clauses
> array  of ScanKey is actually exection tree in reversed polish notation
> form. Transformation is done in ExecInitIndexScan().
>
> The problems on the way which I see for now:
> 1 Calculating cost. Right now it's just a simple transformation of costs
> computed for BitmapOr path. I'd like to hope that's possible and so index's
> estimation function could be non-touched. So, they could believe that all
> clauses are implicitly-ANDed
> 2 I'd like to add such support to btree but it seems that it should be a
> separated patch. Btree search algorithm doesn't use any kind of stack of
> pages and algorithm to walk over btree doesn't clear for me for now.
> 3 I could miss some places which still assumes  implicitly-ANDed list of
> clauses although regression tests passes fine.
>
> Hope, hackers will not have an strong objections to do that. But obviously
> patch
> requires further work and I'd like to see comments, suggestions and
> recommendations. Thank you.


Hi,

I'd like to see comments too! but more so in the code. :) I've had a look
over this, and it seems like a great area in which we could improve on, and
your reported performance improvements are certainly very interesting too.
However I'm finding the code rather hard to follow, which might be a
combination of my lack of familiarity with the index code, but more likely
it's the lack of comments to explain what's going on. Let's just take 1
function as an example:

Here there's not a single comment, so I'm just going to try to work out
what's going on based on the code.

+static void
+compileScanKeys(IndexScanDesc scan)
+{
+ GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
+ int *stack,
+ stackPos = -1,
+ i;
+
+ if (scan->numberOfKeys <= 1 || so->useExec == false)
+ return;
+
+ Assert(scan->numberOfKeys >=3);

Why can numberOfKeys never be 2? I looked at what calls this and I can't
really work it out. I'm really also not sure what useExec means as there's
no comment in that struct member, and what if numberOfKeys == 1 and useExec
== false, won't this Assert() fail? If that's not a possible situation then
why not?

Re: [HACKERS] POC, WIP: OR-clause support for indexes

2015-12-26 Thread Teodor Sigaev

This is great.   I got a question, is it possible make btree index to support OR
as well?  Is btree supports more invasive, in the sense that we need to do
enhance ScanKey to supports an array of values?
Btree now works by follow: find the max/min tuple which satisfies condtions and 
then executes forward/backward scan over leaf pages. For complicated clauses 
it's not obvious how to find min/max tuple. Scanning whole index isn't an option 
from preformance point of view.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC, WIP: OR-clause support for indexes

2015-12-26 Thread Feng Tian
Hi, Teodor,

This is great.   I got a question, is it possible make btree index to
support OR as well?  Is btree supports more invasive, in the sense that we
need to do enhance ScanKey to supports an array of values?

Thanks,
Feng

On Sat, Dec 26, 2015 at 10:04 AM, Teodor Sigaev  wrote:

> I'd like to present OR-clause support for indexes. Although OR-clauses
> could be supported by bitmapOR index scan it isn't very effective and such
> scan lost any order existing in index. We (with Alexander Korotkov)
> presented results on Vienna's conference this year. In short, it provides
> performance improvement:
>
> EXPLAIN ANALYZE
> SELECT count(*) FROM tst WHERE id = 5 OR id = 500 OR id = 5000;
> me=0.080..0.267 rows=173 loops=1)
>  Recheck Cond: ((id = 5) OR (id = 500) OR (id = 5000))
>  Heap Blocks: exact=172
>  ->  Bitmap Index Scan on idx_gin  (cost=0.00..57.50 rows=15000
> width=0) (actual time=0.059..0.059 rows=147 loops=1)
>Index Cond: ((id = 5) OR (id = 500) OR (id = 5000))
>  Planning time: 0.077 ms
>  Execution time: 0.308 ms   <---
> QUERY PLAN
>
> ---
>  Aggregate  (cost=51180.53..51180.54 rows=1 width=0) (actual
> time=796.766..796.766 rows=1 loops=1)
>->  Index Only Scan using idx_btree on tst  (cost=0.42..51180.40
> rows=55 width=0) (actual time=0.444..796.736 rows=173 loops=1)
>  Filter: ((id = 5) OR (id = 500) OR (id = 5000))
>  Rows Removed by Filter: 999829
>  Heap Fetches: 102
>  Planning time: 0.087 ms
>  Execution time: 796.798 ms  <--
> QUERY PLAN
>
> -
>  Aggregate  (cost=21925.63..21925.64 rows=1 width=0) (actual
> time=160.412..160.412 rows=1 loops=1)
>->  Seq Scan on tst  (cost=0.00..21925.03 rows=237 width=0) (actual
> time=0.535..160.362 rows=175 loops=1)
>  Filter: ((id = 5) OR (id = 500) OR (id = 5000))
>  Rows Removed by Filter: 999827
>  Planning time: 0.459 ms
>  Execution time: 160.451 ms
>
>
> It also could work together with KNN feature of GiST and in this case
> performance improvement could be up to several orders of magnitude, in
> artificial example it was 37000 times faster.
>
> Not all  indexes can support oR-clause, patch adds support to GIN, GiST
> and BRIN indexes. pg_am table is extended for adding amcanorclause column
> which indicates possibility of executing of OR-clause by index.
>
>  indexqual and indexqualorig doesn't contain implicitly-ANDed list of
> index qual expressions, now that lists could contain OR RestrictionInfo.
> Actually, the patch just tries to convert BitmapOr node to IndexScan or
> IndexOnlyScan. Thats significantly simplifies logic to find possible
> clause's list for index.
> Index always gets a array of ScanKey but for indexes which support
> OR-clauses
> array  of ScanKey is actually exection tree in reversed polish notation
> form. Transformation is done in ExecInitIndexScan().
>
> The problems on the way which I see for now:
> 1 Calculating cost. Right now it's just a simple transformation of costs
> computed for BitmapOr path. I'd like to hope that's possible and so index's
> estimation function could be non-touched. So, they could believe that all
> clauses are implicitly-ANDed
> 2 I'd like to add such support to btree but it seems that it should be a
> separated patch. Btree search algorithm doesn't use any kind of stack of
> pages and algorithm to walk over btree doesn't clear for me for now.
> 3 I could miss some places which still assumes  implicitly-ANDed list of
> clauses although regression tests passes fine.
>
> Hope, hackers will not have an strong objections to do that. But obviously
> patch
> requires further work and I'd like to see comments, suggestions and
> recommendations. Thank you.
>
>
> --
> Teodor Sigaev   E-mail: teo...@sigaev.ru
>WWW:
> http://www.sigaev.ru/
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>