Re: [HACKERS] Improving DISTINCT with LooseScan node

2017-09-18 Thread Dmitriy Sarafannikov

> It seems related to this thread? :
> https://www.postgresql.org/message-id/flat/5037A9C5.4030701%40optionshouse.com#5037a9c5.4030...@optionshouse.com
> 
> And this wiki page : https://wiki.postgresql.org/wiki/Loose_indexscan

Yep. Now i can see 2 use cases for this feature:
1. DISTINCT queries.
2. Effectively scanning multicolumn index if first column is omitted and has 
low cardinality 


> Not an answer to your question, but generally +1 for working on this
> area.  I did some early proto-hacking a while ago, which I haven't had
> time to get back to yet:
> 
> https://www.postgresql.org/message-id/flat/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww%40mail.gmail.com
>  
> 
> 
> That was based on the idea that a DISTINCT scan using a btree index to
> skip ahead is always going to be using the leading N columns and
> always going to be covered by the index, so I might as well teach
> Index Only Scan how to do it directly rather than making a new node to
> sit on top.  As you can see in that thread I did start thinking about
> using a new node to sit on top and behave a bit like a nested loop for
> the more advanced feature of an Index Skip Scan (trying every value of
> (a) where you had an index on (a, b, c) but had a WHERE clause qual on
> (b, c)), but the only feedback I had so far was from Robert Haas who
> thought that too should probably be pushed into the index scan.

Thank you for information, i will look at this thread.

> FWIW I'd vote for 'skip' rather than 'loose' as a term to use in this
> family of related features (DISTINCT being the easiest to build).  It
> seems more descriptive than the MySQL term.  (DB2 added this a little
> while ago and went with 'index jump scan', suggesting that we should
> consider 'hop'... (weak humour, 'a hop, skip and a jump' being an
> English idiom meaning a short distance)).

Maybe skip would be better, but there will be no problems with something like 
patents?
I mean database which name beginning with big letter «O»? As i know, it has 
Index Skip Scan.




Re: [HACKERS] Improving DISTINCT with LooseScan node

2017-09-18 Thread Adrien Nayrat
On 09/17/2017 07:43 PM, Dmitriy Sarafannikov wrote:
[...]

It seems related to this thread? :
https://www.postgresql.org/message-id/flat/5037A9C5.4030701%40optionshouse.com#5037a9c5.4030...@optionshouse.com

And this wiki page : https://wiki.postgresql.org/wiki/Loose_indexscan


Regards,


-- 
Adrien NAYRAT

http://dalibo.com - http://dalibo.org



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Improving DISTINCT with LooseScan node

2017-09-17 Thread Thomas Munro
On Mon, Sep 18, 2017 at 5:43 AM, Dmitriy Sarafannikov
 wrote:
> Hi hackers,
>
> Everybody knows, that we have unefficient execution of query like "SELECT
> DISTINCT id from mytable"
> if table has many-many rows and only several unique id values. Query plan
> looks like Unique + IndexScan.
>
> I have tried to implement this feature in new type of node called Loose
> Scan.
> This node must appears in plan together with IndexScan or IndexOnlyScan just
> like Unique node in this case.
> But instead of filtering rows with equal values, LooseScan must retreive
> first row from IndexScan,
> then remember and return this. With all subsequent calls LooseScan must
> initiate calling index_rescan via ExecReScan
> with search value that > or < (depending on scan direction) of previous
> value.
> Total cost of this path must be something like total_cost =
> n_distinct_values * subpath->startup_cost
> What do you think about this idea?
>
> I was able to create new LooseScan node, but i ran into difficulties with
> changing scan keys.
> I looked (for example) on the ExecReScanIndexOnlyScan function and I find it
> difficult to understand
> how i can reach the ioss_ScanKeys through the state of executor. Can you
> help me with this?
>
> I also looked on the Nested Loop node, which as i think must change scan
> keys, but i didn't become clear for me.
> The only thought that came to my head, that maybe i incorrectly create this
> subplan.
> I create it just like create_upper_unique_plan, and create subplan for
> IndexScan via create_plan_recurse.
> Can you tell me where to look or maybe somewhere there are examples?

Not an answer to your question, but generally +1 for working on this
area.  I did some early proto-hacking a while ago, which I haven't had
time to get back to yet:

https://www.postgresql.org/message-id/flat/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww%40mail.gmail.com

That was based on the idea that a DISTINCT scan using a btree index to
skip ahead is always going to be using the leading N columns and
always going to be covered by the index, so I might as well teach
Index Only Scan how to do it directly rather than making a new node to
sit on top.  As you can see in that thread I did start thinking about
using a new node to sit on top and behave a bit like a nested loop for
the more advanced feature of an Index Skip Scan (trying every value of
(a) where you had an index on (a, b, c) but had a WHERE clause qual on
(b, c)), but the only feedback I had so far was from Robert Haas who
thought that too should probably be pushed into the index scan.

FWIW I'd vote for 'skip' rather than 'loose' as a term to use in this
family of related features (DISTINCT being the easiest to build).  It
seems more descriptive than the MySQL term.  (DB2 added this a little
while ago and went with 'index jump scan', suggesting that we should
consider 'hop'... (weak humour, 'a hop, skip and a jump' being an
English idiom meaning a short distance)).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Improving DISTINCT with LooseScan node

2017-09-17 Thread Dmitriy Sarafannikov
Hi hackers, Everybody knows, that we have unefficient execution of query like "SELECT DISTINCT id from mytable"if table has many-many rows and only several unique id values. Query plan looks like Unique + IndexScan. I have tried to implement this feature in new type of node called Loose Scan.This node must appears in plan together with IndexScan or IndexOnlyScan just like Unique node in this case.But instead of filtering rows with equal values, LooseScan must retreive first row from IndexScan,then remember and return this. With all subsequent calls LooseScan must initiate calling index_rescan via ExecReScanwith search value that > or < (depending on scan direction) of previous value.Total cost of this path must be something like total_cost = n_distinct_values * subpath->startup_costWhat do you think about this idea? I was able to create new LooseScan node, but i ran into difficulties with changing scan keys.I looked (for example) on the ExecReScanIndexOnlyScan function and I find it difficult to understandhow i can reach the ioss_ScanKeys through the state of executor. Can you help me with this? I also looked on the Nested Loop node, which as i think must change scan keys, but i didn't become clear for me.The only thought that came to my head, that maybe i incorrectly create this subplan.I create it just like create_upper_unique_plan, and create subplan for IndexScan via create_plan_recurse.Can you tell me where to look or maybe somewhere there are examples? Thanks Regards,Dmitriy Sarafannikov