Re: [HACKERS] Improving DISTINCT with LooseScan node

2017-09-18 Thread Dmitriy Sarafannikov

> It seems related to this thread? :
> https://www.postgresql.org/message-id/flat/5037A9C5.4030701%40optionshouse.com#5037a9c5.4030...@optionshouse.com
> 
> And this wiki page : https://wiki.postgresql.org/wiki/Loose_indexscan

Yep. Now i can see 2 use cases for this feature:
1. DISTINCT queries.
2. Effectively scanning multicolumn index if first column is omitted and has 
low cardinality 


> Not an answer to your question, but generally +1 for working on this
> area.  I did some early proto-hacking a while ago, which I haven't had
> time to get back to yet:
> 
> https://www.postgresql.org/message-id/flat/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww%40mail.gmail.com
>  
> 
> 
> That was based on the idea that a DISTINCT scan using a btree index to
> skip ahead is always going to be using the leading N columns and
> always going to be covered by the index, so I might as well teach
> Index Only Scan how to do it directly rather than making a new node to
> sit on top.  As you can see in that thread I did start thinking about
> using a new node to sit on top and behave a bit like a nested loop for
> the more advanced feature of an Index Skip Scan (trying every value of
> (a) where you had an index on (a, b, c) but had a WHERE clause qual on
> (b, c)), but the only feedback I had so far was from Robert Haas who
> thought that too should probably be pushed into the index scan.

Thank you for information, i will look at this thread.

> FWIW I'd vote for 'skip' rather than 'loose' as a term to use in this
> family of related features (DISTINCT being the easiest to build).  It
> seems more descriptive than the MySQL term.  (DB2 added this a little
> while ago and went with 'index jump scan', suggesting that we should
> consider 'hop'... (weak humour, 'a hop, skip and a jump' being an
> English idiom meaning a short distance)).

Maybe skip would be better, but there will be no problems with something like 
patents?
I mean database which name beginning with big letter «O»? As i know, it has 
Index Skip Scan.




Re: [HACKERS] Improving DISTINCT with LooseScan node

2017-09-18 Thread Adrien Nayrat
On 09/17/2017 07:43 PM, Dmitriy Sarafannikov wrote:
[...]

It seems related to this thread? :
https://www.postgresql.org/message-id/flat/5037A9C5.4030701%40optionshouse.com#5037a9c5.4030...@optionshouse.com

And this wiki page : https://wiki.postgresql.org/wiki/Loose_indexscan


Regards,


-- 
Adrien NAYRAT

http://dalibo.com - http://dalibo.org



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Improving DISTINCT with LooseScan node

2017-09-17 Thread Thomas Munro
On Mon, Sep 18, 2017 at 5:43 AM, Dmitriy Sarafannikov
 wrote:
> Hi hackers,
>
> Everybody knows, that we have unefficient execution of query like "SELECT
> DISTINCT id from mytable"
> if table has many-many rows and only several unique id values. Query plan
> looks like Unique + IndexScan.
>
> I have tried to implement this feature in new type of node called Loose
> Scan.
> This node must appears in plan together with IndexScan or IndexOnlyScan just
> like Unique node in this case.
> But instead of filtering rows with equal values, LooseScan must retreive
> first row from IndexScan,
> then remember and return this. With all subsequent calls LooseScan must
> initiate calling index_rescan via ExecReScan
> with search value that > or < (depending on scan direction) of previous
> value.
> Total cost of this path must be something like total_cost =
> n_distinct_values * subpath->startup_cost
> What do you think about this idea?
>
> I was able to create new LooseScan node, but i ran into difficulties with
> changing scan keys.
> I looked (for example) on the ExecReScanIndexOnlyScan function and I find it
> difficult to understand
> how i can reach the ioss_ScanKeys through the state of executor. Can you
> help me with this?
>
> I also looked on the Nested Loop node, which as i think must change scan
> keys, but i didn't become clear for me.
> The only thought that came to my head, that maybe i incorrectly create this
> subplan.
> I create it just like create_upper_unique_plan, and create subplan for
> IndexScan via create_plan_recurse.
> Can you tell me where to look or maybe somewhere there are examples?

Not an answer to your question, but generally +1 for working on this
area.  I did some early proto-hacking a while ago, which I haven't had
time to get back to yet:

https://www.postgresql.org/message-id/flat/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww%40mail.gmail.com

That was based on the idea that a DISTINCT scan using a btree index to
skip ahead is always going to be using the leading N columns and
always going to be covered by the index, so I might as well teach
Index Only Scan how to do it directly rather than making a new node to
sit on top.  As you can see in that thread I did start thinking about
using a new node to sit on top and behave a bit like a nested loop for
the more advanced feature of an Index Skip Scan (trying every value of
(a) where you had an index on (a, b, c) but had a WHERE clause qual on
(b, c)), but the only feedback I had so far was from Robert Haas who
thought that too should probably be pushed into the index scan.

FWIW I'd vote for 'skip' rather than 'loose' as a term to use in this
family of related features (DISTINCT being the easiest to build).  It
seems more descriptive than the MySQL term.  (DB2 added this a little
while ago and went with 'index jump scan', suggesting that we should
consider 'hop'... (weak humour, 'a hop, skip and a jump' being an
English idiom meaning a short distance)).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers