On Mon, Sep 18, 2017 at 5:43 AM, Dmitriy Sarafannikov
<dsarafanni...@yandex.ru> wrote:
> Hi hackers,
> Everybody knows, that we have unefficient execution of query like "SELECT
> DISTINCT id from mytable"
> if table has many-many rows and only several unique id values. Query plan
> looks like Unique + IndexScan.
> I have tried to implement this feature in new type of node called Loose
> Scan.
> This node must appears in plan together with IndexScan or IndexOnlyScan just
> like Unique node in this case.
> But instead of filtering rows with equal values, LooseScan must retreive
> first row from IndexScan,
> then remember and return this. With all subsequent calls LooseScan must
> initiate calling index_rescan via ExecReScan
> with search value that > or < (depending on scan direction) of previous
> value.
> Total cost of this path must be something like total_cost =
> n_distinct_values * subpath->startup_cost
> What do you think about this idea?
> I was able to create new LooseScan node, but i ran into difficulties with
> changing scan keys.
> I looked (for example) on the ExecReScanIndexOnlyScan function and I find it
> difficult to understand
> how i can reach the ioss_ScanKeys through the state of executor. Can you
> help me with this?
> I also looked on the Nested Loop node, which as i think must change scan
> keys, but i didn't become clear for me.
> The only thought that came to my head, that maybe i incorrectly create this
> subplan.
> I create it just like create_upper_unique_plan, and create subplan for
> IndexScan via create_plan_recurse.
> Can you tell me where to look or maybe somewhere there are examples?

Not an answer to your question, but generally +1 for working on this
area.  I did some early proto-hacking a while ago, which I haven't had
time to get back to yet:


That was based on the idea that a DISTINCT scan using a btree index to
skip ahead is always going to be using the leading N columns and
always going to be covered by the index, so I might as well teach
Index Only Scan how to do it directly rather than making a new node to
sit on top.  As you can see in that thread I did start thinking about
using a new node to sit on top and behave a bit like a nested loop for
the more advanced feature of an Index Skip Scan (trying every value of
(a) where you had an index on (a, b, c) but had a WHERE clause qual on
(b, c)), but the only feedback I had so far was from Robert Haas who
thought that too should probably be pushed into the index scan.

FWIW I'd vote for 'skip' rather than 'loose' as a term to use in this
family of related features (DISTINCT being the easiest to build).  It
seems more descriptive than the MySQL term.  (DB2 added this a little
while ago and went with 'index jump scan', suggesting that we should
consider 'hop'... (weak humour, 'a hop, skip and a jump' being an
English idiom meaning a short distance)).

Thomas Munro

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to