[HACKERS] Improving DISTINCT with LooseScan node

Hi hackers,

Everybody knows that we execute queries like "SELECT DISTINCT id FROM mytable" inefficiently

when the table has a great many rows but only a few distinct id values. The query plan looks like Unique + IndexScan.

I have tried to implement an improvement for this case as a new type of node called LooseScan.

This node must appear in the plan together with an IndexScan or IndexOnlyScan, just like the Unique node does in this case.

But instead of filtering out rows with equal values, LooseScan must retrieve the first row from the IndexScan,

then remember it and return it. On every subsequent call, LooseScan must trigger index_rescan via ExecReScan

with a search value that is > or < (depending on the scan direction) the previous value.
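
To make the skipping behaviour concrete, here is a minimal sketch in Python, outside PostgreSQL. It assumes the index can be modelled as an ascending sorted list and that a binary search stands in for the rescan with a "> previous value" key. The function name loose_scan_distinct is hypothetical, and only the forward scan direction is covered; a backward scan would search for "< previous value" instead.

```python
from bisect import bisect_right


def loose_scan_distinct(index_keys):
    """Simulate a forward loose scan over an ascending sorted list of keys.

    Returns (distinct_values, probes), where probes counts how many times
    the simulated index was re-positioned with a "> previous value" key.
    This is a model of the idea only, not PostgreSQL executor code.
    """
    distinct = []
    probes = 0
    pos = 0  # position of the first index entry
    while pos < len(index_keys):
        value = index_keys[pos]   # first row for this key: remember it
        distinct.append(value)    # ... and return it
        # "Rescan": jump to the first entry strictly greater than value,
        # skipping all remaining duplicates in one binary search.
        pos = bisect_right(index_keys, value, pos)
        probes += 1
    return distinct, probes


if __name__ == "__main__":
    keys = [1] * 1000 + [2] * 1000 + [3] * 1000
    values, probes = loose_scan_distinct(keys)
    print(values, probes)  # [1, 2, 3] 3
```

In this model the number of probes equals the number of distinct values, not the number of rows, which is the property the proposed node relies on.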

The total cost of this path should be something like total_cost = n_distinct_values * subpath->startup_cost
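
As a rough illustration of that estimate, the sketch below simply evaluates the formula. The function name and the input numbers are made up for the example; they are not planner output, and a real cost function would likely need additional terms.

```python
def loose_scan_total_cost(n_distinct_values, subpath_startup_cost):
    """Evaluate total_cost = n_distinct_values * subpath->startup_cost.

    Both inputs are hypothetical here; in the planner they would come from
    statistics (n_distinct) and from the cost of the index subpath.
    """
    return n_distinct_values * subpath_startup_cost


if __name__ == "__main__":
    # Hypothetical inputs: 4 distinct ids, startup cost 0.5 per index descent.
    print(loose_scan_total_cost(4, 0.5))  # 2.0
```

The point of the formula is that the estimate grows with the number of distinct values and does not depend on the total number of rows in the table.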

What do you think about this idea?

I was able to create the new LooseScan node, but I ran into difficulties with changing the scan keys.

I looked at (for example) the ExecReScanIndexOnlyScan function, and I find it difficult to understand

how I can reach ioss_ScanKeys through the executor state. Can you help me with this?

I also looked at the Nested Loop node, which I think must change scan keys as well, but it did not become clear to me.

The only thought that came to mind is that maybe I am creating this subplan incorrectly.

I create it the same way create_upper_unique_plan does, and create the subplan for the IndexScan via create_plan_recurse.

Can you tell me where to look, or whether there are examples somewhere?

Thanks

Regards,

Dmitriy Sarafannikov
