[HACKERS] How far are projections pushed down the execution tree?

2010-03-02 Thread tmp
Consider a table and a query referring to only a subset of the columns 
in that table. How early in the query evaluation is the projection 
carried out?


Are the columns to be selected filtered out as early as in the very 
access method that reads the table rows from the buffer, or is the 
projection handled later, after the whole row has been fetched by the 
access method?


Does it depend on the complexity of the query how far down the tree 
the projection is carried out?


Thanks!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How far are projections pushed down the execution tree?

2010-03-02 Thread tmp

Thanks for the clarification!



Re: [HACKERS] Optimizing DISTINCT with LIMIT

2008-12-16 Thread tmp

> You could add it to here -- note that if we decide it isn't worth it it'll
> just get removed.


Which category would you recommend? Optimizer / Executor?



Re: [HACKERS] Optimizing DISTINCT with LIMIT

2008-12-05 Thread tmp

> I would tend to think it's worth it myself.


Unfortunately, I am not familiar enough with the PostgreSQL code base to 
be comfortable providing a patch. Can I submit this optimization 
request to some sort of issue tracker, or what else should I do?




[HACKERS] Optimizing DISTINCT with LIMIT

2008-12-04 Thread tmp

As far as I understand, the following query
  SELECT DISTINCT foo
  FROM bar
  LIMIT baz
is done by first sorting the input and then traversing the sorted data, 
ensuring uniqueness of output and stopping when the LIMIT threshold is 
reached. Furthermore, the sort itself has to traverse the entire input 
at least once.
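
The sort-based strategy described above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL's executor code; the function name distinct_limit_by_sort and the use of a plain list as "input" are assumptions made for the example.

```python
from typing import List, Sequence


def distinct_limit_by_sort(rows: Sequence[int], limit: int) -> List[int]:
    """Hypothetical sketch: DISTINCT ... LIMIT via sort, then unique, then stop."""
    # The sort must consume the whole input before the first row can be emitted.
    ordered = sorted(rows)
    out: List[int] = []
    for value in ordered:
        if len(out) >= limit:
            break  # LIMIT reached; the remaining sorted rows are ignored
        # In sorted order duplicates are adjacent, so comparing with the
        # previously emitted value is enough to guarantee uniqueness.
        if not out or out[-1] != value:
            out.append(value)
    return out
```

Even with a tiny limit, the call to sorted() reads and orders every input row first, which is the cost in question here.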


Now, if the input is large but the LIMIT threshold is small, this 
sorting step may increase the query time unnecessarily, so here is a 
suggestion for optimization:
  If the input is sufficiently large and the LIMIT threshold 
sufficiently small, maintain the DISTINCT output by hashing while 
traversing the input and stop when the LIMIT threshold is reached. No 
sorting is required, and the input is read *at* *most* once.
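
A minimal sketch of the suggested hashing strategy, assuming the input is any iterable of hashable values. It is written in Python purely for illustration; the names distinct_limit_by_hash and counting are hypothetical and do not correspond to anything in the PostgreSQL source.

```python
from typing import Hashable, Iterable, Iterator, List


def distinct_limit_by_hash(rows: Iterable[Hashable], limit: int) -> Iterator[Hashable]:
    """Hypothetical sketch: emit each new value at once, stop at the limit."""
    if limit <= 0:
        return
    seen = set()  # plays the role of the hash table
    for value in rows:
        if value not in seen:
            seen.add(value)
            yield value  # a new distinct value is returned immediately
            if len(seen) >= limit:
                return  # LIMIT reached; the rest of the input is never read


def counting(rows: Iterable[Hashable], counter: List[int]) -> Iterator[Hashable]:
    """Helper for the example only: counts how many input rows were read."""
    for row in rows:
        counter[0] += 1
        yield row


rows_read = [0]
result = list(distinct_limit_by_hash(counting([3, 3, 1, 2, 1], rows_read), 2))
# Here result holds the first two distinct values in input order, and
# rows_read[0] shows that only part of the input was consumed.
```

The output order follows the input rather than sort order, which should be acceptable because a query without ORDER BY does not promise any particular order. In the worst case (fewer distinct values than the limit) the whole input is still read, but only once.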


Use case: Websites that need to present small samples of huge queries fast.



Re: [HACKERS] Optimizing DISTINCT with LIMIT

2008-12-04 Thread tmp

> In principle, if there are no aggregate functions, then nodeAgg could
> return a row immediately upon making any new entry into the hash table.
> Whether it's worth the code uglification is debatable ... I think it
> would require a third major pathway through nodeAgg.


Regarding whether it's worth the effort: In each of my three past jobs 
(all using PostgreSQL) I have met several queries that would fetch a 
small subset of a large - even huge - input. I think these types of 
queries are relatively common out there, but if they are executed for 
e.g. a web client, the current late LIMIT evaluation makes them simply 
a no-go.


Also, it is my impression that many people use LIMIT to minimize the 
evaluation time of subqueries from which the outer query only needs a 
small subset of the output.

