Re: [HACKERS] Sorting. When?
So, invoking or not invoking sorting depends on different parameters of the specific DBMS, does it? This also means that it depends on the specific implementation of the Planner and, as a consequence, *on the specific DBMS*? I mean, different DBMS can chose differently on invoking sorting even if they are executing the same query over the same set of data? Fava. Il 11/02/2011 22:49, Nicolas Barbier ha scritto: 2011/2/11 Kevin Grittner: "mac_man2...@yahoo.it" wrote: I need to know, from an algorithmic point of view, in which cases sorting is invoked. [..] Are your really looking to categorize the types of queries where sorting is *invoked*, or the ones where it is *considered*? Or perhaps only those where it is *required*, since there are no possible plans without sorting? Or, if you are seeking the exact rules that are used by the planner to determine all possible plans from which the one with minimum cost is chosen (and hence all ways in which sorts can be used), I think that the source code is the only complete reference. A non-complete introduction is: http://www.postgresql.org/docs/9.0/static/planner-optimizer.html> Basically, physical operators (seq scan, index scan, hash join, merge join, nested loop, filter, set operation, etc) may require their input to satisfy certain sort constraints (for example, both inputs of a merge join need to be sorted on the join attribute(s)). If it happens to be of lowest cost to explicitly sort the inputs right before consuming them, that will be done. If there is a way to get the same input in an already-ordered way (for example an index scan, or the output of a merge join), so that the cost is less than the non-ordered way + an explicit sort, then that already-ordered way will be chosen. Super-basic example: SELECT * FROM t ORDER BY a; This may either perform a seq scan of table t and then do an explicit sort, or perform a full index scan of an index on attribute a (provided that such an index exists), in which case the explicit sort is not needed because the index scan will deliver the rows in already-sorted order. Which option is chosen depends on the cost: The costs of both plans are calculated and the least costly plan is chosen. See the (non-exhaustive) list of things that influence the costs provided by Kevin to get a feeling for how many variables there are that influence this choice. Nicolas
Re: [HACKERS] Sorting. When?
Nicolas, thanks. Unfortunately I don't think I can get precise infos from that link. That "explains" how the EXPLAIN works, while I need to know, from an algorithmic point of view, in which cases sorting is invoked. Actually, maybe I can spend some time in trying to perform samples queries and trying to deduce which operations calls the sorting module. But I think it is not the most effective way to do that, since that would mean running a bounch of queries for different values of work_mem, or for different size of the involved tables. Even if I try to do that, some cases can not be evident to my sight. I am searching for someone telling me (how to get) a list of operations invoking sorting, and in which cases they do that. Just for example: - ORDER BY, always invokes sorting. - DISTINCT, always invokes sorting - Merge Join, just in case (..bla bla bla..) - ... Is it possible? Any other suggestion? Thanks for your time. Best regards. Fava Il 11/02/2011 11:50, Nicolas Barbier ha scritto: [ Please don't top-post.http://en.wikipedia.org/wiki/Posting_style> ] 2011/2/11 mac_man2...@yahoo.it: So, is there any precise way to discover when sorting is invoked? EXPLAIN shows how a query would be executed; explicit sorts should be mostly obvious. http://www.postgresql.org/docs/9.0/static/sql-explain.html> Nicolas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sorting. When?
Thank you all for your replies. So, is there any precise way to discover when sorting is invoked? Thanks. Regards. Fava Il 11/02/2011 01:17, Robert Haas ha scritto: On Thu, Feb 10, 2011 at 6:21 PM, Nicolas Barbier wrote: 2011/2/10 mac_man2...@yahoo.it: Which operations invoke the sorting algorithms implemented in the sorting module (tuplesort.c) ? Of course, one of those operations invoking sorting is the ORDER BY clause and the DISTINCT too. Moreover, the Merge Join should be implemented invoking sorting. Is there any other operation invoking sorting? AFAIK, all set operators except for UNION ALL. (I am probably missing a whole boatload of other things.) Merge joins don't necessarily involve a sort - you could do a merge over a pair of index scans, for example. Set operations can be implemented using hashing or sorting, too. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Sorting. When?
Hi. Which operations invoke the sorting algorithms implemented in the sorting module (tuplesort.c) ? Of course, one of those operations invoking sorting is the ORDER BY clause and the DISTINCT too. Moreover, the Merge Join should be implemented invoking sorting. Is there any other operation invoking sorting? Thanks. Regards. Fava -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Disk caching
Hi to all. I am trying to see how PostgreSQL performance changes on the basis of work_mem. So, I am going to execute the 22 queries of TPCH (http://www.tpc.org/tpch/) again and again, each time for a different value of work_mem. Since I am interested just in work_mem variations, I should prevent each query to take advantages from revious executions of the 22 queries them selves. For example, taking cache advantages. So, taking into account that the 22 queries are those http://pastebin.com/7Dg50YRZ and are executed on tables of hundreds of MB and 1) Is it sufficient to run change the values of work_mem through psql and running the queries again without restarting postgres? 2) Or, should I restart postgres? 3) Or, shoud I restart the machine each time I execute the 22 queries? Thanks for your time. Regards. Manolo. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers