Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:pgsql-hackers-[EMAIL PROTECTED]] On Behalf Of Alvaro Herrera
Sent: Friday, April 13, 2007 4:24 PM
To: Tom Lane
Cc: [EMAIL PROTECTED]; PostgreSQL Performance; Steve
Subject: Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)

Tom Lane wrote:
> One idea I thought about was to sort by index scan cost, using
> selectivity only as a tiebreaker for cost, rather than the other way
> around as is currently done.  This seems fairly plausible because
> indexscans that are cheaper than other indexscans likely return fewer
> rows too, and so selectivity is already accounted for to some extent ---
> at least you can't have an enormously worse selectivity at lower cost,
> whereas Steve's example proves it doesn't work the other way.  But I'm
> worried about breaking the reasoning about redundant indexes that's
> mentioned in the comments.
>
> Another alternative that would respond to the immediate problem is to
> maintain the current sort order, but as we come to each index, consider
> using that one alone, and throw away whatever AND we might have built up
> if that one alone beats the AND-so-far.  This seems more conservative,
> as it's unlikely to break any cases that work well now, but on the other
> hand it feels like plastering another wart atop a structure that's
> already rather rickety.
>
> Has anyone got any thoughts about the best way to do this?

How about doing both: sort the indexes by index scan cost; then pick the
first index on the list and start adding indexes when they lower the cost.
When adding each index, consider it by itself against the already stacked
indexes.  If the cost is lower, put this index at the top of the list, and
restart the algorithm (after the sorting step, of course).

I think the concern about condition redundancy should be attacked
separately.  How about just comparing whether they have common prefixes
of conditions?
I admit I don't understand what would happen with indexes defined like
(lower(A), B, C) versus (A, B), for example.

Instead of sorting, I suggest the quickselect() algorithm, which is O(n).
Probably, if the list is small, it won't matter much, but it might offer
some tangible benefit.  Here is an example of the algorithm:

#include <stdlib.h>

typedef double Etype;           /* Season to taste. */

extern Etype RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t RandRange(size_t a, size_t b);
extern size_t RandomPartition(Etype * A, size_t p, size_t r);
extern size_t Partition(Etype * A, size_t p, size_t r);

/*
** In the following code, every reference to CLR means:
**
**    "Introduction to Algorithms"
**    By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**    ISBN 0-07-013143-0
*/

/*
** CLR, page 187
*/
Etype
RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
    size_t      q,
                k;

    if (p == r)
        return A[p];
    q = RandomPartition(A, p, r);
    k = q - p + 1;

    if (i <= k)
        return RandomSelect(A, p, q, i);
    else
        return RandomSelect(A, q + 1, r, i - k);
}

size_t
RandRange(size_t a, size_t b)
{
    size_t      c = (size_t) ((double) rand() / ((double) RAND_MAX + 1) * (b - a));

    return c + a;
}

/*
** CLR, page 162
*/
size_t
RandomPartition(Etype A[], size_t p, size_t r)
{
    size_t      i = RandRange(p, r);
    Etype       Temp;

    Temp = A[p];
    A[p] = A[i];
    A[i] = Temp;
    return Partition(A, p, r);
}

/*
** CLR, page 154
*/
size_t
Partition(Etype A[], size_t p, size_t r)
{
    Etype       x,
                temp;
    size_t      i,
                j;

    x = A[p];
    i = p - 1;
    j = r + 1;

    for (;;)
    {
        do
        {
            j--;
        } while (!(A[j] <= x));
        do
        {
            i++;
        } while (!(A[i] >= x));

        if (i < j)
        {
            temp = A[i];
            A[i] = A[j];
            A[j] = temp;
        }
        else
            return j;
    }
}

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Table function support
Hello

I searched for some notes about this topic.  I didn't find any useful
sample.  Lots of samples are about external stored procedures, and others
are about using a table expression like

  create function foo(i1) returns table (a1 int)
  as return table(select a1 from tab)

It isn't clear whether the table attributes are related to output
variables; nobody joins the two together.

SQL/PSM sample:

  create function accounts_of (customer_name char(20))
    returns table ( account_number char(10),
                    branch_name char(15),
                    balance numeric(12,2))
    return table (select account_number, branch_name, balance
                  from account A
                  where exists (
                      select * from depositor D
                      where D.customer_name = accounts_of.customer_name
                        and D.account_number = A.account_number ))

The correct way of calling it is:

  select * from table (accounts_of ('Smith'))

Next sample:

  CREATE FUNCTION filmtyp (art CHAR(2))
    RETURNS TABLE (titel VARCHAR(75), jahr INTEGER)
    LANGUAGE SQL
    READS SQL DATA
    NO EXTERNAL ACTION
    DETERMINISTIC
    RETURN SELECT titel, jahr
           FROM film
           WHERE film.art = filmtyp.art

Table functions are also known as parametrised views.  I don't think using
OUT variables is a good idea, because you will have problems with column
names, which is a problem for plpgsql.

http://www.wiscorp.com/SQL2003Features.pdf
http://wwwdvs.informatik.uni-kl.de/courses/NEDM/SS2004/Vorlesungsunterlagen/NEDM.Chapter.03.User-defined_Routines_and_Object_Behavior.pdf

Regards
Pavel Stehule

From: Tom Lane [EMAIL PROTECTED]
To: Pavel Stehule [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: [PATCHES] Table function support
Date: Tue, 10 Apr 2007 18:17:14 -0400

Pavel Stehule [EMAIL PROTECTED] writes:
> this patch allows using SQL2003 syntax for set returning functions.  It
> is based on using a new type of argmode - PROARGMODE_TABLE.

I've been looking at this, and my feeling is that we should drop the
PROARGMODE_TABLE business and just define RETURNS TABLE(x int, y int) as
exactly equivalent to RETURNS SETOF RECORD with x and y treated as OUT
parameters.
There isn't any advantage to distinguishing the cases that outweighs
breaking client code that looks at pg_proc.proargmodes.  I don't believe
that the SQL spec prevents us from exposing those parameter names to PL
functions, especially since none of our PLs are in the standard at all.

regards, tom lane
Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
Dann Corbit [EMAIL PROTECTED] writes:
> Instead of sorting, I suggest the quickselect() algorithm, which is O(n).

What for?  Common cases have fewer than half a dozen entries.  That is
not the place we need to be spending engineering effort --- what we need
to worry about is the choice algorithm, not implementation details.

regards, tom lane
Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
Alvaro Herrera [EMAIL PROTECTED] writes:
> I think the concern about condition redundancy should be attacked
> separately.  How about just comparing whether they have common prefixes
> of conditions?  I admit I don't understand what would happen with
> indexes defined like (lower(A), B, C) versus (A, B), for example.

I understand that issue a bit better than I did when I wrote the comment
(so I suppose I'd better rewrite it).  The $64 reason for rejecting
AND-combinations of indexes that are using some of the same
WHERE-conditions is that if we don't, we effectively double-count the
selectivity of those conditions, causing us to prefer useless
AND-combinations.  An example is WHERE A > 42 AND B < 100 where we have
an index on A and one on (A,B).  The selectivity calculation will blindly
assume that the selectivities of the two indexes are independent and thus
prefer to BitmapAnd them, when obviously there is no point in using both.
Ideally we should improve the selectivity calculation to not get fooled
like this, but that seems hard and probably slow.  So for the moment we
have the heuristic that no WHERE-clause should be used twice in any
AND-combination.

Given that we are using that heuristic, it becomes important that we
visit the indexes in the right order --- as the code stands, in the (A)
vs (A,B) case it will consider only the first index it arrives at in the
selectivity sort order, because the second will be rejected on the basis
of re-use of the WHERE A > 42 condition.  Sorting by selectivity tends to
ensure that we pick the index that can make use of as many WHERE-clauses
as possible.

The idea of considering each index alone fixes the order dependency for
cases where a single index is the best answer, but it doesn't help much
for cases where you really do want a BitmapAnd, only not one using the
index with the individually best selectivity.
We really need a heuristic here --- exhaustive search will be impractical
in cases where there are many indexes, because you'd be looking at 2^N
combinations.  (In Steve's example there are actually 38 relevant
indexes, which is bad database design if you ask me, but in any case we
cannot afford to search through 2^38 possibilities.)  But the one we're
using now is too fragile.

Maybe we should use a cutoff similar to the GEQO one: do exhaustive
search if there are fewer than N relevant indexes, for some N.  But
that's not going to help Steve; we still need a smarter heuristic for
what to look for above the cutoff.

regards, tom lane
Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
Alvaro Herrera [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Has anyone got any thoughts about the best way to do this?

> How about doing both: sort the indexes by index scan cost; then pick the
> first index on the list and start adding indexes when they lower the
> cost.  When adding each index, consider it by itself against the already
> stacked indexes.  If the cost is lower, put this index at the top of the
> list, and restart the algorithm (after the sorting step, of course).

The restart part of that bothers me --- it's not entirely clear that you
couldn't get into an infinite loop.  (Imagine that A+B is better than A
alone, so we adopt it, but worse than C alone, so we take C as the new
leader and start over.  Then perhaps C+B is better than C alone but worse
than A alone, so we take A as the leader and start over.  Maybe this is
impossible, but I'm unsure.)

I looked back at the gdb results I'd gotten from Steve's example and
noticed that for his 38 indexes there were only three distinct index
selectivity values, and the sort step grouped indexes by cost within
those groups.  In hindsight of course this is obvious: the selectivity
depends on the set of WHERE-clauses used, so with two WHERE-clauses there
are three possible selectivities (to within roundoff error anyway)
depending on whether the index uses one or both clauses.  So the existing
algorithm gets one thing right: for any two indexes that make use of just
the same set of WHERE-clauses, it will always take the one with cheaper
scan cost.

Thinking more about this leads me to the following proposal:

1. Explicitly group the indexes according to the subset of
WHERE-conditions (and partial index conditions, if any) they use.
Within each such group, discard all but the cheapest-scan-cost one.

2. Sort the remaining indexes according to scan cost.

3. For each index in order, consider it as a standalone scan, and also
consider adding it on to the AND-group led by each preceding index,
using the same logic as now: reject using any WHERE-condition twice in a
group, and then add on only if the total cost of the AND-group scan is
reduced.

This would be approximately O(N^2) in the number of relevant indexes,
whereas the current scheme is roughly linear (handwaving a bit here
because the number of WHERE-clauses is a factor too).  But that seems
unlikely to be a problem for sane numbers of indexes, unlike the O(2^N)
behavior of an exhaustive search.  It would get rid of (most of) the
order-sensitivity problem, since we would definitely consider the
AND-combination of every pair of combinable indexes.  I can imagine cases
where it wouldn't notice useful three-or-more-way combinations because
the preceding two-way combination didn't win, but they seem pretty remote
possibilities.

Comments?

regards, tom lane
Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
On Fri, 2007-04-13 at 18:48 -0400, Tom Lane wrote:
> Has anyone got any thoughts about the best way to do this?

I don't think we know enough to pick one variant that works in all
cases; this requires more detailed analysis of various cases.  Let's put
in a parameter to allow the options to be varied.  The purpose would be
to look for some more information that allows us to see what the
pre-conditions would be for each heuristic.  Initially, we'd say this is
a beta-only feature that may be withdrawn in the production version.

"Why did it not pick my index?" is a tiring game, but one that must be
played.  I hope that the ORDER BY ... LIMIT optimization will reduce the
number of indexes chosen to maintain sort order in the output.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Makefile patch to make gcov work on Postgres contrib modules
Peter Eisentraut [EMAIL PROTECTED] writes:
> On Thursday, 12 April 2007 17:08, Gregory Stark wrote:
>> Unless there's a makefile variable that is included in both CFLAGS and
>> LDFLAGS that the user could use instead?

> But then that wouldn't work on architectures where ld is used instead
> of gcc for linking.  Perhaps you should start by defining which
> situation you want to achieve.

There are two ways to get gcov to work: either you add -lgcov to the end
of the linking step, or you use the same -f flags at link time that you
used at the compile stage.  So what I would like to happen is something
like:

  CFLAGS='-fprofile-arcs -ftest-coverage -O0 -g' ./configure --enable-debug --enable-cassert --enable-depend
  make
  make check
  gcov src/backend/access/common/heaptuple.c

Perhaps the flags need to be in a separate variable instead of CFLAGS,
one specifically advertised to ensure the flags show up on both the
compile and link lines.

--
greg
Re: [HACKERS] Server-side support of all encodings
> Tatsuo Ishii [EMAIL PROTECTED] writes:
>> Sigh.  From the first day when JOHAB was supported (back in the 7.3
>> days), it should not have been in the server encodings.  JOHAB's
>> second byte definitely contains 0x41 and above; the *johab*.map files
>> just reflect that fact.  I think we should remove JOHAB from the
>> server encodings list.  I'm afraid users who have JOHAB encoded
>> databases will get angry, though.

> I think the best way to proceed is probably to fix this in HEAD but not
> back-patch it.  During a dump and reload the encoding can be corrected
> to something safe.

Ok.  Shall I go ahead and remove JOHAB in HEAD?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
Re: [HACKERS] Server-side support of all encodings
Tatsuo Ishii [EMAIL PROTECTED] writes:
>> I think the best way to proceed is probably to fix this in HEAD but
>> not back-patch it.  During a dump and reload the encoding can be
>> corrected to something safe.

> Ok.  Shall I go ahead and remove JOHAB in HEAD?

+1 for me.

regards, tom lane