Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)

2007-04-14 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Alvaro Herrera
 Sent: Friday, April 13, 2007 4:24 PM
 To: Tom Lane
 Cc: [EMAIL PROTECTED]; PostgreSQL Performance; Steve
 Subject: Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM]
 Strangely Variable Query Performance)
 
 Tom Lane wrote:
 
  One idea I thought about was to sort by index scan cost, using
  selectivity only as a tiebreaker for cost, rather than the other way
  around as is currently done.  This seems fairly plausible because
  indexscans that are cheaper than other indexscans likely return fewer
  rows too, and so selectivity is already accounted for to some extent ---
  at least you can't have an enormously worse selectivity at lower cost,
  whereas Steve's example proves it doesn't work the other way.  But I'm
  worried about breaking the reasoning about redundant indexes that's
  mentioned in the comments.
 
  Another alternative that would respond to the immediate problem is to
  maintain the current sort order, but as we come to each index, consider
  using that one alone, and throw away whatever AND we might have built up
  if that one alone beats the AND-so-far.  This seems more conservative,
  as it's unlikely to break any cases that work well now, but on the other
  hand it feels like plastering another wart atop a structure that's
  already rather rickety.
 
  Has anyone got any thoughts about the best way to do this?
 
 How about doing both: sort the indexes by index scan cost; then pick the
 first index on the list and start adding indexes when they lower the
 cost.  When adding each index, consider it by itself against the
 already stacked indexes.  If the cost is lower, put this index at the
 top of the list, and restart the algorithm (after the sorting step of
 course).
 
 I think the concern about condition redundancy should be attacked
 separately.  How about just comparing whether they have common prefixes
 of conditions?  I admit I don't understand what would happen with
 indexes defined like (lower(A), B, C) versus (A, B) for example.

Instead of sorting, I suggest the quickselect() algorithm, which is
O(n).
Probably, if the list is small, it won't matter much, but it might offer
some tangible benefit.

Here is an example of the algorithm:

#include <stdlib.h>

typedef double Etype;           /* Season to taste. */

extern Etype  RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t RandRange(size_t a, size_t b);
extern size_t RandomPartition(Etype * A, size_t p, size_t r);
extern size_t Partition(Etype * A, size_t p, size_t r);

/*
**
** In the following code, every reference to CLR means:
**
**    Introduction to Algorithms
**    By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**    ISBN 0-07-013143-0
*/


/*
** CLR, page 187: return the i-th smallest element of A[p..r] (i is 1-based).
*/
Etype RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
    size_t q, k;

    if (p == r)
        return A[p];
    q = RandomPartition(A, p, r);
    k = q - p + 1;              /* number of elements in the low partition */

    if (i <= k)
        return RandomSelect(A, p, q, i);
    else
        return RandomSelect(A, q + 1, r, i - k);
}

/* Uniformly random index in [a, b). */
size_t RandRange(size_t a, size_t b)
{
    size_t c = (size_t) ((double) rand() / ((double) RAND_MAX + 1) * (b - a));

    return c + a;
}

/*
** CLR, page 162: partition around a randomly chosen pivot.
*/
size_t RandomPartition(Etype A[], size_t p, size_t r)
{
    size_t i = RandRange(p, r);
    Etype  Temp;

    Temp = A[p];
    A[p] = A[i];
    A[i] = Temp;
    return Partition(A, p, r);
}

/*
** CLR, page 154: Hoare partition around pivot A[p]; returns the split point.
*/
size_t Partition(Etype A[], size_t p, size_t r)
{
    Etype  x, temp;
    size_t i, j;

    x = A[p];
    i = p - 1;
    j = r + 1;

    for (;;) {
        do {
            j--;
        } while (!(A[j] <= x));
        do {
            i++;
        } while (!(A[i] >= x));
        if (i < j) {
            temp = A[i];
            A[i] = A[j];
            A[j] = temp;
        } else
            return j;
    }
}
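
For what it's worth, a minimal driver for the routines above could look like
this (a sketch only: it assumes the declarations above are in the same
translation unit, and that i is 1-based as in CLR, so i = 1 returns the
minimum):

#include <stdio.h>

int main(void)
{
    /* Stand-in index scan costs; RandomSelect partitions the array in
       place, so operate on a scratch copy in real use. */
    Etype  costs[] = {12.5, 3.1, 44.0, 0.7, 9.9, 27.3};
    size_t n = sizeof costs / sizeof costs[0];
    size_t k = 3;               /* find the 3rd-cheapest cost */

    printf("%u-th smallest cost: %g\n", (unsigned) k,
           RandomSelect(costs, 0, n - 1, k));
    return 0;
}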



Re: [HACKERS] [PATCHES] Table function support

2007-04-14 Thread Pavel Stehule

Hello

I searched for some notes about this topic. I didn't find any useful sample.
Lots of samples are about external stored procedures, and others are about
using a table expression like


create function foo(i1)
returns table (a1 int)
as
 return table(select a1 from tab)

It isn't clear whether the table attributes are related to the output
variables, but nobody joins them together.


SQL/PSM sample:
create function accounts_of (customer_name char(20))
returns table (account_number char(10),
               branch_name char(15),
               balance numeric(12,2))
return table
(select account_number, branch_name, balance
 from account A
 where exists (
 select *
 from depositor D
 where D.customer_name = accounts_of.customer_name
   and D.account_number = A.account_number ))


The correct way to call it is:
select *
from table (accounts_of ('Smith'))

next sample:
CREATE FUNCTION filmtyp (art CHAR(2))
 RETURNS TABLE (titel VARCHAR(75), jahr INTEGER)
 LANGUAGE SQL
 READS SQL DATA
 NO EXTERNAL ACTION
 DETERMINISTIC
 RETURN
SELECT titel, jahr
FROM film
WHERE film.art = filmtyp.art


Table functions are also called parametrised views. I don't think using OUT
variables is a good idea, because you will have problems with column names,
which is a problem for plpgsql.


http://www.wiscorp.com/SQL2003Features.pdf
http://wwwdvs.informatik.uni-kl.de/courses/NEDM/SS2004/Vorlesungsunterlagen/NEDM.Chapter.03.User-defined_Routines_and_Object_Behavior.pdf

Regards
Pavel Stehule




From: Tom Lane [EMAIL PROTECTED]
To: Pavel Stehule [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: [PATCHES] Table function support
Date: Tue, 10 Apr 2007 18:17:14 -0400


Pavel Stehule [EMAIL PROTECTED] writes:
 this patch allows using SQL2003 syntax for set returning functions. It
 is based on using a new type of argmode - PROARGMODE_TABLE.

I've been looking at this, and my feeling is that we should drop the
PROARGMODE_TABLE business and just define RETURNS TABLE(x int, y int)
as exactly equivalent to RETURNS SETOF RECORD with x and y treated as
OUT parameters.  There isn't any advantage to distinguishing the cases
that outweighs breaking client code that looks at pg_proc.proargmodes.
I don't believe that the SQL spec prevents us from exposing those
parameter names to PL functions, especially since none of our PLs are
in the standard at all.

regards, tom lane






Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)

2007-04-14 Thread Tom Lane
Dann Corbit [EMAIL PROTECTED] writes:
 Instead of sorting, I suggest the quickselect() algorithm, which is
 O(n).

What for?  Common cases have less than half a dozen entries.  That is
not the place we need to be spending engineering effort --- what we
need to worry about is what's the choice algorithm, not implementation
details.

regards, tom lane



Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)

2007-04-14 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I think the concern about condition redundancy should be attacked
 separately.  How about just comparing whether they have common prefixes
 of conditions?  I admit I don't understand what would happen with
 indexes defined like (lower(A), B, C) versus (A, B) for example.

I understand that issue a bit better than I did when I wrote the comment
(so I suppose I better rewrite it).  The $64 reason for rejecting
AND-combinations of indexes that are using some of the same
WHERE-conditions is that if we don't, we effectively double-count the
selectivity of those conditions, causing us to prefer useless
AND-combinations.  An example is WHERE A > 42 AND B < 100 where we
have an index on A and one on (A,B).  The selectivity calculation
will blindly assume that the selectivities of the two indexes are
independent and thus prefer to BitmapAnd them, when obviously there
is no point in using both.  Ideally we should improve the selectivity
calculation to not get fooled like this, but that seems hard and
probably slow.  So for the moment we have the heuristic that no
WHERE-clause should be used twice in any AND-combination.
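
To make the double-counting concrete, here is a back-of-the-envelope
sketch (illustrative arithmetic only, with made-up selectivity numbers;
this is not planner code):

#include <stdio.h>

int main(void)
{
    double sel_a = 0.10;            /* assumed selectivity of A > 42  */
    double sel_b = 0.20;            /* assumed selectivity of B < 100 */

    double idx_a  = sel_a;          /* index on (A) uses A > 42       */
    double idx_ab = sel_a * sel_b;  /* index on (A,B) uses both       */

    /* Treating the two index selectivities as independent counts the
       A > 42 clause twice ...                                         */
    double naive_and = idx_a * idx_ab;      /* = sel_a^2 * sel_b       */
    /* ... while the right combined selectivity is just the (A,B) one. */
    double true_and = sel_a * sel_b;

    printf("naive BitmapAnd estimate: %.4f   correct: %.4f\n",
           naive_and, true_and);
    return 0;
}

The spuriously small naive estimate is what makes the useless
AND-combination look attractive, which is exactly what the
no-clause-used-twice heuristic is papering over.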

Given that we are using that heuristic, it becomes important that
we visit the indexes in the right order --- as the code stands,
in the (A) vs (A,B) case it will consider only the first index it
arrives at in the selectivity sort order, because the second will
be rejected on the basis of re-use of the WHERE A > 42 condition.
Sorting by selectivity tends to ensure that we pick the index that
can make use of as many WHERE-clauses as possible.

The idea of considering each index alone fixes the order dependency
for cases where a single index is the best answer, but it doesn't
help much for cases where you really do want a BitmapAnd, only not
one using the index with the individually best selectivity.

We really need a heuristic here --- exhaustive search will be
impractical in cases where there are many indexes, because you'd
be looking at 2^N combinations.  (In Steve's example there are
actually 38 relevant indexes, which is bad database design if
you ask me, but in any case we cannot afford to search through
2^38 possibilities.)  But the one we're using now is too fragile.

Maybe we should use a cutoff similar to the GEQO one: do exhaustive
search if there are less than N relevant indexes, for some N.
But that's not going to help Steve; we still need a smarter heuristic
for what to look for above the cutoff.

regards, tom lane



Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)

2007-04-14 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Has anyone got any thoughts about the best way to do this?

 How about doing both: sort the indexes by index scan cost; then pick the
 first index on the list and start adding indexes when they lower the
 cost.  When adding each index, consider it by itself against the
 already stacked indexes.  If the cost is lower, put this index at the
 top of the list, and restart the algorithm (after the sorting step of
 course).

The restart part of that bothers me --- it's not entirely clear that
you couldn't get into an infinite loop.  (Imagine that A+B is better
than A alone, so we adopt it, but worse than C alone, so we take C as
the new leader and start over.  Then perhaps C+B is better than C alone
but worse than A alone, so we take A as the leader and start over.
Maybe this is impossible but I'm unsure.)

I looked back at the gdb results I'd gotten from Steve's example and
noticed that for his 38 indexes there were only three distinct index
selectivity values, and the sort step grouped indexes by cost within
those groups.  In hindsight of course this is obvious: the selectivity
depends on the set of WHERE-clauses used, so with two WHERE-clauses
there are three possible selectivities (to within roundoff error anyway)
depending on whether the index uses one or both clauses.  So the
existing algorithm gets one thing right: for any two indexes that make
use of just the same set of WHERE-clauses, it will always take the one
with cheaper scan cost.

Thinking more about this leads me to the following proposal:

1. Explicitly group the indexes according to the subset of
WHERE-conditions (and partial index conditions, if any) they use.
Within each such group, discard all but the cheapest-scan-cost one.

2. Sort the remaining indexes according to scan cost.

3. For each index in order, consider it as a standalone scan, and also
consider adding it on to the AND-group led by each preceding index,
using the same logic as now: reject using any WHERE-condition twice
in a group, and then add on only if the total cost of the AND-group
scan is reduced.

This would be approximately O(N^2) in the number of relevant indexes,
whereas the current scheme is roughly linear (handwaving a bit here
because the number of WHERE-clauses is a factor too).  But that seems
unlikely to be a problem for sane numbers of indexes, unlike the O(2^N)
behavior of an exhaustive search.  It would get rid of (most of) the
order-sensitivity problem, since we would definitely consider the
AND-combination of every pair of combinable indexes.  I can imagine
cases where it wouldn't notice useful three-or-more-way combinations
because the preceding two-way combination didn't win, but they seem
pretty remote possibilities.
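
For illustration, here is a rough sketch of steps 2 and 3 in C, using
invented data structures and a deliberately crude cost model (the real
thing would work on planner Paths and the bitmap cost functions; step 1
is assumed to have already collapsed same-clause-subset indexes down to
the cheapest one):

#include <stdio.h>
#include <stdlib.h>

#define HEAP_FETCH_COST 1000.0  /* made-up cost of visiting every heap row */

typedef struct
{
    unsigned clause_mask;   /* which WHERE-clauses this index scan uses */
    double   scan_cost;     /* cost of building this index's bitmap     */
    double   selectivity;   /* fraction of heap rows its clauses pass   */
} IndexCand;

typedef struct
{
    unsigned mask;          /* union of clauses used by the AND-group   */
    double   scan_sum;      /* summed bitmap-building cost              */
    double   sel;           /* product of member selectivities          */
} AndGroup;

static double group_total(const AndGroup *g)
{
    return g->scan_sum + HEAP_FETCH_COST * g->sel;  /* crude stand-in */
}

static int cmp_scan_cost(const void *a, const void *b)
{
    const IndexCand *x = a, *y = b;

    return (x->scan_cost > y->scan_cost) - (x->scan_cost < y->scan_cost);
}

int main(void)
{
    IndexCand cand[] = {
        { 0x1, 10.0, 0.10 },        /* index usable with clause A only  */
        { 0x3, 15.0, 0.02 },        /* index usable with clauses A,B    */
        { 0x2, 40.0, 0.20 },        /* index usable with clause B only  */
    };
    int       n = (int) (sizeof cand / sizeof cand[0]);
    AndGroup  group[8];
    int       i, g, best;

    /* Step 2: sort the surviving candidates by scan cost. */
    qsort(cand, n, sizeof cand[0], cmp_scan_cost);

    /* Step 3: each index leads a group of its own; also try AND-ing it
       onto every group led by a preceding index, rejecting any re-use
       of a WHERE-clause and keeping the addition only if it makes that
       group cheaper. */
    for (i = 0; i < n; i++)
    {
        for (g = 0; g < i; g++)
        {
            AndGroup trial = group[g];

            if (trial.mask & cand[i].clause_mask)
                continue;           /* would use a WHERE-clause twice */
            trial.mask |= cand[i].clause_mask;
            trial.scan_sum += cand[i].scan_cost;
            trial.sel *= cand[i].selectivity;
            if (group_total(&trial) < group_total(&group[g]))
                group[g] = trial;
        }
        group[i].mask = cand[i].clause_mask;    /* the standalone scan */
        group[i].scan_sum = cand[i].scan_cost;
        group[i].sel = cand[i].selectivity;
    }

    /* Pick the cheapest resulting group. */
    best = 0;
    for (g = 1; g < n; g++)
        if (group_total(&group[g]) < group_total(&group[best]))
            best = g;
    printf("winner uses clause mask 0x%x at cost %.1f\n",
           group[best].mask, group_total(&group[best]));
    return 0;
}

With those made-up numbers the (A,B) index wins on its own, while the
A-index-plus-B-index combination is formed (it beats the A index alone)
but loses to the single (A,B) scan, which is the kind of behavior the
proposal is after.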

Comments?

regards, tom lane



Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)

2007-04-14 Thread Simon Riggs
On Fri, 2007-04-13 at 18:48 -0400, Tom Lane wrote:
 Has anyone got any thoughts about the best way to do this?

I don't think we know enough to pick one variant that works in all
cases. This requires more detailed analysis of various cases.

Let's put in a parameter to allow the options to be varied. The purpose
would be to look for some more information that allows us to see what
the pre-conditions would be for each heuristic.

Initially, we say this is a beta-only feature and may be withdrawn in
the production version.

"Why did it not pick my index?" is a tiring game, but one that must be
played. I hope that the ORDER LIMIT optimization will reduce the
number of indexes chosen to maintain sort order in the output.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Makefile patch to make gcov work on Postgres contrib modules

2007-04-14 Thread Greg Stark
Peter Eisentraut [EMAIL PROTECTED] writes:

 On Thursday, 12 April 2007 17:08, Gregory Stark wrote:
  Unless there's a makefile variable that is included in both CFLAGS and
  LDFLAGS that the user could use instead? But then that wouldn't work on
  architectures where ld is used instead of gcc for linking.
 
 Perhaps you should start by defining which situation you want to achieve.

There are two ways to get gcov to work. Either you add -lgcov to the end of the
linking step or you use the same -f flags that you use at the compile stage.

So what I would like to happen is something like:

CFLAGS='-fprofile-arcs -ftest-coverage -O0 -g' ./configure --enable-debug \
  --enable-cassert --enable-depend
make
make check
gcov src/backend/access/common/heaptuple.c

Perhaps the flags need to be in a separate variable instead of CFLAGS, one
specifically advertised to ensure the flags will show up on both the compile
and linking lines.

-- 
greg




Re: [HACKERS] Server-side support of all encodings

2007-04-14 Thread Tatsuo Ishii
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  Sigh. From the first day when JOHAB was supported (back in the 7.3 days),
  it should not have been in the server encodings. JOHAB's second byte
  can definitely contain 0x41 and above; *johab*.map just reflects that
  fact. I think we should remove JOHAB from the server encodings list.
  I'm afraid users who have JOHAB-encoded databases will get angry, though.
 
 I think the best way to proceed is probably to fix this in HEAD but
 not back-patch it.  During a dump and reload the encoding can be
 corrected to something safe.

Ok. Shall I go ahead and remove JOHAB in HEAD?
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] Server-side support of all encodings

2007-04-14 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 I think the best way to proceed is probably to fix this in HEAD but
 not back-patch it.  During a dump and reload the encoding can be
 corrected to something safe.

 Ok. Shall I go ahead and remove JOHAB in HEAD?

+1 for me.

regards, tom lane
