Re: [PATCHES] PQParam version 0.5

2007-12-06 Thread Tom Lane
Andrew Chernow [EMAIL PROTECTED] writes:
 Here is the lastest pgparam patch.  It is patched against a fresh 
 checkout on 2007-12-05.

What is this for?  Why is it a good idea?  It appears to be a fairly
enormous increase in the size of libpq's API, and I really don't think
I want to buy into the idea that libpq should know explicitly about each
and every backend datatype.  The 100% lack of any documentation in the
patch isn't helping you sell it, BTW.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] PQParam version 0.5

2007-12-06 Thread Andrew Chernow

Tom Lane wrote:

Andrew Chernow [EMAIL PROTECTED] writes:
Here is the lastest pgparam patch.  It is patched against a fresh 
checkout on 2007-12-05.


What is this for?  Why is it a good idea?  It appears to be a fairly
enormous increase in the size of libpq's API, and I really don't think
I want to buy into the idea that libpq should know explicitly about each
and every backend datatype.  The 100% lack of any documentation in the
patch isn't helping you sell it, BTW.

regards, tom lane




enormous increase in the size of libpq's API
We can dramatically reduce the exports by using macros, if preferred.

The 100% lack of any documentation
Okay, we will do this.  For starters, take a look at test.c.  Below is a 
brief description:


1. Managed params, rather than manually building PQexecParam arrays; 
which is a little error prone and tedious.


  PQputint4(conn, 5);
  PQputtextptr(conn, abc);
  PQparamExec(conn, INSERT INTO t VALUES ($1, $2), 1, NULL);
  // the NULL arg is a PGresult**, which is auto-cleared
  // when NULL.  Otherwise *result is assigned.

  // or use the print-style: we changed the argument order since
  // our last release, it felt off.
  PGresult *r;
  PQparamExecf(conn, SELECT * FROM foo(%d, %t), 1, r, 5, abc);

2. In binary result mode, the user has no idea how the data is formatted 
and there are no demarshaling functions, thus making the binary 
parameterized API impractical.  So, we made PQget functions that support 
text or binary results.  The benefit of supporting both is that the new 
PQget functions can be used regardless of how the query was executed.


  long long i8;
  PGinet inet;
  PQgetint8(res, 0, 0, i8);
  PQgetinet(res, 0, 1, inet);

  // coming soon. Currently, no way of doing this now.
  PGarr arr;
  int item, itemlen;
  PQgetarr(res, 0, 0, arr);
  // access 2 dim 2d array - arr[2][7]
  itemlen = PQgetarr2d(arr, item, 2, 7);

3. Client  server should both understand how data is formatted over the 
wire, otherwise the data received by the client is not useful.  Things 
like int4 or even a BOX are not that tricky, but other types are or may 
change between versions.


4. Why do atoi(PQgetvalue(...)) everywhere?

Andrew


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Better default_statistics_target

2007-12-06 Thread Simon Riggs
On Wed, 2007-12-05 at 15:13 -0500, Chris Browne wrote:

 I have the theory (thus far not borne out by any numbers) that it
 might be a useful approach to try to go through the DB schema and use
 what information is there to try to come up with better numbers on a
 per-column basis.

Yeh, agreed.

The difficulty is making this work for generic datatypes.

  - Datestamps tend to imply temporal dispersion, ergo somewhat fewer
bins.  Similar for floats.

Hmmm, not sure about that one. Some date/time columns can change very
quickly over time, so the stats are frequently out of date.

 Then could come a second order perspective, where data would
 actually get sampled from pg_statistics.
 
  - If we look at the number of distinct histogram bins used, for a
particular column, and find that there are some not used, we might
drop bins.

The histograms are height balanced, so they are always all used.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PATCHES] Proposed patch to make mergejoin cost estimation more symmetric

2007-12-06 Thread Tom Lane
There was some discussion earlier today
http://archives.postgresql.org/pgsql-performance/2007-12/msg00081.php
about how mergejoin cost estimation is not smart about the situation
where we will have to scan a lot of rows on one side of the join before
reaching the range of values found on the other side (and hence having
some chance of finding join pairs).  Ideally, that effect should be
factored into the startup cost estimate for the join.  This
consideration is the exact dual of one that we already do have code
for, namely that the merge can stop early if the range of values on
one side ends long before that of the other.  So I looked into whether
that code couldn't be extended to handle this issue too.  It turns out
that it can be, and it actually becomes more symmetrical because it's
considering both max and min values not just max.

Also, I found that there were a couple of new-in-8.3 bugs in that area
--- it's been extended to make estimates for descending-order
mergejoins, but it wasn't getting that quite right.

Since something needs to be done to that code anyway, I'm considering
applying the attached patch.  It's a bit larger than I would normally
consider to be a good idea for an optional patch in late beta.  But then
again I wouldn't have the slightest hesitation about back-patching a
change of this size after final release, so it seems attractive to put
it in now.

Aside from the patch, I have attached a test script that exercises merge
join planning for some simple cases, and the plans output by the script
in CVS HEAD with/without the patch.  The cost estimates with the patch
are in line with expectation, the estimates without, not so much.
In particular, the existing bug can be seen at work here in that the
sixth and eighth test cases (big join highm on b=h order by b desc and
big join high on b=h order by b desc) are given unreasonably small
cost estimates by the unpatched code.  (Note: the two sets of numbers
vary a bit because they were working with nonidentical ANALYZE
statistics.)

Any objections to applying the patch?

regards, tom lane

Index: src/backend/optimizer/path/costsize.c
===
RCS file: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v
retrieving revision 1.189
diff -c -r1.189 costsize.c
*** src/backend/optimizer/path/costsize.c   15 Nov 2007 22:25:15 -  
1.189
--- src/backend/optimizer/path/costsize.c   6 Dec 2007 23:46:32 -
***
*** 1372,1383 
double  outer_path_rows = PATH_ROWS(outer_path);
double  inner_path_rows = PATH_ROWS(inner_path);
double  outer_rows,
!   inner_rows;
double  mergejointuples,
rescannedtuples;
double  rescanratio;
!   Selectivity outerscansel,
!   innerscansel;
Selectivity joininfactor;
Pathsort_path;  /* dummy for result of 
cost_sort */
  
--- 1372,1387 
double  outer_path_rows = PATH_ROWS(outer_path);
double  inner_path_rows = PATH_ROWS(inner_path);
double  outer_rows,
!   inner_rows,
!   outer_skip_rows,
!   inner_skip_rows;
double  mergejointuples,
rescannedtuples;
double  rescanratio;
!   Selectivity outerstartsel,
!   outerendsel,
!   innerstartsel,
!   innerendsel;
Selectivity joininfactor;
Pathsort_path;  /* dummy for result of 
cost_sort */
  
***
*** 1444,1453 
 * A merge join will stop as soon as it exhausts either input stream
 * (unless it's an outer join, in which case the outer side has to be
 * scanned all the way anyway).  Estimate fraction of the left and right
!* inputs that will actually need to be scanned. We use only the first
!* (most significant) merge clause for this purpose.  Since
!* mergejoinscansel() is a fairly expensive computation, we cache the
!* results in the merge clause RestrictInfo.
 */
if (mergeclauses  path-jpath.jointype != JOIN_FULL)
{
--- 1448,1459 
 * A merge join will stop as soon as it exhausts either input stream
 * (unless it's an outer join, in which case the outer side has to be
 * scanned all the way anyway).  Estimate fraction of the left and right
!* inputs that will actually need to be scanned.  Likewise, we can
!* estimate the number of rows that will be skipped before the first
!* join pair is found, which should be factored into startup cost.
!* We use only the first (most significant) 

[PATCHES] A minor typo fix on pg_standby docs

2007-12-06 Thread FAST PostgreSQL


Rgds,
Arul Shaji
*** pgstandby.sgml	2007-12-06 14:52:18.0 +1100
--- new_pgstandby.sgml	2007-12-07 18:38:26.0 +1100
***
*** 262,268 
   listitemparakeep the last 255 full WAL files, plus the current one/para/listitem
   listitemparasleep for 2 seconds between checks for next WAL file is full/para/listitem
   listitemparanever timeout if file not found/para/listitem
!  listitemparastop waiting when a trigger file called /tmp.pgsql.trigger.5442 appears/para/listitem
  /itemizedlist
 /listitem
  
--- 262,268 
   listitemparakeep the last 255 full WAL files, plus the current one/para/listitem
   listitemparasleep for 2 seconds between checks for next WAL file is full/para/listitem
   listitemparanever timeout if file not found/para/listitem
!  listitemparastop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears/para/listitem
  /itemizedlist
 /listitem
  

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match