Re: [PATCHES] PQParam version 0.5
Andrew Chernow [EMAIL PROTECTED] writes: Here is the lastest pgparam patch. It is patched against a fresh checkout on 2007-12-05. What is this for? Why is it a good idea? It appears to be a fairly enormous increase in the size of libpq's API, and I really don't think I want to buy into the idea that libpq should know explicitly about each and every backend datatype. The 100% lack of any documentation in the patch isn't helping you sell it, BTW. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] PQParam version 0.5
Tom Lane wrote: Andrew Chernow [EMAIL PROTECTED] writes: Here is the lastest pgparam patch. It is patched against a fresh checkout on 2007-12-05. What is this for? Why is it a good idea? It appears to be a fairly enormous increase in the size of libpq's API, and I really don't think I want to buy into the idea that libpq should know explicitly about each and every backend datatype. The 100% lack of any documentation in the patch isn't helping you sell it, BTW. regards, tom lane enormous increase in the size of libpq's API We can dramatically reduce the exports by using macros, if preferred. The 100% lack of any documentation Okay, we will do this. For starters, take a look at test.c. Below is a brief description: 1. Managed params, rather than manually building PQexecParam arrays; which is a little error prone and tedious. PQputint4(conn, 5); PQputtextptr(conn, abc); PQparamExec(conn, INSERT INTO t VALUES ($1, $2), 1, NULL); // the NULL arg is a PGresult**, which is auto-cleared // when NULL. Otherwise *result is assigned. // or use the print-style: we changed the argument order since // our last release, it felt off. PGresult *r; PQparamExecf(conn, SELECT * FROM foo(%d, %t), 1, r, 5, abc); 2. In binary result mode, the user has no idea how the data is formatted and there are no demarshaling functions, thus making the binary parameterized API impractical. So, we made PQget functions that support text or binary results. The benefit of supporting both is that the new PQget functions can be used regardless of how the query was executed. long long i8; PGinet inet; PQgetint8(res, 0, 0, i8); PQgetinet(res, 0, 1, inet); // coming soon. Currently, no way of doing this now. PGarr arr; int item, itemlen; PQgetarr(res, 0, 0, arr); // access 2 dim 2d array - arr[2][7] itemlen = PQgetarr2d(arr, item, 2, 7); 3. Client server should both understand how data is formatted over the wire, otherwise the data received by the client is not useful. Things like int4 or even a BOX are not that tricky, but other types are or may change between versions. 4. Why do atoi(PQgetvalue(...)) everywhere? Andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Better default_statistics_target
On Wed, 2007-12-05 at 15:13 -0500, Chris Browne wrote: I have the theory (thus far not borne out by any numbers) that it might be a useful approach to try to go through the DB schema and use what information is there to try to come up with better numbers on a per-column basis. Yeh, agreed. The difficulty is making this work for generic datatypes. - Datestamps tend to imply temporal dispersion, ergo somewhat fewer bins. Similar for floats. Hmmm, not sure about that one. Some date/time columns can change very quickly over time, so the stats are frequently out of date. Then could come a second order perspective, where data would actually get sampled from pg_statistics. - If we look at the number of distinct histogram bins used, for a particular column, and find that there are some not used, we might drop bins. The histograms are height balanced, so they are always all used. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PATCHES] Proposed patch to make mergejoin cost estimation more symmetric
There was some discussion earlier today http://archives.postgresql.org/pgsql-performance/2007-12/msg00081.php about how mergejoin cost estimation is not smart about the situation where we will have to scan a lot of rows on one side of the join before reaching the range of values found on the other side (and hence having some chance of finding join pairs). Ideally, that effect should be factored into the startup cost estimate for the join. This consideration is the exact dual of one that we already do have code for, namely that the merge can stop early if the range of values on one side ends long before that of the other. So I looked into whether that code couldn't be extended to handle this issue too. It turns out that it can be, and it actually becomes more symmetrical because it's considering both max and min values not just max. Also, I found that there were a couple of new-in-8.3 bugs in that area --- it's been extended to make estimates for descending-order mergejoins, but it wasn't getting that quite right. Since something needs to be done to that code anyway, I'm considering applying the attached patch. It's a bit larger than I would normally consider to be a good idea for an optional patch in late beta. But then again I wouldn't have the slightest hesitation about back-patching a change of this size after final release, so it seems attractive to put it in now. Aside from the patch, I have attached a test script that exercises merge join planning for some simple cases, and the plans output by the script in CVS HEAD with/without the patch. The cost estimates with the patch are in line with expectation, the estimates without, not so much. In particular, the existing bug can be seen at work here in that the sixth and eighth test cases (big join highm on b=h order by b desc and big join high on b=h order by b desc) are given unreasonably small cost estimates by the unpatched code. (Note: the two sets of numbers vary a bit because they were working with nonidentical ANALYZE statistics.) Any objections to applying the patch? regards, tom lane Index: src/backend/optimizer/path/costsize.c === RCS file: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v retrieving revision 1.189 diff -c -r1.189 costsize.c *** src/backend/optimizer/path/costsize.c 15 Nov 2007 22:25:15 - 1.189 --- src/backend/optimizer/path/costsize.c 6 Dec 2007 23:46:32 - *** *** 1372,1383 double outer_path_rows = PATH_ROWS(outer_path); double inner_path_rows = PATH_ROWS(inner_path); double outer_rows, ! inner_rows; double mergejointuples, rescannedtuples; double rescanratio; ! Selectivity outerscansel, ! innerscansel; Selectivity joininfactor; Pathsort_path; /* dummy for result of cost_sort */ --- 1372,1387 double outer_path_rows = PATH_ROWS(outer_path); double inner_path_rows = PATH_ROWS(inner_path); double outer_rows, ! inner_rows, ! outer_skip_rows, ! inner_skip_rows; double mergejointuples, rescannedtuples; double rescanratio; ! Selectivity outerstartsel, ! outerendsel, ! innerstartsel, ! innerendsel; Selectivity joininfactor; Pathsort_path; /* dummy for result of cost_sort */ *** *** 1444,1453 * A merge join will stop as soon as it exhausts either input stream * (unless it's an outer join, in which case the outer side has to be * scanned all the way anyway). Estimate fraction of the left and right !* inputs that will actually need to be scanned. We use only the first !* (most significant) merge clause for this purpose. Since !* mergejoinscansel() is a fairly expensive computation, we cache the !* results in the merge clause RestrictInfo. */ if (mergeclauses path-jpath.jointype != JOIN_FULL) { --- 1448,1459 * A merge join will stop as soon as it exhausts either input stream * (unless it's an outer join, in which case the outer side has to be * scanned all the way anyway). Estimate fraction of the left and right !* inputs that will actually need to be scanned. Likewise, we can !* estimate the number of rows that will be skipped before the first !* join pair is found, which should be factored into startup cost. !* We use only the first (most significant)
[PATCHES] A minor typo fix on pg_standby docs
Rgds, Arul Shaji *** pgstandby.sgml 2007-12-06 14:52:18.0 +1100 --- new_pgstandby.sgml 2007-12-07 18:38:26.0 +1100 *** *** 262,268 listitemparakeep the last 255 full WAL files, plus the current one/para/listitem listitemparasleep for 2 seconds between checks for next WAL file is full/para/listitem listitemparanever timeout if file not found/para/listitem ! listitemparastop waiting when a trigger file called /tmp.pgsql.trigger.5442 appears/para/listitem /itemizedlist /listitem --- 262,268 listitemparakeep the last 255 full WAL files, plus the current one/para/listitem listitemparasleep for 2 seconds between checks for next WAL file is full/para/listitem listitemparanever timeout if file not found/para/listitem ! listitemparastop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears/para/listitem /itemizedlist /listitem ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match