Re: [HACKERS] synchronous commit vs. hint bits
hi,

> On Wed, Nov 30, 2011 at 1:37 AM, YAMAMOTO Takashi <y...@mwd.biglobe.ne.jp> wrote:
>>> Yes, I would expect that. What kind of increase are you seeing? Is it
>>> causing a problem for you, or are you just making an observation?
>> i was curious because my application uses async commits mainly to avoid
>> frequent fsync. i have no numbers right now.
> Oh, that's interesting. Why do you want to avoid frequent fsyncs?

simply because it was expensive on my environment.

> I thought the point of synchronous_commit=off was to move the fsyncs to
> the background, but not necessarily to decrease the frequency.

it makes sense. but it's normal for users to abuse features. :)

YAMAMOTO Takashi
Re: [HACKERS] synchronous commit vs. hint bits
Hi Robert,

On Wednesday, November 30, 2011 02:10:00 PM Robert Haas wrote:
> On Wed, Nov 30, 2011 at 1:37 AM, YAMAMOTO Takashi <y...@mwd.biglobe.ne.jp> wrote:
>>> Yes, I would expect that. What kind of increase are you seeing? Is it
>>> causing a problem for you, or are you just making an observation?
>> i was curious because my application uses async commits mainly to avoid
>> frequent fsync. i have no numbers right now.
> Oh, that's interesting. Why do you want to avoid frequent fsyncs? I
> thought the point of synchronous_commit=off was to move the fsyncs to
> the background, but not necessarily to decrease the frequency.

Is that so? If it wouldn't avoid fsyncs, how could you reach multiple
thousand TPS in a writing pgbench run on a pretty ordinary system with
fsync=on?

Andres
Re: [HACKERS] Accounting for toast in query planner. (gin/gist indexes).
> Jesper Krogh <jes...@krogh.cc> writes:
>> I have currently hit a problem which I dug into finding the cause for:
>> in particular, searching in GIN indices seems in some situations to
>> unfairly favor Sequential Scans. Googling a bit I found this page:
>> http://postgis.refractions.net/docs/ch06.html#id2635817
>> describing the exact problem. It seemed to be discussed back in the
>> pre-8.1 days and wasn't solved there; is there a chance someone may
>> address it in 9.2?
>> http://archives.postgresql.org/pgsql-performance/2005-02/msg00041.php
>
> Don't hold your breath. There's a huge amount of work to do there, and
> nobody's even thinking about it. The problem has actually gotten worse
> since 8.1, because the index access patterns are more complex and the
> planner has less, not more, info about them (because we pushed the
> determination of what's lossy to run time). Accumulating stats about
> how toasty the heap tuples are would be only the first step towards
> producing better cost estimates here --- you'd also need some way of
> estimating how lossy an index is, for instance, so you could guess how
> many heap tuples will be visited. And on top of that, for a lot of
> these complicated operators even our estimates of the final number of
> matches are pretty crummy. I'm not sure if having an explicit model of
> what's happening inside the index would help. Maybe it would, since
> the condition actually being tested by the index is probably simpler
> than the operator proper, but right now the planner has no clue that
> they're different.

I don't understand the details about lossyness, but even with the
changes for getting a cost estimate for GIN indexes that went into 9.1
(which made the situation quite a lot better) we still favor Sequential
Scans in situations where it is a really bad idea. The cost of doing
filtering on something that is huge and resides in TOAST is not
accounted for. Taking the same dataset, the worst situation is a query
like "this key exists in all of the dataset; how many are there?".
The alternative query plans are here:

2011-12-01 09:59:46.428 jk=# explain (analyze on, buffers on) select count(id) from ftstest where fts @@ to_tsquery('english','test1');
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9591.79..9591.80 rows=1 width=4) (actual time=109.084..109.084 rows=1 loops=1)
   Buffers: shared hit=1 read=4595
   ->  Bitmap Heap Scan on ftstest  (cost=2149.59..9091.92 rows=199947 width=4) (actual time=38.613..87.784 rows=199939 loops=1)
         Recheck Cond: (fts @@ '''test1'''::tsquery)
         Buffers: shared hit=1 read=4595
         ->  Bitmap Index Scan on ftstest_fts_idx1  (cost=0.00..2099.60 rows=199947 width=0) (actual time=37.847..37.847 rows=199939 loops=1)
               Index Cond: (fts @@ '''test1'''::tsquery)
               Buffers: shared hit=1 read=152
 Total runtime: 109.129 ms
(9 rows)

Time: 109.842 ms

2011-12-01 10:02:19.047 jk=# set enable_seqscan to on;
SET
Time: 0.122 ms

2011-12-01 10:02:23.657 jk=# explain (analyze on, buffers on) select count(id) from ftstest where fts @@ to_tsquery('english','test1');
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7442.87..7442.88 rows=1 width=4) (actual time=25266.047..25266.048 rows=1 loops=1)
   Buffers: shared hit=651448 read=561776
   ->  Seq Scan on ftstest  (cost=0.00..6943.00 rows=199947 width=4) (actual time=0.015..25177.959 rows=199939 loops=1)
         Filter: (fts @@ '''test1'''::tsquery)
         Rows Removed by Filter: 61
         Buffers: shared hit=651448 read=561776
 Total runtime: 25266.080 ms
(7 rows)

Time: 25266.740 ms
2011-12-01 10:02:49.936 jk=#

For FTS it hits double, since the tsvector column typically is in TOAST
(since it is large) and rarely is in memory (since it is rarely
returned to the user or used for anything except building the index).
As can be seen, it hits 651448 buffers in the latter (default) query
and only 4596 in the former. Would just naively accounting for the
needed TOAST accesses level out that x10 difference?

Secondly, I could bump the default cost of ts_match_vq/ts_match_qv a
bit up, since that computation is probably not as cheap as an ordinary
boolean function.

I'm not trying to argue that the solution is simple, since my insights
there are limited; only that the problem forces me to drop "set
enable_seqscan to off" in random places in the code (which I definitely
would prefer to go without).

Jesper

-- 
Jesper
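The cost-bump idea Jesper mentions needs no patch to experiment with,
since operator cost estimates come from the underlying support
functions; a sketch (the COST value of 100 is an arbitrary assumption
for experimentation, not a recommendation):

ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 100;
ALTER FUNCTION ts_match_qv(tsquery, tsvector) COST 100;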
Re: [HACKERS] Allow substitute allocators for PGresult.
Hello,

me> I'll put further thought into dblink-plus taking it into
me> account.
..
me> Please let me have a little more time.

I've asked the developer of dblink-plus about
RegisterResourceReleaseCallback(). He said that the function does not
fit well with the current implementation.

In addition to this, storing into a tuplestore directly seems to me a
better idea than a palloc'ed PGresult. So I tried to make
libpq/PGresult able to handle an alternative tuple store, selected by
hinting the PGconn, and modified dblink to use the mechanism as the
first sample code.

I will show it as a series of patches in the next message.

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Allow substitute allocators for PGresult.
Hello,

This is the next version of "Allow substitute allocators for PGresult".

Totally changing the concept from the previous one, this patch allows
libpq to handle an alternative tuple store for received tuples. Design
guidelines are shown below.

- No need to modify existing client code of libpq.
- Existing libpq clients run with roughly the same performance, and
  dblink with the modification runs faster to some extent and requires
  less memory.

I have roughly measured run time and memory requirements for three
configurations on CentOS 6 on VirtualBox with 2GB mem / 4 cores,
running on Win7-Core i7, transferring (30 bytes * 2 cols) * 200 tuples
(120MB net) within this virtual machine. The results are below.

                                  xfer time   Peak RSS
 Original                       : 6.02s       850MB
 libpq patch + original dblink  : 6.11s       850MB
 full patch                     : 4.44s       643MB

xfer time here is the mean of five 'real time's measured by running an
SQL script like this after a warmup run.

=== test.sql
select dblink_connect('c', 'host=localhost port=5432 dbname=test');
select * from dblink('c', 'select a,c from foo limit 200') as (a text, b bytea) limit 1;
select dblink_disconnect('c');
===

$ for i in $(seq 1 10); do time psql test -f t.sql; done

Peak RSS is measured by picking up heap Rss in /proc/[pid]/smaps.

It seems somewhat slow using patched libpq and original dblink, but it
seems within error range too. If this amount of slowdown is not
permissible, it might be improved by restoring the previous static call
route, at the cost of some extra redundancy in the code. On the other
hand, the full patch version is obviously fast and requires less
memory. Isn't it nice?

This patch consists of two sub-patches. The first is a patch for libpq
to allow rewiring the tuple storage mechanism; the default behavior is
not changed, so existing libpq clients should run with it. The second
modifies dblink to store received tuples directly into a tuplestore
using the mechanism above.
regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index a360d78..1af8df6 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,7 +160,3 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
-PQregisterTupleAdder      161
-PQgetAsCstring            162
-PQgetAddTupleParam        163
-PQsetAddTupleErrMes       164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 437be26..50f3f83 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,7 +2692,6 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
-	conn->addTupleFunc = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5065,10 +5064,3 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
-
-void
-PQregisterTupleAdder(PGconn *conn, addTupleFunction func, void *param)
-{
-	conn->addTupleFunc = func;
-	conn->addTupleFuncParam = param;
-}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index c8ec9bd..113aab0 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -48,6 +48,7 @@ char *const pgresStatus[] = {
 static int	static_client_encoding = PG_SQL_ASCII;
 static bool static_std_strings = false;
 
+
 static PGEvent *dupEvents(PGEvent *events, int count);
 static bool PQsendQueryStart(PGconn *conn);
 static int	PQsendQueryGuts(PGconn *conn,
@@ -65,9 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int	PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
-static void *pqDefaultAddTupleFunc(PGresult *res, AddTupFunc func,
-				   int id, size_t len);
-static void *pqAddTuple(PGresult *res, PGresAttValue *tup);
+
 
 /*
  * Space management for PGresult.
@@ -161,9 +160,6 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
-	result->addTupleFunc = pqDefaultAddTupleFunc;
-	result->addTupleFuncParam = NULL;
-	result->addTupleFuncErrMes = NULL;
 
 	if (conn)
 	{
@@ -198,12 +194,6 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 			}
 			result->nEvents = conn->nEvents;
 		}
-
-		if (conn->addTupleFunc)
-		{
-			result->addTupleFunc = conn->addTupleFunc;
-			result->addTupleFuncParam = conn->addTupleFuncParam;
-		}
 	}
 	else
 	{
@@ -497,33 +487,6 @@ PQresultAlloc(PGresult *res, size_t nBytes)
 	return pqResultAlloc(res, nBytes, TRUE);
 }
 
-void *
-pqDefaultAddTupleFunc(PGresult *res, AddTupFunc func, int id, size_t len)
-{
-	void	   *p;
-
-	switch (func)
-	{
-		case ADDTUP_ALLOC_TEXT:
-			return pqResultAlloc(res, len, TRUE);
-
-		case ADDTUP_ALLOC_BINARY:
-
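For readers skimming the thread, client usage of the proposed hook
would look roughly like this. This is a sketch only: the callback
signature is inferred from the patch's pqDefaultAddTupleFunc(),
AddTupFunc/ADDTUP_* come from the patch, and my_alloc()/my_state are
hypothetical names.

#include "libpq-fe.h"

/* Hypothetical allocator over some caller-owned store. */
extern void *my_alloc(void *state, size_t len);

/*
 * Tuple-adder callback: asked by libpq to provide space for each
 * incoming datum instead of allocating it inside the PGresult.
 */
static void *
my_add_tuple(PGresult *res, AddTupFunc func, int id, size_t len)
{
	void	   *state = PQgetAddTupleParam(res);	/* exported by the patch */

	switch (func)
	{
		case ADDTUP_ALLOC_TEXT:
		case ADDTUP_ALLOC_BINARY:
			return my_alloc(state, len);	/* space for datum 'id' */
		default:
			return NULL;
	}
}

/* After connecting, route all result tuples through the callback: */
/*     PQregisterTupleAdder(conn, my_add_tuple, my_state);         */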
Re: [HACKERS] Allow substitute allocators for PGresult.
Ouch! I'm sorry for making a reverse patch for the first modification.
This is an amendment of the message below. The body text is copied into
this message.

http://archives.postgresql.org/message-id/20111201.192419.103527179.horiguchi.kyot...@oss.ntt.co.jp

===

Hello,

This is the next version of "Allow substitute allocators for PGresult".

Totally changing the concept from the previous one, this patch allows
libpq to handle an alternative tuple store for received tuples. Design
guidelines are shown below.

- No need to modify existing client code of libpq.
- Existing libpq clients run with roughly the same performance, and
  dblink with the modification runs faster to some extent and requires
  less memory.

I have roughly measured run time and memory requirements for three
configurations on CentOS 6 on VirtualBox with 2GB mem / 4 cores,
running on Win7-Core i7, transferring (30 bytes * 2 cols) * 200 tuples
(120MB net) within this virtual machine. The results are below.

                                  xfer time   Peak RSS
 Original                       : 6.02s       850MB
 libpq patch + original dblink  : 6.11s       850MB
 full patch                     : 4.44s       643MB

xfer time here is the mean of five 'real time's measured by running an
SQL script like this after a warmup run.

=== test.sql
select dblink_connect('c', 'host=localhost port=5432 dbname=test');
select * from dblink('c', 'select a,c from foo limit 200') as (a text, b bytea) limit 1;
select dblink_disconnect('c');
===

$ for i in $(seq 1 10); do time psql test -f t.sql; done

Peak RSS is measured by picking up heap Rss in /proc/[pid]/smaps.

It seems somewhat slow using patched libpq and original dblink, but it
seems within error range too. If this amount of slowdown is not
permissible, it might be improved by restoring the previous static call
route, at the cost of some extra redundancy in the code. On the other
hand, the full patch version is obviously fast and requires less
memory. Isn't it nice?

This patch consists of two sub-patches. The first is a patch for libpq
to allow rewiring the tuple storage mechanism; the default behavior is
not changed, so existing libpq clients should run with it. The second
modifies dblink to store received tuples directly into a tuplestore
using the mechanism above.
regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..a360d78 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQregisterTupleAdder      161
+PQgetAsCstring            162
+PQgetAddTupleParam        163
+PQsetAddTupleErrMes       164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 50f3f83..437be26 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,6 +2692,7 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
+	conn->addTupleFunc = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5064,3 +5065,10 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+void
+PQregisterTupleAdder(PGconn *conn, addTupleFunction func, void *param)
+{
+	conn->addTupleFunc = func;
+	conn->addTupleFuncParam = param;
+}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 113aab0..c8ec9bd 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -48,7 +48,6 @@ char *const pgresStatus[] = {
 static int	static_client_encoding = PG_SQL_ASCII;
 static bool static_std_strings = false;
 
-
 static PGEvent *dupEvents(PGEvent *events, int count);
 static bool PQsendQueryStart(PGconn *conn);
 static int	PQsendQueryGuts(PGconn *conn,
@@ -66,7 +65,9 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int	PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
-
+static void *pqDefaultAddTupleFunc(PGresult *res, AddTupFunc func,
+				   int id, size_t len);
+static void *pqAddTuple(PGresult *res, PGresAttValue *tup);
 
 /*
  * Space management for PGresult.
@@ -160,6 +161,9 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->addTupleFunc = pqDefaultAddTupleFunc;
+	result->addTupleFuncParam = NULL;
+	result->addTupleFuncErrMes = NULL;
 
 	if (conn)
 	{
@@ -194,6 +198,12 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 			}
 			result->nEvents = conn->nEvents;
 		}
+
+		if (conn->addTupleFunc)
+		{
+			result->addTupleFunc = conn->addTupleFunc;
+			result->addTupleFuncParam = conn->addTupleFuncParam;
+		}
 	}
 	else
 	{
@@ -487,6 +497,33 @@ PQresultAlloc(PGresult *res, size_t nBytes)
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
On 11/25/2011 10:52 AM, Jeff Janes wrote:
> Given Peter's patch on the same subject, should we now mark this one
> as rejected in the commitfest app?
> https://commitfest.postgresql.org/action/patch_view?id=681

That may be premature. While the testing I've done so far suggests
Peter's idea is the better route forward, the much higher complexity
level does mean it's surely possible for it to be rejected. We have a
reviewer signed up for Peter's submission now; I think what to do with
Tomas's entry will be an easier decision to make once that report is
in.

-- 
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
Re: [HACKERS] Patch to allow users to kill their own queries
On 11/16/2011 01:28 PM, Daniel Farina wrote:
> As it would turn out, a patch for this has already been submitted:
> http://archives.postgresql.org/pgsql-hackers/2011-10/msg1.php
> There was some wrangling on whether it needs to be extended to be
> useful, but for our purposes the formulation already posted already
> captures vital value for us, and in that form appears to be fairly
> uncontentious. I have moved it to the current commitfest, with a
> comment linking to the 'please revive this patch' thread whereby a
> second glance at what to do about this was conducted.

Yeah, that one got a raw deal; it should be in the *current*
CommitFest--you had it in the next one. I'll join the chorus to just
allow people to fire the pg_cancel_backend pea shooter foot gun
targeted only at their own feet, make most users happy, and punt
anything more complicated off as troublesome relative to its benefit.
That's what Daniel has said, what his co-worker Edward also
implemented, what Noah thought was good enough given other security
mechanisms in place, and what Tom thinks is reasonable too. This I feel
is important, so I'm going to add myself as the next reviewer and
include it in my testing run tomorrow.

-- 
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
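For context, the end state being argued for would let an unprivileged
user do something like the following against only their own sessions
(a sketch; note that pg_stat_activity still used the procpid column
name in the 9.1 era, before it was renamed to pid):

-- Cancel all of the current user's other running queries.
SELECT pg_cancel_backend(procpid)
FROM pg_stat_activity
WHERE usename = current_user
  AND procpid <> pg_backend_pid();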
Re: [HACKERS] Refactoring on DROP/ALTER SET SCHEMA/ALTER RENAME TO statement
2011/11/30 Robert Haas <robertmh...@gmail.com>:
> On Sun, Nov 27, 2011 at 3:14 PM, Kohei KaiGai <kai...@kaigai.gr.jp> wrote:
>> If we add new properties that are required by AlterObjectNamespace, as
>> you suggested, it will allow us to reduce the number of callers of
>> this routine mechanically with small changes. Should I try reworking
>> it this way?
> Yeah, let's try that for starters, and then see if anything further
> suggests itself.

OK. I marked this patch as Returned with Feedback. I will try it again.
In the timeframe of this commitfest, I'll focus on my remaining patches
and on reviewing pgsql-fdw.

Thanks,
-- 
KaiGai Kohei <kai...@kaigai.gr.jp>
Re: [HACKERS] Inverse convertion for pg_mb2wchar
On Thu, Dec 1, 2011 at 12:30 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Nov 21, 2011 at 11:49 AM, Alexander Korotkov
> <aekorot...@gmail.com> wrote:
>> I've a question about the pg_mb2wchar function. Is there any way to do
>> the inverse conversion, pg_wchar* to char*? I've looked at the
>> pg_wchar_tbl table definition, and I didn't find anything about an
>> inverse transformation. So, any chance to get an inverse conversion?
>> I'm experimenting with index support for regexp search, and I'm
>> trying to get some characters back from the color map.
> Well, any char can presumably also be represented as a wchar, but the
> reverse isn't necessarily true...

So, if a wchar can't be represented as a sequence of chars, it means
that it can't occur in a string in the server encoding. In this
situation it's enough for me to know that this is so for a particular
wchar. I found that for the UTF8 encoding the wchar is the Unicode code
point. For single-byte encodings the wchar just holds the original
value in its lower byte. And there are some conversions for PG_EUC_JP,
PG_EUC_CN, PG_EUC_KR, PG_EUC_TW, and PG_EUC_JIS_2004 which are not
clear to me, but it seems feasible to write inverse conversion code
using the existing code of the direct conversion.

> (What's a color map?)

The regexp engine translates a regexp into a finite state automaton. So
that the automaton doesn't have too many arcs, characters are grouped
into colors. The color map maps wchar -> color number. Since I analyze
the automaton produced by the regexp engine, I need to convert a wchar
back to the original character.

-- 
With best regards,
Alexander Korotkov.
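For the two easy cases described above, a minimal sketch of what an
inverse conversion might look like. The function name is made up, the
EUC encodings are deliberately left unhandled, and it assumes the
existing unicode_to_utf8() and pg_utf_mblen() helpers from the
multibyte code:

#include "mb/pg_wchar.h"

/*
 * Hypothetical sketch: convert one pg_wchar back to multibyte form.
 * Returns the number of bytes written, or -1 if this sketch does not
 * handle the encoding.
 */
static int
wchar2mb_sketch(pg_wchar c, unsigned char *to, int encoding)
{
	if (encoding == PG_UTF8)
	{
		unicode_to_utf8(c, to);		/* wchar is the Unicode code point */
		return pg_utf_mblen(to);	/* byte length of what was emitted */
	}
	if (pg_encoding_max_length(encoding) == 1)
	{
		*to = (unsigned char) (c & 0xFF);	/* low byte holds the char */
		return 1;
	}
	return -1;						/* PG_EUC_* need per-encoding code */
}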
Re: [HACKERS] WIP: index support for regexp search
On Thu, Dec 1, 2011 at 12:29 AM, Robert Haas <robertmh...@gmail.com> wrote:
> Please add this patch here so it does not get lost in the shuffle:
> https://commitfest.postgresql.org/action/commitfest_view/open

Done.

-- 
With best regards,
Alexander Korotkov.
Re: [HACKERS] patch for type privileges
On 2011-11-29 18:47, Peter Eisentraut wrote:
> On tis, 2011-11-29 at 07:07 +0200, Peter Eisentraut wrote:
>> On mån, 2011-11-28 at 11:41 +0100, Yeb Havinga wrote:
>>> On 2011-11-15 21:50, Peter Eisentraut wrote:
>>>> Patch attached.
>>> I cannot get the patch to apply; this is the output of patch -p1
>>> --dry-run on HEAD.
>> I need to remerge it against concurrent range type activity.
> New patch attached.

I'm looking at your patch. One thing that puzzled me for a while was
that I could not restrict access to base types (either built-in or
user-defined). Is this intentional?

regards,
Yeb Havinga
Re: [HACKERS] Why so few built-in range types?
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Nov 30, 2011 at 3:58 PM, Stephen Frost <sfr...@snowman.net> wrote:
>> Erm, isn't there a contrib type that already does all that for you..?
>> ip4r or whatever? Just saying, if you're looking for that capability..
> Oh, huh, good to know. Still, I'm not sure why you need to load a
> separate type to get this... there's no reason why the built-in CIDR
> type couldn't support it.

The semantics of that type aren't what people actually want, and
there's been push-back about changing it due to backwards
compatibility, etc. That's my recollection of the situation, anyway.
I'm sure there's all kinds of fun talk in the archives about it.

	Thanks,

		Stephen
Re: [HACKERS] synchronous commit vs. hint bits
On Thu, Dec 1, 2011 at 4:09 AM, Andres Freund <and...@anarazel.de> wrote:
>> Oh, that's interesting. Why do you want to avoid frequent fsyncs? I
>> thought the point of synchronous_commit=off was to move the fsyncs to
>> the background, but not necessarily to decrease the frequency.
> Is that so? If it wouldn't avoid fsyncs, how could you reach multiple
> thousand TPS in a writing pgbench run on a pretty ordinary system with
> fsync=on?

Eh, well, what would stop you from achieving that? An fsync operation
that occurs in the background doesn't block further transactions from
completing. Meanwhile, getting the WAL records on disk faster allows us
to set hint bits sooner, which is a significant win, as shown by the
numbers I posted upthread.

One possible downside of trying to kick off the fsync more quickly is
that if there is a continuous stream of background fsyncs going on, a
process that needs to do an XLogFlush in the foreground (i.e. a
synchronous_commit=on transaction in the middle of many
synchronous_commit=off transactions) might be more likely to find an
fsync already in progress, and therefore need to wait until it
completes before starting the next one, slowing things down. But I'm a
bit reluctant to believe that is a real effect without some data.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
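The mixed-durability scenario Robert describes is easy to set up, since
synchronous_commit can be changed per session and per transaction; a
minimal illustration (the payments table is a hypothetical example):

-- Session default: asynchronous commits, fsyncs happen in the background.
SET synchronous_commit TO off;

-- A transaction that must be durable opts back in, forcing a
-- foreground XLogFlush at commit time.
BEGIN;
SET LOCAL synchronous_commit TO on;
INSERT INTO payments(id, amount) VALUES (1, 42.00);
COMMIT;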
Re: [HACKERS] Why so few built-in range types?
----- Quote from Stephen Frost (sfr...@snowman.net), on 01.12.2011 at 15:56 -----
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Wed, Nov 30, 2011 at 3:58 PM, Stephen Frost <sfr...@snowman.net> wrote:
>>> Erm, isn't there a contrib type that already does all that for
>>> you..? ip4r or whatever? Just saying, if you're looking for that
>>> capability..
>> Oh, huh, good to know. Still, I'm not sure why you need to load a
>> separate type to get this... there's no reason why the built-in CIDR
>> type couldn't support it.
> The semantics of that type aren't what people actually want, and
> there's been push-back about changing it due to backwards
> compatibility, etc. That's my recollection of the situation, anyway.
> I'm sure there's all kinds of fun talk in the archives about it.

I have reached for the built-in inet/cidr types once or twice, but the
lack of indexing support for the contains operator was stopping me - so
I have used the ip4r extension. I do not think that adding index
support to a datatype classifies as a semantic change that will break
backward compatibility.

Best regards

-- 
Luben Karavelov
Re: [HACKERS] synchronous commit vs. hint bits
On Thursday, December 01, 2011 03:11:43 PM Robert Haas wrote:
> On Thu, Dec 1, 2011 at 4:09 AM, Andres Freund <and...@anarazel.de> wrote:
>> Is that so? If it wouldn't avoid fsyncs, how could you reach multiple
>> thousand TPS in a writing pgbench run on a pretty ordinary system
>> with fsync=on?
> Eh, well, what would stop you from achieving that? An fsync operation
> that occurs in the background doesn't block further transactions from
> completing.

But it will slow down overall system I/O. For one, an fsync() on Linux
will cause a queue drain on the I/O submit queue. For another, it
counts against the total available random I/O ops a device can do,
which in turn will cause a slowdown for anything else doing synchronous
random I/O, i.e. read(2).

> Meanwhile, getting the WAL records on disk faster allows us to set
> hint bits sooner, which is a significant win, as shown by the numbers
> I posted upthread.

Oh, that part I don't doubt. Sorry for that.

Andres
Re: [HACKERS] Add minor version to v3 protocol to allow changes without breaking backwards compatibility
On 11/30/2011 06:52 PM, Merlin Moncure wrote:
> On Mon, Nov 28, 2011 at 9:18 AM, Mikko Tiihonen
> <mikko.tiiho...@nitorcreations.com> wrote:
>> Hi,
>> As discussed few days ago, it would be beneficial if we could change
>> the v3 backend<->client protocol without breaking backwards
>> compatibility.
>> Here is a working patch that exports a supported_binary_minor
>> constant and a binary_minor session variable that can be used by
>> clients to enable newer features. I also added an example usage where
>> the array encoding for constant-size elements is optimized if the
>> binary_minor version is new enough.
>> I have coded the client side support of binary_minor for the JDBC
>> driver and tested that it worked. But let's first discuss if this is
>> an acceptable path forward.
> Regarding your TODO in the code comments about interactions with
> replication: I think it should be removed. WAL streaming depends on
> more things being identical than what is guaranteed here, which is
> basically the protocol + data formats.

OK. I'll remove the comments about replication.

> I think all references to 'protocol' should be removed; Binary wire
> formats != protocol: the protocol could bump to v4 but the wire
> formats (by happenstance) could all still continue to work --
> therefore the version isn't minor (it's not dependent on protocol
> version but only on itself). Therefore, don't much like the name
> 'supported_binary_minor'. How about binary_format_version?

I was thinking that it would be possible to use the new minor version
to introduce also new protocol messages, such as streaming large data
into the database without knowing its size beforehand.

> Also, is a non granular approach really buying us anything here? ISTM
> *something* is likely to change format on most releases of the server
> so I'm wondering what's the point (if you don't happen to be on the
> same x.x release of the server, you might as well assume to not match
> or at least 'go on your own risk'). The value added to the client
> version query is quite low.

You have a very good point about changes in every new postgres version.
Either the text or the binary encoding is likely to change for some
types in each new release.

There needs to be a decision on official policy about breaking
backwards compatibility of postgresql clients. Is it:

A) Every x.y postgres release is free to break any parts of the
   * protocol
   * text encoding
   * binary encoding
   as long as it is documented
   -> all client libraries should be coded so that they refuse to
      support version x.y+1 or newer (they might have an option to
      override this, but there are no guarantees that it would work)
   Good: no random bugs when using an old client library
   Bad: initial complaints from users before they understand that this
        is the best option for everyone

B) Every x.y postgres release must guarantee that no client-visible
   parts of the protocol, text encoding or binary encoding will change
   from the previous release in the v3 protocol. If any changes are
   needed then a new protocol version must be created.
   -> very high barrier for new features
   Good: can upgrade the server without updating clients
   Bad: new features are only introduced very rarely, after enough have
        accumulated
   Bad: many feature/change patches will rot while waiting for the
        upcoming new version

C) There is effort to try avoiding incompatible changes. Some changes
   are blocked because it is detected that they can break backwards
   compatibility, while others are let through (often with some
   compatibility option on the server side to fall back to previous
   functionality, f.ex. bytea hex encoding).
   -> As far as I understand, this is the current situation.
   Good: has worked so far
   Bad: accumulates compatibility flags in the server

D) My proposed compromise, where there is one minor_version setting and
   code in the server to support all different minor versions. The
   client requests the minor version, so all releases default to the
   backwards-compatible version. When the combinations start to create
   too much maintenance overhead, a new clean v4 version of the
   protocol is specified.
   Good: keeps full backwards compatibility
   Good: allows introducing changes at any time
   Bad: accumulates conditional code in the server and clients until a
        new version of the protocol is released

I'd like the official policy to be A; otherwise I'll push for D.

-Mikko
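Under proposal D as implemented in the posted patch, a client would
negotiate roughly like this (a sketch; the names come from the patch,
but the value 1 is an assumed example):

-- Ask the server for the newest binary format revision it supports.
SHOW supported_binary_minor;

-- Opt this session in to the newest revision both sides understand.
SET binary_minor = 1;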
Re: [HACKERS] synchronous commit vs. hint bits
On Thu, Dec 1, 2011 at 6:11 AM, Robert Haas <robertmh...@gmail.com> wrote:
> One possible downside of trying to kick off the fsync more quickly is
> that if there is a continuous stream of background fsyncs going on, a
> process that needs to do an XLogFlush in the foreground (i.e. a
> synchronous_commit=on transaction in the middle of many
> synchronous_commit=off transactions) might be more likely to find an
> fsync already in progress, and therefore need to wait until it
> completes before starting the next one, slowing things down.

Waiting until the other one completes is how it is currently
implemented, but is it necessary from a correctness view? It seems like
the WALWriteLock only needs to protect the write, and not the sync
(assuming the sync method allows those to be separate actions), and
there could be multiple fsync requests from different processes pending
at the same time without a correctness problem. After dropping the
WALWriteLock and doing the fsync, it would then have to take the lock
again, or maybe just a spinlock, to update the accounting for how far
the log has been flushed.

So rather than one committing process blocking on fsync and a bunch of
others blocking on WALWriteLock, you could have all of them blocking on
different fsyncs and let the kernel deal with waking them up.

I don't know at all whether this would actually be an improvement,
assuming it would even be safe. Reading the xlog.c code, it is hard to
tell which designs/features are there for safety and which ones are
there for suspected performance reasons.

Sorry for hijacking your topic; it is just something I had been
thinking about for a while.

Cheers,

Jeff
Re: [HACKERS] [PATCH] PostgreSQL fails to build with 32bit MinGW-w64
On 11/27/2011 09:18 AM, NISHIYAMA Tomoaki wrote:
> Hi,
>
>>>> +/* __MINGW64_VERSION_MAJOR is related to both 32/64 bit gcc compiles by
>>>> + * mingw-w64, however it gets defined only after
>>> Why not use __MINGW32__, which is defined without including any
>>> headers?
>> Because it's defined by other than mingw-w64 compilers.
> I see. That's because mingw (not -w64). Should it be ok if mingw is ok
> with that condition? This really breaks mingw gcc 4.6.1 :( (it does
> not have crtdefs.h.) If moving downward does not break MSVC, perhaps
> that's the good way. Otherwise, we might check for the presence of
> crtdefs.h with configure?

I have looked at this a bit. It's fairly ugly, and the only moderately
clean way I see forward is a configure test to check for a mingw-w64
compiler, e.g. by running gcc -E -dM over a file which just includes
stdio.h and checking for the definedness of __MINGW64_VERSION_MAJOR, or
something similar.

cheers

andrew
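The probe Andrew sketches in words could look something like this (an
illustration, not the final configure test):

# Feed a file that includes stdio.h through the preprocessor and dump
# all macro definitions; mingw-w64 (both 32- and 64-bit targets)
# defines __MINGW64_VERSION_MAJOR once a system header is pulled in.
echo '#include <stdio.h>' > conftest.c
if gcc -E -dM conftest.c | grep -q '__MINGW64_VERSION_MAJOR'; then
    echo "compiler is mingw-w64"
fi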
Re: [HACKERS] Command Triggers
Hi Dimitri,
Hi all,

On Tuesday, November 08, 2011 06:47:13 PM Dimitri Fontaine wrote:
> As proposed by Jan Wieck on hackers and discussed in Ottawa at the
> Clusters Hackers Meeting, the most (only?) workable way to ever have
> DDL triggers is to have command triggers instead. Rather than talking
> about catalogs and the like, it's all about firing a registered
> user-defined function before or after processing the utility command,
> or instead of processing it.
> This naturally involves a new catalog (pg_cmdtrigger) and some new
> subcommands of CREATE and DROP TRIGGER, and some support code for
> calling the function at the right time.

Great ;) FWIW, I think that's the way forward as well.

The patch obviously isn't thought to be committable yet, so I am not
going to do a very detailed code-level review. Attached is a patch with
low-level targeted comments and such I noticed while reading it. At
this state the important things are high-level, so I will try to
concentrate on those:

== the trigger function ==

I don't like the set of arguments passed to the trigger functions very
much. To cite an example of yours, those currently are:

* cmd_string     text
* cmd_nodestring text
* schemaname     text
* relname        text

I think at least a

* command_tag    text

is missing. Sure, you can disambiguate it by creating a function for
every command type, but that seems pointlessly complex for many
applications. And adding it should be trivial as it's already
tracked ;)

Currently the examples refer to the object name as relname, which I
dislike as that seems to be too restricted to one use case. The code is
already doing it correctly (as objectname) though ;)

Why is there explicit documentation of not checking the arguments of
said trigger function? That seems a bit strange to me.

schemaname currently is mighty unusable, because whether it's sent or
not depends on whether the table was created with a schema name
specified or not. That doesn't seem to be a good approach. That brings
me to the next point:

== normalization of statements ==

Currently the normalization is a bit confusing. E.g. for:

CREATE SCHEMA barf;
SET search_path = barf;
CREATE TYPE bart AS ENUM ('a', 'b');
CREATE TABLE bar(a int, b bigint, c serial, d text, e varchar, F text, g bart, h int4);

one gets

CREATE TABLE bar (a pg_catalog.int4,b pg_catalog.int8,c serial,d text,e pg_catalog.varchar,F text,g bart,h int4);

Which points out two problems:
* inconsistent schema prefixing
* no quoting

Imo the output should fully schema-qualify anything, including
operators. Yes, that's rather ugly, but anything else seems too likely
to lead to problems. As an alternative it would be possible to pass the
current search_path, but that doesn't seem to be the best approach to
me.

Currently CHECK() constraints are not decodable, but inside those the
same quoting/qualifying rules should apply.

Then there is nice stuff like CREATE TABLE (LIKE ...); What shall that
return in the future? I vote for showing it as the plain CREATE TABLE
where all columns are specified.

That touches another related topic. Dim's work in adding support for
utility commands in nodeToString and friends possibly will make the
code somewhat command-trigger specific. Is there any other usage we
envision?

== grammar ==

* multiple actions: I think it would be sensible to allow multiple
  actions on which to trigger to be specified, just as you can for
  normal triggers. I also would like an ALL option.
* options: Currently the grammar allows options to be passed to command
  triggers. Do we want to keep that? If yes, with the same arcane set
  of datatypes as in data triggers? If yes, it needs to be wired up.
* schema qualification: An option to add triggers only to a specific
  schema would be rather neat for many scenarios. I am not totally sure
  if it's worth the hassle of defining what happens in the edge cases
  of e.g. moving from one schema into another, though.

== locking ==

Currently there is no locking strategy at all. Which e.g. means that
there is no protection against two sessions changing stuff concurrently
or such. Was that just left out - which is ok - or did you miss that? I
think it would be ok to just always grab row-level locks there.

== permissions ==

Command triggers should only be allowed for the database owner.

== recursion ==

In contrast to data triggers, command triggers cannot change what is
done. They can either prevent it from running or not. Which imo is
fine. The question I have is whether it would be a good idea to make it
easy to prevent recursion, especially if the triggers won't be per
schema.

== calling the trigger ==

Imo the current call site in ProcessUtility isn't that good. I think it
would make more sense moving it into standard_ProcessUtility. It should
be *after* the check_xact_readonly check in there, in my opinion. I
also don't think it's a good idea to run the
ExecBeforeOrInsteadOfCommandTriggers stuff there, because that's also
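To make the argument-list discussion concrete, a trigger function under
this proposal would presumably look something like the following
(hypothetical sketch matching the four arguments listed above; the
patch's actual registration syntax and required return type may
differ):

CREATE FUNCTION log_command(cmd_string text, cmd_nodestring text,
                            schemaname text, objectname text)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    -- Just log what fired; a real trigger might audit or veto.
    RAISE NOTICE 'command: % on %.%', cmd_string, schemaname, objectname;
END;
$$;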
Re: [HACKERS] Inlining comparators as a performance optimisation
Attached is a revision of my patch with some clean-ups. In particular,
I'm now using switch statements for greater readability, plus
supporting fast path sorting of the time datatype. I've also updated
the documentation on Date/Time Types to note the additional
disadvantage of using the deprecated "store timestamps (and friends) as
double precision floating-point numbers" compile-time option.

There is one aspect to this optimisation that I haven't touched on,
which is the effect on memory consumption. I think that much of the
value that this patch will deliver will come from being able to release
sort memory earlier. Consider that the substantial improvements in raw
sorting speed (far more substantial than the improvements in query
runtime) will sometimes result in a concomitant reduction in the time
that the executor holds onto memory allocated for sorting. Maybe the
effect will only be really noticeable for plans with a sort node as
their root node, but that isn't exactly a rare occurrence, particularly
among large, expensive sorts.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

From cffad91578edd698e89ef2bf7384d82aee26b57e Mon Sep 17 00:00:00 2001
From: peter2ndQuadrant <pe...@2ndquadrant.com>
Date: Sun, 20 Nov 2011 20:36:25 +0000
Subject: [PATCH] Initial commit of optimization

Stop directly using oid

Added int8 quicksort fast path specialisation, which can also be used
in place of F_TIMESTAMP_CMP for HAVE_INT64_TIMESTAMP builds.

Rebased, revised patch for -hackers, with timestamp and int8 fast path
sorting using the same int8 specialization.

Remove unneeded line

Rebase for -hackers

Discriminate against cases where nKeys > 1 when selecting sort
specialisation. We now support fast path sorting with multiple
scanKeys. We now inline for one scanKey, but do not inline our
comparetup_heap replacement when multiple scanKeys are used (although
this is at the compiler's discretion). However, when multiple scanKeys
are used, we still use a specialisation that will elide the
SQL-function-call machinery for the first scanKey, which is almost as
good for most cases.

Further formatting clean-ups

Move specialization selection logic to tuplesort_begin_heap(), per
suggestion of Robert Haas

Add simple switch for testing optimization.

Improvements to template qsort_arg comments.

Support fast path sorting of dates.

Added float fast path sorting

Patch minimization. Additional clean-up.

Indentation issue.

Specialization clean-ups

Use switch statement for sort specialization selection

Further comment clean-up.

Additional clean-up

Make comparators consistent

Added date fast path support, documentation updates

Fix template_qsort_arg.h assertion

Assert nScanKeys == 1 for inlining case
---
 doc/src/sgml/datatype.sgml             |    6 +
 src/backend/utils/sort/tuplesort.c     |  223 +++++++++-
 src/include/utils/template_qsort_arg.h |  288 ++++++++++++++
 src/port/qsort.c                       |    3 +-
 4 files changed, 512 insertions(+), 8 deletions(-)
 create mode 100644 src/include/utils/template_qsort_arg.h

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index fe59a1c..f5dfbbb 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1619,6 +1619,12 @@ SELECT E'\\xDEADBEEF';
      floating-point case, large <type>interval</type> values degrade in
      precision as the size of the interval increases.
     </para>
+    <para>
+     Note that if the server is built to represent these types as
+     floating point numbers, that will prevent it from using fast path
+     sort specializations for the types, and sorting columns of those
+     types will be noticeably slower.
+    </para>
    </note>

    <para>
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 3505236..0a14a90 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -107,11 +107,13 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "utils/datum.h"
+#include "utils/fmgroids.h"
 #include "utils/logtape.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/pg_rusage.h"
 #include "utils/rel.h"
+#include "utils/template_qsort_arg.h"
 #include "utils/tuplesort.h"
 
 
@@ -226,6 +228,13 @@ struct Tuplesortstate
 							Tuplesortstate *state);
 
 	/*
+	 * This specialization function pointer is sometimes used as an
+	 * alternative to the standard qsort_arg, when it has been determined
+	 * that we can benefit from various per-type performance optimizations.
+	 */
+	void (*qsort_arg_spec)(void *a, size_t n, size_t es, void *arg);
+
+	/*
 	 * Function to copy a supplied input tuple into palloc'd space and set up
 	 * its SortTuple representation (ie, set tuple/datum1/isnull1).  Also,
 	 * state->availMem must be decreased by the amount of space used for the
@@ -457,6 +466,11 @@ static void tuplesort_heap_insert(Tuplesortstate *state, SortTuple
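For readers unfamiliar with the technique, the template header works
along these lines: a macro stamps out a complete sort routine whose
comparator is known at compile time, so the compiler can inline it
instead of paying an indirect call per comparison. A simplified,
self-contained illustration (not the patch's actual code; it uses an
insertion sort for brevity where the real template_qsort_arg.h
generates a quicksort):

#include <stddef.h>

#define DECLARE_INLINE_SORT(NAME, TYPE, CMP)				\
static void								\
NAME##_sort(TYPE *a, size_t n)						\
{									\
	/* insertion sort; CMP is inlined at compile time */		\
	for (size_t i = 1; i < n; i++)					\
	{								\
		TYPE v = a[i];						\
		size_t j = i;						\
		while (j > 0 && CMP(v, a[j - 1]) < 0)			\
		{							\
			a[j] = a[j - 1];				\
			j--;						\
		}							\
		a[j] = v;						\
	}								\
}

#define INT4_CMP(x, y) (((x) < (y)) ? -1 : ((x) > (y)) ? 1 : 0)
DECLARE_INLINE_SORT(int4, int, INT4_CMP)

/* usage: int4_sort(array, nelems); */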
Re: [HACKERS] synchronous commit vs. hint bits
On Thu, Dec 1, 2011 at 9:58 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> Waiting until the other one completes is how it is currently
> implemented, but is it necessary from a correctness view? It seems
> like the WALWriteLock only needs to protect the write, and not the
> sync (assuming the sync method allows those to be separate actions),
> and there could be multiple fsync requests from different processes
> pending at the same time without a correctness problem.

I've wondered about that, too. At least on Linux, the overhead of a
system call seems to be pretty low - e.g. the ridiculous number of
lseek calls we do on a pgbench -S run doesn't seem to create much
overhead until the inode mutex starts to become contended, and that
problem should be fixed in Linux 3.2. But I'm not sure whether system
calls are similarly cheap on all platforms, or even whether that's true
on Linux for fsync() in particular.

There's another possible approach here, too: instead of waiting to set
hint bits until the commit record hits the disk, we could allow the
hint bits to be set immediately, on the condition that we don't write
the page out until the commit record hits the disk. Bumping the page
LSN would do that, but I think that might be problematic since setting
hint bits isn't WAL-logged. If so, we could possibly fix that by
storing a second LSN for the page out of line, e.g. in the buffer
descriptor. That might be even faster than speeding up the WAL flush.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Add minor version to v3 protocol to allow changes without breaking backwards compatibility
On Thu, Dec 1, 2011 at 8:42 AM, Mikko Tiihonen
<mikko.tiiho...@nitorcreations.com> wrote:
> On 11/30/2011 06:52 PM, Merlin Moncure wrote:
>> On Mon, Nov 28, 2011 at 9:18 AM, Mikko Tiihonen
>> <mikko.tiiho...@nitorcreations.com> wrote:
>>> Hi,
>>> As discussed few days ago, it would be beneficial if we could change
>>> the v3 backend<->client protocol without breaking backwards
>>> compatibility.
>>> Here is a working patch that exports a supported_binary_minor
>>> constant and a binary_minor session variable that can be used by
>>> clients to enable newer features. I also added an example usage
>>> where the array encoding for constant-size elements is optimized if
>>> the binary_minor version is new enough.
>>> I have coded the client side support of binary_minor for the JDBC
>>> driver and tested that it worked. But let's first discuss if this is
>>> an acceptable path forward.
>> Regarding your TODO in the code comments about interactions with
>> replication: I think it should be removed. WAL streaming depends on
>> more things being identical than what is guaranteed here, which is
>> basically the protocol + data formats.
>
> OK. I'll remove the comments about replication.
>
>> I think all references to 'protocol' should be removed; Binary wire
>> formats != protocol: the protocol could bump to v4 but the wire
>> formats (by happenstance) could all still continue to work --
>> therefore the version isn't minor (it's not dependent on protocol
>> version but only on itself). Therefore, don't much like the name
>> 'supported_binary_minor'. How about binary_format_version?
>
> I was thinking that it would be possible to use the new minor version
> to introduce also new protocol messages, such as streaming large data
> into the database without knowing its size beforehand.
>
>> Also, is a non granular approach really buying us anything here? ISTM
>> *something* is likely to change format on most releases of the server
>> so I'm wondering what's the point (if you don't happen to be on the
>> same x.x release of the server, you might as well assume to not match
>> or at least 'go on your own risk'). The value added to the client
>> version query is quite low.
>
> You have a very good point about changes in every new postgres
> version. Either the text or the binary encoding is likely to change
> for some types in each new release.
>
> There needs to be a decision on official policy about breaking
> backwards compatibility of postgresql clients. Is it:
>
> A) Every x.y postgres release is free to break any parts of the
>    * protocol
>    * text encoding
>    * binary encoding
>    as long as it is documented
>    -> all client libraries should be coded so that they refuse to
>       support version x.y+1 or newer (they might have an option to
>       override this, but there are no guarantees that it would work)
>    Good: no random bugs when using an old client library
>    Bad: initial complaints from users before they understand that
>         this is the best option for everyone
>
> B) Every x.y postgres release must guarantee that no client-visible
>    parts of the protocol, text encoding or binary encoding will change
>    from the previous release in the v3 protocol. If any changes are
>    needed then a new protocol version must be created.
>    -> very high barrier for new features
>    Good: can upgrade the server without updating clients
>    Bad: new features are only introduced very rarely, after enough
>         have accumulated
>    Bad: many feature/change patches will rot while waiting for the
>         upcoming new version
>
> C) There is effort to try avoiding incompatible changes. Some changes
>    are blocked because it is detected that they can break backwards
>    compatibility, while others are let through (often with some
>    compatibility option on the server side to fall back to previous
>    functionality, f.ex. bytea hex encoding).
>    -> As far as I understand, this is the current situation.
>    Good: has worked so far
>    Bad: accumulates compatibility flags in the server
>
> D) My proposed compromise, where there is one minor_version setting
>    and code in the server to support all different minor versions.
>    The client requests the minor version, so all releases default to
>    the backwards-compatible version. When the combinations start to
>    create too much maintenance overhead, a new clean v4 version of the
>    protocol is specified.
>    Good: keeps full backwards compatibility
>    Good: allows introducing changes at any time
>    Bad: accumulates conditional code in the server and clients until
>         a new version of the protocol is released
>
> I'd like the official policy to be A; otherwise I'll push for D.

In issue A, you mentioned that client libraries should refuse to
support version x+1 or newer. If by client libraries you mean libpq,
then this is absolutely not going to fly -- libpq has no such
restriction now, and adding it isn't doing users any favors IMO. The
problem is not libpq/jdbc etc., but application code. I'll say again:
wire format compatibility issues are fundamentally different from the
protocol issues; decades of user application
Re: [HACKERS] Inlining comparators as a performance optimisation
On Thu, Dec 1, 2011 at 11:44 AM, Peter Geoghegan <pe...@2ndquadrant.com> wrote:
> Attached is a revision of my patch with some clean-ups. In particular,
> I'm now using switch statements for greater readability, plus
> supporting fast path sorting of the time datatype. I've also updated
> the documentation on Date/Time Types to note the additional
> disadvantage of using the deprecated "store timestamps (and friends)
> as double precision floating-point numbers" compile-time option.

One thing I'm starting to get a bit concerned about is the effect of
this patch on the size of the resulting binary. The performance results
you've posted are getting steadily more impressive as you get into
this, which is cool, but it seems like the number of copies of the
qsort logic that you're proposing to include in the resulting binary is
also steadily climbing. On my system, a stripped postgres binary built
with my usual compile options (except --enable-cassert, which I took
out) is 49336 bytes bigger with this patch applied, an increase of
close to 1%. We might need to be a little bit choosy about this,
because I don't think that we want to end up in a situation where some
noticeable percentage of the final binary consists of
differently-inlined versions of the quicksort algorithm - especially
because there may be other places where we want to do similar kinds of
inlining.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Why so few built-in range types?
On Thu, Dec 1, 2011 at 9:12 AM,  <karave...@mail.bg> wrote:
> I do not think that adding index support to a datatype classifies as a
> semantic change that will break backward compatibility.

Me neither. The ip4r type also supports ranges that aren't on
CIDR-block boundaries, which probably isn't something that makes sense
to incorporate into cidr. But not everyone needs that, and some people
might also need support for IPv6 CIDR blocks, which ip4r doesn't
support. So I don't necessarily see the existence of ip4r as a reason
why cidr shouldn't have better indexing support.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] FlexLocks
On Wed, Nov 30, 2011 at 7:01 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
>> OK. There are a few things I found in this pass which I missed in the
>> last. One contrib module was missed, I found another typo in a
>> comment, and I think we can reduce the include files a bit. Rather
>> than describe it, I'm attaching a patch file over the top of your
>> patches with what I think might be a good idea.
> This time with it really attached.

Thanks, I've merged those in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Command Triggers
Hi,

On Tuesday, November 08, 2011 06:47:13 PM Dimitri Fontaine wrote:
> exception to that rule would be SELECT INTO and CREATE TABLE AS
> SELECT, and my proposal would be to add specific call sites to the
> functions I've provided in the attached patch rather than try to
> contort them into being a good citizen as a utility command.

I would very much prefer to make them utility commands and rip out the
executor support for that. It doesn't look that complicated, and would
allow us to get rid of the partially duplicated DefineRelation and
related stuff from the executor, where it's violating my aesthetic
feelings ;)

Is there something making that especially hard that I overlooked? The
infrastructure for doing so seems to be there.

Andres
Re: [HACKERS] Command Triggers
Andres Freund <and...@anarazel.de> writes:
> On Tuesday, November 08, 2011 06:47:13 PM Dimitri Fontaine wrote:
>> exception to that rule would be SELECT INTO and CREATE TABLE AS
>> SELECT, and my proposal would be to add specific call sites to the
>> functions I've provided in the attached patch rather than try to
>> contort them into being a good citizen as a utility command.
> I would very much prefer to make them utility commands and rip out the
> executor support for that.
> Is there something making that especially hard that I overlooked? The
> infrastructure for doing so seems to be there.

Well, I think the main problem is going to be shunting the query down
the right parsing track (SELECT versus utility statement) early enough.
Making this work cleanly would be a bigger deal than I think you're
thinking. But yeah, it would be nicer if the executor didn't have to
worry about this. +1 if you're willing to take on the work.

			regards, tom lane
[HACKERS] Postgresql 9.1 replication failing
All, I have a large PG 9.1.1 server (over 1TB of data) and a replica using log shipping. I had some hardware issues on the replica system and now I am getting the following in my pg_log/* files, the same 2 lines over and over since yesterday:

2011-12-01 07:46:30 EST LOG: restored log file "0001028E00E5" from archive
2011-12-01 07:46:30 EST LOG: incorrect resource manager data checksum in record at 28E/E555E1B8

Anything I can do on the replica or do I have to start over? Finally, I know this is not the correct list; I tried general with no answer. Thanks, Jim ___ Jim Buttafuoco j...@contacttelecom.com 603-647-7170 ext. - Office 603-490-3409 - Cell jimbuttafuoco - Skype
Re: [HACKERS] Postgresql 9.1 replication failing
On Thu, Dec 1, 2011 at 1:41 PM, Jim Buttafuoco j...@contacttelecom.com wrote: 2011-12-01 07:46:30 EST LOG: restored log file 0001028E00E5 from archive 2011-12-01 07:46:30 EST LOG: incorrect resource manager data checksum in record at 28E/E555E1B8 Anything I can do on the replica or do I have to start over? I think you want to rebuild the standby. Even if you could repair the damaged WAL record, how can you have any confidence that there is no other corruption? Note that rsync has some options to only copy the changed data, which might greatly accelerate resyncing the standby from the master. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgresql 9.1 replication failing
Jim Buttafuoco j...@contacttelecom.com writes: All, I have a large PG 9.1.1 server (over 1TB of data) and a replica using log shipping. I had some hardware issues on the replica system and now I am getting the following in my pg_log/* files. Same 2 lines over and over since yesterday. 2011-12-01 07:46:30 EST LOG: restored log file 0001028E00E5 from archive 2011-12-01 07:46:30 EST LOG: incorrect resource manager data checksum in record at 28E/E555E1B8 Anything I can do on the replica or do I have to start over? Inspect that WAL segment, or possibly the one immediately following it, in comparison to another copy if you still have it on the master or a central WAL repository. A standby crashing while copying in a WAL segment and/or syncing one to disk could result in random corruption. If you have another copy of the segment and it does not compare equal to the one your standby is trying to read, try the other copy. Finally, I know this is not the correct list, I tried general with no answer. The admin list is the right one for such a post, probably. HTH Thanks Jim ___ Jim Buttafuoco j...@contacttelecom.com 603-647-7170 ext. - Office 603-490-3409 - Cell jimbuttafuoco - Skype -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consult...@comcast.net p: 305.321.1144 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgresql 9.1 replication failing
the WAL file on the master is long gone, how would one inspect the WAL segment? Any way to have PG "move" on? ___ Jim Buttafuoco j...@contacttelecom.com 603-647-7170 ext. - Office 603-490-3409 - Cell jimbuttafuoco - Skype
Re: [HACKERS] Command Triggers
On Thursday, December 01, 2011 07:21:25 PM Tom Lane wrote: Andres Freund and...@anarazel.de writes: On Tuesday, November 08, 2011 06:47:13 PM Dimitri Fontaine wrote: exception to that rule would be SELECT INTO and CREATE TABLE AS SELECT, and my proposal would be to add specific call sites to the functions I've provided in the attached patch rather than try to contort them into being a good citizen as a utility command. I would very much prefer to make them utility commands and rip out the executor support for that. Is there something making that especially hard that I overlooked? The infrastructure for doing so seems to be there. Well, I think the main problem is going to be shunting the query down the right parsing track (SELECT versus utility-statement) early enough. Making this work cleanly would be a bigger deal than I think you're thinking. Yes. I forgot how ugly SELECT INTO is... But yeah, it would be nicer if the executor didn't have to worry about this. +1 if you're willing to take on the work. I won't promise anything but I will play around with the grammar and see if I find a nice enough way. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Thursday, December 01, 2011 07:21:25 PM Tom Lane wrote: Well, I think the main problem is going to be shunting the query down the right parsing track (SELECT versus utility-statement) early enough. Making this work cleanly would be a bigger deal than I think you're thinking. Obviously that depends on the definition of clean... Changing the grammar to make that explicit seems, at first glance, to involve a few too many changes. The cheap way out seems to be to make the decision in analyze.c:transformQuery. Would that be an acceptable way forward? Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Sat, Nov 26, 2011 at 7:20 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: FWIW (scheduling mainly), I don't think I'm going to spend more time on this work until I get some comments, because all that remains to be done is about building on top of what I've already been doing here. +1 overall. One of the main use cases for me is the ability to:

* prevent all commands
* prevent extension commands

to maintain a stable execution environment. Those are especially important because in 9.2 DDL commands will cause additional locking overheads, so preventing DDL will be essential to keeping performance stable in high-txn-rate databases. So I'd like to see a few more triggers that work out of the box for those cases (perhaps wrapped by a function?). It would also allow a more useful man page example of how to use this. On the patch, the header comment for cmdtrigger.c needs updating. Comments for DropCmdTrigger etc. look like you forgot to update them after cut and paste. Comments say "For any given command tag, you can have either Before and After triggers, or Instead Of triggers, not both", which seems like a great thing to put in the docs. Looks good. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgresql 9.1 replication failing
On Thu, Dec 1, 2011 at 7:09 PM, Jim Buttafuoco j...@contacttelecom.comwrote: the WAL file on the master is long gone, how would one inspect the web segment? Any way to have PG move on? Regenerate the master. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Postgresql 9.1 replication failing
On Thu, Dec 1, 2011 at 9:08 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Dec 1, 2011 at 7:09 PM, Jim Buttafuoco j...@contacttelecom.com wrote: the WAL file on the master is long gone, how would one inspect the web segment? Any way to have PG move on? Regenerate the master. typo: regenerate *from* the master -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] TupleDescInitEntry failing to initialize varlen members
Hi, I just noticed in gdb that TupleDescInitEntry does not initialize attacl, attoptions, attfdwoptions. This is probably not very serious (otherwise we'd have bugs about it), but it is noticeable in tupdescs constructed by the executor, at least ExecTypeFromTL:

(gdb) print *tupDesc->attrs[2]
$6 = {attrelid = 0, attname = {data = "c", '\000' <repeats 62 times>},
  atttypid = 1114, attstattarget = -1, attlen = 8, attnum = 3,
  attphysnum = 3, attlognum = 3, attndims = 0, attcacheoff = -1,
  atttypmod = -1, attbyval = 1 '\001', attstorage = 112 'p',
  attalign = 100 'd', attnotnull = 0 '\000', atthasdef = 0 '\000',
  attisdropped = 0 '\000', attislocal = 1 '\001', attinhcount = 0,
  attcollation = 0, attacl = {2139062142},
  attoptions = {{vl_len_ = "\177\177\177\177", vl_dat = "\177"}},
  attfdwoptions = {{vl_len_ = "\177\177\177\177", vl_dat = "\177"}}}

Does anybody think this is worth fixing? -- Álvaro Herrera alvhe...@alvh.no-ip.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgresql 9.1 replication failing
Simon, what do you mean, start over with a base backup? Jim ___ Jim Buttafuoco j...@contacttelecom.com 603-647-7170 ext. - Office 603-490-3409 - Cell jimbuttafuoco - Skype
Re: [HACKERS] patch for type privileges
On tor, 2011-12-01 at 14:37 +0100, Yeb Havinga wrote: On 2011-11-29 18:47, Peter Eisentraut wrote: On tis, 2011-11-29 at 07:07 +0200, Peter Eisentraut wrote: On mån, 2011-11-28 at 11:41 +0100, Yeb Havinga wrote: On 2011-11-15 21:50, Peter Eisentraut wrote: Patch attached. I cannot get the patch to apply, this is the output of patch -p1 --dry-run on HEAD. I need to remerge it against concurrent range type activity. New patch attached. I'm looking at your patch. One thing that puzzled me for a while was that I could not restrict access to base types (either built-in or user defined). Is this intentional? Works for me:

=# create user foo;
=# revoke usage on type int8 from public;
=# \c - foo
=> create table test1 (a int4, b int8);
ERROR:  permission denied for type bigint

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for type privileges
On mån, 2011-11-28 at 14:25 -0600, Merlin Moncure wrote: On Tue, Nov 15, 2011 at 2:23 PM, Peter Eisentraut pete...@gmx.net wrote: The basics here are mainly informed by the SQL standard. One thing from there I did not implement is checking for permission of a type used in CAST (foo AS type). This would be doable but relatively complicated, and in practice someone who is not supposed to be able to use the type wouldn't be able to create the cast or the underlying cast function anyway for lack of access to the type. I'm not quite following that: with your patch are you or are you not prohibited from utilizing casts? In other words, if you didn't have USAGE priv, what would happen if you tried this: CREATE VIEW v AS SELECT null::restricted_type::text; ? This is not affected by my patch, so it would do whatever it did before. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TupleDescInitEntry failing to initialize varlen members
Alvaro Herrera alvhe...@alvh.no-ip.org writes: I just noticed in gdb that TupleDescInitEntry does not initialize attacl, attoptions, attfdwoptions. Indeed not, because it's not building a tuple. It's building an array of C structs, and there's nothing useful to do with those fields. (Robert's proposal to not even have such fields be C-visible might make you feel better.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] to_date() marked stable?
Are there people using to_date in indexes or partition functions where changing it to immutable would be useful? I do, but I also create a shell function which sets timezone etc. locally so that to_date is *really* immutable. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgresql 9.1 replication failing
Hello Jim, I do not think you have any other option: if the archives are corrupted and there is no way to restore them, you need to recreate the standby starting from a base backup. Kind Regards 2011/12/1 Jim Buttafuoco j...@contacttelecom.com Simon, what do you mean, start over with a base backup? Jim
Re: [HACKERS] backup_label during crash recovery: do we know how to solve it?
On Tue, Nov 29, 2011 at 9:10 PM, Daniel Farina dan...@heroku.com wrote: Reviving a thread that has hit its second birthday: http://archives.postgresql.org/pgsql-hackers/2009-11/msg00024.php In our case not being able to restart Postgres when it has been taken down in the middle of a base backup is starting to manifest as a serious source of downtime: basically, any backend crash or machine restart will cause postgres not to start without human intervention. The message delivered is sufficiently scary and indirect (because of the plausible scenarios that could cause corruption if postgres were to make a decision automatically in the most general case) that it's not all that attractive to train a general operator rotation to assess what to do, as it involves reading and then, effectively, ignoring some awfully scary error messages and removing the backup label file. Even if the error messages weren't scary (itself a problem if one comes to the wrong conclusion as a result), the time spent digging around on short notice to confirm what's going on is a high pole in the tent for improving uptime for us, taking an extra five to ten minutes per common encounter. Our problem is compounded by having a lot of databases that take base backups at attenuated rates in an unattended way, and therefore a human who may have been woken up from a sound sleep will have to figure out what was going on before they've reached consciousness, rather than a person with prior knowledge of having started a backup. Also, fairly unremarkable databases can take so long to back up that they may well have a greater than 20% chance of encountering this problem at any particular time: 20% of a day is less than 5 hours per day taken to do on-line backups. Basically, we -- and anyone else with unattended physical backup schemes -- are punished rather severely by the current design. This issue has some more recent related incarnations, even if for different reasons: http://archives.postgresql.org/pgsql-hackers/2011-01/msg00764.php Because "backup_label coming or going?" confusion in Postgres can have serious consequences, I wanted to post to the list first to solicit a minimal design to solve this problem. If it's fairly small in its mechanics then it may yet be feasible for the January CF. In some ways, I feel like this problem is unsolvable by definition. If a backup is designed to be an exact copy of the data directory taken between pg_start_backup() and pg_stop_backup(), then by definition you can't distinguish between the original and the copy. That's what a copy *is*. Now, we could fix this by requiring an additional step when creating a backup. For example, we could create backup_label.not_really on the master and require the person taking the backup to rename it to backup_label on the slave before starting the postmaster. This could be an optional behavior, to preserve backward compatibility. Now the slave *isn't* an exact copy of the master, so PostgreSQL can distinguish. But it seems that this could also be worked around outside the database. We don't have built-in clusterware, so there must be something in the external environment that knows which server is supposed to be the master and which is supposed to be the standby. So, if you're on the master, remove the backup label file before starting the postmaster. If you're on the standby, don't. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why so few built-in range types?
* Robert Haas (robertmh...@gmail.com) wrote: Me neither. The ip4r type also supports ranges that aren't on CIDR-block boundaries, which probably isn't something that makes sense to incorporate into cidr. But not everyone needs that, and some people might also need support for ipv6 CIDR blocks, which ip4r doesn't support. So I don't necessarily see the existence of ip4r as a reason why cidr shouldn't have better indexing support. Seems I wasn't clear. The semantic changes were why ip4r was *created* (instead of just using cidr..). The fact that it's got index support is independent from that (though, in my view, shows that people who actually care about this data type use ip4r and don't use cidr, or we'd hear much more complaining..). I don't have any particular care about if cidr has indexing support or not. I'm certainly not *against* it, except insofar as it encourages use of a data type that really could probably be better (by being more like ip4r..). Thanks, Stephen
Re: [HACKERS] Inlining comparators as a performance optimisation
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 1, 2011 at 11:44 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: Attached is revision of my patch with some clean-ups. One thing I'm starting to get a bit concerned about is the effect of this patch on the size of the resulting binary. Regardless of that, I'm still of the opinion that this patch is fundamentally misdesigned. The way it's set up, it is only possible to have a fast path for types that are hard-wired into tuplesort.c, and that flies in the face of twenty years' worth of effort to make Postgres an extensible system. I really don't care about performance measurements: this is not an acceptable design, period. What I want to see is some mechanism that pushes this out to the individual datatypes, so that both core and plug-in datatypes have access to the benefits. There are a number of ways that could be attacked, but the most obvious one is to invent additional, optional support function(s) for btree opclasses. I also still think that we should pursue the idea of a lower-overhead API for comparison functions. It may be that it's worth the code bulk to have an inlined copy of qsort for a small number of high-usage datatypes, but I think there are going to be a lot for which we aren't going to want to pay that price, and yet we could get some benefit from cutting the call overhead. A lower-overhead API would also save cycles in usage of the comparison functions from btree itself (remember that?). I think in particular that the proposed approach to multiple sort keys is bogus and should be replaced by just using lower-overhead comparators. The gain per added code byte in doing it as proposed has got to be too low to be reasonable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
On 1 December 2011 17:15, Robert Haas robertmh...@gmail.com wrote: One thing I'm starting to get a bit concerned about is the effect of this patch on the size of the resulting binary. The performance results you've posted are getting steadily more impressive as you get into this, which is cool, but it seems like the number of copies of the qsort logic that you're proposing to include in the resulting binary is also steadily climbing. On my system, a stripped postgres binary built with my usual compile options (except --enable-cassert, which I took out) is 49336 bytes bigger with this patch applied, an increase of close to 1%. Do your usual compile options include debug symbols? I've been using standard compile options for development of this patch, for obvious reasons. I get 36690 bytes (just under 36 KiB, or a 0.644% increase). Binary bloat is a legitimate concern. I thought that I was conservative in my choice of specialisations though. We might need to be a little bit choosy about this, because I don't think that we want to end up with a situation where some noticeable percentage of the final binary consists of differently-inlined versions of the quicksort algorithm - especially because there may be other places where we want to do similar kinds of inlining. Thoughts? A simple caveat in a comment along the lines of "this mechanism instantiates 2 copies of qsort_arg per type, please use judiciously" sounds like the right balance. It could also be possible to be penny wise and pound foolish here though. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
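To illustrate the trade-off being discussed, here is the shape of the comparator-inlining trick in a self-contained form. This is not code from the patch: an insertion sort stands in for the real qsort_arg template to keep the sketch short, and all names are invented. The point is that each use of the macro stamps one more full copy of the sort loop into the binary, which is exactly where the size increase comes from.

    #include <stdint.h>

    typedef uintptr_t Datum;    /* stand-in for the real Datum typedef */

    /*
     * Stamp out a sort routine specialized to one comparator, so the
     * comparison expression is inlined into the sort loop itself
     * instead of being reached through a function pointer.
     */
    #define MAKE_INLINE_SORT(NAME, CMP)                     \
    static void                                             \
    NAME(Datum *a, int n)                                   \
    {                                                       \
        for (int i = 1; i < n; i++)                         \
        {                                                   \
            Datum   key = a[i];                             \
            int     j = i - 1;                              \
                                                            \
            while (j >= 0 && CMP(a[j], key) > 0)            \
            {                                               \
                a[j + 1] = a[j];                            \
                j--;                                        \
            }                                               \
            a[j + 1] = key;                                 \
        }                                                   \
    }

    #define INT4_CMP(x, y) \
        (((int32_t) (x) < (int32_t) (y)) ? -1 : \
         ((int32_t) (x) > (int32_t) (y)) ? 1 : 0)

    /* each such line adds another specialized copy to the binary */
    MAKE_INLINE_SORT(sort_int4_datums, INT4_CMP)

The compiler can fold INT4_CMP directly into the loop, eliminating both the call overhead and the FunctionCallInfo setup, at the cost of one extra copy of the sort per comparator.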
Re: [HACKERS] Inlining comparators as a performance optimisation
On Thu, Dec 1, 2011 at 8:29 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: Do your usual compile options include debug symbols? I've been using standard compile options for development of this patch, for obvious reasons. I get 36690 bytes (just under 36 KiB, or a 0.644% increase). They do, but I ran strip on the resulting binary before taking those measurements, so I thought that would undo it... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Andres Freund and...@anarazel.de writes: On Thursday, December 01, 2011 07:21:25 PM Tom Lane wrote: Making this work cleanly would be a bigger deal than I think you're thinking. Obviously that depends on the definition of clean... Changing the grammar to make that explicit seems to involve a bit too many changes on a first glance. The cheap way out seems to be to make the decision in analyze.c:transformQuery. Would that be an acceptable way forward? ITYM transformStmt, but yeah, somewhere around there is probably reasonable. The way I'm imagining this would work is that IntoClause disappears from Query altogether: analyze.c would build a utility statement CreateTableAs, pull the IntoClause out of the SelectStmt structure and put it into the utility statement, and then proceed much as we do for ExplainStmt (and for the same reasons). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
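To visualize the suggestion, a sketch of how analyze.c might divert SELECT ... INTO onto the utility track. The CreateTableAsStmt node and its fields are invented here for illustration (no such node exists at this point); only Query, CMD_UTILITY, and makeNode follow the real parse-node conventions, and the pattern mirrors what is done for ExplainStmt:

    /*
     * Hypothetical handling in analyze.c: pull the IntoClause out of
     * the SelectStmt and wrap the now-plain SELECT in a CMD_UTILITY
     * Query, as described above.  CreateTableAsStmt is an invented
     * node for this sketch.
     */
    static Query *
    transform_select_into(ParseState *pstate, SelectStmt *stmt)
    {
        CreateTableAsStmt *ctas = makeNode(CreateTableAsStmt);
        Query      *result;

        ctas->into = stmt->intoClause;  /* pull the IntoClause out... */
        stmt->intoClause = NULL;        /* ...leaving a plain SELECT */
        ctas->query = (Node *) stmt;

        result = makeNode(Query);
        result->commandType = CMD_UTILITY;
        result->utilityStmt = (Node *) ctas;
        return result;
    }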
Re: [HACKERS] Inlining comparators as a performance optimisation
On Thu, Dec 1, 2011 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: What I want to see is some mechanism that pushes this out to the individual datatypes, so that both core and plug-in datatypes have access to the benefits. There are a number of ways that could be attacked, but the most obvious one is to invent additional, optional support function(s) for btree opclasses. I feel like what's missing here is a way for the catalogs to refer to a function that doesn't take FunctionCallInfo as an argument. But it strikes me that it wouldn't be very hard to design such a mechanism. It seems like all we really need to do is decide on a nomenclature. The simplest possible approach would probably be to just refer to functions by name. In other words, we add a text or name column to pg_amproc, and then inside the backend there's a function whose job it is to map the string to a function address. However, that would have the annoying effect of causing catalog bloat, so I'm inclined to instead propose an 8-byte integer. (A 4-byte integer is enough, but would increase the risk of collisions between values that might be chosen by third party extensions.) So, the API to this would look something like this: void register_arbitrary_function_pointer(int64 handle, void *funcptr); void *get_arbitrary_function_pointer(int64 handle); So the idea is that if you need to refer to an arbitrary function in a system catalog (such as pg_amproc), you generate a random non-zero 64-bit integer, stuff it into the catalog, and then register the same 64-bit integer with the appropriate function pointer. To avoid increasing backend startup time, we could have a Perl script generate a prefab hash table for the built-in functions; loadable modules would add their entries at PG_init() time using register_arbitrary_function_pointer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
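A minimal, self-contained sketch of what such a registry could look like. The two entry points follow the names proposed above; everything else (the fixed-size table, linear probing, treating zero as an empty slot, which matches the rule that handles must be non-zero) is illustrative only, and a real implementation would presumably use dynahash and cope with a full table:

    #include <stdint.h>
    #include <stddef.h>

    #define REG_SLOTS 1024      /* power of two; no growth handling in this sketch */

    typedef struct RegEntry
    {
        int64_t     handle;     /* 0 marks an empty slot; handles must be non-zero */
        void       *fn;
    } RegEntry;

    static RegEntry reg_table[REG_SLOTS];

    void
    register_arbitrary_function_pointer(int64_t handle, void *funcptr)
    {
        size_t      i = (size_t) handle & (REG_SLOTS - 1);

        /* linear probing; assumes the table never fills up */
        while (reg_table[i].handle != 0 && reg_table[i].handle != handle)
            i = (i + 1) & (REG_SLOTS - 1);
        reg_table[i].handle = handle;
        reg_table[i].fn = funcptr;
    }

    void *
    get_arbitrary_function_pointer(int64_t handle)
    {
        size_t      i = (size_t) handle & (REG_SLOTS - 1);

        while (reg_table[i].handle != 0)
        {
            if (reg_table[i].handle == handle)
                return reg_table[i].fn;
            i = (i + 1) & (REG_SLOTS - 1);
        }
        return NULL;            /* unknown handle */
    }

A loadable module would call register_arbitrary_function_pointer at library load time, with built-in entries coming from the prefab table mentioned above.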
Re: [HACKERS] Why so few built-in range types?
On Thu, Dec 1, 2011 at 7:56 PM, Stephen Frost sfr...@snowman.net wrote: I don't have any particular care about if cidr has indexing support or not. I'm certainly not *against* it, except insofar as it encourages use of a data type that really could probably be better (by being more like ip4r..). Not that you're biased or anything! :-p -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 1, 2011 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: What I want to see is some mechanism that pushes this out to the individual datatypes, so that both core and plug-in datatypes have access to the benefits. There are a number of ways that could be attacked, but the most obvious one is to invent additional, optional support function(s) for btree opclasses. I feel like what's missing here is a way for the catalogs to refer to a function that doesn't take FunctionCallInfo as an argument. But it strikes me that it wouldn't be very hard to design such a mechanism. IMO, entries in pg_proc ought to refer to functions that are callable by the standard interface: breaking that basic contract is not going to lead anywhere pleasant. Nor do I really want yet more columns in pg_proc. Nor does the "register a pointer" scheme you suggest make any sense to me: you still need to connect these things to catalog entries, pg_opclass entries in particular, and the intermediate handle adds nothing to the situation except for creating a risk of collisions. The scheme that was rolling around in my mind was about like this:

* Define btree opclasses to have an optional support function, amprocnum 2, that has a SQL signature of func(internal) returns void.

* The actual argument represented by the internal parameter would be a pointer to a struct which is to be filled in by the support function. We'd call the support function once, during tuplesort setup or btree index relcache entry setup, and save the struct somewhere.

* The struct contents would be pointers to functions that have a non-FunctionCallInfo interface. We know we need at least two: a full qsort replacement, and a non-FCI comparator. We might want more in future, if someone convinces us that additional specializations of sorting are worth the trouble.

So I imagine a struct like this:

typedef struct SortSupportFunctions
{
    void    (*inline_qsort) (Datum *elements, int nelements);
    int     (*comparator) (Datum a, Datum b);
} SortSupportFunctions;

with the agreement that the caller must zero out the struct before call, and then the btree support function sets the function pointers for any specializations it's going to offer. If we later add a third or fourth function pointer, datatypes that know about that could fill in those pointers, but datatypes that haven't been updated aren't broken. One thing I'm not too certain about is whether to define the APIs just as above, or to support a passthrough argument of some sort (and if so, what does it reference)? Possibly any datatype that we'd actually care about this for is going to be simple enough to not need any state data. Or possibly not. And what about collations? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
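For instance, a hypothetical amprocnum-2 support function for int4 under this scheme might look about like the following. The fmgr macros (PG_FUNCTION_ARGS, PG_GETARG_POINTER, DatumGetInt32, PG_RETURN_VOID) and the Datum/int32 typedefs are the standard ones from postgres.h; the function names themselves are invented:

    /* bare-metal comparator: no FunctionCallInfo involved */
    static int
    btint4fastcmp(Datum a, Datum b)
    {
        int32       aa = DatumGetInt32(a);
        int32       bb = DatumGetInt32(b);

        return (aa < bb) ? -1 : (aa > bb) ? 1 : 0;
    }

    /* the SQL-callable func(internal) returns void support function */
    Datum
    btint4sortsupport(PG_FUNCTION_ARGS)
    {
        SortSupportFunctions *ssup = (SortSupportFunctions *) PG_GETARG_POINTER(0);

        /* caller zeroed the struct; fill in only what we can offer */
        ssup->comparator = btint4fastcmp;
        /* inline_qsort left NULL: no specialized qsort in this sketch */

        PG_RETURN_VOID();
    }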
Re: [HACKERS] Why so few built-in range types?
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 1, 2011 at 7:56 PM, Stephen Frost sfr...@snowman.net wrote: I don't have any particular care about if cidr has indexing support or not. I'm certainly not *against* it, except insofar as it encourages use of a data type that really could probably be better (by being more like ip4r..). Not that you're biased or anything! :-p IIRC, a lot of the basic behavior of the inet/cidr types was designed by Paul Vixie (though he's not to blame for their I/O presentation). So I'm inclined to doubt that they're as broken as Stephen claims. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
On Thu, Dec 1, 2011 at 9:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: IMO, entries in pg_proc ought to refer to functions that are callable by the standard interface: breaking that basic contract is not going to lead anywhere pleasant. Nor do I really want yet more columns in pg_proc. I wasn't proposing to create pg_proc entries for this. Nor does the "register a pointer" scheme you suggest make any sense to me: you still need to connect these things to catalog entries, pg_opclass entries in particular, and the intermediate handle adds nothing to the situation except for creating a risk of collisions. I think you might be misinterpreting what I had in mind. Right now, pg_amproc entries have an amproc column that points to a pg_proc entry that in turn points to a function that takes FunctionCallInfoData as an argument. What I'm proposing to do is add an additional column to that catalog that points more or less directly to a non-SQL-callable function, but it can't actually just be the address of the function because that's not stable. So what I'm proposing is that we interpose the thinnest possible shim layer between the catalog and a function pointer, and an int64 -> function pointer mapping seemed to me like something that would fit the bill. The scheme that was rolling around in my mind was about like this: * Define btree opclasses to have an optional support function, amprocnum 2, that has a SQL signature of func(internal) returns void. * The actual argument represented by the internal parameter would be a pointer to a struct which is to be filled in by the support function. We'd call the support function once, during tuplesort setup or btree index relcache entry setup, and save the struct somewhere. * The struct contents would be pointers to functions that have a non-FunctionCallInfo interface. We know we need at least two: a full qsort replacement, and a non-FCI comparator. We might want more in future, if someone convinces us that additional specializations of sorting are worth the trouble. So I imagine a struct like this: typedef struct SortSupportFunctions { void (*inline_qsort) (Datum *elements, int nelements); int (*comparator) (Datum a, Datum b); } SortSupportFunctions; with the agreement that the caller must zero out the struct before call, and then the btree support function sets the function pointers for any specializations it's going to offer. If we later add a third or fourth function pointer, datatypes that know about that could fill in those pointers, but datatypes that haven't been updated aren't broken. One thing I'm not too certain about is whether to define the APIs just as above, or to support a passthrough argument of some sort (and if so, what does it reference)? Possibly any datatype that we'd actually care about this for is going to be simple enough to not need any state data. Or possibly not. And what about collations? Maybe there should be a comparator_setup function that gets the collation OID and returns void *, and then that void * value gets passed as a third argument to each call to the comparator function. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 1, 2011 at 9:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Nor does the "register a pointer" scheme you suggest make any sense to me: you still need to connect these things to catalog entries, pg_opclass entries in particular, and the intermediate handle adds nothing to the situation except for creating a risk of collisions. I think you might be misinterpreting what I had in mind. Right now, pg_amproc entries have an amproc column that points to a pg_proc entry that in turn points to a function that takes FunctionCallInfoData as an argument. What I'm proposing to do is add an additional column to that catalog that points more or less directly to a non-SQL-callable function, but it can't actually just be the address of the function because that's not stable. So what I'm proposing is that we interpose the thinnest possible shim layer between the catalog and a function pointer, and an int64 -> function pointer mapping seemed to me like something that would fit the bill. But you still need a lot of mechanism to do that mapping, including an initialization function that has noplace to be executed (unless every .so that defines a btree opclass has to be listed in preload_libraries), so it doesn't seem like the thinnest possible shim to me. One thing I'm not too certain about is whether to define the APIs just as above, or to support a passthrough argument of some sort (and if so, what does it reference)? Possibly any datatype that we'd actually care about this for is going to be simple enough to not need any state data. Or possibly not. And what about collations? Maybe there should be a comparator_setup function that gets the collation OID and returns void *, and then that void * value gets passed as a third argument to each call to the comparator function. Maybe. Or perhaps we could merge that work into the function-pointer-setup function --- that is, give it the collation and some extra workspace to fool with. We would always know the collation at the time we're doing that setup. At this point the struct filled in by the setup function is starting to feel a lot like FmgrInfo, and history teaches us a lot about what needs to be in there. So maybe

typedef struct SortSupportInfoData *SortSupportInfo;

typedef struct SortSupportInfoData
{
    MemoryContext info_cxt;     /* where to allocate subsidiary data */
    void       *extra;          /* any data needed by functions */
    Oid         collation;      /* provided by caller */
    void        (*inline_qsort) (Datum *elements, int nelements,
                                 SortSupportInfo info);
    int         (*comparator) (Datum a, Datum b, SortSupportInfo info);
    /* possibly other function pointers in future */
} SortSupportInfoData;

I am thinking that the btree code, at least, would want to just unconditionally do colsortinfo->comparator(datum1, datum2, colsortinfo) so for an opclass that fails to supply the low-overhead comparator, it would insert into the comparator pointer a shim function that calls the opclass' old-style FCI-using comparator. (Anybody who complains about the added overhead would be told to get busy and supply a low-overhead comparator for their datatype...) But to do that, we have to have enough infrastructure here to cover all cases, so omitting collation or not having a place to stash an FmgrInfo won't do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
On Thu, Dec 1, 2011 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: But you still need a lot of mechanism to do that mapping, including an initialization function that has noplace to be executed (unless every .so that defines a btree opclass has to be listed in preload_libraries), so it doesn't seem like the thinnest possible shim to me. PG_init? One thing I'm not too certain about is whether to define the APIs just as above, or to support a passthrough argument of some sort (and if so, what does it reference)? Possibly any datatype that we'd actually care about this for is going to be simple enough to not need any state data. Or possibly not. And what about collations? Maybe there should be a comparator_setup function that gets the collation OID and returns void *, and then that void * value gets passed as a third argument to each call to the comparator function. Maybe. Or perhaps we could merge that work into the function-pointer-setup function --- that is, give it the collation and some extra workspace to fool with. We would always know the collation at the time we're doing that setup. Ah! That seems quite nice. At this point the struct filled in by the setup function is starting to feel a lot like FmgrInfo, and history teaches us a lot about what needs to be in there. So maybe typedef struct SortSupportInfoData *SortSupportInfo; typedef struct SortSupportInfoData { MemoryContext info_cxt; /* where to allocate subsidiary data */ void *extra; /* any data needed by functions */ Oid collation; /* provided by caller */ void (*inline_qsort) (Datum *elements, int nelements, SortSupportInfo info); int (*comparator) (Datum a, Datum b, SortSupportInfo info); /* possibly other function pointers in future */ } SortSupportInfoData; I am thinking that the btree code, at least, would want to just unconditionally do colsortinfo->comparator(datum1, datum2, colsortinfo) so for an opclass that fails to supply the low-overhead comparator, it would insert into the comparator pointer a shim function that calls the opclass' old-style FCI-using comparator. (Anybody who complains about the added overhead would be told to get busy and supply a low-overhead comparator for their datatype...) But to do that, we have to have enough infrastructure here to cover all cases, so omitting collation or not having a place to stash an FmgrInfo won't do. I'm slightly worried about whether that'll be adding too much overhead to the case where there is no non-FCI comparator. But it may be no worse than what we're doing now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inlining comparators as a performance optimisation
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 1, 2011 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: I am thinking that the btree code, at least, would want to just unconditionally do colsortinfo-comparator(datum1, datum2, colsortinfo) so for an opclass that fails to supply the low-overhead comparator, it would insert into the comparator pointer a shim function that calls the opclass' old-style FCI-using comparator. (Anybody who complains about the added overhead would be told to get busy and supply a low-overhead comparator for their datatype...) But to do that, we have to have enough infrastructure here to cover all cases, so omitting collation or not having a place to stash an FmgrInfo won't do. I'm slightly worried about whether that'll be adding too much overhead to the case where there is no non-FCI comparator. But it may be no worse than what we're doing now. It should be the same or better. Right now, we are going through FunctionCall2Coll to reach the FCI-style comparator. The shim function would be more or less equivalent to that, and since it's quite special purpose I would hope we could shave a cycle or two. For instance, we could probably afford to set up a dedicated FunctionCallInfo struct associated with the SortSupportInfo struct, and not have to reinitialize one each time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
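A sketch of what that shim could look like, assuming the SortSupportInfoData layout proposed upthread and the 9.1-era fmgr conventions; both function names are invented. The FunctionCallInfoData is allocated and initialized once, at setup time, so each comparison only stores the two arguments and jumps through the fmgr pointer:

    /*
     * Hypothetical shim installed when an opclass offers only the
     * old-style FCI comparator.  The FunctionCallInfoData lives in
     * info->extra, set up once in info->info_cxt.
     */
    static int
    shim_comparator(Datum a, Datum b, SortSupportInfo info)
    {
        FunctionCallInfo fcinfo = (FunctionCallInfo) info->extra;
        Datum       result;

        fcinfo->arg[0] = a;
        fcinfo->arg[1] = b;
        fcinfo->argnull[0] = false;
        fcinfo->argnull[1] = false;
        fcinfo->isnull = false;
        result = FunctionCallInvoke(fcinfo);
        /* btree comparison functions are strict, so a null result is a bug */
        if (fcinfo->isnull)
            elog(ERROR, "comparison function returned NULL");
        return DatumGetInt32(result);
    }

    static void
    install_comparator_shim(SortSupportInfo info, FmgrInfo *flinfo)
    {
        FunctionCallInfo fcinfo = (FunctionCallInfo)
            MemoryContextAlloc(info->info_cxt, sizeof(FunctionCallInfoData));

        /* one-time setup; per-call work is just the argument stores above */
        InitFunctionCallInfoData(*fcinfo, flinfo, 2, info->collation,
                                 NULL, NULL);
        info->extra = (void *) fcinfo;
        info->comparator = shim_comparator;
    }

This is essentially what FunctionCall2Coll does on every call, minus the per-call InitFunctionCallInfoData overhead, which is where the "same or better" expectation comes from.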
Re: [HACKERS] Accounting for toast in query planner. (gin/gist indexes).
jes...@krogh.cc writes: Secondly I could bump the default cost of ts_match_vq/ts_match_qv a bit up, since the cost of doing that computation is probably not as cheap as an ordinary boolean function. Actually, you could try bumping their costs up by more than a bit. It's a tad unfair to blame the toast access costs on the functions, but it might get close enough to what you need. If memory serves, we don't charge those costs against index scans. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers