Re: [HACKERS] REVIEW: pg_stat_statements with query tree based normalization
I am going to produce another revision in response to feedback already received. I intend to outline the steps that it will take to resolve some outstanding issues in the next day or so. It would be nice if you could take a look at the revised patch that is ultimately produced. Should I keep you posted? Please do. I tried the patch because I wanted to investigate the Django test suite. While I did that, I ended up doing a small review of the feature. I can easily do that again with an updated patch. - Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Progress on fast path sorting, btree index creation time
I decided to take advantage of my ongoing access to a benchmarking server to see how I could get on with a query with an especially large sort. Now, that box has 16GB of RAM, and some people have that much memory in their laptops these days, so I was somewhat limited in how far I could push things. I eventually decided upon much the same benchmark as in my previous e-mail (the query SELECT * FROM pgbench_accounts ORDER BY bid;, but primed with random data, and subsequently vacuumed). I stress that I have not made any attempt to artificially remove client overhead. I have, however, used a faster disk (caches were not warmed in a separate run of pgbench or anything like that for either of my two most recent benchmarks, so there may have been, and indeed may still be, some small bias there), in addition to connecting pgbench with the -h option. A known problem with Linux kernels using the Completely Fair Scheduler introduced in 2.6.23 is that it does not schedule the pgbench program very well when it's connecting to the database using Unix-domain sockets, as I had been until now. I'm not sure that this had all that much potential to spoil my results, but I didn't want to take the chance.
Anyway, the results of running this latest benchmark, with 20 successive runs of each large query, were:

Patch:
pghost: localhost pgport: nclients: 1 nxacts: 20 dbName: pgbench
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
number of transactions per client: 20
number of transactions actually processed: 20/20
tps = 0.030924 (including connections establishing)
tps = 0.030924 (excluding connections establishing)
statement latencies in milliseconds:
	31163.957400	SELECT * FROM pgbench_accounts ORDER BY bid;

Master:
pghost: localhost pgport: nclients: 1 nxacts: 20 dbName: pgbench
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
number of transactions per client: 20
number of transactions actually processed: 20/20
tps = 0.026769 (including connections establishing)
tps = 0.026769 (excluding connections establishing)
statement latencies in milliseconds:
	36179.508750	SELECT * FROM pgbench_accounts ORDER BY bid;

That's a larger proportional improvement than reported on Thursday, and at significantly greater scale. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
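For what it's worth, the size of that proportional improvement can be checked directly from the two reported statement latencies:

```python
# Statement latencies reported by pgbench in the two runs above (ms).
patched_ms = 31163.957400
master_ms = 36179.508750

# Proportional improvement of the patched run over master.
improvement = (master_ms - patched_ms) / master_ms
print(f"{improvement:.1%}")  # about 13.9%
```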
Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks
2012/1/19 Robert Haas robertmh...@gmail.com: On Thu, Jan 19, 2012 at 3:51 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2012/1/19 Robert Haas robertmh...@gmail.com: On Wed, Jan 18, 2012 at 9:50 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: On the sepgsql side, it determines the cases in which to apply permission checks according to contextual information; that is the same technique we used when we implemented the create permission. Thus, it can check db_xxx:{drop} permission correctly. Why do we need the contextual information in this case? Why can't/shouldn't the decision be made solely on the basis of what object is targeted? Several code paths that remove a particular object are not appropriate places to apply permission checks. For example... [1] Cleanup of temporary objects: on_shmem_exit() registers RemoveTempRelationsCallback(), which eventually calls deleteWhatDependsOn(), which in turn invokes deleteOneObject(). This code path is just an internal cleanup process, not related to any permission of the client. [2] Cleanup of the transient table at VACUUM FULL/CLUSTER: rebuild_relation() creates a temporary table with make_new_heap(), then copies the contents of the original table in index order, and calls finish_heap_swap(), which swaps the relation files and removes the temporary table using performDeletion(). This code actually creates and drops a table; however, that is purely an internal implementation detail, not related to any permission of the client. So, I think sepgsql should apply permission checks only to object deletions invoked by user operations, such as DropStmt. I agree with that theory, but isn't this method of implementing it a pretty horrible kludge? For example, if you'd implemented it this way for 9.1, the recent drop-statement refactoring would have broken it. Or if, in the future, we add another type of statement that can drop things, this code will still compile just fine but will no longer work correctly.
ISTM that we need a way to either (1) not call the hook at all unless the operation is user-initiated, or (2) call the hook, but pass a flag indicating what sort of operation this is? Yes. This approach requires continued revision of the sepgsql side for each major release. I think approach (1) raises an issue similar to what we discussed when sepgsql implemented the create permission: we would have to know the details of the extension module to determine whether the hook should be called or not. My preference is (2), which is more reasonable. Let's imagine another possible use of this hook: we want to emit some kind of log message every time a database object gets dropped. I think that's a plausible use-case, and in that case what we'd want is: (1) VACUUM FULL or CLUSTER shouldn't call the hook at all, (2) cleanup of temporary objects should probably call the hook, but ideally with a flag to indicate that it's an internal (DB-initiated) operation, and (3) user activity should definitely call the hook. I'm not sure how we can cleanly get that behavior, but ISTM that's what we want... I think case (1) should also call the hook, but with a flag that indicates database-internal activity, because make_new_heap() calls the OAT_POST_CREATE hook; thus, we need to give extensions a chance to clean up if they did something at that point. I'd like to propose utilizing the DropBehavior argument of performDeletion() to inform dependency.c of its invocation context. If we have a new flag DROP_INTERNAL that can be OR-masked with the existing DROP_RESTRICT or DROP_CASCADE, it can be used to convey the current context, and then passed as the flag to the hook. This approach seems minimally invasive to me. In addition, extensions may want to know whether a deletion was cascaded or explicitly specified by the user, if they want to implement the same security model on top of this hook.
Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
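The OR-maskable DROP_INTERNAL flag proposed above could look something like the following sketch. The names and values are hypothetical, for illustration only; the real DropBehavior enum lives in the server's C code and may differ:

```python
from enum import IntFlag

# Hypothetical encoding: keep the existing DropBehavior values in the low
# bits and add an OR-able DROP_INTERNAL bit on top.
class DropBehavior(IntFlag):
    DROP_RESTRICT = 0
    DROP_CASCADE = 1
    DROP_INTERNAL = 1 << 8   # set for DB-initiated (not user-initiated) drops

def hook_should_check_permissions(behavior: DropBehavior) -> bool:
    # An object-access hook could skip client permission checks for
    # internal cleanups while still being invoked for bookkeeping.
    return not (behavior & DropBehavior.DROP_INTERNAL)

# VACUUM FULL / CLUSTER cleanup would pass the extra bit; a user DROP would not.
print(hook_should_check_permissions(DropBehavior.DROP_CASCADE | DropBehavior.DROP_INTERNAL))  # False
print(hook_should_check_permissions(DropBehavior.DROP_RESTRICT))  # True
```

The point of OR-masking rather than adding a new parameter is that existing performDeletion() callers keep compiling unchanged, which matches the "minimally invasive" goal stated above.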
Re: [HACKERS] WIP -- renaming implicit sequences
On 19/01/2012, Robert Haas robertmh...@gmail.com wrote: rhaas=# alter sequence foo_a_seq rename to bob; ALTER SEQUENCE If somebody renames the table or the column at this point, it's a good bet that they *don't* want bob renamed. FWIW the patch addresses this case and wouldn't try to rename 'bob'. Another, admittedly minor consideration is that this can introduce some subtle concurrency bugs that we don't have today. For example, suppose we choose a new name for the sequence which isn't in use, but then between the time when we pick the name and the time we do the insert the name becomes used, due to some concurrent transaction. Now we'll fail with a rather baffling error message. That isn't necessarily a huge problem - we have lots of code that is prone to such race conditions - but it's not wonderful either. ... I thought about this, and it seemed to me that (1) the same race already applies when you CREATE a table with a serial column and (2) anyone running a bunch of DDL concurrently with other DDL operations already needs to coordinate their action or deal with occasional name collisions in general. But yeah I see that it's not ideal. I think we should remove this from the TODO list, or at least document that there are a number of reasons why it might be a deeper hole than it appears to be at first glance. Fair enough, I'll leave it there. Thanks for the feedback!
Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
On 19 Jan 2012 21:00, Robert Haas robertmh...@gmail.com wrote: I agree. It occurs to me that we recently changed the default *text* output format for bytea for reasons not dissimilar to those contemplated here. Presumably, that's a much more disruptive change, and yet we've had minimal complaints because anyone who gets bitten can easily set bytea_output='escape' and the problem goes away. I had a run-in with this. JDBC driver versions 9.0 with the default configuration resulted in silent data corruption. The fix was easy, but not having a useful error was what really bothered me. -- Ants Aasma
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
c. Refine error handling of dblink.c. I think it preserves the previous behavior for column number mismatch and type conversion exceptions. Hello, I don't know if this covers the following issue. I just mention it in case you didn't notice it and would like to handle this rather cosmetic issue as well. http://archives.postgresql.org/pgsql-bugs/2011-08/msg00113.php best regards, Marc Mamin
[HACKERS] pg_dump custom format specification
Does anyone have some kind of specification or description of the pg_dump custom format? What I'm trying to do is a utility to remotely synchronize PostgreSQL dumps, using lib_rsync to synchronize each table independently and copying blobs only when sizes differ. I've done the same using the tar format, but I think most of the features and optimizations, like parallel restore, are now only available with the custom format. I will write the script in Python, so I need to know how to parse and write pg_dumps in custom format. Thanks in advance. -- Guillermo M. Narvaja Lambda Sistemas S.R.L. www.fierro-soft.com.ar Tel: (5411) 4139-0493/4 Cel: (5411) 15-6783-4435 Email: guillermo.narv...@fierro-soft.com.ar MSN: guillermo_narv...@hotmail.com Skype: guillermonarvaja Lavalleja 519 1er Piso - Ciudad de Buenos Aires - Argentina
Re: [HACKERS] pg_dump custom format specification
On 01/21/2012 07:44 AM, Guillermo M. Narvaja wrote: Does anyone have some kind of specification or description of the pg_dump custom format? What I'm trying to do is a utility to remotely synchronize PostgreSQL dumps, using lib_rsync to synchronize each table independently and copying blobs only when sizes differ. I've done the same using the tar format, but I think most of the features and optimizations, like parallel restore, are now only available with the custom format. I will write the script in Python, so I need to know how to parse and write pg_dumps in custom format. You might find this useful: https://gist.github.com/1258232 It's a Perl script that parses and prints out the header and TOC of a custom format dump. cheers andrew
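For the Python side, a minimal sketch of reading the archive's fixed-size header is below. The field layout is my reading of WriteHead() in pg_backup_archiver.c (magic, three version bytes, intSize, offSize, format code) and may differ across pg_dump versions, so treat it as a starting point rather than a specification:

```python
import io
import struct

def read_custom_header(f):
    """Parse the leading bytes of a pg_dump custom-format archive.

    Layout per my reading of WriteHead() in pg_backup_archiver.c:
    5-byte magic "PGDMP", then one byte each for vmaj, vmin, vrev,
    intSize, offSize, and the format code. Later fields (compression,
    creation time, ...) use the archive's integer encoding and are
    omitted here.
    """
    magic = f.read(5)
    if magic != b"PGDMP":
        raise ValueError("not a pg_dump custom-format archive")
    vmaj, vmin, vrev, int_size, off_size, fmt = struct.unpack("6B", f.read(6))
    return {"version": (vmaj, vmin, vrev),
            "intSize": int_size, "offSize": off_size, "format": fmt}

# Synthetic header bytes for illustration: archive version 1.12,
# 4-byte ints, 8-byte offsets, format code 1.
hdr = read_custom_header(io.BytesIO(b"PGDMP" + bytes([1, 12, 0, 4, 8, 1])))
print(hdr["version"])  # (1, 12, 0)
```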
Re: [HACKERS] CLOG contention, part 2
On Fri, Jan 20, 2012 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote: D'oh. You're right. Looks like I accidentally tried to apply this to the 9.1 sources. Sigh... No worries. It's Friday. Server passed 'make check' with this patch, but when I tried to fire it up for some test runs, it fell over with: FATAL: no more LWLockIds available I assume that it must be dependent on the config settings used. Here are mine: shared_buffers = 8GB maintenance_work_mem = 1GB synchronous_commit = off checkpoint_segments = 300 checkpoint_timeout = 15min checkpoint_completion_target = 0.9 wal_writer_delay = 20ms -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Sat, Jan 21, 2012 at 1:53 AM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote: Can I just check with you that the only review comment is a one line change? Seems better to make any additional review comments in one go. No, I haven't done a full review yet - I just noticed that right at the outset and wanted to be sure that got addressed. I can look it over more carefully, and test it. Corrected your point, and another found during retest. Works as advertised, docs corrected. This comment needs updating: /* * To drop an index safely, we must grab exclusive lock on its parent * table. Exclusive lock on the index alone is insufficient because * another backend might be about to execute a query on the parent table. * If it relies on a previously cached list of index OIDs, then it could * attempt to access the just-dropped index. We must therefore take a * table lock strong enough to prevent all queries on the table from * proceeding until we commit and send out a shared-cache-inval notice * that will make them update their index lists. */ OK, I did reword some but clearly not enough. Speaking of that comment, why does a concurrent index drop need any lock at all on the parent table? If, as the current comment text says, the point of locking the parent table is to make sure that no new backends can create relcache entries until our transaction commits, then it's not needed here, or at least not for that purpose. We're going to out-wait anyone who has the index open *after* we've committed our changes to the pg_index entry, so there shouldn't be a race condition there. There may very well be a reason why we need a lock on the parent, or there may not, but that's not it. I feel happier locking the parent as well. 
I'd rather avoid strange weirdness later as has happened before in this type of discussion. On the flip side, it strikes me that there's considerable danger that ShareUpdateExclusiveLock on the index might not be strong enough. In particular, it would fall down if there's anyone who takes an AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds to access the index data despite its being marked !indisready and !indisvalid. There are certainly instances of this in contrib, such as in pgstatindex(), which I'm pretty sure will abort with a nastygram if somebody truncates away the file under it. That might not be enough reason to reject the patch, but I do think it would be the only place in the system where we allow something like that to happen. I'm not sure whether there are any similar risks in core - I am a bit worried about get_actual_variable_range() and gincostestimate(), but I can't tell for sure whether there is any actual problem there, or elsewhere. I wonder if we should only do the *first* phase of the drop with ShareUpdateExclusiveLock, and then after outwaiting everyone, take an AccessExclusiveLock on the index (but not the table) to do the rest of the work. If that doesn't allow the drop to proceed concurrently, then we go find the places that block against it and teach them not to lock/use/examine indexes that are !indisready. That way, the worst case is that D-I-C is less concurrent than we'd like, rather than making random other things fall over - specifically, anything that counts on our traditional guarantee that AccessShareLock is enough to keep the file from disappearing. Your caution is wise. All users of an index have already checked whether the index is usable at plan time, so although there is much code that assumes it can look at the index itself, that code is not executed until after the correct checks. I'll look at VACUUM and other utilities, so thanks for that.
I don't see much point in having the higher-level lock, except perhaps as a test that this code works. In the department of minor nitpicks, the syntax synopsis you've added looks wrong: DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ] CONCURRENTLY <replaceable class="PARAMETER">name</replaceable> That implies that you may write if exists, followed by 1 or more names, followed by cascade or restrict. After that you must write CONCURRENTLY, followed by exactly one name. I think what you want is: DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ] DROP INDEX CONCURRENTLY <replaceable class="PARAMETER">name</replaceable> ...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ] CONCURRENTLY. It could also be very useful to be able to concurrently drop multiple indexes at once, since that would require waiting out all concurrent transactions only once
Re: [HACKERS] CLOG contention, part 2
On Sat, Jan 21, 2012 at 1:57 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 20, 2012 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote: D'oh. You're right. Looks like I accidentally tried to apply this to the 9.1 sources. Sigh... No worries. It's Friday. Server passed 'make check' with this patch, but when I tried to fire it up for some test runs, it fell over with: FATAL: no more LWLockIds available I assume that it must be dependent on the config settings used. Here are mine: shared_buffers = 8GB maintenance_work_mem = 1GB synchronous_commit = off checkpoint_segments = 300 checkpoint_timeout = 15min checkpoint_completion_target = 0.9 wal_writer_delay = 20ms Yes, it was. Sorry about that. New version attached, retesting while you read this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 69b6ef3..6ff6894 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -37,6 +37,7 @@ #include access/transam.h #include miscadmin.h #include pg_trace.h +#include utils/snapmgr.h /* * Defines for CLOG page sizes. A page is the same BLCKSZ as is used @@ -70,10 +71,17 @@ /* * Link to shared-memory data structures for CLOG control + * + * As of 9.2, we have 2 structures for commit log data. + * ClogCtl manages the main read/write part of the commit log, while + * the ClogHistoryCtl manages the now read-only, older part. ClogHistory + * removes contention from the path of transaction commits. */ static SlruCtlData ClogCtlData; +static SlruCtlData ClogHistoryCtlData; -#define ClogCtl (ClogCtlData) +#define ClogCtl (ClogCtlData) +#define ClogHistoryCtl (ClogHistoryCtlData) static int ZeroCLOGPage(int pageno, bool writeXlog); @@ -296,6 +304,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids, /* ... 
then the main transaction */ TransactionIdSetStatusBit(xid, status, lsn, slotno); + + /* When we commit advance ClogCtl's shared RecentXminPageno if needed */ + if (ClogCtl-shared-RecentXminPageno TransactionIdToPage(RecentXmin)) + ClogCtl-shared-RecentXminPageno = TransactionIdToPage(RecentXmin); } /* Set the subtransactions */ @@ -387,6 +399,8 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn) { + SlruCtl clog = ClogCtl; + bool useClogHistory = true; int pageno = TransactionIdToPage(xid); int byteno = TransactionIdToByte(xid); int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT; @@ -397,15 +411,35 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn) /* lock is acquired by SimpleLruReadPage_ReadOnly */ - slotno = SimpleLruReadPage_ReadOnly(ClogCtl, pageno, xid); - byteptr = ClogCtl-shared-page_buffer[slotno] + byteno; + /* + * Decide whether to use main Clog or read-only ClogHistory. + * + * Our knowledge of the boundary between the two may be a little out + * of date, so if we try Clog and can't find it we need to try again + * against ClogHistory. 
+ */ + if (pageno = ClogCtl-recent_oldest_active_page_number) + { + slotno = SimpleLruReadPage_ReadOnly(clog, pageno, xid); + if (slotno = 0) + useClogHistory = false; + } + + if (useClogHistory) + { + clog = ClogHistoryCtl; + slotno = SimpleLruReadPage_ReadOnly(clog, pageno, xid); + Assert(slotno = 0); + } + + byteptr = clog-shared-page_buffer[slotno] + byteno; status = (*byteptr bshift) CLOG_XACT_BITMASK; lsnindex = GetLSNIndex(slotno, xid); - *lsn = ClogCtl-shared-group_lsn[lsnindex]; + *lsn = clog-shared-group_lsn[lsnindex]; - LWLockRelease(CLogControlLock); + LWLockRelease(clog-shared-ControlLock); return status; } @@ -445,15 +479,19 @@ CLOGShmemBuffers(void) Size CLOGShmemSize(void) { - return SimpleLruShmemSize(CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE); + /* Reserve shmem for both ClogCtl and ClogHistoryCtl */ + return SimpleLruShmemSize(2 * CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE); } void CLOGShmemInit(void) { ClogCtl-PagePrecedes = CLOGPagePrecedes; + ClogHistoryCtl-PagePrecedes = CLOGPagePrecedes; SimpleLruInit(ClogCtl, CLOG Ctl, CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, CLogControlLock, pg_clog); + SimpleLruInit(ClogHistoryCtl, CLOG History Ctl, CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, + CLogHistoryControlLock, pg_clog); } /* @@ -592,6 +630,16 @@ CheckPointCLOG(void) TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true); SimpleLruFlush(ClogCtl, true); TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true); + + /* + * Now that we've written out all dirty buffers the only pages that + * will get dirty again will be pages with active transactions on them. + * So we can move forward the oldest_active_page_number and allow + * read only operations via ClogHistoryCtl. + */ + LWLockAcquire(CLogControlLock,
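The read path the (truncated) patch adds to TransactionIdGetStatus() - try the main clog for pages believed still active, and fall back to the read-only history on a miss - can be sketched like this, with toy dict-backed stand-ins rather than the real SLRU machinery:

```python
# Toy stand-ins for the two SLRU areas: each is just a dict of
# pageno -> transaction status payload. Not the real SLRU API.
class Slru:
    def __init__(self, pages):
        self.pages = pages

    def read_page(self, pageno):
        # Returns None on a miss, standing in for a negative slotno.
        return self.pages.get(pageno)

def get_status(pageno, recent_oldest_active_page, clog, clog_history):
    # Try the main (read/write) clog only for pages we believe are still
    # active; our knowledge of the boundary may be a little out of date,
    # so on a miss we retry against the read-only history.
    if pageno >= recent_oldest_active_page:
        status = clog.read_page(pageno)
        if status is not None:
            return status
    status = clog_history.read_page(pageno)
    assert status is not None, "history must cover all older pages"
    return status

clog = Slru({10: "committed"})
history = Slru({3: "aborted", 10: "committed"})
print(get_status(3, 5, clog, history))    # older page: served from history
print(get_status(10, 5, clog, history))   # active page: served from main clog
```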
Re: [HACKERS] xlog location arithmetic
On 23-12-2011 12:05, Tom Lane wrote: I too think a datatype is overkill, if we're only planning on providing one function. Just emit the values as numeric and have done. Here it is. Output changed to numeric. While the output was int8 I could use pg_size_pretty but now I couldn't. I attached another patch that implements pg_size_pretty(numeric). -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 48631cc..04bc24d 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -14378,6 +14378,9 @@ SELECT set_config('log_statement_stats', 'off', false); indexterm primarypg_xlogfile_name_offset/primary /indexterm + indexterm +primarypg_xlog_location_diff/primary + /indexterm para The functions shown in xref @@ -14450,6 +14453,13 @@ SELECT set_config('log_statement_stats', 'off', false); entrytypetext/, typeinteger//entry entryConvert transaction log location string to file name and decimal byte offset within file/entry /row + row + entry +literalfunctionpg_xlog_location_diff(parameterlocation/ typetext/, parameterlocation/ typetext/)/function/literal +/entry + entrytypenumeric//entry + entryCalculate the difference between two transaction log locations/entry + /row /tbody /tgroup /table @@ -14543,6 +14553,13 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); /para para + functionpg_xlog_location_diff/ calculates the difference in bytes + between two transaction log locations. It can be used with + structnamepg_stat_replication/structname or some functions shown in + xref linkend=functions-admin-backup-table to get the replication lag. + /para + + para For details about proper usage of these functions, see xref linkend=continuous-archiving. 
/para diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 2e10d4d..e03c5e8 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -26,6 +26,7 @@ #include replication/walreceiver.h #include storage/smgr.h #include utils/builtins.h +#include utils/numeric.h #include utils/guc.h #include utils/timestamp.h @@ -465,3 +466,84 @@ pg_is_in_recovery(PG_FUNCTION_ARGS) { PG_RETURN_BOOL(RecoveryInProgress()); } + +static void +validate_xlog_location(char *str) +{ +#define MAXLSNCOMPONENT 8 + + int len1, len2; + + len1 = strspn(str, 0123456789abcdefABCDEF); + if (len1 1 || len1 MAXLSNCOMPONENT || str[len1] != '/') + ereport(ERROR, +(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg(invalid input syntax for transaction log location: \%s\, str))); + len2 = strspn(str + len1 + 1, 0123456789abcdefABCDEF); + if (len2 1 || len2 MAXLSNCOMPONENT || str[len1 + 1 + len2] != '\0') + ereport(ERROR, +(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg(invalid input syntax for transaction log location: \%s\, str))); +} + +/* + * Compute the difference in bytes between two WAL locations. 
+ */ +Datum +pg_xlog_location_diff(PG_FUNCTION_ARGS) +{ + text *location1 = PG_GETARG_TEXT_P(0); + text *location2 = PG_GETARG_TEXT_P(1); + char *str1, *str2; + uint64 xlogid1, xrecoff1; + uint64 xlogid2, xrecoff2; + Numeric result; + + /* + * Read and parse input + */ + str1 = text_to_cstring(location1); + str2 = text_to_cstring(location2); + + validate_xlog_location(str1); + validate_xlog_location(str2); + + if (sscanf(str1, %8lX/%8lX, xlogid1, xrecoff1) != 2) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(could not parse transaction log location \%s\, str1))); + if (sscanf(str2, %8lX/%8lX, xlogid2, xrecoff2) != 2) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(could not parse transaction log location \%s\, str2))); + + /* + * Sanity check + */ + if (xrecoff1 XLogFileSize) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(xrecoff \%lX\ is out of valid range, 0..%X, xrecoff1, XLogFileSize))); + if (xrecoff2 XLogFileSize) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(xrecoff \%lX\ is out of valid range, 0..%X, xrecoff2, XLogFileSize))); + + /* + * result = XLogFileSize * (xlogid1 - xlogid2) + xrecoff1 - xrecoff2 + */ + result = DatumGetNumeric(DirectFunctionCall2(numeric_sub, + DirectFunctionCall1(int8_numeric, Int64GetDatum(xlogid1)), + DirectFunctionCall1(int8_numeric, Int64GetDatum(xlogid2; + result = DatumGetNumeric(DirectFunctionCall2(numeric_mul, + DirectFunctionCall1(int8_numeric, Int64GetDatum(XLogFileSize)), + NumericGetDatum(result))); + result = DatumGetNumeric(DirectFunctionCall2(numeric_add, + NumericGetDatum(result), + DirectFunctionCall1(int8_numeric, Int64GetDatum(xrecoff1;
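The arithmetic the (truncated) function above implements is easy to restate outside the server for hand-checking. XLogFileSize is taken here as 0xFF000000, the pre-9.3 number of usable bytes per xlogid; that constant is an assumption of this sketch:

```python
# XLogFileSize assumption: pre-9.3, each 4GB "xlogid" skips its last
# 16MB segment, leaving 0xFF000000 usable bytes.
XLOG_FILE_SIZE = 0xFF000000

def parse_location(loc):
    """Split an 'XXXXXXXX/XXXXXXXX' WAL location into (xlogid, xrecoff)."""
    xlogid, xrecoff = loc.split("/")
    return int(xlogid, 16), int(xrecoff, 16)

def xlog_location_diff(loc1, loc2):
    # result = XLogFileSize * (xlogid1 - xlogid2) + xrecoff1 - xrecoff2,
    # exactly the numeric expression built with numeric_sub/mul/add above.
    xlogid1, xrecoff1 = parse_location(loc1)
    xlogid2, xrecoff2 = parse_location(loc2)
    return XLOG_FILE_SIZE * (xlogid1 - xlogid2) + xrecoff1 - xrecoff2

# One full xlogid span plus 0x800 extra bytes.
print(xlog_location_diff("2/1000", "1/800"))
```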
Re: [v9.2] LEAKPROOF attribute of FUNCTION (Re: [HACKERS] [v9.2] Fix Leaky View Problem)
On Sat, Jan 21, 2012 at 3:59 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: I marked the default leakproof functions according to the criterion that they do not leak the contents of their arguments. Indeed, timestamp_ne_timestamptz() has a code path that raises a timestamp out of range error. Is it a good idea to avoid marking these functions leakproof as well? I think that anything which looks at the data and uses it as a basis for whether or not to throw an error is non-leakproof. Even if it doesn't directly leak an arbitrary value, I think that leaking even some information about what the value is, is no good. Otherwise, you might imagine that we would allow /(int, int), because it only leaks in the second_arg = 0 case. And you might imagine we'd allow -(int, int) because it only leaks in the case where an overflow occurs. But of course the combination of the two allows writing something of the form 1/(a-constant) and getting it pushed down, and now you have the ability to probe for an arbitrary value. So I think it's just no good to allow any leaking at all: otherwise it'll be unclear how safe it really is, especially when combinations of different functions or operators are involved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
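The 1/(a-constant) probe described above is easy to demonstrate concretely: if the division can be pushed down past a security barrier, the error behavior alone recovers the hidden value. A toy model:

```python
def probe(hidden, candidates):
    """Recover a hidden value purely from error behavior: 1 // (hidden - c)
    raises exactly when c equals the hidden value, so even an operator that
    'only' errors on division by zero leaks the data it touches."""
    for c in candidates:
        try:
            _ = 1 // (hidden - c)
        except ZeroDivisionError:
            return c
    return None

print(probe(42, range(100)))  # the error channel reveals 42
```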
Re: [HACKERS] PATCH: tracking temp files in pg_stat_database
On 20 January 2012, 13:23, Magnus Hagander wrote: On Tue, Jan 17, 2012 at 21:39, Tomas Vondra t...@fuzzy.cz wrote: On 20.12.2011 19:59, Tomas Vondra wrote: On 20.12.2011 11:20, Magnus Hagander wrote: 2011/12/20 Tomas Vondra t...@fuzzy.cz: I haven't updated the docs yet - let's see if the patch is acceptable at all first. Again, without having reviewed the code, this looks like a feature we'd want, so please add some docs, and then submit it for the next commitfest! I've added the docs (see the attachment) and rebased to current head. Tomas Fixed a failing regression test (check of the pg_stat_database structure). I'm wondering if this (and also my deadlocks stats patch that's in the queue) should, instead of inventing new pgstats messages, add fields to the tabstat message. It sounds like that one is just for tables, but it's already the one collecting info about commits and rollbacks, and it's already sent on every commit. Hmmm, I'm not against that, but I'd recommend changing the message name to something that reflects reality. If it's not just about table statistics, it should not be named 'tabstats' IMHO. Or maybe split that into two messages, both sent at commit time. I do like the idea of not sending a message for each temp file, though. One thing that worries me is long-running transactions (think of a batch process that runs for several hours within a single transaction). By sending the data only at commit, such transactions would not be accounted properly. So I'd suggest sending the message either at commit time or after collecting enough data (increment a counter whenever the struct is updated and send a message when the counter >= N, for a reasonable value of N, say 20). But maybe it already works that way - I haven't checked the current 'tabstat' implementation. Adding two fields to that one would add some extra bytes on every send, but I wonder if that would ever affect performance, given the total size of the packet?
And it would certainly be lower overhead in the cases where there *have* been temp tables used. It's not about temp tables, it's about temp files. Which IMHO implies that there would be exactly 0.01% performance difference, because temporary files are quite expensive. Tomas
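The flush-at-commit-or-threshold idea discussed above can be sketched as follows. The class and method names are hypothetical stand-ins, not the actual pgstat API:

```python
# Accumulate per-backend temp-file stats, flushing either when N updates
# have piled up or at transaction commit, so a single long transaction
# still reports its usage incrementally.
FLUSH_THRESHOLD = 20

class TempFileStats:
    def __init__(self):
        self.files = 0
        self.bytes = 0
        self._pending = 0
        self.sent = []          # stands in for messages to the stats collector

    def record_temp_file(self, size):
        self.files += 1
        self.bytes += size
        self._pending += 1
        if self._pending >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):            # also called at transaction commit
        if self._pending:
            self.sent.append((self.files, self.bytes))
            self._pending = 0

stats = TempFileStats()
for _ in range(45):
    stats.record_temp_file(8192)
stats.flush()                   # commit-time flush picks up the remainder
print(len(stats.sent))  # 3 messages: two threshold flushes plus one at commit
```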
Re: [HACKERS] PATCH: tracking temp files in pg_stat_database
On Sat, Jan 21, 2012 at 18:02, Tomas Vondra t...@fuzzy.cz wrote: On 20 January 2012, 13:23, Magnus Hagander wrote: On Tue, Jan 17, 2012 at 21:39, Tomas Vondra t...@fuzzy.cz wrote: On 20.12.2011 19:59, Tomas Vondra wrote: On 20.12.2011 11:20, Magnus Hagander wrote: 2011/12/20 Tomas Vondra t...@fuzzy.cz: I haven't updated the docs yet - let's see if the patch is acceptable at all first. Again, without having reviewed the code, this looks like a feature we'd want, so please add some docs, and then submit it for the next commitfest! I've added the docs (see the attachment) and rebased to current head. Tomas Fixed a failing regression test (check of the pg_stat_database structure). I'm wondering if this (and also my deadlocks stats patch that's in the queue) should, instead of inventing new pgstats messages, add fields to the tabstat message. It sounds like that one is just for tables, but it's already the one collecting info about commits and rollbacks, and it's already sent on every commit. Hmmm, I'm not against that, but I'd recommend changing the message name to something that reflects reality. If it's not just about table statistics, it should not be named 'tabstats' IMHO. Or maybe split that into two messages, both sent at commit time. Yes, renaming it might be a good idea. If you split it into two messages, that would defeat much of the point. Though I just looked at the tabstat code again, and we already split that message up at regular intervals, which would make it quite weird to have global counters in it as well. But instead of there, perhaps we need a general non-table message, carrying more than one type of data, sent out at the same time. There is other stuff in the queue for it. I'm not convinced either way - I'm not against the original way in your patch either. I just wanted to throw the idea out there, and was hoping somebody else would have an opinion too :-) I do like the idea of not sending the message for each temp file, though.
One thing that worries me is long-running transactions (think about a batch process that runs for several hours within a single transaction). By sending the data only at commit, such transactions would not be accounted properly. So I'd suggest sending the message either at commit

By that argument they are *already* not accounted properly, because the number of rows and those counters are wrong. By sending the temp file data in the middle of the transaction, you won't be able to correlate those numbers with the temp file usage. I'm not saying the other use case isn't more common, but whichever way you do it, it's going to get inconsistent with *something*.

time or after collecting enough data (increment a counter whenever the struct is updated and send a message when the counter >= N, for a reasonable value of N, say 20). But maybe it already works that way - I haven't checked the current 'tabstat' implementation.

No, tabstat is sent at transaction end only. Adding two fields to that one would add some extra bytes on every send, but I wonder if that would ever affect performance, given the total size of the packet? And it would certainly be lower overhead in the cases where there *have* been temp tables used.

It's not about temp tables, it's about temp files. Which IMHO implies that there would be exactly 0.01% performance difference, because temporary files are quite expensive.

Bad choice of words, I meant temp files, and was thinking temp files the whole way :-)

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Finer Extension dependencies
Hi,

Thank you for reviewing this patch!

Hitoshi Harada umi.tan...@gmail.com writes:

The patch applies with one reject, which I could fix easily. The make check passed.

Bitrot happens fast in this season… will produce another version of the patch.

Table "pg_catalog.pg_extension_feature"
   Column   | Type | Modifiers
------------+------+-----------
 extoid     | oid  | not null
 extfeature | name | not null
Indexes:
    "pg_extension_feature_index" UNIQUE, btree (extoid, extfeature)
    "pg_extension_feature_oid_index" UNIQUE, btree (oid)

* I'm not quite sure why pg_extension_feature_index needs the extoid column.

That allows managing features per extension: you need to know which extension is providing which feature to be able to solve dependencies.

* I have a big question about adding a two-column catalog. I don't mind the actual number of columns, but if the table has only two columns, it implies the design may be bad. The only other two-column catalog is pg_largeobject_metadata.

We need each feature to be a full PostgreSQL object so that we can use the dependency tracking. That allows us to manage DROP EXTENSION foo and cascade to extensions that depend on feature(s) provided by foo. It also allows us to detect, at update time, when the feature set changes and we're removing a feature that some other installed extension depends on, letting our users know that they should either drop this other extension or update it too.

Next, some questions:

- Why is the finer dependency needed? Do you have a tangible example that struggles with the dependency granularity? I feel good about the existing dependency on extensions, as an extension developer of several ones.

The problem is not yet very apparent only because extensions are very new. The main thing we address with this patch is depending on a feature that appeared while developing an extension, or one that gets removed down the line. It allows extensions to depend on features, avoiding the need to compare version numbers and maintain a list of which version number provides which feature. 
This feature has been requested by several extension users, beginning even before 9.1 got released.

- What happens with DROP EXTENSION ... CASCADE? Does it work?

It should; what happens when you try? :)

- How does pg_upgrade interact with this? The dependency changes.

We're compatible because the extension name itself is always considered a feature and added to the catalogs, so that e.g. require = 'cube' will now target a feature rather than an extension, and this feature defaults to being provided by the 'create extension cube' statement that pg_upgrade issues given 9.1 dependency tracking.

A minor memo. list_extension_features(): I guess the size argument for repalloc is bogus.

Oh, a quick review of repalloc call points seems to indicate you're right; I'll fix that in the next version of the patch.

So, that's pretty much what I've reviewed quickly. I'll need more time to look in detail, but I'd like more inputs on the high-level design and direction.

I hope this answers your questions; please ask for more details as you need them!

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
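To make the idea concrete, here is a hypothetical control-file sketch. Note this is an assumption about the patch's syntax, not taken from it: released 9.1 only has a `requires` keyword that names extensions, and the `provides` keyword below is invented for illustration.

```
# mailer--1.0.control -- hypothetical sketch, not from the patch itself
comment = 'send e-mail from the database'
default_version = '1.0'
# assumed new keyword: features this extension provides, besides its
# own name (which is always an implicit feature)
provides = 'mail, mail_queue'
# with the patch, names listed here would resolve to *features*, which
# may be supplied by this extension or any other installed one
requires = 'cube'
```

Under this scheme, DROP EXTENSION on the provider could cascade to anything depending on the 'mail' feature, regardless of which extension happened to provide it.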
Re: [HACKERS] Finer Extension dependencies
On 21 Leden 2012, 18:20, Dimitri Fontaine wrote:

Hi, Thank you for reviewing this patch! Hitoshi Harada umi.tan...@gmail.com writes:

Next, some questions: - Why is the finer dependency needed? Do you have a tangible example that struggles with the dependency granularity? I feel good about the existing dependency on extensions, as an extension developer of several ones.

The problem is not yet very apparent only because extensions are very new. The main thing we address with this patch is depending on a feature that appeared while developing an extension, or one that gets removed down the line. It allows extensions to depend on features, avoiding the need to compare version numbers and maintain a list of which version provides which feature. This feature has been requested by several extension users, beginning even before 9.1 got released.

It's also about several extensions providing the same sort of functionality implemented in different ways. Think about extensions that allow you to send e-mails right from the database. One could do that directly, another could implement queuing, yet another could be optimized for a specific SMTP server. With features, you could just say 'require = mail' and use the extension that's installed. Yes, this needs a common API.

I personally see this as a major step towards fully-fledged package management, similar to what is available in modern Linux distributions (apt, yum, portage, ...).

Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: psql tab completion for GRANT role
On tis, 2012-01-17 at 13:03 +0100, Pavel Stehule wrote:

Hello

I did a review of this small patch http://archives.postgresql.org/message-id/1326055692.15293.15.ca...@vanquo.pezone.net

* The patch applied with fuzz on one hunk and compiled without warnings

bash-4.2$ patch -p1 < autocompleta.patch
patching file src/bin/psql/tab-complete.c
Hunk #2 succeeded at 2335 with fuzz 1.

* all regression tests passed
* this patch respects Pg coding standards
* the code is simple and has no impact on core functionality or core performance
* regression tests are not required for an interactive feature
* I checked it, and the autocompletion behaves as expected

It is ready for commit.

Committed, thanks.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lots of unused variable warnings in assert-free builds
On ons, 2012-01-18 at 21:16 +0200, Peter Eisentraut wrote:
On sön, 2012-01-15 at 08:14 -0500, Andrew Dunstan wrote:

It would possibly have some documentary value too. Just looking very quickly at Peter's patch, I don't really understand his assertion that this would significantly butcher the code. The worst effect would be that in a few cases we'd have to break up multiple declarations where one of the variables was in this class. That doesn't seem like a tragedy.

Well, I'll prepare a patch like that and then you can judge.

So, here are three patches that could solve this issue.

pg-cassert-unused-attribute.patch, the one I already showed, using __attribute__((unused)).

pg-cassert-unused-ifdef.patch, using only additional #ifdef USE_ASSERT_CHECKING. This does have additional documentation value, but you can see that it gets bulky and complicated. (I introduced several bugs merely while creating this patch.)

pg-cassert-unused-void.patch is an alternative approach that avoids the warnings by casting the arguments of Assert() to void if assertions are disabled. This has the least code impact, but it changes the traditional semantics of asserts, which is that they disappear completely when not enabled. You can see how this creates a problem in src/backend/replication/syncrep.c, where the nontrivial call to SyncRepQueueIsOrderedByLSN() now becomes visible even in non-assert builds. I played around with some other options like __attribute__((pure)) to make the compiler optimize the function away in that case, but that didn't appear to work. So this might not be a workable solution (and it would be GCC-specific anyway). 
diff --git i/src/backend/access/hash/hashovfl.c w/src/backend/access/hash/hashovfl.c
index 130c296..d4329fb 100644
--- i/src/backend/access/hash/hashovfl.c
+++ w/src/backend/access/hash/hashovfl.c
@@ -391,7 +391,9 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
 	uint32		ovflbitno;
 	int32		bitmappage,
 				bitmapbit;
+#ifdef USE_ASSERT_CHECKING
 	Bucket		bucket;
+#endif
 
 	/* Get information from the doomed page */
 	_hash_checkpage(rel, ovflbuf, LH_OVERFLOW_PAGE);
@@ -400,7 +402,9 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
 	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
 	nextblkno = ovflopaque->hasho_nextblkno;
 	prevblkno = ovflopaque->hasho_prevblkno;
+#ifdef USE_ASSERT_CHECKING
 	bucket = ovflopaque->hasho_bucket;
+#endif
 
 	/*
 	 * Zero the page for debugging's sake; then write and release it. (Note:
diff --git i/src/backend/executor/execCurrent.c w/src/backend/executor/execCurrent.c
index b07161f..2a59fc6 100644
--- i/src/backend/executor/execCurrent.c
+++ w/src/backend/executor/execCurrent.c
@@ -151,7 +151,7 @@ execCurrentOf(CurrentOfExpr *cexpr,
 {
 	ScanState  *scanstate;
 	bool		lisnull;
-	Oid			tuple_tableoid;
+	Oid			tuple_tableoid __attribute__((unused));
 	ItemPointer tuple_tid;
 
 	/*
diff --git i/src/backend/executor/nodeMaterial.c w/src/backend/executor/nodeMaterial.c
index b320b54..4ab660e 100644
--- i/src/backend/executor/nodeMaterial.c
+++ w/src/backend/executor/nodeMaterial.c
@@ -66,7 +66,7 @@ ExecMaterial(MaterialState *node)
 	 * Allocate a second read pointer to serve as the mark. We know it
 	 * must have index 1, so needn't store that.
 	 */
-	int			ptrno;
+	int			ptrno __attribute__((unused));
 
 	ptrno = tuplestore_alloc_read_pointer(tuplestorestate, node->eflags);
diff --git i/src/backend/executor/nodeSetOp.c w/src/backend/executor/nodeSetOp.c
index 7fa5730..c6a88a8 100644
--- i/src/backend/executor/nodeSetOp.c
+++ w/src/backend/executor/nodeSetOp.c
@@ -344,7 +344,7 @@ setop_fill_hash_table(SetOpState *setopstate)
 	SetOp	   *node = (SetOp *) setopstate->ps.plan;
 	PlanState  *outerPlan;
 	int			firstFlag;
-	bool		in_first_rel;
+	bool		in_first_rel __attribute__((unused));
 
 	/*
 	 * get state info from node
diff --git i/src/backend/executor/nodeWorktablescan.c w/src/backend/executor/nodeWorktablescan.c
index e2f3dd4..e72d1cb 100644
--- i/src/backend/executor/nodeWorktablescan.c
+++ w/src/backend/executor/nodeWorktablescan.c
@@ -30,7 +30,7 @@ static TupleTableSlot *
 WorkTableScanNext(WorkTableScanState *node)
 {
 	TupleTableSlot *slot;
-	EState	   *estate;
+	EState	   *estate __attribute__((unused));
 	Tuplestorestate *tuplestorestate;
 
 	/*
diff --git i/src/backend/libpq/be-fsstubs.c w/src/backend/libpq/be-fsstubs.c
index b864c86..1bf765c 100644
--- i/src/backend/libpq/be-fsstubs.c
+++ w/src/backend/libpq/be-fsstubs.c
@@ -378,7 +378,7 @@ lo_import_internal(text *filename, Oid lobjOid)
 {
 	File		fd;
 	int			nbytes,
-				tmp;
+				tmp __attribute__((unused));
 	char		buf[BUFSIZE];
 	char		fnamebuf[MAXPGPATH];
 	LargeObjectDesc *lobj;
diff --git i/src/backend/libpq/pqcomm.c w/src/backend/libpq/pqcomm.c
index 35812f4..df41d60 100644
--- i/src/backend/libpq/pqcomm.c
+++ w/src/backend/libpq/pqcomm.c
@@ -1373,7 +1373,7 @@ fail:
 void
 pq_putmessage_noblock(char msgtype, const char
Re: [HACKERS] PATCH: tracking temp files in pg_stat_database
On 21 Leden 2012, 18:13, Magnus Hagander wrote:
On Sat, Jan 21, 2012 at 18:02, Tomas Vondra t...@fuzzy.cz wrote:
On 20 Leden 2012, 13:23, Magnus Hagander wrote:

I'm wondering if this (and also my deadlocks stats patch that's in the queue) should, instead of inventing new pgstats messages, add fields to the tabstat message. It sounds like that one is just for tables, but it's already the one collecting info about commits and rollbacks, and it's already sent on every commit.

Hmmm, I'm not against that, but I'd recommend changing the message name to something that reflects the reality. If it's not just about table statistics, it should not be named 'tabstats' IMHO. Or maybe split that into two messages, both sent at commit time.

Yes, renaming it might be a good idea. If you split it into two messages, that would defeat much of the point.

Not really, if combined with the 'send after each N updates or at the transaction end' approach, because then those parts might be sent independently (thus not transmitting the whole message).

Though I just looked at the tabstat code again, and we already split that message up at regular intervals, which would make it quite weird to have global counters in it as well. But instead of that, perhaps we need a general "non-table, but more than one type of data" message sent out at the same time. There is other stuff in the queue for it. I'm not convinced either way - I'm not against the original way in your patch either. I just wanted to throw the idea out there, and was hoping somebody else would have an opinion too :-)

Let's make that a separate patch. It seems like a good thing to do, but I don't think it should be mixed with this patch.

I do like the idea of not sending the message for each temp file, though.

One thing that worries me is long-running transactions (think about a batch process that runs for several hours within a single transaction). By sending the data only at commit, such transactions would not be accounted properly. 
So I'd suggest sending the message either at commit

By that argument they are *already* not accounted properly, because the number of rows and those counters are wrong. By sending the temp file data in the middle of the transaction, you won't be able to correlate those numbers with the temp file usage. I'm not saying the other use case isn't more common, but whichever way you do it, it's going to get inconsistent with *something*.

The question is whether the patch should be modified to send the message only at commit (i.e. just like tabstats) or if it can remain the way it is now. And maybe we could change the way all those messages are sent (e.g. to the regular submissions I've proposed in the previous post) to make them more consistent.

I'm not asking for 100% consistency. What I don't like is a batch process consuming resources but not being reflected in the stats until the final commit, with a huge spike in the stats right after that, although the activity was actually spread over several hours.

Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for foreign keys with arrays
On Sat, Jan 14, 2012 at 08:18:48PM +0100, Marco Nenciarini wrote:

This is our latest version of the patch. Gabriele, Gianni and I have discussed a lot and decided to send an initial patch which uses EACH REFERENCES instead of ARRAY REFERENCES. The reason behind this is that ARRAY REFERENCES generates an ambiguous grammar, and we all agreed that EACH REFERENCES makes sense (and at the same time does not introduce any new keyword). This is however open for discussion.

I greatly like that name; it would still make sense for other aggregate types, should we ever expand its use. Please complete the name change: the documentation, catalog entries, etc. should all call them something like "each foreign key constraints" (I don't particularly like that exact wording).

You currently forbid multi-column EACH FKs. I agree that we should allow only one array column per FK; with more, the set of required PK rows would be something like the Cartesian product of the elements of the array columns. However, there are no definitional problems, at least for NO ACTION, around a FK constraint having one array column and N scalar columns. Whether or not you implement that now, let's choose a table_constraint syntax leaving that opportunity open. How about:

FOREIGN KEY (col_a, EACH col_b, col_c) REFERENCES pktable (a, b, c)

You've identified that we cannot generally implement the ON DELETE ARRAY CASCADE action on multidimensional arrays. This patch chooses to downgrade to ON DELETE ARRAY SET NULL in that case. My initial reaction is that it would be better to forbid multidimensional arrays in the column when the delete action is ON DELETE ARRAY SET NULL. That's not satisfying, either, because it makes the definition of conforming data depend on the ON DELETE action. Do we have other options? 
----------------+---------+---------+
                |   ON    |   ON    |
 Action         | DELETE  | UPDATE  |
----------------+---------+---------+
 CASCADE        | Row     | Error   |
 SET NULL       | Row     | Row     |
 SET DEFAULT    | Row     | Row     |
 ARRAY CASCADE  | Element | Element |
 ARRAY SET NULL | Element | Element |
 NO ACTION      |    -    |    -    |
 RESTRICT       |    -    |    -    |
----------------+---------+---------+

To complete the ARRAY-to-EACH transition, I would suggest names like CASCADE EACH/SET EACH NULL.

I like the extensive test cases you have included. There's one more thing they should do: leave tables having EACH REFERENCES relationships in the regression database. This way, pg_dump tests of the regression database will exercise pg_dump with respect to this feature.

The patch emits several warnings:

heap.c: In function `StoreRelCheck':
heap.c:1947: warning: passing argument 17 of `CreateConstraintEntry' makes integer from pointer without a cast
index.c: In function `index_constraint_create':
index.c:1160: warning: passing argument 17 of `CreateConstraintEntry' makes integer from pointer without a cast
In file included from gram.y:13051:
scan.c: In function `yy_try_NUL_trans':
scan.c:16243: warning: unused variable `yyg'
trigger.c: In function `CreateTrigger':
trigger.c:454: warning: passing argument 17 of `CreateConstraintEntry' makes integer from pointer without a cast
typecmds.c: In function `domainAddConstraint':
typecmds.c:2960: warning: passing argument 17 of `CreateConstraintEntry' makes integer from pointer without a cast
arrayfuncs.c: In function `array_remove':
arrayfuncs.c:5197: warning: unused variable `dimresult'
ri_triggers.c: In function `RI_FKey_check':
ri_triggers.c:484: warning: too many arguments for format

This test case, copied from my previous review except for updating the syntax, still fails:

BEGIN;
CREATE TABLE parent (c int PRIMARY KEY);
CREATE TABLE child (c int[]);
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES ('{3,1,2}');
ALTER TABLE child ADD FOREIGN KEY (c) EACH REFERENCES parent; -- should error
INSERT INTO child VALUES ('{3,1,2}'); -- does error, as expected
ROLLBACK;

Most of my code comments 
concern minor matters:

*** a/doc/src/sgml/ddl.sgml
--- b/doc/src/sgml/ddl.sgml
*** CREATE TABLE order_items (
*** 852,857 ****
--- 882,931 ----
     </para>
 
     <para>
+     When working with foreign key arrays, you have two more
+     options that can be used with your
+     <literal>EACH REFERENCES</literal> definition:
+     <literal>ARRAY CASCADE</literal> and
+     <literal>ARRAY SET NULL</literal>. Depending on
+     the triggering action (<command>DELETE</command> or
+     <command>UPDATE</command>) on the referenced table,
+     every element in the referencing array will be either
+     deleted/updated or set to NULL.
+     For more detailed information on foreign key arrays
+     options and special cases, please refer to the
+     documentation for <xref linkend="sql-createtable">.
+    </para>
+
+    <para>
+     For instance, in the example below, a <command>DELETE</command>
+     from the <literal>countries</literal> table
Re: [HACKERS] Review of patch renaming constraints
On fre, 2012-01-20 at 11:32 +0530, Nikhil Sontakke wrote:

Agreed. And right now primary key constraints are not marked as "only", leaving them available for inheritance in the future. Or would you prefer it otherwise? Anyway, I fail to see the direct connection between this and renaming. Might have to look at this patch for that.

It checks conisonly to determine whether it needs to rename the constraint in child tables as well. Since a primary key has conisonly = false, it goes to the child tables, but the constraint is not there.

In the past, we have treated this merely as an implementation artifact: check constraints are inherited, primary key constraints are not. Now we can choose for check constraints, with inherited being the default. Having inheritable primary key constraints is a possible future feature. So we need to think a little harder now about how to work that into the existing logic. This also ties in with the other thread about having this in CREATE TABLE syntax.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On Sun, Jan 15, 2012 at 10:28 PM, Greg Smith g...@2ndquadrant.com wrote:

I would like to be able to tell people they need never turn on log_checkpoints if they monitor pg_stat_bgwriter instead, because that sets a good precedent for what direction the database is going. It would be nice for pg_stat_bgwriter's format to settle down for a long period too.

While on the topic of the format of pg_stat_bgwriter and it converging on stability: I'm finding the backend_writes column pretty unfortunate. The only use I know of for it is to determine if the bgwriter is lagging behind. Yet it doesn't serve even this purpose, because it lumps together the backend writes due to lagging background writes and the backend writes by design due to the use of a buffer access strategy during bulk inserts.

If I'm running a tightly controlled benchmark on an otherwise unused server and I know that no BAS is being used, then I can meaningfully use backend_writes. That is a pretty limiting limit. I think we should either create a separate column to segregate BAS backend_writes, or just not report them at all and report only the non-BAS ones in pg_stat_bgwriter.

I know you've argued against this before (http://archives.postgresql.org/pgsql-performance/2011-03/msg00340.php), but I think the assumption that all writes are absorbed by the OS without blocking is assuming an awful lot. If the server is not behaving well, then it very well may be blocking on writes. If we are confident that the OS can always efficiently cache writes, why bother with a background writer at all?

I think the argument that it is not important to know which strategy motivated a backend write could just as validly be reformulated as an argument that we don't need to know how many backend writes there were in the first place. Knowing only an aggregation of BAS-motivated backend writes and straggling-bgwriter-motivated backend writes just doesn't seem useful to me. Am I missing a use for it? 
Not only is it probably not useful, it seems to be actively harmful as I've seen several times where it leads people down false paths (including me several times, as about once a year I forget why it is useless and spend time rediscovering it). If we are not willing to separate the two types of backend writes, or just omit the BAS ones altogether, then I would suggest we drop the column entirely. Would a patch to separate the backend writes into two columns be entertained for 9.3? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
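For context, the column under discussion is surfaced as buffers_backend in the pg_stat_bgwriter view (as of 9.1/9.2); a monitoring query today can only see the lumped total being criticized:

```sql
-- 9.1/9.2 view: buffers_backend mixes strategy (BAS) writes done by
-- design during bulk operations with writes forced on backends by a
-- lagging background writer -- the two cases Jeff wants separated.
SELECT buffers_checkpoint,   -- written at checkpoint time
       buffers_clean,        -- written by the background writer
       buffers_backend       -- written by backends, for either reason
  FROM pg_stat_bgwriter;
```

The proposed split would turn the last column into something like buffers_backend plus a hypothetical buffers_backend_strategy (name invented here), so the "is the bgwriter keeping up?" question could be answered from the view alone.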
Re: [HACKERS] Memory usage during sorting
On 16 January 2012 00:59, Jeff Janes jeff.ja...@gmail.com wrote: I think it would be better to pre-deduct the tape overhead amount we will need if we decide to switch to tape sort from the availMem before we even start reading (and then add it back if we do indeed make that switch). That way we wouldn't over-run the memory in the first place. However, that would cause apparent regressions in which sorts that previously fit into maintenance_work_mem no longer do. Boosting maintenance_work_mem to a level that was actually being used previously would fix those regressions, but pointing out that the previous behavior was not optimal doesn't change the fact that people are used to it and perhaps tuned to it. So the attached patch seems more backwards-friendly. Hmm. Are people really setting maintenance_work_mem such that it is exactly large enough to quicksort when building an index in one case but not another? Is the difference large enough to warrant avoiding pre-deduction? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On Fri, Jan 13, 2012 at 7:21 AM, Robert Haas robertmh...@gmail.com wrote:

I think that people who are using index-only scans are going to want some way to find out the degree to which the scans are in fact index-only. So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output. This might not be all the instrumentation we'll ever want here, but I think we at least want this much.

A review:

It is not in context diff format. It applies, builds, and runs cleanly, including under --enable-cassert.

No docs, but the output format of other EXPLAIN ANALYZE fields generally isn't documented either. It is fairly self-explanatory (provided one knows what an index-only scan is in the first place).

No regression tests, but that too seems appropriate.

It does what it says. We (or I, anyway) want what it does. As mentioned, we might want the counts also exposed via the stats collector, but that is another matter.

I don't see how it could possibly cause a meaningful slowdown, and tests on all-memory index-only scans cannot detect one. On my machine, using EXPLAIN ANALYZE causes a 16-fold slowdown on an in-memory (pgbench -i -s100) "select count(*) from pgbench_accounts", conducted after a run of pgbench -T30 so that there are some heap fetches needed. Compared to that ANALYZE slowdown, any additional slowdown from this patch is negligible. It would be nice if you could get the output of this patch (and of the BUFFERS option to EXPLAIN) without incurring the timing overhead of ANALYZE. But again, that is not the subject of this patch.

I wondered about the type of ioss_HeapFetches. Other members of execnodes.h structs tend to be int64, not long, but those others are not reported in EXPLAIN, and other things reported in explain.c tend to be long. This seems to be a foot in both worlds, and your response to Peter is convincing.

I agree with Tom on the pre-increment versus post-increment of ioss_HeapFetches. 
I also wondered if ioss_HeapFetches ought to be initialized to zero. I looked for analogous code but didn't find enough analogy to convince me one way or the other. All the other members of IndexOnlyScanState have entries in the big comment block preceding the typedef, so I would expect ioss_HeapFetches should have one as well. These are all minor issues, and since you are a committer I'm going to mark it ready for committer. As a side-note, I noticed that I needed to run vacuum twice in a row to get the Heap Fetches to drop to zero. I vaguely recall that only one vacuum was needed when ios first went in (and I had instrumented it to elog heap-fetches). Does anyone know if this the expected consequence of one of the recent changes we made to vacuum? Thanks, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
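For readers who have not seen the patch, the new counter appears as an extra line on the Index Only Scan node. The shape below is sketched from the review's description; the exact formatting and numbers are illustrative, not copied from the patch.

```sql
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;
-- Aggregate  (cost=... rows=1) (actual time=... rows=1 loops=1)
--   ->  Index Only Scan using pgbench_accounts_pkey on pgbench_accounts
--         Heap Fetches: 5423   -- new line; drops toward 0 after VACUUM
--                              -- sets the visibility map
```

A large Heap Fetches value relative to the row count tells you the scan is not really index-only yet, which is exactly the question the patch sets out to answer.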
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra t...@fuzzy.cz wrote: Fixed. The default value of TIMING option did not work as intended, it was set to true even for plain EXPLAIN (without ANALYZE). In that case the EXPLAIN failed. I've applied this over the show Heap Fetches in EXPLAIN ANALYZE output for index-only scans patch. It applied and built and passes installcheck. I have not done a full review of this, but I want to say that I want this very much. Not only can I get the row counts, but also the Heap Fetches and the output of BUFFERS, without the overhead of timing. I haven't looked at contrib aspects of it at all. Thank you for making this. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
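The combination Jeff describes looks roughly like the following, using the TIMING option from the patch under review (syntax may still change before commit):

```sql
-- row counts, Heap Fetches, and BUFFERS data, without the per-node
-- timing calls that dominate EXPLAIN ANALYZE overhead
EXPLAIN (ANALYZE, TIMING off, BUFFERS)
SELECT * FROM pgbench_accounts WHERE bid = 1;
```

Plain EXPLAIN (without ANALYZE) continues to ignore TIMING, since no execution happens at all.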
Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE
On Wed, Jan 4, 2012 at 12:35 PM, Noah Misch n...@leadboat.com wrote:

I neglected to commit after revising the text of a few comments; use this version instead. Thanks.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

This is failing make check for me.

*** /tmp/foo/src/test/regress/expected/alter_table.out Sat Jan 21 19:51:46 2012
--- /tmp/foo/src/test/regress/results/alter_table.out Sat Jan 21 19:54:18 2012
***************
*** 1662,1667 ****
--- 1662,1668 ----
  NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index norewrite1_parent_pkey for table norewrite1_parent
  DEBUG: building index norewrite1_parent_pkey on table norewrite1_parent
  CREATE TABLE norewrite1_child(c text REFERENCES norewrite1_parent);
+ DEBUG: building index pg_toast_41491_index on table pg_toast_41491
  ALTER TABLE norewrite1_child ALTER c TYPE varchar; -- recheck FK
  DEBUG: validating foreign key constraint norewrite1_child_c_fkey
  CREATE DOMAIN other_int AS int;

Settings:
      name       | current_setting
-----------------+------------------------------------------------------------
 version         | PostgreSQL 9.2devel on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20070115 (SUSE Linux), 64-bit
 lc_collate      | C
 lc_ctype        | C
 max_connections | 100
 max_stack_depth | 2MB
 server_encoding | SQL_ASCII
 shared_buffers  | 32MB
 TimeZone        | US/Pacific
 wal_buffers     | 1MB

Cheers, Jeff

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan and...@dunslane.net wrote:
On 01/14/2012 03:06 PM, Andrew Dunstan wrote:

OK, here's a patch that does both query_to_json and array_to_json, along with docs and regression tests. It includes Robert's original patch, although I can produce a differential patch if required. It can also be pulled from https://bitbucket.org/adunstan/pgdevel

Here's an update that adds row_to_json, plus a bit more cleanup.

This is bit-rotted such that initdb fails:

creating template1 database in /tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index
DETAIL: Key (oid)=(3145) is duplicated.

I bumped up those oids in the patch, and it passes make check once I figured out how to get the tests run under UTF-8. Is it supposed to pass under other encodings? I can't tell from the rest of the thread whether it is supposed to pass in other encodings or not.

Cheers, Jeff

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE
On Sat, Jan 21, 2012 at 08:04:20PM -0800, Jeff Janes wrote:
> This is failing make check for me.
>
> *** /tmp/foo/src/test/regress/expected/alter_table.out	Sat Jan 21 19:51:46 2012
> --- /tmp/foo/src/test/regress/results/alter_table.out	Sat Jan 21 19:54:18 2012
> ***************
> *** 1662,1667 ****
> --- 1662,1668 ----
>   NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "norewrite1_parent_pkey" for table "norewrite1_parent"
>   DEBUG: building index "norewrite1_parent_pkey" on table "norewrite1_parent"
>   CREATE TABLE norewrite1_child(c text REFERENCES norewrite1_parent);
> + DEBUG: building index "pg_toast_41491_index" on table "pg_toast_41491"
>   ALTER TABLE norewrite1_child ALTER c TYPE varchar; -- recheck FK
>   DEBUG: validating foreign key constraint "norewrite1_child_c_fkey"
>   CREATE DOMAIN other_int AS int;

Thanks. I've attached a new version fixing this problem.

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cc210f0..bb2cf62 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 276,281 **** static Oid transformFkeyCheckAttrs(Relation pkrel,
--- 276,282 ----
  				int numattrs, int16 *attnums,
  				Oid *opclasses);
  static void checkFkeyPermissions(Relation rel, int16 *attnums, int natts);
+ static CoercionPathType findFkeyCast(Oid targetTypeId, Oid sourceTypeId, Oid *funcid);
  static void validateCheckConstraint(Relation rel, HeapTuple constrtup);
  static void validateForeignKeyConstraint(char *conname, Relation rel,
  				Relation pkrel,
***************
*** 357,362 **** static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMOD
--- 358,364 ----
  static void ATPostAlterTypeParse(Oid oldId, char *cmd, List **wqueue,
  				LOCKMODE lockmode, bool rewrite);
  static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
+ static void TryReuseForeignKey(Oid oldId, Constraint *con);
  static void change_owner_fix_column_acls(Oid relationOid,
  				Oid oldOwnerId, Oid newOwnerId);
  static void change_owner_recurse_to_sequences(Oid relationOid,
***************
*** 5574,5579 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
--- 5576,5583 ----
  				numpks;
  	Oid			indexOid;
  	Oid			constrOid;
+ 	bool		old_check_ok;
+ 	ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);

  	/*
  	 * Grab an exclusive lock on the pk table, so that someone doesn't delete
***************
*** 5690,5695 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
--- 5694,5706 ----
  				(errcode(ERRCODE_INVALID_FOREIGN_KEY),
  				 errmsg("number of referencing and referenced columns for foreign key disagree")));

+ 	/*
+ 	 * On the strength of a previous constraint, we might avoid scanning
+ 	 * tables to validate this one.  See below.
+ 	 */
+ 	old_check_ok = (fkconstraint->old_conpfeqop != NIL);
+ 	Assert(!old_check_ok || numfks == list_length(fkconstraint->old_conpfeqop));
+
  	for (i = 0; i < numpks; i++)
  	{
  		Oid			pktype = pktypoid[i];
***************
*** 5704,5709 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
--- 5715,5721 ----
  		Oid			ppeqop;
  		Oid			ffeqop;
  		int16		eqstrategy;
+ 		Oid			pfeqop_right;

  		/* We need several fields out of the pg_opclass entry */
  		cla_ht = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclasses[i]));
***************
*** 5746,5755 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
  		pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
  									 eqstrategy);
  		if (OidIsValid(pfeqop))
  			ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,
  										 eqstrategy);
  		else
! 			ffeqop = InvalidOid;	/* keep compiler quiet */

  		if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
  		{
--- 5758,5774 ----
  		pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
  									 eqstrategy);
  		if (OidIsValid(pfeqop))
+ 		{
+ 			pfeqop_right = fktyped;
  			ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,
Re: [HACKERS] Second thoughts on CheckIndexCompatible() vs. operator families
New version that repairs a defective test case.

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 712b0b0..1bf1de5 100644
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
***************
*** 23,28 ****
--- 23,29 ----
  #include "catalog/pg_opclass.h"
  #include "catalog/pg_opfamily.h"
  #include "catalog/pg_tablespace.h"
+ #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "commands/defrem.h"
  #include "commands/tablecmds.h"
***************
*** 52,57 ****
--- 53,59 ----
  /* non-export function prototypes */
  static void CheckPredicate(Expr *predicate);
  static void ComputeIndexAttrs(IndexInfo *indexInfo,
+ 				  Oid *typeOidP,
  				  Oid *collationOidP,
  				  Oid *classOidP,
  				  int16 *colOptionP,
***************
*** 87,104 **** static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
  * of columns and that if one has an expression column or predicate, both do.
  * Errors arising from the attribute list still apply.
  *
! * Most column type changes that can skip a table rewrite will not invalidate
! * indexes.  For btree and hash indexes, we assume continued validity when
! * each column of an index would have the same operator family before and
! * after the change.  Since we do not document a contract for GIN or GiST
! * operator families, we require an exact operator class match for them and
! * for any other access methods.
! *
! * DefineIndex always verifies that each exclusion operator shares an operator
! * family with its corresponding index operator class.  For access methods
! * having no operator family contract, confirm that the old and new indexes
! * use the exact same exclusion operator.  For btree and hash, there's nothing
! * more to check.
  *
  * We do not yet implement a test to verify compatibility of expression
  * columns or predicates, so assume any such index is incompatible.
--- 89,105 ----
  * of columns and that if one has an expression column or predicate, both do.
  * Errors arising from the attribute list still apply.
  *
! * Most column type changes that can skip a table rewrite do not invalidate
! * indexes.  We acknowledge this when all operator classes, collations and
! * exclusion operators match.  Though we could further permit intra-opfamily
! * changes for btree and hash indexes, that adds subtle complexity with no
! * concrete benefit for core types.
! *
! * When a comparison or exclusion operator has a polymorphic input type, the
! * actual input types must also match.  This defends against the possibility
! * that operators could vary behavior in response to get_fn_expr_argtype().
! * At present, this hazard is theoretical: check_exclusion_constraint() and
! * all core index access methods decline to set fn_expr for such calls.
  *
  * We do not yet implement a test to verify compatibility of expression
  * columns or predicates, so assume any such index is incompatible.
***************
*** 111,116 **** CheckIndexCompatible(Oid oldId,
--- 112,118 ----
  				List *exclusionOpNames)
  {
  	bool		isconstraint;
+ 	Oid		   *typeObjectId;
  	Oid		   *collationObjectId;
  	Oid		   *classObjectId;
  	Oid			accessMethodId;
***************
*** 123,132 **** CheckIndexCompatible(Oid oldId,
  	int			numberOfAttributes;
  	int			old_natts;
  	bool		isnull;
- 	bool		family_am;
  	bool		ret = true;
  	oidvector  *old_indclass;
  	oidvector  *old_indcollation;
  	int			i;
  	Datum		d;
--- 125,134 ----
  	int			numberOfAttributes;
  	int			old_natts;
  	bool		isnull;
  	bool		ret = true;
  	oidvector  *old_indclass;
  	oidvector  *old_indcollation;
+ 	Relation	irel;
  	int			i;
  	Datum		d;
***************
*** 168,177 **** CheckIndexCompatible(Oid oldId,
  	indexInfo->ii_ExclusionOps = NULL;
  	indexInfo->ii_ExclusionProcs = NULL;
  	indexInfo->ii_ExclusionStrats = NULL;
  	collationObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
  	classObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
  	coloptions = (int16 *) palloc(numberOfAttributes * sizeof(int16));
! 	ComputeIndexAttrs(indexInfo, collationObjectId, classObjectId,
  					  coloptions, attributeList,
  					  exclusionOpNames, relationId,
  					  accessMethodName, accessMethodId,
--- 170,181 ----
  	indexInfo->ii_ExclusionOps = NULL;
[HACKERS] Next steps on pg_stat_statements normalisation
So, having received feedback from Tom and others on this patch, I'd like to set out how I think I should address the various concerns, to ensure that a revision of the patch gets into the 9.2 release. As I've said time and again, I think it is very important that we have this, sooner rather than later.

The main objection to my patch was the overhead of the admittedly invasive (though purely mechanical) changes to the grammar, which recorded the length of every token so that the length of each Const token could later be known in a principled fashion. While I believe you'd be extremely hard pushed to demonstrate any regression at all from those extra cycles, it's also true that the starting position of each token is already exposed (it is used to produce error messages, outside the parser, that point a caret at a token's position in the query string), so there's no reason to think the total length could not be calculated on the fly from within pg_stat_statements. The syntax for constants is simple enough that a good set of regression tests should make this entirely practicable, covering all permutations of the relevant factors, including, for example, standard_conforming_strings. The SQL syntax rules for constants should not need to change very frequently, or even at all, so we should be fine with knowing just the starting position. It's quicker and easier to do it this way than to argue the case for my original implementation, so that's what I'll do. Whatever overhead this independent, pg_stat_statements-internal const parsing may impose will at least be paid only once per query, when we first require a canonicalised representation of the query for the pg_stat_statements view.
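To make the "calculate the length on the fly" idea concrete, here is a self-contained sketch (not code from the patch) of scanning forward from a constant's known start offset to find its length. It handles only three literal forms - decimal numbers, standard single-quoted strings with '' doubling, and E'...' strings with backslash escapes - and real SQL has several more (dollar quoting, hex/bit strings, typed literals), which is exactly why the regression tests mentioned above would matter:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/*
 * Given the query text and the known start offset of a constant token,
 * return the token's length in bytes.  Simplified model only: decimal
 * numerics, '...' strings (with '' doubling), and E'...' strings.
 */
int
const_token_length(const char *query, int start, bool standard_conforming_strings)
{
	const char *p = query + start;
	const char *q = p;
	bool		backslash_escapes = !standard_conforming_strings;

	/* E'...' strings honor backslash escapes regardless of the GUC */
	if ((*q == 'E' || *q == 'e') && q[1] == '\'')
	{
		backslash_escapes = true;
		q++;
	}

	if (*q == '\'')
	{
		q++;
		while (*q != '\0')
		{
			if (backslash_escapes && *q == '\\' && q[1] != '\0')
				q += 2;			/* skip backslash-escaped character */
			else if (*q == '\'' && q[1] == '\'')
				q += 2;			/* doubled quote inside the string */
			else if (*q == '\'')
				return (int) (q - p) + 1;	/* closing quote */
			else
				q++;
		}
		return (int) (q - p);	/* unterminated; take what we have */
	}

	/* otherwise assume a plain numeric literal */
	while (isdigit((unsigned char) *q) || *q == '.')
		q++;
	return (int) (q - p);
}
```

For example, in `SELECT 'it''s'` the constant starting at offset 7 spans 7 bytes, and with standard_conforming_strings off, `'a\'b'` spans 6.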
I have, in the past, spoken against the idea of parsing query strings on the fly (an objection since proven well-founded by various third-party tools), but this is something quite different: by starting from a position that is known to be where a Const token begins, we need only concern ourselves with constant syntax, a much more limited and stable thing to pin down.

I had imagined that by plugging into planner_hook and hashing the post-rewrite query tree immediately before it was passed to the planner, there was a certain amount of merit in recognising as equivalent those queries that would definitely result in the same plan, all other things (and the selectivity estimates of constants) being equal. However, since some of that logic, such as the normalisation of different but equivalent join syntaxes, actually occurs in the planner, it was judged that this left an undesirable inconsistency. I also felt it was useful that the hashing occur on the post-rewrite query proper, but various examples showed that to be problematic, such as the implementation's inability to distinguish constants in the original query string from, for example, constants appearing in view definitions.

Tom wanted to hash plans. I think that view was partly informed by concerns about the co-ordination of hooks. While I had worked around those problems, the solution I'd proposed wasn't very pretty, and it wasn't hard to imagine someone finding a reason to need to break it.
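The core of the tree-hashing idea can be illustrated with a toy model (this is invented for illustration, not PostgreSQL code): walk an expression tree and fold node tags and values into a hash, but substitute a fixed magic value for every Const, so that two queries differing only in their constants hash identically. The node names below mirror the backend's, but the structures are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented miniature node representation for illustration only. */
typedef enum { T_OpExpr, T_Var, T_Const } NodeTag;

typedef struct Node
{
	NodeTag		tag;
	int			value;		/* varattno, operator OID, or const datum */
	struct Node *left;
	struct Node *right;
} Node;

#define CONST_MAGIC 0xC0FFEE	/* stand-in for "some constant was here" */
#define HASH_SEED   14695981039346656037ULL		/* FNV-1a offset basis */

/*
 * Fold the tree into a 64-bit hash.  Every field except a Const's value
 * contributes, so structure and operators matter but constants do not.
 */
uint64_t
hash_tree(const Node *node, uint64_t h)
{
	if (node == NULL)
		return h;
	h = h * 1099511628211ULL + (uint64_t) node->tag;
	h = h * 1099511628211ULL +
		(uint64_t) (node->tag == T_Const ? CONST_MAGIC : node->value);
	h = hash_tree(node->left, h);
	return hash_tree(node->right, h);
}
```

With this, "WHERE x = 5" and "WHERE x = 99" reduce to the same hash, while structurally different trees do not; the real patch of course walks the backend's Query tree rather than this toy structure.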
While I think there might be a lot to be said for hashing plans, it isn't a natural adjunct to pg_stat_statements. The user expects to see exactly one entry for what they consider to be the same query, and I think you'd have a hard time convincing many people that it's okay to see a large number of entries for what they thought was one query, merely reflecting the fact that a different plan was used for each, based on factors they didn't consider essential to the query and that may be quite transitory. Some of the more marginal plans for the same query could be missing entirely from the fixed-size hash table. It wouldn't be a very useful way of normalising queries, even if it did more accurately reflect where execution costs come from, so I am dead set against doing this for normalisation. It may be that we can add still more value in the 9.3 release by hashing plans as well, to do entirely new and interesting things, but we need a canonicalised, normalised representation of queries first - that is obviously how people expect a tool like this to work, and it's difficult to imagine a practical work-flow for isolating poorly performing queries/plans that doesn't stem from it.

Balancing all of these concerns, here's a basic outline of what I think the way forward is:

* Add a hook to core that allows modules to hook into the calls to parse_analyze() and parse_analyze_varparams(). There'd probably be two single-line plugin functions for each, so
Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE
On Sat, Jan 21, 2012 at 9:05 PM, Noah Misch n...@leadboat.com wrote:
> On Sat, Jan 21, 2012 at 08:04:20PM -0800, Jeff Janes wrote:
>> This is failing make check for me.
>>
>> *** /tmp/foo/src/test/regress/expected/alter_table.out	Sat Jan 21 19:51:46 2012
>> --- /tmp/foo/src/test/regress/results/alter_table.out	Sat Jan 21 19:54:18 2012
>> ***************
>> *** 1662,1667 ****
>> --- 1662,1668 ----
>>   NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "norewrite1_parent_pkey" for table "norewrite1_parent"
>>   DEBUG: building index "norewrite1_parent_pkey" on table "norewrite1_parent"
>>   CREATE TABLE norewrite1_child(c text REFERENCES norewrite1_parent);
>> + DEBUG: building index "pg_toast_41491_index" on table "pg_toast_41491"
>>   ALTER TABLE norewrite1_child ALTER c TYPE varchar; -- recheck FK
>>   DEBUG: validating foreign key constraint "norewrite1_child_c_fkey"
>>   CREATE DOMAIN other_int AS int;
>
> Thanks. I've attached a new version fixing this problem.

Thanks, I've verified that it now passes make check.

Looking at the code now, I don't think I'll be able to do a meaningful review in a reasonable time. I'm not familiar with that part of the code, and it is too complex for me to learn right now.

Cheers,

Jeff