Re: [HACKERS] REVIEW: pg_stat_statements with query tree based normalization

2012-01-21 Thread Kääriäinen Anssi

I am going to produce another revision in response to feedback already
received. I intend to outline the steps that it will take to resolve
some outstanding issues in the next day or so. It would be nice if you
could take a look at the revised patch that is ultimately produced.
Should I keep you posted?


Please do.

I tried the patch because I wanted to investigate the Django test suite.
While I did that, I ended up doing a small review of the feature. I can easily
do that again with an updated patch.

 - Anssi



Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-21 Thread Peter Geoghegan
I decided to take advantage of my ongoing access to a benchmarking
server to see how I could get on with a query with an especially large
sort. Now, that box has 16GB of RAM, and some people have that much
memory in their laptops these days, so I was somewhat limited in how
far I could push things.

I eventually decided upon much the same benchmark as I'd used in my
previous e-mail (the query "SELECT * FROM pgbench_accounts ORDER BY
bid;", but primed with random data, while being sure to subsequently
vacuum).

I stress that I have not made any attempt to artificially remove
client overhead. I have, however, used a faster disk (caches were not
warmed in a separate run of pgbench or anything like that for either
of my two most recent benchmarks, so there may have been, and indeed
may still be, some small bias there), in addition to connecting pgbench
with the -h option. Apparently a known problem with Linux kernels
using the Completely Fair Scheduler introduced in 2.6.23 is that it
does not schedule the pgbench program very well when it's connecting
to the database using Unix-domain sockets, as I have been until now.
I'm not sure that this had all that much potential to spoil my
results, but I didn't want to take the chance.

Anyway, the results of running this latest benchmark, with 20
successive runs of each large query, were:

Patch:

pghost: localhost pgport:  nclients: 1 nxacts: 20 dbName: pgbench
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
number of transactions per client: 20
number of transactions actually processed: 20/20
tps = 0.030924 (including connections establishing)
tps = 0.030924 (excluding connections establishing)
statement latencies in milliseconds:
31163.957400    SELECT * FROM pgbench_accounts ORDER BY bid;

Master:

pghost: localhost pgport:  nclients: 1 nxacts: 20 dbName: pgbench
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
number of transactions per client: 20
number of transactions actually processed: 20/20
tps = 0.026769 (including connections establishing)
tps = 0.026769 (excluding connections establishing)
statement latencies in milliseconds:
36179.508750    SELECT * FROM pgbench_accounts ORDER BY bid;

That's a larger proportional improvement than reported on Thursday,
and at significantly greater scale.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks

2012-01-21 Thread Kohei KaiGai
2012/1/19 Robert Haas robertmh...@gmail.com:
 On Thu, Jan 19, 2012 at 3:51 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2012/1/19 Robert Haas robertmh...@gmail.com:
 On Wed, Jan 18, 2012 at 9:50 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 On the sepgsql side, it determines the cases in which to apply permission
 checks according to the contextual information; that is the same technique
 we used when we implemented the create permission.
 Thus, it can check the db_xxx:{drop} permission correctly.

 Why do we need the contextual information in this case?  Why
 can't/shouldn't the decision be made solely on the basis of what
 object is targeted?

 Several code paths that remove a particular object are not appropriate
 places to apply permission checks. For example...

 [1] Cleanup of temporary objects
 on_shmem_exit() registers RemoveTempRelationsCallback(), which
 eventually calls deleteWhatDependsOn(), which in turn invokes
 deleteOneObject().
 This code path is just an internal cleanup process, not related to
 the client's permissions.

 [2] Cleanup of the transient table at a VACUUM FULL/CLUSTER command
 rebuild_relation() creates a temporary table with make_new_heap(),
 then copies the contents of the original table in index order, and
 calls finish_heap_swap(), which swaps the relation files and
 removes the temporary table using performDeletion().
 This code actually creates and drops a table; however, that is an
 internal implementation detail, not related to the client's permissions.

 So, I think sepgsql should apply permission checks only to object
 deletions invoked by the user's operations, such as DropStmt.

 I agree with that theory, but isn't this method of implementing that a
 pretty horrible kludge?  For example, if you'd implemented it this way
 for 9.1, the recent drop-statement refactoring would have broken it.
 Or if, in the future, we add another type of statement that can drop
 things, this code will still compile just fine but will no longer work
 correctly.  ISTM that we need a way to either (1) not call the hook at
 all unless the operation is user-initiated, or (2) call the hook, but
 pass a flag indicating what sort of operation this is?

Yes. This approach would also require continued code revisions on the
sepgsql side for each major release.
I think approach (1) raises an issue similar to what we discussed
when sepgsql implemented the create permission: we would have to know
the details of the extension module to determine whether the hook should
be called or not. My preference is (2), which is more reasonable.

 Let's imagine another possible use of this hook: we want to emit some
 kind of log message every time a database object gets dropped.  I
 think that's a plausible use-case, and in that case what we'd want is:
 (1) VACUUM FULL or CLUSTER shouldn't call the hook at all, (2) cleanup
 of temporary objects should probably call the hook, but ideally with a
 flag to indicate that it's an internal (DB-initiated) operation, and
 (3) user activity should definitely call the hook.

 I'm not sure how we can cleanly get that behavior, but ISTM that's
 what we want...

I think case (1) should also call the hook, but with a flag that indicates
database-internal stuff, because make_new_heap() calls the OAT_POST_CREATE
hook; thus, we need to give extensions a chance to clean up if they did
something at that point.

I'd like to propose utilizing the DropBehavior argument of performDeletion()
to inform dependency.c of the context it was invoked in.
If we have a new flag DROP_INTERNAL that can be OR-masked with the
existing DROP_RESTRICT or DROP_CASCADE, it can be used to convey the
current context and then be passed on as the flag to the hook.
This approach seems minimally invasive to me.

In addition, we may have cases where an extension wants to know whether
the deletion is cascaded or explicitly specified by the user, if it wants
to implement the same security model in this hook.
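Just to illustrate the idea, here is a rough, hypothetical sketch; DROP_INTERNAL
and the helper names below do not exist anywhere, this only shows what the
OR-masking could look like:

#include <stdbool.h>

typedef enum DropBehavior
{
	DROP_RESTRICT,				/* drop fails if any dependent objects */
	DROP_CASCADE				/* remove dependent objects too */
} DropBehavior;

/* proposed: a bit that internal callers (CLUSTER, temp-object cleanup) OR in */
#define DROP_INTERNAL	0x8000

/*
 * An internal caller would then do something like:
 *     performDeletion(&object, DROP_RESTRICT | DROP_INTERNAL);
 * and dependency.c would split the value back apart before passing
 * the internal flag on to the object-access hook.
 */
static inline DropBehavior
drop_behavior_only(int behavior)
{
	return (DropBehavior) (behavior & ~DROP_INTERNAL);
}

static inline bool
drop_is_internal(int behavior)
{
	return (behavior & DROP_INTERNAL) != 0;
}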

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] WIP -- renaming implicit sequences

2012-01-21 Thread Thomas Munro
On 19/01/2012, Robert Haas robertmh...@gmail.com wrote:
 rhaas=# alter sequence foo_a_seq rename to bob;
 ALTER SEQUENCE

 If somebody renames the table or the column at this point, it's a good
 bet that they *don't* want bob renamed.

FWIW the patch addresses this case and wouldn't try to rename 'bob'.

 Another, admittedly minor consideration is that this can introduce
 some subtle concurrency bugs that we don't have today.  For example,
 suppose we choose a new name for the sequence which isn't in use, but
 then between the time when we pick the name and the time we do the
 insert the name becomes used, due to some concurrent transaction.  Now
 we'll fail with a rather baffling error message.  That isn't
 necessarily a huge problem - we have lots of code that is prone to
 such race conditions - but it's not wonderful either.  ...

I thought about this, and it seemed to me that (1) the same race
already applies when you CREATE a table with a serial column and (2)
anyone running a bunch of DDL concurrently with other DDL operations
already needs to coordinate their actions or deal with occasional name
collisions in general.  But yeah, I see that it's not ideal.

 I think we should remove this from the TODO list, or at least document
 that there are a number of reasons why it might be a deeper hole than
 it appears to be at first glance.

Fair enough, I'll leave it there.

Thanks for the feedback!



Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-21 Thread Ants Aasma
On 19 Jan 2012 21:00, Robert Haas robertmh...@gmail.com wrote:
 I agree.  It occurs to me that we recently changed the default *text*
 output format for bytea for reasons not dissimilar to those
 contemplated here.  Presumably, that's a much more disruptive change,
 and yet we've had minimal complaints because anyone who gets bitten
 can easily set bytea_output='escape' and the problem goes away.

I had a run-in with this. JDBC driver versions < 9.0 with the default
configuration resulted in silent data corruption. The fix was easy, but not
having a useful error was what really bothered me.

--
Ants Aasma


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-21 Thread Marc Mamin
 
  c. Refine error handling of dblink.c. I think it preserves the
 previous behavior for column number mismatch and type
 conversion exception.

Hello,

I don't know if this covers the following issue.
I just mention it in case you didn't notice it and would like to
handle this rather cosmetic issue as well.

http://archives.postgresql.org/pgsql-bugs/2011-08/msg00113.php

best regards,

Marc Mamin




[HACKERS] pg_dump custom format specification

2012-01-21 Thread Guillermo M. Narvaja
Does someone have some kind of specification or description of the pg_dump
custom format?

What I'm trying to do is a utility to remotely synchronize PostgreSQL
dumps, using lib_rsync to synchronize each table independently and
copying blobs only when sizes differ.

I've done the same using the tar format, but I think most of the
features and optimizations, like parallel restore, are now only available
with the custom format.

I will do the script in Python, so I need to know how to parse and
write pg_dump files in the custom format.

Thanks in advance.

-- 
Guillermo M. Narvaja
Lambda Sistemas S.R.L.
www.fierro-soft.com.ar
Tel: (5411) 4139-0493/4
Cel: (5411) 15-6783-4435
Email: guillermo.narv...@fierro-soft.com.ar
MSN: guillermo_narv...@hotmail.com
Skype: guillermonarvaja
Lavalleja 519 1er Piso - Ciudad de Buenos Aires - Argentina




Re: [HACKERS] pg_dump custom format specification

2012-01-21 Thread Andrew Dunstan



On 01/21/2012 07:44 AM, Guillermo M. Narvaja wrote:

Does someone have some kind of specification or description of the pg_dump
custom format?

What I'm trying to do is a utility to remotely synchronize PostgreSQL
dumps, using lib_rsync to synchronize each table independently and
copying blobs only when sizes differ.

I've done the same using the tar format, but I think most of the
features and optimizations, like parallel restore, are now only available
with the custom format.

I will do the script in Python, so I need to know how to parse and
write pg_dump files in the custom format.


You might find this useful: https://gist.github.com/1258232 It's a 
perl script that parses and prints out the header and TOC of a custom 
format dump.
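If it helps with writing your own parser, here is a minimal sketch (in C,
the language the archiver itself is written in) of reading the fixed part of
the header. The layout assumed here, the "PGDMP" magic followed by version,
integer-size, offset-size and format bytes, is what WriteHead() in
src/bin/pg_dump/pg_backup_archiver.c emits, but do verify the details against
the source before relying on them:

#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
	FILE	   *fp;
	unsigned char hdr[11];

	if (argc != 2)
	{
		fprintf(stderr, "usage: %s dumpfile\n", argv[0]);
		return 1;
	}
	fp = fopen(argv[1], "rb");
	if (fp == NULL || fread(hdr, 1, sizeof(hdr), fp) != sizeof(hdr))
	{
		fprintf(stderr, "could not read header\n");
		return 1;
	}
	if (memcmp(hdr, "PGDMP", 5) != 0)
	{
		fprintf(stderr, "not a custom-format dump (no PGDMP magic)\n");
		return 1;
	}
	printf("archive version: %d.%d.%d\n", hdr[5], hdr[6], hdr[7]);
	printf("integer size: %d, offset size: %d, format code: %d\n",
		   hdr[8], hdr[9], hdr[10]);
	fclose(fp);
	return 0;
}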


cheers

andrew





Re: [HACKERS] CLOG contention, part 2

2012-01-21 Thread Robert Haas
On Fri, Jan 20, 2012 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote:
 D'oh.  You're right.  Looks like I accidentally tried to apply this to
 the 9.1 sources.  Sigh...

 No worries. It's Friday.

Server passed 'make check' with this patch, but when I tried to fire
it up for some test runs, it fell over with:

FATAL:  no more LWLockIds available

I assume that it must be dependent on the config settings used.  Here are mine:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_writer_delay = 20ms

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-21 Thread Simon Riggs
On Sat, Jan 21, 2012 at 1:53 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Can I just check with you that the only review comment is a one line
 change? Seems better to make any additional review comments in one go.

 No, I haven't done a full review yet - I just noticed that right at
 the outset and wanted to be sure that got addressed.  I can look it
 over more carefully, and test it.

 Corrected your point, and another found during retest.

 Works as advertised, docs corrected.

 This comment needs updating:

    /*
     * To drop an index safely, we must grab exclusive lock on its parent
     * table.  Exclusive lock on the index alone is insufficient because
     * another backend might be about to execute a query on the parent table.
     * If it relies on a previously cached list of index OIDs, then it could
     * attempt to access the just-dropped index.  We must therefore take a
     * table lock strong enough to prevent all queries on the table from
     * proceeding until we commit and send out a shared-cache-inval notice
     * that will make them update their index lists.
     */

OK, I did reword some but clearly not enough.

 Speaking of that comment, why does a concurrent index drop need any
 lock at all on the parent table?  If, as the current comment text
 says, the point of locking the parent table is to make sure that no
 new backends can create relcache entries until our transaction
 commits, then it's not needed here, or at least not for that purpose.
 We're going to out-wait anyone who has the index open *after* we've
 committed our changes to the pg_index entry, so there shouldn't be a
 race condition there.  There may very well be a reason why we need a
 lock on the parent, or there may not, but that's not it.

I feel happier locking the parent as well. I'd rather avoid strange
weirdness later as has happened before in this type of discussion.

 On the flip side, it strikes me that there's considerable danger that
 ShareUpdateExclusiveLock on the index might not be strong enough.  In
 particular, it would fall down if there's anyone who takes an
 AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
 to access the index data despite its being marked !indisready and
 !indisvalid.  There are certainly instances of this in contrib, such
 as in pgstatindex(), which I'm pretty sure will abort with a nastygram
 if somebody truncates away the file under it.  That might not be
 enough reason to reject the patch, but I do think it would be the only
 place in the system where we allow something like that to happen.  I'm
 not sure whether there are any similar risks in core - I am a bit
 worried about get_actual_variable_range() and gincostestimate(), but I
 can't tell for sure whether there is any actual problem there, or
 elsewhere.  I wonder if we should only do the *first* phase of the
 drop with ShareUpdateExclusiveLock, and then after outwaiting
 everyone, take an AccessExclusiveLock on the index (but not the table)
 to do the rest of the work.  If that doesn't allow the drop to proceed
 concurrently, then we go find the places that block against it and
 teach them not to lock/use/examine indexes that are !indisready.  That
 way, the worst case is that D-I-C is less concurrent than we'd like,
 rather than making random other things fall over - specifically,
 anything that counts on our traditional guarantee that AccessShareLock
 is enough to keep the file from disappearing.

Your caution is wise. All users of an index have already checked
whether the index is usable at plan time, so although there is much
code that assumes they can look at the index itself, that is not
executed until after the correct checks.

I'll look at VACUUM and other utilities, so thanks for that.

I don't see much point in having the higher-level lock, except perhaps
as a test that this code works.


 In the department of minor nitpicks, the syntax synopsis you've added
 looks wrong:

 DROP INDEX
       [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
 [, ...] [ CASCADE | RESTRICT ]
       CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

 That implies that you may write if exists, followed by 1 or more
 names, followed by cascade or restrict.  After that you must write
 CONCURRENTLY, followed by exactly one name.  I think what you want is:

 DROP INDEX [ IF EXISTS ] <replaceable
 class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
 DROP INDEX CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

 ...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
 CONCURRENTLY.  It could also be very useful to be able to concurrently
 drop multiple indexes at once, since that would require waiting out
 all concurrent transactions only once 

Re: [HACKERS] CLOG contention, part 2

2012-01-21 Thread Simon Riggs
On Sat, Jan 21, 2012 at 1:57 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 20, 2012 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote:
 D'oh.  You're right.  Looks like I accidentally tried to apply this to
 the 9.1 sources.  Sigh...

 No worries. It's Friday.

 Server passed 'make check' with this patch, but when I tried to fire
 it up for some test runs, it fell over with:

 FATAL:  no more LWLockIds available

 I assume that it must be dependent on the config settings used.  Here are 
 mine:

 shared_buffers = 8GB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 300
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 wal_writer_delay = 20ms

Yes, it was. Sorry about that. New version attached, retesting while
you read this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 69b6ef3..6ff6894 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -37,6 +37,7 @@
 #include "access/transam.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "utils/snapmgr.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -70,10 +71,17 @@
 
 /*
  * Link to shared-memory data structures for CLOG control
+ *
+ * As of 9.2, we have 2 structures for commit log data.
+ * ClogCtl manages the main read/write part of the commit log, while
+ * the ClogHistoryCtl manages the now read-only, older part. ClogHistory
+ * removes contention from the path of transaction commits.
  */
 static SlruCtlData ClogCtlData;
+static SlruCtlData ClogHistoryCtlData;
 
-#define ClogCtl (&ClogCtlData)
+#define ClogCtl 		(&ClogCtlData)
+#define ClogHistoryCtl	(&ClogHistoryCtlData)
 
 
 static int	ZeroCLOGPage(int pageno, bool writeXlog);
@@ -296,6 +304,10 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 
 		/* ... then the main transaction */
 		TransactionIdSetStatusBit(xid, status, lsn, slotno);
+
+		/* When we commit advance ClogCtl's shared RecentXminPageno if needed */
+		if (ClogCtl->shared->RecentXminPageno < TransactionIdToPage(RecentXmin))
+			ClogCtl->shared->RecentXminPageno = TransactionIdToPage(RecentXmin);
 	}
 
 	/* Set the subtransactions */
@@ -387,6 +399,8 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
 XidStatus
 TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 {
+	SlruCtl		clog = ClogCtl;
+	bool		useClogHistory = true;
 	int			pageno = TransactionIdToPage(xid);
 	int			byteno = TransactionIdToByte(xid);
 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
@@ -397,15 +411,35 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
 
 	/* lock is acquired by SimpleLruReadPage_ReadOnly */
 
-	slotno = SimpleLruReadPage_ReadOnly(ClogCtl, pageno, xid);
-	byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
+	/*
+	 * Decide whether to use main Clog or read-only ClogHistory.
+	 *
+	 * Our knowledge of the boundary between the two may be a little out
+	 * of date, so if we try Clog and can't find it we need to try again
+	 * against ClogHistory.
+	 */
+	if (pageno >= ClogCtl->recent_oldest_active_page_number)
+	{
+		slotno = SimpleLruReadPage_ReadOnly(clog, pageno, xid);
+		if (slotno >= 0)
+			useClogHistory = false;
+	}
+
+	if (useClogHistory)
+	{
+		clog = ClogHistoryCtl;
+		slotno = SimpleLruReadPage_ReadOnly(clog, pageno, xid);
+		Assert(slotno >= 0);
+	}
+
+	byteptr = clog->shared->page_buffer[slotno] + byteno;
 
 	status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
 
 	lsnindex = GetLSNIndex(slotno, xid);
-	*lsn = ClogCtl->shared->group_lsn[lsnindex];
+	*lsn = clog->shared->group_lsn[lsnindex];
 
-	LWLockRelease(CLogControlLock);
+	LWLockRelease(clog->shared->ControlLock);
 
 	return status;
 }
@@ -445,15 +479,19 @@ CLOGShmemBuffers(void)
 Size
 CLOGShmemSize(void)
 {
-	return SimpleLruShmemSize(CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE);
+	/* Reserve shmem for both ClogCtl and ClogHistoryCtl */
+	return SimpleLruShmemSize(2 * CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE);
 }
 
 void
 CLOGShmemInit(void)
 {
 	ClogCtl-PagePrecedes = CLOGPagePrecedes;
+	ClogHistoryCtl-PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(ClogCtl, "CLOG Ctl", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
   CLogControlLock, "pg_clog");
+	SimpleLruInit(ClogHistoryCtl, "CLOG History Ctl", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
+  CLogHistoryControlLock, "pg_clog");
 }
 
 /*
@@ -592,6 +630,16 @@ CheckPointCLOG(void)
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true);
 	SimpleLruFlush(ClogCtl, true);
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true);
+
+	/*
+	 * Now that we've written out all dirty buffers the only pages that
+	 * will get dirty again will be pages with active transactions on them.
+	 * So we can move forward the oldest_active_page_number and allow
+	 * read only operations via ClogHistoryCtl.
+	 */
+	LWLockAcquire(CLogControlLock, 

Re: [HACKERS] xlog location arithmetic

2012-01-21 Thread Euler Taveira de Oliveira
On 23-12-2011 12:05, Tom Lane wrote:
 I too think a datatype is overkill, if we're only planning on providing
 one function.  Just emit the values as numeric and have done.
 
Here it is. Output changed to numeric. While the output was int8 I could use
pg_size_pretty, but now I can't. I attached another patch that implements
pg_size_pretty(numeric).


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 48631cc..04bc24d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14378,6 +14378,9 @@ SELECT set_config('log_statement_stats', 'off', false);
   <indexterm>
    <primary>pg_xlogfile_name_offset</primary>
   </indexterm>
+   <indexterm>
+    <primary>pg_xlog_location_diff</primary>
+   </indexterm>
 
   <para>
    The functions shown in <xref
@@ -14450,6 +14453,13 @@ SELECT set_config('log_statement_stats', 'off', false);
        <entry><type>text</>, <type>integer</></entry>
        <entry>Convert transaction log location string to file name and decimal byte offset within file</entry>
       </row>
+      <row>
+       <entry>
+        <literal><function>pg_xlog_location_diff(<parameter>location</> <type>text</>, <parameter>location</> <type>text</>)</function></literal>
+       </entry>
+       <entry><type>numeric</></entry>
+       <entry>Calculate the difference between two transaction log locations</entry>
+      </row>
     </tbody>
    </tgroup>
   </table>
@@ -14543,6 +14553,13 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
   </para>
 
   <para>
+	<function>pg_xlog_location_diff</> calculates the difference in bytes
+	between two transaction log locations. It can be used with
+	<structname>pg_stat_replication</structname> or some functions shown in
+	<xref linkend="functions-admin-backup-table"> to get the replication lag.
+   </para>
+
+   <para>
     For details about proper usage of these functions, see
     <xref linkend="continuous-archiving">.
    </para>
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 2e10d4d..e03c5e8 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -26,6 +26,7 @@
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
+#include "utils/numeric.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -465,3 +466,84 @@ pg_is_in_recovery(PG_FUNCTION_ARGS)
 {
 	PG_RETURN_BOOL(RecoveryInProgress());
 }
+
+static void
+validate_xlog_location(char *str)
+{
+#define	MAXLSNCOMPONENT		8
+
+	int	len1, len2;
+
+	len1 = strspn(str, "0123456789abcdefABCDEF");
+	if (len1 < 1 || len1 > MAXLSNCOMPONENT || str[len1] != '/')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for transaction log location: \"%s\"", str)));
+	len2 = strspn(str + len1 + 1, "0123456789abcdefABCDEF");
+	if (len2 < 1 || len2 > MAXLSNCOMPONENT || str[len1 + 1 + len2] != '\0')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid input syntax for transaction log location: \"%s\"", str)));
+}
+
+/*
+ * Compute the difference in bytes between two WAL locations.
+ */
+Datum
+pg_xlog_location_diff(PG_FUNCTION_ARGS)
+{
+	text	*location1 = PG_GETARG_TEXT_P(0);
+	text	*location2 = PG_GETARG_TEXT_P(1);
+	char	*str1, *str2;
+	uint64	xlogid1, xrecoff1;
+	uint64	xlogid2, xrecoff2;
+	Numeric	result;
+
+	/*
+	 * Read and parse input
+	 */
+	str1 = text_to_cstring(location1);
+	str2 = text_to_cstring(location2);
+
+	validate_xlog_location(str1);
+	validate_xlog_location(str2);
+
+	if (sscanf(str1, "%8lX/%8lX", &xlogid1, &xrecoff1) != 2)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("could not parse transaction log location \"%s\"", str1)));
+	if (sscanf(str2, "%8lX/%8lX", &xlogid2, &xrecoff2) != 2)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("could not parse transaction log location \"%s\"", str2)));
+
+	/*
+	 * Sanity check
+	 */
+	if (xrecoff1 > XLogFileSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("xrecoff \"%lX\" is out of valid range, 0..%X", xrecoff1, XLogFileSize)));
+	if (xrecoff2 > XLogFileSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("xrecoff \"%lX\" is out of valid range, 0..%X", xrecoff2, XLogFileSize)));
+
+	/*
+	 * result = XLogFileSize * (xlogid1 - xlogid2) + xrecoff1 - xrecoff2
+	 */
+	result = DatumGetNumeric(DirectFunctionCall2(numeric_sub,
+			DirectFunctionCall1(int8_numeric, Int64GetDatum(xlogid1)),
+			DirectFunctionCall1(int8_numeric, Int64GetDatum(xlogid2))));
+	result = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
+			DirectFunctionCall1(int8_numeric, Int64GetDatum(XLogFileSize)),
+			NumericGetDatum(result)));
+	result = DatumGetNumeric(DirectFunctionCall2(numeric_add,
+			NumericGetDatum(result),
+			DirectFunctionCall1(int8_numeric, Int64GetDatum(xrecoff1;

Re: [v9.2] LEAKPROOF attribute of FUNCTION (Re: [HACKERS] [v9.2] Fix Leaky View Problem)

2012-01-21 Thread Robert Haas
On Sat, Jan 21, 2012 at 3:59 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I marked the default leakproof functions according to the criterion that
 they do not leak the contents of their arguments.
 Indeed, timestamp_ne_timestamptz() has a code path that raises
 a "timestamp out of range" error. Is it a good idea to
 avoid marking these functions leakproof as well?

I think that anything which looks at the data and uses that as a basis
for whether or not to throw an error is non-leakproof.  Even if it
doesn't directly leak an arbitrary value, I think that leaking even
some information about what the value is is no good.  Otherwise, you
might imagine that we would allow /(int, int), because it only leaks
in the second_arg = 0 case.  And you might imagine we'd allow -(int,
int) because it only leaks in the case where an overflow occurs.  But
of course the combination of the two allows writing something of the
form 1/(a-constant) and getting it pushed down, and now you have the
ability to probe for an arbitrary value.  So I think it's just no good
to allow any leaking at all: otherwise it'll be unclear how safe it
really is, especially when combinations of different functions or
operators are involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-21 Thread Tomas Vondra
On 20 Leden 2012, 13:23, Magnus Hagander wrote:
 On Tue, Jan 17, 2012 at 21:39, Tomas Vondra t...@fuzzy.cz wrote:
 On 20.12.2011 19:59, Tomas Vondra wrote:
 On 20.12.2011 11:20, Magnus Hagander wrote:
 2011/12/20 Tomas Vondra t...@fuzzy.cz:

 I haven't updated the docs yet - let's see if the patch is acceptable
 at
 all first.

 Again, without having reviewed the code, this looks like a feature
 we'd want, so please add some docs, and then submit it for the next
 commitfest!

 I've added the docs (see the attachment) and rebased to current head.

 Tomas

 Fixed a failing regression test (check of pg_stat_database structure).

 I'm wondering if this (and also my deadlocks stats patch that's in the
 queue) should instead of inventing new pgstats messages, add fields to
 the tabstat message. It sounds like that one is just for tables, but
 it's already the one collecting info about commits and rollbacks, and
 it's already sent on every commit.

Hmmm, I'm not against that, but I'd recommend changing the message name to
something that reflects the reality. If it's not just about table
statistics, it should not be named 'tabstats' IMHO. Or maybe split that
into two messages, both sent at the commit time.

I do like the idea of not sending the message for each temp file, though.
One thing that worries me is long-running transactions (think about a
batch process that runs for several hours within a single transaction). By
sending the data only at commit, such transactions would not be
accounted properly. So I'd suggest sending the message either at commit
time or after collecting enough data (increment a counter whenever the
struct is updated and send a message when the counter >= N for a
reasonable value of N, say 20). But maybe it already works that way - I
haven't checked the current 'tabstat' implementation.
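For illustration, a rough sketch of that scheme; none of these names exist in
pgstat.c, they only show the 'flush at commit or every N updates' idea:

#include <stdio.h>

#define TEMPFILE_FLUSH_EVERY 20		/* the N above */

typedef struct TempFileCounters
{
	long		files;			/* temp files created since last flush */
	long		bytes;			/* temp-file bytes written since last flush */
	int			pending;		/* updates accumulated since last flush */
} TempFileCounters;

static TempFileCounters tempfile_counters;

/* stand-in for actually sending the stats message to the collector */
static void
flush_tempfile_stats(void)
{
	printf("sending tempfile stats: %ld files, %ld bytes\n",
		   tempfile_counters.files, tempfile_counters.bytes);
	tempfile_counters.files = 0;
	tempfile_counters.bytes = 0;
	tempfile_counters.pending = 0;
}

/* would be called wherever a temp file is created */
static void
count_temp_file(long bytes)
{
	tempfile_counters.files += 1;
	tempfile_counters.bytes += bytes;

	/* flush early so a long transaction shows up in the stats as it runs */
	if (++tempfile_counters.pending >= TEMPFILE_FLUSH_EVERY)
		flush_tempfile_stats();
}

int
main(void)
{
	int			i;

	/* simulate a multi-hour transaction creating many temp files */
	for (i = 0; i < 45; i++)
		count_temp_file(8192);
	flush_tempfile_stats();		/* the usual flush at transaction end */
	return 0;
}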

 Adding two fields to that one would add some extra bytes on every
 send, but I wonder if that would ever affect performance, given the
 total size of the packet? And it would certainly be lower overhead in
 the cases that there *have* been temp tables used.

It's not about temp tables, it's about temp files. Which IMHO implies that
there would be exactly 0.01% performance difference because temporary
files are quite expensive.

Tomas




Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-21 Thread Magnus Hagander
On Sat, Jan 21, 2012 at 18:02, Tomas Vondra t...@fuzzy.cz wrote:
 On 20 Leden 2012, 13:23, Magnus Hagander wrote:
 On Tue, Jan 17, 2012 at 21:39, Tomas Vondra t...@fuzzy.cz wrote:
 On 20.12.2011 19:59, Tomas Vondra wrote:
 On 20.12.2011 11:20, Magnus Hagander wrote:
 2011/12/20 Tomas Vondra t...@fuzzy.cz:

 I haven't updated the docs yet - let's see if the patch is acceptable
 at
 all first.

 Again, without having reviewed the code, this looks like a feature
 we'd want, so please add some docs, and then submit it for the next
 commitfest!

 I've added the docs (see the attachment) and rebased to current head.

 Tomas

 Fixed a failing regression test (check of pg_stat_database structure).

 I'm wondering if this (and also my deadlocks stats patch that's in the
 queue) should instead of inventing new pgstats messages, add fields to
 the tabstat message. It sounds like that one is just for tables, but
 it's already the one collecting info about commits and rollbacks, and
 it's already sent on every commit.

 Hmmm, I'm not against that, but I'd recommend changing the message name to
 something that reflects the reality. If it's not just about table
 statistics, it should not be named 'tabstats' IMHO. Or maybe split that
 into two messages, both sent at the commit time.

Yes, renaming it might be a good idea. If you split it into two
messages that would defeat much of the point.

Though I just looked at the tabstat code again, and we already split
that message up at regular intervals. Which would make it quite weird
to have global counters in it as well.

But instead of putting it there, perhaps we need a general "non-table, but
more than one type of data" message sent out at the same time. There is
other stuff in the queue for it.

I'm not convinced either way - I'm not against the original way in
your patch either. I just wanted to throw the idea out there, and was
hoping somebody else would have an opinion too :-)


 I do like the idea of not sending the message for each temp file, though.
 One thing that worries me are long running transactions (think about a
 batch process that runs for several hours within a single transaction). By
 sending the data only at the commit, such transactions would not be
 accounted properly. So I'd suggest sending the message either at commit

By that argument they are *already* not accounted properly, because
the number of rows and those counters are wrong. By sending the temp
file data in the middle of the transaction, you won't be able to
correlate those numbers with the temp file usage.

I'm not saying the other usecase isn't more common, but whichever way
you do it, it's going to get inconsistent with *something*.


 time or after collecting enough data (increment a counter whenever the
 struct is updated and send a message when the counter >= N for a
 reasonable value of N, say 20). But maybe it already works that way - I
 haven't checked the current 'tabstat' implementation.

No, tabstat is sent at transaction end only.


 Adding two fields to that one would add some extra bytes on every
 send, but I wonder if that would ever affect performance, given the
 total size of the packet? And it would certainly be lower overhead in
 the cases that there *have* been temp tables used.

 It's not about temp tables, it's about temp files. Which IMHO implies that
 there would be exactly 0.01% performance difference because temporary
 files are quite expensive.

Bad choice of words, I meant temp files, and was thinking temp files
the whole way :-)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Finer Extension dependencies

2012-01-21 Thread Dimitri Fontaine
Hi,

Thank you for reviewing this patch!

Hitoshi Harada umi.tan...@gmail.com writes:
 The patch applies with one reject, which I could fix easily. The make
 check passed.

Bitrot happens fast in this season… I will produce another version of the
patch.

      Table "pg_catalog.pg_extension_feature"
    Column   | Type | Modifiers
  ------------+------+-----------
  extoid     | oid  | not null
  extfeature | name | not null
 Indexes:
     "pg_extension_feature_index" UNIQUE, btree (extoid, extfeature)
     "pg_extension_feature_oid_index" UNIQUE, btree (oid)

 * I'm not quite sure why pg_extension_feature_index needs the extoid column.

That allows managing features per extension: you need to know which
extension is providing which feature to be able to solve dependencies.

 * I have a big question about adding a two-column catalog. I don't mind the
 actual number of columns, but if the table has only two columns, it
 implies the design may be bad. The only two-column catalog other than this
 is pg_largeobject_metadata.

We need each feature to be a full PostgreSQL object so that we can use
the dependency tracking.  That allows us to manage DROP EXTENSION foo and
cascade to extensions that depend on feature(s) provided by foo.

It also allows us to track, at update time when the feature set changes,
whether we're removing a feature that some other installed extension is
depending on, letting our users know that they should either drop
that other extension or update it too.

 Next, some questions:
 - Why is the finer dependency needed? Do you have a tangible example
 that struggles with the dependency granularity? As a developer of
 several extensions, I feel fine about the existing dependency on the
 extension as a whole.

The problem is not yet very apparent only because extensions are very
new. The main thing we address with this patch is depending on a feature
that appeared while developing an extension or that gets removed down
the line. It allows depending on features and avoids the need to compare
version numbers and maintain a list of which version number provides
which feature.

This feature has been asked for by several extension users, beginning even
before 9.1 got released.

 - What happens if DROP EXTENSION ... CASCADE? Does it work?

It should, what happens when you try? :)

 - How does pg_upgrade interact with this? The dependency changes.

We're compatible because the extension name itself is always considered
as a feature and added to the catalogs, so that e.g. require = 'cube'
will now target a feature rather than an extension, and this feature
defaults to being provided by the 'create extension cube' statement that
pg_upgrade issues given 9.1 dependency tracking.

 A minor memo.
 list_extension_features(): I guess the size argument for repalloc is
 bogus.

Oh, a quick review of repalloc call points seems to indicate you're
right, I'll fix that in the next version of the patch.

 So, that's pretty much what I've reviewed quickly. I'll need more time to
 look at it in detail, but I'd like more input on the high-level design and
 direction.

I hope I am answering your questions here; please ask for more
details as you need them!

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Finer Extension dependencies

2012-01-21 Thread Tomas Vondra
On 21 Leden 2012, 18:20, Dimitri Fontaine wrote:
 Hi,

 Thank you for reviewing this patch!

 Hitoshi Harada umi.tan...@gmail.com writes:

 Next, some questions:
 - Why is the finer dependency needed? Do you have tangible example
 that struggles with the dependency granularity? I feel so good about
 the existing dependency on extension as an extension developer of
 several ones.

 The problem is not yet very apparent only because extensions are very
 new. The main thing we address with this patch is depending on a feature
 that appeared while developing an extension or that gets removed down
 the line. It allows to depend on features and avoid needing to compare
 version numbers and maintain a list of which version number is providing
 which feature.

 This feature has been asked by several extension users, beginning even
 before 9.1 got released.

It's also about several extensions providing the same sort of functionality
implemented in different ways. Think about extensions that allow you to
send e-mails right from the database. One could do that directly, another
one could implement queuing, another one could be optimized for a
specific SMTP server. With features, you could just say 'require = mail'
and use whichever extension is installed. Yes, this needs a common API.

I personally see this as a major step towards fully-fledged package
management, similar to those available in modern Linux distributions (apt,
yum, portage, ...).

Tomas




Re: [HACKERS] review: psql tab completion for GRANT role

2012-01-21 Thread Peter Eisentraut
On tis, 2012-01-17 at 13:03 +0100, Pavel Stehule wrote:
 Hello
 
 I did a review of this small patch
 http://archives.postgresql.org/message-id/1326055692.15293.15.ca...@vanquo.pezone.net
 
 * The patch applied, with fuzz in one hunk, and compiled without warnings
 
 bash-4.2$ patch -p1 < autocompleta.patch
 patching file src/bin/psql/tab-complete.c
 Hunk #2 succeeded at 2335 with fuzz 1.
 
 * all regression tests passed
 
 * the patch respects Pg coding standards
 * the code is simple and has no impact on core functionality or core performance
 * regression tests are not required for an interactive feature
 * I checked it, and the autocompletion behaves as expected
 
 It is ready for commit

Committed, thanks.




Re: [HACKERS] lots of unused variable warnings in assert-free builds

2012-01-21 Thread Peter Eisentraut
On ons, 2012-01-18 at 21:16 +0200, Peter Eisentraut wrote:
 On sön, 2012-01-15 at 08:14 -0500, Andrew Dunstan wrote:
  It would possibly have some documentary value too. Just looking very 
  quickly at Peter's patch, I don't really understand his assertion that
  this would significantly butcher the code. The worst effect would be 
  that in a few cases we'd have to break up multiple declarations where 
  one of the variables was in this class. That doesn't seem like a
  tragedy.
 
 Well, I'll prepare a patch like that and then you can judge.

So, here are three patches that could solve this issue.

pg-cassert-unused-attribute.patch, the one I already showed, using
__attribute__((unused)).

pg-cassert-unused-ifdef.patch, using only additional #ifdef
USE_ASSERT_CHECKING.  This does have additional documentation value, but
you can see that it gets bulky and complicated.  (I introduced several
bugs merely while creating this patch.)

pg-cassert-unused-void.patch is an alternative approach that avoids the
warnings by casting the arguments of Assert() to void if assertions are
disabled.  This has the least code impact, but it changes the
traditional semantics of asserts, which is that they disappear
completely when not enabled.  You can see how this creates a problem in
src/backend/replication/syncrep.c, where the nontrivial call to
SyncRepQueueIsOrderedByLSN() now becomes visible even in non-assert
builds.  I played around with some other options like
__attribute__((pure)) to make the compiler optimize the function away in
that case, but that didn't appear to work.  So this might not be a
workable solution (and it would be GCC-specific anyway).
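To illustrate the third approach, here is a small self-contained example (the
macro shape is only a sketch of the idea, not the actual patch text); building
it without USE_ASSERT_CHECKING still prints the message, which is exactly the
problem with nontrivial assertion arguments such as SyncRepQueueIsOrderedByLSN():

#include <stdio.h>
#include <stdbool.h>

#ifdef USE_ASSERT_CHECKING
#include <assert.h>
#define Assert(condition)	assert(condition)
#else
/* silences "unused variable" warnings, but the expression is still compiled */
#define Assert(condition)	((void) (condition))
#endif

static bool
queue_is_ordered(void)
{
	/* stand-in for a nontrivial check like SyncRepQueueIsOrderedByLSN() */
	printf("expensive check ran\n");
	return true;
}

int
main(void)
{
	int			ptrno;			/* used only by the assertion below */

	ptrno = 1;
	Assert(ptrno == 1 && queue_is_ordered());
	return 0;
}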

diff --git i/src/backend/access/hash/hashovfl.c w/src/backend/access/hash/hashovfl.c
index 130c296..d4329fb 100644
--- i/src/backend/access/hash/hashovfl.c
+++ w/src/backend/access/hash/hashovfl.c
@@ -391,7 +391,9 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
 	uint32		ovflbitno;
 	int32		bitmappage,
 bitmapbit;
+#ifdef USE_ASSERT_CHECKING
 	Bucket		bucket;
+#endif
 
 	/* Get information from the doomed page */
 	_hash_checkpage(rel, ovflbuf, LH_OVERFLOW_PAGE);
@@ -400,7 +402,9 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
 	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
 	nextblkno = ovflopaque->hasho_nextblkno;
 	prevblkno = ovflopaque->hasho_prevblkno;
+#ifdef USE_ASSERT_CHECKING
+	bucket = ovflopaque->hasho_bucket;
+#endif
 
 	/*
 	 * Zero the page for debugging's sake; then write and release it. (Note:
diff --git i/src/backend/executor/execCurrent.c w/src/backend/executor/execCurrent.c
index b07161f..2a59fc6 100644
--- i/src/backend/executor/execCurrent.c
+++ w/src/backend/executor/execCurrent.c
@@ -151,7 +151,7 @@ execCurrentOf(CurrentOfExpr *cexpr,
 	{
 		ScanState  *scanstate;
 		bool		lisnull;
-		Oid			tuple_tableoid;
+		Oid			tuple_tableoid __attribute__((unused));
 		ItemPointer tuple_tid;
 
 		/*
diff --git i/src/backend/executor/nodeMaterial.c w/src/backend/executor/nodeMaterial.c
index b320b54..4ab660e 100644
--- i/src/backend/executor/nodeMaterial.c
+++ w/src/backend/executor/nodeMaterial.c
@@ -66,7 +66,7 @@ ExecMaterial(MaterialState *node)
 			 * Allocate a second read pointer to serve as the mark. We know it
 			 * must have index 1, so needn't store that.
 			 */
-			int			ptrno;
+			int			ptrno __attribute__((unused));
 
 			ptrno = tuplestore_alloc_read_pointer(tuplestorestate,
   node-eflags);
diff --git i/src/backend/executor/nodeSetOp.c w/src/backend/executor/nodeSetOp.c
index 7fa5730..c6a88a8 100644
--- i/src/backend/executor/nodeSetOp.c
+++ w/src/backend/executor/nodeSetOp.c
@@ -344,7 +344,7 @@ setop_fill_hash_table(SetOpState *setopstate)
 	SetOp	   *node = (SetOp *) setopstate-ps.plan;
 	PlanState  *outerPlan;
 	int			firstFlag;
-	bool		in_first_rel;
+	bool		in_first_rel __attribute__((unused));
 
 	/*
 	 * get state info from node
diff --git i/src/backend/executor/nodeWorktablescan.c w/src/backend/executor/nodeWorktablescan.c
index e2f3dd4..e72d1cb 100644
--- i/src/backend/executor/nodeWorktablescan.c
+++ w/src/backend/executor/nodeWorktablescan.c
@@ -30,7 +30,7 @@ static TupleTableSlot *
 WorkTableScanNext(WorkTableScanState *node)
 {
 	TupleTableSlot *slot;
-	EState	   *estate;
+	EState	   *estate __attribute__((unused));
 	Tuplestorestate *tuplestorestate;
 
 	/*
diff --git i/src/backend/libpq/be-fsstubs.c w/src/backend/libpq/be-fsstubs.c
index b864c86..1bf765c 100644
--- i/src/backend/libpq/be-fsstubs.c
+++ w/src/backend/libpq/be-fsstubs.c
@@ -378,7 +378,7 @@ lo_import_internal(text *filename, Oid lobjOid)
 {
 	File		fd;
 	int			nbytes,
-tmp;
+tmp __attribute__((unused));
 	char		buf[BUFSIZE];
 	char		fnamebuf[MAXPGPATH];
 	LargeObjectDesc *lobj;
diff --git i/src/backend/libpq/pqcomm.c w/src/backend/libpq/pqcomm.c
index 35812f4..df41d60 100644
--- i/src/backend/libpq/pqcomm.c
+++ w/src/backend/libpq/pqcomm.c
@@ -1373,7 +1373,7 @@ fail:
 void
 pq_putmessage_noblock(char msgtype, const char 

Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-21 Thread Tomas Vondra
On 21 Leden 2012, 18:13, Magnus Hagander wrote:
 On Sat, Jan 21, 2012 at 18:02, Tomas Vondra t...@fuzzy.cz wrote:
 On 20 Leden 2012, 13:23, Magnus Hagander wrote:
 I'm wondering if this (and also my deadlocks stats patch that's in the
 queue) should instead of inventing new pgstats messages, add fields to
 the tabstat message. It sounds like that one is just for tables, but
 it's already the one collecting info about commits and rollbacks, and
 it's already sent on every commit.

 Hmmm, I'm not against that, but I'd recommend changing the message name
 to
 something that reflects the reality. If it's not just about table
 statistics, it should not be named 'tabstats' IMHO. Or maybe split that
 into two messages, both sent at the commit time.

 Yes, renaming it might be a good idea. If you split it into two
 messages that would defeat much of the point.

Not really, if combined with the 'send after each N updates or at
transaction end' approach, because then those parts might be sent
independently (thus not transmitting the whole message).

 Though I just looked at the tabstat code again, and we already split
 that message up at regular intervals. Which would make it quite weird
 to have global counters in it as well.

 But instead of putting it there, perhaps we need a general "non-table, but
 more than one type of data" message sent out at the same time. There is
 other stuff in the queue for it.

 I'm not convinced either way - I'm not against the original way in
 your patch either. I just wanted to throw the idea out there, and was
 hoping somebody else would have an opinion too :-)

Let's make that a separate patch. It seems like a good thing to do, but
I don't think it should be mixed with this patch.

 I do like the idea of not sending the message for each temp file,
 though.
 One thing that worries me are long running transactions (think about a
 batch process that runs for several hours within a single transaction).
 By
 sending the data only at the commit, such transactions would not be
 accounted properly. So I'd suggest sending the message either at commit

 By that argument they are *already* not accounted properly, because
 the number of rows and those counters are wrong. By sending the temp
 file data in the middle of the transaction, you won't be able to
 correlate those numbers with the temp file usage.

 I'm not saying the other usecase isn't more common, but whichever way
 you do it, it's going to get inconsistent with *something*.

The question is whether the patch should be modified to send the message
only at commit (i.e. just like tabstats) or if it can remain the way it is
now. And maybe we could change the way all those messages are sent (e.g.
to the regular submits I've proposed in the previous post) to make them
more consistent.

I'm not asking for 100% consistency. What I don't like is a batch process
consuming resources but not being reflected in the stats until the final
commit, followed by a huge spike in the stats right after that, although the
activity was actually spread over several hours.

Tomas




Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-01-21 Thread Noah Misch
On Sat, Jan 14, 2012 at 08:18:48PM +0100, Marco Nenciarini wrote:
 This is our latest version of the patch. Gabriele, Gianni and I have
 discussed a lot and decided to send an initial patch which uses EACH
 REFERENCES instead of ARRAY REFERENCES. The reason behind this is that
 ARRAY REFERENCES generates an ambiguous grammar, and we all agreed that
 EACH REFERENCES makes sense (and the same time does not introduce any
 new keyword). This is however open for discussion.

I greatly like that name; it would still make sense for other aggregate types,
should we ever expand its use.  Please complete the name change: the
documentation, catalog entries, etc. should all call them something like
"each foreign key" constraints (I don't particularly like that exact wording).

You currently forbid multi-column EACH FKs.  I agree that we should allow only
one array column per FK; with more, the set of required PK rows would be
something like the Cartesian product of the elements of array columns.
However, there are no definitional problems, at least for NO ACTION, around a
FK constraint having one array column and N scalar columns.  Whether or not
you implement that now, let's choose a table_constraint syntax leaving that
opportunity open.  How about:
FOREIGN KEY(col_a, EACH col_b, col_c) REFERENCES pktable (a, b, c)

You've identified that we cannot generally implement the ON DELETE ARRAY
CASCADE action on multidimensional arrays.  This patch chooses to downgrade to
ON DELETE ARRAY SET NULL in that case.  My initial reaction is that it would
be better to forbid multidimensional arrays in the column when the delete
action is ON DELETE ARRAY SET NULL.  That's not satisfying, either, because it
makes the definition of conforming data depend on the ON DELETE action.  Do we
have other options?

 ----------------+---------+---------
                 |   ON    |   ON    |
  Action         | DELETE  | UPDATE  |
 ----------------+---------+---------
  CASCADE        |   Row   |  Error  |
  SET NULL       |   Row   |   Row   |
  SET DEFAULT    |   Row   |   Row   |
  ARRAY CASCADE  | Element | Element |
  ARRAY SET NULL | Element | Element |
  NO ACTION      |    -    |    -    |
  RESTRICT       |    -    |    -    |
 ----------------+---------+---------

To complete the ARRAY -> EACH transition, I would suggest names like CASCADE
EACH/SET EACH NULL.

I like the extensive test cases you have included.  There's one more thing
they should do: leave tables having EACH REFERENCES relationships in the
regression database.  This way, pg_dump tests of the regression database will
exercise pg_dump with respect to this feature.

The patch emits several warnings:

heap.c: In function `StoreRelCheck':
heap.c:1947: warning: passing argument 17 of `CreateConstraintEntry' makes 
integer from pointer without a cast
index.c: In function `index_constraint_create':
index.c:1160: warning: passing argument 17 of `CreateConstraintEntry' makes 
integer from pointer without a cast
In file included from gram.y:13051:
scan.c: In function `yy_try_NUL_trans':
scan.c:16243: warning: unused variable `yyg'
trigger.c: In function `CreateTrigger':
trigger.c:454: warning: passing argument 17 of `CreateConstraintEntry' makes 
integer from pointer without a cast
typecmds.c: In function `domainAddConstraint':
typecmds.c:2960: warning: passing argument 17 of `CreateConstraintEntry' makes 
integer from pointer without a cast
arrayfuncs.c: In function `array_remove':
arrayfuncs.c:5197: warning: unused variable `dimresult'
ri_triggers.c: In function `RI_FKey_check':
ri_triggers.c:484: warning: too many arguments for format

This test case, copied from my previous review except for updating the syntax,
still fails:

BEGIN;
CREATE TABLE parent (c int PRIMARY KEY);
CREATE TABLE child  (c int[]);
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES ('{3,1,2}');
ALTER TABLE child ADD FOREIGN KEY (c) EACH REFERENCES parent; -- should error
INSERT INTO child VALUES ('{3,1,2}'); -- does error, as expected
ROLLBACK;

Most of my code comments concern minor matters:

 *** a/doc/src/sgml/ddl.sgml
 --- b/doc/src/sgml/ddl.sgml

 *** CREATE TABLE order_items (
 *** 852,857 
 --- 882,931 
  </para>
   
  <para>
 + When working with foreign key arrays, you have two more
 + options that can be used with your
 + <literal>EACH REFERENCES</literal> definition:
 + <literal>ARRAY CASCADE</literal> and
 + <literal>ARRAY SET NULL</literal>. Depending on
 + the triggering action (<command>DELETE</command> or
 + <command>UPDATE</command>) on the referenced table,
 + every element in the referencing array will be either
 + deleted/updated or set to NULL.
 + For more detailed information on foreign key arrays
 + options and special cases, please refer to the
 + documentation for <xref linkend="sql-createtable">.
 +</para>
 + 
 + <para>
 + For instance, in the example below, a <command>DELETE</command>
 + from the <literal>countries</literal> table 

Re: [HACKERS] Review of patch renaming constraints

2012-01-21 Thread Peter Eisentraut
On fre, 2012-01-20 at 11:32 +0530, Nikhil Sontakke wrote:
 Agreed. And right now primary key constraints are not marked as "only",
 making them available for inheritance in the future. Or do you prefer it
 otherwise?
 
 Anyway, I fail to see the direct connection between this and renaming.
 I might have to look at this patch for that.

It checks conisonly to determine whether it needs to rename the
constraint in child tables as well.  Since a primary key has conisonly =
false, it goes to the child tables, but the constraint is not there.

In the past, we have treated this merely as an implementation artifact:
check constraints are inherited, primary key constraints are not.  Now
we can choose for check constraints, with inherited being the default.
Having inheritable primary key constraints is a possible future feature.
So we need to think a little harder now about how to work that into the
existing logic.  This also ties in with the other thread about having
this in CREATE TABLE syntax.
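
For illustration, the two behaviours under discussion look roughly like
this (syntax as I understand it from the rename-constraint patch; the
table and constraint names are made up):

CREATE TABLE parent (c int PRIMARY KEY,
                     CONSTRAINT parent_c_check CHECK (c > 0));
CREATE TABLE child () INHERITS (parent);
-- The check constraint is inherited by child (conisonly = false), so a
-- rename should recurse to the child table.  The primary key constraint
-- also has conisonly = false but is never inherited, so there is nothing
-- in the child to rename, which is the case the rename code trips over.
ALTER TABLE parent RENAME CONSTRAINT parent_c_check TO parent_c_chk;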


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-21 Thread Jeff Janes
On Sun, Jan 15, 2012 at 10:28 PM, Greg Smith g...@2ndquadrant.com wrote:

 I would like to be able to tell people they need never turn on
 log_checkpoints if they monitor pg_stat_bgwriter instead, because that sets
 a good precedent for what direction the database is going.  It would be nice
 for pg_stat_bgwriter's format to settle down for a long period too.

While on the topic of the format of pg_stat_bgwriter and its convergence
toward stability:

I'm finding the backend_writes column pretty unfortunate.  The only
use I know of for it is to determine whether the bgwriter is lagging
behind.  Yet it doesn't even serve that purpose, because it lumps
together the backend writes caused by a lagging background writer with
the backend writes that happen by design due to the use of a buffer
access strategy during bulk inserts.

If I'm running a tightly controlled benchmark on an otherwise unused
server and I know that no BAS is being used, then I can meaningfully
use backend_writes.  That is a pretty severe limitation.

I think we should either create a separate column to segregate the BAS
backend writes, or not report them at all and report only the
non-BAS ones in pg_stat_bgwriter.
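
For context, what the view exposes today lumps everything into one
counter; the columns below are from the current view, and the name of any
new column in a split would of course be up for discussion:

-- buffers_backend is the "backend_writes" figure discussed above; it
-- mixes strategy (BAS) writes with writes forced by a lagging bgwriter.
SELECT buffers_checkpoint, buffers_clean, buffers_backend,
       buffers_backend_fsync
FROM pg_stat_bgwriter;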

I know you've argued against this before
(http://archives.postgresql.org/pgsql-performance/2011-03/msg00340.php),
but I think the assumption that all writes are absorbed by the OS
without blocking assumes an awful lot.  If the server is not
behaving well, then it very well may be blocking on writes.  If we are
confident that the OS can always efficiently cache writes, why bother
with a background writer at all?

I think the argument that it is not important to know which strategy
motivated a backend write could just as validly be reformulated as an
argument that we don't need to know how many backend writes there were
in the first place.

Knowing only an aggregation of BAS-motivated backend writes and
straggling-bgwriter-motivated backend writes just doesn't seem useful
to me.  Am I missing a use for it?  Not only is it probably not
useful, it seems to be actively harmful: I've seen it lead people down
false paths several times (including me, as about once a year I forget
why it is useless and spend time rediscovering it).

If we are not willing to separate the two types of backend writes, or
just omit the BAS ones altogether, then I would suggest we drop the
column entirely.

Would a patch to separate the backend writes into two columns be
entertained for 9.3?

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-01-21 Thread Peter Geoghegan
On 16 January 2012 00:59, Jeff Janes jeff.ja...@gmail.com wrote:
 I think it would be better to pre-deduct the tape overhead amount we
 will need if we decide to switch to tape sort from the availMem before
 we even start reading (and then add it back if we do indeed make that
 switch).  That way we wouldn't over-run the memory in the first place.
  However, that would cause apparent regressions in which sorts that
 previously fit into maintenance_work_mem no longer do.  Boosting
 maintenance_work_mem to a level that was actually being used
 previously would fix those regressions, but pointing out that the
 previous behavior was not optimal doesn't change the fact that people
 are used to it and perhaps tuned to it.  So the attached patch seems
 more backwards-friendly.

Hmm. Are people really setting maintenance_work_mem such that it is
exactly large enough to quicksort when building an index in one case
but not another? Is the difference large enough to warrant avoiding
pre-deduction?
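
For anyone who wants to see the effect Jeff describes, a rough recipe
(the exact crossover point is implementation-dependent; this is
illustrative and not taken from the patch):

-- With pre-deduction, a sort that previously just fit within
-- maintenance_work_mem could switch from an internal quicksort to an
-- external sort; trace_sort = on logs which path was taken.
SET maintenance_work_mem = '64MB';
SET trace_sort = on;
CREATE INDEX ON pgbench_accounts (bid);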

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans

2012-01-21 Thread Jeff Janes
On Fri, Jan 13, 2012 at 7:21 AM, Robert Haas robertmh...@gmail.com wrote:
 I think that people who are using index-only scans are going to want
 some way to find out the degree to which the scans are in fact
 index-only.

 So here's a 5-line patch that adds the number of heap fetches to the
 EXPLAIN ANALYZE output.  This might not be all the instrumentation
 we'll ever want here, but I think we at least want this much.

A review:

It is not in context diff format.

It applies, builds, and runs cleanly, including under --enable-cassert

No docs, but the output format of other EXPLAIN ANALYZE fields
generally isn't documented either, and this one is fairly
self-explanatory (provided one knows what an index-only scan is in the
first place).

No regression tests, but that too seems appropriate.

It does what it says.  We (or I, anyway) want what it does.  As
mentioned, we might want counts also exposed via the stats collector,
but that is another matter.

I don't see how it could possibly cause a meaningful slowdown, and
tests on all-in-memory index-only scans cannot detect one.

On my machine, using EXPLAIN ANALYZE causes a 16-fold slowdown on an
in-memory (-i -s100) "select count(*) from pgbench_accounts" conducted
after a run of pgbench -T30, so that there are some heap fetches
needed.  Compared to that ANALYZE slowdown, any additional slowdown
from this patch is vanishingly small.  It would be nice if you could
get the output of this patch (and of the BUFFERS option to EXPLAIN)
without incurring the timing overhead of ANALYZE.  But again, that is
not the subject of this patch.
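
For reference, this sort of statement exercises the new line (plan choice
and counts will of course vary; the "Heap Fetches" line is what the patch
adds):

VACUUM pgbench_accounts;  -- sets visibility-map bits, reducing heap fetches
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;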

I wondered about the type of ioss_HeapFetches.  Other members of
execnodes.h structs tend to be int64, not long.  But those others are
not reported in EXPLAIN.  Other things reported in explain.c tend to
be long.  This seems to be a foot in both worlds, and your response to
Peter is convincing.

I agree with Tom on the pre increment versus post increment of ioss_HeapFetches.

I also wondered if ioss_HeapFetches ought to be initialized to zero.
I looked for analogous code but didn't find enough analogy to convince
me one way or the other.

All the other members of IndexOnlyScanState have entries in the big
comment block preceding the typedef, so I would expect
ioss_HeapFetches should have one as well.

These are all minor issues, and since you are a committer I'm going to
mark it ready for committer.

As a side-note, I noticed that I needed to run vacuum twice in a row
to get the Heap Fetches to drop to zero.  I vaguely recall that only
one vacuum was needed when index-only scans first went in (and I had
instrumented it to elog heap fetches).  Does anyone know if this is the
expected consequence of one of the recent changes we made to vacuum?

Thanks,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of: explain / allow collecting row counts without timing info

2012-01-21 Thread Jeff Janes
On Fri, Jan 13, 2012 at 3:07 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Fixed. The default value of the TIMING option did not work as intended; it
 was set to true even for plain EXPLAIN (without ANALYZE), in which case
 the EXPLAIN failed.


I've applied this over the "show Heap Fetches in EXPLAIN ANALYZE
output for index-only scans" patch.  It applied, built, and passes
installcheck.

I have not done a full review of this, but I want to say that I want
this very much.  Not only can I get the row counts, but also the Heap
Fetches and the output of BUFFERS, without the overhead of timing.  I
haven't looked at contrib aspects of it at all.
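
Concretely, the combination I mean is something like the following
(option spelling as in the patch, if I've read it correctly):

-- Row counts, Heap Fetches, and buffer usage, without per-node timing:
EXPLAIN (ANALYZE, TIMING off, BUFFERS)
SELECT count(*) FROM pgbench_accounts;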

Thank you for making this.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE

2012-01-21 Thread Jeff Janes
On Wed, Jan 4, 2012 at 12:35 PM, Noah Misch n...@leadboat.com wrote:
 I neglected to commit after revising the text of a few comments; use this
 version instead.  Thanks.


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


This is failing make check for me.


*** /tmp/foo/src/test/regress/expected/alter_table.out  Sat Jan 21 19:51:46 2012
--- /tmp/foo/src/test/regress/results/alter_table.out   Sat Jan 21 19:54:18 2012
***
*** 1662,1667 
--- 1662,1668 
  NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
norewrite1_parent_pkey for table norewrite1_parent
  DEBUG:  building index norewrite1_parent_pkey on table norewrite1_parent
  CREATE TABLE norewrite1_child(c text REFERENCES norewrite1_parent);
+ DEBUG:  building index pg_toast_41491_index on table pg_toast_41491
  ALTER TABLE norewrite1_child ALTER c TYPE varchar; -- recheck FK
  DEBUG:  validating foreign key constraint norewrite1_child_c_fkey
  CREATE DOMAIN other_int AS int;


Settings:

       name       |                      current_setting
------------------+-------------------------------------------------------------
 version          | PostgreSQL 9.2devel on x86_64-unknown-linux-gnu, compiled by
                  | gcc (GCC) 4.1.2 20070115 (SUSE Linux), 64-bit
 lc_collate       | C
 lc_ctype         | C
 max_connections  | 100
 max_stack_depth  | 2MB
 server_encoding  | SQL_ASCII
 shared_buffers   | 32MB
 TimeZone         | US/Pacific
 wal_buffers      | 1MB


Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-21 Thread Jeff Janes
On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan and...@dunslane.net wrote:


 On 01/14/2012 03:06 PM, Andrew Dunstan wrote:




 OK, here's a patch that does both query_to_json and array_to_json, along
 with docs and regression tests. It include Robert's original patch, although
 I can produce a differential patch if required. It can also be pulled from
 https://bitbucket.org/adunstan/pgdevel




 Here's an update that adds row_to_json, plus a bit more cleanup.

This is bit-rotted such that initdb fails

creating template1 database in
/tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL:  could
not create unique index pg_proc_oid_index
DETAIL:  Key (oid)=(3145) is duplicated.

I bumped up those oids in the patch, and it passes make check once I
figured out how to get the tests to run under UTF-8.  Is it supposed to
pass under other encodings?  I can't tell from the rest of the thread
whether it is supposed to pass in other encodings or not.
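
For anyone following along, the functions under discussion behave roughly
like this (my understanding from the patch; exact output formatting may
differ):

SELECT array_to_json(ARRAY[1, 2, 3]);   -- [1,2,3]
SELECT row_to_json(ROW(1, 'foo'));      -- {"f1":1,"f2":"foo"}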

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE

2012-01-21 Thread Noah Misch
On Sat, Jan 21, 2012 at 08:04:20PM -0800, Jeff Janes wrote:
 This is failing make check for me.
 
 
 *** /tmp/foo/src/test/regress/expected/alter_table.out  Sat Jan 21 19:51:46 
 2012
 --- /tmp/foo/src/test/regress/results/alter_table.out   Sat Jan 21 19:54:18 
 2012
 ***
 *** 1662,1667 
 --- 1662,1668 
   NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
 norewrite1_parent_pkey for table norewrite1_parent
   DEBUG:  building index norewrite1_parent_pkey on table norewrite1_parent
   CREATE TABLE norewrite1_child(c text REFERENCES norewrite1_parent);
 + DEBUG:  building index pg_toast_41491_index on table pg_toast_41491
   ALTER TABLE norewrite1_child ALTER c TYPE varchar; -- recheck FK
   DEBUG:  validating foreign key constraint norewrite1_child_c_fkey
   CREATE DOMAIN other_int AS int;

Thanks.  I've attached a new version fixing this problem.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cc210f0..bb2cf62 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 276,281  static Oid transformFkeyCheckAttrs(Relation pkrel,
--- 276,282 
int numattrs, int16 *attnums,
Oid *opclasses);
  static void checkFkeyPermissions(Relation rel, int16 *attnums, int natts);
+ static CoercionPathType findFkeyCast(Oid targetTypeId, Oid sourceTypeId, Oid 
*funcid);
  static void validateCheckConstraint(Relation rel, HeapTuple constrtup);
  static void validateForeignKeyConstraint(char *conname,
 Relation rel, Relation 
pkrel,
***
*** 357,362  static void ATPostAlterTypeCleanup(List **wqueue, 
AlteredTableInfo *tab, LOCKMOD
--- 358,364 
  static void ATPostAlterTypeParse(Oid oldId, char *cmd,
 List **wqueue, LOCKMODE lockmode, bool 
rewrite);
  static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
+ static void TryReuseForeignKey(Oid oldId, Constraint *con);
  static void change_owner_fix_column_acls(Oid relationOid,
 Oid oldOwnerId, Oid 
newOwnerId);
  static void change_owner_recurse_to_sequences(Oid relationOid,
***
*** 5574,5579  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5576,5583 
numpks;
Oid indexOid;
Oid constrOid;
+   boolold_check_ok;
+   ListCell   *old_pfeqop_item = list_head(fkconstraint-old_conpfeqop);
  
/*
 * Grab an exclusive lock on the pk table, so that someone doesn't 
delete
***
*** 5690,5695  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5694,5706 
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
 errmsg(number of referencing and referenced 
columns for foreign key disagree)));
  
+   /*
+* On the strength of a previous constraint, we might avoid scanning
+* tables to validate this one.  See below.
+*/
+   old_check_ok = (fkconstraint-old_conpfeqop != NIL);
+   Assert(!old_check_ok || numfks == 
list_length(fkconstraint-old_conpfeqop));
+ 
for (i = 0; i  numpks; i++)
{
Oid pktype = pktypoid[i];
***
*** 5704,5709  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
--- 5715,5721 
Oid ppeqop;
Oid ffeqop;
int16   eqstrategy;
+   Oid pfeqop_right;
  
/* We need several fields out of the pg_opclass entry */
cla_ht = SearchSysCache1(CLAOID, 
ObjectIdGetDatum(opclasses[i]));
***
*** 5746,5755  ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation 
rel,
pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
 
eqstrategy);
if (OidIsValid(pfeqop))
ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,

 eqstrategy);
else
!   ffeqop = InvalidOid;/* keep compiler quiet */
  
if (!(OidIsValid(pfeqop)  OidIsValid(ffeqop)))
{
--- 5758,5774 
pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
 
eqstrategy);
if (OidIsValid(pfeqop))
+   {
+   pfeqop_right = fktyped;
ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,
   

Re: [HACKERS] Second thoughts on CheckIndexCompatible() vs. operator families

2012-01-21 Thread Noah Misch
New version that repairs a defective test case.
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 712b0b0..1bf1de5 100644
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
***
*** 23,28 
--- 23,29 
  #include catalog/pg_opclass.h
  #include catalog/pg_opfamily.h
  #include catalog/pg_tablespace.h
+ #include catalog/pg_type.h
  #include commands/dbcommands.h
  #include commands/defrem.h
  #include commands/tablecmds.h
***
*** 52,57 
--- 53,59 
  /* non-export function prototypes */
  static void CheckPredicate(Expr *predicate);
  static void ComputeIndexAttrs(IndexInfo *indexInfo,
+ Oid *typeOidP,
  Oid *collationOidP,
  Oid *classOidP,
  int16 *colOptionP,
***
*** 87,104  static void RangeVarCallbackForReindexIndex(const RangeVar 
*relation,
   * of columns and that if one has an expression column or predicate, both do.
   * Errors arising from the attribute list still apply.
   *
!  * Most column type changes that can skip a table rewrite will not invalidate
!  * indexes.  For btree and hash indexes, we assume continued validity when
!  * each column of an index would have the same operator family before and
!  * after the change.  Since we do not document a contract for GIN or GiST
!  * operator families, we require an exact operator class match for them and
!  * for any other access methods.
!  *
!  * DefineIndex always verifies that each exclusion operator shares an operator
!  * family with its corresponding index operator class.  For access methods
!  * having no operator family contract, confirm that the old and new indexes
!  * use the exact same exclusion operator.  For btree and hash, there's nothing
!  * more to check.
   *
   * We do not yet implement a test to verify compatibility of expression
   * columns or predicates, so assume any such index is incompatible.
--- 89,105 
   * of columns and that if one has an expression column or predicate, both do.
   * Errors arising from the attribute list still apply.
   *
!  * Most column type changes that can skip a table rewrite do not invalidate
!  * indexes.  We acknowledge this when all operator classes, collations and
!  * exclusion operators match.  Though we could further permit intra-opfamily
!  * changes for btree and hash indexes, that adds subtle complexity with no
!  * concrete benefit for core types.
!  *
!  * When a comparison or exclusion operator has a polymorphic input type, the
!  * actual input types must also match.  This defends against the possibility
!  * that operators could vary behavior in response to get_fn_expr_argtype().
!  * At present, this hazard is theoretical: check_exclusion_constraint() and
!  * all core index access methods decline to set fn_expr for such calls.
   *
   * We do not yet implement a test to verify compatibility of expression
   * columns or predicates, so assume any such index is incompatible.
***
*** 111,116  CheckIndexCompatible(Oid oldId,
--- 112,118 
 List *exclusionOpNames)
  {
boolisconstraint;
+   Oid*typeObjectId;
Oid*collationObjectId;
Oid*classObjectId;
Oid accessMethodId;
***
*** 123,132  CheckIndexCompatible(Oid oldId,
int numberOfAttributes;
int old_natts;
boolisnull;
-   boolfamily_am;
boolret = true;
oidvector  *old_indclass;
oidvector  *old_indcollation;
int i;
Datum   d;
  
--- 125,134 
int numberOfAttributes;
int old_natts;
boolisnull;
boolret = true;
oidvector  *old_indclass;
oidvector  *old_indcollation;
+   Relationirel;
int i;
Datum   d;
  
***
*** 168,177  CheckIndexCompatible(Oid oldId,
indexInfo-ii_ExclusionOps = NULL;
indexInfo-ii_ExclusionProcs = NULL;
indexInfo-ii_ExclusionStrats = NULL;
collationObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
classObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
coloptions = (int16 *) palloc(numberOfAttributes * sizeof(int16));
!   ComputeIndexAttrs(indexInfo, collationObjectId, classObjectId,
  coloptions, attributeList,
  exclusionOpNames, relationId,
  accessMethodName, accessMethodId,
--- 170,181 
indexInfo-ii_ExclusionOps = NULL;

[HACKERS] Next steps on pg_stat_statements normalisation

2012-01-21 Thread Peter Geoghegan
So, having received feedback from Tom and others in relation to this
patch, I would like to state how I think I should go about addressing
various concerns to ensure that a revision of the patch gets into the
9.2 release. As I've said time and again, I think that it is very
important that we have this, sooner rather than later.

The main objection made in relation to my patch was to the overhead of
the admittedly invasive (though purely mechanical) changes to the
grammar to record the length of all tokens, so that Const tokens could
subsequently have their length known in a principled fashion. While I
believe you'd be extremely hard pushed to demonstrate any regression at
all from those extra cycles, the starting position of each token is
already exposed (it is used to produce error messages outside the
parser that point at a particular token with a caret into the query
string), so there is no reason to think that the total length cannot be
calculated on the fly from within pg_stat_statements.

The syntax for constants is sufficiently simple that I think that a
good set of regression tests should make this entirely practicable,
covering all permutations of relevant factors affecting how the
implementation should act, including for example
standard_conforming_strings. There's no reason to think that the SQL
syntax rules for constants should need to change very frequently, or
even at all, so we should be fine with just knowing the starting
position. It's quicker and easier to do it this way than to argue the
case for my original implementation, so that's what I'll do. Whatever
overhead this independent, pg_stat_statements-internal const parsing
may impose, it will at least only be paid once per query, when we
first require a canonicalised representation of the query for the
pg_stat_statements view.
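
To make the user-visible goal concrete: queries that differ only in
constant values should collapse to a single entry, displayed in a
canonicalised form, for example:

-- Both of these should be tracked as one pg_stat_statements entry,
-- shown with the constant replaced by a placeholder, e.g.
--   SELECT * FROM foo WHERE id = ?;
SELECT * FROM foo WHERE id = 1;
SELECT * FROM foo WHERE id = 42;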

I have, in the past, spoken against the idea of parsing query strings
on the fly (an objection proven to be well-founded by various third
party tools), but this is something quite different - by starting from
a position that is known to be where a Const token starts, we need
only concern ourselves with constant syntax, a much more limited and
stable thing to have to pin down.

I imagined that by plugging into planner_hook and hashing the
post-rewrite query tree immediately before it was passed to the
planner, there was a certain amount of merit in recognising as
equivalent queries that would definitely result in the same plan, all
other things (and selectivity estimates of constants) being equal.
However, since some of that logic, such as the logic through which
different though equivalent join syntaxes are normalised, actually
occurs in the planner, it was judged that there was an undesirable
inconsistency. I also felt that it was useful that the hashing occur
on the post-rewrite query proper, but various examples illustrated
that that might be problematic, such as the inability for the
implementation to differentiate constants in the original query string
from, for example, constants appearing in view definitions.

Tom wanted to hash plans. I think that this view was partially
informed by concerns about co-ordination of hooks. While I had worked
around those problems, the solution that I'd proposed wasn't very
pretty, and it wasn't that hard to imagine someone finding a reason to
need to break it. While I think there might be a lot to be said for
hashing plans, that isn't a natural adjunct to pg_stat_statements: the
user expects to see exactly one entry for what they consider to be the
same query. I think you'd have a hard time convincing many people that
it's okay to have a large number of entries for what they thought was
the same query, each merely reflecting that a different plan was used
for the same query string, based on factors they didn't consider
essential to the query and that may be quite transitory. Some more
marginal plans for the same query could
be missing entirely from the fixed-size hash table. It wouldn't have
been a very useful way of normalising queries, even if it did more
accurately reflect where execution costs came from, so I am dead set
against doing this for normalisation. However, it might be that we can
add even more value in the 9.3 release by hashing plans as well, to do
entirely more interesting things, but we need to have a canonicalised,
normalised representation of queries first - that's obviously how
people expect a tool like this to work, and it's difficult to imagine
a practical work-flow for isolating poorly performing queries/plans
that doesn't stem from that.

Balancing all of these concerns, here's a basic outline of what I
think that the way forward is:

* Add a hook to core that allows modules to hook into the call to
parse_analyze() and parse_analyze_varparams(). There'd probably be two
single-line plugin functions for each, so 

Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE

2012-01-21 Thread Jeff Janes
On Sat, Jan 21, 2012 at 9:05 PM, Noah Misch n...@leadboat.com wrote:
 On Sat, Jan 21, 2012 at 08:04:20PM -0800, Jeff Janes wrote:
 This is failing make check for me.


 *** /tmp/foo/src/test/regress/expected/alter_table.out  Sat Jan 21 19:51:46 
 2012
 --- /tmp/foo/src/test/regress/results/alter_table.out   Sat Jan 21 19:54:18 
 2012
 ***
 *** 1662,1667 
 --- 1662,1668 
   NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
 norewrite1_parent_pkey for table norewrite1_parent
   DEBUG:  building index norewrite1_parent_pkey on table 
 norewrite1_parent
   CREATE TABLE norewrite1_child(c text REFERENCES norewrite1_parent);
 + DEBUG:  building index pg_toast_41491_index on table pg_toast_41491
   ALTER TABLE norewrite1_child ALTER c TYPE varchar; -- recheck FK
   DEBUG:  validating foreign key constraint norewrite1_child_c_fkey
   CREATE DOMAIN other_int AS int;

 Thanks.  I've attached a new version fixing this problem.

Thanks, I've verified that it now passes make check.

Looking at the code now, I don't think I'll be able to do a meaningful
review in a reasonable time.  I'm not familiar with that part of the
code, and it is too complex for me to learn right now.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers