Re: [HACKERS] Buffer statistics for pg_stat_statements

2009-12-22 Thread Takahiro Itagaki

Cedric Villemain cedric.villem...@dalibo.com wrote:

 On Friday, 18 December 2009 at 09:44:40, Takahiro Itagaki wrote:
  I'd like to add per-query buffer usage into contrib/pg_stat_statements.

Here is a patch to add buffer usage columns into pg_stat_statements.
It also changes initialization of the result TupleDesc from manually
coded routines to get_call_result_type(). 

 Perhaps it would be useful to have the percentage for the hit/read ratio 
 computed in the view?

I think different DBAs want different derived values; some of them might want
the buffer hit ratio, but others might request numbers per query. I'd like to
provide only raw values from pg_stat_statements to keep it simple, but I added
a computational expression for the hit percentage to the documentation. 

! bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
!                nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
!           FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
! -[ RECORD 1 ]---------------------------------------------------------------
! query       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
! calls       | 3000
! total_time  | 9.6090010002
! rows        | 2836
! hit_percent | 99.977897200936

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



pg_stat_statements_bufusage_20091222.patch
Description: Binary data



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Sun, 2009-12-20 at 19:11 -0500, Robert Haas wrote:
 On Sun, Dec 20, 2009 at 3:42 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote:
  I put them on the TODO list at
  https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the "must-fix"
  category.
 
  I notice you also re-arranged other items on there, specifically the
  notion that starting from a shutdown checkpoint is somehow important.
  It's definitely not any kind of bug.
 
  We've discussed this on-list and I've requested that you justify this.
  So far, nothing you've said on that issue has been at all convincing for
  me or others. The topic is already mentioned on the HS todo, since if
  one person requests something we should track that, just in case others
  eventually agree. But having said that, it clearly isn't a priority, so
  rearranging the item like that was not appropriate, unless you were
  thinking of doing it yourself, though that wasn't marked.
 
 This doesn't match my recollection of the previous discussion on this
 topic.  I am not sure that I'd call it a bug, but I'd definitely like
 to see it fixed, and I think I mentioned that previously, though I
 don't have the email in front of me ATM.  I am also not aware that anyone
 other than yourself has opined that we should not worry about fixing
 it, although I might be wrong about that too.  At any rate, "clearly
 not a priority" seems like an overstatement relative to my memory of
 that conversation.

Please check the thread then. Nobody but me has opined that we should
not worry about fixing it, but then nobody other than Heikki has ever
suggested it is even a feature worthy of inclusion. One person agreed
with my position, and nobody has spoken in favour of Heikki's. However,
I had already included the feature on the todo; it was further down the
list before a second copy was added, and that second copy has now been
removed.

If you are saying being able to start Hot Standby from a shutdown
checkpoint is an important feature for you, then say so, and why.

Please also be careful that you don't mix this up with other
improvements, nor say they all need fixing. This isn't a general
discussion on those points. There are other important things.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] New VACUUM FULL

2009-12-22 Thread Takahiro Itagaki

Simon Riggs si...@2ndquadrant.com wrote:

 I think we
 either need to implement that or document that vacuum will not skip
 all-visible pages when running VACUUM FULL.

All-visible does not always mean "filled enough", no?  We will need to
check both visibility and fillfactor when we optimize the copying of tuples.

 Old VACUUM FULL was substantially faster than this on tables that had
 nothing to remove.

Yeah, that's why traditional VACUUM FULL is still kept as INPLACE vacuum.

 Please can you arrange for the cluster operation to skip rebuilding
 indexes if the table is not reduced in size? 

Do you think we should dispose of the rewritten table when we find the
VACUUM FULL (or CLUSTER) was useless?  We could save the time needed to
reindex, but we've already spent the time rewriting the table. It will
still be slower than VACUUM FULL INPLACE, because the latter is read-only
in that case.

Instead, I'd suggest having conditional database-wide maintenance
commands, something like:
VACUUM FULL WHERE the table is fragmented

We don't have to support the feature in SQL syntax; it could be done
in client tools.  How about a pg_maintain or pg_ctl maintain that cleans
up each relation with the appropriate command? We could then replace
vacuumdb, clusterdb, and reindexdb with it.
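
A rough sketch of what such a tool might run, assuming contrib/pgstattuple
is installed (the 20% threshold and the schema filter are only illustrative,
and note that pgstattuple scans each table in full):

    -- generate a VACUUM FULL command for each sufficiently fragmented table
    SELECT 'VACUUM FULL ' || quote_ident(n.nspname) || '.'
                          || quote_ident(c.relname) || ';'
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE c.relkind = 'r'
       AND n.nspname NOT IN ('pg_catalog', 'information_schema')
       AND (pgstattuple(c.oid)).dead_tuple_percent > 20;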


 Part of the reason why these happen is that the code hasn't been
 refactored much at all from its original use for cluster. There are
 almost zero comments to explain the additional use of this code for
 VACUUM, or at least to explain that it still all works even when there is
 no index.

OK, I'll add more comments. The flow of the code might still be confusing
because vacuum() calls cluster(), but refactoring that would require major
code rewrites, and I'm not sure they are worth it.

Also, I think we need additional messages for VACUUM VERBOSE
(and CLUSTER VERBOSE), though that might be addressed in a separate patch.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] fdw validation function vs zero catalog id

2009-12-22 Thread Martin Pihlak
I wrote:
 The validator is run for the generic options specified to CREATE/ALTER FDW,
 SERVER and USER MAPPING (+ possible future SQL/MED objects). In this case the
 catalog is always known. Also, we can assume that the catalog is known if a
 user runs the validator directly. So yes, AFAICS there is no case for the
 "or zero".
 

Updated patch attached. It now also removes the "or zero" note from
the documentation and modifies postgresql_fdw_validator() to assume that
a valid catalog OID is always passed.
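
For example, a user running the validator by hand would now pass a real
catalog OID rather than zero; a minimal sketch (a regclass literal coerces
to oid):

    SELECT postgresql_fdw_validator(ARRAY['connect_timeout=10'],
                                    'pg_foreign_server'::regclass);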

regards,
Martin

*** a/doc/src/sgml/ref/create_foreign_data_wrapper.sgml
--- b/doc/src/sgml/ref/create_foreign_data_wrapper.sgml
***
*** 74,83  CREATE FOREIGN DATA WRAPPER <replaceable class="parameter">name</replaceable>
    take two arguments: one of type <type>text[]</type>, which will
    contain the array of options as stored in the system catalogs,
    and one of type <type>oid</type>, which will be the OID of the
!   system catalog containing the options, or zero if the context is
!   not known.  The return type is ignored; the function should
!   indicate invalid options using
!   the <function>ereport()</function> function.
   </para>
  </listitem>
 </varlistentry>
--- 74,82 ----
    take two arguments: one of type <type>text[]</type>, which will
    contain the array of options as stored in the system catalogs,
    and one of type <type>oid</type>, which will be the OID of the
!   system catalog containing the options. The return type is ignored;
!   the function should indicate invalid options using the
!   <function>ereport()</function> function.
   </para>
  </listitem>
 </varlistentry>
*** a/src/backend/commands/foreigncmds.c
--- b/src/backend/commands/foreigncmds.c
***
*** 88,94  optionListToArray(List *options)
   * This is used by CREATE/ALTER of FOREIGN DATA WRAPPER/SERVER/USER MAPPING.
   */
  static Datum
! transformGenericOptions(Datum oldOptions,
  		List *options,
  		Oid fdwvalidator)
  {
--- 88,95 
   * This is used by CREATE/ALTER of FOREIGN DATA WRAPPER/SERVER/USER MAPPING.
   */
  static Datum
! transformGenericOptions(Oid catalogId,
! 		Datum oldOptions,
  		List *options,
  		Oid fdwvalidator)
  {
***
*** 162,168  transformGenericOptions(Datum oldOptions,
  	result = optionListToArray(resultOptions);
  
  	if (fdwvalidator)
! 		OidFunctionCall2(fdwvalidator, result, (Datum) 0);
  
  	return result;
  }
--- 163,169 
  	result = optionListToArray(resultOptions);
  
  	if (fdwvalidator)
! 		OidFunctionCall2(fdwvalidator, result, ObjectIdGetDatum(catalogId));
  
  	return result;
  }
***
*** 384,390  CreateForeignDataWrapper(CreateFdwStmt *stmt)
  
  	nulls[Anum_pg_foreign_data_wrapper_fdwacl - 1] = true;
  
! 	fdwoptions = transformGenericOptions(PointerGetDatum(NULL), stmt->options,
  		 fdwvalidator);
  
  	if (PointerIsValid(DatumGetPointer(fdwoptions)))
--- 385,393 
  
  	nulls[Anum_pg_foreign_data_wrapper_fdwacl - 1] = true;
  
! 	fdwoptions = transformGenericOptions(ForeignDataWrapperRelationId,
! 		 PointerGetDatum(NULL),
! 		 stmt->options,
  		 fdwvalidator);
  
  	if (PointerIsValid(DatumGetPointer(fdwoptions)))
***
*** 501,507  AlterForeignDataWrapper(AlterFdwStmt *stmt)
  			datum = PointerGetDatum(NULL);
  
  		/* Transform the options */
! 		datum = transformGenericOptions(datum, stmt->options, fdwvalidator);
  
  		if (PointerIsValid(DatumGetPointer(datum)))
  			repl_val[Anum_pg_foreign_data_wrapper_fdwoptions - 1] = datum;
--- 504,513 
  			datum = PointerGetDatum(NULL);
  
  		/* Transform the options */
! 		datum = transformGenericOptions(ForeignDataWrapperRelationId,
! 		datum,
! 		stmt->options,
! 		fdwvalidator);
  
  		if (PointerIsValid(DatumGetPointer(datum)))
  			repl_val[Anum_pg_foreign_data_wrapper_fdwoptions - 1] = datum;
***
*** 667,673  CreateForeignServer(CreateForeignServerStmt *stmt)
  	nulls[Anum_pg_foreign_server_srvacl - 1] = true;
  
  	/* Add server options */
! 	srvoptions = transformGenericOptions(PointerGetDatum(NULL), stmt->options,
  		 fdw->fdwvalidator);
  
  	if (PointerIsValid(DatumGetPointer(srvoptions)))
--- 673,681 
  	nulls[Anum_pg_foreign_server_srvacl - 1] = true;
  
  	/* Add server options */
! 	srvoptions = transformGenericOptions(ForeignServerRelationId,
! 		 PointerGetDatum(NULL),
! 		 stmt->options,
  		 fdw->fdwvalidator);
  
  	if (PointerIsValid(DatumGetPointer(srvoptions)))
***
*** 765,771  AlterForeignServer(AlterForeignServerStmt *stmt)
  			datum = PointerGetDatum(NULL);
  
  		/* Prepare the options array */
! 		datum = transformGenericOptions(datum, stmt->options,
 		fdw->fdwvalidator);
  
  		if (PointerIsValid(DatumGetPointer(datum)))
--- 773,781 
  			datum = PointerGetDatum(NULL);
  
  		/* Prepare the options array */
! 		datum = 

Re: [HACKERS] New VACUUM FULL

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 18:11 +0900, Takahiro Itagaki wrote:
 
  Old VACUUM FULL was substantially faster than this on tables that had
  nothing to remove.

 Yeah, that's why traditional VACUUM FULL is still kept as INPLACE
 vacuum.
 
  Please can you arrange for the cluster operation to skip rebuilding
  indexes if the table is not reduced in size? 
 
 Do you think we should dispose of the rewritten table when we find the
 VACUUM FULL (or CLUSTER) was useless?  We could save the time needed to
 reindex, but we've already spent the time rewriting the table. 

The main purpose of doing a new VF is to compact space, so its role has
slightly changed from earlier versions. We need much clearer docs about
this.

On a production system, it seems better to skip the re-indexing, which
could take a long, long time. I don't see any way to skip completely
re-writing the table though, other than scanning the table with a normal
VACUUM first, as old VF does. 

I would be inclined towards the idea that if somebody does a VF of a
whole database then we should look out for and optimise for tables with
no changes, but when operating on a single table we just do as
instructed and rebuild everything, including indexes. That seems like
something we should do, but we're running out of time.

For now, I think we can easily skip the index build, and should, if
appropriate. That just takes a little reworking of the code, which appears
necessary anyway. We should just document that this works a little
differently now and that a complete db VF is now neither necessary nor a
good thing. And running 
  VACUUM; REINDEX DATABASE foo;
will now be quite pointless.

 It will be still
 slower than VACUUM FULL INPLACE because it is read-only.

Understood, let's document that.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Small typos in Hot Standby docs

2009-12-22 Thread Simon Riggs
On Sat, 2009-12-19 at 18:42 -0800, John Naylor wrote:
 Here's a patch:

Thanks John, much appreciated.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] New VACUUM FULL

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 18:11 +0900, Takahiro Itagaki wrote:

 Instead, I'd suggest having conditional database-wide maintenance
 commands, something like:
 VACUUM FULL WHERE the table is fragmented
 
 We don't have to support the feature in SQL syntax; it could be done
 in client tools.  How about a pg_maintain or pg_ctl maintain that cleans
 up each relation with the appropriate command? We could then replace
 vacuumdb, clusterdb, and reindexdb with it.

Some broad thoughts...

Our perception of acceptable change is much higher than most users'. If
we tell people "use VACUUM FULL" or "vacuumdb -f", then that command
should, if possible, continue to work well across many releases.
vacuumdb, in most people's minds, is the command you run to ease
maintenance and make everything right, rather than a specific set of
features.

We have "It just works" as a principle. I think the corollary of that is
that we should also have "It just continues to work the same way".

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] New VACUUM FULL

2009-12-22 Thread Takahiro Itagaki

Simon Riggs si...@2ndquadrant.com wrote:

 Our perception of acceptable change is much higher than most users'. If
 we tell people "use VACUUM FULL" or "vacuumdb -f", then that command
 should, if possible, continue to work well across many releases.
 vacuumdb, in most people's minds, is the command you run to ease
 maintenance and make everything right, rather than a specific set of
 features.
 
 We have "It just works" as a principle. I think the corollary of that is
 that we should also have "It just continues to work the same way".

I used VACUUM FULL because we were discussing dropping VFI completely,
but I won't replace the behavior if hot standby can support VFI.
We can use any keyword without making it reserved in the VACUUM (...)
syntax; VACUUM (REWRITE), the first idea, can be used here. We could also
adopt entirely different syntax for it -- e.g., ALTER TABLE REWRITE or
SHRINK, as sketched below.
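
To make the alternatives concrete, the variants mentioned so far would look
something like this (none of this syntax exists yet, and 'accounts' is just
an example table):

    VACUUM (REWRITE) accounts;
    ALTER TABLE accounts REWRITE;
    ALTER TABLE accounts SHRINK;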

I think the ALTER TABLE idea is not so bad because it does _not_ support
database-wide maintenance. REWRITE is not the best maintenance strategy in
normal cases, because a database usually contains some rarely-updated
tables.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] Small Bug in GetConflictingVirtualXIDs

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
 On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
  Giving the drop database a snapshot is not the answer. I expect Andres
  to be able to fix this with a simple patch that would not effect the
  case of normal running.
 Actually it's less simple than I had thought at first - I don't think the
 code ever handled that correctly.
 I might be wrong there, my knowledge of the involved code is a bit sparse...
 The whole conflict resolution builds on the concept of waiting for a VXID,
 but an idle backend does not have a valid VXID. That's correct, right?

Yes, that's correct. I'll take this one back then.

 Sure, the code should be modifiable to handle that case mostly transparently
 (simply ignoring an invalid localTransactionId when the database id is
 valid), but ...
 
 I am inclined to just unconditionally kill the users of the database. It's
 not like that would be an issue in production. I can't see a case where it's
 important to run a session to its end on a database which was dropped on the
 master.
 Opinions on that?

I don't see any mileage in making the Startup process wait for an idle
session, so there's no real reason to wait for others either.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Table size does not include toast size

2009-12-22 Thread Cédric Villemain
2009/12/21 Tom Lane t...@sss.pgh.pa.us:
 Greg Smith g...@2ndquadrant.com writes:
 To answer Rafael's concerns directly:  you're right that this is
 confusing.  pg_relation_size is always going to do what it does right
 now just because of how that fits into the design of the database.
 However, the documentation should be updated to warn against the issue
 with TOAST here.  And it should be easier to get the total you'd like
 to see here:  main relation + toasted parts, since that's what most DBAs
 want in this area.

 Perhaps invent  pg_table_size() = base table + toast table + toast index
 and             pg_indexes_size() = all other indexes for table
 giving us the property pg_table_size + pg_indexes_size =
 pg_total_relation_size

Did you mean:
 pg_table_size() = base table + toast table
 pg_indexes_size() = base indexes + toast indexes
?


 I think the 8.4 documentation already makes it apparent that
 pg_relation_size is a pretty low-level number.  If we invent other
 functions with obvious names, that should be sufficient.

                        regards, tom lane





Re: [HACKERS] Buffer statistics for pg_stat_statements

2009-12-22 Thread Cédric Villemain
2009/12/22 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp:

 Cedric Villemain cedric.villem...@dalibo.com wrote:

 On Friday, 18 December 2009 at 09:44:40, Takahiro Itagaki wrote:
  I'd like to add per-query buffer usage into contrib/pg_stat_statements.

 Here is a patch to add buffer usage columns into pg_stat_statements.
 It also changes initialization of the result TupleDesc from manually
 coded routines to get_call_result_type().

 Perhaps it would be useful to have the percentage for the hit/read ratio
 computed in the view?

 I think different DBAs want different derived values; some of them might want
 the buffer hit ratio, but others might request numbers per query. I'd like to
 provide only raw values from pg_stat_statements to keep it simple, but I added
 a computational expression for the hit percentage to the documentation.

Yes, you are right.


 ! bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
 !                nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
 !           FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
 ! -[ RECORD 1 ]---------------------------------------------------------------
 ! query       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
 ! calls       | 3000
 ! total_time  | 9.6090010002
 ! rows        | 2836
 ! hit_percent | 99.977897200936

 Regards,
 ---
 Takahiro Itagaki
 NTT Open Source Software Center








Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Florian Pflug

On 22.12.09 9:34 , Simon Riggs wrote:

If you are saying being able to start Hot Standby from a shutdown
checkpoint is an important feature for you, then say so, and why.


I think it's not so much an important feature but more the removal of a
footgun.

Imagine a reporting database where all transactions but a few daily bulk
imports are read-only. To spread the load, you do your bulk loads on the
master, but run the reporting queries against a read-only HS slave. Now
you take the master down for maintenance. Since all clients but the bulk
loader use the slave already, and since the bulk loads can be deferred
until after the maintenance window closes again, you don't actually do a
fail-over.

Now you're already pointing at your foot with the gun. All it takes to
ruin your day is *some* reason for the slave to restart. Maybe due to a
junior DBA's typo, or maybe due to a bug in postgres. Anyway, once the
slave is down, it won't come up until you manage to get the master up
and running again. And this limitation is pretty surprising, since one
would assume that if the slave survives a *crash* of the master, it'd
certainly survive a simple *shutdown*.

best regards,
Florian Pflug



Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-22 Thread Greg Stark
On Tue, Dec 22, 2009 at 6:30 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I think we can just use load_external_function() to load the library and
 call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
 library name. Walreceiver is quite tightly coupled with the rest of the
 backend anyway, so I don't think we need to come up with a pluggable API
 at the moment.

Please? I am really interested in replacing walsender and walreceiver
with something that uses a communication bus like Spread instead of a
single point-to-point connection.

ISTM that if we start with something tightly coupled it'll be hard to
decouple later. Whereas if we start with a limited interface we'll
learn just how much information is really required by the modules and
will have fewer surprises later when we discover unexpected
interdependencies.


-- 
greg



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Greg Stark
On Tue, Dec 22, 2009 at 8:34 AM, Simon Riggs si...@2ndquadrant.com wrote:
 If you are saying being able to start Hot Standby from a shutdown
 checkpoint is an important feature for you, then say so, and why.

Can you explain the consequences of missing this? It sounds to me like
if I lose my master and it happened to be while it was shut down for
whatever reason then I'll be stuck and won't be able to use my
standby. If that's true it seems like it's a major problem. Or does it
just mean I would have to follow a different procedure when failing
over?

I'm not sure if it's relevant but one thing to realize is that a lot
of MySQL people are used to doing failovers to do regular maintenance
tasks like creating indexes or making schema changes. Besides, a lot of
sites build in regular failovers to ensure that their failover
procedure works. In both cases they usually want to do a clean shutdown
of the master to ensure they don't lose any transactions during
the failover.

-- 
greg



[HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Heikki Linnakangas
With regards to this bug report:
http://archives.postgresql.org/pgsql-bugs/2009-12/msg00241.php

I think we should change the tuplestore code so that callers of
tuplestore_put* don't need to switch to the correct memory context (and
resource owner, after this patch) before the call. Instead,
tuplestore_begin_heap() should memorize the context and resource owner
used to create the tuplestore, and use them in the tuplestore_put*
functions. AFAICS it is always a bug to be in a different memory context
in tuplestore_put* than in tuplestore_begin_heap(), so it would be more
robust to not put the burden on the callers.

Patch against CVS HEAD to do that and fix the reported bug attached. Now
that the tuplestore_put* switches to the right memory context, we could
remove that from all the callers, but this patch only does it for pl_exec.c.

Thoughts?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-22 Thread Heikki Linnakangas
Fujii Masao wrote:
 On Tue, Dec 22, 2009 at 3:30 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 I think we can just use load_external_function() to load the library and
 call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
 library name. Walreceiver is quite tightly coupled with the rest of the
 backend anyway, so I don't think we need to come up with a pluggable API
 at the moment.

 That's the way I did it yesterday, see 'replication' branch in my git
 repository, but it looks like I fumbled the commit so that some of the
 changes were committed as part of the merge commit with origin/master
 (=CVS HEAD). Sorry about that.
 
 Umm.., I still cannot find the place where the walreceiver module is
 loaded by using load_external_function() in your 'replication' branch.
 Also the compilation of that branch fails. Is the 'pushed' branch the
 latest? Sorry if I'm missing something.

Ah, I see. The changes were not included in the merge commit after all;
I had simply forgotten to git add them. Sorry about that - they should be
there now.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Greg Stark
On Tue, Dec 22, 2009 at 11:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
...
 AFAICS it is always a bug to be in a different memory context
 in tuplestore_put* than in tuplestore_begin_heap(), so it would be more
 robust to not put the burden on the callers.
 ...
 Patch against CVS HEAD to do that and fix the reported bug attached. Now
 that the tuplestore_put* switches to the right memory context, we could
 remove that from all the callers, but this patch only does it for pl_exec.c.

 Thoughts?

I thought there were comments specifically explaining why it was done
that way, but I don't recall what they said. Perhaps it was a
performance concern, since it's going to happen for every tuple put in
the tuplestore and usually you'll just be in the same memory context
anyway. It would certainly be a lot less confusing the way you
describe it, though.

-- 
greg



Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Greg Stark
On Tue, Dec 22, 2009 at 11:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 With regards to this bug report:
 http://archives.postgresql.org/pgsql-bugs/2009-12/msg00241.php

Hm, do we have any build farm members running with work_mem set to the minimum?
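
For reference, the minimum that GUC allows is 64kB, i.e. the equivalent of
running every backend with:

    SET work_mem = '64kB';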

-- 
greg



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 12:32 +0100, Florian Pflug wrote:
 On 22.12.09 9:34 , Simon Riggs wrote:
  If you are saying being able to start Hot Standby from a shutdown
  checkpoint is an important feature for you, then say so, and why.
 
 I think it's not so much an important feature but more the removal of a
 footgun.
 
 Image a reporting database where all transactions but a few daily bulk
 imports are read-only. To spread the load, you do your bulk loads on the
 master, but run the reporting queries against a read-only HS slave. Now
 you take the master down for maintenance. Since all clients but the bulk
 loader use the slave already, and since the bulk loads can be deferred
 until after the maintenance window closes again, you don't actually do a
 fail-over.
 
 Now you're already pointing at your foot with the gun. All it takes to
 ruin your day is *some* reason for the slave to restart. Maybe due to a
 junior DBA's typo, or maybe due to a bug in postgres. Anway, once the
 slave is down, it won't come up until you manage to get the master up
 and running again. And this limitation is pretty surprising, since one
 would assume that if the slave survives a *crash* of the master, it'd
 certainly survive a simple *shutdown*.

Well, you either wait for the master to come up again and restart, or you
flip into normal mode and keep running queries from there. You aren't
prevented from using the server, except by your own refusal to fail over.

That's not enough for me to raise the priority for this feature.

But it was already on the list and remains there now. If someone does
add this, it will require careful thought about how to avoid introducing
further subtle ways to break HS, all of which will need testing and
re-testing to avoid regression.

So I'm not personally going to be working on it, for this release and
likely the next also, nor will I encourage others to do so, for anyone
looking to assist. There are more important things for us to do, IMHO.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 11:41 +, Greg Stark wrote:
 On Tue, Dec 22, 2009 at 8:34 AM, Simon Riggs si...@2ndquadrant.com wrote:
  If you are saying being able to start Hot Standby from a shutdown
  checkpoint is an important feature for you, then say so, and why.
 
 Can you explain the consequences of missing this? It sounds to me like
 if I lose my master and it happened to be while it was shut down for
 whatever reason then I'll be stuck and won't be able to use my
 standby. If that's true it seems like it's a major problem. Or does it
 just mean I would have to follow a different procedure when failing
 over?

Failover isn't prevented in this case.

If we were going to spend time on anything it would be to make failover
and switchback easier so that people aren't afraid of it. I've spent a
few weeks trying to remove the shutdown checkpoint, but no luck so far.

Switchback optimization is probably something for next release now,
unless you're looking for a project?

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] New VACUUM FULL

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 19:45 +0900, Takahiro Itagaki wrote:

 I used VACUUM FULL because we were discussing dropping VFI completely,
 but I won't replace the behavior if hot standby can support VFI.

HS can't support VFI now, by definition. We agreed to spend the time
getting rid of VFI, which working on this with you is part of.

If we can just skip the index rebuild, I think that's all the additional
code changes we need. I'll improve the docs as I review-to-commit.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-22 Thread Heikki Linnakangas
Greg Stark wrote:
 On Tue, Dec 22, 2009 at 6:30 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 I think we can just use load_external_function() to load the library and
 call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the
 library name. Walreceiver is quite tightly coupled with the rest of the
 backend anyway, so I don't think we need to come up with a pluggable API
 at the moment.
 
 Please? I am really interested in replacing walsender and walreceiver
 with something that uses a communication bus like Spread instead of a
 single point-to-point connection.

I think you'd still need to be able to request older WAL segments to
resync after a lost connection, restore from base backup etc., which
don't really fit into a publish/subscribe style communication bus. I'm
sure it could all be solved though. It would be a pretty cool feature,
for scaling to a large number of slaves.

 ISTM that if we start with something tightly coupled it'll be hard to
 decouple later. Whereas if we start with a limited interface we'll
 learn just how much information is really required by the modules and
 will have fewer surprises later when we discover unexpected
 interdependencies.

I'm all ears if you have a concrete proposal.

I'm not too worried about it being hard to decouple later. The interface
is actually quite limited already, as the communication between
processes is done via shared memory. It probably wouldn't be hard to
turn it into an API, but I don't think there's any hurry to do that until
someone actually steps up to write an alternative walreceiver/walsender.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 If someone does
 add this, it will require careful thought about how to avoid introducing
 further subtle ways to break HS, all of which will need testing and
 re-testing to avoid regression.

Well, I *did* add that, but you removed it...

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Table size does not include toast size

2009-12-22 Thread Bernd Helmle



--On 22 December 2009 11:46:32 +0100 Cédric Villemain 
cedric.villemain.deb...@gmail.com wrote:



Did you mean :
 pg_table_size() = base table + toast table
 pg_indexes_size() = base indexes + toast indexes
?


Since you always get a TOAST index automatically, it makes sense to include 
it in pg_table_size().


--
Thanks

Bernd



Re: [HACKERS] Table size does not include toast size

2009-12-22 Thread Bernd Helmle



--On 21 December 2009 12:02:02 -0500 Greg Smith g...@2ndquadrant.com 
wrote:



Tom Lane wrote:

Perhaps invent  pg_table_size() = base table + toast table + toast index
and pg_indexes_size() = all other indexes for table
giving us the property pg_table_size + pg_indexes_size =
pg_total_relation_size


Right; that's exactly the way I'm computing things now, I just have to
crawl way too much catalog data to do it.  I also agree that if we
provide pg_table_size, the issue of "pg_relation_size doesn't do what I
want" goes away without needing to even change the existing
documentation--people don't come to that section looking for "relation",
they're looking for "table".

Bernd, there's a basic spec if you have time to work on this.


I'll see if I can get some time for it during the Christmas vacation (it's 
been on my radar for a long time). I'm still working on the NOT NULL 
pg_constraint representation and would like to propose a patch for that 
fairly soon.
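
For reference, the totals Tom describes can already be computed by hand with 
the existing 8.4 functions, which is roughly what the new functions would 
wrap. A sketch, assuming a plain heap table ('mytable' is just an example):

    SELECT pg_relation_size(c.oid)
           + COALESCE(pg_total_relation_size(NULLIF(c.reltoastrelid, 0)), 0)
             AS table_size,   -- base table + toast table + toast index
           pg_total_relation_size(c.oid)
           - pg_relation_size(c.oid)
           - COALESCE(pg_total_relation_size(NULLIF(c.reltoastrelid, 0)), 0)
             AS indexes_size  -- all other indexes
      FROM pg_class c
     WHERE c.oid = 'mytable'::regclass;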


--
Thanks

Bernd



Re: [HACKERS] Streaming replication and non-blocking I/O

2009-12-22 Thread Fujii Masao
On Tue, Dec 22, 2009 at 8:49 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ah, I see. The changes were not included in the merge commit after all;
 I had simply forgotten to git add them. Sorry about that - they should be
 there now.

Thanks for doing the git push again!

But the compilation still fails. The attached patch addresses this problem.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/Makefile
--- b/src/backend/Makefile
***
*** 34,41  endif
  
  OBJS = $(SUBDIROBJS) $(LOCALOBJS) $(top_builddir)/src/port/libpgport_srv.a
  
! # We put libpgport into OBJS, so remove it from LIBS; also add libldap and libpq
! LIBS := $(filter-out -lpgport, $(LIBS)) $(LDAP_LIBS_BE) $(libpq)
  
  # The backend doesn't need everything that's in LIBS, however
  LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
--- 34,41 
  
  OBJS = $(SUBDIROBJS) $(LOCALOBJS) $(top_builddir)/src/port/libpgport_srv.a
  
! # We put libpgport into OBJS, so remove it from LIBS; also add libldap
! LIBS := $(filter-out -lpgport, $(LIBS)) $(LDAP_LIBS_BE)
  
  # The backend doesn't need everything that's in LIBS, however
  LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
*** a/src/backend/postmaster/walreceiverproc/Makefile
--- b/src/backend/postmaster/walreceiverproc/Makefile
***
*** 18,24  OBJS = walreceiverproc.o
  SHLIB_LINK = $(libpq)
  NAME = walreceiverproc
  
! all: all-shared-lib
  
  include $(top_srcdir)/src/Makefile.shlib
  
--- 18,24 
  SHLIB_LINK = $(libpq)
  NAME = walreceiverproc
  
! all: submake-libpq all-shared-lib
  
  include $(top_srcdir)/src/Makefile.shlib
  



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 16:09 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  If someone does
  add this, it will require careful thought about how to avoid introducing
  further subtle ways to break HS, all of which will need testing and
  re-testing to avoid regression.
 
 Well, I *did* add that, but you removed it...

It was already on there when you added the second one. It is still there
now, even after I removed the duplicate entry.

By "add" I meant writing the feature, testing it, and then supporting it
afterwards, not re-discussing the editing of the Wiki.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Table size does not include toast size

2009-12-22 Thread Tom Lane
Cédric Villemain cedric.villemain.deb...@gmail.com writes:
 2009/12/21 Tom Lane t...@sss.pgh.pa.us:
 Perhaps invent  pg_table_size() = base table + toast table + toast index
 and             pg_indexes_size() = all other indexes for table
 giving us the property pg_table_size + pg_indexes_size =
 pg_total_relation_size

 Did you mean:
  pg_table_size() = base table + toast table
  pg_indexes_size() = base indexes + toast indexes
 ?

No.

regards, tom lane



Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Tue, Dec 22, 2009 at 11:45 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 AFAICS it is always a bug to be in a different memory context
 in tuplestore_put* than in tuplestore_begin_heap(), so it would be more
 robust to not put the burden on the callers.

 I thought there were comments specifically explaining why it was done
 that way but I don't recall what they said.

I think it was just a performance optimization.  It's probably not
measurable though; even in the in-memory case there's at least a palloc
inside the put() function, no?

regards, tom lane



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Florian Pflug

On 22.12.09 13:21 , Simon Riggs wrote:

On Tue, 2009-12-22 at 12:32 +0100, Florian Pflug wrote:

Imagine a reporting database where all transactions but a few daily
bulk imports are read-only. To spread the load, you do your bulk
loads on the master, but run the reporting queries against a
read-only HS slave. Now you take the master down for maintenance.
Since all clients but the bulk loader use the slave already, and
since the bulk loads can be deferred until after the maintenance
window closes again, you don't actually do a fail-over.

Now you're already pointing at your foot with the gun. All it
takes to ruin your day is *some* reason for the slave to restart.
Maybe due to a junior DBA's typo, or maybe due to a bug in
postgres. Anyway, once the slave is down, it won't come up until you
manage to get the master up and running again. And this limitation
is pretty surprising, since one would assume that if the slave
survives a *crash* of the master, it'd certainly survive a simple
*shutdown*.


Well, you either wait for master to come up again and restart, or you
flip into normal mode and keep running queries from there. You aren't
prevented from using the server, except by your own refusal to
failover.


Very true. However, that "refusal", as you put it, might actually be the
most sensible thing to do in a lot of setups. Not everyone needs extreme
up-time guarantees, and for those people setting up, testing and
*continuously* exercising fail-over is just not worth the effort.
Especially since fail-over with asynchronous replication is tricky to
get right if you want to avoid data loss.

So I still believe that there are very real use-cases for HS where this
limitation can be quite a PITA.

But you are of course free to work on whatever you feel like, and
probably need to satisfy your client's needs first. So I'm in no way
implying that this issue is a must-fix issue, or that you're in any way
obliged to take care of it. I merely wanted to make the point that there
*are* valid use-cases where this behavior is not ideal.

best regards,
Florian Pflug



Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Patch against CVS HEAD to do that and fix the reported bug attached. Now
 that the tuplestore_put* switches to the right memory context, we could
 remove that from all the callers, but this patch only does it for pl_exec.c.

BTW, I'm not convinced that the owner-switchery you added to pl_exec.c
is necessary/appropriate.  Under what circumstances would that be a good
idea?

regards, tom lane



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Greg Stark
On Tue, Dec 22, 2009 at 3:32 PM, Florian Pflug fgp.phlo@gmail.com wrote:
 Well, you either wait for master to come up again and restart, or you
 flip into normal mode and keep running queries from there. You aren't
 prevented from using the server, except by your own refusal to
 failover.

 Very true. However, that refusal as you put it might actually be the
 most sensible thing to do in a lot of setups. Not everyone needs extreme
 up-time guarantees, and for those people setting up, testing and
 *continuously* exercising fail-over is just not worth the effort.
 Especially since fail-over with asynchronous replication is tricky to
 get right if you want to avoid data loss.

To say nothing of the fact that the replica might not be a suitable master
at all. It could be running on inferior hardware or be on a separate
network perhaps too slow to reach from production services.

HA is not the only use case for HS, or even the main one, in my experience.


-- 
greg



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 16:32 +0100, Florian Pflug wrote:

 But you are of course free to work on whatever you feel like, and
 probably need to satisfy your client's needs first.

Painting me as whimsical or mercenary isn't likely to change my mind.

IMHO this isn't one of the more important features, for the majority, in
this release. I do intend to check that.

If there are people that believe otherwise, knock yourselves out.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Heikki Linnakangas
Tom Lane wrote:
 Greg Stark gsst...@mit.edu writes:
 On Tue, Dec 22, 2009 at 11:45 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 AFAICS it is always a bug to be in a different memory context
 in tuplestore_put* than in tuplestore_begin_heap(), so it would be more
 robust to not put the burden on the callers.
 
 I thought there were comments specifically explaining why it was done
 that way but I don't recall what they said.
 
 I think it was just a performance optimization.  It's probably not
 measurable though; even in the in-memory case there's at least a palloc
 inside the put() function, no?

Yes. And many of the callers do the memory context switching dance anyway.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 15:38 +, Greg Stark wrote:
 On Tue, Dec 22, 2009 at 3:32 PM, Florian Pflug fgp.phlo@gmail.com wrote:
  Well, you either wait for the master to come up again and restart, or you
  flip into normal mode and keep running queries from there. You aren't
  prevented from using the server, except by your own refusal to
  fail over.
 
  Very true. However, that "refusal", as you put it, might actually be the
  most sensible thing to do in a lot of setups. Not everyone needs extreme
  up-time guarantees, and for those people setting up, testing and
  *continuously* exercising fail-over is just not worth the effort.
  Especially since fail-over with asynchronous replication is tricky to
  get right if you want to avoid data loss.
 
 To say nothing of the fact that the replica might not be a suitable master
 at all. It could be running on inferior hardware or be on a separate
 network perhaps too slow to reach from production services.
 
 HA is not the only use case for HS, or even the main one, in my experience.

I can invent scenarios in which all the outstanding issues cause
problems. What I have to do is balance which of those are more likely and
which have useful workarounds. This is about priority and, in particular,
my priority. IMHO my time would be misspent working on this issue,
though I will check whether other users feel that way too.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 By add I meant to write the feature, test and then support it
 afterwards, not to re-discuss editing the Wiki.

That's exactly what I meant too. I *did* write the feature, but you
removed it before committing.

I can extract the removed parts from the git repository and send them to
you as a new patch for review, if you'd like.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCH] remove redundant ownership checks

2009-12-22 Thread Robert Haas
2009/12/22 KaiGai Kohei kai...@ak.jp.nec.com:
 (2009/12/21 12:53), Robert Haas wrote:
  On Thu, Dec 17, 2009 at 7:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  KaiGai Kohei kai...@ak.jp.nec.com writes:
 [ patch to remove EnableDisableRule's permissions check ]

 I don't particularly like this patch, mainly because I disagree with
 randomly removing permissions checks without any sort of plan about
 where they ought to go.

 So where ought they to go?

 If we're going to start moving these checks around we need a very
 well-defined notion of where permissions checks should be made, so that
 everyone knows what to expect.  I have not seen any plan for that.

 This seems to me to get right to the heart of the matter.  When I
 submitted my machine-readable explain patch, you critiqued it for
 implementing half of an abstraction layer, and it seems to me that our
 current permissions-checking logic has precisely the same issue.  We
 consistently write code that starts by checking permissions and then
 moves right along to implementing the action.  Those two things need
 to be severed.  I see two ways to do this.  Given a function that (A)
 does some prep work, (B) checks permissions, and (C) performs the
 action, we could either:

 1. Make the existing function do (A) and (B) and then call another
 function to do (C), or
 2. Make the existing function do (A), call another function to do (B),
 and then do (C) itself.

 I'm not sure which will work better, but I think making a decision
 about which way to do it and how to name the functions would be a big
 step towards having a coherent plan for this project.

 We have a mixed policy in the current implementation.

 The point is which database object is being handled in a given function.

 If we consider a rewrite rule a database object, not a property of
 the relation, it seems to me correct to apply permission checks in
 EnableDisableRule(), because it handles a given rewrite rule.

 If we consider a rewrite rule a property of a relation, not an independent
 database object, it seems to me we should apply permission checks in
 ATPrepCmd(), which handles a relation, rather than in EnableDisableRule().

 My patch takes the latter perspective, because pg_rewrite entries don't
 have their own ownership and access privileges, and are always owned by
 a certain relation.

That's somewhat separate from the point I was making, but it's a good
point all the same.

...Robert



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 18:17 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  By add I meant to write the feature, test and then support it
  afterwards, not to re-discuss editing the Wiki.
 
 That's exactly what I meant too. I *did* write the feature, but you
 removed it before committing.

I removed it because you showed it wouldn't work. If you want to fix
that problem, test, commit and support it, go right ahead.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 I think it was just a performance optimization.  It's probably not
 measurable though; even in the in-memory case there's at least a palloc
 inside the put() function, no?

 Yes. And many of the callers do the memory context switching dance anyway.

Yeah, I was just noticing that.  We should go around and clean those up
if we apply this change.

Looking at the CVS history, I think the reason tuplestore doesn't do its
own memory context switch is that it was cloned from tuplesort, which
didn't either at the time.  But several years ago we changed tuplesort
to be safer about this (it actually keeps its own memory context now),
so it's just inconsistent that tuplestore still exposes the risk.

The ownership business is another story though.  tuplesort doesn't
make any attempt to defend itself against resource-owner changes.  If we
need this for tuplestore I bet we need it for tuplesort too; but do we?

regards, tom lane



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Tue, 2009-12-22 at 18:17 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
 By add I meant to write the feature, test and then support it
 afterwards, not to re-discuss editing the Wiki.
 That's exactly what I meant too. I *did* write the feature, but you
 removed it before committing.
 
 I removed it because you showed it wouldn't work.

I did?

I believe this is the discussion that led to you removing it (6th of
December, thread Hot Standby, recent changes):

Simon Riggs wrote:
  On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:
  4. Need to handle the case where master is started up with
  wal_standby_info=true, shut down, and restarted with
  wal_standby_info=false, while the standby server runs continuously. And
  the code in StartupXLog() to initialize recovery snapshot from a
  shutdown checkpoint needs to check that too.
 
 I don't really understand the use case for shutting down the server and
 then using it as a HS base backup. Why would anyone do that? Why would
 they have their server down for potentially hours, when they can take
 the backup while the server is up? If the server is idle, it can be
 up-and-idle just as easily as down-and-idle, in which case we wouldn't
 need to support this at all. Adding yards of code for this capability
 isn't important to me. I'd rather strip the whole lot out than keep
 fiddling with a low priority area. Please justify this as a real world
 solution before we continue trying to support it.

The issue I mentioned had nothing to do with starting from a shutdown
checkpoint (it's still a problem if you keep the standby running
through the restart cycle in the master) - but maybe you thought it
was? Or was there something else?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Largeobject Access Controls (r2460)

2009-12-22 Thread Robert Haas
2009/12/22 KaiGai Kohei kai...@ak.jp.nec.com:
 (2009/12/21 9:39), KaiGai Kohei wrote:
 (2009/12/19 12:05), Robert Haas wrote:
 On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Oh.  This is more complicated than it appeared on the surface.  It
 seems that the string BLOB COMMENTS actually gets inserted into
 custom dumps somewhere, so I'm not sure whether we can just change it.
 Was this issue discussed at some point before this was committed?
 Changing it would seem to require inserting some backward
 compatibility code here.  Another option would be to add a separate
 section for BLOB METADATA, and leave BLOB COMMENTS alone.  Can
 anyone comment on what the Right Thing To Do is here?

 The BLOB COMMENTS label is, or was, correct for what it contained.
 If this patch has usurped it to contain other things

 It has.

 I would argue
 that that is seriously wrong.  pg_dump already has a clear notion
 of how to handle ACLs for objects.  ACLs for blobs ought to be
 made to fit into that structure, not dumped in some random place
 because that saved a few lines of code.

 OK.  Hopefully KaiGai or Takahiro can suggest a fix.

 The patch has grown larger than I expected, because the way large
 objects are handled is far from that of any other object class.

 Here are three points:

 1) The new BLOB ACLS section was added.

 It is a single-purpose section to describe GRANT/REVOKE statements
 on large objects, and the BLOB COMMENTS section was reverted to hold
 only descriptions.

 Because we need to assume a case where the database holds a massive
 number of large objects, it is not reasonable to store them using
 dumpACL(). It chains each ACL entry into the list of TOC entries,
 which are then dumped. That means pg_dump might have to register a
 massive number of large objects in local memory.

 Currently, we also store GRANT/REVOKE statements in the BLOB COMMENTS
 section, which is confusing: even if pg_restore is launched with the
 --no-privileges option, it cannot ignore GRANT/REVOKE statements
 on large objects. This fix makes it possible to distinguish ACLs on
 large objects from other properties, and to handle them correctly.

 2) The BLOBS section was separated per database user.

 Currently, the BLOBS section does not carry information about the
 owner of the large objects to be restored. So we tried to alter the
 ownership in the BLOB COMMENTS section, but that was incorrect.

 The --use-set-session-authorization option requires restoring the
 ownership of objects without ALTER ... OWNER TO statements.
 So we need to set the correct database username in the section
 properties.

 This patch renames hasBlobs() to getBlobs(), and changes its purpose.
 It registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS entries for
 each large object owner, if necessary.
 For example, if there are five users owning large objects, getBlobs()
 registers TOC entries for each of the five users, and dumpBlobs(),
 dumpBlobComments() and dumpBlobAcls() are each invoked five times
 with the username.

 3) _LoadBlobs()

 For regular database object classes, _printTocEntry() can inject an
 ALTER xxx OWNER TO ... statement on the restored object based on
 the ownership described in the section header.
 However, we cannot use this infrastructure for large objects as-is,
 because one BLOBS section can restore multiple large objects.

 _LoadBlobs() is a routine to restore the large objects within a
 certain section. This patch modifies the routine to inject an ALTER
 LARGE OBJECT loid OWNER TO user statement for each large object,
 based on the ownership of the section
 (if --use-set-session-authorization is not given).
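
A sketch of what point 3 describes, with invented variable names
(ahprintf() and fmtId() are existing pg_dump/pg_restore helpers; the
surrounding condition is illustrative, not the patch itself):

/*
 * After restoring each large object in a BLOBS section, emit an
 * ownership change based on the owner recorded in the section
 * header, unless SET SESSION AUTHORIZATION is used instead.
 */
if (!ropt->use_setsessauth && blobOwner != NULL)
    ahprintf(AH, "ALTER LARGE OBJECT %u OWNER TO %s;\n",
             blobOid, fmtId(blobOwner));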


 $ diffstat pgsql-fix-pg_dump-blob-privs.patch
  pg_backup_archiver.c |    4
  pg_backup_custom.c   |   11 !
  pg_backup_files.c    |    9 !
  pg_backup_tar.c      |    9 !
  pg_dump.c            |  312 +++!!
  pg_dump.h            |    9 !
  pg_dump_sort.c       |    8 !
  7 files changed, 68 insertions(+), 25 deletions(-), 269 modifications(!)

I will review this sooner if I have time, but please make sure it gets
added to the next CommitFest so we don't lose it.  I think it also
needs to be added here, since AFAICS this is a must-fix for 8.5.

http://wiki.postgresql.org/wiki/PostgreSQL_8.5_Open_Items

Thanks,

...Robert



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 18:40 +0200, Heikki Linnakangas wrote:

 The issue I mentioned had nothing to do with starting from a shutdown
 checkpoint - it's still a problem if you keep the standby running
 through the restart cycle in the master) - but maybe you thought it was?
 Or was there something else?

Strangely enough that exact same problem already happens with
archive_mode, and we see a fix coming for that soon also.

That fix takes the same approach as HS already takes. HS will flip out
when it sees the next record (checkpoint). The only way out is to
re-take base backup, just the same.

Even after that fix is applied, HS will still work as well as
archive-mode, so if anything HS is ahead of other functionality.

Fixing obscure cases where people actively try to get past configuration
options is not a priority. I'm not sure why you see it as important,
especially when you've argued we don't even need the parameter in the
first place.

You've been perfectly happy for *years* with the situation that recovery
would fail if max_prepared_transactions was not set correctly. You're
not going to tell me you never noticed? Why is avoidance of obvious
misconfiguration of HS such a heavy priority when nothing else ever was?

I'm going to concentrate on fixing important issues. I'd rather you
helped with those.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Small change of the HS document

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 11:25 +0900, Fujii Masao wrote:

 I found there is no primary tag for the HS parameters
 in config.sgml. Attached patch adds that tag.

Thanks

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 You've been perfectly happy for *years* with the situation that recovery
 would fail if max_prepared_transactions was not set correctly. You're
 not going to tell me you never noticed? Why is avoidance of obvious
 misconfiguration of HS such a heavy priority when nothing else ever was?

That's not a priority, and I never said it was.

It almost sounds like we're in violent agreement: this issue of
flipping wal_standby_info in the master has nothing to do with the
removal of the capability to start standby from a shutdown checkpoint.

So what *was* the reason? Was there something wrong with it? If not,
please put it back.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Simon Riggs
On Thu, 2009-11-26 at 17:02 +0900, Fujii Masao wrote:
 On Thu, Nov 26, 2009 at 4:55 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  Fujii Masao wrote:
  In current SR, since a backup history file is not replicated,
  the standby always starts an archive recovery without a backup
  history file, and a wrong minRecoveryPoint might be used. This
  is not a problem for SR itself, but would cause trouble when
  SR cooperates with Hot Standby.
 
  But the backup history file is included in the base backup you start
  replication from, right?
 
 No. A backup history file is created by pg_stop_backup().
 So it's not included in the base backup.

The backup history file is a slightly quirky way of doing things and
was designed when the transfer mechanism was file-based.

Why don't we just write a new xlog record that contains the information
we need? Copy the contents of the backup history file into the WAL
record, just like we do with prepared transactions. That way it will be
streamed to the standby without any other code being needed for SR,
while we don't need to retest warm standby to check that still works
also.
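
Schematically, such a record might carry little more than the history
file does today; the struct below is purely hypothetical, sketched for
illustration only:

/*
 * Hypothetical "backup end" WAL record payload (not from any patch).
 * It mirrors what the backup history file already records.
 */
typedef struct xl_backup_end
{
    XLogRecPtr  startpoint;   /* START WAL LOCATION of the base backup */
    XLogRecPtr  stoppoint;    /* where pg_stop_backup() finished */
    TimestampTz stop_time;    /* STOP TIME from the history file */
    char        label[1];     /* backup label, variable length */
} xl_backup_end;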

(The thread diverges onto a second point and this first point seems to
have been a little forgotten)

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 14:40 +0900, Fujii Masao wrote:
 On Sat, Dec 19, 2009 at 1:03 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  I don't think it's worthwhile to modify pg_stop_backup() like that. We
  should address the general problem. At the moment, you're fine if you
  also configure WAL archiving and log file shipping, but it would be nice
  to have some simpler mechanism to avoid the problem. For example, a GUC
  in master to retain all log files (including backup history files) for X
  days. Or some way for to register the standby with the master so that
  the master knows it's out there, and still needs the logs, even when
  it's not connected.
 
 I propose the new GUC replication_reserved_segments (better name?) which
 determines the maximum number of WAL files held for the standby.
 
 Design:
 
 * Only the WAL files which are replication_reserved_segments segments older
   than the current write segment can be recycled. IOW, we can think that the
   standby which falls replication_reserved_segments segments behind is always
   connected to the primary, and the WAL files needed for the active standby
   are not recycled.

(I don't fully understand your words above, sorry.)
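
For what it's worth, one plausible reading of that rule is the
following sketch (the helper and the linear segment numbering are
invented for illustration; only the GUC name comes from the proposal):

/*
 * A segment may be recycled only if it is at least
 * replication_reserved_segments older than the segment currently
 * being written; anything newer is held for the standby.
 */
static bool
SegmentIsRecyclable(uint32 segno, uint32 write_segno,
                    int replication_reserved_segments)
{
    return segno + replication_reserved_segments < write_segno;
}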

Possibly an easier way would be to have a size limit, not a number of
segments. Something like max_reserved_wal = 240GB. We made that change
to shared_buffers a few years back and it was really useful.

The problem I foresee is that doing it this way puts an upper limit on
the size of database we can replicate. While we do base backup and
transfer it we must store WAL somewhere. Forcing us to store the WAL on
the master while this happens could be very limiting.

 * Disjoin the standby which falls more than replication_reserved_segment
   segments behind, in order to avoid the disk full failure, i.e., the
   primary server's PANIC error. This would be only possible in asynchronous
   replication case.

Or at the start of replication.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Heikki Linnakangas
Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Patch against CVS HEAD to do that and fix the reported bug attached. Now
 that the tuplestore_put* switches to the right memory context, we could
 remove that from all the callers, but this patch only does it for pl_exec.c.
 
 BTW, I'm not convinced that the owner-switchery you added to pl_exec.c
 is necessary/appropriate.  Under what circumstances would that be a good
 idea?

A PL/pgSQL function normally runs in whatever resource owner is current
when the function is called. When we allocate the tuplestore for return
tuples, it's associated with the current resource owner.

But if you have an exception-block, we start a new subtransaction and
switch to the subtransaction resource owner. If you have a RETURN
NEXT/QUERY in the block, the tuplestore (or the temporary file backing
it, to be precise) is initialized into the subtransaction resource
owner, which is released at subtransaction commit. The subtransaction
resource owner is not the right owner for the tuplestore holding return
tuples.

We already take care to use the right memory context for the tuplestore,
but now that temp files are associated with resource owners, we need to
use the right resource owner as well.
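
Concretely, the switch around each put would look something like this
(a sketch only; the estate field names are assumptions based on the
description above, not necessarily what the patch uses):

ResourceOwner oldowner = CurrentResourceOwner;

/*
 * Make the temp file backing the tuplestore belong to the owner the
 * tuplestore was created under, not the subtransaction's owner.
 */
CurrentResourceOwner = estate->tuple_store_owner;
tuplestore_puttupleslot(estate->tuple_store, slot);
CurrentResourceOwner = oldowner;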

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Segfault from PL/Perl Returning vstring

2009-12-22 Thread David E. Wheeler
On Dec 21, 2009, at 9:04 PM, Andrew Dunstan wrote:

 I cannot reproduce this.  I tested with perl 5.10.1 which is the latest 
 reported stable release at http://www.cpan.org/src/README.html, on an 8.4.2 
 UTF8 database, and with the same Safe and Encode module versions as above.

I've replicated it all the way back to 8.0. I'd be happy to give you a login to 
my box.

Best,

David


Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Bruce Momjian
Simon Riggs wrote:
 On Mon, 2009-12-21 at 18:42 +0900, Hiroyuki Yamada wrote:
 
  Do you think this problem is must-fix for the final release ?
 
 We should be clear that this is a behaviour I told you about, not a
 shock discovery by yourself. There is no permanent freeze, just a wait,
 from which the Startup process wakes up at the appropriate time. There
 is no crash or hang as is usually implied by the word freeze.
 
 It remains to be seen whether this is a priority for usability
 enhancement in this release. There are other issues as well and it is
 doubtful that every user will be fully happy with the functionality in
 this release. I will work on things in the order in which I understand
 them to be important for the majority, given my time and budget
 constraints and the resolvability of the issues.
 
 When you report bugs, I say thanks. When you start agitating about
 already-documented restrictions and I see which other software you
 promote, I think you may have other motives. Regrettably that reduces
 the weight I give your claims, in relation to other potential users.

Simon, where did this come from?  Other software?  

I think Simon's comments are way off base here and only serve to
increase tension in this discussion.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Fujii Masao
On Wed, Dec 23, 2009 at 2:42 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The backup history file is a slightly quirky way of doing things and
 was designed when the transfer mechanism was file-based.

 Why don't we just write a new xlog record that contains the information
 we need? Copy the contents of the backup history file into the WAL
 record, just like we do with prepared transactions. That way it will be
 streamed to the standby without any other code being needed for SR,
 while we don't need to retest warm standby to check that still works
 also.

This means that we can replace a backup history file with the corresponding
xlog record. I think that we should simplify the code by making the
replacement complete rather than just adding a new xlog record. Thoughts?

BTW, in the current SR code, the capability to replicate a backup history
file has been implemented. But if there is a better and simpler idea, I'll
adopt it.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Simon Riggs
On Wed, 2009-12-23 at 03:28 +0900, Fujii Masao wrote:
 On Wed, Dec 23, 2009 at 2:42 AM, Simon Riggs si...@2ndquadrant.com wrote:
  The backup history file is a slightly quirky way of doing things and
  was designed when the transfer mechanism was file-based.
 
  Why don't we just write a new xlog record that contains the information
  we need? Copy the contents of the backup history file into the WAL
  record, just like we do with prepared transactions. That way it will be
  streamed to the standby without any other code being needed for SR,
  while we don't need to retest warm standby to check that still works
  also.
 
 This means that we can replace a backup history file with the corresponding
 xlog record. I think that we should simplify the code by making the
 replacement complete rather than just adding a new xlog record. Thoughts?

We can't do that because it would stop file-based archiving from
working. I don't think we should deprecate that for another release at
least.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Small Bug in GetConflictingVirtualXIDs

2009-12-22 Thread Andres Freund
On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
 On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
  On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
   Giving the drop database a snapshot is not the answer. I expect Andres
   to be able to fix this with a simple patch that would not affect the
   case of normal running.
 
  Actually it's less simple than I had thought at first - I don't think
  the code ever handled that correctly.
  I might be wrong there, my knowledge of the involved code is a bit
  sparse... The whole conflict resolution builds on the concept of
  waiting for a VXid, but an idle backend does not have a valid vxid.
  That's correct, right?
 
 Yes, that's correct. I'll take this one back then.
So you're writing a fix, or shall I?

Andres



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 19:30 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  You've been perfectly happy for *years* with the situation that recovery
  would fail if max_prepared_transactions was not set correctly. You're
  not going to tell me you never noticed? Why is avoidance of obvious
  misconfiguration of HS such a heavy priority when nothing else ever was?
 
 That's not a priority, and I never said it was.
 
 It almost sounds like we're in violent agreement: this issue of
 flipping wal_standby_info in the master has nothing to do with the
 removal of the capability to start standby from a shutdown checkpoint.

I removed the capability to start at shutdown checkpoints because you
said it would cause this bug. That gives two choices: fix the bug,
remove the feature. I don't think it is a priority to support that
feature, so I removed it in favour of other work.

I will work on issues in priority order and this was already on the list
and remains so. I don't have endless time, so realistically, given its
current priority it is unlikely to be addressed in this release.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Florian Pflug

On 22.12.09 16:45, Simon Riggs wrote:

 On Tue, 2009-12-22 at 16:32 +0100, Florian Pflug wrote:

  But you are of course free to work on whatever you feel like, and
  probably need to satisfy your client's needs first.

 Alluding to me as whimsical or mercenary isn't likely to change my
 mind.

Simon, you *completely* misunderstood my last paragraph!

I never intended to call you whimsical or mercenary, and I honestly
don't believe I did. The only thing I attributed to you was seeing HS
mostly as a solution for HA setups, whereas I felt that it has quite a
few use-cases besides that. Plus that your view of what the important
use-cases are is influenced by the projects you usually work on, and
that it's perfectly reasonable for your priorities to reflect that view.

None of this was meant as an insult of any kind.

best regards,
Florian Pflug



[HACKERS] Streaming Rep - 2-phase backups and reducing time to full replication

2009-12-22 Thread Simon Riggs
Some ideas to improve current behaviour of SR

http://wiki.postgresql.org/wiki/Streaming_Replication

The current startup process is copied below. Step (7) raises some issues
if it is a very long step, notably that the master may fill up with WAL
and then break off the connection before replication is fully configured.

7. Make a base backup of the primary server, load this data onto the
standby.

8. Set up XLOG archiving, connections and authentication in the standby
server like the primary, so that the standby might work as a primary
after failover.

9. Create a recovery command file in the standby server; the following
parameters are required for streaming replication.

10. Start postgres in the standby server. It will start streaming
replication.


It occurs to me to ask which files we need as a minimum before we can
begin step (10)? If we could start step (10) before step (7) is complete
then we would avoid many of our space problems and replication would
enter safe mode considerably sooner, in some cases dozens of hours
earlier.

We read the recovery.conf at the start of StartupXLog(). So by that
point we need only the following files:
* All *.conf files
* pg_control (and probably much of the rest of global/ directory)

Some very quick surgery on a current-version data directory shows this
is correct, apart from the call to RelationCacheInitFileRemove() which
can be altered to accept a missing directory as proof that the file has
been removed.

If we then think of the starting procedure as happening in two parts:
i) sufficient startup to get to the point where we bring up the
walreceiver, while the startup process waits for further confirmation
ii) following further confirmation, the startup process begins
recovering the database

So if we do the base backup in two stages the sequence of actions could
become

9. Create a recovery command file in the standby server with parameters
required for streaming replication.

7. (a) Make a base backup of the minimal essential files from the
primary server, and load this data onto the standby.

10. Start postgres in the standby server. It will start streaming
replication.

7. (b) Continue with second phase of base backup, copying all remaining
files, ending with pg_stop_backup()

* The next step is to wake the startup process so it can continue recovery


We don't need to introduce another call; we just need a mechanism for
telling the startup process to sleep because we are doing a two-phase
backup, and another mechanism for waking it when the whole backup is
complete. That sounds like a recovery.conf parameter and an additional
kind of trigger file, perhaps the backup file?
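
The waiting side could be as simple as this sketch (the function and
the idea of polling for a trigger file are illustrative only):

#include <sys/stat.h>

/*
 * Illustrative only: block the startup process until a hypothetical
 * "backup complete" trigger file shows up.
 */
static void
WaitForBackupCompletion(const char *triggerPath)
{
    struct stat statbuf;

    while (stat(triggerPath, &statbuf) != 0)
        pg_usleep(1000000L);    /* recheck once per second */
}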

This seems like a simple and useful option for 8.5

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 BTW, I'm not convinced that the owner-switchery you added to pl_exec.c
 is necessary/appropriate.  Under what circumstances would that be a good
 idea?

 A PL/pgSQL function normally runs in whatever resource owner is current
 when the function is called. When we allocate the tuplestore for return
 tuples, it's associated with the current resource owner.

 But if you have an exception-block, we start a new subtransaction and
 switch to the subtransaction resource owner. If you have a RETURN
 NEXT/QUERY in the block, the tuplestore (or the temporary file backing
 it, to be precise) is initialized into the subtransaction resource
 owner, which is released at subtransaction commit.

Got it.  So doesn't tuplesort have the same issue?

The patch definitely requires more than zero comments.

regards, tom lane



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Fujii Masao
On Wed, Dec 23, 2009 at 2:49 AM, Simon Riggs si...@2ndquadrant.com wrote:
 (I don't fully understand your words above, sorry.)

NM ;-)

 Possibly an easier way would be to have a size limit, not a number of
 segments. Something like max_reserved_wal = 240GB. We made that change
 to shared_buffers a few years back and it was really useful.

For me, a number of segments is intuitive because checkpoint_segments,
which is closely connected with the new parameter, is still expressed
as a number of segments. But if more people like a size limit, I'll make
that change.

 The problem I foresee is that doing it this way puts an upper limit on
 the size of database we can replicate. While we do base backup and
 transfer it we must store WAL somewhere. Forcing us to store the WAL on
 the master while this happens could be very limiting.

Yes. If the size of pg_xlog is small relative to the size of the
database, the primary would not be able to hold all the WAL files
required for the standby, so we would need to use the restore_command,
which retrieves the WAL files from the master's archival area. I'm OK
with such an extra operation being required in that special case, for now.

 * Disjoin the standby which falls more than replication_reserved_segment
   segments behind, in order to avoid the disk full failure, i.e., the
   primary server's PANIC error. This would be only possible in asynchronous
   replication case.

 Or at the start of replication.

Yes. I think that avoidance of the primary's PANIC error should be
given priority over a smooth start of replication.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 19:53 +0100, Florian Pflug wrote:

 None of this was meant as an insult of any kind.

Then I apologise completely.

I've clearly been working too hard and will retire for some rest (even
though that is not listed as a task on the Wiki).

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 The backup history file is a slightly quirky way of doing things and
 was designed when the transfer mechanism was file-based.

 Why don't we just write a new xlog record that contains the information
 we need?

Certainly not.  The history file is, in the restore-from-archive case,
needed to *find* the xlog data.

However, it's not clear to me why SR should have any need for it.
It's not doing restore from archive.

regards, tom lane



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread David E. Wheeler
On Dec 22, 2009, at 11:02 AM, Simon Riggs wrote:

 I've clearly been working too hard and will retire for some rest (even
 though that is not listed as a task on the Wiki).

Someone add it!

David



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Fujii Masao
On Wed, Dec 23, 2009 at 3:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
 This means that we can replace a backup history file with the corresponding
 xlog record. I think that we should simplify the code by making the
 replacement complete rather than just adding a new xlog record. Thoughts?

 We can't do that because it would stop file-based archiving from
 working. I don't think we should deprecate that for another release at
 least.

Umm... ISTM that we can do that even in the file-based archiving case:

* pg_stop_backup writes the xlog record corresponding to a backup
   history file, and requests the WAL file switch.

* In PITR or warm standby, the startup process just marks the database
   as inconsistent until it has read that xlog record.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] alpha3 release schedule?

2009-12-22 Thread David Fetter
On Tue, Dec 22, 2009 at 11:04:29AM -0800, David Wheeler wrote:
 On Dec 22, 2009, at 11:02 AM, Simon Riggs wrote:
 
  I've clearly been working too hard and will retire for some rest (even
  though that is not listed as a task on the Wiki).
 
 Someone add it!

Done! :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 14:02 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  The backup history file is a slightly quirky way of doing things and
  was designed when the transfer mechanism was file-based.
 
  Why don't we just write a new xlog record that contains the information
  we need?
 
 Certainly not.  The history file is, in the restore-from-archive case,
 needed to *find* the xlog data.
 
 However, it's not clear to me why SR should have any need for it.
 It's not doing restore from archive.

Definitely should not make this *just* in WAL, files still required,
agreed.

It's needed to find the place where the backup stopped, so it defines
the safe stopping point. We could easily pass that info via WAL, when
streaming. It doesn't actually matter until we try to failover.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Fujii Masao
On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote:
 It's needed to find the place where the backup stopped, so it defines
 the safe stopping point. We could easily pass that info via WAL, when
 streaming. It doesn't actually matter until we try to failover.

Right. And it's also needed to cooperate with HS, which begins accepting
read-only queries after recovery reaches that safe stopping point.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Simon Riggs
On Wed, 2009-12-23 at 04:15 +0900, Fujii Masao wrote:
 On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote:
  It's needed to find the place where the backup stopped, so it defines
  the safe stopping point. We could easily pass that info via WAL, when
  streaming. It doesn't actually matter until we try to failover.
 
 Right. And, it's also needed to cooperate with HS which begins accepting
 read-only queries after a recovery reaches that safe stopping point.

Agreed, hence my interest!

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote:
 It's needed to find the place where the backup stopped, so it defines
 the safe stopping point. We could easily pass that info via WAL, when
 streaming. It doesn't actually matter until we try to failover.

 Right. And, it's also needed to cooperate with HS which begins accepting
 read-only queries after a recovery reaches that safe stopping point.

OK, so the information needed in the WAL record doesn't even include
most of what is in the history file, correct?  What you're actually
talking about is identifying a WAL position.  Maybe you'd want to
provide the backup label for identification purposes, but not much else.
In that case I concur that this is a better solution than hacking up
something to pass over the history file.

regards, tom lane



Re: [HACKERS] Segfault from PL/Perl Returning vstring

2009-12-22 Thread Alvaro Herrera
David E. Wheeler wrote:
 On Dec 21, 2009, at 9:04 PM, Andrew Dunstan wrote:
 
  I cannot reproduce this.  I tested with perl 5.10.1 which is the latest 
  reported stable release at http://www.cpan.org/src/README.html, on an 
  8.4.2 UTF8 database, and with the same Safe and Encode module versions as 
  above.
 
 I've replicated it all the way back to 8.0. I'd be happy to give you a login 
 to my box.

If a trivial plperl function can crash the server this is surely worth
some research.

-- 
Alvaro Herrera  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] LIKE INCLUDING COMMENTS code is a flight of fancy

2009-12-22 Thread Tom Lane
Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 I suggest that we might want to just
 rip out the support for copying comments on indexes.

 We have two related ToDo items below. They are a bit inconsistent,
 but they mean we should either forbid COMMENT on columns of an index,
 or fully support the feature.

 Which direction should we go?  As for me, forbidding comments on index
 columns seems to be the saner way, because indexes can have arbitrary
 key names in some cases.

 - Forbid COMMENT on columns of an index
 Postgres currently allows comments to be placed on the columns of
 an index, but pg_dump doesn't handle them and the column names
 themselves are implementation-dependent. 
 http://archives.postgresql.org/message-id/27676.1237906...@sss.pgh.pa.us

 - pg_dump / pg_restore: Add dumping of comments on index columns and
   composite type columns 
 http://archives.postgresql.org/pgsql-hackers/2009-03/msg00931.php
 (XXX: Comments on composite type columns can work now?)

I'm for forbidding comments on index columns.  The amount of work
required to support the feature fully seems far out of proportion to
its value.

In any case, if pg_dump drops such comments (which I had forgotten,
but it seems true after a quick look at the code), then we could
certainly get away with having LIKE not copy them.  That would fix
the immediate problem.

regards, tom lane



Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Heikki Linnakangas
Tom Lane wrote:
 Got it.  So doesn't tuplesort have the same issue?

Tuplesort has the same general problem that the caller of puttuple needs
to be in the right resource owner. Which ought to be fixed, especially
since tuplesort doesn't require that for the memory context anymore.

But we don't use tuplesort to return tuples from functions, so it's not
broken in a user-visible way. Unless you can think of another scenario
like that.

 The patch definitely requires more than zero comments.

Sure.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Wed, 2009-12-23 at 04:15 +0900, Fujii Masao wrote:
 On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote:
 It's needed to find the place where the backup stopped, so it defines
 the safe stopping point. We could easily pass that info via WAL, when
 streaming. It doesn't actually matter until we try to failover.
 Right. And, it's also needed to cooperate with HS which begins accepting
 read-only queries after a recovery reaches that safe stopping point.
 
 Agreed, hence my interest!

Yeah, that's a great idea.

I was just having a chat with Magnus this morning, and he asked whether
the current patch already provides, or whether it would be possible to
write, a stand-alone utility to connect to a master and stream WAL files
to an archive directory, without setting up a full-blown standby
instance. We came to the conclusion that backup history files wouldn't
be copied as the patch stands, because the standby has to specifically
request them.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Heikki Linnakangas
Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 The backup history file is a slightly quirky way of doing things and
 was designed when the transfer mechanism was file-based.
 
 Why don't we just write a new xlog record that contains the information
 we need?
 
 Certainly not.  The history file is, in the restore-from-archive case,
 needed to *find* the xlog data.

Hmm, not really. The backup_label file tells where the checkpoint record
is. And that is still needed. AFAICS the backup history file is only
needed to determine the point where the backup was completed, ie. the
minimum safe stopping point for WAL replay.

I think we could get away without the backup history file altogether.
It's kind of nice to have them in the archive directory, for the DBA, to
easily see the locations where base backups were taken. But if we write
the backup stop location in WAL, the system doesn't really need the
history file for anything.
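
(For reference, a backup history file is short and human-readable,
roughly of this shape - the locations and label here are invented:

START WAL LOCATION: 0/9000020 (file 000000010000000000000009)
STOP WAL LOCATION: 0/9000088 (file 000000010000000000000009)
CHECKPOINT LOCATION: 0/9000058
START TIME: 2009-12-22 12:00:00 EET
LABEL: nightly base backup
STOP TIME: 2009-12-22 12:05:00 EET

so a WAL record could carry the same information.)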

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] LIKE INCLUDING COMMENTS code is a flight of fancy

2009-12-22 Thread Andrew Dunstan
Tom Lane wrote:

 I'm for forbidding comments on index columns.  The amount of work
 required to support the feature fully seems far out of proportion to
 its value.

 In any case, if pg_dump drops such comments (which I had forgotten,
 but it seems true after a quick look at the code), then we could
 certainly get away with having LIKE not copy them.  That would fix
 the immediate problem.

If we're going to keep the comments we should dump them. I don't mind 
dropping them altogether - it's hardly a killer feature. We should just 
be consistent, IMNSHO.


cheers

andrew



Re: [HACKERS] Tuplestore should remember the memory context it's created in

2009-12-22 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 Got it.  So doesn't tuplesort have the same issue?

 Tuplesort has the same general problem that the caller of puttuple needs
 to be in the right resource owner. Which ought to be fixed, especially
 since tuplesort doesn't require that for the memory context anymore.

 But we don't use tuplesort to return tuples from functions, so it's not
 broken in a user-visible way. Unless you can think of another scenario
 like that.

(1) create a cursor whose plan involves a sort that will spill to disk
(2) enter subtransaction
(3) fetch from cursor (causing the sort to actually happen)
(4) leave subtransaction
(5) fetch some more from cursor

Too busy to develop a test case right now, but ISTM it ought to fail.

regards, tom lane



Re: [HACKERS] Possible patch for better index name choosing

2009-12-22 Thread Tom Lane
I wrote:
 Attached is a WIP patch for addressing the problems mentioned in this
 thread:
 http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php
 ...
 There is one thing that is not terribly nice about the behavior, which
 is that CREATE TABLE LIKE INCLUDING INDEXES is unable to generate smart
 names for expression indexes;
 ...
 The reason for this is that the patch depends on FigureColname which
 works on untransformed parse trees, and we don't have access to such
 a tree when copying an existing index.  There seem to be three possible
 responses to that:
 ...
 3. Implement a separate FigureIndexColname function that works as much
 like FigureColname as it can, but takes a transformed parse tree.

I fooled around with this solution and decided that it is a lot messier
than it's worth.

In the first place, we can't make a FigureColname-like function that
just takes a transformed tree: there is no way to interpret Vars without
some context.  You need at least a table OID, and more than that if
you'd like to handle cases like multiple-relation expressions or
non-RELATION RTEs.  For the case at hand of index expressions, a table
OID would be enough, but that doesn't leave much room for imagining the
function could be used for anything else in future.  Worse, for the
problematic case (CREATE TABLE LIKE) we actually do not have a table OID
because the target table doesn't exist yet.  We could finesse that by
passing the source table's OID instead, but that seems pretty klugy
itself.

In the second place, the number of corner cases where we'd generate
output different from FigureColname is much greater than I realized.
As an example, if foo is a type name then foo(x) and x::foo produce
the same parsed tree, but FigureColname will treat them differently.

Seeing that CREATE TABLE LIKE doesn't try to reproduce the source table's
index names anyway, I'm inclined to just go with the patch as-is and not
try to make it handle this one case nicely.

regards, tom lane



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I think we could get away without the backup history file altogether.

Hmmm ... actually I was confusing these with timeline history files,
which are definitely not something we can drop.  You might be right
that the backup history file could be part of WAL instead.  On the other
hand, it's quite comforting that the history file is plain ASCII and can
be examined without any special tools.  I'm -1 for removing it, even
if we decide to duplicate the info in a WAL record.

regards, tom lane



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Aidan Van Dyk
* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [091222 15:47]:

 I was just having a chat with Magnus this morning, and he asked if the
 current patch already provides or if it would be possible to write a
 stand-alone utility to connect to a master and stream WAL files to an
 archive directory, without setting up a full-blown standby instance. We
 came to the conclusion that backup history files wouldn't be copied as
 the patch stands, because the standby has to specifically request them.

Please, please, please...

I've been watching the SR from the sidelines, basically because I want
my WAL fsync'ed on 2 physically separate machines before the client's
COMMIT returns...

And no, I'm not really interested in trying to do raid over NBD or DRBD,
and deal with the problems and pitfals that entails...

Being able to write a utility that connects as an SR client, but just
synchronously writes WAL into an archive directory setup is exactly what
I want...  And once SR has settled, it's something I'm interested in
working on...

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] LIKE INCLUDING COMMENTS code is a flight of fancy

2009-12-22 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Tom Lane wrote:
 I'm for forbidding comments on index columns.  The amount of work
 required to support the feature fully seems far out of proportion to
 its value.
 
 In any case, if pg_dump drops such comments (which I had forgotten,
 but it seems true after a quick look at the code), then we could
 certainly get away with having LIKE not copy them.  That would fix
 the immediate problem.

 If we're going to keep the comments we should dump them. I don't mind 
 dropping them altogether - it's hardly a killer feature. We should just 
 be consistent, IMNSHO.

Well, let's just forbid them then.  It'll take just a few extra lines in
comment.c to throw an error if the target table has the wrong relkind.
Then we can pull out the troublesome code in parse_utilcmds.c.

It strikes me also that this changes the terms of discussion for the
other patch I was working on.  I was mistakenly assuming that we could
not change the naming convention for individual index columns because
it would cause errors during dump/reload of per-column comments.  But
since pg_dump never has supported that, there is no such risk.  I
propose re-instating my previous idea of replacing pg_expression_n
with the name chosen by FigureColname.  This not only makes the index
column names a bit more useful, but it fixes the disadvantage I was
on about for CREATE TABLE LIKE: it can get the FigureColname name
from the index's pg_attribute entry, so there's no problem with
producing smart index-expression column names when cloning indexes.

regards, tom lane



Re: [HACKERS] Backup history file should be replicated in Streaming Replication?

2009-12-22 Thread Simon Riggs
On Tue, 2009-12-22 at 22:46 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Wed, 2009-12-23 at 04:15 +0900, Fujii Masao wrote:
  On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote:
  It's needed to find the place where the backup stopped, so it defines
  the safe stopping point. We could easily pass that info via WAL, when
  streaming. It doesn't actually matter until we try to failover.
  Right. And, it's also needed to cooperate with HS which begins accepting
  read-only queries after a recovery reaches that safe stopping point.
  
  Agreed, hence my interest!
 
 Yeah, that's a great idea.
 
 I was just having a chat with Magnus this morning, and he asked whether
 the current patch already provides, or whether it would be possible to
 write, a stand-alone utility to connect to a master and stream WAL files
 to an archive directory, without setting up a full-blown standby
 instance. We came to the conclusion that backup history files wouldn't
 be copied as the patch stands, because the standby has to specifically
 request them.

There isn't any need to write that utility. Read my post about 2-phase
backup and you'll see we are a couple of lines of code away from that.

-- 
 Simon Riggs   www.2ndQuadrant.com




[HACKERS] join ordering via Simulated Annealing

2009-12-22 Thread Jan Urbański
Hi,

I've been playing with using a Simulated Annealing-type algorithm for
determining join ordering for relations. To get into context see
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00098.php
(there's also a TODO in the wiki). There's a nice paper on that  in
http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf
(also linked from that thread) and someone even wrote a patch:
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00736.php

This generally aims at being able to replace GEQO.

I have some rough prototype code, but I'm not even asking people to look
at it. It's stuffed with silly things and writes three Graphviz-style
.dot files in /tmp for each algorithm step. But I'd like to at least
present the idea.

There are three main problems that have to be solved to get a good
Simulated Annealing implementation:
 o) choosing the starting point for the algorithm
 o) generating the next point
 o) computing the cost in the current point

The code I have now creates the initial plan by doing something similar
to what gimme_tree does in GEQO, but without any kind of heuristic to
try to favour joins with join restrictions (the idea is that it doesn't
matter, since we only want to get *any* plan and we only do it once),
but ideally it would be somehow chosen randomly from the space of all
possible join orderings.

I keep a binary tree of relations in memory, where leaves are
RelOptInfos with 1 relid and the root is a RelOptInfo with all relids.
Each iteration of the algorithm picks two nodes at random (with some
restrictions, but that's not important) and tries to swap them around,
meaning that a tree like (use a monospace font for best results):

       (1 2 3 4)
    *(1 2)  (3 4)
   (1) (2) *(3) (4)

where the starred nodes are the two chosen ones would get
transformed into

       (1 2 3 4)
     (3)   (1 2 4)
         (1 2)  (4)
        (1) (2)

that is, the (1 2) node and its subtree get swapped with the (3) node,
and the upper-level nodes get changed accordingly. Sometimes a swap like
that will produce an invalid join ordering - the swap is then reverted.
If the join order given by the tree after the swap is legal, the paths
are recomputed, much like in geqo_eval, and if the root node's cheapest
path is not cheaper than before the swap, the swap gets reverted.
Simulated Annealing algorithms permit uphill moves in terms of cost,
with some probability that decreases as time passes, but that's not
implemented yet. After a fixed number of moves, the algorithm stops and
returns the root node of the tree as the single RelOptInfo.
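
A hypothetical sketch of that swap-and-revert move (QueryTree,
pick_random_node, swap_subtrees and recompute_tree are illustrative names,
not the prototype's API; only the RelOptInfo/Path fields are the planner's
real ones):

    typedef struct QueryTree
    {
        struct QueryTree *parent;
        struct QueryTree *left;
        struct QueryTree *right;
        RelOptInfo       *rel;  /* joinrel covering this subtree's relids */
    } QueryTree;

    /*
     * Try one move: swap two random subtrees and keep the result only if
     * the new ordering is legal and cheaper.  A full SA step would also
     * keep some uphill moves, as noted above.
     */
    static bool
    saio_move(PlannerInfo *root, QueryTree *tree)
    {
        /* two random nodes, with restrictions (e.g. not ancestor/child) */
        QueryTree  *a = pick_random_node(tree);  /* e.g. the (1 2) subtree */
        QueryTree  *b = pick_random_node(tree);  /* e.g. the (3) leaf */
        Cost        old_cost = tree->rel->cheapest_total_path->total_cost;
        RelOptInfo *joinrel;

        swap_subtrees(a, b);    /* relink a and b under the other's parent */

        /* rebuild paths for the new tree; NULL means an illegal ordering */
        joinrel = recompute_tree(root, tree);

        if (joinrel == NULL ||
            joinrel->cheapest_total_path->total_cost >= old_cost)
        {
            swap_subtrees(a, b);        /* illegal or not cheaper: revert */
            return false;
        }
        tree->rel = joinrel;
        return true;
    }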

The issues with the approach are:

 o) the initial state is not really a random plan, it's usually a
left-deep tree (and a very inefficient one), and this might skew results.

 o) is swapping random nodes like that a good way of exploring the
solution space? The SA paper suggests something much simpler, but some
of the moves proposed there don't really make sense when using the
make_join_rel machinery:
   *) changing the inner and outer relation of a join doesn't make
sense, because make_join_rel is symmetrical
   *) changing the join execution strategy (nested loop, merge join,
etc.) doesn't make sense, because make_join_rel considers all possible
paths for a join

 o) each swap needs a full recalculation of the whole solution
(geqo_eval does that, for instance); maybe it's possible to leave the
subtrees of the swapped nodes intact and only recalculate the nodes above
the two swapped nodes?

 o) because make_join_rel scribbles on the PlannerInfo, the same hack as
in geqo_eval is necessary: truncating join_rel_list after building all
the joinrels and nulling out the join_rel_hash (a sketch of this follows
the list)

 o) I use make_join_rel to determine whether a join is legal, but it also
does lots of other things. That looks like wasted effort, although in the
end I need to build the final rel each time anyway to assess the resulting
path's cost. To follow the spirit of the SA algorithm more closely it
would be necessary to consider only one random path for each relation and
to make using a different path a move that the algorithm can choose while
exploring the solution space. A cheaper way of determining the current
solution's cost would be nice, too - fully rebuilding the final
RelOptInfo after each move is annoying.
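
The geqo_eval-style save/restore mentioned above looks roughly like this
(join_rel_list and join_rel_hash are the real PlannerInfo fields; the
surrounding helper and recompute_tree are illustrative):

    /*
     * Cost one join ordering without permanently polluting the
     * PlannerInfo, mirroring what geqo_eval does.
     */
    static Cost
    evaluate_ordering(PlannerInfo *root, QueryTree *tree)
    {
        int          savelength = list_length(root->join_rel_list);
        struct HTAB *savehash = root->join_rel_hash;
        RelOptInfo  *joinrel;
        Cost         fitness;

        root->join_rel_hash = NULL;

        /* build all the intermediate joinrels for this ordering */
        joinrel = recompute_tree(root, tree);
        fitness = joinrel ? joinrel->cheapest_total_path->total_cost
                          : DBL_MAX;    /* DBL_MAX marks an illegal tree */

        /* throw away the joinrels we created, restoring planner state */
        root->join_rel_list = list_truncate(root->join_rel_list, savelength);
        root->join_rel_hash = savehash;

        return fitness;
    }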

Lastly, I'm lacking good testcases or even a testing approach: I'm
generating silly queries and looking at how they get optimised, but if
someone has a real dataset and examples of joins that cannot be planned
with the standard planner, I would be interested to compare the results
my prototype gets with those produced by GEQO.

The code, a module that you can LOAD into a backend, is here (if you
really want to see it):
http://git.wulczer.org/?p=saio.git
Set the saio GUC to true and saio_cutoff to the number of iterations you
want.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] join ordering via Simulated Annealing

2009-12-22 Thread bin wang
I will follow it.
Thank you.

2009/12/23 Jan Urbański wulc...@wulczer.org

 Hi,

 I've been playing with using a Simulated Annealing-type algorithm for
 determining join ordering for relations. [...]

 [the rest of the quoted message is trimmed here; it repeats the original
 post above verbatim]

Re: [HACKERS] creating index names automatically?

2009-12-22 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Could we create an option to create index names automatically, so you'd
 only have to write

 CREATE INDEX ON foo (a);

 which would pick a name like foo_a_idx.

Having done all the groundwork to support that nicely, I find that it
doesn't work because of bison limitations :-(.  AFAICT, the only way
we could support this syntax would be to make ON a reserved word.
Or at least more reserved than it is now.  We used up all the wiggle
room we had by making CONCURRENTLY non-reserved.

Now ON is reserved according to SQL99, but I'm a bit hesitant to
make it so in our grammar for such a marginal feature as this.
Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating index names automatically?

2009-12-22 Thread Tom Lane
I wrote:
 The trouble with changing the index attnames for expressions is that it
 increases the risk of collisions with attnames for regular index
 columns.  You can hit that case today:

 regression=# create table foo (f1 int, f2 text);
 CREATE TABLE
 regression=# create index fooi on foo(f1, lower(f2));
 CREATE INDEX
 regression=# \d fooi
    Index "public.fooi"
       Column      |  Type   | Definition
  -----------------+---------+------------
   f1              | integer | f1
   pg_expression_2 | text    | lower(f2)
  btree, for table "public.foo"

  regression=# alter table foo rename f1 to pg_expression_2;
  ERROR:  duplicate key value violates unique constraint
  "pg_attribute_relid_attnam_index"
  DETAIL:  Key (attrelid, attname)=(64621, pg_expression_2) already exists.

 but it's not exactly probable that someone would name a column
 pg_expression_N.  The risk goes up quite a lot if we might use simple
 names like abs or lower for expression columns.

It strikes me that the easiest way to deal with this is just to get rid
of the code in renameatt() that tries to rename index columns to agree
with the underlying table columns.  That code is not nearly bright
enough to deal with collisions, and furthermore it seems rather
inconsistent to try to rename index columns (which are not very
user-visible in the first place) while not renaming the indexes
themselves (which surely are user-visible).  There was some marginal
excuse for doing it back when \d didn't show the index column
definition; but now that it does, I don't think the behavior is worth
expending effort on.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating index names automatically?

2009-12-22 Thread Alvaro Herrera
Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  Could we create an option to create index names automatically, so you'd
  only have to write
 
  CREATE INDEX ON foo (a);
 
  which would pick a name like foo_a_idx.
 
 Having done all the groundwork to support that nicely, I find that it
 doesn't work because of bison limitations :-(.  AFAICT, the only way
 we could support this syntax would be to make ON a reserved word.
 Or at least more reserved than it is now.  We used up all the wiggle
 room we had by making CONCURRENTLY non-reserved.

And here's Simon talking about making CONCURRENTLY more reserved so that
people stop creating indexes named concurrently ...

http://database-explorer.blogspot.com/2009/09/create-index-concurrently.html

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating index names automatically?

2009-12-22 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane wrote:
 ... AFAICT, the only way
 we could support this syntax would be to make ON a reserved word.
 Or at least more reserved than it is now.  We used up all the wiggle
 room we had by making CONCURRENTLY non-reserved.

 And here's Simon talking about making CONCURRENTLY more reserved so that
 people stop creating indexes named concurrently ...
 http://database-explorer.blogspot.com/2009/09/create-index-concurrently.html

Hmm.  It would actually work if we made CONCURRENTLY reserved instead;
and that would fix Simon's gripe too.  That's kind of weird from a
standards-compliance POV, but in terms of the risk of breaking
applications it might be better than reserving ON.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating index names automatically?

2009-12-22 Thread Tom Lane
I wrote:
 Hmm.  It would actually work if we made CONCURRENTLY reserved instead;
 and that would fix Simon's gripe too.  That's kind of weird from a
 standards-compliance POV, but in terms of the risk of breaking
 applications it might be better than reserving ON.

Wait a minute.  I must have been looking at the wrong keyword list
--- ON already is reserved.  The problem is exactly that it can't
tell whether CREATE INDEX CONCURRENTLY ON ... means to default the
index name or to create an index named CONCURRENTLY.  So really the
*only* way to fix this is to make CONCURRENTLY be at least
type_func_name_keyword.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating index names automatically?

2009-12-22 Thread David E. Wheeler
On Dec 22, 2009, at 7:31 PM, Tom Lane wrote:

 Wait a minute.  I must have been looking at the wrong keyword list
 --- ON already is reserved.  The problem is exactly that it can't
 tell whether CREATE INDEX CONCURRENTLY ON ... means to default the
 index name or to create an index named CONCURRENTLY.  So really the
 *only* way to fix this is to make CONCURRENTLY be at least
 type_func_name_keyword.

+1 if it prevents indexes from being named CONCURRENTLY.

Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] join ordering via Simulated Annealing

2009-12-22 Thread Tom Lane
Jan Urbański wulc...@wulczer.org writes:
 I've been playing with using a Simulated Annealing-type algorithm for
 determining join ordering for relations.

Cool.

 The code I have now creates the initial plan by doing something similar
 to what gimme_tree does in GEQO, but without any kind of heuristic to
 try to favour joins with join restrictions (the idea is that it doesn't
 matter, since we only want to get *any* plan and we only do it once),
 but ideally it would be somehow chosen randomly from the space of all
 possible join orderings.

FWIW, I think that's probably a bad idea.  In a query where there are a
lot of join order constraints, you can waste enormous amounts of time
trying to find a join order that is even legal at all, let alone any
good.  Pure randomness is not as nice as it seems.  You might want to
study the CVS history of GEQO a bit and try to avoid falling into some
of the traps we already fell into with that ;-)

  o) each swap needs a full recalculation of the whole solution
 (geqo_eval does that, for instance), maybe it's possible to leave the
 subtrees of swapped nodes intact and only recalculate the nodes above
 the two swapped nodes?

Even if you could make this work, it'd probably be an infeasible
space-for-time tradeoff, because you'd have to keep around all the Path
infrastructure of the lower nodes in order to not start from scratch.
As we were forcibly reminded a few months ago, one of the important
properties of the GEQO code is to be able to do planning with a limited
amount of memory even for very large relation sets.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating index names automatically?

2009-12-22 Thread Alvaro Herrera
David E. Wheeler wrote:
 On Dec 22, 2009, at 7:31 PM, Tom Lane wrote:
 
  Wait a minute.  I must have been looking at the wrong keyword list
  --- ON already is reserved.  The problem is exactly that it can't
  tell whether CREATE INDEX CONCURRENTLY ON ... means to default the
  index name or to create an index named CONCURRENTLY.  So really the
  *only* way to fix this is to make CONCURRENTLY be at least
  type_func_name_keyword.
 
 +1 if it prevents indexes from being named CONCURRENTLY.

Yeah, if you really want to have an index named like that you can use
double quotes.  Seems a sensible compromise.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating index names automatically?

2009-12-22 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 David E. Wheeler wrote:
 +1 if it prevents indexes from being named CONCURRENTLY.

 Yeah, if you really want to have an index named like that you can use
 double quotes.  Seems a sensible compromise.

Well, this will also break tables and columns named concurrently.
I think the odds of it being a problem are small, but still it is
a reserved word that shouldn't be reserved according to the SQL spec.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers