Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Heikki Linnakangas
Tom Lane wrote:
 So I think that labeling textanycat/anytextcat as immutable was a
 thinko, and we instead ought to label them volatile so that the planner
 can inline them no matter what the behavior of the underlying text cast
 is.

That feels backwards, having to label functions as more volatile than
they really are, just to allow optimizations elsewhere. Marking
textanycat as not immutable would forbid using it in expression indexes,
too.

 ... The planner will not inline a function if the
 resulting expression would be more volatile than the function claims
 itself to be, because sometimes the point of such a function is to
 hide the expression's volatility. ...

Could we inline the function anyway, if it came from an operator?
Presumably if you want to hide an expression's volatility, you use an
explicit function call - I can't imagine using an operator for that purpose.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for PKST timezone

2010-05-16 Thread Joachim Wieland
On Wed, May 12, 2010 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Joachim Wieland j...@mcknight.de writes:
 Good we have found that inconsistency now, so let's add PKST.

 OK, done.  BTW, I notice that PKT was labeled "(not in zic)", which
 is not the case, per this discussion.  I seem to recall having noticed
 some others that seemed to be mislabeled the same way.  What process did
 you use to compare this list to the zic files, and do we need to revisit
 it?

I have used a modified version of zic.c that outputs the data while
generating the binary timezone files. Generating the timezone offset
files from that then involved some scripts and some manual work. It
seems that we should have an automated process for that, at least for
checking against our current set. I'll see if I can come up with that.
PKST for example was valid only for a single year in the past but in
the newer timezone data it is now valid forever. Ideally we can run a
script that tells us about such changes whenever we bring in new
timezone data.
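
For reference, a quick manual spot-check of a single abbreviation against the
zic-derived zone data (just an illustration, not the automated comparison
described above; both are standard system views):

SELECT * FROM pg_timezone_abbrevs WHERE abbrev IN ('PKT', 'PKST');
SELECT name, abbrev, utc_offset, is_dst
  FROM pg_timezone_names
 WHERE name = 'Asia/Karachi';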


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Bruce Momjian
Andrew Dunstan wrote:
 Bruce Momjian wrote:
   Maybe I have misunderstood. How exactly is the server version being
   hacked here? I know it's only for testing, but it still seems to me that
   lying to a program as heavily version dependent as pg_dump is in general
   a bad idea.

  The code in pg_dump 9.0 is:

  /*
   * If supported, set extra_float_digits so that we can dump float data
   * exactly (given correctly implemented float I/O code, anyway)
   */
  if (g_fout->remoteVersion >= 90000)
      do_sql_command(g_conn, "SET extra_float_digits TO 3");
  else if (g_fout->remoteVersion >= 70400)
  --> do_sql_command(g_conn, "SET extra_float_digits TO 2");

  The indicated line had to be changed to '3'.  I did not change anything
  else, and it was only done in my private CVS tree.

 Oh, I see. It is pg_dump that you hacked. That wasn't clear to me from
 what you first said.

 But do earlier server versions accept a value of 3? The 8.4 docs say
 "The value can be set as high as 2."

That is the other thing I had to hack --- the 8.4 backend version had to
be changed to accept '3'.  The good thing is this has to be done only
once --- once I have the dump file, I can use it in testing repeatedly
because 8.4 does not change.
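
For reference, a minimal illustration of what the setting changes (exact digits
are platform-dependent; the point is only that the default output can lose
bits, which is why pg_dump wants the higher setting and why stock 8.4, which
caps the value at 2, had to be hacked for this test):

SET extra_float_digits = 0;
SELECT 0.1::float8 + 0.2::float8;   -- prints 0.3; reloading that loses bits
SET extra_float_digits = 3;         -- rejected by an unpatched 8.4 server
SELECT 0.1::float8 + 0.2::float8;   -- prints roughly 0.300000000000000044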

Eventually the idea would be to have the build farm run such tests (with
a properly created dump file) so we can learn quickly if the backend
data format is changed.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Andrew Dunstan wrote:
 But do earlier server versions accept a value of 3? The 8.4 docs say
 "The value can be set as high as 2."

 That is the other thing I had to hack --- the 8.4 backend version had to
 be changed to accept '3'.  The good thing is this has to be done only
 once --- once I have the dump file, I can use it in testing repeatedly
 because 8.4 does not change.

 Eventually the idea would be to have the build farm run such tests (with
 a properly created dump file) so we can learn quickly if the backend
 data format is changed.

If we're thinking of doing that, it would be better to back-patch the
change that allowed '3'.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 So I think that labeling textanycat/anytextcat as immutable was a
 thinko, and we instead ought to label them volatile so that the planner
 can inline them no matter what the behavior of the underlying text cast
 is.

 That feels backwards, having to label functions as more volatile than
 they really are, just to allow optimizations elsewhere.

Well, it's also bogus to label functions as less volatile than they
really are.  An error in that direction is unsafe, while marking a
function as more volatile than it truly is cannot result in wrong
behavior.

 Marking textanycat as not immutable would forbid using it in
 expression indexes, too.

True.  On the other hand, the current state of affairs allows one to
create an index on expressions that aren't really immutable, with
ensuing hilarity.

The basic problem here is that these functions are polymorphic and their
actual volatility can vary depending on the argument datatype.  We don't
have any way to describe that in the present pg_proc definition.  It
does seem like we ought to think about improving this, but that's
clearly a research project.  In terms of what we could reasonably do
for 9.0, I think it's change the volatility marking or nothing ...
and changing the marking looks like the better bet to me.

 ... The planner will not inline a function if the
 resulting expression would be more volatile than the function claims
 itself to be, because sometimes the point of such a function is to
 hide the expression's volatility. ...

 Could we inline the function anyway, if it came from an operator?

No, that's just a crock with no semantic justification at all.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-16 Thread Simon Riggs
On Sun, 2010-05-16 at 00:05 +0300, Heikki Linnakangas wrote:
 Heikki Linnakangas wrote:
  Simon Riggs wrote:
  WALSender sleeps even when it might have more WAL to send, it doesn't
  check it just unconditionally sleeps. At least WALReceiver loops until
  it has no more to receive. I just can't imagine why that's useful
  behaviour.
  
  Good catch. That should be fixed.
  
  I also note that walsender doesn't respond to signals, while it's
  sending a large batch. That's analogous to the issue that was addressed
  recently in the archiver process.
 
 Attached patch rearranges the walsender loops slightly to fix the above.
 XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
 2) in one round and returns to the main loop after that even if there's
 unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
 That way the main loop gets to respond to signals quickly, and we also
 get to update the shared memory status and PS display more often when
 there's a lot of catching up to do.
 
 Comments

8MB at a time still seems like a large batch to me.

libpq is going to send it in smaller chunks anyway, so I don't see the
importance of trying to keep the batch that large. It just introduces
delay into the sending process. We should be sending chunks that match
libpq better.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Tom Lane
I wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Marking textanycat as not immutable would forbid using it in
 expression indexes, too.

 True.  On the other hand, the current state of affairs allows one to
 create an index on expressions that aren't really immutable, with
 ensuing hilarity.

It strikes me that we could avoid any possible functional regression
here by having CREATE INDEX perform expression preprocessing (in
particular, function inlining) before it tests to see if the index
expression contains any non-immutable functions.
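
For illustration, the kind of expression index that is accepted today because
the concatenation operator's underlying function is marked immutable, even
though the inlined form involves a DateStyle-dependent cast (hypothetical
table, shown only as a sketch):

CREATE TABLE events (happened timestamptz);
CREATE INDEX events_label_idx ON events ((happened || ' logged'));
-- accepted today; with inlining done before the immutability check, the
-- timestamptz -> text cast would be examined directly instead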

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql

2010-05-16 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 A saner
 approach, which would also help for other corner cases such as
 out-of-disk-space, would be to check for write failures on the output
 file and abandon the query if any occur.

I had considered this, but I'm not sure we really need to catch *every*
write failure.  Perhaps just catching if the '\n' at the end of a row
fails to be written out would be sufficient?  Then turning around and
setting cancel_query might be enough..  I'll write that up and test if
it works.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-16 Thread Simon Riggs
On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote:
 On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:
 
  I will recode using that concept.

 Startup gets new pointer when it runs out of data to replay. That might
 or might not include an updated keepalive timestamp, since there's no
 exact relationship between chunks sent and chunks received. Startup
 might ask for a new chunk when half a chunk has been received, or when
 multiple chunks have been received.

New version, with some other cleanup of wait processing.

New logic is that when Startup asks for next applychunk of WAL it saves
the lastChunkTimestamp. That is then the base time used by
WaitExceedsMaxStandbyDelay(), except when latestXLogTime is later.
Since multiple receivechunks can arrive from primary before Startup asks
for next applychunk we use the oldest receivechunk timestamp, not the
latest. Doing it this way means the lastChunkTimestamp doesn't change
when new keepalives arrive, so we have a stable and reasonably accurate
recordSendTimestamp for each WAL record.

The keepalive is sent as the first part of a new message, if any. So
partial chunks of data always have an accurate timestamp, even if that
is slightly older as a result. Doesn't make much difference except with
very large chunks.

I think that addresses the points raised on this thread and others.

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
***
*** 4232,4247  The commands accepted in walsender mode are:
/varlistentry
/variablelist
/para
!   /listitem
!   /varlistentry
!   /variablelist
!  /para
!  para
 A single WAL record is never split across two CopyData messages. When
 a WAL record crosses a WAL page boundary, however, and is therefore
 already split using continuation records, it can be split at the page
 boundary. In other words, the first main WAL record and its
 continuation records can be split across different CopyData messages.
   /para
  /listitem
/varlistentry
--- 4232,4283 
/varlistentry
/variablelist
/para
!   para
 A single WAL record is never split across two CopyData messages. When
 a WAL record crosses a WAL page boundary, however, and is therefore
 already split using continuation records, it can be split at the page
 boundary. In other words, the first main WAL record and its
 continuation records can be split across different CopyData messages.
+   /para
+   /listitem
+   /varlistentry
+   varlistentry
+   term
+   Keepalive (B)
+   /term
+   listitem
+   para
+   variablelist
+   varlistentry
+   term
+   Byte1('k')
+   /term
+   listitem
+   para
+   Identifies the message as a keepalive.
+   /para
+   /listitem
+   /varlistentry
+   varlistentry
+   term
+   TimestampTz
+   /term
+   listitem
+   para
+   The current timestamp on the primary server when the keepalive was sent.
+   /para
+   /listitem
+   /varlistentry
+   /variablelist
+   /para
+   para
+If varnamewal_level/ is set to literalhot_standby/ then a keepalive
+is sent once per varnamewal_sender_delay/. The keepalive is sent after
+WAL data has been sent, if any.
+   /para
+   /listitem
+   /varlistentry
+   /variablelist
   /para
  /listitem
/varlistentry
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 1938,1944  UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
  			UpdateControlFile();
  			minRecoveryPoint = newMinRecoveryPoint;
  
! 			ereport(DEBUG2,
  	(errmsg(updated min recovery point to %X/%X,
  		minRecoveryPoint.xlogid, minRecoveryPoint.xrecoff)));
  		}
--- 1938,1944 
  			UpdateControlFile();
  			minRecoveryPoint = newMinRecoveryPoint;
  
! 			ereport(DEBUG3,
  	(errmsg(updated min recovery point to %X/%X,
  		minRecoveryPoint.xlogid, minRecoveryPoint.xrecoff)));
  		}
***
*** 9212,9218  retry:
  	 * While walreceiver is active, wait for new WAL to arrive
  	 * from primary.
  	 */
! 	receivedUpto = GetWalRcvWriteRecPtr();
  	if (XLByteLT(*RecPtr, receivedUpto))
  	{
  		/*
--- 9212,9218 
  	 * While walreceiver is active, wait for new WAL to arrive
  	 * from primary.
  	 */
! 	receivedUpto = GetWalRcvNextChunk();
  	if (XLByteLT(*RecPtr, receivedUpto))
  	{
  		/*
*** a/src/backend/replication/walreceiver.c
--- b/src/backend/replication/walreceiver.c
***
*** 407,412  XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len)
--- 407,428 
  XLogWalRcvWrite(buf, len, recptr);
  break;
  			}
+ 		case 'k':/* keepalive */
+ 			{
+ 

Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql

2010-05-16 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 A saner
 approach, which would also help for other corner cases such as
 out-of-disk-space, would be to check for write failures on the output
 file and abandon the query if any occur.

 I had considered this, but I'm not sure we really need to catch *every*
 write failure.  Perhaps just catching if the '\n' at the end of a row
 fails to be written out would be sufficient?

If you're combining this with the FETCH_COUNT logic then it seems like
it'd be sufficient to check ferror(fout) once per fetch chunk, and just
fall out of that loop then.  I don't want psql issuing query cancels
on its own authority, either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Robert Haas
On Sat, May 15, 2010 at 9:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I noticed by accident that there are some cases where the planner fails
 to inline the SQL functions that underlie the || operator for text vs
 non-text cases.  The reason is that these functions are marked
 immutable, but their expansion involves a coercion to text that might
 not be immutable.  The planner will not inline a function if the
 resulting expression would be more volatile than the function claims
 itself to be, because sometimes the point of such a function is to hide
 the expression's volatility.  In this case, however, we don't want to
 hide the true nature of the expression, and we definitely don't want
 to pay the performance price of calling a SQL function.  That price
 is pretty significant, eg on a rather slow machine I get

 regression=# select count(localtimestamp || i::text) from
 generate_series(1,10) i;
  count
 -------
     10
 (1 row)

 Time: 12512.624 ms
 regression=# update pg_proc set provolatile = 'v' where oid = 2004;
 UPDATE 1
 Time: 7.104 ms
 regression=# select count(localtimestamp || i::text) from
 generate_series(1,10) i;
  count
 -------
     10
 (1 row)

 Time: 4967.086 ms

 so the added overhead more than doubles the cost of this case.

 There's also a possibility of an outright wrong behavior, since the
 immutable marking will allow the concatenation of two constants to
 be folded to a constant in contexts where perhaps it shouldn't be.
 Continuing the above example, 'now'::timestamp || 'foo' will be folded
 to a constant on sight, which is wrong because the coercion to text
 depends on DateStyle and ought to react to a later change in DateStyle.

 So I think that labeling textanycat/anytextcat as immutable was a
 thinko, and we instead ought to label them volatile so that the planner
 can inline them no matter what the behavior of the underlying text cast
 is.

Couldn't you apply this argument to any built-in immutable function whatsoever?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Andrew Dunstan
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
 Andrew Dunstan wrote:
 But do earlier server versions accept a value of 3? The 8.4 docs say
 "The value can be set as high as 2."

 That is the other thing I had to hack --- the 8.4 backend version had to
 be changed to accept '3'.  The good thing is this has to be done only
 once --- once I have the dump file, I can use it in testing repeatedly
 because 8.4 does not change.

 Eventually the idea would be to have the build farm run such tests (with
 a properly created dump file) so we can learn quickly if the backend
 data format is changed.

 If we're thinking of doing that, it would be better to back-patch the
 change that allowed '3'.

Yeah.

It's going to require some fancy dancing to get the buildfarm to do it. 
Each buildfarm run is for a specific branch, and all the built artefacts 
are normally thrown away. I'd have to work out a way of stashing the 
binaries from a build on one branch for use in the pg_upgrade tests in 
the run on another branch. It's doable but could get messy.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Couldn't you apply this argument to any built-in immutable function 
 whatsoever?

No, only the ones that are built on top of other functions that aren't
immutable.

I did go looking for other potential problems of the same ilk.  The only
one I can find at present is to_timestamp(double), which is an immutable
SQL function but it uses timestamptz + interval, which is marked as not
immutable.  I believe the reason for that is that if the interval
includes month or day components then the addition result can depend on
the timezone setting.  However, the usage in to_timestamp() involves only
a pure seconds component so the immutable marking should be correct.
Still, we might want to think about reimplementing to_timestamp() as a
pure C function sometime, because the current implementation is many
times slower than it needs to be.
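
For reference, a sketch of the equivalence being described (the second query is
roughly what the SQL-language wrapper expands to; the exact catalog definition
may differ in detail):

SELECT to_timestamp(1273960000);
SELECT 'epoch'::timestamptz + 1273960000 * '1 second'::interval;
-- both return the same timestamptz; the interval addition in the second form
-- is what carries the "not immutable" marking discussed above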

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication patch built on SR

2010-05-16 Thread Simon Riggs
On Fri, 2010-05-14 at 15:15 -0400, Robert Haas wrote:
 On Fri, May 14, 2010 at 9:33 AM, Boszormenyi Zoltan z...@cybertec.at wrote:
  If  min_sync_replication_clients == 0, then the replication is async.
  If  min_sync_replication_clients == max_wal_senders then the
  replication is fully synchronous.
  If 0 < min_sync_replication_clients < max_wal_senders then
  the replication is partially synchronous, i.e. the master can wait
  only for say, 50% of the clients to report back before it's considered
  synchronous and the relevant transactions get released from the wait.
 
 That's an interesting design and in some ways pretty elegant, but it
 rules out some things that people might easily want to do - for
 example, synchronous replication to the other server in the same data
 center that acts as a backup for the master; and asynchronous
 replication to a reporting server located off-site.

The design above allows the case you mention:
min_sync_replication_clients = 1
max_wal_senders = 2

It works well in failure cases, such as the case where the local backup
server goes down.

It seems exactly what we need to me, though not sure about names.

 One of the things that I think we will probably need/want to change
 eventually is the fact that the master has no real knowledge of who
 the replication slaves are.  That might be something we want to change
 in order to be able to support more configurability.  Inventing syntax
 out of whole cloth and leaving semantics to the imagination of the
 reader:
 
 CREATE REPLICATION SLAVE reporting_server (mode asynchronous, xid_feedback 
 on);
 CREATE REPLICATION SLAVE failover_server (mode synchronous,
 xid_feedback off, break_synchrep_timeout 30);

I am against labelling servers as synchronous/asynchronous. We've had
this discussion a few times since 2008.

There is significant advantage in having the user specify the level of
robustness, so that it can vary from transaction to transaction, just as
already happens at commit. That way the user gets to say what happens.
Look for threads on transaction controlled robustness.
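
For illustration, the per-transaction control that already exists today, in the
spirit of what's being referred to (the table name is made up; the idea is that
the client, not a server label, picks the guarantee for each commit):

BEGIN;
SET LOCAL synchronous_commit = off;  -- this transaction accepts weaker durability
INSERT INTO clickstream_log VALUES (now());
COMMIT;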

As alluded to above, if you label the servers you also need to say what
happens when one or more of them are down. e.g. synchronous to B AND
async to C, except when B is not available, in which case make C
synchronous. With N servers, you end up needing to specify O(N^2) rules
for what happens, so it only works neatly for 2, maybe 3 servers.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Robert Haas
On Sun, May 16, 2010 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Couldn't you apply this argument to any built-in immutable function 
 whatsoever?

 No, only the ones that are built on top of other functions that aren't
 immutable.

Built on top of?  I don't get it.  It seems like anything of the form
immutablefunction(volatilefunction()) is vulnerable to this, and you
can give a volatile function as an argument to any function you like.
If you're saying we're testing for immutability by looking only at the
outermost function call, that seems pretty broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Jaime Casanova
On Sun, May 16, 2010 at 1:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, May 16, 2010 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Couldn't you apply this argument to any built-in immutable function 
 whatsoever?

 No, only the ones that are built on top of other functions that aren't
 immutable.

 Built on top of?  I don't get it.  It seems like anything of the form
 immutablefunction(volatilefunction()) is vulnerable to this, and you
 can give a volatile function as an argument to any function you like.
 If you're saying we're testing for immutability by looking only at the
 outermost function call, that seems pretty broken.


you mean we shouldn't allow this?


select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 8.4.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc
 (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit
(1 row)

create table t1 (col1 int);

create function f1(int) returns double precision as $$
select random() * $1;
$$ language sql immutable;

create index idx on t1(f1(col1));


then, welcome to the club... there were various conversations on this same topic

-- 
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, May 16, 2010 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 No, only the ones that are built on top of other functions that aren't
 immutable.

 Built on top of?  I don't get it.  It seems like anything of the form
 immutablefunction(volatilefunction()) is vulnerable to this,

Uh, no, you misunderstand completely.  The problematic case is where the
function's own expansion contains a non-immutable function.  In
particular, what we have for these functions is that textanycat(a,b)
expands to a || b::text, and depending on what the type of b is, the
cast from that to text might not be immutable.  This is entirely
independent of whether the argument expressions are volatile or not.
Rather, the problem is that inlining the function definition could
by itself increase the expression's apparent volatility.  (Decreasing
the volatility is not problematic.  Increasing it is.)
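
Spelled out as SQL, per the description above (both statements go through the
same || operator; only the right-hand type differs):

SELECT 'x' || 42;      -- textanycat('x', 42); the int4 -> text cast is immutable
SELECT 'x' || now();   -- textanycat('x', now()); the timestamptz -> text cast
                       -- follows DateStyle, so the inlined 'x' || now()::text
                       -- is only stable, not immutable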

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Robert Haas
On Sun, May 16, 2010 at 2:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, May 16, 2010 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 No, only the ones that are built on top of other functions that aren't
 immutable.

 Built on top of?  I don't get it.  It seems like anything of the form
 immutablefunction(volatilefunction()) is vulnerable to this,

 Uh, no, you misunderstand completely.  The problematic case is where the
 function's own expansion contains a non-immutable function.  In
 particular, what we have for these functions is that textanycat(a,b)
 expands to a || b::text, and depending on what the type of b is, the
 cast from that to text might not be immutable.  This is entirely
 independent of whether the argument expressions are volatile or not.
 Rather, the problem is that inlining the function definition could
 by itself increase the expression's apparent volatility.  (Decreasing
 the volatility is not problematic.  Increasing it is.)

I guess my point is that the actual volatility of an expression is
presumably independent of whether it gets inlined.  (If inlining is
changing the semantics, that's a problem.)  So if inlining is changing
its apparent volatility, then there's something wrong with the way
we're computing apparent volatility.  No?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Andres Freund
Hi all,

After having received several reports of worse plans on 8.4 compared to 8.3,
and recently one more from 'vaxerdec' on IRC, I tried to investigate the
difference.
Reducing the (large and ugly, automatically generated) queries to a
reproducible test case, I ended up with the following pattern:

explain SELECT 1
FROM
    c
WHERE
    EXISTS (
        SELECT *
        FROM a
            JOIN b USING (b_id)
        WHERE b.c_id = c.c_id)
    AND c.value = 1;

8.3 planned this to:

  Index Scan using c_value_key on c  (cost=0.00..24.83 rows=1 width=0)
    Index Cond: (value = 1)
    Filter: (subplan)
    SubPlan
      ->  Nested Loop  (cost=0.00..16.56 rows=1 width=12)
            ->  Index Scan using b__c_id on b  (cost=0.00..8.27 rows=1 width=8)
                  Index Cond: (c_id = $0)
            ->  Index Scan using a__b_id on a  (cost=0.00..8.27 rows=1 width=8)
                  Index Cond: (a.b_id = b.b_id)

Which is quite good for such a kind of query.

From 8.4 onwards this gets planned to 

  Nested Loop Semi Join  (cost=1543.00..7708.29 rows=1 width=0)
    Join Filter: (c.c_id = b.c_id)
    ->  Index Scan using c_value_key on c  (cost=0.00..8.27 rows=1 width=4)
          Index Cond: (value = 1)
    ->  Hash Join  (cost=1543.00..7075.02 rows=5 width=4)
          Hash Cond: (b.b_id = a.b_id)
          ->  Seq Scan on b  (cost=0.00..2164.01 rows=150001 width=8)
          ->  Hash  (cost=722.00..722.00 rows=5 width=4)
                ->  Seq Scan on a  (cost=0.00..722.00 rows=5 width=4)

which is the near-equivalent (with s/Semi/IN/) of what 8.3 produces for the
above query with IN instead of EXISTS.

This kind of plan obviously is horrid.

The behavioral change was introduced in Tom's initial commit to support semi
joins, "Implement SEMI and ANTI joins in the planner and executor.", from
2008-08-14 (I tried the commits before and after).
Seeing that 8.3 didn't inline EXISTS but did inline IN, and showed the bad plan
with IN, it's pretty evident that the inlining is the problem and not the patch
itself.

Unsurprisingly 8.4 produces a plan similar to 8.3's if one uses a volatile
function or an OFFSET 0, as that stops the inlining.
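
For completeness, the OFFSET 0 form spelled out (it keeps 8.4 from pulling the
EXISTS up into a semi join, so a plan like the 8.3 one comes back):

explain SELECT 1
FROM
    c
WHERE
    EXISTS (
        SELECT *
        FROM a
            JOIN b USING (b_id)
        WHERE b.c_id = c.c_id
        OFFSET 0)
    AND c.value = 1;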

Two questions:
1. Is there a reason this cannot be optimized? In the semi join case it 
doesn't seem to be *that* complex to push down qualifiers resulting in a plan 
like:


Nested Loop Semi Join
  ->  Index Scan using c_value_key on c
        Index Cond: (value = 1)
  ->  Nested Loop
        ->  Index Scan using b__c_id on b
              Index Cond: (b.c_id = c.c_id)
        ->  Index Scan using a__b_id on a
              Index Cond: (a.b_id = b.b_id)

or, a bit more complex:

Nested Loop Semi Join
  ->  Nested Loop Semi Join
        ->  Index Scan using c_value_key on c
              Index Cond: (value = 1)
        ->  Index Scan using b__c_id on b
              Index Cond: (b.c_id = c.c_id)
  ->  Index Scan using a__b_id on a
        Index Cond: (a.b_id = b.b_id)


2. I can construct several cases off the top of my head where avoiding
inlining might yield significantly better plans unless variables are pushed
down more aggressively into EXISTS/IN. While it would solve this issue, I doubt
that generating a separate path without inlining is viable (code complexity,
plan time)?


I think it's pretty annoying to have so much worse plans in 8.4 than earlier
for relatively obvious queries, but I don't see any good, low-impact fix.

Greetings,

Andres

PS: Tom: Do you like to get directly addressed on bugs/mails like this one
or not? I am not really sure what the policy regarding that is on the pg
mailing lists. Is there one?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Andres Freund
Testschema:

ROLLBACK;
BEGIN;

CREATE TABLE a (
a_id serial PRIMARY KEY NOT NULL,
b_id integer
);
CREATE INDEX a__b_id ON a USING btree (b_id);


CREATE TABLE b (
b_id serial NOT NULL,
c_id integer
);
CREATE INDEX b__c_id ON b USING btree (c_id);


CREATE TABLE c (
c_id serial PRIMARY KEY NOT NULL,
value integer UNIQUE
);

INSERT INTO b (b_id, c_id)
SELECT g.i, g.i FROM generate_series(1, 5) g(i);

INSERT INTO a(b_id)
SELECT g.i FROM generate_series(1, 5) g(i);

COMMIT;
ANALYZE;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Robert Haas
On Sun, May 16, 2010 at 7:07 PM, Andres Freund and...@anarazel.de wrote:
 Reducing the (large and ugly, automatically generated queries) to a
 reproducible testcase I ended up with the following pattern:

 explain SELECT 1
 FROM
    c
 WHERE
    EXISTS (
        SELECT *
        FROM a
            JOIN b USING (b_id)
        WHERE b.c_id = c.c_id)
    AND c.value = 1;

 8.3 planned this to:

  Index Scan using c_value_key on c  (cost=0.00..24.83 rows=1 width=0)
     Index Cond: (value = 1)
     Filter: (subplan)
     SubPlan
       ->  Nested Loop  (cost=0.00..16.56 rows=1 width=12)
             ->  Index Scan using b__c_id on b  (cost=0.00..8.27 rows=1 width=8)
                   Index Cond: (c_id = $0)
             ->  Index Scan using a__b_id on a  (cost=0.00..8.27 rows=1 width=8)
                   Index Cond: (a.b_id = b.b_id)

 Which is quite good for such a kind of query.

 From 8.4 onwards this gets planned to
 [something bad]

I believe this is a result of a limitation we've discussed
previously, namely, that the planner presently uses a limited,
special-case kludge to consider partial index scans, and the executor
uses another kludge to execute them.

http://archives.postgresql.org/pgsql-hackers/2009-09/msg00525.php
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00994.php
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01755.php

I believe that Tom is planning to fix this for 9.1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Bruce Momjian
Andrew Dunstan wrote:
  Eventually the idea would be to have the build farm run such tests (with
  a properly created dump file) so we can learn quickly if the backend
  data format is changed.
  
 
  If we're thinking of doing that, it would be better to back-patch the
  change that allowed '3'.
 
  

 
 Yeah.
 
 It's going to require some fancy dancing to get the buildfarm to do it. 
 Each buildfarm run is for a specific branch, and all the built artefacts 
 are normally thrown away. I'd have to work out a way of stashing the 
 binaries from a build on one branch for use in the pg_upgrade tests in 
 the run on another branch. It's doable but could get messy.

Uh, that is not actually a problem.  You just need to set
extra_float_digits=-3 to create the dump file, which is only done once
for each major version.  You can _load_ that dump file into an
unmodified old cluster and test just fine.  I will write up some
instructions in the next few days.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Andrew Dunstan
Bruce Momjian wrote:
 Andrew Dunstan wrote:
 Eventually the idea would be to have the build farm run such tests (with
 a properly created dump file) so we can learn quickly if the backend
 data format is changed.

 If we're thinking of doing that, it would be better to back-patch the
 change that allowed '3'.

 Yeah.

 It's going to require some fancy dancing to get the buildfarm to do it.
 Each buildfarm run is for a specific branch, and all the built artefacts
 are normally thrown away. I'd have to work out a way of stashing the
 binaries from a build on one branch for use in the pg_upgrade tests in
 the run on another branch. It's doable but could get messy.

 Uh, that is not actually a problem.  You just need to set
 extra_float_digits=-3 to create the dump file, which is only done once
 for each major version.  You can _load_ that dump file into an
 unmodified old cluster and test just fine.  I will write up some
 instructions in the next few days.

You are missing the point I was making. A buildfarm run does not 
normally have available to it any binaries for a version other than the 
one it is building. There is no notion of a multi-branch buildfarm run. 
Each run is for a particular branch and is a separate miracle.  So I'm 
not concerned about the structure of the dump file but about what will 
be used to load it into an old cluster during a buildfarm run.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-level Locks SERIALIZABLE transactions, postgres vs. Oracle

2010-05-16 Thread Florian Pflug
On May 14, 2010, at 22:54 , Robert Haas wrote:
 On Thu, May 13, 2010 at 5:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 All in all, I believe that SHARE and UPDATE row-level locks should be
 changed to cause concurrent UPDATEs to fail with a serialization
 error.
 
 I don't see an argument for doing that for FOR SHARE locks, and it
 already happens for FOR UPDATE (at least if the row actually gets
 updated).  AFAICS this proposal mainly breaks things, in pursuit of
 an unnecessary and probably-impossible-anyway goal of making FK locking
 work with only user-level snapshots.
 
 After giving this considerable thought and testing the behavior at
 some length, I think the OP has it right.  One thing I sometimes need
 to do is denormalize a copy of a field, e.g.
 
 snipped example

I've whipped up a quick and still rather dirty patch that implements the 
behavior I proposed, at least for the case of conflicts between FOR UPDATE 
locks and updates. With the patch, any attempt to UPDATE or FOR UPDATE lock a 
row that has concurrently been FOR UPDATE locked will cause a serialization 
error. (The same for an actually updated row of course, but that happened 
before too).
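
For illustration, an interleaved two-session sketch of the changed behavior
(hypothetical table "t" with an "id" column):

-- Session A
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM t;                      -- establishes A's snapshot

-- Session B
BEGIN;
SELECT * FROM t WHERE id = 1 FOR UPDATE;     -- lock only, no actual UPDATE
COMMIT;

-- Session A again
SELECT * FROM t WHERE id = 1 FOR UPDATE;
-- unpatched: succeeds, because the row itself was never modified
-- with the patch: serialization error, because the row was FOR UPDATE locked
-- by a transaction that is invisible to A's snapshot
COMMIT;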

While this part of the patch was fairly straightforward, making FOR SHARE
conflict too seems to be much harder. The assumption that a lock becomes
irrelevant once the transaction(s) that held it complete is built deeply
into the multi xact machinery that powers SHARE locks. That machinery
therefore assumes that once all members of a multi xact have completed,
the multi xact is dead also. But my proposal depends on a SERIALIZABLE
transaction being able to find out whether any of the lockers of a row are
invisible under its snapshot - for which it'd need any multi xact containing
invisible xids to outlive its snapshot.

best regards,
Florian Pflug




serializable_share_lock.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Bruce Momjian
Andrew Dunstan wrote:
  Uh, that is not actually a problem.  You just need to set
  extra_float_digits=-3 to create the dump file, which is only done once
  for each major version.  You can _load_ that dump file into an
  unmodified old cluster and test just fine.  I will write up some
  instructions in the next few days.
 

 
 You are missing the point I was making. A buildfarm run does not 
 normally have available to it any binaries for a version other that the 
 one it is building. There is no notion of a multi-branch buildfarm run. 
 Each run is for a particular branch and is a separate miracle.  So I'm 
 not concerned about the structure of the dump file but about what will 
 be used to load it into an old cluster during a buildfarm run.

Thank you.  I understand now.

Imagine finding out on the build farm right away when we break binary
compatibility --- that would be cool.  However, that might be overkill. 
My testing seems to be working just fine.  In fact the only diff I see
is:

< CREATE PROCEDURAL LANGUAGE plpgsql;
---
> CREATE OR REPLACE PROCEDURAL LANGUAGE plpgsql;

and that is a known change.  I might end up adding my regression dump
files to our ftp site (400k for each major version), and just having
people use them for testing.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-level Locks SERIALIZABLE transactions, postgres vs. Oracle

2010-05-16 Thread Florian Pflug

On May 14, 2010, at 16:32 , Kevin Grittner wrote:

 Florian Pflug f...@phlo.org wrote:
 
 I must admit that I wasn't able to find an explicit reference to
 Oracle's behavior in their docs, so I had to resort to
 experiments. They do have examples showing how to do FK-like
 constraints with triggers, and those don't contain any warning
 whatsoever about problems in SERIALIZABLE mode, though.  But
 still, if there is word on this from Oracle somewhere, I'd love to
 hear about it.
 
 I suspect that in trying to emulate Oracle on this, you may run into
 an issue which posed challenges for the SSI implementation which
 didn't come up in the Cahill prototype implementations: Oracle, and
 all other MVCC databases I've read about outside of PostgreSQL, use
 an update in place with a rollback log technique.  Access to any
 version of a given row or index entry goes through a single
 location, with possible backtracking through the log after that,
 which simplifies management of certain concurrency issues.  Do they
 perhaps use an in-RAM lock table, pointing to the base location of
 the row for these SELECT FOR UPDATE locks?  (Just guessing; I've
 never used Oracle, myself.)

Thanks for the heads up. I think my proposal dodges this, though, since UPDATE
as well as FOR SHARE and FOR UPDATE already follow the ctid chain to find the
most recent tuple and fail with a serialization error (within >= REPEATABLE
READ transactions) should this tuple be inaccessible to the transaction's
snapshot.

Btw, I've just posted a quick-and-dirty patch that implements the parts of my 
proposal that deal with FOR UPDATE vs. UPDATE conflicts in response to Robert 
Haas' mail on this thread, just in case you're interested.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)

2010-05-16 Thread Robert Haas
On Wed, May 12, 2010 at 10:41 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 12, 2010 at 10:36 PM, Alvaro Herrera
 alvhe...@alvh.no-ip.org wrote:
 Excerpts from Robert Haas's message of Wed May 12 20:48:41 -0400 2010:
 On Wed, May 12, 2010 at 3:55 PM, Robert Haas robertmh...@gmail.com wrote:
  I am wondering if we are not correctly handling the case where we get
  a shutdown request while we are still in the PM_STARTUP state.  It
  looks like we might go ahead and switch to PM_RECOVERY and then
  PM_RECOVERY_CONSISTENT without noticing the shutdown.  There is some
  logic to handle the shutdown when the startup process exits, but if
  the startup process never exits it looks like we might get stuck.

 I can reproduce the behavior Stefan is seeing consistently using the
 attached patch.

 W1: postgres -D ~/pgslave
 W2: pg_ctl -D ~/pgslave stop

 If there's anything to learn from this patch, it is that sleep is
 uninterruptible on some platforms.  This is why sleeps elsewhere are
 broken down in loops, sleeping in small increments and checking
 interrupts each time.  Maybe some of the sleeps in the new HS code need
 to be handled this way?

 I don't think the problem is that the sleep is uninterruptible.  I
 think the problem is that a smart shutdown request received while in
 the PM_STARTUP state is not acted upon until we enter the PM_RUN
 state.  That is, there's a race condition between the SIGUSR1 that the
 startup process sends to the postmaster to signal that recovery has
 begun and the SIGTERM being sent by pg_ctl.

 However, since I haven't succeeded in producing a fix yet, take that
 with a grain of salt...

I'm replying to this thread rather than the max_standby_delay thread
because this line of discussion is not actually related to
max_standby_delay.

Fujii Masao previously reported this problem and proposed this fix:
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00592.php

I responded by proposing that we instead simply adjust things so that
a smart shutdown is always handled at the end of recovery, just prior
to entering normal running.  On further reflection, though, I've
concluded that was a dumb idea.  Currently, when a smart shutdown
occurs during recovery, we handle it immediately UNLESS it happens
before we receive the signal that tells us it's OK to start the
background writer, in which case we get confused and lock everyone out
of the database until a fast or immediate shutdown request arrives.
So this is just a race condition, plain and simple.  Therefore I think
Fujii Masao's original idea was the best, but I have what I believe is
an equivalent but simpler implementation, which is attached.

Thoughts?  Should we try to fix this in 8.4 also, or just in HEAD?
8.3 and 8.2 never handle a smart shutdown prior to entering normal
running, and while that seems pretty useless, doing something
different would be a behavior change, so that seems like a
non-starter.  8.4 has the same behavior as HEAD, though it's not
documented in the release notes, so it's not clear how intentional the
change was.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


fix_smart_shutdown_in_recovery.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Andrew Dunstan
Bruce Momjian wrote:
 Thank you.  I understand now.

 Imagine finding out on the build farm right away when we break binary
 compatibility --- that would be cool.

I'm not saying we can't do that, just that it will not be a trivial 
change. And yes it would be cool, although I hope we would know before 
we committed such a change that that would be the outcome.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-level Locks SERIALIZABLE transactions, postgres vs. Oracle

2010-05-16 Thread Robert Haas
On Sun, May 16, 2010 at 9:07 PM, Florian Pflug f...@phlo.org wrote:
 On May 14, 2010, at 22:54 , Robert Haas wrote:
 On Thu, May 13, 2010 at 5:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 All in all, I believe that SHARE and UPDATE row-level locks should be
 changed to cause concurrent UPDATEs to fail with a serialization
 error.

 I don't see an argument for doing that for FOR SHARE locks, and it
 already happens for FOR UPDATE (at least if the row actually gets
 updated).  AFAICS this proposal mainly breaks things, in pursuit of
 an unnecessary and probably-impossible-anyway goal of making FK locking
 work with only user-level snapshots.

 After giving this considerable thought and testing the behavior at
 some length, I think the OP has it right.  One thing I sometimes need
 to do is denormalize a copy of a field, e.g.

 snipped example

 I've whipped up a quick and still rather dirty patch that implements the 
 behavior I proposed, at least for the case of conflicts between FOR UPDATE 
 locks and updates. With the patch, any attempt to UPDATE or FOR UPDATE lock a 
 row that has concurrently been FOR UPDATE locked will cause a serialization 
 error. (The same for an actually updated row of course, but that happened 
 before too).

 While this part of the patch was fairly straightforward, making FOR SHARE
 conflict too seems to be much harder. The assumption that a lock becomes
 irrelevant once the transaction(s) that held it complete is built deeply
 into the multi xact machinery that powers SHARE locks. That machinery
 therefore assumes that once all members of a multi xact have completed, the
 multi xact is dead also. But my proposal depends on a SERIALIZABLE
 transaction being able to find out whether any of the lockers of a row are
 invisible under its snapshot - for which it'd need any multi xact containing
 invisible xids to outlive its snapshot.

Thanks for putting this together. I suggest adding it to the open
CommitFest - even if we decide to go forward with this, I don't
imagine anyone is going to be excited about changing it during beta.

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Andrew Dunstan wrote:
 It's going to require some fancy dancing to get the buildfarm to do it. 
 Each buildfarm run is for a specific branch, and all the built artefacts 
 are normally thrown away.

 Uh, that is not actually a problem.  You just need to set
 extra_float_digits=-3 to create the dump file, which is only done once
 for each major version.

Wrong.  In the first place, we're not going to start carrying something
as large as a pg_dump of the regression database as part of the source
code for the buildfarm.  Even if we wanted to, it wouldn't work because
the results aren't platform-independent --- there are float differences
and probably row ordering differences to worry about.  In the second
place, it won't only be done once, unless you imagine that we never
change the regression tests for back branches; a casual perusal of the
CVS logs will disprove that idea.

The only thing that's really going to work here is to generate the dump
on the fly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I guess my point is that the actual volatility of an expression is
 presumably independent of whether it gets inlined.  (If inlining is
 changing the semantics, that's a problem.)  So if inlining is changing
 it's apparent volatility, then there's something wrong with the way
 we're computing apparent volatility.  No?

Well, the way we're computing volatility is certainly an
oversimplification of reality, as was already noted upthread --- the
fundamental issue here is that polymorphism of the textcat functions
renders it impossible to know their true volatility status without
knowing the specific datatype they're being applied to.  Perhaps we
could generalize the pg_proc representation to accommodate that, but
it's certainly in the nature of a research project.  And there are
*many* cases where we're already approximating for other reasons; the
timestamptz plus interval case that I mentioned earlier is one.  So long
as any approximations are in the direction of supposing the expression
is more volatile than it really is, they are not dangerous.  But right
at the moment, the textcat functions are on the wrong side of that,
because they're claiming to be immutable when they sometimes aren't.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I believe this is  a result of a limitation we've discussed
 previously, namely, that the planner presently uses a limited,
 special-case kludge to consider partial index scans, and the executor
 uses another kludge to execute them.

Yeah.  To restore this case to something like what previous versions
did, we need to be able to push an inner-indexscan parameter down
through multiple join levels, which neither the planner nor executor
can deal with at the moment.  I am planning to work on this for 9.1.

It may be worth pointing out that while the current code sucks for the
case where a nestloop-with-inner-indexscan would be the best plan, the
previous code sucked for every other case; because the previous code was
only capable of generating the equivalent of a nestloop join.  We have
to continue down this path in order to get to the place we need to be.
It's too bad that all the work didn't get done in one development cycle,
but sometimes life's like that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-16 Thread Fujii Masao
On Sun, May 16, 2010 at 6:05 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Heikki Linnakangas wrote:
 Simon Riggs wrote:
 WALSender sleeps even when it might have more WAL to send, it doesn't
 check it just unconditionally sleeps. At least WALReceiver loops until
 it has no more to receive. I just can't imagine why that's useful
 behaviour.

 Good catch. That should be fixed.

 I also note that walsender doesn't respond to signals, while it's
 sending a large batch. That's analogous to the issue that was addressed
 recently in the archiver process.

 Attached patch rearranges the walsender loops slightly to fix the above.
 XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
 2) in one round and returns to the main loop after that even if there's
 unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
 That way the main loop gets to respond to signals quickly, and we also
 get to update the shared memory status and PS display more often when
 there's a lot of catching up to do.

 Comments

The main loop also needs to respond quickly to the 'X' message from the
standby?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Andrew Dunstan
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
 Andrew Dunstan wrote:
 It's going to require some fancy dancing to get the buildfarm to do it.
 Each buildfarm run is for a specific branch, and all the built artefacts
 are normally thrown away.

 Uh, that is not actually a problem.  You just need to set
 extra_float_digits=-3 to create the dump file, which is only done once
 for each major version.

 Wrong.  In the first place, we're not going to start carrying something
 as large as a pg_dump of the regression database as part of the source
 code for the buildfarm.  Even if we wanted to, it wouldn't work because
 the results aren't platform-independent --- there are float differences
 and probably row ordering differences to worry about.  In the second
 place, it won't only be done once, unless you imagine that we never
 change the regression tests for back branches; a casual perusal of the
 CVS logs will disprove that idea.

 The only thing that's really going to work here is to generate the dump
 on the fly.

This whole discussion leads me to the conclusion that we need to look 
more imaginatively at our testing regime. When the buildfarm was created 
it (via pg_regress) covered a lot of what we needed to test, but that is 
becoming less and less true. Not only does pg_upgrade need testing but 
we need to devise some sort of automated testing regime for SR and HS, 
among other things. pg_regress is showing its age a bit, I think.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff

2010-05-16 Thread Andres Freund
On Monday 17 May 2010 04:10:46 Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  I believe this is  a result of a limitation we've discussed
  previously, namely, that the planner presently uses a limited,
  special-case kludge to consider partial index scans, and the executor
  uses another kludge to execute them.
 It may be worth pointing out that while the current code sucks for the
 case where a nestloop-with-inner-indexscan would be the best plan, the
 previous code sucked for every other case; because the previous code was
 only capable of generating the equivalent of a nestloop join.  We have
 to continue down this path in order to get to the place we need to be.
 It's too bad that all the work didn't get done in one development cycle,
 but sometimes life's like that.
Yes, I realize that. That's why I didn't report it as an actual bug... And it's
way easier to deal with 8.4's deficiency than with the former behaviour.

Thanks,

Andres

PS: I think it led me to an actual bug, expect a report tomorrow...

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-16 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 This whole discussion leads me to the conclusion that we need to look 
 more imaginatively at our testing regime. When the buildfarm was created 
 it (via pg_regress) covered a lot of what we needed to test, but that is 
 becoming less and less true. Not only does pg_upgrade need testing but 
 we need to devise some sort of automated testing regime for SR and HS, 
 among other things. pg_regress is showing it's age a bit, I think.

The regression tests have never pretended to test more than a fraction
of what might be interesting to test.  Crash recovery, in particular,
has always been interesting and has never been tested in any mechanized
way.  They don't really exercise concurrent behavior in any meaningful
way either.  I don't think they're showing their age so much as we're
starting to get more ambitious about what we would like to have routine
testing for.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-16 Thread Fujii Masao
On Mon, May 17, 2010 at 1:11 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote:
 On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:

  I will recode using that concept.

 Startup gets new pointer when it runs out of data to replay. That might
 or might not include an updated keepalive timestamp, since there's no
 exact relationship between chunks sent and chunks received. Startup
 might ask for a new chunk when half a chunk has been received, or when
 multiple chunks have been received.

 New version, with some other cleanup of wait processing.

 New logic is that when Startup asks for next applychunk of WAL it saves
 the lastChunkTimestamp. That is then the base time used by
 WaitExceedsMaxStandbyDelay(), except when latestXLogTime is later.
 Since multiple receivechunks can arrive from primary before Startup asks
 for next applychunk we use the oldest receivechunk timestamp, not the
 latest. Doing it this way means the lastChunkTimestamp doesn't change
 when new keepalives arrive, so we have a stable and reasonably accurate
 recordSendTimestamp for each WAL record.

 The keepalive is sent as the first part of a new message, if any. So
 partial chunks of data always have an accurate timestamp, even if that
 is slightly older as a result. Doesn't make much difference except with
 very large chunks.

 I think that addresses the points raised on this thread and others.

Is it OK that this keepalive message cannot be used by HS in file-based
log-shipping? Even in SR, the startup process cannot use the keepalive
until walreceiver has been started up.

WalSndKeepAlive() always calls initStringInfo(), which seems to cause
a memory leak.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers