date:20100504

[HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Srinivas Naik

Hi,

 Can any one tell me whats the effect of the below Query

SELECT substring(B'0001' from 5 for -2);
SELECT substring(B'0001' from 4 for -3);

its observed that there's an Error invalid memory alloc request size
4244635647

What will actually happen to the Postgresql database.

I am using Postgresql 8.4.3 on Ubuntu 8.04

Thanks in Advance
Regards,
Srinivas Naik

[HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Simon Riggs


In the original patch I had Pause/Resume feature for controlling
recovery during Hot Standby. It was removed for lack of time.

With all the discussion around the HS UI, it would be something that
could be back very easily.

I would like to do it as a recovery control plugin. The plugin would be
passed 3 pieces of information and be called before each WAL record was
applied:

* current WAL pointer
* xid - 0 if not a commit/abort
* timestamp - if available
* boolean flag indicating whether it's a record type that conflicts

**No user data would be passed to the plugin**, so no need to revisit
the discussions around WAL plugins etc.. The plugin has the benefit of
providing a whole range of possible control options, as well as being
minimal performance overhead.

This would allow initially allow
* Pause
* Resume
and would go into 9.0 as a contrib module, included with the plugin
patch.

Later we would be able to add on such things as
* Pause for a delay
* Seek to a particular xid commit record
* Seek to a particular WAL pointer and stop

This would be particularly helpful in designing an automated test suite,
since we can recheck the snapshot after each commit to verify it matches
the primary.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TOAST code ignores freespace (was Tweak TOAST code)

2010-05-04 Thread Simon Riggs

On Mon, 2010-05-03 at 23:36 -0400, Jan Wieck wrote:

 Suppose we had something similar to the NULL value bitmap, specifying 
 plain or compressed values (not TOAST references), that are moved to a
 shadow tuple inside the toast table. Suppose further we had some 
 statistics about how often attributes appear in a qualification (i.e. 
 end up in a scan key or scan filter or other parts of the qual 
 expression list). Not sure, maybe we even want to know how often or 
 seldom an attribute is heap_getattr()'d at all. Those don't need to be
 accurate counts. Small random samples will probably do. ANALYZE could 
 evaluate those statistics and adjust the shadow storage settings per
 attribute accordingly.
 
 I can imagine many applications, where this would shrink the main
 tuples to almost nothing at all.

Automatic vertical partitioning. Like it. 

TODO item for further detailed research.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Simon Riggs

On Mon, 2010-05-03 at 15:04 -0700, Josh Berkus wrote:

 I don't see the issue with Tom's approach from a wait perspective.  The
 max wait becomes 1.001X max_standby_delay; there's no way I can think of
 that replay would wait longer than that.  I've yet to see an explanation
 why it would be longer.

Yes, the max wait on any *one* blocker will be max_standby_delay. But if
you wait for two blockers, then the total time by which the standby lags
will now be 2*max_standby_delay. Add a third, fourth etc and the standby
lag keeps rising.

We need to avoid confusing these two measurables

* standby lag - defined as the total delay from when a WAL record is
written to the time the WAL record is applied. This includes both
transfer time and any delays imposed by Hot Standby.

* standby query delay - defined as the time that recovery will wait for
a query to complete before a cancellation takes place. (We could
complicate this by asking what happens when recovery is blocked twice by
the same query? Would it wait twice, or does it have to track how much
it has waited for each query in total so far?)

Currently max_standby_delay seeks to constrain the standby lag to a
particular value, as a way of providing a bounded time for failover, and
also to constrain the amount of WAL that needs to be stored as the lag
increases. Currently, there is no guaranteed minimum query delay given
to each query.

If every query is guaranteed its requested query delay then the standby
lag will be unbounded. Less cancellations, higher lag. Some people do
want this, though is not currently available. We can do this with two
new GUCs:

* standby_query_delay - USERSET parameter that allows user to specify a
guaranteed query delay, anywhere from 0 to maximum_standby_query_delay

* max_standby_query_delay - SIGHUP parameter - parameter exists to
provide DBA with a limit on the USERSET standby_query_delay, though I
can see some would say this is optional

Current behaviour is same as global settings of
standby_query_delay = 0
max_standby_query_delay = 0
max_standby_delay = X

So if people want minimal cancellations they would specify
standby_query_delay = Y (e.g. 30)
max_standby_query_delay = Z (e.g. 300)
max_standby_delay = -1

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Simon Riggs

On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote:

 As I remember, 9.0 has two behaviors:
 
   o  master delays vacuum cleanup
   o  slave delays WAL application
 
 and in 9.1 we will be adding:
 
   o  slave communicates snapshots to master

 How would this figure into what we ultimately want in 9.1?

We would still want all options, since slave communicates snapshot to
master doesn't solve the problem it just moves the problem elsewhere.
It's a question of which factors the user wishes to emphasise for their
specific use.

 I understand Simon's point that the two behaviors have different
 benefits.  However, I believe few users will be able to understand when
 to use which.

If users can understand how to set NDISTINCT for a column, they can
understand this. It's not about complexity of UI, its about solving
problems. When people hit an issue, I don't want to be telling people
we thought you wouldn't understand it, so we removed the parachute.
They might not understand it *before* they hit a problem, so what? But
users certainly will afterwards and won't say thanks if you prevent an
option for them, especially for the stated reason. (My point about
ndistinct: 99% of users have no idea that exists or when to use it, but
it still exists as an option because it solves a known issue, just like
this.)

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Dimitri Fontaine

Simon Riggs si...@2ndquadrant.com writes:

 In the original patch I had Pause/Resume feature for controlling
 recovery during Hot Standby. It was removed for lack of time.

 With all the discussion around the HS UI, it would be something that
 could be back very easily.

Please!

Manual control over recovery is the best solution ever proposed for
giving the user explicit control over the trade-off between HA and slave
queries.

It would allow us to say that by default, conflict favors WAL recovery
no matter what. If you want to ensure your queries won't get canceled,
pause the recovery, run your report, resume the recovery.

I understand that automated and flexible conflict resolution still is
needed or wanted even with this UI, but that would allow a much more
crude automated tool to be acceptable.  Specifically, it could only
target short queries on the standby, for long running queries you don't
want to get cancelled, pause the recovery.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Mark Kirkwood


On 04/05/10 18:47, Srinivas Naik wrote:

Hi,
 Can any one tell me whats the effect of the below Query
SELECT substring(B'0001' from 5 for -2);
SELECT substring(B'0001' from 4 for -3);
its observed that there's an Error invalid memory alloc request size 
4244635647

What will actually happen to the Postgresql database.
I am using Postgresql 8.4.3 on Ubuntu 8.04



I can reproduce this with 8.4.*2* on Ubuntu 10.04 32-bit (postgres built 
from src) but *not* with 8.4.*3*. Can you double check it is 8.4.3 you 
are using? and if so tell us whether you are using 32 or 64 bit Ubuntu!


Cheers

Mark

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 4:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 option for them, especially for the stated reason. (My point about
 ndistinct: 99% of users have no idea that exists or when to use it, but
 it still exists as an option because it solves a known issue, just like
 this.)

Slightly OT, but funnily enough, when I was up in New York a couple of
weeks ago with Bruce and a couple of other folks, I started talking
with a DBA up there about his frustrations with PostgreSQL, and - I'm
not making this up - the first example he gave me of something he
wished he could do in PG to improve query planning was manually
override ndistinct estimates.  He was pleased to here that we'll have
that in 9.0 and I was pleased to be able to tell him it was my patch.
If you'd asked me what the odds that someone picking a missing feature
would have come up with that one were, I'd have said a billion-to-one
against.  But I'm not making this up.

To be honest, I am far from convinced that the existing behavior is a
good one and I'm in favor of modifying it or ripping it out altogether
if we can think of something better.  But it has to really be better,
of course, not just trading one set of pain points for another.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 07:13 -0400, Robert Haas wrote:
 On Tue, May 4, 2010 at 4:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
  option for them, especially for the stated reason. (My point about
  ndistinct: 99% of users have no idea that exists or when to use it, but
  it still exists as an option because it solves a known issue, just like
  this.)
 
 Slightly OT, but funnily enough, when I was up in New York a couple of
 weeks ago with Bruce and a couple of other folks, I started talking
 with a DBA up there about his frustrations with PostgreSQL, and - I'm
 not making this up - the first example he gave me of something he
 wished he could do in PG to improve query planning was manually
 override ndistinct estimates.  He was pleased to here that we'll have
 that in 9.0 and I was pleased to be able to tell him it was my patch.
 If you'd asked me what the odds that someone picking a missing feature
 would have come up with that one were, I'd have said a billion-to-one
 against.  But I'm not making this up.

It matches my experience. I think its a testament to the expertise of
our users as well to the hackers that have done so much to make that the
top of user's lists for change.

 To be honest, I am far from convinced that the existing behavior is a
 good one and I'm in favor of modifying it or ripping it out altogether
 if we can think of something better.  But it has to really be better,
 of course, not just trading one set of pain points for another.

The only way I see as genuine better rather than just a different mix of
trade-offs is to come up with ways where there are no conflicts. Hannu
came up with one, using filesystem snapshots, but we haven't had time to
implement that yet.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Srinivas Naik

Hi Mark,

Please find the below details:

postgresql-8.3

and UBUNTU-8.10 with linux-image-2.6.27.18-standard_810_i386.deb

and its an 32bit Ubuntu.



On Tue, May 4, 2010 at 9:02 AM, Mark Kirkwood mark.kirkw...@catalyst.net.nz
 wrote:

  On 04/05/10 18:47, Srinivas Naik wrote:

 Hi,

  Can any one tell me whats the effect of the below Query

 SELECT substring(B'0001' from 5 for -2);
 SELECT substring(B'0001' from 4 for -3);

 its observed that there's an Error invalid memory alloc request size
 4244635647

 What will actually happen to the Postgresql database.

 I am using Postgresql 8.4.3 on Ubuntu 8.04




 I can reproduce this with 8.4.*2* on Ubuntu 10.04 32-bit (postgres built
 from src) but *not* with 8.4.*3*. Can you double check it is 8.4.3 you are
 using? and if so tell us whether you are using 32 or 64 bit Ubuntu!

 Cheers

 Mark

Re: [HACKERS] what is good solution for support NULL inside string_to_array function?

2010-05-04 Thread Pavel Stehule

2010/5/4 Josh Berkus j...@agliodbs.com:

 quietly removing NULL is maybe good for compatibility but is wrong for
 functionality.

 I agree.  I wasn't aware of this little misfeature.

 Default display for NULL should be a zero-length string.


I disagree - NULL is NULL, not empty string (Oracle is different)

if array_to_string is equivalent to

x[1] || sep || x[2] || sep || x[3] || sep 

then correct result is NULL

and then string_to_array and array_to_string are correct, because
string_to_array cannot contain any NULL symbol.

Regards
Pavel Stehule



 --
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Stephen Frost

* Simon Riggs (si...@2ndquadrant.com) wrote:
 If recovery waits for max_standby_delay every time something gets in its
 way, it should be clear that if many things get in its way it will
 progressively fall behind. There is no limit to this and it can always
 fall further behind. It does result in fewer cancelled queries and I do
 understand many may like that.

Guess I wasn't very clear in my previous description of what I *think*
the change would be (Tom, please jump in if I've got this wrong..).
Recovery wouldn't wait max_standby_delay every time; I agree, that would
be a big change in behaviour and could make it very difficult for the
slave to keep up.  Rather, recovery would proceed as normal until it
encounters a lock, at which point it would start a counting down from
max_standby_delay, if the lock is released before it hits that, then it
will move on, if another lock is encoutered, it would start counting
down from where it left off last time.  If it hits zero, it'll cancel
the other query, and any other queries that get in the way, until it's
caught up again completely.  Once recovery is fully caught up, the
counter would reset again to max_standby_delay.

 That is *significantly* different from how it works now. (Plus: If there
 really was no difference, why not leave it as is?)

Because it's much more complicated the way it is, it doesn't really work
as one would expect in a number of situations, and it's trying to
guarantee something that it probably can't.

 The bottom line is this is about conflict resolution. There is simply no
 way to resolve conflicts without favouring one or other of the
 protagonists. Whatever mechanism you come up with that favours one will,
 disfavour the other. I'm happy to give choices, but I'm not happy to
 force just one kind of conflict resolution.

I don't think anyone is trying to get rid of the knob entirely; you're
right, you can't please everyone all the time, so there has to be some
kind of knob there which people can adjust based on their particular use
case and system.  This is about what exactly the knob is and how it's
implemented and documented.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 4:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 In the original patch I had Pause/Resume feature for controlling
 recovery during Hot Standby. It was removed for lack of time.

Well, it's not like we have more time now than we did then.  I think
we need to postpone this discussion to 9.1.  If we're going to start
accepting patches for new features, then why should we accept only
patches for HS/SR?  I have two patches already in the queue that I'd
like to see committed and if I thought that there was a chance of
getting anything further done for 9.0, there'd be several more.  Many
other people have patches waiting also, or are holding off development
because we are in feature freeze right now.  Hot Standby is a great
feature, but, I don't see any reason to say that we're going to allow
new feature development just for HS but not for anything else.

I also think that worrying about fine-tuning HS at this point is a bit
like complaining that the jump suits of the crew of the Space Shuttle
Challenger were not made of 100% recyclable materials.  Just yesterday
we had a report of an HS server getting into a state where it failed
to shut down properly; and I believe that we never fully resolved the
issue of occasional extremely-long spikes in HS response time, either.
 Heikki just fixed a bug our btree recovery code which is apparently
new to 9.0 since he did not backpatch it.  I think that getting into a
discussion of pausing and resuming recovery, or even the parallel
discussion on max_standby_delay, are fiddling with things that,
granted, are probably not ideal, and yes, we should improve them in a
future release, but they're not what we should be worrying about right
now.  What I think we SHOULD be worried about right now - VERY worried
- is stabilizing the existing Hot Standby code to the point where it
won't be an embarrassment to us when we ship it.  The rate at which
we're finding new problems even with the small number of people who
test alpha releases and nightly snapshots suggests to me that we're
not there yet.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 7:46 AM, Srinivas Naik naik.sr...@gmail.com wrote:
 Hi Mark,

     Please find the below details:

 postgresql-8.3

 and UBUNTU-8.10 with linux-image-2.6.27.18-standard_810_i386.deb

 and its an 32bit Ubuntu.

Err, before you said 8.4.3.  Now you're saying 8.3.  Those are totally
different.  Can we get the exact identifier of the package you have
installed, plus the output from

SELECT version();

Thanks,

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 09:12 -0400, Stephen Frost wrote:
 * Simon Riggs (si...@2ndquadrant.com) wrote:
  If recovery waits for max_standby_delay every time something gets in its
  way, it should be clear that if many things get in its way it will
  progressively fall behind. There is no limit to this and it can always
  fall further behind. It does result in fewer cancelled queries and I do
  understand many may like that.
 
 Guess I wasn't very clear in my previous description of what I *think*
 the change would be (Tom, please jump in if I've got this wrong..).
 Recovery wouldn't wait max_standby_delay every time; I agree, that would
 be a big change in behaviour and could make it very difficult for the
 slave to keep up.  Rather, recovery would proceed as normal until it
 encounters a lock, at which point it would start a counting down from
 max_standby_delay, if the lock is released before it hits that, then it
 will move on, if another lock is encoutered, it would start counting
 down from where it left off last time.  If it hits zero, it'll cancel
 the other query, and any other queries that get in the way, until it's
 caught up again completely.  Once recovery is fully caught up, the
 counter would reset again to max_standby_delay.

This new clarification is almost exactly how it works already. Sounds
like the existing docs need some improvement.

The only difference is that max_standby_delay is measured from log
timestamp. Perhaps it should work from WAL receipt timestamp rather than
from log timestamp? That would make some of the problems go away without
significantly changing the definition. I'll look at that.

(And that conflicts are caused by more situations than just locks, but
that detail doesn't alter your point).

  The bottom line is this is about conflict resolution. There is simply no
  way to resolve conflicts without favouring one or other of the
  protagonists. Whatever mechanism you come up with that favours one will,
  disfavour the other. I'm happy to give choices, but I'm not happy to
  force just one kind of conflict resolution.
 
 I don't think anyone is trying to get rid of the knob entirely; you're
 right, you can't please everyone all the time, so there has to be some
 kind of knob there which people can adjust based on their particular use
 case and system.  This is about what exactly the knob is and how it's
 implemented and documented.

I'm happy with more than one way. It'd be nice if a single parameter,
giving one dimension of tuning, suited all ways people have said they
would like it to behave. I've not found a way of doing that.

I have no problem at all with adding additional parameters or mechanisms
to cater for the multiple dimensions of control people have asked for.
So your original interpretation is also valid for some users.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Erik Rijkers

On Tue, May 4, 2010 15:40, Robert Haas wrote:
 On Tue, May 4, 2010 at 7:46 AM, Srinivas Naik naik.sr...@gmail.com wrote:
 Hi Mark,

     Please find the below details:

 postgresql-8.3

 and UBUNTU-8.10 with linux-image-2.6.27.18-standard_810_i386.deb

 and its an 32bit Ubuntu.

 Err, before you said 8.4.3.  Now you're saying 8.3.  Those are totally
 different.  Can we get the exact identifier of the package you have
 installed, plus the output from

 SELECT version();

fwiw, results for all current postgres versions:

-- to be executed:
SELECT substring(B'0001' from 5 for -2); SELECT 
substring(B'0001' from 4
for -3);

-- postgres 9.0beta1
ERROR:  negative substring length not allowed
ERROR:  negative substring length not allowed
-- postgres 8.4.3
0001
10001
-- postgres 8.3.10
0001
10001
-- postgres 8.2.16
0001
10001
-- postgres 8.1.20
0001
10001
-- postgres 8.0.24
0001
10001
-- postgres 7.4.28
0001
10001



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] what is good solution for support NULL inside string_to_array function?

2010-05-04 Thread Pavel Stehule

2010/5/4 Pavel Stehule pavel.steh...@gmail.com:
 2010/5/4 Tom Lane t...@sss.pgh.pa.us:
 Josh Berkus j...@agliodbs.com writes:
 quietly removing NULL is maybe good for compatibility but is wrong for
 functionality.

 I agree.  I wasn't aware of this little misfeature.

 Default display for NULL should be a zero-length string.

 That's just as broken as Pavel's suggestion.  Unless you have something
 that is guaranteed distingishable from the output of any non-null value,
 you really can't make a significant improvement here.


 I wouldn't modify current two params string_to_array and
 array_to_string function. So there are not any default string (maybe
 empty string) for NULL. My proposal is new three params functions with
explicit null string definition. This cannot break
 compatibility and enhance functionality - It is just short cut for
 code from my proposal - in C this functionality can by implemented
 much faster.

I did some coding - the patch can be very simple

postgres=# select array_to_string(array[1,2,3,4,5,null],',','*');
 array_to_string
-
 1,2,3,4,5,*
(1 row)

Time: 0,501 ms
postgres=# select
string_to_array(array_to_string(array[1,2,3,4,5,null],',','*'),',','*');
 string_to_array
--
 {1,2,3,4,5,NULL}
(1 row)

Time: 0,617 ms

postgres=# select string_to_array('1,2,3,4,5,*',',','*')::int[];
 string_to_array
--
 {1,2,3,4,5,NULL}
(1 row)

Time: 0,652 ms

and then string_to_array and array_to_string are orthogonal with NULL.

Pavel



 Regards
 Pavel

                        regards, tom lane




3params_arrayfce.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 10:08 AM, Erik Rijkers e...@xs4all.nl wrote:
 fwiw, results for all current postgres versions:
 [ only 9.0beta1 is different ]

It looks like the relevant commits are:

commit 822f2ac5a2ec7c6f10634f62a0b2dc6cc9929759
Author: Tom Lane t...@sss.pgh.pa.us
Date:   Mon Jan 25 20:55:32 2010 +

Add get_bit/set_bit functions for bit strings, paralleling those for bytea,
and implement OVERLAY() for bit strings and bytea.

In passing also convert text OVERLAY() to a true built-in, instead of
relying on a SQL function.

Leonardo F, reviewed by Kevin Grittner

commit 75dea10196c31d98d98c0bafeeb576ae99c09b12
Author: Tom Lane t...@sss.pgh.pa.us
Date:   Thu Jan 7 19:53:11 2010 +

Make bit/varbit substring() treat any negative length as meaning all the re
of the string.  The previous coding treated only -1 that way, and would
produce an invalid result value for other negative values.

We ought to fix it so that 2-parameter bit substring() is a different C
function and the 3-parameter form throws error for negative length, but
that takes a pg_proc change which is impractical in the back branches;
and in any case somebody might be relying on -1 working this way.
So just do this as a back-patchable fix.

I think the OP is probably running a version that doesn't include the
Jan 7 commit, which was effectively undone by the Jan 25 commit for
CVS HEAD.  It looks like this was intentional based on spec behavior
of overlay(), but should we consider maintaining the historical
behavior instead?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] what is good solution for support NULL inside string_to_array function?

2010-05-04 Thread Merlin Moncure

On Tue, May 4, 2010 at 10:05 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 and then string_to_array and array_to_string are orthogonal with NULL.

I like the behavior, but should it share the name with the 2 argument
version given the incompatibility? Maybe make a new function
to_string(anyarray, sep, nullsym='') and deprecate the old one?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Simon Riggs

Downthread, I said..

On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote:

 The only difference is that max_standby_delay is measured from log
 timestamp. Perhaps it should work from WAL receipt timestamp rather than
 from log timestamp? That would make some of the problems go away without
 significantly changing the definition. I'll look at that.

Patch to implement this idea attached: for discussion, not tested yet.
No docs yet.

The attached patch redefines standby delay to be the amount of time
elapsed from point of receipt to point of application. The point of
receipt is reset every chunk of data when streaming, or every file when
reading file by file. In all cases this new time is later than the
latest log time we would have used previously.

This addresses all of your points, as shown below.

On Mon, 2010-05-03 at 11:37 -0400, Tom Lane wrote:
 There are three really fundamental problems with it:
 
 1. The timestamps we are reading from the log might be historical,
 if we are replaying from archive rather than reading a live SR stream.
 In the current implementation that means zero grace period for standby
 queries.  Now if your only interest is catching up as fast as possible,
 that could be a sane behavior, but this is clearly not the only possible
 interest --- in fact, if that's all you care about, why did you allow
 standby queries at all?

The delay used is from time of receipt of WAL, no longer from log date.
So this would no longer apply.

 2. There could be clock skew between the master and slave servers.
 If the master's clock is a minute or so ahead of the slave's, again we
 get into a situation where standby queries have zero grace period, even
 though killing them won't do a darn thing to permit catchup.  If the
 master is behind the slave then we have an artificially inflated grace
 period, which is going to slow down the slave.

The timestamp is from standby, not master, so this would no longer
apply.

 3. There could be significant propagation delay from master to slave,
 if the WAL stream is being transmitted with pg_standby or some such.
 Again this results in cutting into the standby queries' grace period,
 for no defensible reason.

The timestamp is taken immediately at the point the WAL is ready for
replay, so other timing overheads would not be included.

 In addition to these fundamental problems there's a fatal implementation
 problem: the actual comparison is not to the master's current clock
 reading, but to the latest commit, abort, or checkpoint timestamp read
 from the WAL.  Thus, if the last commit was more than max_standby_delay
 seconds ago, zero grace time.  Now if the master is really idle then
 there aren't going to be any conflicts anyway, but what if it's running
 only long-running queries?  Or what happens when it was idle for awhile
 and then starts new queries?  Zero grace period, that's what.
 
 We could possibly improve matters for the SR case by having walsender
 transmit the master's current clock reading every so often (probably
 once per activity cycle), outside the WAL stream proper.  The receiver
 could subtract off its own clock reading in order to measure the skew,
 and then we could cancel queries if the de-skewed transmission time
 falls too far behind.  However this doesn't do anything to fix the cases
 where we aren't reading (and caught up to) a live SR broadcast.

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 9281,9287  retry:
--- 9281,9294 
    sources);
  	switched_segment = true;
  	if (readFile != -1)
+ 	{
+ 		/*
+ 		 * Nudge forwards the WAL receive timestamp when
+ 		 * we are relying on WAL files.
+ 		 */
+ 		SetWalRcvTimestamp();
  		break;
+ 	}
  
  	/*
  	 * Nope, not found in archive and/or pg_xlog.
*** a/src/backend/replication/walreceiver.c
--- b/src/backend/replication/walreceiver.c
***
*** 515,521  XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
  	}
  }
  
! /* Flush the log to disk */
  static void
  XLogWalRcvFlush(void)
  {
--- 515,521 
  	}
  }
  
! /* Flush the log to disk and update shared memory pointer and timestamp */
  static void
  XLogWalRcvFlush(void)
  {
***
*** 524,537  XLogWalRcvFlush(void)
--- 524,541 
  		/* use volatile pointer to prevent code rearrangement */
  		volatile WalRcvData *walrcv = WalRcv;
  		char		activitymsg[50];
+ 		TimestampTz receivedTimestamp;
  
  		issue_xlog_fsync(recvFile, recvId, recvSeg);
  
  		LogstreamResult.Flush = LogstreamResult.Write;
  
+ 		receivedTimestamp = GetCurrentTimestamp();
+ 
  		/* Update shared-memory status */
  		SpinLockAcquire(walrcv-mutex);
  		walrcv-receivedUpto = LogstreamResult.Flush;
+ 		walrcv-receivedTimestamp = receivedTimestamp;
  		SpinLockRelease(walrcv-mutex);
  
  		/* Report XLOG streaming progress in PS display */
***

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Kevin Grittner

Robert Haas robertmh...@gmail.com wrote:
 
 I think the OP is probably running a version that doesn't include
 the Jan 7 commit, which was effectively undone by the Jan 25
 commit for CVS HEAD.
 
It sure looks like it.
 
 It looks like this was intentional based on spec behavior
 of overlay(), but should we consider maintaining the historical
 behavior instead?
 
I know I read through the spec (several versions of it) related to
this issue when I reviewed the patch, and if memory serves the 9.0
behavior is what the spec requires.  Obviously that's a behavior
change, so it can't be back-patched.  I'm inclined to think the
previous behavior was pretty marginal, and there is certainly a
workaround -- omit the third parameter rather than specifying a
negative number:
 
SELECT substring(B'0001' from 5);
  substring
--
 0001
(1 row)

SELECT substring(B'0001' from 4);
   substring
---
 10001
(1 row)
 
We have maintained nonstandard behavior in the past for
compatibility reasons, so it's a fair question; however, I'm
inclined toward the standard on this one.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] what is good solution for support NULL inside string_to_array function?

2010-05-04 Thread Pavel Stehule

2010/5/4 Merlin Moncure mmonc...@gmail.com:
 On Tue, May 4, 2010 at 10:05 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 and then string_to_array and array_to_string are orthogonal with NULL.

 I like the behavior, but should it share the name with the 2 argument
 version given the incompatibility? Maybe make a new function
 to_string(anyarray, sep, nullsym='') and deprecate the old one?


maybe to_string X to_array ... Why not? It shorter, maybe it is cleaner

Regards
Pavel

 merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote:

 The only difference is that max_standby_delay is measured from log
 timestamp. Perhaps it should work from WAL receipt timestamp rather than
 from log timestamp? That would make some of the problems go away without
 significantly changing the definition. I'll look at that.

Patch to implement this idea posted in response to OT, upthread, so I
can respond to the original complaints directly.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 10:29 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 We have maintained nonstandard behavior in the past for
 compatibility reasons, so it's a fair question; however, I'm
 inclined toward the standard on this one.

In a case like this, it seems unlikely that someone would be counting
on a negative value to throw an error, so I tend to regard doing
something else as an extension of the standard rather than a deviation
from it.  But I don't have strong feelings about it.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Tue, May 4, 2010 at 10:29 AM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 We have maintained nonstandard behavior in the past for
 compatibility reasons, so it's a fair question; however, I'm
 inclined toward the standard on this one.

 In a case like this, it seems unlikely that someone would be counting
 on a negative value to throw an error, so I tend to regard doing
 something else as an extension of the standard rather than a deviation
 from it.  But I don't have strong feelings about it.

The reason we changed it is that our other versions of substring()
already had the spec-required behavior of throwing error for negative
length.  Only the bit/varbit implementation was out of step.

The OP did not state that this behavioral change broke his application,
anyway.  I suspect the actual subtext is that he's poking into the
vulnerability report that was issued against the unpatched code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Simon Riggs


On Tue, 2010-05-04 at 09:36 -0400, Robert Haas wrote:
 On Tue, May 4, 2010 at 4:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
  In the original patch I had Pause/Resume feature for controlling
  recovery during Hot Standby. It was removed for lack of time.
 
 Well, it's not like we have more time now than we did then.  I think
 we need to postpone this discussion to 9.1.  If we're going to start
 accepting patches for new features, then why should we accept only
 patches for HS/SR? 

Robert,

This is clearly a response to issues raised about HS and not a new
feature. It's also proposed in the most minimal way possible with
respect for the current state of release. Why is you think I want to go
to beta less quickly than anyone else? I have many other items to work
on in the new release also, none of them have been even discussed, again
out of respect for the timing and the process.

 I also think that worrying about fine-tuning HS at this point is a bit
 like complaining that the jump suits of the crew of the Space Shuttle
 Challenger were not made of 100% recyclable materials.  Just yesterday
 we had a report of an HS server getting into a state where it failed
 to shut down properly; and I believe that we never fully resolved the
 issue of occasional extremely-long spikes in HS response time, either.
  Heikki just fixed a bug our btree recovery code which is apparently
 new to 9.0 since he did not backpatch it.  I think that getting into a
 discussion of pausing and resuming recovery, or even the parallel
 discussion on max_standby_delay, are fiddling with things that,
 granted, are probably not ideal, and yes, we should improve them in a
 future release, but they're not what we should be worrying about right
 now.  What I think we SHOULD be worried about right now - VERY worried
 - is stabilizing the existing Hot Standby code to the point where it
 won't be an embarrassment to us when we ship it.  The rate at which
 we're finding new problems even with the small number of people who
 test alpha releases and nightly snapshots suggests to me that we're
 not there yet.

There hasn't been anything more than a minor bug in weeks, so not really
sure how you arrive at that the idea the code needs stabilising. 

But even if you think we need stabilising, how do you propose I do
that? What exact action?

When people complain, I propose solutions. If you then object that the
proposed solution is actually a new feature, that leaves us in a
deadlock.

There is no evidence that Erik's strange performance has anything to do
with HS; it hasn't been seen elsewhere and he didn't respond to
questions about the test setup to provide background. The profile didn't
fit any software problem I can see.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Tom Lane

Simon Riggs si...@2ndquadrant.com writes:
 In the original patch I had Pause/Resume feature for controlling
 recovery during Hot Standby. It was removed for lack of time.

 With all the discussion around the HS UI, it would be something that
 could be back very easily.

Sure.  In 9.1.  You have enough bugs to fix that you have *no* business
thinking about adding features for 9.0, even if that were permissible
under the ground rules for beta.  Pretending that it's a contrib module
is just a transparent end-run around that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Kevin Grittner

Robert Haas robertmh...@gmail.com wrote:
 
 But I don't have strong feelings about it.
 
Nor do I.  Perhaps this question should be floated on -general?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 11:12 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  In the original patch I had Pause/Resume feature for controlling
  recovery during Hot Standby. It was removed for lack of time.
 
  With all the discussion around the HS UI, it would be something that
  could be back very easily.
 
 Sure.  In 9.1.  You have enough bugs to fix that you have *no* business
 thinking about adding features for 9.0, even if that were permissible
 under the ground rules for beta.  Pretending that it's a contrib module
 is just a transparent end-run around that.

As stated, this was proposed as a response to your gripes elsewhere. 

If people gripe, I propose a solution. I'm happy if you say No to the
proposed solution, but let's not pretend I'm breaking rules all the time
when I do.

What bugs do I have to fix? I am not aware of any.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] buildfarm building all live branches from git

2010-05-04 Thread Andrew Dunstan




Robert Haas wrote:

On Mon, May 3, 2010 at 4:04 PM, Andrew Dunstan and...@dunslane.net wrote:
  

For those who have been following along, today I reached a bit of a
milestone. The buildfarm member quoll, which is running a git-ized buildfarm
client due to be released in a week or so, today successfully built all the
live branches, all the way back to 7.4, from git. Note, this is running from
my test git repo, not the community's repo. Sadly, that means its change
links will be broken - I'm not exactly sure what gets hashed to provide a
commit ID in git, but the IDs don't match between these two repos.

See http://www.pgbuildfarm.org/cgi-bin/show_status.pl?member=quoll.



Thanks for working on this.


  


You're welcome ;-)

I have actually fixed the change link issue. I added an enhancement that 
allows the client to specify a URL where changesets can be found. The 
server code no longer presumptively assumes that the changeset can be 
found on the community repo.


see the bottom of 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quolldt=2010-05-03%2021:21:11 
for an example of this in action.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
 This is clearly a response to issues raised about HS and not a new
 feature.

I don't find that clear at all.  In fact, I find the exact opposition
position to be clear.

 It's also proposed in the most minimal way possible with
 respect for the current state of release. Why is you think I want to go
 to beta less quickly than anyone else?

We're already in beta.  I said nothing about when you want to go to
beta or do anything else.

 There hasn't been anything more than a minor bug in weeks, so not really
 sure how you arrive at that the idea the code needs stabilising.

I don't agree that there hasn't been anything more than a minor bug in
weeks.  I arrive at the idea that the code needs stabilizing on the
basis of the fact that we keep finding new bugs.

 When people complain, I propose solutions. If you then object that the
 proposed solution is actually a new feature, that leaves us in a
 deadlock.

Not really.  You're entitled to say what you think we should do and I
am entitled to say what I think we should do.  I think we should wait
for 9.1.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Erik Rijkers

Hi Simon,

In another thread you mentioned you were lacking information from me:

On Tue, May 4, 2010 17:10, Simon Riggs wrote:

 There is no evidence that Erik's strange performance has anything to do
 with HS; it hasn't been seen elsewhere and he didn't respond to
 questions about the test setup to provide background. The profile didn't
 fit any software problem I can see.


I'm sorry if I missed requests for things that where not already mentioned.

Let me repeat:
  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagat 
Barracuda ES.2)

It goes without saying (I hope) that apart from the pgbench tests
and a few ssh sessions (myself), the machine was idle.

It would be interesting if anyone repeated these simple tests and produced
evidence that these non-HS.

(Unfortunately, I have at the moment not much time for more testing)


thanks,

Erik Rijkers



On Sun, April 25, 2010 21:07, Simon Riggs wrote:
 On Sun, 2010-04-25 at 20:25 +0200, Erik Rijkers wrote:

 Sorry if it's too much data, but to me at least it was illuminating;
 I now understand the effects of the different parameters better.

 That's great, many thanks.

 A few observations

 * Standby performance is actually slightly above normal running. This is
 credible because of the way snapshots are now taken. We don't need to
 scan the procarray looking for write transactions, since we know
 everything is read only. So we scan just the knownassignedxids, which if
 no activity from primary will be zero-length, so snapshots will actually
 get taken much faster in this case on standby. The snapshot performance
 on standby is O(n) where n is the number of write transactions
 currently on primary (transfer delays blur the word currently).

 * The results for scale factor  100 are fine, and the results for 100
 with few connections get thrown out by long transaction times. With
 larger numbers of connections the wait problems seem to go away. Looks
 like Erik (and possibly Hot Standby in general) has an I/O problem,
 though from what is not yet determined. It could be just hardware, or
 might be hardware plus other factors.

 --
  Simon Riggs   www.2ndQuadrant.com





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 18:10 +0200, Erik Rijkers wrote:
 It would be interesting if anyone repeated these simple tests and
 produced
 evidence that these non-HS.
 
 (Unfortunately, I have at the moment not much time for more testing)

Would you be able to make those systems available for further testing?

First, I'd perform the same test with the systems swapped, so we know
more about the symmetry of the issue. After that, would like to look
more into internals.

Is it possible to setup SytemTap and dtrace on these systems?

Thanks, either way.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Erik Rijkers

On Tue, May 4, 2010 18:19, Simon Riggs wrote:
 On Tue, 2010-05-04 at 18:10 +0200, Erik Rijkers wrote:
 It would be interesting if anyone repeated these simple tests and
 produced evidence that these non-HS.

 (Unfortunately, I have at the moment not much time for more testing)

 Would you be able to make those systems available for further testing?

No. sorry.

 First, I'd perform the same test with the systems swapped, so we know
 more about the symmetry of the issue. After that, would like to look
 more into internals.

you mean systems swapped, primary and standby?  primary and standby were on 
the same machine in
these tests (even the same raid).

I can eventually move the standby (the 'slow' side, as it stands) to another, 
quite similar
machine.  Not in the coming days though...


 Is it possible to setup SytemTap and dtrace on these systems?

I did install systemtap last week. dtrace is not installed (I think. I've never 
used either.)



Erik Rijkers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Bruce Momjian

Simon Riggs wrote:
 On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote:
 
  As I remember, 9.0 has two behaviors:
  
  o  master delays vacuum cleanup
  o  slave delays WAL application
  
  and in 9.1 we will be adding:
  
  o  slave communicates snapshots to master
 
  How would this figure into what we ultimately want in 9.1?
 
 We would still want all options, since slave communicates snapshot to
 master doesn't solve the problem it just moves the problem elsewhere.
 It's a question of which factors the user wishes to emphasise for their
 specific use.
 
  I understand Simon's point that the two behaviors have different
  benefits.  However, I believe few users will be able to understand when
  to use which.
 
 If users can understand how to set NDISTINCT for a column, they can
 understand this. It's not about complexity of UI, its about solving
 problems. When people hit an issue, I don't want to be telling people
 we thought you wouldn't understand it, so we removed the parachute.
 They might not understand it *before* they hit a problem, so what? But
 users certainly will afterwards and won't say thanks if you prevent an
 option for them, especially for the stated reason. (My point about
 ndistinct: 99% of users have no idea that exists or when to use it, but
 it still exists as an option because it solves a known issue, just like
 this.)

Well, this is kind of my point --- that if few people are going to need
a parameter and it is going to take us to tell them to use it, it isn't
a good parameter because the other 99.9% are going to stare at the
parameters and not konw what it does or how it is different from other
similar parameters.  Adding another parameter might help 0.1% of our
users, but it is going to confuse the other 99.9%.  :-(

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Tom Lane

Simon Riggs si...@2ndquadrant.com writes:
 There hasn't been anything more than a minor bug in weeks, so not really
 sure how you arrive at that the idea the code needs stabilising. 

Simon, if you don't think the code needs stabilizing, you need to think
again.

* max_standby_delay logic is broken, as per other thread.

* handle_standby_sig_alarm is broken to the point of needing to be
thrown away; you can NOT do that kind of thing in an interrupt handler.

* RecordKnownAssignedTransactionIds is changing ShmemVariableCache-nextXid
without any kind of lock (in general, I suspect all the xlog replay code
needs to be revisited to see if it's skipping locks on shared data
structures that are now potentially going to be examined by backends)

* Use of StandbyTransactionIdIsPrepared seems awfully dubious: why are
we trusting the standby's pg_twophase files more than data from the WAL
log, *especially* before we have reached consistency?  Not to mention
that that's a horridly expensive operation (filesystem access) being
invoked while holding ProcArrayLock.

* Why is ExtendCLOG/ExtendSUBTRANS done in RecordKnownAssignedTransactionIds?
It's inappropriate from a modularity standpoint, and it also seems completely
wrong that it won't get done if standbyState  STANDBY_SNAPSHOT_PENDING.
nextXID manipulation there seems equally bogus not to mention unlocked.

* snapshotOldestActiveXid is bogus (I complained about this
already, you have not fixed it)

* LogStandbySnapshot is merest fantasy: no guarantee that either the XIDs
list or the locks list will be consistent with the point in WAL where it
will get inserted.  What's worse, locking things down enough to guarantee
consistency would be horrid for performance, or maybe even deadlock-inducing.
Could lose both ways: list might contain an XID whose commit/abort went
to WAL before the snapshot did, or list might be missing an XID started
just after snap was taken, The latter case could possibly be dealt with
via nextXid filtering, but that doesn't fix the former case, and anyway
we have both ends of the same problem for locks.

That's just what I found in a day or so of code reading, and I haven't
read anything like all of the HS patches.  You need to stop thinking
about adding features and start thinking about making what's in there
bulletproof.  If you happen to have an idle moment when you're not
fixing known problems, re-read some code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 13:00 -0400, Bruce Momjian wrote:

 Well, this is kind of my point --- that if few people are going to need
 a parameter and it is going to take us to tell them to use it, it isn't
 a good parameter because the other 99.9% are going to stare at the
 parameters and not konw what it does or how it is different from other
 similar parameters.  Adding another parameter might help 0.1% of our
 users, but it is going to confuse the other 99.9%.  :-(

You've missed my point. Most users of HS will need these parameters.
There is no need to understand them immediately, nor do I expect them to
do so. People won't understand why they exist until they've understood
the actual behaviour, received some errors and *then* they will
understand them, want them and need them. Just like deadlocks, ndistinct
and loads of other features we provide and support.

The current behaviour of max_standby_delay is designed to favour High
Availability users, not query users. I doubt that users with HA concerns
are only 0.1% of our users. I've accepted that some users may not put
that consideration first and so adding some minor, easy to implement
additional parameters will improve the behaviour for those people.
Forcing just one behaviour will be bad for many people.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 13:23 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  There hasn't been anything more than a minor bug in weeks, so not really
  sure how you arrive at that the idea the code needs stabilising. 
 
 Simon, if you don't think the code needs stabilizing, you need to think
 again.

This list is entirely new to me. I can't fix problems you haven't even
raised before, can I? Why have you been saving that list?? No way are
these known problems.

 That's just what I found in a day or so of code reading, and I haven't
 read anything like all of the HS patches.  You need to stop thinking
 about adding features and start thinking about making what's in there
 bulletproof.  If you happen to have an idle moment when you're not
 fixing known problems, re-read some code.

Nobody is adding new features. Stop barracking me for something that's
not even happening, especially if you persuade yourself you should be
angry about it. I care as much about beta as anyone else.

Yes, I'll go read your list. Thank you for your review.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] including PID or backend ID in relpath of temp rels

2010-05-04 Thread Alvaro Herrera

Robert Haas escribió:

 [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary
 relations?  I think the only backend that can have an smgr reference
 to a temprel other than the owning backend is bgwriter, and AFAICS
 bgwriter will only have such a reference if it's responding to a
 request by the owning backend to unlink the associated files, in which
 case (I think) the owning backend will have no reference.

Hmm, wasn't there a proposal to have the owning backend delete the files
instead of asking the bgwriter to?

 [dbsize.c] As with relcache.c, there's a problem if we're asked for
 the size of a temporary relation that is not our own: we can't call
 relpath() without knowing the ID of the owning backend, and there's no
 way to acquire that information for pg_class.  I guess we could just
 refuse to answer the question in that case, but that doesn't seem real
 cool.  Or we could physically scan the directory for files that match
 a suitably constructed wildcard, I suppose.

I don't very much like the wildcard idea; but I don't think it's
unreasonable to refuse to provide a file size.  If the owning backend
has still got part of the table in local buffers, you'll get a
misleading answer, so perhaps it's best to not give an answer at all.

Maybe this problem could be solved if we could somehow force that
backend to write down its local buffers, in which case it'd be nice to
have a solution to the dbsize problem.

 [syncscan.c] It seems we pursue this optimization even for temprels; I
 can't think of why that would be useful in practice.  If it's useless
 overhead, should we skip it?  This is really independent of this
 project; just a side thought.

Maybe recently used buffers are more likely to be in the OS page cache,
so perhaps it's not good to disable it.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] including PID or backend ID in relpath of temp rels

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 2:06 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Robert Haas escribió:

Hey, thanks for writing back!  I just spent the last few hours
thinking about this and beating my head against the wall.

 [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary
 relations?  I think the only backend that can have an smgr reference
 to a temprel other than the owning backend is bgwriter, and AFAICS
 bgwriter will only have such a reference if it's responding to a
 request by the owning backend to unlink the associated files, in which
 case (I think) the owning backend will have no reference.

 Hmm, wasn't there a proposal to have the owning backend delete the files
 instead of asking the bgwriter to?

I did propose that upthread; it may have been proposed previously
also. This might be worth doing independently of the rest of the patch
(which I'm starting to fear is doomed, cue ominous soundtrack) since
it would reduce the chance of orphaning data files and possibly
simplify the logic also.

 [dbsize.c] As with relcache.c, there's a problem if we're asked for
 the size of a temporary relation that is not our own: we can't call
 relpath() without knowing the ID of the owning backend, and there's no
 way to acquire that information for pg_class.  I guess we could just
 refuse to answer the question in that case, but that doesn't seem real
 cool.  Or we could physically scan the directory for files that match
 a suitably constructed wildcard, I suppose.

 I don't very much like the wildcard idea; but I don't think it's
 unreasonable to refuse to provide a file size.  If the owning backend
 has still got part of the table in local buffers, you'll get a
 misleading answer, so perhaps it's best to not give an answer at all.

 Maybe this problem could be solved if we could somehow force that
 backend to write down its local buffers, in which case it'd be nice to
 have a solution to the dbsize problem.

I'm sure we could add some kind of signaling mechanism that would tell
all backends to flush their local buffers, but I'm not too sure it
would help this case very much, because you likely wouldn't want to
wait for all the backends to complete that process before reporting
results.

 [syncscan.c] It seems we pursue this optimization even for temprels; I
 can't think of why that would be useful in practice.  If it's useless
 overhead, should we skip it?  This is really independent of this
 project; just a side thought.

 Maybe recently used buffers are more likely to be in the OS page cache,
 so perhaps it's not good to disable it.

I don't get it.  If the whole relation fits in the page cache, it
doesn't much matter where you start a seqscan.  If it doesn't,
starting where the last one ended is anti-optimal.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Srinivas Naik

I am sorry for that, but I made two different installations and I was
messing up with various inputs.

Actually, the installed versions are below
*postgresql-8.3*
*Ubuntu 8.10  with 2.6.27 Kernel*
*and its an 32Bit O/S*

pgsql$ SELECT substring(B'0001' from 5 for -2);
ERROR:invalid memory alloc request size 4244635647

I just wanted to know how severe it is and how it can effect the database to
result Memory Corruption/DoS.

Please help me in making the point clear.

Thanks again.

On Tue, May 4, 2010 at 8:45 PM, Kevin Grittner
kevin.gritt...@wicourts.govwrote:

 Robert Haas robertmh...@gmail.com wrote:

  But I don't have strong feelings about it.

 Nor do I.  Perhaps this question should be floated on -general?

 -Kevin

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Greg Smith


Erik Rijkers wrote:

  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagat 
Barracuda ES.2)
  


To fill in from data you already mentioned upthread:
32 GB RAM
CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5

Thanks for the all the reporting you've done here, really helpful.  
Questions to make sure I'm trying to duplicate the right thing here:


Is your disk array all configured as one big RAID10 volume, so 
essentially a 6-disk stripe with redundancy, or something else?  In 
particular I want know whether the WAL/database/archives are split onto 
separate volumes or all on one big one when you were testing. 


Is this is on ext3 with standard mount parameters?

Also, can you confirm that every test you ran only had a single pgbench 
worker thread (-j 1 or not specified)?  That looked to be the case from 
the ones I saw where you posted the whole command used.  It would not 
surprise me to find that the CPU usage profile of a standby is just 
different enough from the primary that it results in the pgbench program 
not being scheduled enough time, due to the known Linux issues in that 
area.  Not going to assume that, of course, just one thing I want to 
check when trying to replicate what you've run into. 

I didn't see any glaring HS performance issues like you've been 
reporting on last time I tried performance testing in this area, just a 
small percentage drop.  But I didn't specifically go looking for it 
either.  With your testing rig out of service, we're going to try and 
replicate that on a system here.  My home server is like a scaled down 
version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5 
disks instead of 12) and it's running the same CentOS version.  If the 
problems really universal I should see it here too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Josh Berkus

Simon,

 Yes, the max wait on any *one* blocker will be max_standby_delay. But if
 you wait for two blockers, then the total time by which the standby lags
 will now be 2*max_standby_delay. Add a third, fourth etc and the standby
 lag keeps rising.

I still don't see how that works.  If we're locking for applying log
segments, then any query which came in after the recovery lock would,
presumably, wait.  So you'd have a lot of degraded query performance,
but no more than max_standby_delay of waiting to apply logs.

I'm more interested in your assertion that there's a lot in the
replication stream which doesn't take a lock; if that's the case, then
implementing any part of Tom's proposal is hopeless.

 * standby query delay - defined as the time that recovery will wait for
 a query to complete before a cancellation takes place. (We could
 complicate this by asking what happens when recovery is blocked twice by
 the same query? Would it wait twice, or does it have to track how much
 it has waited for each query in total so far?)

Aha!  Now I see the confusion.  AFAIK, Tom was proposing that the
pending recovery data would wait for max_standby_delay, total, then
cancel *all* queries which conflicted with it.  Now that we've talked
this out, though, I can see that this can still result in mass cancel
issues, just like the current max_standby_delay.   The main advantage I
can see to Tom's idea is that (presumably) it can be more discriminating
about which queries it cancels.

I agree that waiting on *each* query for up to # time would be a
completely different behavior, and as such, should be a option for DBAs.
 We might make it the default option, but we wouldn't make it the only
option.

Speaking of which, was *your* more discriminating query cancel ever applied?

 Currently max_standby_delay seeks to constrain the standby lag to a
 particular value, as a way of providing a bounded time for failover, and
 also to constrain the amount of WAL that needs to be stored as the lag
 increases. Currently, there is no guaranteed minimum query delay given
 to each query.

Yeah, I can just see a lot of combinational issues with this.  For
example, what if the user's network changes in some way to retard
delivery of log segments to the point where the delivery time is longer
than max_standby_delay?  To say nothing about system clock synch, which
isn't perfect even if you have it set up.

I can see DBAs who are very focussed on HA wanting a standby-lag based
control anyway, when HA is far more important than the ability to run
queries on the slave.  But I don't that that is the largest group; I
think that far more people will want to balance the two considerations.

Ultimately, as you say, we would like to have all three knobs:

standby lag: max time measured from master timestamp to slave timestamp

application lag: max time measured from local receipt of WAL records
(via log copy or recovery connection) to their application

query lag: max time any query which is blocking a recovery operation can run

These three, in combination, would let us cover most potential use
cases.  So I think you've assessed that's where we're going in the
9.1-9.2 timeframe.

However, I'd say for 9.0 that application lag is the least confusing
option and the least dependant on the DBA's server room setup.  So if we
can only have one of these for 9.0 (and I think going out with more than
one might be too complex, especially at this late date) I think that's
the way to go.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Greg Stark

On Mon, May 3, 2010 at 4:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 1. The timestamps we are reading from the log might be historical,

 2. There could be clock skew between the master and slave servers.

 3. There could be significant propagation delay from master to slave,

So it sounds like what you're expecting is for max_standby_delay to
represent not the maximum lag between server commit and standby commit
but rather the maximum lag introduced by conflicts. Or perhaps maximum
lag introduced relative to the lag present at startup. I think it's
possible to implement either of these and it would solve all three
problems above:

The slave maintains a static measure of how far behind it is from the
master. Every time it executes a recovery operation or waits on a
conflict it adds the time it spent executing or waiting. Every time it
executes a commit record it subtracts the *difference* between this
commit record and the last. I assume we clip at 0 so it never goes
negative which has odd effects but it seems to match what I would
expect to happen.

In the face of a standby recovering historical logs then it would
start with a assumed delay of 0. As long as the conflicts don't slow
down execution of the logs so that they run slower than the server
then the measured delay would stay near 0. The only time queries would
be canceled would be if the conflicts are causing problems replaying
the logs.

In the face of clock skew it nothing changes as long as the clocks run
at the same speed.

In the face of an environment where the master is idle I think this
scheme has the same problems you described but I think this might be
manageable. Perhaps we need more timestamps in the master's log stream
aside from the commit timestamps. Or perhaps we don't care about
standby delay except when reading a commit record since any other
record isn't actually delayed unless its commit is delayed.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Kevin Grittner

Srinivas Naik naik.sr...@gmail.com wrote:
 
 Actually, the installed versions are below
 *postgresql-8.3*
 
 I just wanted to know how severe it is and how it can effect the
 database to result Memory Corruption/DoS.
 
Well, you're clearly *not* on 8.3.10, or you would not get the
error.  Perhaps you should apply the latest bug fixes?
 
http://www.postgresql.org/support/versioning
 
http://www.postgresql.org/docs/8.3/static/release.html
 
As far as I know it isn't any more conducive to DoS attacks than,
say, your average syntax error; however, if you're trying to keep
that risk low, you should be keeping up with the minor releases
anyway.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] including PID or backend ID in relpath of temp rels

2010-05-04 Thread Alvaro Herrera

Robert Haas escribió:
 On Tue, May 4, 2010 at 2:06 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Robert Haas escribió:
 
 Hey, thanks for writing back!  I just spent the last few hours
 thinking about this and beating my head against the wall.

:-)

  [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary
  relations?  I think the only backend that can have an smgr reference
  to a temprel other than the owning backend is bgwriter, and AFAICS
  bgwriter will only have such a reference if it's responding to a
  request by the owning backend to unlink the associated files, in which
  case (I think) the owning backend will have no reference.
 
  Hmm, wasn't there a proposal to have the owning backend delete the files
  instead of asking the bgwriter to?
 
 I did propose that upthread; it may have been proposed previously
 also. This might be worth doing independently of the rest of the patch
 (which I'm starting to fear is doomed, cue ominous soundtrack) since
 it would reduce the chance of orphaning data files and possibly
 simplify the logic also.

+1 for doing it separately, but hopefully that doesn't mean the rest of
this patch is doomed ...

  [dbsize.c] As with relcache.c, there's a problem if we're asked for
  the size of a temporary relation that is not our own: we can't call
  relpath() without knowing the ID of the owning backend, and there's no
  way to acquire that information for pg_class.  I guess we could just
  refuse to answer the question in that case, but that doesn't seem real
  cool.  Or we could physically scan the directory for files that match
  a suitably constructed wildcard, I suppose.
 
  I don't very much like the wildcard idea; but I don't think it's
  unreasonable to refuse to provide a file size.  If the owning backend
  has still got part of the table in local buffers, you'll get a
  misleading answer, so perhaps it's best to not give an answer at all.
 
  Maybe this problem could be solved if we could somehow force that
  backend to write down its local buffers, in which case it'd be nice to
  have a solution to the dbsize problem.
 
 I'm sure we could add some kind of signaling mechanism that would tell
 all backends to flush their local buffers, but I'm not too sure it
 would help this case very much, because you likely wouldn't want to
 wait for all the backends to complete that process before reporting
 results.

Hmm, I was thinking in the pg_relation_size function -- given this new
mechanism you could get an accurate size of temp tables for other
backends.  I wasn't thinking in the pg_database_size function, and
perhaps it's better to *not* include temp tables in that report at all.

  [syncscan.c] It seems we pursue this optimization even for temprels; I
  can't think of why that would be useful in practice.  If it's useless
  overhead, should we skip it?  This is really independent of this
  project; just a side thought.
 
  Maybe recently used buffers are more likely to be in the OS page cache,
  so perhaps it's not good to disable it.
 
 I don't get it.  If the whole relation fits in the page cache, it
 doesn't much matter where you start a seqscan.  If it doesn't,
 starting where the last one ended is anti-optimal.

Err, I was thinking that a syncscan started a bunch of pages earlier
than the point where the previous scan ended, but yeah, that's a bit
silly.  Maybe we should just ignore syncscan in temp tables altogether,
as you propose.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Stefan Kaltenbrunner


Erik Rijkers wrote:

Hi Simon,

In another thread you mentioned you were lacking information from me:

On Tue, May 4, 2010 17:10, Simon Riggs wrote:

There is no evidence that Erik's strange performance has anything to do
with HS; it hasn't been seen elsewhere and he didn't respond to
questions about the test setup to provide background. The profile didn't
fit any software problem I can see.



I'm sorry if I missed requests for things that where not already mentioned.

Let me repeat:
  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagat 
Barracuda ES.2)

It goes without saying (I hope) that apart from the pgbench tests
and a few ssh sessions (myself), the machine was idle.

It would be interesting if anyone repeated these simple tests and produced
evidence that these non-HS.

(Unfortunately, I have at the moment not much time for more testing)


FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks 
kinda related.


using -j 16 -c 16 -T 120 I get either 10tps and around 66 
contextswitches per second or on some runs I end up with 15tps and 
around 1M contextswitches/s sustained. I mostly get the 100k result but 
once in a while I get the 150k one. And one even can anticipate the 
final transaction rate from watching vmstat 1...


I'm not sure yet on what is causing that behaviour but that is with 
9.0B1 on a Dual Quadcore Nehalem box with 16 cpu threads (8+HT) on a 
pure in-memory workload (scale = 20 with 48GB RAM).



Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Erik Rijkers

On Tue, May 4, 2010 20:26, Greg Smith wrote:
 Erik Rijkers wrote:
   OS: Centos 5.4
   2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
   Areca 1280ML
   primary and standby db both on a 12 disk array (sata 7200rpm, Seagat 
 Barracuda ES.2)


 To fill in from data you already mentioned upthread:
 32 GB RAM
 CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5

 Thanks for the all the reporting you've done here, really helpful.
 Questions to make sure I'm trying to duplicate the right thing here:

 Is your disk array all configured as one big RAID10 volume, so
 essentially a 6-disk stripe with redundancy, or something else?  In
 particular I want know whether the WAL/database/archives are split onto
 separate volumes or all on one big one when you were testing.

Everything together: the raid is what Areca call 'raid10(1E)'.
(to be honest I don't remember what that 1E exactly means -
extra flexibility in the number of disks, I think).

Btw, some of my emails contained the postgresql.conf of both instances.


 Is this is on ext3 with standard mount parameters?

ext3 noatime

 Also, can you confirm that every test you ran only had a single pgbench
 worker thread (-j 1 or not specified)?  That looked to be the case from
 the ones I saw where you posted the whole command used.  It would not

yes; the literal cmd:
time /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin/pgbench -h /tmp 
-p 6565 -U rijkers
-n -S -c 20 -T 900 -j 1 replicas

To avoid wrapping in the emails I just removed '-h \tmp', -U rijkers', and 
'replicas'.

(I may have run the primary's pgbench binary also against the slave - don't 
think
that should make any difference)

 surprise me to find that the CPU usage profile of a standby is just
 different enough from the primary that it results in the pgbench program
 not being scheduled enough time, due to the known Linux issues in that
 area.  Not going to assume that, of course, just one thing I want to
 check when trying to replicate what you've run into.

 I didn't see any glaring HS performance issues like you've been
 reporting on last time I tried performance testing in this area, just a
 small percentage drop.  But I didn't specifically go looking for it

Here, it seems repeatable, but does not occur with all scales.

Hm, maybe I should just dump *all* of my results on the wiki for reference.  
(I'll look at that
later).

 either.  With your testing rig out of service, we're going to try and
 replicate that on a system here.  My home server is like a scaled down
 version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5
 disks instead of 12) and it's running the same CentOS version.  If the
 problems really universal I should see it here too.


Thanks,


Erik Rijkers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] including PID or backend ID in relpath of temp rels

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 3:03 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
  Hmm, wasn't there a proposal to have the owning backend delete the files
  instead of asking the bgwriter to?

 I did propose that upthread; it may have been proposed previously
 also. This might be worth doing independently of the rest of the patch
 (which I'm starting to fear is doomed, cue ominous soundtrack) since
 it would reduce the chance of orphaning data files and possibly
 simplify the logic also.

 +1 for doing it separately, but hopefully that doesn't mean the rest of
 this patch is doomed ...

I wonder if it would be possible to reject access to temporary
relations at a higher level.  Right now, if you create a temporary
relation in one session, you can issue a SELECT statement against it
in another relation, and get back 0 rows.  If you then insert data
into it and select against it again, you'll get an error saying that
you can't access temporary tables of other sessions.  If you try to
truncate somebody else's temporary relation, it fails; but if you try
to drop it, it works.  In fact, you can even run ALTER TABLE ... ADD
COLUMN on somebody else's temp table, as long as you don't do anything
that requires a rewrite.  CLUSTER fails; VACUUM and VACUUM FULL both
appear to work but apparently actually don't do anything under the
hood, so that database-wide vacuums don't barf.  The whole thing seems
pretty leaky.  It would be nice if we could find a small set of
control points where we basically reject ALL access to somebody else's
temp relations, period.

One possible thing we might do (bearing in mind that we might need to
wall off access at multiple levels) would be to forbid creating a
relcache entry for a non-local temprel.  That would, in turn, forbid
doing pretty much anything to such a relation, although I'm not sure
what else would get broken in the process.  But it would eliminate,
for example, all the checks for RELATION_IS_OTHER_TEMP, since that
Just Couldn't Happen.  It would would eliminate the need to install
specific handling for this case in dbsize.c - we'd just automatically
croak.  And it's also probably necessary to do this anyhow if we want
to ever eliminate those CacheInvalidSmgr() calls for temp rels,
because if I can drop your temprel, that implies I can smgropen() it.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 11:27 -0700, Josh Berkus wrote:

 I still don't see how that works.
...

The good news is we agree by the time we get to the bottom... ;-)

 I'm more interested in your assertion that there's a lot in the
 replication stream which doesn't take a lock; if that's the case, then
 implementing any part of Tom's proposal is hopeless.

(No, still valid, the idea is generic)

  * standby query delay - defined as the time that recovery will wait for
  a query to complete before a cancellation takes place. (We could
  complicate this by asking what happens when recovery is blocked twice by
  the same query? Would it wait twice, or does it have to track how much
  it has waited for each query in total so far?)
 
 Aha!  Now I see the confusion.

BTW, Tom's proposal was approx half a sentence long so that is the
source of any confusion.

   AFAIK, Tom was proposing that the
 pending recovery data would wait for max_standby_delay, total, then
 cancel *all* queries which conflicted with it.  Now that we've talked
 this out, though, I can see that this can still result in mass cancel
 issues, just like the current max_standby_delay.   The main advantage I
 can see to Tom's idea is that (presumably) it can be more discriminating
 about which queries it cancels.

As I said to Stephen, this is exactly how it works already and wasn't
what was proposed.

 I agree that waiting on *each* query for up to # time would be a
 completely different behavior, and as such, should be a option for DBAs.
  We might make it the default option, but we wouldn't make it the only
 option.

Glad to hear you say that.

 Speaking of which, was *your* more discriminating query cancel ever applied?
 
  Currently max_standby_delay seeks to constrain the standby lag to a
  particular value, as a way of providing a bounded time for failover, and
  also to constrain the amount of WAL that needs to be stored as the lag
  increases. Currently, there is no guaranteed minimum query delay given
  to each query.
 
 Yeah, I can just see a lot of combinational issues with this.  For
 example, what if the user's network changes in some way to retard
 delivery of log segments to the point where the delivery time is longer
 than max_standby_delay?  To say nothing about system clock synch, which
 isn't perfect even if you have it set up.
 
 I can see DBAs who are very focussed on HA wanting a standby-lag based
 control anyway, when HA is far more important than the ability to run
 queries on the slave.  But I don't that that is the largest group; I
 think that far more people will want to balance the two considerations.
 
 Ultimately, as you say, we would like to have all three knobs:
 
 standby lag: max time measured from master timestamp to slave timestamp
 
 application lag: max time measured from local receipt of WAL records
 (via log copy or recovery connection) to their application

 query lag: max time any query which is blocking a recovery operation can run
 
 These three, in combination, would let us cover most potential use
 cases.  So I think you've assessed that's where we're going in the
 9.1-9.2 timeframe.
 
 However, I'd say for 9.0 that application lag is the least confusing
 option and the least dependant on the DBA's server room setup.  So if we
 can only have one of these for 9.0 (and I think going out with more than
 one might be too complex, especially at this late date) I think that's
 the way to go.

Before you posted, I submitted a patch on this thread to redefine
max_standby_delay to depend upon the application lag, as you've newly
defined it here - though obviously I didn't call it that. That solves
Tom's 3 issues. max_apply_delay might be technically more accurate term,
though isn't sufficiently better parameter name as to be worth the
change.

That patch doesn't implement his proposal, but that can be done as well
as (though IMHO not instead of). Given that two people have already
misunderstood what Tom proposed, and various people are saying we need
only one, I'm getting less inclined to have that at all.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pause/Resume feature for Hot Standby

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 13:23 -0400, Tom Lane wrote:

 * max_standby_delay logic is broken, as per other thread.

Proposed fix submitted,

 * handle_standby_sig_alarm is broken to the point of needing to be
 thrown away; you can NOT do that kind of thing in an interrupt handler.

This was modelled very closely on handle_sig_alarm() and was reviewed by
other hackers. I'm not great on that, as you know, so if you can explain
what it is I can't do, and how that differs from handle_sig_alarm
running the deadlock detector in the same way, then I'll work on it some
more.

 * RecordKnownAssignedTransactionIds is changing ShmemVariableCache-nextXid
 without any kind of lock (in general, I suspect all the xlog replay code
 needs to be revisited to see if it's skipping locks on shared data
 structures that are now potentially going to be examined by backends)

There is only one writer and this a single integer value, so any reads
are atomic. This is not being used as a memory barrier, so our earlier
discussion about weak-memory ordering doesn't apply.

The only other reader is bgwriter.

I'm happy to add additional locking if you think its really needed.

 * Use of StandbyTransactionIdIsPrepared seems awfully dubious: why are
 we trusting the standby's pg_twophase files more than data from the WAL
 log, *especially* before we have reached consistency?  

StandbyTransactionIdIsPrepared() is only called in two places, both of
which relate to pruning the KnownAssignedXids array. Pruning only occurs
when the WAL log specifically does not contain the information we need,
which only occurs when those hypothetical FATAL errors come along. In
that case we rely upon the pg_twophase files.

Both of those call points happen in ProcArrayApplyRecoveryInfo() which
does get called before we are consistent, though we can change that if
you see a problem. At this point, I don't see an issue.

 Not to mention
 that that's a horridly expensive operation (filesystem access) being
 invoked while holding ProcArrayLock.

I just optimised that in the recent patch you committed. It isn't a high
cost item any longer now that we are able to prune KnownAssignedXids()
from the left, since pruning will typically not test more than one xid.

 * Why is ExtendCLOG/ExtendSUBTRANS done in RecordKnownAssignedTransactionIds?

Heikki placed them there, so I left that coding, since it does work.
RecordKnown..() is supposed to be the logical equivalent of assigning an
xid, so it seemed logical. Happy to move wherever you see fit.

 It's inappropriate from a modularity standpoint, and it also seems completely
 wrong that it won't get done if standbyState  STANDBY_SNAPSHOT_PENDING.

Yes, that looks like a logic error and will be fixed. However, its
trapped later by clog code to zero new blocks, so in practice there is
no bug.

 nextXID manipulation there seems equally bogus not to mention unlocked.

Traced the code, looks fine to me. Yes, unlocked.

 * snapshotOldestActiveXid is bogus (I complained about this
 already, you have not fixed it)

I understood you were fixing it, as raised during your recent review of
the KAX patch. Will fix.

 * LogStandbySnapshot is merest fantasy: no guarantee that either the XIDs
 list or the locks list will be consistent with the point in WAL where it
 will get inserted.  What's worse, locking things down enough to guarantee
 consistency would be horrid for performance, or maybe even deadlock-inducing.
 Could lose both ways: list might contain an XID whose commit/abort went
 to WAL before the snapshot did, or list might be missing an XID started
 just after snap was taken, The latter case could possibly be dealt with
 via nextXid filtering, but that doesn't fix the former case, and anyway
 we have both ends of the same problem for locks.

That was recoded by Heikki and I left it as written, though I checked
it, considered it correct and take responsibility for it. Will review
further and report back.

Thanks for the review.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 21:34 +0200, Stefan Kaltenbrunner wrote:

 FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks 
 kinda related.
 
 using -j 16 -c 16 -T 120 I get either 10tps and around 66 
 contextswitches per second or on some runs I end up with 15tps and 
 around 1M contextswitches/s sustained. I mostly get the 100k result but 
 once in a while I get the 150k one. And one even can anticipate the 
 final transaction rate from watching vmstat 1...
 
 I'm not sure yet on what is causing that behaviour but that is with 
 9.0B1 on a Dual Quadcore Nehalem box with 16 cpu threads (8+HT) on a 
 pure in-memory workload (scale = 20 with 48GB RAM).

Educated guess at a fix: please test this patch. It's good for
performance testing, but doesn't work correctly at failover, which would
obviously be addressed prior to any commit.

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 158,163  static XLogRecPtr LastRec;
--- 158,164 
   * known, need to check the shared state.
   */
  static bool LocalRecoveryInProgress = true;
+ static bool KnownRecoveryState = false;
  
  /*
   * Local state for XLogInsertAllowed():
***
*** 6524,6537  CheckRecoveryConsistency(void)
  bool
  RecoveryInProgress(void)
  {
! 	/*
! 	 * We check shared state each time only until we leave recovery mode. We
! 	 * can't re-enter recovery, so there's no need to keep checking after the
! 	 * shared variable has once been seen false.
! 	 */
! 	if (!LocalRecoveryInProgress)
! 		return false;
! 	else
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
--- 6525,6533 
  bool
  RecoveryInProgress(void)
  {
! 	if (KnownRecoveryState)
! 		return LocalRecoveryInProgress;
! 	else		
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
***
*** 6541,6546  RecoveryInProgress(void)
--- 6537,6544 
  		LocalRecoveryInProgress = xlogctl-SharedRecoveryInProgress;
  		SpinLockRelease(xlogctl-info_lck);
  
+ 		KnownRecoveryState = true;
+ 
  		/*
  		 * Initialize TimeLineID and RedoRecPtr when we discover that recovery
  		 * is finished. InitPostgres() relies upon this behaviour to ensure

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] GUCs that need restart

2010-05-04 Thread Gurjeet Singh

There are quite a few GUC parameters that need restart. Is there a way we
can avoid some of them needing restart? I am specifically looking at
archive_mode and the new wal_level.

From my limited understanding, these parameters need restart because in a
running cluster we cannot safely change these GUCs and make sure that
other/running backends will pick them up immediately so that they start
behaving differently as required by the GUC.

Are there other reasons to have them set to PGC_POSTMASTER?

If the above is correct and the only reason, then can we have them assigned
to a new PGC_ mode and have the SET commands somehow wait for all backends
to pickup the value before returning? (specifically, wait for any running
backends to exit the transaction).

I know there are genuine reasons behind having them depend on restart, but
am just trying to eliminate that, at least for some parameters which a DBA
might want to change on the fly, being fully aware of the consequences.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.enterprisedb.com

singh.gurj...@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Josh Berkus


   AFAIK, Tom was proposing that the
 pending recovery data would wait for max_standby_delay, total, then
 cancel *all* queries which conflicted with it.  Now that we've talked
 this out, though, I can see that this can still result in mass cancel
 issues, just like the current max_standby_delay.   The main advantage I
 can see to Tom's idea is that (presumably) it can be more discriminating
 about which queries it cancels.
 
 As I said to Stephen, this is exactly how it works already and wasn't
 what was proposed.

Well, it's not exactly how it works, as I understand it ... doesn't the
timer running out on the slave currently cancel *all* running queries
with old snapshots, regardless of what relations they touch?

 Before you posted, I submitted a patch on this thread to redefine
 max_standby_delay to depend upon the application lag, as you've newly
 defined it here - though obviously I didn't call it that. That solves
 Tom's 3 issues. max_apply_delay might be technically more accurate term,
 though isn't sufficiently better parameter name as to be worth the
 change.

Yeah, that looks less complicated for admins.  Thanks.

 That patch doesn't implement his proposal, but that can be done as well
 as (though IMHO not instead of). Given that two people have already
 misunderstood what Tom proposed, and various people are saying we need
 only one, I'm getting less inclined to have that at all.

Given your clarification on the whole set of behaviors, I'm highly
dubious about the idea of implementing Tom's proposal when we're already
Beta 1.  It seems like a 9.1 thing.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GUCs that need restart

2010-05-04 Thread Kevin Grittner

Gurjeet Singh singh.gurj...@gmail.com wrote:
 
 There are quite a few GUC parameters that need restart. Is there a
 way we can avoid some of them needing restart? I am specifically
 looking at archive_mode and the new wal_level.
 
I'll second this on a don't know if it's practicable, but it would
be nice if... basis.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] including PID or backend ID in relpath of temp rels

2010-05-04 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 One possible thing we might do (bearing in mind that we might need to
 wall off access at multiple levels) would be to forbid creating a
 relcache entry for a non-local temprel.  That would, in turn, forbid
 doing pretty much anything to such a relation, although I'm not sure
 what else would get broken in the process.

Dropping temprels left behind by a crashed backend would get broken by
that; which is a deal-breaker, because we have to be able to clean those
up.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] including PID or backend ID in relpath of temp rels

2010-05-04 Thread Tom Lane

Alvaro Herrera alvhe...@commandprompt.com writes:
 I don't very much like the wildcard idea; but I don't think it's
 unreasonable to refuse to provide a file size.  If the owning backend
 has still got part of the table in local buffers, you'll get a
 misleading answer, so perhaps it's best to not give an answer at all.

FWIW, that's not the case, anymore than it is for blocks in shared
buffer cache for regular rels.  smgrextend() results in an observable
extension of the file EOF immediately, whether or not you can see
up-to-date data for those pages.

Now people have often complained about the extra I/O involved in that,
and it'd be nice to have a solution, but it's not clear to me that
fixing it would be harder for temprels than regular rels.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] including PID or backend ID in relpath of temp rels

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 5:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 One possible thing we might do (bearing in mind that we might need to
 wall off access at multiple levels) would be to forbid creating a
 relcache entry for a non-local temprel.  That would, in turn, forbid
 doing pretty much anything to such a relation, although I'm not sure
 what else would get broken in the process.

 Dropping temprels left behind by a crashed backend would get broken by
 that; which is a deal-breaker, because we have to be able to clean those
 up.

Phooey.  It was such a good idea in my head.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Tom Lane

Josh Berkus j...@agliodbs.com writes:
 Given your clarification on the whole set of behaviors, I'm highly
 dubious about the idea of implementing Tom's proposal when we're already
 Beta 1.  It seems like a 9.1 thing.

I think you missed the point: do nothing is not a viable option.
I was proposing something that seemed simple enough to be safe to
drop into 9.0 at this point.  I'm less convinced that what Simon
is proposing is safe enough.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 18:53 -0400, Tom Lane wrote:

 I think you missed the point: do nothing is not a viable option.
 I was proposing something that seemed simple enough to be safe to
 drop into 9.0 at this point. 

I've posted a patch that meets your stated objections. If you could
review that, this could be done in an hour.

There are other ways, but you'll need to explain a proposal in enough
detail that we're clear what you actually mean. 

 I'm less convinced that what Simon is proposing is safe enough.

Which proposal?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Greg Smith


Tom Lane wrote:

1. The timestamps we are reading from the log might be historical,
if we are replaying from archive rather than reading a live SR stream.
In the current implementation that means zero grace period for standby
queries.  Now if your only interest is catching up as fast as possible,
that could be a sane behavior, but this is clearly not the only possible
interest --- in fact, if that's all you care about, why did you allow
standby queries at all?
  


If the standby is not current, you may not want people to execute 
queries against it.  In some situations, returning results against 
obsolete data is worse than not letting the query execute at all.  As I 
see it, the current max_standby_delay implementation includes the 
expectation that the results you are getting are no more than 
max_standby_delay behind the master, presuming that new data is still 
coming in.  If the standby has really fallen further behind than that, 
there are situations where you don't want it doing anything but catching 
up until that is no longer the case, and you especially don't want it 
returning stale query data.


The fact that tuning in that direction could mean the standby never 
actually executes any queries is something you need to monitor for--it 
suggests the standby isn't powerful/well connected to the master enough 
to keep up--but that's not necessarily the wrong behavior.  Saying I 
only want the standby to execute queries if it's not too far behind the 
master is the answer to why did you allow standby queries at all? 
when tuning for that use case.



2. There could be clock skew between the master and slave servers.
  


Not the database's problem to worry about.  Document that time should be 
carefully sync'd and move on.  I'll add that.



3. There could be significant propagation delay from master to slave,
if the WAL stream is being transmitted with pg_standby or some such.
Again this results in cutting into the standby queries' grace period,
for no defensible reason.
  


Then people should adjust their max_standby_delay upwards to account for 
that.  For high availability purposes, it's vital that the delay number 
be referenced to the commit records on the master.  If lag is eating a 
portion of that, again it's something people should be monitoring for, 
but not something we can correct.  The whole idea here is that 
max_standby_delay is an upper bound on how stale the data on the standby 
can be, and whether or not lag is a component to that doesn't impact how 
the database is being asked to act.



In addition to these fundamental problems there's a fatal implementation
problem: the actual comparison is not to the master's current clock
reading, but to the latest commit, abort, or checkpoint timestamp read
from the WAL.
Right; this has been documented for months at 
http://wiki.postgresql.org/wiki/Hot_Standby_TODO and on the list before 
that, i.e. If there's little activity in the master, that can lead to 
surprising results.  The suggested long-term fix has been adding 
keepalive timestamps into SR, which seems to get reinvented every time 
somebody plays with this for a bit.  The HS documentation improvements 
I'm working on will suggest that you make sure this doesn't happen, that 
people have some sort of keepalive  WAL-generating activity on the 
master regularly, if they expect max_standby_delay to work reasonably in 
the face of an idle master.  It's not ideal, but it's straightforward to 
work around in user space.



I'm inclined to think that we should throw away all this logic and just
have the slave cancel competing queries if the replay process waits
more than max_standby_delay seconds to acquire a lock.  This is simple,
understandable, and behaves the same whether we're reading live data or
not.


I don't consider something that allows queries to execute when not 
playing recent live data is necessarily a step forward, from the 
perspective of implementations preferring high-availability.  It's 
reasonable for some people to request that the last thing a standby 
that's not current (max_standby_delay behind the master, based on the 
last thing received) should be doing is answering any queries, when it 
doesn't have current data and it should be working on catchup instead.


Discussion here obviously has wandered past your fundamental objections 
here and onto implementation trivia, but I didn't think the difference 
between what you expected and what's actually committed already was 
properly addressed before doing that.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Tom Lane

Greg Smith g...@2ndquadrant.com writes:
 If the standby is not current, you may not want people to execute 
 queries against it.  In some situations, returning results against 
 obsolete data is worse than not letting the query execute at all.  As I 
 see it, the current max_standby_delay implementation includes the 
 expectation that the results you are getting are no more than 
 max_standby_delay behind the master, presuming that new data is still 
 coming in.  If the standby has really fallen further behind than that, 
 there are situations where you don't want it doing anything but catching 
 up until that is no longer the case, and you especially don't want it 
 returning stale query data.

That is very possibly a useful thing to be able to specify, but the
current implementation has *nothing whatsoever* to do with making such a
guarantee.  It will only kill queries that are creating a lock conflict.
I would even argue that it's a bad thing to have a parameter that looks
like it might do that, when it doesn't.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Josh Berkus

On 5/4/10 4:26 PM, Greg Smith wrote:
 
 Not the database's problem to worry about.  Document that time should be
 carefully sync'd and move on.  I'll add that.

Releasing a hot standby which *only* works for users with an operational
ntp implementation is highly unrealistic.   Having built-in replication
in PostgreSQL was supposed to give the *majority* of users a *simple*
option for 2-server failover, not cater only to the high end.  Every
administrative requirement we add to HS/SR eliminates another set of
potential users, as well as adding another set of potential failure
conditions which need to be monitored.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Need to contact driver authors about change in index naming behavior ...

2010-05-04 Thread Josh Berkus

Hackers, Driver maintainers,

The 9.0 don't rename index cols behavior has already broken JDBC.  We
need to get in touch with other driver authors to see if they are
affected by this, and to let them know that they'll need a new driver
release for 9.0, if so.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] LogStandbySnapshot (was another thread)

2010-05-04 Thread Simon Riggs

On Tue, 2010-05-04 at 13:23 -0400, Tom Lane wrote:

 * LogStandbySnapshot is merest fantasy: no guarantee that either the
 XIDs list or the locks list will be consistent with the point in WAL
 where it will get inserted.  What's worse, locking things down enough
 to guarantee consistency would be horrid for performance, or maybe
 even deadlock-inducing. Could lose both ways: list might contain an
 XID whose commit/abort went to WAL before the snapshot did, or list
 might be missing an XID started just after snap was taken, The latter
 case could possibly be dealt with via nextXid filtering, but that
 doesn't fix the former case, and anyway we have both ends of the same
 problem for locks.

This was the only serious complaint on your list, so lets address it.

Clearly we don't want to lock everything down, for all the reasons you
say. That creates a gap between when data is derived and when data
logged to WAL.

LogStandbySnapshot() occurs during online checkpoints on or after the
logical checkpoint location and before the physical checkpoint location.

We start recovery from a checkpoint, so we have a starting point in WAL
for our processing. The time sequence on the primary of these related
events is

Logical Checkpoint location
newxids/commits/locks Before1
AccessExclusiveLocks derived
newxids/commits/locks Before2
AccessExclusiveLocks WAL record inserted
newxids/commits/locks After1
RunningXact derived
newxids/commits/locks After2
RunningXact WAL record inserted

though when we read them back from WAL, they will be in this order, and
we cannot tell the difference between events at Before 1  2 or After 1
 2.

Logical Checkpoint location = STANDBY_INITIALIZED
newxids/commits/locks Before1
newxids/commits/locks Before2
AccessExclusiveLocks WAL record
newxids/commits/locks After1
newxids/commits/locks After2
RunningXact WAL record = STANDBY_SNAPSHOT_READY

We're looking for a consistent point. We don't know what the exact
time-synchronised point is on master, so we have to use an exact point
in WAL and work from there. We need to understand that the serialization
of events in the log can be slightly different to how they occurred on
the primary, but that doesn't change anything important.

So to get a set of xids + locks that are consistent at the moment the
RunningXact WAL record is read we need to 

1. Begin processing incoming changes from the time we are
STANDBY_INITIALIZED, though forgive any errors for removals of missing
items until we hit STANDBY_SNAPSHOT_READY
a) locks - we ignore missing locks in StandbyReleaseLocks()
b) xids - we ignore missing xids in KnownAssignedXidsRemove()

2. Any transaction commits/aborts from the time we are
STANDBY_INITIALIZED, through to STANDBY_SNAPSHOT_READY need to be saved,
so that we can remove them again from the snapshot state. That is
because events might otherwise exist in the standby that will never be
removed from snapshot. We do this by simple test whether the related xid
has already completed.
a) locks - we ignore locks for already completed xids in
StandbyAcquireAccessExclusiveLock()
b) xids - we ignore already completed xids in
ProcArrayApplyRecoveryInfo()

We currently do all of the above. So it looks correct to me.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Joshua D. Drake

On Tue, 2010-05-04 at 16:34 -0700, Josh Berkus wrote:
 On 5/4/10 4:26 PM, Greg Smith wrote:
  
  Not the database's problem to worry about.  Document that time should be
  carefully sync'd and move on.  I'll add that.
 
 Releasing a hot standby which *only* works for users with an operational
 ntp implementation is highly unrealistic.   Having built-in replication
 in PostgreSQL was supposed to give the *majority* of users a *simple*
 option for 2-server failover, not cater only to the high end.  Every
 administrative requirement we add to HS/SR eliminates another set of
 potential users, as well as adding another set of potential failure
 conditions which need to be monitored.

+1

Joshua D. Drake



-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Greg Smith


Josh Berkus wrote:

Having built-in replication in PostgreSQL was supposed to give the *majority* 
of users a *simple*
option for 2-server failover, not cater only to the high end.


If that's your criteria, 9.0 has already failed that goal.  Just the 
fact that you have to make your own base backup and manage that whole 
area alone excludes simple as a goal anyone can claim 9.0 meets with a 
straight face, long before you get to the mechanics of how HS handles 
query cancellation.  The new replication oriented features are 
functional, but neither are close to simple yet.  Based on the 
complication level of replication in other database products, I wouldn't 
put money on that even being possible.  You can make a simpler path the 
default one, but the minute you want to support more than one use case 
the complexity involved in setting up replication explodes.


Anyway, I have no idea where the idea that recommending time 
synchronization is a somehow a high end requirement, given that every 
OS I'm aware of makes that trivial nowadays.  Slave servers that drift 
too far away from the master time are going to cause all sorts of 
problems for user apps too.  Any app that gauges how long ago something 
happened by comparing a database timestamp with now() is going to give 
misleading results for example, and I know I see those all the time.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Tom Lane

Greg Smith g...@2ndquadrant.com writes:
 Anyway, I have no idea where the idea that recommending time 
 synchronization is a somehow a high end requirement,

Considering that clock skew was only one of several scenarios in which
the max_standby_delay code misbehaves, it's not that important whether
you consider it highly probable or not.  The code still needs a
redesign, and we may as well eliminate the assumption of tight
synchronization while we are at it.  There's no really good reason to
have that requirement in there.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Josh Berkus


 Releasing a hot standby which *only* works for users with an operational
 ntp implementation is highly unrealistic.   Having built-in replication
 in PostgreSQL was supposed to give the *majority* of users a *simple*
 option for 2-server failover, not cater only to the high end.  Every
 administrative requirement we add to HS/SR eliminates another set of
 potential users, as well as adding another set of potential failure
 conditions which need to be monitored.

To be completely practical, I'm saying that we should apply  test
Simon's latest patch moving the delay calculation to be application lag
instead of standby lag.

I'm also suggesting that we should have a standby lag option for 9.1 (as
well as, probably, a wait forever option ala Tom's suggestion).

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Mark Kirkwood


On 05/05/10 06:24, Srinivas Naik wrote:



I am sorry for that, but I made two different installations and I was 
messing up with various inputs.


Actually, the installed versions are below
*postgresql-8.3*
*Ubuntu 8.10  with 2.6.27 Kernel*
*and its an 32Bit O/S*

pgsql$ SELECT substring(B'0001' from 5 for -2);
ERROR:invalid memory alloc request size 4244635647




Please log into postgres do:

SELECT version();

(and Robert suggested) and show us the output - as we need to know the 
3rd number e.g 8.3.x in the postgres version to help you any more.


regards

Mark

Re: [HACKERS] Reg: SQL Query for Postgres 8.4.3

2010-05-04 Thread Mark Kirkwood


On 05/05/10 13:15, Mark Kirkwood wrote:


Please log into postgres do:

SELECT version();

(and Robert suggested)


Should read *as* Robert suggested - sorry.

Also you could do this from the os:

$ aptitude show postgresql-8.3*

*which will display more detail for the version.

Cheers

Mark
*


*

Re: [HACKERS] Need to contact driver authors about change in index naming behavior ...

2010-05-04 Thread David Fetter

On Tue, May 04, 2010 at 04:40:22PM -0700, Josh Berkus wrote:
 Hackers, Driver maintainers,
 
 The 9.0 don't rename index cols behavior has already broken JDBC.  We
 need to get in touch with other driver authors to see if they are
 affected by this, and to let them know that they'll need a new driver
 release for 9.0, if so.

Looks at first glance like it still works for PL/Perl.

I created a table t with single PK text column id, renamed it to
idiotic, and pointed the attached program at it.  Git master
(92b93d85a49eb8d55060961c0d53e085ed92) is running on port 2225,
and I'm connecting as my shell user.

Here's the SQL I ran before the script.

CREATE table simple(id SERIAL PRIMARY KEY);
ALTER TABLE simple RENAME COLUMN id TO idiotic;
CREATE table complexer(a text, b text, primary key(a,b));
ALTER TABLE complexer RENAME a to alpha, b to bravo;
ALTER TABLE complexer RENAME a to alpha;
ALTER TABLE complexer RENAME b to bravo;

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
#!/usr/bin/perl
use strict;
use warnings;

use DBI;

my @tables = qw(simple complexer);

my $dbh=DBI-connect(dbi:Pg:;port=2225,,)
or die Couldn't connect.;

foreach my $table (@tables) {
my @key_column_names=$dbh-primary_key(undef, public,$table);
print PK column(s): , join(, , @key_column_names), .\n;
my $sth=$dbh-primary_key_info( undef, public, $table, {pg_onerow=2});
if (defined $sth) {
my $pk = $sth-fetchall_arrayref()-[0];
print Table $pk-[2] has a primary key on these columns:\n;
for (my $x=0; defined $pk-[3][$x]; $x++) {
print Column: $pk-[3][$x]  (data type: $pk-[6][$x])\n;
}
}
}
$dbh-disconnect();

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Need to contact driver authors about change in index naming behavior ...

2010-05-04 Thread Andrew Dunstan




David Fetter wrote:

On Tue, May 04, 2010 at 04:40:22PM -0700, Josh Berkus wrote:
  

Hackers, Driver maintainers,

The 9.0 don't rename index cols behavior has already broken JDBC.  We
need to get in touch with other driver authors to see if they are
affected by this, and to let them know that they'll need a new driver
release for 9.0, if so.



Looks at first glance like it still works for PL/Perl.

  


Er, ITYM Perl/DBD::Pg.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Need to contact driver authors about change in index naming behavior ...

2010-05-04 Thread David Fetter

On Tue, May 04, 2010 at 10:00:52PM -0400, Andrew Dunstan wrote:
 
 
 David Fetter wrote:
 On Tue, May 04, 2010 at 04:40:22PM -0700, Josh Berkus wrote:
 Hackers, Driver maintainers,
 
 The 9.0 don't rename index cols behavior has already broken JDBC.  We
 need to get in touch with other driver authors to see if they are
 affected by this, and to let them know that they'll need a new driver
 release for 9.0, if so.
 
 Looks at first glance like it still works for PL/Perl.
 
 
 Er, ITYM Perl/DBD::Pg.

You're right, of course.  Oops.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Bruce Momjian

Tom Lane wrote:
 Greg Smith g...@2ndquadrant.com writes:
  Anyway, I have no idea where the idea that recommending time 
  synchronization is a somehow a high end requirement,
 
 Considering that clock skew was only one of several scenarios in which
 the max_standby_delay code misbehaves, it's not that important whether
 you consider it highly probable or not.  The code still needs a
 redesign, and we may as well eliminate the assumption of tight
 synchronization while we are at it.  There's no really good reason to
 have that requirement in there.

Should I be concerned that we are redesigning HS features at this stage
in the release?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Robert Haas

On Tue, May 4, 2010 at 11:06 PM, Bruce Momjian br...@momjian.us wrote:
 Should I be concerned that we are redesigning HS features at this stage
 in the release?

Yep.  You can decide whether you want to be concerned by the redesign
itself, or by the concerns about the underlying code that are
motivating the redesigns, but yes, you should definitely be concerned.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] including PID or backend ID in relpath of temp rels

2010-05-04 Thread Robert Haas

On Tue, Apr 27, 2010 at 9:59 PM, Robert Haas robertmh...@gmail.com wrote:
 [storage.c,xact.c,twophase.c] smgrGetPendingDeletes returns via an out
 parameter (its second argument) a list of RelFileNodes pending delete,
 which we then write to WAL or to the two-phase state file.

It appears that we are playing a little bit fast and loose with this.
I think that the two-phase code path is solid because we prohibit
PREPARE TRANSACTION if the transaction has referenced any temporary
tables, so when we read the two-phase state file it's safe to assume
that all the tables mentioned are non-temporary.  But the ordinary
one-phase commit writes permanent and temporary relfilenodes to WAL
without distinction, and then, in xl_redo_commit() and
xl_redo_abort(), does this:

XLogDropRelation(xlrec-xnodes[i], fork);
smgrdounlink(srel, fork, false, true);

The third argument to smgrdounlink() is isTemp, which we're here
passing as false, but might really be true.  I don't think it
technically matters at present because the only effect of that
parameter right now is that we pass it through to
DropRelFileNodeBuffers(), which will drop shared buffers rather than
local buffers as a result of the incorrect setting.  But that won't
matter because the WAL replay process shouldn't have any local buffers
anyway, since temp relations are not otherwise WAL-logged.  For the
same reason, I don't think the call to XLogDropRelation() is an issue
because its only purpose is to remove entries from invalid_page_tab,
and there won't be any temporary pages in there anyway.

Of course if we're going to do $SUBJECT this will need to be changed
anyway, but assuming the above analysis is correct I think the
existing coding at least deserves a comment...  then again, maybe I'm
all mixed up?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Andrew Dunstan




Robert Haas wrote:

On Tue, May 4, 2010 at 11:06 PM, Bruce Momjian br...@momjian.us wrote:
  

Should I be concerned that we are redesigning HS features at this stage
in the release?



Yep.  You can decide whether you want to be concerned by the redesign
itself, or by the concerns about the underlying code that are
motivating the redesigns, but yes, you should definitely be concerned.


  


Our process is shot to pieces. But then, we knew that, didn't we ;-)

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_standby_delay considered harmful

2010-05-04 Thread Greg Stark

On Wed, May 5, 2010 at 5:01 AM, Andrew Dunstan and...@dunslane.net wrote:
 Our process is shot to pieces. But then, we knew that, didn't we ;-)


Franky I think these kinds of usability questions are things that
we'll never have great success getting feedback on without users
banging on the system. There are solutions but none of them are
perfect. We a) either let releases go out with functionality that
nobody's sure users will be happy with and promise that the next
release will polish it, b) commit big functionality changes only at
the beginning of the release cycle and live with a two-release cycle
latency for such changes, or c) make last-minute changes during betas.

So far we've seen a combination of (a) and (c). I think (a) has worked
out well, sure we have some non-ideal behaviour in new features but
people forgive and forget when the later releases polish it. I've
always advocated for (b) though, I think we've been lucky with the
cases where we've needed last minute adjustments that they've worked
better than we had a right to expect.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] buildfarm building all live branches from git

2010-05-04 Thread Alex Hunsaker

On Mon, May 3, 2010 at 14:04, Andrew Dunstan and...@dunslane.net wrote:

 [ Awesome work getting buildfarm support for git ]

 Note, this is running from my test git repo, not the community's repo.

BTW +1 for gitting (heh, git puns are fun) a good git repo published.
Ive given up trying to trust it for back branches and always either go
to release tarballs or cvs.

 Sadly, that means its change
 links will be broken - I'm not exactly sure what gets hashed to provide a
 commit ID in git, but the IDs don't match between these two repos.

Yeah, git basically hashes *everything* including the previous
commits.  So if one commit is different in the repo all the commits
after that will have a different hash :-(

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

80 matches

Mail list logo