Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-15 Thread Heikki Linnakangas
Tom Lane wrote:
> So I think that labeling textanycat/anytextcat as immutable was a
> thinko, and we instead ought to label them volatile so that the planner
> can inline them no matter what the behavior of the underlying text cast
> is.

That feels backwards, having to label functions as more volatile than
they really are, just to allow optimizations elsewhere. Marking
textanycat as not immutable would forbid using it in expression indexes,
too.

> ... The planner will not inline a function if the
> resulting expression would be more volatile than the function claims
> itself to be, because sometimes the point of such a function is to
> hide the expression's volatility. ...

Could we inline the function anyway, if it came from an operator?
Presumably if you want to hide an expresssion's volatility, you use an
explicit function call - I can't imagine using an operator for that purpose.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-15 Thread Jaime Casanova
On Sat, May 15, 2010 at 10:21 PM, Tom Lane  wrote:
> Jaime Casanova  writes:
>> On Sat, May 15, 2010 at 10:16 PM, Tom Lane  wrote:
>>> Jaime Casanova  writes:
 On Sat, May 15, 2010 at 8:51 PM, Tom Lane  wrote:
 Is it reasonable to fix this now, and if so should I bump catversion
 or leave it alone?  My own preference is to fix it in pg_proc.h but
 not touch catversion; but you could argue that different ways.
>
>> ok, then is up to you if you think that it is worth an initdb in
>> beta... i still think is excessive...
>
> The point of not wanting to change catversion is to not force an
> initdb.
>

ah! yeah! you are the one that doesn't want to touch catversion, so
i'm barking at the wrong tree then.
i have been busy and need to rest a little ;)
+1 to not touch catversion

-- 
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-15 Thread Andrew Dunstan



Bruce Momjian wrote:


Maybe I have misunderstood. How exactly is the server version being 
hacked here? I know it's only for testing, but it still seems to me that 
lying to a program as heavily version dependant as pg_dump is in general 
a bad idea.



The code in pg_dump 9.0 is:

/*
 * If supported, set extra_float_digits so that we can dump float data
 * exactly (given correctly implemented float I/O code, anyway)
 */
if (g_fout->remoteVersion >= 9)
do_sql_command(g_conn, "SET extra_float_digits TO 3");
else if (g_fout->remoteVersion >= 70400)
--> do_sql_command(g_conn, "SET extra_float_digits TO 2");

The indicated line had to be changed to '3'.  I did not change anything
else, and it was only done in my private CVS tree.

  


Oh, I see. It is pg_dump that you hacked. That wasn't clear to me from 
what you first said.


But do earlier server versions accept a value of 3? The 8.4 docs say 
"The value can be set as high as 2".


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-15 Thread Tom Lane
Jaime Casanova  writes:
> On Sat, May 15, 2010 at 10:16 PM, Tom Lane  wrote:
>> Jaime Casanova  writes:
>>> On Sat, May 15, 2010 at 8:51 PM, Tom Lane  wrote:
>>> Is it reasonable to fix this now, and if so should I bump catversion
>>> or leave it alone?  My own preference is to fix it in pg_proc.h but
>>> not touch catversion; but you could argue that different ways.

> ok, then is up to you if you think that it is worth an initdb in
> beta... i still think is excessive...

The point of not wanting to change catversion is to not force an
initdb.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-15 Thread Jaime Casanova
On Sat, May 15, 2010 at 10:16 PM, Tom Lane  wrote:
> Jaime Casanova  writes:
>> On Sat, May 15, 2010 at 8:51 PM, Tom Lane  wrote:
>>> Is it reasonable to fix this now, and if so should I bump catversion
>>> or leave it alone?  My own preference is to fix it in pg_proc.h but
>>> not touch catversion; but you could argue that different ways.
>
>> are you planning to backpatch this?
>
> I wasn't planning to; as you say, without field complaints it doesn't
> seem compelling to fix in existing releases.
>

ok, then is up to you if you think that it is worth an initdb in
beta... i still think is excessive...
btw, is it worth documenting that somewhere for older releases?


-- 
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-15 Thread Tom Lane
Jaime Casanova  writes:
> On Sat, May 15, 2010 at 8:51 PM, Tom Lane  wrote:
>> Is it reasonable to fix this now, and if so should I bump catversion
>> or leave it alone?  My own preference is to fix it in pg_proc.h but
>> not touch catversion; but you could argue that different ways.

> are you planning to backpatch this?

I wasn't planning to; as you say, without field complaints it doesn't
seem compelling to fix in existing releases.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-15 Thread Bruce Momjian
Andrew Dunstan wrote:
> >>> The problem I just encountered is that pg_dump uses
> >>> extra_float_digits=-3 for 9.0, while previous releases used '2'.  I had
> >>> to do hack each server version to get a dump output that would match
> >>> without rounding errors --- it did eventually work and validated.
> >>>   
> >
> >   
> >> That sounds like a disaster waiting to happen. The server version is 
> >> going to affect much more than just this behaviour, surely. Wouldn't it 
> >> be better to provide a pg_dump option to provide the extra_float_digits 
> >> setting?
> >> 
> >
> > What disaster?  That's only for test purposes, it has nothing to do with
> > actual data transfer.
> >
> > 
> >   
> 
> Maybe I have misunderstood. How exactly is the server version being 
> hacked here? I know it's only for testing, but it still seems to me that 
> lying to a program as heavily version dependant as pg_dump is in general 
> a bad idea.

The code in pg_dump 9.0 is:

/*
 * If supported, set extra_float_digits so that we can dump float data
 * exactly (given correctly implemented float I/O code, anyway)
 */
if (g_fout->remoteVersion >= 9)
do_sql_command(g_conn, "SET extra_float_digits TO 3");
else if (g_fout->remoteVersion >= 70400)
--> do_sql_command(g_conn, "SET extra_float_digits TO 2");

The indicated line had to be changed to '3'.  I did not change anything
else, and it was only done in my private CVS tree.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-15 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan  writes:
  

Bruce Momjian wrote:


FYI, I test pg_upgrade by loading the old cluster's regression database
from a pg_dump output file, then after the upgrade, I dump the
regression database of the new cluster and diff the changes.

The problem I just encountered is that pg_dump uses
extra_float_digits=-3 for 9.0, while previous releases used '2'.  I had
to do hack each server version to get a dump output that would match
without rounding errors --- it did eventually work and validated.
  


  
That sounds like a disaster waiting to happen. The server version is 
going to affect much more than just this behaviour, surely. Wouldn't it 
be better to provide a pg_dump option to provide the extra_float_digits 
setting?



What disaster?  That's only for test purposes, it has nothing to do with
actual data transfer.


  


Maybe I have misunderstood. How exactly is the server version being 
hacked here? I know it's only for testing, but it still seems to me that 
lying to a program as heavily version dependant as pg_dump is in general 
a bad idea.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance problem in textanycat/anytextcat

2010-05-15 Thread Jaime Casanova
On Sat, May 15, 2010 at 8:51 PM, Tom Lane  wrote:
>
> Is it reasonable to fix this now, and if so should I bump catversion
> or leave it alone?  My own preference is to fix it in pg_proc.h but
> not touch catversion; but you could argue that different ways.
>

are you planning to backpatch this? if so, i say no to bump catversion
but only mention in the release notes that if you are upgrading you
have to make those updates manually... we have made that before...
otherwise we will require an initdb for minor version upgrade and
being that no one noted this before that seems excessive to me, IMHO

-- 
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-15 Thread Tom Lane
Andrew Dunstan  writes:
> Bruce Momjian wrote:
>> FYI, I test pg_upgrade by loading the old cluster's regression database
>> from a pg_dump output file, then after the upgrade, I dump the
>> regression database of the new cluster and diff the changes.
>> 
>> The problem I just encountered is that pg_dump uses
>> extra_float_digits=-3 for 9.0, while previous releases used '2'.  I had
>> to do hack each server version to get a dump output that would match
>> without rounding errors --- it did eventually work and validated.

> That sounds like a disaster waiting to happen. The server version is 
> going to affect much more than just this behaviour, surely. Wouldn't it 
> be better to provide a pg_dump option to provide the extra_float_digits 
> setting?

What disaster?  That's only for test purposes, it has nothing to do with
actual data transfer.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-15 Thread Bruce Momjian
Andrew Dunstan wrote:
> 
> 
> Bruce Momjian wrote:
> > FYI, I test pg_upgrade by loading the old cluster's regression database
> > from a pg_dump output file, then after the upgrade, I dump the
> > regression database of the new cluster and diff the changes.
> >
> > The problem I just encountered is that pg_dump uses
> > extra_float_digits=-3 for 9.0, while previous releases used '2'.  I had
> > to do hack each server version to get a dump output that would match
> > without rounding errors --- it did eventually work and validated.
> >
> >   
> 
> That sounds like a disaster waiting to happen. The server version is 
> going to affect much more than just this behaviour, surely. Wouldn't it 
> be better to provide a pg_dump option to provide the extra_float_digits 
> setting?

FYI, you can't override it with PGOPTIONS because it is set inside the
pg_dump binary.  I am not sure what you mean by your second sentence.

I was just reporting it in case anyone else was trying this for testing.
I doubt anyone else is going to try such a thing.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and extra_float_digits

2010-05-15 Thread Andrew Dunstan



Bruce Momjian wrote:

FYI, I test pg_upgrade by loading the old cluster's regression database
from a pg_dump output file, then after the upgrade, I dump the
regression database of the new cluster and diff the changes.

The problem I just encountered is that pg_dump uses
extra_float_digits=-3 for 9.0, while previous releases used '2'.  I had
to do hack each server version to get a dump output that would match
without rounding errors --- it did eventually work and validated.

  


That sounds like a disaster waiting to happen. The server version is 
going to affect much more than just this behaviour, surely. Wouldn't it 
be better to provide a pg_dump option to provide the extra_float_digits 
setting?


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade and extra_float_digits

2010-05-15 Thread Bruce Momjian
FYI, I test pg_upgrade by loading the old cluster's regression database
from a pg_dump output file, then after the upgrade, I dump the
regression database of the new cluster and diff the changes.

The problem I just encountered is that pg_dump uses
extra_float_digits=-3 for 9.0, while previous releases used '2'.  I had
to do hack each server version to get a dump output that would match
without rounding errors --- it did eventually work and validated.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Performance problem in textanycat/anytextcat

2010-05-15 Thread Tom Lane
I noticed by accident that there are some cases where the planner fails
to inline the SQL functions that underlie the || operator for text vs
non-text cases.  The reason is that these functions are marked
immutable, but their expansion involves a coercion to text that might
not be immutable.  The planner will not inline a function if the
resulting expression would be more volatile than the function claims
itself to be, because sometimes the point of such a function is to hide
the expression's volatility.  In this case, however, we don't want to
hide the true nature of the expression, and we definitely don't want
to pay the performance price of calling a SQL function.  That price
is pretty significant, eg on a rather slow machine I get

regression=# select count(localtimestamp || i::text) from 
generate_series(1,10) i;
 count  

 10
(1 row)

Time: 12512.624 ms
regression=# update pg_proc set provolatile = 'v' where oid = 2004;
UPDATE 1
Time: 7.104 ms
regression=# select count(localtimestamp || i::text) from 
generate_series(1,10) i;
 count  

 10
(1 row)

Time: 4967.086 ms

so the added overhead more than doubles the cost of this case.

There's also a possibility of an outright wrong behavior, since the
immutable marking will allow the concatenation of two constants to
be folded to a constant in contexts where perhaps it shouldn't be.
Continuing the above example, 'now'::timestamp || 'foo' will be folded
to a constant on sight, which is wrong because the coercion to text
depends on DateStyle and ought to react to a later change in DateStyle.

So I think that labeling textanycat/anytextcat as immutable was a
thinko, and we instead ought to label them volatile so that the planner
can inline them no matter what the behavior of the underlying text cast
is.

Is it reasonable to fix this now, and if so should I bump catversion
or leave it alone?  My own preference is to fix it in pg_proc.h but
not touch catversion; but you could argue that different ways.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql

2010-05-15 Thread Bruce Momjian
Tom Lane wrote:
> Robert Haas  writes:
> > On Sat, May 15, 2010 at 7:46 PM, David Fetter  wrote:
> >> Wouldn't this count as a bug fix?
> 
> > Possibly, but changes to signal handlers are pretty global and can
> > sometimes have surprising side effects.  I'm all in favor of someone
> > reviewing the patch - any volunteers?  One case to test might be
> > reading input from a file that contains \! escapes.  More generally,
> > we need to consider every way that psql can get SIGCHLD and think
> > about whether this is the right behavior.
> 
> I think this will introduce far more bugs than it fixes.  A saner
> approach, which would also help for other corner cases such as
> out-of-disk-space, would be to check for write failures on the output
> file and abandon the query if any occur.

Is this a TODO?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unexpected page allocation behavior on insert-only tables

2010-05-15 Thread Michael Renner

On 16.05.2010 02:16, Tom Lane wrote:

Michael Renner  writes:

I've written a simple tool to generate traffic on a database [1], which
did about 30 TX/inserts per second to a table. Upon inspecting the data
in the table, I noticed the expected grouping of tuples which came from
a single backend to matching pages [2]. The strange part was that the
pages weren't completely filled but the backends seemed to jump
arbitrarily from one page to the next [3]. For the table in question
this resulted in about 10% wasted space.


Which table would that be?  The trigger-driven updates to "auction",
in particular, would certainly guarantee some amount of "wasted" space.


Yeah, the auction table receives heavy updates and gets vacuumed regularly.

The behavior I showed was for the "bid" table, which only gets inserts 
(and triggers the updates for the auction table).


best regards,
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unexpected page allocation behavior on insert-only tables

2010-05-15 Thread Tom Lane
Michael Renner  writes:
> I've written a simple tool to generate traffic on a database [1], which 
> did about 30 TX/inserts per second to a table. Upon inspecting the data 
> in the table, I noticed the expected grouping of tuples which came from 
> a single backend to matching pages [2]. The strange part was that the 
> pages weren't completely filled but the backends seemed to jump 
> arbitrarily from one page to the next [3]. For the table in question 
> this resulted in about 10% wasted space.

Which table would that be?  The trigger-driven updates to "auction",
in particular, would certainly guarantee some amount of "wasted" space.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql

2010-05-15 Thread Tom Lane
Robert Haas  writes:
> On Sat, May 15, 2010 at 7:46 PM, David Fetter  wrote:
>> Wouldn't this count as a bug fix?

> Possibly, but changes to signal handlers are pretty global and can
> sometimes have surprising side effects.  I'm all in favor of someone
> reviewing the patch - any volunteers?  One case to test might be
> reading input from a file that contains \! escapes.  More generally,
> we need to consider every way that psql can get SIGCHLD and think
> about whether this is the right behavior.

I think this will introduce far more bugs than it fixes.  A saner
approach, which would also help for other corner cases such as
out-of-disk-space, would be to check for write failures on the output
file and abandon the query if any occur.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Unexpected page allocation behavior on insert-only tables

2010-05-15 Thread Michael Renner
While preparing a replication test setup with 9.0beta1 I noticed strange 
page allocation patterns which Andrew Gierth found interesting enough to 
report here.


I've written a simple tool to generate traffic on a database [1], which 
did about 30 TX/inserts per second to a table. Upon inspecting the data 
in the table, I noticed the expected grouping of tuples which came from 
a single backend to matching pages [2]. The strange part was that the 
pages weren't completely filled but the backends seemed to jump 
arbitrarily from one page to the next [3]. For the table in question 
this resulted in about 10% wasted space.


After issuing a VACUUM on the table the free space map got updated (or 
initialized?) and the backends used the remaining space in the pages, 
though the spurious page allocation continued.



best regards,
Michael

[1] https://workbench.amd.co.at/hg/pgworkshop/file/dc5ab49c99bb/pgexerciser

[2] E.g.:

(0,1) TX1
(0,2) TX5
(0,3) TX7
..
(1,1) TX2
(1,2) TX6
(1,3) TX9

etc.

[3] http://nopaste.narf.at/show/55/
Optimal usage seems to be 136 tuples per page for the table in question.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql

2010-05-15 Thread Robert Haas
On Sat, May 15, 2010 at 7:46 PM, David Fetter  wrote:
>> >   Anyway, this makes FETCH_COUNT alot more useful, and, in my view, the
>> >   current behaviour of completely ignoring $PAGER exiting is a bug.
>>
>> Plesae add this to the next commit-fest:
>>
>>       https://commitfest.postgresql.org/action/commitfest_view/inprogress
>>
>> Thanks.
>
> Wouldn't this count as a bug fix?

Possibly, but changes to signal handlers are pretty global and can
sometimes have surprising side effects.  I'm all in favor of someone
reviewing the patch - any volunteers?  One case to test might be
reading input from a file that contains \! escapes.  More generally,
we need to consider every way that psql can get SIGCHLD and think
about whether this is the right behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql

2010-05-15 Thread David Fetter
On Fri, May 14, 2010 at 04:24:43PM -0400, Bruce Momjian wrote:
> Stephen Frost wrote:
> -- Start of PGP signed section.
> > Greetings,
> > 
> >   Toying around with FETCH_COUNT today, I discovered that it didn't do
> >   the #1 thing I really wanted to use it for- query large tables without
> >   having to worry about LIMIT to see the first couple hundred records.
> >   The reason is simple- psql ignores $PAGER exiting, which means that it
> >   will happily continue pulling down the entire large table long after
> >   you've stopped caring, which means you still have to wait forever.
> > 
> >   The attached, admittedly quick hack, fixes this by having psql catch
> >   SIGCHLD's using handle_sigint.  I've tested this and it doesn't
> >   appear to obviously break other cases where we have children (\!, for
> >   example), since we're not going to be running a database query when
> >   we're doing those, and if we are, and the child dies, we probably want
> >   to *stop* anyway, similar to the $PAGER issue.
> > 
> >   Another approach that I considered was fixing various things to deal
> >   cleanly with write's failing to $PAGER (I presume the writes *were*
> >   failing, since less was in a defunct state, but I didn't actually
> >   test).  This solution was simpler, faster to code and check, and alot
> >   less invasive (or so it seemed to me at the time).
> > 
> >   Anyway, this makes FETCH_COUNT alot more useful, and, in my view, the
> >   current behaviour of completely ignoring $PAGER exiting is a bug.
> 
> Plesae add this to the next commit-fest:
> 
>   https://commitfest.postgresql.org/action/commitfest_view/inprogress
> 
> Thanks.

Wouldn't this count as a bug fix?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Heikki Linnakangas
Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> WALSender sleeps even when it might have more WAL to send, it doesn't
>> check it just unconditionally sleeps. At least WALReceiver loops until
>> it has no more to receive. I just can't imagine why that's useful
>> behaviour.
> 
> Good catch. That should be fixed.
> 
> I also note that walsender doesn't respond to signals, while it's
> sending a large batch. That's analogous to the issue that was addressed
> recently in the archiver process.

Attached patch rearranges the walsender loops slightly to fix the above.
XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
2) in one round and returns to the main loop after that even if there's
unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
That way the main loop gets to respond to signals quickly, and we also
get to update the shared memory status and PS display more often when
there's a lot of catching up to do.

Comments, have I screwed up anything?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***
*** 100,106  static void InitWalSnd(void);
  static void WalSndHandshake(void);
  static void WalSndKill(int code, Datum arg);
  static void XLogRead(char *buf, XLogRecPtr recptr, Size nbytes);
! static bool XLogSend(StringInfo outMsg);
  static void CheckClosedConnection(void);
  
  /*
--- 100,106 
  static void WalSndHandshake(void);
  static void WalSndKill(int code, Datum arg);
  static void XLogRead(char *buf, XLogRecPtr recptr, Size nbytes);
! static bool XLogSend(StringInfo outMsg, bool *caughtup);
  static void CheckClosedConnection(void);
  
  /*
***
*** 360,365  static int
--- 360,366 
  WalSndLoop(void)
  {
  	StringInfoData output_message;
+ 	bool		caughtup = false;
  
  	initStringInfo(&output_message);
  
***
*** 387,393  WalSndLoop(void)
  		 */
  		if (ready_to_stop)
  		{
! 			XLogSend(&output_message);
  			shutdown_requested = true;
  		}
  
--- 388,394 
  		 */
  		if (ready_to_stop)
  		{
! 			XLogSend(&output_message, &caughtup);
  			shutdown_requested = true;
  		}
  
***
*** 402,432  WalSndLoop(void)
  		}
  
  		/*
! 		 * Nap for the configured time or until a message arrives.
  		 *
  		 * On some platforms, signals won't interrupt the sleep.  To ensure we
  		 * respond reasonably promptly when someone signals us, break down the
  		 * sleep into NAPTIME_PER_CYCLE increments, and check for
  		 * interrupts after each nap.
  		 */
! 		remain = WalSndDelay * 1000L;
! 		while (remain > 0)
  		{
! 			if (got_SIGHUP || shutdown_requested || ready_to_stop)
! break;
  
! 			/*
! 			 * Check to see whether a message from the standby or an interrupt
! 			 * from other processes has arrived.
! 			 */
! 			pg_usleep(remain > NAPTIME_PER_CYCLE ? NAPTIME_PER_CYCLE : remain);
! 			CheckClosedConnection();
  
! 			remain -= NAPTIME_PER_CYCLE;
  		}
- 
  		/* Attempt to send the log once every loop */
! 		if (!XLogSend(&output_message))
  			goto eof;
  	}
  
--- 403,434 
  		}
  
  		/*
! 		 * If we had sent all accumulated WAL in last round, nap for the
! 		 * configured time before retrying.
  		 *
  		 * On some platforms, signals won't interrupt the sleep.  To ensure we
  		 * respond reasonably promptly when someone signals us, break down the
  		 * sleep into NAPTIME_PER_CYCLE increments, and check for
  		 * interrupts after each nap.
  		 */
! 		if (caughtup)
  		{
! 			remain = WalSndDelay * 1000L;
! 			while (remain > 0)
! 			{
! /* Check for interrupts */
! if (got_SIGHUP || shutdown_requested || ready_to_stop)
! 	break;
  
! /* Sleep and check that the connection is still alive */
! pg_usleep(remain > NAPTIME_PER_CYCLE ? NAPTIME_PER_CYCLE : remain);
! CheckClosedConnection();
  
! remain -= NAPTIME_PER_CYCLE;
! 			}
  		}
  		/* Attempt to send the log once every loop */
! 		if (!XLogSend(&output_message, &caughtup))
  			goto eof;
  	}
  
***
*** 623,637  XLogRead(char *buf, XLogRecPtr recptr, Size nbytes)
  }
  
  /*
!  * Read all WAL that's been written (and flushed) since last cycle, and send
!  * it to client.
   *
   * Returns true if OK, false if trouble.
   */
  static bool
! XLogSend(StringInfo outMsg)
  {
  	XLogRecPtr	SendRqstPtr;
  	char		activitymsg[50];
  
  	/* use volatile pointer to prevent code rearrangement */
--- 625,644 
  }
  
  /*
!  * Read up to MAX_SEND_SIZE bytes of WAL that's been written (and flushed),
!  * but not yet sent to the client, and send it. If there is no unsent WAL,
!  * *caughtup is set to true and nothing is sent, otherwise *caughtup is set
!  * to false.
   *
   * Returns true if OK, false if trouble.
   */
  static bool
! XLogSend(StringInfo outMsg, bool *caughtup)
  {
  	XLogRecPtr	SendRqstPtr;
+ 	XLogRecPtr	startptr;
+ 	XLogRecPtr	endptr;
+ 	Size		nbytes;
  	char		acti

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Heikki Linnakangas
Simon Riggs wrote:
> WALSender sleeps even when it might have more WAL to send, it doesn't
> check it just unconditionally sleeps. At least WALReceiver loops until
> it has no more to receive. I just can't imagine why that's useful
> behaviour.

Good catch. That should be fixed.

I also note that walsender doesn't respond to signals, while it's
sending a large batch. That's analogous to the issue that was addressed
recently in the archiver process.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery consistent != hot standby

2010-05-15 Thread Robert Haas
On Fri, May 14, 2010 at 5:23 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> PM_RECOVERY_CONSISTENT -> PM_HOT_STANDBY
>> PMSIGNAL_RECOVERY_CONSISTENT -> PMSIGNAL_BEGIN_HOT_STANDBY
>
> +1.  From the point of view of the postmaster, whether the state
> transition happens immediately upon reaching consistency, or at a
> later time, or perhaps even earlier (if we could make that work)
> is not relevant.  What's relevant is that it's allowed to let in
> hot-standby backends.  So the current naming overspecifies the
> meaning of the state and the transition event.

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs
On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote:

> I will recode using that concept.

There's some behaviours that aren't helpful here:

Startup gets new pointer when it runs out of data to replay. That might
or might not include an updated keepalive timestamp, since there's no
exact relationship between chunks sent and chunks received. Startup
might ask for a new chunk when half a chunk has been received, or when
multiple chunks have been received.

WALSender doesn't chunk up what it sends, though libpq does at a lower
level. So there's no way to make a keepalive happen every X bytes
without doing this from within libpq.

WALSender sleeps even when it might have more WAL to send, it doesn't
check it just unconditionally sleeps. At least WALReceiver loops until
it has no more to receive. I just can't imagine why that's useful
behaviour.

All in all, I think we should be calling this "burst replication" not
streaming replication. That does cause an issue in trying to monitor
what's going on cos there's so much jitter.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs
On Sat, 2010-05-15 at 20:05 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:
> >> Doesn't feel right to me either. If you want to expose the
> >> keepalive-time to queries, it should be a separate field, something like
> >> lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
> >> read it.
> > 
> > That wouldn't be good because then you couldn't easily monitor the
> > delay? You'd have to run two different functions depending on the state
> > of replication (for which we would need yet another function). Users
> > would just wrap that back up into a single function.
> 
> What exactly is the user trying to monitor? If it's "how far behind is
> the standby", the difference between pg_current_xlog_insert_location()
> in the master and pg_last_xlog_replay_location() in the standby seems
> more robust and well-defined to me. It's a measure of XLOG location (ie.
> bytes) instead of time, but time is a complicated concept.

Maybe, but its meaningful to users and that is the point.

> Also note that as the patch stands, if you receive a keep-alive from the
> master at point X, it doesn't mean that the standby is fully up-to-date.
> It's possible that the walsender just finished sending a huge batch of
> accumulated WAL, say 1 GB, and it took 1 hour for the batch to be sent.
> During that time, a lot more WAL has accumulated, yet walsender sends a
> keep-alive with the current timestamp.

Not at all. The timestamp for the keepalive is calculated after the
pq_flush for the main WAL data. So it takes 10 years to send the next
blob of WAL data the timestamp will be current.

However, a point you made in an earlier thread is still true here. It
sounds like it would be best if startup process didn't re-access shared
memory for the next location until it had fully replayed all the WAL up
to the point it last accessed. That way the changing value of the shared
timestamp would have no effect on the calculated value at any point in
time. I will recode using that concept.

> In general, the purpose of a keep-alive is to keep the connection alive,
> but you're trying to accomplish something else too, and I don't fully
> understand what it is.

That surprises me. If nothing else, its in the title of the thread,
though since you personally added this to the Hot Standby todo more than
6 months ago I'd hope you of all people would understand the purpose.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Heikki Linnakangas
Simon Riggs wrote:
> On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:
>> Doesn't feel right to me either. If you want to expose the
>> keepalive-time to queries, it should be a separate field, something like
>> lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
>> read it.
> 
> That wouldn't be good because then you couldn't easily monitor the
> delay? You'd have to run two different functions depending on the state
> of replication (for which we would need yet another function). Users
> would just wrap that back up into a single function.

What exactly is the user trying to monitor? If it's "how far behind is
the standby", the difference between pg_current_xlog_insert_location()
in the master and pg_last_xlog_replay_location() in the standby seems
more robust and well-defined to me. It's a measure of XLOG location (ie.
bytes) instead of time, but time is a complicated concept.

Also note that as the patch stands, if you receive a keep-alive from the
master at point X, it doesn't mean that the standby is fully up-to-date.
It's possible that the walsender just finished sending a huge batch of
accumulated WAL, say 1 GB, and it took 1 hour for the batch to be sent.
During that time, a lot more WAL has accumulated, yet walsender sends a
keep-alive with the current timestamp.

In general, the purpose of a keep-alive is to keep the connection alive,
but you're trying to accomplish something else too, and I don't fully
understand what it is.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs
On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote:
> >> I'm also extremely dubious that it's a good idea to set
> >> recoveryLastXTime from this.  Using both that and the timestamps from
> >> the wal log flies in the face of everything I remember about control
> >> theory.  We should be doing only one or only the other, I think.
> > 
> > I can change it so that the recoveryLastXTime will not be updated if we
> > are using the value from the keepalives. So we have one, or the other.
> > Remember that replication can switch backwards and forwards between
> > modes, so it seems sensible to have a common timing value whichever mode
> > we're in.
> 
> That means that recoveryLastXTime can jump forwards and backwards.

That behaviour would be bad, so why not just prevent that from
happening?

> Doesn't feel right to me either. If you want to expose the
> keepalive-time to queries, it should be a separate field, something like
> lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
> read it.

That wouldn't be good because then you couldn't easily monitor the
delay? You'd have to run two different functions depending on the state
of replication (for which we would need yet another function). Users
would just wrap that back up into a single function.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Heikki Linnakangas
Simon Riggs wrote:
> On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote:
>> I'm also extremely dubious that it's a good idea to set
>> recoveryLastXTime from this.  Using both that and the timestamps from
>> the wal log flies in the face of everything I remember about control
>> theory.  We should be doing only one or only the other, I think.
> 
> I can change it so that the recoveryLastXTime will not be updated if we
> are using the value from the keepalives. So we have one, or the other.
> Remember that replication can switch backwards and forwards between
> modes, so it seems sensible to have a common timing value whichever mode
> we're in.

That means that recoveryLastXTime can jump forwards and backwards.
Doesn't feel right to me either. If you want to expose the
keepalive-time to queries, it should be a separate field, something like
lastMasterKeepaliveTime and a pg_last_master_keepalive() function to
read it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs
On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > Patch adds a keepalive message to ensure max_standby_delay is useful.
> 
> The proposed placement of the keepalive-send is about the worst it could
> possibly be.  It needs to be done right before pq_flush to ensure
> minimum transfer delay.  Otherwise any attempt to measure clock skew
> using the timestamp will be seriously off.  Where you've placed it
> guarantees maximum delay not minimum.

I don't understand. WalSndKeepAlive() contains a pq_flush() immediately
after the timestamp is set. I did that way for exactly the same reasons
you've said.

Do you mean you only want to see one pq_flush()? I used two so that the
actual data is never delayed by a keepalive. WAL Sender was going to
sleep anyway, so shouldn't be a problem.

> I'm also extremely dubious that it's a good idea to set
> recoveryLastXTime from this.  Using both that and the timestamps from
> the wal log flies in the face of everything I remember about control
> theory.  We should be doing only one or only the other, I think.

I can change it so that the recoveryLastXTime will not be updated if we
are using the value from the keepalives. So we have one, or the other.
Remember that replication can switch backwards and forwards between
modes, so it seems sensible to have a common timing value whichever mode
we're in.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Tom Lane
Simon Riggs  writes:
> Patch adds a keepalive message to ensure max_standby_delay is useful.

The proposed placement of the keepalive-send is about the worst it could
possibly be.  It needs to be done right before pq_flush to ensure
minimum transfer delay.  Otherwise any attempt to measure clock skew
using the timestamp will be seriously off.  Where you've placed it
guarantees maximum delay not minimum.

I'm also extremely dubious that it's a good idea to set
recoveryLastXTime from this.  Using both that and the timestamps from
the wal log flies in the face of everything I remember about control
theory.  We should be doing only one or only the other, I think.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] List traffic

2010-05-15 Thread Rob Wultsch
On Fri, May 14, 2010 at 11:50 PM, Rob Wultsch  wrote:
>> Linux has *as many if not more* ... MySQL, if memory servers, has a half
>> dozen or more ... etc ...
>
> MySQL has a bunch of lists, none of which get much traffic. Honestly,
> they should probably be combined.
>
> --
> Rob Wultsch

"They" was referring to the various low traffic MySQL lists which in
my opinion does not work. As far as Linux, when I briefly subscribed
to the kernel mailing list there was such a volume of traffic that it
was difficult to manage as a noob.

I do not have an opinion about PG. I think that those two examples
could be seen as how not to run email lists effectively.

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-level Locks & SERIALIZABLE transactions, postgres vs. Oracle

2010-05-15 Thread Rob Wultsch
On Sat, May 15, 2010 at 4:09 AM, Kevin Grittner
 wrote:
>   Anything in particular you wanted me to notice about it besides that?

Nope. It was just a counter point to your previous comment.

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs

Patch adds a keepalive message to ensure max_standby_delay is useful.

No WAL format changes, no libpq changes. Just an additional message type
for the streaming replication protocol, sent once per main loop in
WALsender. Plus docs.

Comments?

-- 
 Simon Riggs   www.2ndQuadrant.com
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index c63d003..391d990 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -4232,16 +4232,52 @@ The commands accepted in walsender mode are:
   
   
   
-  
-  
-  
- 
- 
+  
A single WAL record is never split across two CopyData messages. When
a WAL record crosses a WAL page boundary, however, and is therefore
already split using continuation records, it can be split at the page
boundary. In other words, the first main WAL record and its
continuation records can be split across different CopyData messages.
+  
+  
+  
+  
+  
+  Keepalive (B)
+  
+  
+  
+  
+  
+  
+  Byte1('k')
+  
+  
+  
+  Identifies the message as a keepalive.
+  
+  
+  
+  
+  
+  TimestampTz
+  
+  
+  
+  The current timestamp on the primary server when the keepalive was sent.
+  
+  
+  
+  
+  
+  
+   If wal_level is set to hot_standby then a keepalive
+   is sent once per wal_sender_delay. The keepalive is sent after
+   WAL data has been sent, if any.
+  
+  
+  
+  
  
 
   
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 607d57e..ee383af 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5566,6 +5566,17 @@ GetLatestXLogTime(void)
 	return recoveryLastXTime;
 }
 
+void
+SetLatestXLogTime(TimestampTz newLastXTime)
+{
+	/* use volatile pointer to prevent code rearrangement */
+	volatile XLogCtlData *xlogctl = XLogCtl;
+
+	SpinLockAcquire(&xlogctl->info_lck);
+	xlogctl->recoveryLastXTime = newLastXTime;
+	SpinLockRelease(&xlogctl->info_lck);
+}
+
 /*
  * Note that text field supplied is a parameter name and does not require
  * translation
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index bb87a06..8d52c3f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -407,6 +407,22 @@ XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len)
 XLogWalRcvWrite(buf, len, recptr);
 break;
 			}
+		case 'k':/* keepalive */
+			{
+TimestampTz keepalive;
+
+if (len != sizeof(TimestampTz))
+	ereport(ERROR,
+			(errcode(ERRCODE_PROTOCOL_VIOLATION),
+			 errmsg_internal("invalid keepalive message received from primary")));
+
+memcpy(&keepalive, buf, sizeof(TimestampTz));
+buf += sizeof(TimestampTz);
+len -= sizeof(TimestampTz);
+
+SetLatestXLogTime(keepalive);
+break;
+			}
 		default:
 			ereport(ERROR,
 	(errcode(ERRCODE_PROTOCOL_VIOLATION),
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 1c04fc3..f2f8750 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -98,6 +98,7 @@ static void WalSndQuickDieHandler(SIGNAL_ARGS);
 static int	WalSndLoop(void);
 static void InitWalSnd(void);
 static void WalSndHandshake(void);
+static bool WalSndKeepAlive(void);
 static void WalSndKill(int code, Datum arg);
 static void XLogRead(char *buf, XLogRecPtr recptr, Size nbytes);
 static bool XLogSend(StringInfo outMsg);
@@ -314,6 +315,30 @@ WalSndHandshake(void)
 	}
 }
 
+static bool
+WalSndKeepAlive(void)
+{
+	StringInfoData outMsg;
+	TimestampTz ts;
+
+	if (!XLogStandbyInfoActive())
+		return true;
+
+	initStringInfo(&outMsg);
+	ts = GetCurrentTimestamp();
+
+	/* format the keepalive message */
+	pq_sendbyte(&outMsg, 'k');
+	pq_sendbytes(&outMsg, (char *) &ts, sizeof(TimestampTz));
+
+	/* send the CopyData message */
+	pq_putmessage('d', outMsg.data, outMsg.len);
+	if (pq_flush())
+		return false;
+
+	return true;
+}
+
 /*
  * Check if the remote end has closed the connection.
  */
@@ -428,6 +453,9 @@ WalSndLoop(void)
 		/* Attempt to send the log once every loop */
 		if (!XLogSend(&output_message))
 			goto eof;
+
+		if (!WalSndKeepAlive())
+			goto eof;
 	}
 
 	/* can't get here because the above loop never exits */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 8ff68a6..2d01670 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -284,6 +284,7 @@ extern void issue_xlog_fsync(int fd, uint32 log, uint32 seg);
 extern bool RecoveryInProgress(void);
 extern bool XLogInsertAllowed(void);
 extern TimestampTz GetLatestXLogTime(void);
+extern void SetLatestXLogTime(TimestampTz newLatestXLogTime);
 
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifie

Re: [HACKERS] HS/SR Assert server crash

2010-05-15 Thread Simon Riggs
On Thu, 2010-05-13 at 18:01 -0400, Bruce Momjian wrote:

> I was able to easily crash the standby server today just by starting it
> and connecting to it via psql.  The master was idle.  The failure was:
> 
>   LOG:  streaming replication successfully connected to primary
>   TRAP: FailedAssertion("!(((xmax) >= ((TransactionId) 3)))", File: 
> "procarray.c", Line: 1211)
>   LOG:  server process (PID 12761) was terminated by signal 6: Abort trap
>   LOG:  terminating any other active server processes

Thanks for the report. Fix applied. (Sorry for delay in replying)

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to know killed by pg_terminate_backend

2010-05-15 Thread Tatsuo Ishii
> >> Seems reasonable. Does the victim backend currently know why it has been
> >> killed?
> >
> > I don't think so.
> >
> > One idea is postmaster sets a flag in the shared memory area
> > indicating it rceived SIGTERM before forwarding the signal to
> > backends.
> >
> > Backend check the flag and if it's not set, it knows that the signal
> > has been sent by pg_terminate_backend(), not postmaster.
> 
> Or it could also be sent by some other user process, like the user
> running "kill" from the shell.

No problem (at least for pgpool-II).

If the flag is not set, postgres returns the same code as the one
killed by pg_terminate_backend(). The point is, backend is killed by
postmaster or not. Because if backend was killed by postmaster,
pgpool-II should not expect the PostgreSQL server is usable since
postmaster decided to shutdown.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-level Locks & SERIALIZABLE transactions, postgres vs. Oracle

2010-05-15 Thread Kevin Grittner
Rob Wultsch  wrote:
 
> Have you looked at PBXT (which is explicitly NOT SERIALIZABLE)?
 
I hadn't heard of it; so I took a quick look based on your post.  It
seems to a new engine to use with MySQL which has MVCC without a
rollback log, so it presumably uses techniques at least vaguely
similar to PostgreSQL.  Anything in particular you wanted me to
notice about it besides that?  (Of course another MySQL engine which
doesn't provide very strong integrity guarantees isn't exciting to
me as a technology in itself.)
 
-Kevin



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] predefined macros for various BSD-based systems?

2010-05-15 Thread Bruce Momjian
Giles Lean wrote:
> 
> Tom Lane  wrote:
> 
> > I suppose that at least some of the *BSD herd really do predefine some
> > of the symbols being attributed to them here, but I would like to see
> > something authoritative about which and what.
> 
> Documentation follows, but first the summary:
> 
> FreeBSD: __FreeBSD__
> NetBSD:  __NetBSD__
> OpenBSD: __OpenBSD__

Great.  I have updated pg_upgrade to use those defines.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] predefined macros for various BSD-based systems?

2010-05-15 Thread Bruce Momjian
Tom Lane wrote:
> I'm not even too sure what "bsdi" is, but I'm suspicious of that branch
> too.  A search of our code finds
> 
> contrib/pg_upgrade/file.c: 248: #elif defined(freebsd) || defined(bsdi) || 
> defined(__darwin__) || defined(openbsd)
> src/backend/utils/misc/ps_status.c: 67: #elif (defined(BSD) || 
> defined(__bsdi__) || defined(__hurd__)) && !defined(__darwin__)
> src/include/port.h: 355: #if defined(bsdi) || defined(netbsd)
> src/port/fseeko.c: 20: #if defined(__bsdi__) || defined(__NetBSD__)
> src/port/fseeko.c: 24: #ifdef bsdi
> src/port/fseeko.c: 47: #ifdef bsdi
> src/port/fseeko.c: 55: #ifdef bsdi
> src/port/fseeko.c: 66: #ifdef bsdi
> src/port/fseeko.c: 76: #ifdef bsdi
> src/port/fseeko.c: 87: #ifdef bsdi
> 
> which leaves one with not a lot of warm fuzzies that we know how to
> spell the symbol for either bsdi or netbsd.  (Oh, and shouldn't
> this pg_upgrade check be looking for netbsd too?)

BSDi defines "bsdi" and "__bsdi__", but our code was clearly
inconsistent.  I have changed all references to __bsdi__.

FYI, src/tools/ccsym will show you your predefined symbols.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] predefined macros for various BSD-based systems?

2010-05-15 Thread Andrew Dunstan



Bruce Momjian wrote:

Peter Eisentraut wrote:
  

On l?r, 2010-05-15 at 00:23 -0400, Robert Haas wrote:


It's a commercial distribution of BSD.  I remember it being pretty
nice when I used it 10+ years ago, but it sounds like it's dead now.
  

BSDI is the company that produced BSD/OS, which was Bruce's main
development environment at some point, which is why it has left
excruciating traces all over the PostgreSQL source.



Uh, I still run BSDi.

  


That's more or less the OS equivalent of writing with a quill. :-)

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] predefined macros for various BSD-based systems?

2010-05-15 Thread Bruce Momjian
Peter Eisentraut wrote:
> On l?r, 2010-05-15 at 00:23 -0400, Robert Haas wrote:
> > It's a commercial distribution of BSD.  I remember it being pretty
> > nice when I used it 10+ years ago, but it sounds like it's dead now.
> 
> BSDI is the company that produced BSD/OS, which was Bruce's main
> development environment at some point, which is why it has left
> excruciating traces all over the PostgreSQL source.

Uh, I still run BSDi.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] predefined macros for various BSD-based systems?

2010-05-15 Thread Giles Lean

Tom Lane  wrote:

> I suppose that at least some of the *BSD herd really do predefine some
> of the symbols being attributed to them here, but I would like to see
> something authoritative about which and what.

Documentation follows, but first the summary:

FreeBSD: __FreeBSD__
NetBSD:  __NetBSD__
OpenBSD: __OpenBSD__

I believe those #defines also tell you what the release is.
I didn't look into their encoding schemes just now, but can if
you want.

(OS X aka Darwin is harder: they seem to like __APPLE__, but
to determine the OS version the best I can see is
__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__, which is quite
horrid.)

Re BSDi, I have no idea really but based on Google searching
I'd bet on __bsdi__.

Per Wikipedia BSDi was discontinued in 2003 and support ended
in 2004.  I submit that anyone still using it is not likely to
be updating their PostgreSQL installation, so +1 from me for
dropping support for it unless a volunteer using it comes
forward.

FYI (and you may know this, but I didn't learn until recently)
GCC will tell you quite easily what #defines are predefined,
and all those platforms use gcc:

  $ cc -E -dM - < /dev/null | grep FreeBSD  
  #define __FreeBSD_cc_version 73
  #define __VERSION__ "4.2.1 20070719  [FreeBSD]"
  #define __FreeBSD__ 7

But you wanted something authoritative, so here's what I found:

FreeBSD
===

  http://www.freebsd.org/doc/en/books/porters-handbook/porting-versions.html

  "__FreeBSD__ is defined in all versions of FreeBSD."

NetBSD
==

>From the NetBSD-1.1 release notes (November, 1995):

  "* implement new cpp predefine strategy
 define __NetBSD__, ..."

This is still the current behaviour, although the current
release is 5.0.2 from February 2010.

OpenBSD
===

  http://www.openbsd.org/porting.html

  "Generic Porting Hints

* __OpenBSD__ should be used sparingly, if at all. Constructs that
  look like

#if defined(__NetBSD__) || defined(__FreeBSD__)

  are often inappropriate. Don't add blindly __OpenBSD__ to
  it. Instead, try to figure out what's going on, and what actual
  feature is needed."

Regards,

Giles

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication patch built on SR

2010-05-15 Thread Heikki Linnakangas
BTW, What I'd like to see as a very first patch first is to change the
current poll loops in walreceiver and walsender to, well, not poll.
That's a requirement for synchronous replication, is very useful on its
own, and requires a some design and implementation effort to get right.
It would be nice to get that out of the way before/during we discuss the
more user-visible behavior.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] List traffic

2010-05-15 Thread Marc G. Fournier


[redirected to -chat]

On Fri, 14 May 2010, Rob Wultsch wrote:


Linux has *as many if not more* ... MySQL, if memory servers, has a half
dozen or more ... etc ...


MySQL has a bunch of lists, none of which get much traffic. Honestly,
they should probably be combined.


Except, when you do post, ppl see it ...


Marc G. FournierHub.Org Hosting Solutions S.A.
scra...@hub.org http://www.hub.org

Yahoo:yscrappySkype: hub.orgICQ:7615664MSN:scra...@hub.org

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-level Locks & SERIALIZABLE transactions, postgres vs. Oracle

2010-05-15 Thread Rob Wultsch
On Fri, May 14, 2010 at 7:32 AM, Kevin Grittner
 wrote:
> Oracle, and all other MVCC databases I've read about outside of PostgreSQL, 
> use
> an "update in place with a rollback log" technique.

Have you looked at PBXT (which is explicitly NOT SERIALIZABLE)?

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade code questions

2010-05-15 Thread Heikki Linnakangas
Tom Lane wrote:
> Magnus Hagander  writes:
>> On Thu, May 13, 2010 at 5:06 PM, Bruce Momjian  wrote:
>>> I have added SGML comments to comment out the text that mentions EDB
>>> Advanced Server.  Is that enough?  Should I remove the text from the
>>> SGML?  Should I move it to the bottom of the SGML?  Should I remove the
>>> EnterpriseDB Advanced Server checks from the C code too?  I don't
>>> remember having to deal with anything like this before, so I am unclear
>>> how to proceed.
> 
>> I say remove it. On all accounts.
> 
>> There's a fork of postgres for EDB AS, shouldn't there be a fork of
>> pg_upgrade the same way, if it requires special code? The code in
>> community postgresql certainly shouldn't have any EDB AS code in it.
> 
> Indeed.  Given the (presumably large) delta between EDB's code and ours,
> having to have some delta in pg_upgrade isn't going to make much
> difference for them.  I think the community code and docs should
> completely omit any mention of that.

Speaking as the person who has been doing the EDB AS merges recently, I
agree. It was helpful to have that stuff there when it was in pgfoundry,
but now that it's part of the main repository, it just gets in the way.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers