Re: [HACKERS] beta3?

2011-07-05 Thread Robert Haas
On Fri, Jul 1, 2011 at 6:06 PM, Josh Berkus j...@agliodbs.com wrote:
 That sounds reasonable to me.  I'll be on vacation then, but (1) I'm
 not really involved in pushing the release out the door and (2) I
 should have Internet access if push comes to shove.

 We seem to still have some blockers ...

I'm only seeing these two:

* ALTER TABLE lock strength reduction patch is unsafe
* btree_gist breaks some behaviors involving  operators

Simon fixed the first over the weekend (though I think there are a few
loose ends, see separate email on that topic) and Tom proposed a fix
for the second (which I am guessing he will implement).

Any other reason we can't or shouldn't wrap on the 11th?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] beta3?

2011-07-05 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 Any other reason we can't or shouldn't wrap on the 11th?
 
There are two new SSI issues which Dan and I spent a lot of time on
over the holiday weekend.  I hope they can be pushed before the
11th.
 
I have added them to the Wiki page.
 
-Kevin



Re: [HACKERS] beta3?

2011-07-01 Thread Josh Berkus
Robert,

 That sounds reasonable to me.  I'll be on vacation then, but (1) I'm
 not really involved in pushing the release out the door and (2) I
 should have Internet access if push comes to shove.

We seem to still have some blockers ...


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



[HACKERS] beta3?

2011-06-27 Thread Robert Haas
We have a couple of open items outstanding right now, but I'm
wondering if it's about time we should be thinking about a date for
beta3.

We tagged beta1 on April 27th, and beta2 on June 9th, so about six weeks apart.

But perhaps we shouldn't wait quite so long before putting out beta3?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] beta3?

2011-06-27 Thread Josh Berkus
On 6/27/11 9:45 AM, Robert Haas wrote:
 We have a couple of open items outstanding right now, but I'm
 wondering if it's about time we should be thinking about a date for
 beta3.
 
 We tagged beta1 on April 27th, and beta2 on June 9th, so about six weeks 
 apart.
 
 But perhaps we shouldn't wait quite so long before putting out beta3?

I'd be up for July 11.  July 5 would be difficult, both because of the
American holiday, and Tom being on a trip.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] beta3?

2011-06-27 Thread Robert Haas
On Mon, Jun 27, 2011 at 1:51 PM, Josh Berkus j...@agliodbs.com wrote:
 On 6/27/11 9:45 AM, Robert Haas wrote:
 We have a couple of open items outstanding right now, but I'm
 wondering if it's about time we should be thinking about a date for
 beta3.

 We tagged beta1 on April 27th, and beta2 on June 9th, so about six weeks 
 apart.

 But perhaps we shouldn't wait quite so long before putting out beta3?

 I'd be up for July 11.  July 5 would be difficult, both because of the
 American holiday, and Tom being on a trip.

That sounds reasonable to me.  I'll be on vacation then, but (1) I'm
not really involved in pushing the release out the door and (2) I
should have Internet access if push comes to shove.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] beta3 the open items list

2010-06-28 Thread Thom Brown
On 19 June 2010 14:43, Robert Haas robertmh...@gmail.com wrote:
 It would be nice if we could make a final push to get these issues
 resolved and another beta out the door before the end of the month...

So should we expect beta3 imminently, or are these issues still outstanding?

Thanks

Thom



Re: [HACKERS] beta3 the open items list

2010-06-21 Thread Greg Stark
On Mon, Jun 21, 2010 at 4:54 AM, Robert Haas robertmh...@gmail.com wrote:
 I feel like we're getting off in the weeds, here.  Obviously, the user
 would ideally like the connection to the master to last forever, but
 equally obviously, if the master unexpectedly reboots, they'd like the
 slave to notice - ideally within some reasonable time period - that it
 needs to reconnect.



  There's no perfect way to distinguish "the master
 croaked" from "the network administrator unplugged the Ethernet cable
 and is planning to plug it back in any hour now", so we'll just need
 to pick some reasonable timeout and go with it.



-- 
greg



Re: [HACKERS] beta3 the open items list

2010-06-21 Thread Robert Haas
On Mon, Jun 21, 2010 at 4:37 AM, Greg Stark gsst...@mit.edu wrote:
 On Mon, Jun 21, 2010 at 4:54 AM, Robert Haas robertmh...@gmail.com wrote:
 I feel like we're getting off in the weeds, here.  Obviously, the user
 would ideally like the connection to the master to last forever, but
 equally obviously, if the master unexpectedly reboots, they'd like the
 slave to notice - ideally within some reasonable time period - that it
 needs to reconnect.



  There's no perfect way to distinguish "the master
 croaked" from "the network administrator unplugged the Ethernet cable
 and is planning to plug it back in any hour now", so we'll just need
 to pick some reasonable timeout and go with it.

Eh... was there supposed to be some text here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] beta3 the open items list

2010-06-21 Thread Robert Haas
On Sun, Jun 20, 2010 at 5:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 On a quick read, I think I see a problem with this: if a parameter is
 specified with a non-zero value and there is no OS support available
 for that parameter, it's an error.  Presumably, for our purposes here,
 we'd prefer to simply ignore any parameters for which OS support is
 not available.  Given the nature of these parameters, one might argue
 that's a more useful behavior in general.

 Also, what about Windows?

 Well, of course that patch hasn't been reviewed yet ... but shouldn't we
 just be copying the existing server-side behavior, as to both points?

The existing server-side behavior is apparently to do elog(LOG) if a
given parameter is unsupported; I'm not sure what the equivalent for
libpq would be.

The current code does not seem to have any special cases for Windows
in this area, but that doesn't tell me whether it works or not.  It
looks like Windows must at least report success when you ask to turn
on keepalives, but whether it actually does anything, and whether
the extra parameters exist/work, I can't tell.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Andres Freund
On Saturday 19 June 2010 18:05:34 Joshua D. Drake wrote:
 On Sat, 2010-06-19 at 09:43 -0400, Robert Haas wrote:
  4. Streaming Replication needs to detect death of master.  We need
  some sort of keep-alive, here.  Whether it's at the TCP level (as
  advocated by Tom Lane and others) or at the protocol level (as
  advocated by Greg Stark) is something that we have yet to decide; once
  it's decided, someone will need to do it...
 
 TCP involves unknowns, such as firewalls, vpn routers and ssh tunnels. I
 humbly suggest we *not* be pedantic and implement something practical
 and less prone to variables outside the control of Pg.
 
 Sincerely,
 +
 Joshua D. Drake



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Florian Pflug
On Jun 20, 2010, at 7:18 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 On Jun 19, 2010, at 21:13 , Tom Lane wrote:
 This is nonsense --- the slave's kernel *will* eventually notice that
 the TCP connection is dead, and tell walreceiver so.  I don't doubt
 that the standard TCP timeout is longer than people want to wait for
 that, but claiming that it will never happen is simply wrong.
 
 No, Robert is correct AFAIK. If you're *waiting* for data, TCP
 generates no traffic (except with keepalive enabled).
 
 Mph.  I was thinking that keepalive was on by default with a very long
 interval, but I see this isn't so.  However, if we enable keepalive,
 then it's irrelevant to the point anyway.  Nobody's produced any
 evidence that keepalive is an unsuitable solution.

Yeah, I agree. Just enabling keepalive should suffice for 9.0. 

BTW, the postmaster already enables keepalive on incoming connections in 
StreamConnection() - presumably to prevent crashed clients from occupying a 
backend process forever. So there's even a clear precedent for doing so, and 
proof that it doesn't cause any harm.
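
A minimal sketch of what that setsockopt call amounts to, written in Python rather than the backend's C so it can be run standalone. The helper names are made up for illustration and are not taken from pqcomm.c:

```python
import socket


def make_keepalive_socket() -> socket.socket:
    """Create a TCP socket with SO_KEEPALIVE switched on (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to probe the peer when the connection has gone idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def keepalive_enabled(sock: socket.socket) -> bool:
    """Report whether the kernel has keepalive enabled for this socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

This only turns the feature on; the probe timing is left at whatever the operating system defaults to.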

best regards,
Florian Pflug




Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Kevin Grittner
Florian Pflug  wrote:
 On Jun 20, 2010, at 7:18 , Tom Lane wrote:
 
 I was thinking that keepalive was on by default with a very
 long interval, but I see this isn't so. However, if we enable
 keepalive, then it's irrelevant to the point anyway. Nobody's
 produced any evidence that keepalive is an unsuitable solution.

 Yeah, I agree. Just enabling keepalive should suffice for 9.0.
 
+1, with configurable timeout; otherwise people will often feel they
need to kill the receiver process to get it to attempt reconnect or
archive search, anyway.  Two hours is a long time to block
replication based on a broken connection before attempting to move
on.
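
To put rough numbers on that, here is a small sketch of the arithmetic. It assumes the commonly cited Linux defaults (7200 s idle, 75 s between probes, 9 probes); the tuned values are hypothetical and only there for comparison:

```python
def worst_case_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Seconds from the last traffic until the kernel gives up on a dead peer.

    The first probe goes out after `idle` seconds of silence; each of the
    `count` probes is then given `interval` seconds to be acknowledged.
    """
    return idle + interval * count


# Assumed Linux defaults: 7200 s idle, 75 s interval, 9 probes.
DEFAULT_SECONDS = worst_case_detection_seconds(7200, 75, 9)

# Hypothetical tighter setting of the kind a configurable timeout would allow.
TUNED_SECONDS = worst_case_detection_seconds(60, 10, 6)
```

With the assumed defaults this comes to 7875 seconds, a little over two hours; the tuned example comes to 120 seconds.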
 
-Kevin



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Florian Pflug  wrote:
 Yeah, I agree. Just enabling keepalive should suffice for 9.0.
 
 +1, with configurable timeout;

Right, of course.  That's already in the pending patch isn't it?

regards, tom lane



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Joshua D. Drake
On Sun, 2010-06-20 at 11:36 -0400, Tom Lane wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
  Florian Pflug  wrote:
  Yeah, I agree. Just enabling keepalive should suffice for 9.0.
  
  +1, with configurable timeout;
 
 Right, of course.  That's already in the pending patch isn't it?

Can someone tell me what we are going to do about firewalls that impose
their own rules outside of the control of the DBA?

I know that keepalive *should* work, however I also know that regardless
of keepalive I often have to restart sessions etc. There are
environments that are outside the control of the user.

Perhaps this has already been solved and I don't know about it. Does the
master-slave relationship have a built in ping mechanism that is
outside of the TCP protocol?

Sincerely,

Joshua D. Drake

 
   regards, tom lane
 

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering




Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Kevin Grittner
Joshua D. Drake  wrote:
 
 Can someone tell me what we are going to do about firewalls that
 impose their own rules outside of the control of the DBA?
 
Has anyone actually seen a firewall configured for something so
stupid as to allow *almost* all the various packets involved in using
a TCP connection, but which suppressed just keepalive packets?  That
seems to be what you're suggesting is the risk; it's an outlandish
enough suggestion that I think the burden of proof is on you to show
that it happens often enough to make this a worthless change.
 
-Kevin



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Kenneth Marshall
On Sun, Jun 20, 2010 at 03:01:04PM -0500, Kevin Grittner wrote:
 Joshua D. Drake  wrote:
  
  Can someone tell me what we are going to do about firewalls that
  impose their own rules outside of the control of the DBA?
  
 Has anyone actually seen a firewall configured for something so
 stupid as to allow *almost* all the various packets involved in using
 a TCP connection, but which suppressed just keepalive packets?  That
 seems to be what you're suggesting is the risk; it's an outlandish
 enough suggestion that I think the burden of proof is on you to show
 that it happens often enough to make this a worthless change.
  
 -Kevin
 

I have seen this sort of behavior but in every case it has been
the result of a myopic view of firewall/IP tables solutions to
perceived attacks. While I do agree that having a heartbeat
within the replication process is worthwhile, it should definitely
be 9.1 material at best. For 9.0 such ill-behaved environments
will need much more interaction by the DBA with monitoring and
triage of problems as they arrive.

Regards,
Ken

P.S. My favorite example of odd behavior was preemptively dropping
TCP packets in one direction only at a single port. Many, many
odd things happen when the kernel does not know that the packet
would never make it to its destination. Services would sometimes
run for weeks without a problem depending on when the port ended
up being used invariably at night or on the weekend.



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Robert Haas
On Sun, Jun 20, 2010 at 11:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Florian Pflug  wrote:
 Yeah, I agree. Just enabling keepalive should suffice for 9.0.

 +1, with configurable timeout;

 Right, of course.  That's already in the pending patch isn't it?

Is this sarcasm, or is there a pending patch I'm not aware of?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Jun 20, 2010 at 11:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Right, of course.  That's already in the pending patch isn't it?

 Is this sarcasm, or is there a pending patch I'm not aware of?

https://commitfest.postgresql.org/action/patch_view?id=281

regards, tom lane



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Florian Pflug
On Jun 20, 2010, at 22:01 , Kevin Grittner wrote:
 Joshua D. Drake  wrote:
 
 Can someone tell me what we are going to do about firewalls that
 impose their own rules outside of the control of the DBA?
 
 Has anyone actually seen a firewall configured for something so
 stupid as to allow *almost* all the various packets involved in using
 a TCP connection, but which suppressed just keepalive packets?  That
 seems to be what you're suggesting is the risk; it's an outlandish
 enough suggestion that I think the burden of proof is on you to show
 that it happens often enough to make this a worthless change.

Yeah, especially since there is no such thing as a special keepalive packet 
in TCP. Keepalive simply sends packets with zero bytes of payload every once in 
a while if the connection is otherwise inactive. If those aren't acknowledged 
(like every other packet would be) by the peer, the connection is assumed to be 
broken. On a reasonably active connection, keepalive neither causes additional 
transmissions, nor altered transmissions.

Keepalive is therefore extremely unlikely to break things - in the very worst 
case, a (really, really stupid) firewall might decide to drop packets with zero 
bytes of payload, causing inactive connections to abort after a while. AFAIK 
walreceiver will simply reconnect in this case. 

Plus, the postmaster enables keepalive on all incoming connections *already*, 
so any problems ought to have caused bugreports about dropped client 
connections.

best regards,
Florian Pflug




Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Robert Haas
On Sun, Jun 20, 2010 at 5:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Jun 20, 2010 at 11:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Right, of course.  That's already in the pending patch isn't it?

 Is this sarcasm, or is there a pending patch I'm not aware of?

 https://commitfest.postgresql.org/action/patch_view?id=281

+1 for applying something along these lines, but we'll also need to
update walreceiver to actually use one or more of these new
parameters.

On a quick read, I think I see a problem with this: if a parameter is
specified with a non-zero value and there is no OS support available
for that parameter, it's an error.  Presumably, for our purposes here,
we'd prefer to simply ignore any parameters for which OS support is
not available.  Given the nature of these parameters, one might argue
that's a more useful behavior in general.
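
To make the "ignore what the OS can't do" behavior concrete, here is a rough sketch in Python. It is not the patch's code: the function name and the returned list of skipped settings are assumptions for illustration, and the option constants are the ones Python's socket module exposes where the platform has them:

```python
import socket

# Option names as exposed by Python's socket module. Any of them may be
# missing on a given platform, which is the "no OS support" case.
_OPTION_NAMES = {
    "idle": "TCP_KEEPIDLE",
    "interval": "TCP_KEEPINTVL",
    "count": "TCP_KEEPCNT",
}


def apply_keepalive_settings(sock, idle=0, interval=0, count=0):
    """Apply the settings the OS supports and return the names that were skipped."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    skipped = []
    for name, value in (("idle", idle), ("interval", interval), ("count", count)):
        if not value:
            continue  # zero means "leave the system default alone"
        option = getattr(socket, _OPTION_NAMES[name], None)
        if option is None:
            skipped.append(name)  # unsupported here: note it, do not fail
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            skipped.append(name)  # the OS rejected it: also treated as unsupported
    return skipped
```

A caller could log the skipped names instead of raising an error, which is roughly the lenient behavior argued for above.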

Also, what about Windows?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Jun 20, 2010 at 5:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 https://commitfest.postgresql.org/action/patch_view?id=281

 +1 for applying something along these lines, but we'll also need to
 update walreceiver to actually use one or more of these new
 parameters.

Right, but the libpq-level support has to come first.

 On a quick read, I think I see a problem with this: if a parameter is
 specified with a non-zero value and there is no OS support available
 for that parameter, it's an error.  Presumably, for our purposes here,
 we'd prefer to simply ignore any parameters for which OS support is
 not available.  Given the nature of these parameters, one might argue
 that's a more useful behavior in general.

 Also, what about Windows?

Well, of course that patch hasn't been reviewed yet ... but shouldn't we
just be copying the existing server-side behavior, as to both points?

regards, tom lane



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Greg Stark
On Sun, Jun 20, 2010 at 10:41 PM, Florian Pflug f...@phlo.org wrote:
 Yeah, especially since there is no such thing as a special keepalive packet 
 in TCP. Keepalive simply sends packets with zero bytes of payload every once 
 in a while if the connection is otherwise inactive. If those aren't 
 acknowledged (like every other packet would be) by the peer, the connection 
 is assumed to be broken. On a reasonably active connection, keepalive neither 
 causes additional transmissions, nor altered transmissions.

Actually, keep-alive packets contain one byte of data which is a
duplicate of the last previously acked byte.
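
As a rough illustration of the sequence-number side of this, assuming the usual scheme in which the probe carries a sequence number one below the sender's next unsent byte. The function names are made up, and this models only the arithmetic, not a real TCP stack:

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def keepalive_probe_seq(snd_nxt: int) -> int:
    """Sequence number carried by a keepalive probe: one below SND.NXT."""
    return (snd_nxt - 1) % SEQ_MODULUS


def is_already_acked(seg_seq: int, rcv_nxt: int) -> bool:
    """True when a segment starts before the receiver's next expected byte."""
    # Serial-number style comparison, so the check survives wraparound.
    return ((seg_seq - rcv_nxt) % SEQ_MODULUS) >= SEQ_MODULUS // 2
```

From the receiver's point of view the probe looks like a retransmission of old data, which is what prompts it to answer with an ACK.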


 Keepalive is therefore extremely unlikely to break things - in the very worst 
 case, a (really, really stupid) firewall might decide to drop packets with 
 zero bytes of payload, causing inactive connections to abort after a while. 
 AFAIK walreceiver will simply reconnect in this case.

Stateful firewalls' whole raison d'être is to block packets which
aren't consistent with the current TCP state -- such as packets with a
sequence number earlier than the last acked sequence number.
Keepalives do in fact violate the basic TCP spec so they wouldn't be
entirely crazy to block them. Of course a firewall that blocked them
would be pretty criminally stupid given how ubiquitous they are.

  Plus, the postmaster enables keepalive on all incoming connections
*already*, so any problems ought to have caused bugreports about
dropped client connections.


Really? Since when? I thought there was some discussion about this
about a year ago and I made it very clear this had to be an optional
feature which defaulted to off.

Keepalives introduce spurious disconnections in working TCP
connections that have transient outages which is basic TCP
functionality that's supposed to work. There are cases where that's
what you want but it isn't the kind of thing that should be on by
default, let alone on unconditionally.


-- 
greg



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Kevin Grittner
Greg Stark  wrote:
 
 Keepalives introduce spurious disconnections in working TCP
 connections that have transient outages
 
It's been a while since I read up on this, so perhaps my memory has
distorted the facts over time, but I thought that under TCP, if one
side sends a packet which isn't ack'd after a (configurable) number
of tries with certain (configurable) timings, the connection would be
considered broken and an error returned regardless of keepalive
settings.  I thought keepalive only generated a trickle of small
packets during idle time so that broken connections could be detected
on the side of a connection which was waiting to receive data before
doing something.  That doesn't sound consistent with your
characterization, though, since if my recollection is right, one
could just as easily say that any write to a TCP socket by the
application can also cause spurious disconnections in working TCP
connections that have transient outages.
 
I know that with a two minute keepalive timeout, I can unplug a
machine from one switch port and plug it in somewhere else and the
networking hardware sorts things out fast enough that the transient
network outage doesn't break the TCP connection, whether the
application is sending data or it is quiescent and the OS is sending
keepalive packets.
 
From what I've read about the present walreceiver retry logic, if the
connection breaks, WR will use some intelligence to try the archive
and retry connecting through TCP, in turn, until it finds data.  If
the connection goes silent without breaking, WR sits there forever
without looking at the archive or trying to obtain a new TCP
connection to the master.  I know which behavior I'd prefer.
Apparently the testers who encountered the behavior felt the same.
 
-Kevin



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Florian Pflug
On Jun 21, 2010, at 0:13 , Greg Stark wrote:
 Keepalive is therefore extremely unlikely to break things - in the very 
 worst case, a (really, really stupid) firewall might decide to drop packets 
 with zero bytes of payload, causing inactive connections to abort after a 
 while. AFAIK walreceiver will simply reconnect in this case.
 
 Stateful firewalls' whole raison d'être is to block packets which
 aren't consistent with the current TCP state -- such as packets with a
 sequence number earlier than the last acked sequence number.
 Keepalives do in fact violate the basic TCP spec so they wouldn't be
 entirely crazy to block them. 

Keepalives play games with the spec, but they don't outright violate it I'd 
say. The sender bluffs by retransmitting data it *knows* has been ACK'ed. But 
since nobody else can prove with certainty that the sender actually saw that 
ACK (think NIC-internal buffer overflow), nobody is able to call that bluff. 

 Of course a firewall that blocked them
 would be pretty criminally stupid given how ubiquitous they are.


Very true, and another reason to stop worrying about possibly brain-dead 
firewalls.

 Plus, the postmaster enables keepalive on all incoming connections
 *already*, so any problems ought to have caused bugreports about
 dropped client connections.
 
 Really? Since when? I thought there was some discussion about this
 about a year ago and I made it very clear this had to be an optional
 feature which defaulted to off.

Since 'bout 10 years. The setsockopt call is in StreamConnection() in 
src/backend/libpq/pqcomm.c.

Here's the corresponding commit:

commit 5aa160abba32a1f2d7818b9f49213f38c99b3fd8
Author: Tatsuo Ishii is...@postgresql.org
Date:   Sat May 20 13:10:54 2000 +

Add KEEPALIVE option to the socket of backend. This will automatically
terminate the backend that has no frontend anymore.

 Keepalives introduce spurious disconnections in working TCP
 connections that have transient outages which is basic TCP
 functionality that's supposed to work. There are cases where that's
 what you want but it isn't the kind of thing that should be on by
 default, let alone on unconditionally.

I'd buy that if all timeouts and retry counts would default to +infinity. But 
they don't, and hence sufficiently long network outages *will* cause connection 
aborts anyway. That a particular connection might survive due to inactivity 
proves nothing, since whether the connection is active or inactive during an 
outage is usually outside of anyone's control.

I really fail to see why anyone would prefer connections (and therefore 
transactions!) getting stuck forever over a few spurious disconnects. The 
former always require manual intervention and cause all sorts of performance 
and disk-space issues, while the latter won't even be an issue for well-written 
clients who just reconnect and retry.

best regards,
Florian Pflug




Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Greg Stark
On Mon, Jun 21, 2010 at 12:42 AM, Florian Pflug f...@phlo.org wrote:
 I'd buy that if all timeouts and retry counts would default to +infinity. But 
 they don't, and hence sufficiently long network outages *will* cause 
 connection aborts anyway. That a particular connection might survive due to 
 inactivity proves nothing, since whether the connection is active or inactive 
 during an outage is usually outside of anyone's control.

 I really fail to see why anyone would prefer connections (and therefore 
 transactions!) getting stuck forever over a few spurious disconnects. The 
 former always require manual intervention and cause all sorts of performance 
 and disk-space issues, while the latter won't even be an issue for 
 well-written clients who just reconnect and retry.


So just as a data point I'm routinely annoyed by reopening my screen
session and finding various sessions have died since the day
before. Usually this is caused by broken firewalls but there are also
a bunch of SSH options which some servers have enabled which cause my
sessions to never survive very long if there are any network outages.
Servers where those options are disabled work fine.

I admit this is a very different use case though and since we have
control over the behaviour when the connection breaks perhaps the
analogy falls apart completely. I'm not sure we can guarantee that
reconnecting is always so simple though. What if the user set up an
SSH gateway or needs some extra authentication to make the connection?
Are users expecting the slave to randomly disconnect and reconnect
willy nilly or are they expecting that once it connects it'll keep
using that connection forever?

-- 
greg



Re: [HACKERS] beta3 the open items list

2010-06-20 Thread Robert Haas
On Sun, Jun 20, 2010 at 9:31 PM, Greg Stark gsst...@mit.edu wrote:
 On Mon, Jun 21, 2010 at 12:42 AM, Florian Pflug f...@phlo.org wrote:
 I'd buy that if all timeouts and retry counts would default to +infinity. 
 But they don't, and hence sufficiently long network outages *will* cause 
 connection aborts anyway. That a particular connection might survive due to 
 inactivity proves nothing, since whether the connection is active or 
 inactive during an outage is usually outside of anyone's control.

 I really fail to see why anyone would prefer connections (and therefore 
 transactions!) getting stuck forever over a few spurious disconnects. The 
 former always require manual intervention and cause all sorts of performance 
 and disk-space issues, while the latter won't even be an issue for 
 well-written clients who just reconnect and retry.


 So just as a data point I'm routinely annoyed by reopening my screen
 session and finding various sessions have died since the day
 before. Usually this is caused by broken firewalls but there are also
 a bunch of SSH options which some servers have enabled which cause my
 sessions to never survive very long if there are any network outages.
 Servers where those options are disabled work fine.

 I admit this is a very different use case though and since we have
 control over the behaviour when the connection breaks perhaps the
 analogy falls apart completely. I'm not sure we can guarantee that
 reconnecting is always so simple though. What if the user set up an
 SSH gateway or needs some extra authentication to make the connection?
 Are users expecting the slave to randomly disconnect and reconnect
 willy nilly or are they expecting that once it connects it'll keep
 using that connection forever?

I feel like we're getting off in the weeds, here.  Obviously, the user
would ideally like the connection to the master to last forever, but
equally obviously, if the master unexpectedly reboots, they'd like the
slave to notice - ideally within some reasonable time period - that it
needs to reconnect.  There's no perfect way to distinguish "the master
croaked" from "the network administrator unplugged the Ethernet cable
and is planning to plug it back in any hour now", so we'll just need
to pick some reasonable timeout and go with it.  To my way of
thinking, if the master hasn't responded in a minute or two, that's a
sign that it's time to declare the connection dead.  Retrying the
connection *should* be cheap.  If the user has set things up so that a
TCP connection from slave to master is not straightforward, the user
has configured it incorrectly, and no matter what we do it's not going
to be reliable.

I still think there's a decent argument that we might want to have a
protocol-level heartbeat rather than a TCP-level heartbeat.  But doing
the latter is, I think, good enough for 9.0.  We're pretty much
speculating about what the problems with that approach might be, so
getting too worked up about fixing them at this point seems premature.
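
The timeout policy described above can be sketched as follows. This is only an illustration in Python, not code from PostgreSQL: the class name LivenessTracker, its method names, and the 120-second default (an assumed stand-in for "a minute or two") are all hypothetical.

```python
class LivenessTracker:
    """Hypothetical helper: remembers when the master was last heard from."""

    def __init__(self, timeout_seconds=120.0):
        # 120 s is an assumed stand-in for "a minute or two".
        self.timeout_seconds = timeout_seconds
        self.last_heard = None

    def heard_from_master(self, now):
        """Record the time (in seconds) at which data arrived from the master."""
        self.last_heard = now

    def should_reconnect(self, now):
        """True once the master has been silent for longer than the timeout."""
        if self.last_heard is None:
            return True
        return (now - self.last_heard) > self.timeout_seconds


tracker = LivenessTracker()
tracker.heard_from_master(now=0.0)
still_alive = not tracker.should_reconnect(now=60.0)   # silent for one minute
declared_dead = tracker.should_reconnect(now=121.0)    # silent for over two minutes
```

In a real receiver the `now` values would come from a monotonic clock. A check like this is the kind of logic a protocol-level heartbeat would need; with TCP keepalive the kernel does the equivalent bookkeeping itself.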

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



[HACKERS] beta3 the open items list

2010-06-19 Thread Robert Haas
It would be nice to get beta3 out the door sooner rather than later,
but I sort of feel like we're not ready yet.  In fact, we seem to be a
bit stalled.  The open items list currently lists four items.

1. max_standby_delay.  Tom has committed to getting this done, but has
been tied up with non-PostgreSQL related work for the last few weeks.

2. infinite repeat of warning message in standby.  Heikki changed the
code so this isn't a tight loop any more, which is an improvement, but
we've discussed the fact that retrying forever may not be the best
behavior.

http://archives.postgresql.org/pgsql-hackers/2010-06/msg00806.php
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00838.php

I am not clear, however, on how difficult it is to implement the
proposed behavior, and I'm not sure Heikki's on board with the
proposed change.

3. supply alternate hstore operator for equals-greater in preparation
for later use in function parameter assignment.  There's some work
left to be done here but it's pretty minor.  Mostly we're arguing
about whether to call the hstore slice operator + or  or % or % --
I've written three patches to rename it so far (to three different
alternative names), one of which I committed, and there's still
ongoing discussion as to whether to rename it again and/or remove it.
Aside from that, we need to deal with the singleton-hstore constructor
(text => text); I believe the consensus there is to remove the
operator in favor of the underlying hstore(text, text) function and
backpatch that function name into the back-branches to facilitate
writing hstore code that is portable across major PostgreSQL releases.

4. Streaming Replication needs to detect death of master.  We need
some sort of keep-alive, here.  Whether it's at the TCP level (as
advocated by Tom Lane and others) or at the protocol level (as
advocated by Greg Stark) is something that we have yet to decide; once
it's decided, someone will need to do it...

It would be nice if we could make a final push to get these issues
resolved and another beta out the door before the end of the month...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] beta3 the open items list

2010-06-19 Thread Joshua D. Drake
On Sat, 2010-06-19 at 09:43 -0400, Robert Haas wrote:

 4. Streaming Replication needs to detect death of master.  We need
 some sort of keep-alive, here.  Whether it's at the TCP level (as
 advocated by Tom Lane and others) or at the protocol level (as
 advocated by Greg Stark) is something that we have yet to decide; once
 it's decided, someone will need to do it...

TCP involves unknowns, such as firewalls, vpn routers and ssh tunnels. I
humbly suggest we *not* be pedantic and implement something practical
and less prone to variables outside the control of Pg.

Sincerely,

Joshua D. Drake


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering




Re: [HACKERS] beta3 the open items list

2010-06-19 Thread Greg Stark
On Sat, Jun 19, 2010 at 2:43 PM, Robert Haas robertmh...@gmail.com wrote:
 4. Streaming Replication needs to detect death of master.  We need
 some sort of keep-alive, here.  Whether it's at the TCP level (as
 advocated by Tom Lane and others) or at the protocol level (as
 advocated by Greg Stark) is something that we have yet to decide; once
 it's decided, someone will need to do it...

This sounds like a useful feature but I don't see why it's not 9.1
material. The status quo is that the expected usage pattern is manual
failover. As long as the slave responds to manual intervention when in
this state I don't think this is a blocking issue. Monitoring and
automatic failover are clearly things we plan to add features to
handle better in the future.


-- 
greg



Re: [HACKERS] beta3 the open items list

2010-06-19 Thread Robert Haas
On Sat, Jun 19, 2010 at 2:46 PM, Greg Stark gsst...@mit.edu wrote:
 On Sat, Jun 19, 2010 at 2:43 PM, Robert Haas robertmh...@gmail.com wrote:
 4. Streaming Replication needs to detect death of master.  We need
 some sort of keep-alive, here.  Whether it's at the TCP level (as
 advocated by Tom Lane and others) or at the protocol level (as
 advocated by Greg Stark) is something that we have yet to decide; once
 it's decided, someone will need to do it...

 This sounds like a useful feature but I don't see why it's not 9.1
 material. The status quo is that the expected usage pattern is manual
 failover. As long as the slave responds to manual intervention when in
 this state I don't think this is a blocking issue. Monitoring and
 automatic failover are clearly things we plan to add features to
 handle better in the future.

Right now, if the SR master reboots unexpectedly (say, power plug pull
and restart), the slave never notices.  It just sits there forever
waiting for the next byte of data from the master to arrive (which it
never will).  You have to manually restart the server or hit
walreceiver with a SIGTERM to get it to start streaming again.  I
guess we could decide we're just not going to deal with that, but it
seems like a fairly large misfeature to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] beta3 the open items list

2010-06-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Right now, if the SR master reboots unexpectedly (say, power plug pull
 and restart), the slave never notices.  It just sits there forever
 waiting for the next byte of data from the master to arrive (which it
 never will).

This is nonsense --- the slave's kernel *will* eventually notice that
the TCP connection is dead, and tell walreceiver so.  I don't doubt
that the standard TCP timeout is longer than people want to wait for
that, but claiming that it will never happen is simply wrong.

I think that enabling slave-side TCP keepalives and control of the
keepalive timeout parameters is probably sufficient for 9.0 here.

regards, tom lane



Re: [HACKERS] beta3 the open items list

2010-06-19 Thread Andres Freund
On Saturday 19 June 2010 18:05:34 Joshua D. Drake wrote:
 On Sat, 2010-06-19 at 09:43 -0400, Robert Haas wrote:
  4. Streaming Replication needs to detect death of master.  We need
  some sort of keep-alive, here.  Whether it's at the TCP level (as
  advocated by Tom Lane and others) or at the protocol level (as
  advocated by Greg Stark) is something that we have yet to decide; once
  it's decided, someone will need to do it...
 
 TCP involves unknowns, such as firewalls, vpn routers and ssh tunnels. I
 humbly suggest we *not* be pedantic and implement something practical
 and less prone to variables outside the control of Pg.
And has the huge advantage of being implementable in about 5 lines of C 
(setsockopt + error checking). Considering what time in the release cycle this 
is...
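
For illustration, the setsockopt approach might look like the sketch below. It is written in Python rather than C and is not the patch under discussion: the function name enable_keepalive and the idle/interval/count values are assumptions, and the TCP_KEEP* option names are platform-specific (Linux names shown), so they are applied only where the socket module exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an otherwise idle connection.

    The default values are illustrative assumptions, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning knobs: seconds of idleness before the first probe, seconds
    # between probes, and unanswered probes tolerated before the kernel
    # reports the connection as dead.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With these example values a dead peer would be reported after roughly idle + interval * count seconds of silence.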

Andres



Re: [HACKERS] beta3 the open items list

2010-06-19 Thread Stefan Kaltenbrunner

On 06/19/2010 09:13 PM, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  Right now, if the SR master reboots unexpectedly (say, power plug pull
  and restart), the slave never notices.  It just sits there forever
  waiting for the next byte of data from the master to arrive (which it
  never will).
 
 This is nonsense --- the slave's kernel *will* eventually notice that
 the TCP connection is dead, and tell walreceiver so.  I don't doubt
 that the standard TCP timeout is longer than people want to wait for
 that, but claiming that it will never happen is simply wrong.
 
 I think that enabling slave-side TCP keepalives and control of the
 keepalive timeout parameters is probably sufficient for 9.0 here.

yeah I would agree - we have had TCP keepalive code in the backend for a
while now, and adding that to libpq as well just seems like an easy
enough fix at this time in the release cycle.



Stefan



Re: [HACKERS] beta3 the open items list

2010-06-19 Thread Florian Pflug
On Jun 19, 2010, at 21:13 , Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 Right now, if the SR master reboots unexpectedly (say, power plug pull
 and restart), the slave never notices.  It just sits there forever
 waiting for the next byte of data from the master to arrive (which it
 never will).
 
 This is nonsense --- the slave's kernel *will* eventually notice that
 the TCP connection is dead, and tell walreceiver so.  I don't doubt
 that the standard TCP timeout is longer than people want to wait for
 that, but claiming that it will never happen is simply wrong.

No, Robert is correct AFAIK. If you're *waiting* for data, TCP generates no 
traffic (except with keepalive enabled). From the slave's kernel POV, a dead 
master is therefore indistinguishable from an inactive master.

Things are different from a sender's POV, though. Since sent data is ACK'ed by 
the receiving end, the TCP stack can (and does) detect a broken connection.
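
A small sketch of the receiver's point of view, under stated assumptions: a local socketpair stands in for the master/slave connection (it is not a real TCP link), and the 0.2-second timeout is arbitrary. The "master" end stays open but silent, and all the waiting side ever learns is that nothing arrived.

```python
import socket

# A connected pair of local sockets stands in for master and slave.
master, slave = socket.socketpair()
slave.settimeout(0.2)  # without a timeout, recv() would block indefinitely

try:
    slave.recv(1)
    outcome = "data"
except socket.timeout:
    # The peer sent nothing; the receiver cannot tell from this alone
    # whether the other side is idle or gone.
    outcome = "timeout"

master.close()
slave.close()
```

Here the only signal is the application-level timeout. Over real TCP without keepalive, a receiver blocked in recv() on a connection whose peer vanished without sending a FIN or RST is in the same position.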

best regards,
Florian Pflug




Re: [HACKERS] beta3 the open items list

2010-06-19 Thread Simon Riggs
On Sat, 2010-06-19 at 14:53 -0400, Robert Haas wrote:
 On Sat, Jun 19, 2010 at 2:46 PM, Greg Stark gsst...@mit.edu wrote:
  On Sat, Jun 19, 2010 at 2:43 PM, Robert Haas robertmh...@gmail.com wrote:
  4. Streaming Replication needs to detect death of master.  We need
  some sort of keep-alive, here.  Whether it's at the TCP level (as
  advocated by Tom Lane and others) or at the protocol level (as
  advocated by Greg Stark) is something that we have yet to decide; once
  it's decided, someone will need to do it...
 
  This sounds like a useful feature but I don't see why it's not 9.1
  material. The status quo is that the expected usage pattern is manual
  failover. As long as the slave responds to manual intervention when in
  this state I don't think this is a blocking issue. Monitoring and
  automatic failover are clearly things we plan to add features to
  handle better in the future.
 
 Right now, if the SR master reboots unexpectedly (say, power plug pull
 and restart), the slave never notices.  It just sits there forever
 waiting for the next byte of data from the master to arrive (which it
 never will).  You have to manually restart the server or hit
 walreceiver with a SIGTERM to get it to start streaming again.  I
 guess we could decide we're just not going to deal with that, but it
 seems like a fairly large misfeature to me.

Are you saying it doesn't respond to a trigger file at any point? That
would be a problem.

Sounds like we should have a pg_restart_walreceiver() function. We
shouldn't be encouraging people to send signals to backends; it's too
easy to get wrong.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] beta3 the open items list

2010-06-19 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Jun 19, 2010, at 21:13 , Tom Lane wrote:
 This is nonsense --- the slave's kernel *will* eventually notice that
 the TCP connection is dead, and tell walreceiver so.  I don't doubt
 that the standard TCP timeout is longer than people want to wait for
 that, but claiming that it will never happen is simply wrong.

 No, Robert is correct AFAIK. If you're *waiting* for data, TCP
 generates no traffic (except with keepalive enabled).

Mph.  I was thinking that keepalive was on by default with a very long
interval, but I see this isn't so.  However, if we enable keepalive,
then it's irrelevant to the point anyway.  Nobody's produced any
evidence that keepalive is an unsuitable solution.

regards, tom lane



Re: [HACKERS] beta3 CFLAGS issue on openbsd

2006-11-10 Thread Peter Eisentraut
Am Freitag, 10. November 2006 08:29 schrieb Jeremy Drake:
 I figured out that the -g flag was being surreptitiously added to my
 CFLAGS.  It was like pulling teeth trying to get the -g flag out.  I tried
 --disable-debug to configure, which did not work.  I had to do
 CFLAGS=-O2 ./configure ...

Apparently you have some CFLAGS setting in your environment.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] beta3 CFLAGS issue on openbsd

2006-11-09 Thread Jeremy Drake
I was trying to compile 8.2beta3 on openbsd, and ran into an interesting
issue.  My account on the particular openbsd box has some restrictive
ulimit settings, so I don't have a lot of memory to work with.  I was
getting an out of memory issue linking postgres, while I did not before.
I figured out that the -g flag was being surreptitiously added to my
CFLAGS.  It was like pulling teeth trying to get the -g flag out.  I tried
--disable-debug to configure, which did not work.  I had to do
CFLAGS=-O2 ./configure ...

Is this a known feature in the betas to get people running with -g in case
things break, or is this a configure bug, or expected?

Here is the first bit from configure, note the -g in the using CFLAGS line
at the end.

[EMAIL PROTECTED](~/build/postgres/postgresql-8.2beta3)$ ./configure 
--prefix=/home/jeremyd/progs/pg82 --with-perl --with-openssl --with-pgport=54322
checking build system type... x86_64-unknown-openbsd3.9
checking host system type... x86_64-unknown-openbsd3.9
checking which template to use... openbsd
checking whether to build with 64-bit integer date/time support... no
checking whether NLS is wanted... no
checking for default port number... 54322
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether cc accepts -g... yes
checking for cc option to accept ANSI C... none needed
checking if cc supports -Wdeclaration-after-statement... no
checking if cc supports -Wendif-labels... yes
checking if cc supports -fno-strict-aliasing... yes
configure: using CFLAGS=-O2 -g -pipe -Wall -Wmissing-prototypes -Wpointer-arith 
-Winline -Wendif-labels -fno-strict-aliasing


-- 
It's odd, and a little unsettling, to reflect upon the fact that
English is the only major language in which I is capitalized; in many
other languages You is capitalized and the i is lower case.
-- Sydney J. Harris



Re: [HACKERS] beta3 CFLAGS issue on openbsd

2006-11-09 Thread Tom Lane
Jeremy Drake [EMAIL PROTECTED] writes:
 I figured out that the -g flag was being surreptitiously added to my
 CFLAGS.  It was like pulling teeth trying to get the -g flag out.

I believe that this is a default behavior of autoconf scripts.
I remember having done some ugly hacks years ago to prevent an autoconf
configure script from adding -g by default to libjpeg builds... and
the argument for not having -g has gotten ever weaker since then,
so I really doubt you'll get far complaining to the autoconf maintainers
about it.
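
As a rough model of the default being described, the sketch below mirrors how the stock autoconf AC_PROG_CC macro chooses CFLAGS when the user has not set any. It models only the generic macro; PostgreSQL's own configure script layers template and option handling on top, which is not represented here, and the function name default_cflags is hypothetical.

```python
def default_cflags(user_cflags, is_gcc, accepts_g):
    """Model of the generic autoconf default for CFLAGS.

    user_cflags is None when CFLAGS was not set before running configure.
    """
    if user_cflags is not None:
        # A user-supplied CFLAGS is kept untouched.
        return user_cflags
    if accepts_g:
        return "-g -O2" if is_gcc else "-g"
    return "-O2" if is_gcc else ""


unset_result = default_cflags(None, is_gcc=True, accepts_g=True)
explicit_result = default_cflags("-O2", is_gcc=True, accepts_g=True)
```

Under this model, setting CFLAGS explicitly (as in `CFLAGS=-O2 ./configure ...`) is what suppresses the automatic `-g`.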

regards, tom lane



Re: [HACKERS] Beta3 Bundled

2005-10-12 Thread Dave Page
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Marc 
 G. Fournier
 Sent: 12 October 2005 00:50
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] Beta3 Bundled
 
 
 Sizes look right compared to beta2 ... please check it over 
 and make sure 
 there are no outstanding issues ... will announce over the 
 next 24-48 hrs, 
 once Dave has had a chance to get the pgInstaller up to date ...

Uploaded to svr1 - will take a while to hit the mirrors of course.

/D



[HACKERS] Beta3 Bundled

2005-10-11 Thread Marc G. Fournier


Sizes look right compared to beta2 ... please check it over and make sure 
there are no outstanding issues ... will announce over the next 24-48 hrs, 
once Dave has had a chance to get the pgInstaller up to date ...





Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



[HACKERS] beta3 on unixware 714

2004-10-09 Thread ohp
Hi all,

I've been giving a shot to beta3 since yesterday.
make check hangs when it reaches the 14 parallel tests (limit...).
At that point no test ever returns, one postmaster is 100% CPU bound, and
nothing else happens.

Beta2 was ok; I wonder what changed.
Also, I tried to compile with --enable-cassert; this causes a "symbol not
found" error in createlang during make check.

Is there anything I can provide the list to help debug this?

TIA and regards

-- 
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)



Re: [HACKERS] beta3 on unixware 714

2004-10-09 Thread Tom Lane
[EMAIL PROTECTED] writes:
 Also, I tried to compile with --enable-cassert, this causes a symbol not
 found in createlang while make check.

Sounds like picking up the wrong version of a shared library.

regards, tom lane



Re: [HACKERS] beta3 on unixware 714

2004-10-09 Thread ohp
On Sat, 9 Oct 2004, Tom Lane wrote:

 Date: Sat, 09 Oct 2004 11:19:51 -0400
 From: Tom Lane [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: pgsql-hackers list [EMAIL PROTECTED]
 Subject: Re: [HACKERS] beta3 on unixware 714

 [EMAIL PROTECTED] writes:
  Also, I tried to compile with --enable-cassert, this causes a symbol not
  found in createlang while make check.

 Sounds like picking up the wrong version of a shared library.

   regards, tom lane

not sure:
createlang: language installation failed: ERROR:  could not load library
 
/home/postgres/postgresql-snapshot/src/test/regress/./tmp_check/install//usr/local/pgsql/lib/plpgsql.so:
 dynamic linker: 
/home/postgres/postgresql-snapshot/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/postmaster:
 relocation error: symbol not found: assert_enabled; referenced from: 
/home/postgres/postgresql-snapshot/src/test/regress/./tmp_check/install//usr/local/pgsql/lib/plpgsql.so

gmake[2]: *** [check] Error 2
gmake[1]: *** [check] Error 2
gmake: *** [check] Error 2
UX:make: ERREUR: erreur irrémédiable.
no old pgsql library involved (this is with snapshot but same message with
beta3)

As for the first part of my message (hang in make check),
the hang occurs when compiling with --enable-thread-safety and NOT
otherwise.

While I strongly suspect a SCO pthread bug, I'm at a loss as to why it works
perfectly with beta1 and 2.

Did signal handling change between beta2 and beta3?

Regards
-- 
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)



Re: [HACKERS] beta3 on unixware 714

2004-10-09 Thread Tom Lane
[EMAIL PROTECTED] writes:
 not sure:
 createlang: language installation failed: ERROR:  could not load library
  
 /home/postgres/postgresql-snapshot/src/test/regress/./tmp_check/install//usr/local/pgsql/lib/plpgsql.so:
  dynamic linker: 
 /home/postgres/postgresql-snapshot/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/postmaster:
  relocation error: symbol not found: assert_enabled; referenced from: 
 /home/postgres/postgresql-snapshot/src/test/regress/./tmp_check/install//usr/local/pgsql/lib/plpgsql.so

Hmm.  That looks like trying to load an assert-enabled plpgsql.so into a
*not* assert-enabled backend.  You sure you built the whole thing with
asserts?

 As for the first part of my message (hang in make check),
 the hang occurs when compiling with --enable-thread-safety and NOT
 otherwise.
 While I strongly suspect a SCO pthread bug, I'm at a loss as to why it works
 perfectly with beta1 and 2.
 Did signal handling change between beta2 and beta3?

No, but Bruce has been fooling with the configure logic for threads,
IIRC, so it's quite possible that we are now supplying a different
set of compile or link switches, or a different set of libraries
requested in the link.  That's probably the first thing to look at.

regards, tom lane



Re: [HACKERS] beta3 on unixware 714

2004-10-09 Thread ohp
On Sat, 9 Oct 2004, Tom Lane wrote:

 Date: Sat, 09 Oct 2004 11:46:36 -0400
 From: Tom Lane [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: pgsql-hackers list [EMAIL PROTECTED]
 Subject: Re: [HACKERS] beta3 on unixware 714

 [EMAIL PROTECTED] writes:
  not sure:
  createlang: language installation failed: ERROR:  could not load library
   
  /home/postgres/postgresql-snapshot/src/test/regress/./tmp_check/install//usr/local/pgsql/lib/plpgsql.so:
   dynamic linker: 
  /home/postgres/postgresql-snapshot/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/postmaster:
   relocation error: symbol not found: assert_enabled; referenced from: 
  /home/postgres/postgresql-snapshot/src/test/regress/./tmp_check/install//usr/local/pgsql/lib/plpgsql.so

 Hmm.  That looks like trying to load an assert-enabled plpgsql.so into a
 *not* assert-enabled backend.  You sure you built the whole thing with
 asserts?
Positive! (make distclean, configure, make;make check several times)
But it doesn't matter now with what you said below...
I wanted this to debug my thread bug...
Thanks anyway

  As for the first part of my message (hang in make check),
  the hang occurs when compiling with --enable-thread-safety and NOT
  otherwise.
  While I strongly suspect a SCO pthread bug, I'm at a loss as to why it works
  perfectly with beta1 and 2.
  Did signal handling change between beta2 and beta3?

 No, but Bruce has been fooling with the configure logic for threads,
 IIRC, so it's quite possible that we are now supplying a different
 set of compile or link switches, or a different set of libraries
 requested in the link.  That's probably the first thing to look at.

   regards, tom lane


-- 
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)



[HACKERS] beta3 tag, bundled and available ...

2003-09-15 Thread Marc G. Fournier

Just finished bundling it, so will give a few for the mirrors to pull it
in, but it is now available for download if ppl want to confirm it's okay
...

I also removed the beta1 bundles, but have left the beta2 ones in place
...



Re: [HACKERS] beta3 packaged ...

2002-10-27 Thread Tom Lane
Marc G. Fournier [EMAIL PROTECTED] writes:
 Please check it and confirm ... I believe everything, including the docs,
 should be right on this ... if not, I'm going to repackage before
 announcing ...

Code looks good from here, can't check the docs very easily over this
dialup connection ...

regards, tom lane




Re: [HACKERS] beta3 packaged ...

2002-10-27 Thread Marc G. Fournier
On Sun, 27 Oct 2002, Tom Lane wrote:

 Marc G. Fournier [EMAIL PROTECTED] writes:
  Please check it and confirm ... I believe everything, including the docs,
  should be right on this ... if not, I'm going to repackage before
  announcing ...

 Code looks good from here, can't check the docs very easily over this
 dialup connection ...

'K, if I haven't heard anything negative by the time I'm finished the
server upgrade this evening, I'll put out an announce ...

Just need to confirm, beta3 *does* require an initdb from beta2, right?





Re: [HACKERS] beta3 packaged ...

2002-10-27 Thread Bruce Momjian
Marc G. Fournier wrote:
 On Sun, 27 Oct 2002, Tom Lane wrote:
 
  Marc G. Fournier [EMAIL PROTECTED] writes:
   Please check it and confirm ... I believe everything, including the docs,
   should be right on this ... if not, I'm going to repackage before
   announcing ...
 
  Code looks good from here, can't check the docs very easily over this
  dialup connection ...
 
 'K, if I haven't heard anything negative by the time I'm finished the
 server upgrade this evening, I'll put out an announce ...
 
 Just need to confirm, beta3 *does* require an initdb from beta2, right?

Yes.  If we didn't need an initdb, we may have just skipped beta3 and
jumped right to RC1 next Friday.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073




Re: [HACKERS] beta3 packaged ...

2002-10-27 Thread Marc G. Fournier
On Sun, 27 Oct 2002, Bruce Momjian wrote:

 Marc G. Fournier wrote:
  On Sun, 27 Oct 2002, Tom Lane wrote:
 
   Marc G. Fournier [EMAIL PROTECTED] writes:
Please check it and confirm ... I believe everything, including the docs,
should be right on this ... if not, I'm going to repackage before
announcing ...
  
   Code looks good from here, can't check the docs very easily over this
   dialup connection ...
 
  'K, if I haven't heard anything negative by the time I'm finished the
  server upgrade this evening, I'll put out an announce ...
 
  Just need to confirm, beta3 *does* require an initdb from beta2, right?

 Yes.  If we didn't need an initdb, we may have just skipped beta3 and
 jumped right to RC1 next Friday.

With the amount of changes that went in between beta2 and now, there is no
way that an RC1 would go out without a beta3 :)






[HACKERS] beta3 packaged ...

2002-10-26 Thread Marc G. Fournier

Please check it and confirm ... I believe everything, including the docs,
should be right on this ... if not, I'm going to repackage before
announcing ...






Re: [HACKERS] beta3 Solaris 7 (SPARC) port report

2001-01-29 Thread Pete Forman

Ross J. Reedstrom writes:
  Hmm, multiple processors, and lots of IPC:
  [snip]
  Since it's just you and the sysadmin: any chance you could bring
  the system up uniprocessor (I don't even know if this is _possible_
  with Sun hardware, let alone how hard) and run the regressions some
  more?  If that makes it go away, I'd say it pretty well points
  straight into the Solaris kernel.

My observations of Solaris UNIX domain socket problems were on single
processor machines.
-- 
Pete Forman -./\.- Disclaimer: This post is originated
WesternGeco   -./\.-  by myself and does not represent
[EMAIL PROTECTED] -./\.-  opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef  -./\.-  Hughes or their divisions.



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-26 Thread Pete Forman

Peter Eisentraut writes:
  Frank Joerdens writes:
  
I have experienced before that Unix sockets will cause random
connection abortions on Solaris [ . . . ]
  
   Isn't that _really_ bad? Random connection abortions when going
   over Unix sockets?? My app does _all_ the connecting over Unix
   sockets?!
  
  That's bad, for sure.  Maybe you can check for odd conditions
  surrounding the /tmp directory, like is it on NFS, permission
  problems, mount options.  Or is there something odd in the kernel
  configuration?  If I'm counting correctly this is the third
  independent report of this problem, which is scary.

I'm not sure if you counted me.  I also observed that Unix sockets
cause the parallel tests to fail in random places on Solaris.


We had a similar problem porting a product that uses a lot of IPC to
Solaris.  There were failures involving the overloading of the Unix
domain sockets.  We took the code to Sun and they were unable to
resolve the problems.  It should have been possible to tune the kernel
to provide more resources.  However it turns out that some of the
parameters that we wanted to tune were ignored in favour of hard coded
values.  In the end we rewrote our code to use Internet domain sockets
(AF_INET).



BTW, owing to a DNS error email to me has bounced over the last couple
of days.  It should be okay now if anything needs to be resent.
-- 
Pete Forman -./\.- Disclaimer: This post is originated
WesternGeco   -./\.-  by myself and does not represent
[EMAIL PROTECTED] -./\.-  opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef  -./\.-  Hughes or their divisions.



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report

2001-01-26 Thread Frank Joerdens

On Fri, Jan 26, 2001 at 03:29:59PM +, Patrick Welche wrote:
 On Thu, Jan 25, 2001 at 10:13:29PM -0500, Tom Lane wrote:
  Frank Joerdens [EMAIL PROTECTED] writes:
   I just did that and ran make check 4 times. 3 times went completely
   smoothly, once I had random fail. This is the same behaviour that I saw
   when running make installcheck (76 successful most of the time,
   sometimes you get 75 out of 76 with random being the one that fails).
  
  Er, you do realize that the random test is *supposed* to fail every so
  often?

I do. I just included the info for completeness' sake.

  What troubles me is the nonrepeatable failures you saw on other tests.
  As Peter says, if "make installcheck" (serial tests) is perfectly solid
  and "make check" (parallel tests) is not, that suggests some kind of
  interprocess locking problem.  But we haven't heard about any such issue
  on Solaris.
 
 Or simply running out of processes - check maxproc? (Deleted beginning of
 this thread, so may have missed something)

There is no load at all on this server at the moment. The sysadmin and
myself are currently the only people accessing a brand new UltraSPARC with 3
CPUs and 3/4 GB of RAM to install stuff.

Whatever the reason for it, Peter's suggestion at least seems to
mitigate the issue with the regression tests. I've set DEFAULT_PGSOCKET_DIR
in src/include/config.h.in to /usr/db/pgsql/tmp (/usr/db/pgsql is the
postgres user's home dir and the install dir for Postgres). Running make
check after that gives:

1: none failed
2: random   ... failed (ignored)
3: Oh. What's the expression (in German you'd say 'Zu frueh gefreut.')
here. Now I get:

 select_distinct_on   ... FAILED
 select_implicit  ... FAILED
 random   ... failed (ignored)
 portals  ... FAILED
test misc ... FAILED

Typing 

$ ps -a 

I can see that 2 postgres processes are still active . . . ?? And
/usr/db/pgsql/tmp does not contain any lock file??? I killed those 2 and
ran make check again:

4: none failed
5: random   ... failed (ignored)
6: none failed
7: random   ... failed (ignored)
8: none failed
9: none failed
9: comments ... FAILED

Hm. Bizarre. The issue isn't solved but it definitely looks better than
before (also, the sysadmin just told me that /tmp is cleaned out
nightly anyway by cron). I'm gonna test it over TCP/IP sockets again,
and if that works, stick with those:

When setting unix_sockets=no; for any platform in
src/test/regress/pg_regress.sh, 7 consecutive tests showed no errors.
I'll just connect to the server over TCP/IP.
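
A minimal sketch of sticking with TCP/IP on the client side, assuming the postmaster accepts TCP/IP connections (in this era that means starting it with -i) and listens on port 5432. PGHOST and PGPORT are standard libpq environment variables; the helper name use_tcp_connection and the database name in the comment are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: point libpq-based clients (psql, pg_dump, ...) at a
# TCP/IP address instead of the Unix domain socket.
# PGHOST and PGPORT are standard libpq environment variables; the default
# values below are assumptions.
use_tcp_connection() {
    PGHOST="${1:-localhost}"
    PGPORT="${2:-5432}"
    export PGHOST PGPORT
}

use_tcp_connection localhost 5432
echo "clients will connect to $PGHOST:$PGPORT"
# psql mydb    # would now go over TCP/IP rather than the Unix socket
```

Setting the variables once in the postgres user's profile keeps every client on the same transport without touching application code.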

Regards, Frank



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report

2001-01-26 Thread Tom Lane

Frank Joerdens [EMAIL PROTECTED] writes:
 Now I get:

  select_distinct_on   ... FAILED
  select_implicit  ... FAILED
  random   ... failed (ignored)
  portals  ... FAILED
 test misc ... FAILED

Reporting a regression failure this way is pretty unhelpful.  What are
the actual diffs (regression.diffs)?  What shows up in the postmaster
log (logs/postmaster.log)?
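
One way to keep that evidence around between runs, as a rough sketch: copy the two files aside after every run so the next run cannot overwrite them. The file names are the ones mentioned above; the helper name save_regress_artifacts, the archive directory, and the directory layout are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: copy the artifacts of one regression run aside so a
# later run cannot overwrite them. File names follow the ones named in this
# thread; the directory layout is an assumption.
save_regress_artifacts() {
    regress_dir=$1    # e.g. src/test/regress
    keep_dir=$2       # where to archive the files
    run_no=$3         # run counter used as a file-name suffix
    mkdir -p "$keep_dir" || return 1
    for f in regression.diffs logs/postmaster.log; do
        if [ -f "$regress_dir/$f" ]; then
            cp "$regress_dir/$f" "$keep_dir/$(basename "$f").run$run_no"
        fi
    done
}

# Example loop (commented out because it runs the real test suite):
# n=1
# while [ "$n" -le 10 ]; do
#     make check
#     save_regress_artifacts . "$HOME/regress-keep" "$n"
#     n=$((n + 1))
# done
```

With intermittent failures, archiving every run matters more than analysing any single one, because the failing run may not come back on demand.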

regards, tom lane



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report

2001-01-26 Thread Ross J. Reedstrom

On Fri, Jan 26, 2001 at 05:03:13PM +0100, Frank Joerdens wrote:
 
 There is no load at all on this server at the moment. The sysadmin and
 myself are currently the only people accessing a brand new UltraSPARC with 3
 CPUs and 3/4 GB of RAM to install stuff.

Hmm, multiple processors, and lots of IPC: I've got a bad feeling
about this.  Nothing solid (don't do a lot with Solaris), but there are
a _lot_ of gotchas in getting that combo right, many of which _kill_
performance for the normal case to get correct behavior in an edge
case. I could imagine Sun missing one or two, and not catching it (or
actively ignoring it, to get better CPU utilization)

Since it seems to hit only when using Unix domain sockets, I'd take a
wild guess that explicit use of shared memory and Unix domain sockets
are stepping on each other in a multiprocessor environment. Invoking
Inet sockets gets more of the networking code in play, which is usually
more heavily tested in such an environment.

Since it's just you and the sysadmin: any chance you could bring the
system up uniprocessor (I don't even know if this is _possible_ with
Sun hardware, let alone how hard) and run the regressions some more?
If that makes it go away, I'd say it pretty well points straight into
the Solaris kernel.

Ross
-- 
Ross J. Reedstrom, Ph.D., [EMAIL PROTECTED] 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report

2001-01-26 Thread Frank Joerdens

On Fri, Jan 26, 2001 at 11:15:45AM -0500, Tom Lane wrote:
 Frank Joerdens [EMAIL PROTECTED] writes:
  Now I get:
 
   select_distinct_on   ... FAILED
   select_implicit  ... FAILED
   random   ... failed (ignored)
   portals  ... FAILED
  test misc ... FAILED
 
 Reporting a regression failure this way is pretty unhelpful.  

Sorry. My thinking was that the bottom line here is the very
non-reproducibility of particular results. No two regression test
failures were identical among the couple dozen or so I conducted, and
hence it wouldn't make all that much sense to analyze any single test
all by itself.

As I wrote earlier, I have neither physical nor root access to
this box. Moreover, the sysadmin tells me that he didn't install the OS
himself, a friend of his did, because he himself was on holiday. There
may well be something very fishy about the OS's configuration, but I
wouldn't have the first notion as to where to start looking. It
_appears_ that setting DEFAULT_PGSOCKET_DIR somewhere else besides /tmp
has some positive effect, but that ain't conclusive.
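
For anyone wanting to repeat the experiment, a sketch of the edit, assuming the setting appears in the header as a line of the form '#define DEFAULT_PGSOCKET_DIR "..."' (the exact form in config.h.in is an assumption, and set_socket_dir is a made-up helper name):

```shell
#!/bin/sh
# Hypothetical helper: rewrite a '#define DEFAULT_PGSOCKET_DIR "..."' line
# in a header file. The exact form of that line in config.h.in is an
# assumption; check the file before relying on this.
# The new directory must not contain '|' or '&', which sed would interpret.
set_socket_dir() {
    header=$1
    new_dir=$2
    tmp="$header.tmp.$$"
    sed "s|^#define DEFAULT_PGSOCKET_DIR .*|#define DEFAULT_PGSOCKET_DIR \"$new_dir\"|" \
        "$header" > "$tmp" && mv "$tmp" "$header"
}

# Example (paths taken from this thread):
# set_socket_dir src/include/config.h.in /usr/db/pgsql/tmp
```

The change only takes effect after re-running configure and rebuilding, since both server and clients compile the default in.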

 What are
 the actual diffs (regression.diffs)? What shows up in the postmaster
 log (logs/postmaster.log)?

Those results were overwritten by the last 10 tests that didn't show any
errors, so I can't retrieve them, now.

Regards, Frank



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report

2001-01-26 Thread Peter Eisentraut

Ross J. Reedstrom writes:

 Hmm, multiple processors, and lots of IPC: I've got a bad feeling
 about this.

Although I'm not absolutely certain, the systems on which I had this
problem were not multi-processor, they were just plain-old workstations in
a university computer lab.  At the time (7.0 beta) I had attributed this
problem to the possibly suspicious nature of the /tmp partition, since Marc
didn't have any such problems with his Solaris boxes.

After reading Pete Forman's anecdote I looked around some more and found
this:

http://www.cise.ufl.edu/depot/doc/postfix/HISTORY

19990321

Workaround: from now on, Postfix on Solaris uses stream
pipes instead of UNIX-domain sockets. Despite workarounds,
the latter were causing more trouble than anything else on
all systems combined.


There are also some reports that indicate problems in this direction at
http://www.landfield.com/faqs/usenet/software/inn-faq/part2/.


Conclusion: Don't use it.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/




Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-26 Thread Patrick Welche

On Thu, Jan 25, 2001 at 10:13:29PM -0500, Tom Lane wrote:
 Frank Joerdens [EMAIL PROTECTED] writes:
  I just did that and ran make check 4 times. 3 times went completely
  smoothly, once I had random fail. This is the same behaviour that I saw
  when running make installcheck (76 successful most of the time,
  sometimes you get 75 out of 76 with random being the one that fails).
 
 Er, you do realize that the random test is *supposed* to fail every so
 often?  (Else it'd not be random...)  See the pages on interpreting
 regression test results in the admin guide.
 
 What troubles me is the nonrepeatable failures you saw on other tests.
 As Peter says, if "make installcheck" (serial tests) is perfectly solid
 and "make check" (parallel tests) is not, that suggests some kind of
 interprocess locking problem.  But we haven't heard about any such issue
 on Solaris.

Or simply running out of processes - check maxproc? (Deleted beginning of
this thread, so may have missed something)

Cheers,

Patrick



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Frank Joerdens

On Thu, Jan 25, 2001 at 12:42:45AM +0100, Peter Eisentraut wrote:
 Frank Joerdens writes:
 
 [randomly varying set of regression tests fail]
 
  Running the tests on my Linux box gives no failed tests. Must I assume
  that those failed tests indicate some issue that is detrimental to
  the proper functioning of the server on this Solaris installation? Do
  you want the regression.diffs?
 
 Could you go into src/test/regress/pg_regress.sh and edit around line 162
 
 #case $host_platform in
 #*-*-qnx* | *beos*)
 unix_sockets=no;;
 #*)
 #unix_sockets=yes;;
 #esac
 
 (i.e., ensure that unix_sockets is set to 'no'), and rerun 'make check'.

I just did that and ran make check 4 times. 3 times went completely
smoothly, once I had random fail. This is the same behaviour that I saw
when running make installcheck (76 successful most of the time,
sometimes you get 75 out of 76 with random being the one that fails).
 
 I have experienced before that Unix sockets will cause random connection
 abortions on Solaris [ . . . ]

Isn't that _really_ bad? Random connection abortions when going over
Unix sockets?? My app does _all_ the connecting over Unix sockets?!

  I also tried using the Sun compiler, which didn't work at all.
 
 details on "didn't work" requested...

-- begin details --
$ export CC=CC
$ echo $CC
CC
$ ./configure
creating cache ./config.cache
checking host system type... sparc-sun-solaris2.7
checking which template to use... solaris
checking whether to build with locale support... no
checking whether to build with recode support... no
checking whether to build with multibyte character support... no
checking whether to build with Unicode conversion support... no
checking for default port number... 5432
checking for default soft limit on number of connections... 32
checking for gcc... CC
checking whether the C compiler (CC  ) works... yes
checking whether the C compiler (CC  ) is a cross-compiler... no
checking whether we are using GNU C... no
checking whether CC accepts -g... yes
using CFLAGS=-v
checking whether the C compiler (CC -Xa -v ) works... no
configure: error: installation or configuration problem: C compiler cannot create executables.
-- end details --

Cheers, Frank



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread bpalmer

Worked fine for me...

% uname -a

SunOS lancelot 5.7 Generic_106541-14 sun4m sparc SUNW,SPARCstation-4

% ls -l

-rw-r--r--   1 bpalmer  staff 32860160 Jan 23 16:45 postgresql-snapshot.tar

...
...
...
 transactions ... ok
 random   ... failed (ignored)
 portals  ... ok
...
...
...

==
 75 of 76 tests passed, 1 failed test(s) ignored.
==



On Thu, 25 Jan 2001, Peter Eisentraut wrote:

 Frank Joerdens writes:

 [randomly varying set of regression tests fail]

  Running the tests on my Linux box gives no failed tests. Must I assume
  that those failed tests indicate some issue that is detrimental to
  the proper functioning of the server on this Solaris installation? Do
  you want the regression.diffs?

 Could you go into src/test/regress/pg_regress.sh and edit around line 162

 #case $host_platform in
 #*-*-qnx* | *beos*)
 unix_sockets=no;;
 #*)
 #unix_sockets=yes;;
 #esac

 (i.e., ensure that unix_sockets is set to 'no'), and rerun 'make check'.

 I have experienced before that Unix sockets will cause random connection
 abortions on Solaris, which will cause the regression tests to fail
 arbitrarily.

  I also tried using the Sun compiler, which didn't work at all.

 details on "didn't work" requested...

  now I get scary stuff like:
 
  --- begin scary stuff ---
  test int2 ... ERROR:  pg_atoi: error in "34.5": can't
  parse ".5"
  ERROR:  pg_atoi: error reading "10": Result too large
  ERROR:  pg_atoi: error in "asdf": can't parse "asdf"

 This is normal.  The regression tests sometimes involve intentional
 invalid input.

 --
 Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/





b. palmer,  [EMAIL PROTECTED]
pgp:  www.crimelabs.net/bpalmer.pgp5





Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Frank Joerdens

On Thu, Jan 25, 2001 at 05:12:02PM +0100, Peter Eisentraut wrote:
 Frank Joerdens writes:
 
   I have experienced before that Unix sockets will cause random connection
   abortions on Solaris [ . . . ]
 
  Isn't that _really_ bad? Random connection abortions when going over
  Unix sockets?? My app does _all_ the connecting over Unix sockets?!
 
 That's bad, for sure.  Maybe you can check for odd conditions surrounding
 the /tmp directory, like is it on NFS, permission problems, mount options.

I have neither root nor physical access to this machine, hence my
options are kinda limited. However, the sysadmin told me that most of
the storage space on this box is mounted over a fibre channel (I only
have a very hazy notion of what exactly that might be) from a "storage
server" which is allegedly as fast as a local SCSI disk.

 Or is there something odd in the kernel configuration?  If I'm counting
 correctly this is the third independent report of this problem, which is
 scary.

I'll question the sysadmin about that. But why does make installcheck
work? Because it goes over TCP/IP sockets by default?

 
I also tried using the Sun compiler, which didn't work at all.
  
   details on "didn't work" requested...
 
  -- begin details --
  $ export CC=CC
 
 Using a C++ compiler to compile C code won't work.  You probably meant
 CC=cc and CXX=CC.

When I do that, make fails with the following error (after giving lots
of warnings):

"pg_dump.c", line 1063: warning: Function has no return statement : main
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o common.o common.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_archiver.o pg_backup_archiver.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_db.o pg_backup_db.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_custom.o pg_backup_custom.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_files.o pg_backup_files.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_null.o pg_backup_null.c
"pg_backup_null.c", line 90: controlling expressions must have scalar type
cc: acomp failed for pg_backup_null.c
make[3]: *** [pg_backup_null.o] Error 2
make[3]: Leaving directory `/usr/users/fjoerde/postgres/postgresql-7.1beta3_test/src/bin/pg_dump'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/usr/users/fjoerde/postgres/postgresql-7.1beta3_test/src/bin'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/users/fjoerde/postgres/postgresql-7.1beta3_test/src'
make: *** [all] Error 2

Regards, Frank



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Frank Joerdens

On Thu, Jan 25, 2001 at 05:12:02PM +0100, Peter Eisentraut wrote:
 Frank Joerdens writes:
 
   I have experienced before that Unix sockets will cause random connection
   abortions on Solaris [ . . . ]
 
  Isn't that _really_ bad? Random connection abortions when going over
  Unix sockets?? My app does _all_ the connecting over Unix sockets?!
 
 That's bad, for sure.  Maybe you can check for odd conditions surrounding
 the /tmp directory, like is it on NFS, permission problems, mount options.

I just typed

$ mount

and I get

/tmp on swap read/write/setuid on Mon Jan 22 16:39:32 2001

for the /tmp directory, which looks distinctly odd to me. What kind of
device is swap (I know what swap is normally but I didn't know you could
mount stuff there . . . )??

Regards, Frank



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Peter Eisentraut

Frank Joerdens writes:

  That's bad, for sure.  Maybe you can check for odd conditions surrounding
  the /tmp directory, like is it on NFS, permission problems, mount options.

 I have neither root nor physical access to this machine, hence my
 options are kinda limited.

Entering 'mount' should tell you.
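
A small sketch of that check, assuming Solaris-style 'mount' output of the form "<mount point> on <device> <options> on <date>", as in the line Frank posted. The helper name mount_source is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read Solaris-style 'mount' output on stdin
# ("<mount point> on <device> <options> on <date>") and print the device
# backing the given mount point.
mount_source() {
    awk -v mp="$1" '$1 == mp && $2 == "on" { print $3; exit }'
}

# Sample line taken from this thread; prints: swap
echo "/tmp on swap read/write/setuid on Mon Jan 22 16:39:32 2001" |
    mount_source /tmp
# On a live system: mount | mount_source /tmp
```

A result of "swap" indicates tmpfs; a host:/path value would point at NFS.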

 I'll question the sysadmin about that. But why does make installcheck
 work? Because it goes over TCP/IP sockets by default?

No.  Presumably because it does not run more than one test in parallel.

 "pg_backup_null.c", line 90: controlling expressions must have scalar type
 cc: acomp failed for pg_backup_null.c

Line 90 has a comment in my copy.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/




Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Frank Joerdens

On Thu, Jan 25, 2001 at 12:04:40PM -0800, Ian Lance Taylor wrote:
[ . . . ]
  for the /tmp directory, which looks distinctly odd to me. What kind of
  device is swap (I know what swap is normally but I didn't know you could
  mount stuff there . . . )??
 
 That is a tmpfs file system which uses swap space for /tmp storage.
 Both swap usage and /tmp compete for the same partition on the disk.
 If you have a lot of swapping programs, you don't get to put much in
 /tmp.  If you have a lot of files in /tmp, you don't get to run many
 programs.
 
 As far as I can recall, this is a Sun specific thing.
 
 It's a reasonable idea on a stable system.  It's a pretty crummy idea
 on a development system, or one with unpredictable loads.  My
 experience is that either something goes crazy and fills up /tmp and
 then you can't run anything else and you have to reboot, or something
 goes crazy and fills up swap and then you can't write any /tmp files
 and daemon processes start to silently die and you have to reboot.

Very peculiar, or crummy, indeed. This system is not used by anyone
else besides myself at the moment (cuz it's just being built up), as far
as I can tell, and is ludicrously overpowered (3 CPUs, 768 MB RAM) for
the mundane uses I am subjecting it to (installing and testing
Postgresql).

Regards, Frank 



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report

2001-01-25 Thread Nathan Myers

On Thu, Jan 25, 2001 at 09:47:16PM +0100, Frank Joerdens wrote:
 On Thu, Jan 25, 2001 at 12:04:40PM -0800, Ian Lance Taylor wrote:
 [ . . . ]
   for the /tmp directory, which looks distinctly odd to me. What kind of
   device is swap (I know what swap is normally but I didn't know you could
   mount stuff there . . . )??
  
  That is a tmpfs file system which uses swap space for /tmp storage.
  Both swap usage and /tmp compete for the same partition on the disk.
  If you have a lot of swapping programs, you don't get to put much in
  /tmp.  If you have a lot of files in /tmp, you don't get to run many
  programs.
  
  As far as I can recall, this is a Sun specific thing.
  
  It's a reasonable idea on a stable system.  It's a pretty crummy idea
  on a development system, or one with unpredictable loads.  My
  experience is that either something goes crazy and fills up /tmp and
  then you can't run anything else and you have to reboot, or something
  goes crazy and fills up swap and then you can't write any /tmp files
  and daemon processes start to silently die and you have to reboot.
 
 Very peculiar, or crummy, indeed. This system is not used by anyone
 else besides myself at the moment (cuz it's just being built up), as far
 as I can tell, and is ludicrously overpowered (3 CPUs, 768 MB RAM) for
 the mundane uses I am subjecting it to (installing and testing
 Postgresql).

I doubt you can blame any problems on tmpfs, here.  tmpfs has been 
in Solaris for many years, and has had plenty of time to stabilize.
With 768M of RAM and running only PG you are not using any swap space at 
all, and unix sockets don't use any appreciable space either, so the 
conflicts Ian describes are impossible in your case.  

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Tom Lane

Frank Joerdens [EMAIL PROTECTED] writes:
 I just did that and ran make check 4 times. 3 times went completely
 smoothly, once I had random fail. This is the same behaviour that I saw
 when running make installcheck (76 successful most of the time,
 sometimes you get 75 out of 76 with random being the one that fails).

Er, you do realize that the random test is *supposed* to fail every so
often?  (Else it'd not be random...)  See the pages on interpreting
regression test results in the admin guide.

What troubles me is the nonrepeatable failures you saw on other tests.
As Peter says, if "make installcheck" (serial tests) is perfectly solid
and "make check" (parallel tests) is not, that suggests some kind of
interprocess locking problem.  But we haven't heard about any such issue
on Solaris.

regards, tom lane



[HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-24 Thread Frank Joerdens

On Tue, Jan 23, 2001 at 11:57:52AM -0500, Tom Lane wrote:
[ . . . ]
 After you build PG and test it, send us a port report, and we'll add
 Solaris 7 to the list of recently tested platforms.  That's how it
 works ...

The installation by simply running configure, make, make install went
completely smoothly, no hassle whatsoever (except for the
flex-is-not-present warning which I think you can ignore)! 

The system is, to be precise:

$ uname -a 

SunOS [hostname] 5.7 Generic_106541-12 sun4u sparc SUNW,Ultra-4

I did encounter some _weird_ stuff with the regression tests. Does that
not work via make check (the 'standalone' variety) when you've already
typed make install (on Linux it does!)?? Make installcheck seems to
produce non-failures semi-reliably (why does the random test not fail on
the 1st try, but on the 2nd, and then again not on the 3rd???). Below
are the dirty details.

As to what is mentioned in the Admin Guide about Solaris' default
settings for shared memory being too low, at least on the machine I am
testing on it is set to 4 GB!

$ cat /etc/system |grep shm
*   exclude: sys/shmsys
set shmsys:shminfo_shmmax = 4294967295
set shmsys:shminfo_shmmin = 1
set shmsys:shminfo_shmmni = 100
set shmsys:shminfo_shmseg = 10
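
A sketch for reading that limit in friendlier units, assuming /etc/system uses the "set shmsys:shminfo_shmmax = N" form shown above. The helper name shmmax_mb is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read /etc/system-style text on stdin and print the
# shminfo_shmmax setting in whole megabytes.
shmmax_mb() {
    awk -F= '/^set shmsys:shminfo_shmmax/ {
        gsub(/[ \t]/, "", $2)              # strip blanks around the number
        printf "%d\n", $2 / (1024 * 1024)  # bytes -> MB, truncated
        exit
    }'
}

# Value from the listing above; prints 4095 (one byte short of 4096 MB)
echo "set shmsys:shminfo_shmmax = 4294967295" | shmmax_mb
# On a live system: shmmax_mb < /etc/system
```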


Cheers, Frank

-- begin dirty details --
I can start, connect, create databases etc.. However, running the
regression tests gives 4 failed out of 76:

 reltime  ... FAILED
 tinterval... FAILED
test horology ... FAILED
test misc ... FAILED

I checked the timezone issue mentioned in the src/test/regress/README
file. The command

$ env TZ=PST8PDT date

returns 'Wed Jan 24 11:19:02 PST 2001', 9 hrs back, which is the time
difference between here and California, so I guess that is OK.

Running the tests on my Linux box gives no failed tests. Must I assume
that those failed tests indicate some issue that is detrimental to
the proper functioning of the server on this Solaris installation? Do
you want the regression.diffs?

I also tried using the Sun compiler, which didn't work at all. 

 . . . [ goes away to do more testing ] . . .

What's really weird, I just ran ./configure, make, make install, make
check again, again with 4 failed, but different ones! 


 tinterval... FAILED
 inet ... FAILED
 comments ... FAILED
test misc ... FAILED


2 things were different: a) I set the compiler explicitly to
/usr/local/bin/gcc via the CC environment variable and b) I used the
default prefix this time. I'll try again with the old settings. 

 . . . [ goes away to do more testing ] . . .

make distclean
./configure --prefix=/usr/db/pgsql
make
make check

produces 6 out of 76 this time! They are:

 date ... FAILED
 type_sanity  ... FAILED
 opr_sanity   ... FAILED
 arrays   ... FAILED
 btree_index  ... FAILED
test misc ... FAILED

It looks progressively worse. I'll remove the source tree and start from scratch.

 . . . [ goes away to do more testing ] . . .

6 out of 76 again, but different ones . . .

 interval ... FAILED
 abstime  ... FAILED
 comments ... FAILED
 oidjoins ... FAILED
test horology ... FAILED
test misc ... FAILED

 . . . [ goes away to do more testing ] . . .

This time with the already installed database after initdb:

$ make installcheck

now I get scary stuff like:

--- begin scary stuff ---
test int2 ... ERROR:  pg_atoi: error in "34.5": can't
parse ".5"
ERROR:  pg_atoi: error reading "10": Result too large
ERROR:  pg_atoi: error in "asdf": can't parse "asdf"
ok
test int4 ... ERROR:  pg_atoi: error in "34.5": can't
parse ".5"
ERROR:  pg_atoi: error reading "1": Result too large
ERROR:  pg_atoi: error in "asdf": can't parse "asdf"
ok
test int8 ... ok
test oid  ... ERROR:  oidin: error in "asdfasd": can't
parse "asdfasd"
ERROR:  oidin: error in "99asdfasd": can't parse "asdfasd"
ok
test float4   ... ERROR:  Bad float4 input format --
overflow
--- end scary stuff ---

However, it works! All 76 tests pass.

 . . . [ goes away to do more testing ] . . .

running make installcheck again gives:

test random   ... failed (ignored)

 . . . [ goes away to do more testing ] . . .

All 76 tests pass.
-- end dirty details --



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-24 Thread Peter Eisentraut

Frank Joerdens writes:

[randomly varying set of regression tests fail]

 Running the tests on my Linux box gives no failed tests. Must I assume
 that those failed tests indicate some issue that is detrimental to
 the proper functioning of the server on this Solaris installation? Do
 you want the regression.diffs?

Could you go into src/test/regress/pg_regress.sh and edit around line 162

#case $host_platform in
#*-*-qnx* | *beos*)
unix_sockets=no;;
#*)
#unix_sockets=yes;;
#esac

(i.e., ensure that unix_sockets is set to 'no'), and rerun 'make check'.
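
As an alternative to hard-coding the value, the switch could in principle be extended with a Solaris pattern. This is only a sketch: the qnx and beos patterns come from the snippet above, while the *-*-solaris* pattern and the function name unix_sockets_for are assumptions and are not taken from pg_regress.sh.

```shell
#!/bin/sh
# Sketch of the platform switch, wrapped in a function.
# The qnx/beos patterns come from the snippet in this message; the
# *-*-solaris* pattern is an assumption added for illustration.
unix_sockets_for() {
    case $1 in
        *-*-qnx* | *beos* | *-*-solaris*)
            echo no ;;
        *)
            echo yes ;;
    esac
}

unix_sockets_for sparc-sun-solaris2.7    # prints: no
unix_sockets_for i686-pc-linux-gnu       # prints: yes
```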

I have experienced before that Unix sockets will cause random connection
abortions on Solaris, which will cause the regression tests to fail
arbitrarily.

 I also tried using the Sun compiler, which didn't work at all.

details on "didn't work" requested...

 now I get scary stuff like:

 --- begin scary stuff ---
 test int2 ... ERROR:  pg_atoi: error in "34.5": can't
 parse ".5"
 ERROR:  pg_atoi: error reading "10": Result too large
 ERROR:  pg_atoi: error in "asdf": can't parse "asdf"

This is normal.  The regression tests sometimes involve intentional
invalid input.

-- 
Peter Eisentraut  [EMAIL PROTECTED]   http://yi.org/peter-e/




[HACKERS] beta3 vacuum crash

2001-01-23 Thread Frank Joerdens

I haven't tried everything to recover from this yet, but will quickly try to document
the crash before I lose track of what exactly went into it and what I did: Basically I
deleted a table and then ran vacuum verbose, with the net result that I cannot connect
to this database anymore with the error:

frank@kelis:/usr/local/httpd/htdocs  psql mpi
psql: FATAL 1:  Index 'pg_trigger_tgrelid_index' does not exist

This is, fortunately, not the production system but my development machine. I was
going to go live with this in a couple of weeks' time on beta3. Should I reconsider and
move back to 7.03 (I'd hate to cuz I'll have rows bigger than 32K, potentially . . . )?

The vacuum went like this:

--- begin vacuum ---
mpi=# drop table wimis;
DROP
mpi=# vacuum verbose;
NOTICE:  --Relation pg_type--
NOTICE:  Pages 3: Changed 2, reaped 2, Empty 0, New 0; Tup 159: Vac 16, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 106, MaxLen 109; Re-using: Free/Avail. Space 6296/156; EndEmpty/Avail. Pages 0/1. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u sec.
NOTICE:  Rel pg_type: Pages: 3 --> 3; Tuple(s) moved: 1. CPU 0.01s/0.00u sec.
NOTICE:  Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE:  --Relation pg_attribute--
NOTICE:  Pages 16: Changed 9, reaped 8, Empty 0, New 0; Tup 1021: Vac 160, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 98, MaxLen 98; Re-using: Free/Avail. Space 16480/16480; EndEmpty/Avail. Pages 0/8. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 160. CPU 0.00s/0.01u sec.
NOTICE:  Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 160. CPU 0.00s/0.00u sec.
NOTICE:  Rel pg_attribute: Pages: 16 --> 14; Tuple(s) moved: 43. CPU 0.01s/0.01u sec.
NOTICE:  Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 43. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 43. CPU 0.00s/0.00u sec.
NOTICE:  --Relation pg_class--
NOTICE:  Pages 7: Changed 1, reaped 7, Empty 0, New 0; Tup 136: Vac 257, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 115, MaxLen 160; Re-using: Free/Avail. Space 38880/31944; EndEmpty/Avail. Pages 0/6. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_class_oid_index: Pages 2; Tuples 136: Deleted 257. CPU 0.00s/0.01u sec.
NOTICE:  Index pg_class_relname_index: Pages 6; Tuples 136: Deleted 257. CPU 0.00s/0.00u sec.
NOTICE:  Rel pg_class: Pages: 7 --> 3; Tuple(s) moved: 76. CPU 0.01s/0.01u sec.
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!# \q
--- end vacuum ---

The log says (I'm running the backend with -d 2):

--- begin log ---
DEBUG:  query: vacuum verbose;
DEBUG:  ProcessUtility: vacuum verbose;
NOTICE:  --Relation pg_type--
NOTICE:  Pages 3: Changed 2, reaped 2, Empty 0, New 0; Tup 159: Vac 16, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 106, MaxLen 109; Re-using: Free/Avail. Space 6296/156; EndEmpty/Avail. Pages 0/1. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u sec.
NOTICE:  Rel pg_type: Pages: 3 --> 3; Tuple(s) moved: 1. CPU 0.01s/0.00u sec.
NOTICE:  Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE:  --Relation pg_attribute--
NOTICE:  Pages 16: Changed 9, reaped 8, Empty 0, New 0; Tup 1021: Vac 160, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 98, MaxLen 98; Re-using: Free/Avail. Space 16480/16480; EndEmpty/Avail. Pages 0/8. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 160. CPU 0.00s/0.01u sec.
NOTICE:  Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 160. CPU 0.00s/0.00u sec.
NOTICE:  Rel pg_attribute: Pages: 16 --> 14; Tuple(s) moved: 43. CPU 0.01s/0.01u sec.
NOTICE:  Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 43. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 43. CPU 0.00s/0.00u sec.
NOTICE:  --Relation pg_class--
NOTICE:  Pages 7: Changed 1, reaped 7, Empty 0, New 0; Tup 136: Vac 257, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 115, MaxLen 160; Re-using: Free/Avail. Space 38880/31944; EndEmpty/Avail. Pages 0/6. CPU 0.00s/0.00u sec.
NOTICE:  Index 

Re: [HACKERS] beta3 vacuum crash

2001-01-21 Thread Frank Joerdens

On Sat, Jan 20, 2001 at 05:35:41PM -0500, Tom Lane wrote:
 Frank Joerdens [EMAIL PROTECTED] writes:
  I haven't tried everything to recover from this yet, but will quickly
  try to document the crash before I lose track of what exactly went
  into it and what I did: Basically I deleted a table and then ran
  vacuum verbose, with the net result that I cannot connect to this
  database anymore with the error:
 
 Any chance of a backtrace from the core dump file?  The log only tells
 us that the vacuum process crashed, which isn't much to go on ...

Silly me, this was in fact a December 13 snapshot, not beta3. I have 3
machines with an identical setup, two of which I had already moved to
beta3. On those, the error does not occur. On the December 13 version, I
can reproduce the error, so it has evidently been fixed since.

Sorry about the confusion.

- Frank



[HACKERS] beta3 vacuum crash

2001-01-20 Thread Frank Joerdens

I haven't tried everything to recover from this yet, but will quickly try to document the
crash before I lose track of what exactly went into it and what I did: basically I deleted
a table and then ran vacuum verbose, with the net result that I can no longer connect to
this database; psql fails with the error:

frank@kelis:/usr/local/httpd/htdocs  psql mpi
psql: FATAL 1:  Index 'pg_trigger_tgrelid_index' does not exist

This is, fortunately, not the production system but my development machine. I was going to
go live with this in a couple of weeks' time on beta3. Should I reconsider and move back
to 7.0.3 (I'd hate to, because I'll potentially have rows bigger than 32K ...)?

The vacuum went like this:

--- begin vacuum ---
mpi=# drop table wimis;
DROP
mpi=# vacuum verbose;
NOTICE:  --Relation pg_type--
NOTICE:  Pages 3: Changed 2, reaped 2, Empty 0, New 0; Tup 159: Vac 16, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 106, MaxLen 109; Re-using: Free/Avail. Space 6296/156;
EndEmpty/Avail. Pages 0/1. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u
sec.
NOTICE:  Rel pg_type: Pages: 3 -- 3; Tuple(s) moved: 1. CPU 0.01s/0.00u sec.
NOTICE:  Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE:  --Relation pg_attribute--
NOTICE:  Pages 16: Changed 9, reaped 8, Empty 0, New 0; Tup 1021: Vac 160, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 98, MaxLen 98; Re-using: Free/Avail. Space 16480/16480;
EndEmpty/Avail. Pages 0/8. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 160. CPU
0.00s/0.01u sec.
NOTICE:  Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 160. CPU
0.00s/0.00u sec.
NOTICE:  Rel pg_attribute: Pages: 16 -- 14; Tuple(s) moved: 43. CPU 0.01s/0.01u sec.
NOTICE:  Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 43. CPU
0.00s/0.00u sec.
NOTICE:  Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 43. CPU
0.00s/0.00u sec.
NOTICE:  --Relation pg_class--
NOTICE:  Pages 7: Changed 1, reaped 7, Empty 0, New 0; Tup 136: Vac 257, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 115, MaxLen 160; Re-using: Free/Avail. Space 38880/31944;
EndEmpty/Avail. Pages 0/6. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_class_oid_index: Pages 2; Tuples 136: Deleted 257. CPU 0.00s/0.01u sec.
NOTICE:  Index pg_class_relname_index: Pages 6; Tuples 136: Deleted 257. CPU 0.00s/0.00u sec.
NOTICE:  Rel pg_class: Pages: 7 -- 3; Tuple(s) moved: 76. CPU 0.01s/0.01u sec.
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!# \q
--- end vacuum ---
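
For readers comparing runs like the one above: each "Rel ...: Pages: N -- M" line gives a
table's page count before and after compaction, plus the number of tuples moved. Below is a
minimal Python sketch for tallying those lines from saved psql output. It assumes the line
format exactly as it appears in this transcript. The name summarize_vacuum and the sample
text are hypothetical, and accepting "-->" as an alternative separator is only a guess that
a ">" may have been lost in archiving.

```python
import re

# Matches lines such as:
#   NOTICE:  Rel pg_class: Pages: 7 -- 3; Tuple(s) moved: 76. CPU 0.01s/0.01u sec.
# Assumption: the separator is "--" as shown in this transcript; "-->" is also accepted.
REL_LINE = re.compile(
    r"NOTICE:\s+Rel (?P<rel>\w+): Pages: (?P<before>\d+) --+>? (?P<after>\d+); "
    r"Tuple\(s\) moved: (?P<moved>\d+)\."
)


def summarize_vacuum(text):
    """Return {relation: (pages_before, pages_after, tuples_moved)}."""
    summary = {}
    for match in REL_LINE.finditer(text):
        summary[match["rel"]] = (
            int(match["before"]),
            int(match["after"]),
            int(match["moved"]),
        )
    return summary


if __name__ == "__main__":
    # Sample lines copied from the transcript above.
    sample = """\
NOTICE:  Rel pg_type: Pages: 3 -- 3; Tuple(s) moved: 1. CPU 0.01s/0.00u sec.
NOTICE:  Rel pg_attribute: Pages: 16 -- 14; Tuple(s) moved: 43. CPU 0.01s/0.01u sec.
NOTICE:  Rel pg_class: Pages: 7 -- 3; Tuple(s) moved: 76. CPU 0.01s/0.01u sec.
"""
    for rel, (before, after, moved) in summarize_vacuum(sample).items():
        print(f"{rel}: {before} -> {after} pages, {moved} tuples moved")
```

The last relation in such a summary is the last one whose compaction pass finished before
the backend went away (pg_class in this run), which narrows down where to start looking.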

The log says (I'm running the backend with -d 2):

--- begin log ---
DEBUG:  query: vacuum verbose;
DEBUG:  ProcessUtility: vacuum verbose;
NOTICE:  --Relation pg_type--
NOTICE:  Pages 3: Changed 2, reaped 2, Empty 0, New 0; Tup 159: Vac 16, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 106, MaxLen 109; Re-using: Free/Avail. Space 6296/156;
EndEmpty/Avail. Pages 0/1. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u
sec.
NOTICE:  Rel pg_type: Pages: 3 -- 3; Tuple(s) moved: 1. CPU 0.01s/0.00u sec.
NOTICE:  Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE:  --Relation pg_attribute--
NOTICE:  Pages 16: Changed 9, reaped 8, Empty 0, New 0; Tup 1021: Vac 160, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 98, MaxLen 98; Re-using: Free/Avail. Space 16480/16480;
EndEmpty/Avail. Pages 0/8. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 160. CPU
0.00s/0.01u sec.
NOTICE:  Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 160. CPU
0.00s/0.00u sec.
NOTICE:  Rel pg_attribute: Pages: 16 -- 14; Tuple(s) moved: 43. CPU 0.01s/0.01u sec.
NOTICE:  Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 43. CPU
0.00s/0.00u sec.
NOTICE:  Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 43. CPU
0.00s/0.00u sec.
NOTICE:  --Relation pg_class--
NOTICE:  Pages 7: Changed 1, reaped 7, Empty 0, New 0; Tup 136: Vac 257, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 115, MaxLen 160; Re-using: Free/Avail. Space 38880/31944;
EndEmpty/Avail. Pages 0/6. CPU 0.00s/0.00u sec.
NOTICE:  Index