Re: [HACKERS] buildfarm failures on smew and anole

2013-10-17 Thread Andres Freund
On 2013-10-16 09:35:46 -0400, Robert Haas wrote:
 Gah.  I fixed one instance of that problem in test_config_settings(),
 but missed the other.

Maybe it'd be better to default to none, just as max_connections
defaults to 1 and shared_buffers to 16? As we write out the value in the
config file, everything should still continue to work.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] buildfarm failures on smew and anole

2013-10-16 Thread Robert Haas
On Tue, Oct 15, 2013 at 11:17 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Mon, 2013-10-14 at 18:14 -0400, Robert Haas wrote:
  I cleaned the semaphores on smew, but they came back.  Whatever is
  crashing is leaving the semaphores lying around.

 Ugh.  When did you do that exactly?  I thought I fixed the problem
 that was causing that days ago, and the last 4 days worth of runs all
 show the too many clients error.

 I did it a few times over the weekend.  At least twice less than 4 days
 ago.  There are currently no semaphores left around, so whatever
 happened in the last run cleaned it up.

That seems to suggest I've introduced some bug.  I'm at a loss as to
what it is, though.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-16 Thread Andres Freund
On 2013-10-16 08:39:10 -0400, Robert Haas wrote:
 On Tue, Oct 15, 2013 at 11:17 PM, Peter Eisentraut pete...@gmx.net wrote:
  On Mon, 2013-10-14 at 18:14 -0400, Robert Haas wrote:
   I cleaned the semaphores on smew, but they came back.  Whatever is
   crashing is leaving the semaphores lying around.
 
  Ugh.  When did you do that exactly?  I thought I fixed the problem
  that was causing that days ago, and the last 4 days worth of runs all
  show the too many clients error.
 
  I did it a few times over the weekend.  At least twice less than 4 days
  ago.  There are currently no semaphores left around, so whatever
  happened in the last run cleaned it up.
 
 That seems to suggest I've introduced some bug.  I'm at a loss as to
 what it is, though.  :-(

Ah. I see the issue. To reproduce do something like
# mkdir /tmp/empty
# mount --bind /tmp/empty /dev/shm/
and then run initdb.

The issue is that test_config_settings() determines max_connections
without disabling dynamic shared memory, so the test server chooses the
posix implementation, which doesn't work here. Setting
dynamic_shared_memory_type to none during the test makes it work.
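
The failure mode described above can be sketched in a few lines. This is only an illustration, not initdb's actual code: the candidate list, the `can_start` callback, and the `broken_dev_shm` stand-in are all hypothetical names introduced here. The assumption is that the probe tries progressively smaller max_connections values and settles on the smallest one when every trial fails.

```python
from typing import Callable, Sequence

# Candidate values, tried from largest to smallest (illustrative list,
# an assumption rather than something stated in this thread).
TRIAL_CONNS: Sequence[int] = (100, 50, 40, 30, 20, 10)


def pick_max_connections(
    can_start: Callable[[int, str], bool],
    dsm_type: str = "none",
    trials: Sequence[int] = TRIAL_CONNS,
) -> int:
    """Return the first candidate the test server starts with.

    If every trial fails, fall back to the smallest candidate, which is
    one way a machine can end up with max_connections = 10.
    """
    for conns in trials:
        if can_start(conns, dsm_type):
            return conns
    return trials[-1]


def broken_dev_shm(conns: int, dsm_type: str) -> bool:
    """Hypothetical stand-in for a host where POSIX shared memory fails."""
    return dsm_type != "posix"


if __name__ == "__main__":
    # With posix every trial fails, so the probe bottoms out.
    print(pick_max_connections(broken_dev_shm, dsm_type="posix"))
    # With none the first (largest) candidate succeeds.
    print(pick_max_connections(broken_dev_shm, dsm_type="none"))
```

Under these assumptions the probe never fails because of the connection count itself. It fails because of the shared memory type, and the low max_connections value is a side effect.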

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-16 Thread Robert Haas
On Wed, Oct 16, 2013 at 8:54 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-16 08:39:10 -0400, Robert Haas wrote:
 On Tue, Oct 15, 2013 at 11:17 PM, Peter Eisentraut pete...@gmx.net wrote:
  On Mon, 2013-10-14 at 18:14 -0400, Robert Haas wrote:
   I cleaned the semaphores on smew, but they came back.  Whatever is
   crashing is leaving the semaphores lying around.
 
  Ugh.  When did you do that exactly?  I thought I fixed the problem
  that was causing that days ago, and the last 4 days worth of runs all
  show the too many clients error.
 
  I did it a few times over the weekend.  At least twice less than 4 days
  ago.  There are currently no semaphores left around, so whatever
  happened in the last run cleaned it up.

 That seems to suggest I've introduced some bug.  I'm at a loss as to
 what it is, though.  :-(

 Ah. I see the issue. To reproduce do something like
 # mkdir /tmp/empty
 # mount --bind /tmp/empty /dev/shm/
 and then run initdb.

 The issue is that test_config_settings determines max_connections
 without disabling dynamic shared memory which consequently chooses posix
 which doesn't work. Setting it to none during the test makes it work.

Gah.  I fixed one instance of that problem in test_config_settings(),
but missed the other.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-16 Thread Robert Haas
On Wed, Oct 16, 2013 at 9:37 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-16 09:35:46 -0400, Robert Haas wrote:
 Gah.  I fixed one instance of that problem in test_config_settings(),
 but missed the other.

 Maybe it'd be better to default to none, just as max_connections
 defaults to 1 and shared_buffers to 16? As we write out the value in the
 config file, everything should still continue to work.

Hmm, possibly.  But how would we document that?  It seems strange to
say that the default is none, but the actual setting probably won't be
none on your system because we hack up postgresql.conf.
shared_buffers pretty much just glosses over the distinction between
default and what you probably have configured, but I'm not sure
that's actually great policy.

Trivial fix pushed, for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-16 Thread Andres Freund
On 2013-10-16 09:44:32 -0400, Robert Haas wrote:
 On Wed, Oct 16, 2013 at 9:37 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-10-16 09:35:46 -0400, Robert Haas wrote:
  Gah.  I fixed one instance of that problem in test_config_settings(),
  but missed the other.
 
  Maybe it'd be better to default to none, just as max_connections
  defaults to 1 and shared_buffers to 16? As we write out the value in the
  config file, everything should still continue to work.
 
 Hmm, possibly.  But how would we document that?  It seems strange to
 say that the default is none, but the actual setting probably won't be
 none on your system because we hack up postgresql.conf.
 shared_buffers pretty much just glosses over the distinction between
 default and what you probably have configured, but I'm not sure
 that's actually great policy.

I can't remember anybody actually being confused by that with
shared_buffers or max_connections. So maybe it's just OK not to document
it. But yes, I can't come up with a succinct description of that
behaviour either.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-15 Thread Peter Eisentraut
On Mon, 2013-10-14 at 18:14 -0400, Robert Haas wrote:
  I cleaned the semaphores on smew, but they came back.  Whatever is
  crashing is leaving the semaphores lying around.
 
 Ugh.  When did you do that exactly?  I thought I fixed the problem
 that was causing that days ago, and the last 4 days worth of runs all
 show the too many clients error.

I did it a few times over the weekend.  At least twice less than 4 days
ago.  There are currently no semaphores left around, so whatever
happened in the last run cleaned it up.





Re: [HACKERS] buildfarm failures on smew and anole

2013-10-14 Thread Robert Haas
On Fri, Oct 11, 2013 at 4:03 PM, Andrew Dunstan and...@dunslane.net wrote:
 Can the owners of these buildfarm machines please check whether there
 are extra semaphores allocated and if so free them?  Or at least
 reboot, to see if that unbreaks the build?

 It is possible to set the buildfarm config

 build_env => {MAX_CONNECTIONS => 10 },

 and the tests will run with that constraint.

 Not sure if this would help.

Maybe I didn't explain that well.  The problem is that the regression
tests require at least 20 connections to run, and those two machines
are currently auto-selecting 10 connections, so make check is failing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-14 Thread Andres Freund
On 2013-10-14 09:12:09 -0400, Robert Haas wrote:
 On Fri, Oct 11, 2013 at 4:03 PM, Andrew Dunstan and...@dunslane.net wrote:
  Can the owners of these buildfarm machines please check whether there
  are extra semaphores allocated and if so free them?  Or at least
  reboot, to see if that unbreaks the build?
 
  It is possible to set the buildfarm config
 
  build_env => {MAX_CONNECTIONS => 10 },
 
  and the tests will run with that constraint.
 
  Not sure if this would help.
 
 Maybe I didn't explain that well.  The problem is that the regression
 tests require at least 20 connections to run, and those two machines
 are currently auto-selecting 10 connections, so make check is failing.

I think pg_regress has support for spreading groups to fewer connections
if max_connections is set appropriately. I guess that's what Andrew is
referring to.

That said, I don't think that's the solution here. The machine clearly
worked with more connections until recently.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-14 Thread Andrew Dunstan


On 10/14/2013 09:12 AM, Robert Haas wrote:
 On Fri, Oct 11, 2013 at 4:03 PM, Andrew Dunstan and...@dunslane.net wrote:
   Can the owners of these buildfarm machines please check whether there
   are extra semaphores allocated and if so free them?  Or at least
   reboot, to see if that unbreaks the build?

  It is possible to set the buildfarm config

  build_env => {MAX_CONNECTIONS => 10 },

  and the tests will run with that constraint.

  Not sure if this would help.

 Maybe I didn't explain that well.  The problem is that the regression
 tests require at least 20 connections to run, and those two machines
 are currently auto-selecting 10 connections, so make check is failing.



Why do they need 20 connections? pg_regress has code in it to limit the 
degree of parallelism of tests, and has done for years, specifically to 
cater for buildfarm machines that are unable to handle the defaults. 
Using this option in the buildfarm client config triggers use of this 
feature.
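
The effect of capping parallelism can be illustrated with a small sketch. It assumes, without claiming to mirror pg_regress's implementation, that a parallel group larger than the connection limit is simply run in consecutive batches; `split_group` is a hypothetical helper written for this example.

```python
from typing import List, Sequence


def split_group(tests: Sequence[str], max_connections: int) -> List[List[str]]:
    """Split one parallel group into batches of at most max_connections.

    A limit of zero or less means "no limit": the whole group runs at once.
    """
    if not tests:
        return []
    if max_connections <= 0 or len(tests) <= max_connections:
        return [list(tests)]
    return [
        list(tests[i:i + max_connections])
        for i in range(0, len(tests), max_connections)
    ]


if __name__ == "__main__":
    # Twenty placeholder test names, the group size cited from parallel_schedule.
    group = [f"test_{n}" for n in range(20)]
    for batch in split_group(group, 10):
        print(len(batch), batch[0], "...", batch[-1])
```

With a limit of 10, a twenty-test group would run as two batches of ten, which is why a machine limited to 10 connections could still complete the schedule if the limit is passed through.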


cheers

andrew





Re: [HACKERS] buildfarm failures on smew and anole

2013-10-14 Thread Robert Haas
On Mon, Oct 14, 2013 at 9:22 AM, Andrew Dunstan and...@dunslane.net wrote:
 Maybe I didn't explain that well.  The problem is that the regression
 tests require at least 20 connections to run, and those two machines
 are currently auto-selecting 10 connections, so make check is failing.

 Why do they need 20 connections? pg_regress has code in it to limit the
 degree of parallelism of tests, and has done for years, specifically to
 cater for buildfarm machines that are unable to handle the defaults. Using
 this option in the buildfarm client config triggers use of this feature.

Hmm, I wasn't aware of that.  I thought they needed 20 connections
because parallel_schedule says:

# By convention, we put no more than twenty tests in any one parallel group;
# this limits the number of connections needed to run the tests.

If it's not supposed to matter how many connections are available,
then that comment is misleading.  But I think it does matter, at least
in some situations, because otherwise these machines wouldn't be
failing with "sorry, too many clients already".

Anyway, as Andres said, the machines were working fine until recently,
so I think we just need to get them un-broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-14 Thread Andres Freund
On 2013-10-14 09:28:04 -0400, Robert Haas wrote:
 # By convention, we put no more than twenty tests in any one parallel group;
 # this limits the number of connections needed to run the tests.
 
 If it's not supposed to matter how many connections are available,
 then that comment is misleading.  But I think it does matter, at least
 in some situations, because otherwise these machines wouldn't be
 failing with "sorry, too many clients already".

Well, you need to explicitly pass --max-connections to pg_regress.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Anyway, as Andres said, the machines were working fine until recently,
 so I think we just need to get them un-broken.

I think you're talking past each other.  What would be useful here is
to find out *why* these machines are now failing, when they didn't before.
There might or might not be anything useful to be done about it, but if
we don't have that information, we can't tell.

regards, tom lane




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-14 Thread Robert Haas
On Mon, Oct 14, 2013 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Anyway, as Andres said, the machines were working fine until recently,
 so I think we just need to get them un-broken.

 I think you're talking past each other.  What would be useful here is
 to find out *why* these machines are now failing, when they didn't before.
 There might or might not be anything useful to be done about it, but if
 we don't have that information, we can't tell.

Well, my OP had a working theory which I think fits the facts, and
some suggested troubleshooting steps.  How about that for a start?

The real problem here is that neither of the buildfarm owners has
responded to this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-14 Thread Peter Eisentraut
On Fri, 2013-10-11 at 15:33 -0400, Robert Haas wrote:
 Can the owners of these buildfarm machines please check whether there
 are extra semaphores allocated and if so free them?  Or at least
 reboot, to see if that unbreaks the build? 

I cleaned the semaphores on smew, but they came back.  Whatever is
crashing is leaving the semaphores lying around.





Re: [HACKERS] buildfarm failures on smew and anole

2013-10-14 Thread Robert Haas
On Mon, Oct 14, 2013 at 4:29 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Fri, 2013-10-11 at 15:33 -0400, Robert Haas wrote:
 Can the owners of these buildfarm machines please check whether there
 are extra semaphores allocated and if so free them?  Or at least
 reboot, to see if that unbreaks the build?

 I cleaned the semaphores on smew, but they came back.  Whatever is
 crashing is leaving the semaphores lying around.

Ugh.  When did you do that exactly?  I thought I fixed the problem
that was causing that days ago, and the last 4 days' worth of runs all
show the "too many clients" error.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] buildfarm failures on smew and anole

2013-10-11 Thread Andrew Dunstan


On 10/11/2013 03:33 PM, Robert Haas wrote:

 The build is continuing to fail on smew and anole.  The reason it's
 failing is because those machines are choosing max_connections = 10,
 which is not enough to run the regression tests.  I think this is
 probably because of System V semaphore exhaustion.  The machines are
 not choosing a small value for shared_buffers - they're still picking
 128MB - so the problem is not the operating system's shared memory
 limit.  But it might be that the operating system is short on some
 other resource that prevents starting up with a more normal value for
 max_connections.  My best guess is System V semaphores; I think that
 one of the failed runs caused by the dynamic shared memory patch
 probably left a bunch of semaphores allocated, so the build will keep
 failing until those are manually cleaned up.

 Can the owners of these buildfarm machines please check whether there
 are extra semaphores allocated and if so free them?  Or at least
 reboot, to see if that unbreaks the build?
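
To see why leaked semaphores would push max_connections down, here is a rough arithmetic sketch. It assumes the rule of thumb from the PostgreSQL kernel-resources documentation: semaphores are allocated in sets of 16, each set carries one extra semaphore, and roughly one semaphore is needed per allowed connection or background process. The `extra_procs` value is a placeholder assumption, not a figure from this thread.

```python
import math
from typing import Tuple

SEMAS_PER_SET = 16  # usable semaphores per set; each set holds one extra


def semaphore_needs(max_connections: int, extra_procs: int = 8) -> Tuple[int, int]:
    """Return (sets, total semaphores) needed for a given max_connections.

    extra_procs stands in for autovacuum workers and other auxiliary
    processes; the real count depends on version and configuration.
    """
    procs = max_connections + extra_procs
    sets = math.ceil(procs / SEMAS_PER_SET)
    return sets, sets * (SEMAS_PER_SET + 1)


if __name__ == "__main__":
    for conns in (100, 20, 10):
        sets, total = semaphore_needs(conns)
        print(f"max_connections={conns}: {sets} sets, {total} semaphores")
```

Under these assumptions, a server with max_connections = 100 needs several times as many semaphore sets as one with 10, so a kernel whose limit is mostly consumed by leftovers from crashed runs could still start the small configuration while refusing the larger ones.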



It is possible to set the buildfarm config

build_env => {MAX_CONNECTIONS => 10 },

and the tests will run with that constraint.

Not sure if this would help.

cheers

andrew


