Re: [GENERAL] Postgres DB crashing

2013-06-23 Thread Tom Lane
Alan Hodgson ahodg...@simkin.ca writes:
 On Thursday, June 20, 2013 07:52:21 AM Merlin Moncure wrote:
 OP needs to explore use of connection pooler, in particular pgbouncer.
 Anyways none of this explains why the server is actually crashing.

 It might be hitting file descriptor limits. I didn't dig into the earlier 
 part 
 of this thread much, though.

The disturbing part of the original report was this:

 2013-06-11 16:54:14 GMT [6]: [1-1]PANIC:  stuck spinlock 
 (0x2aaab54279d4) detected at bufmgr.c:1239

which implies that something was holding a buffer header spinlock for an
unreasonably long time (roughly 2 minutes, when no operation that holds
such a lock should take more than a few nanoseconds).  But if you were
running a load test that absolutely mashed the machine into the ground,
as the OP seems to have been doing, maybe that could happen --- perhaps
some unlucky backend got interrupted and then swapped out during the
narrow window where it held such a lock, and the machine was too
overloaded to give that process any more cycles for a very long time.

As has been noted already, this test setup seems to have overloaded the
machine by at least two orders of magnitude compared to useful settings
for the available hardware.  The stuck spinlock error would only come
out if a lock had been held for quite a lot more than two orders of
magnitude more time than expected, though.  So I'm not entirely sure
that I buy this theory; but it's hard to see another one.  (I discount
the obvious other theory that there's a software bug, because I just
looked through 9.2's bufmgr.c very carefully, and there are no code
paths where it fails to release a buffer header lock within a very few
instructions from where it took the lock.)

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres DB crashing

2013-06-20 Thread Richard Huxton

On 18/06/13 18:31, bhanu udaya wrote:

Hello,
Greetings.

My PostgresSQL (9.2) is crashing after certain load tests. Currently,
postgressql is crashing when simulatenously 800 to 1000 threads are run
on a 10 million records schema. Not sure, if we have to tweak some more
parameters of postgres. Currently, the postgressql is configured as
below on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this postgres
limitation to support only 800 threads or any other configuration
required. Please look at the log as below with errors. Please reply


max_connections 5000
shared_buffers  2024 MB
synchronous_commit  off
wal_buffers 100 MB
wal_writer_delays   1000ms
checkpoint_segments 512
checkpoint_timeout  5 min
checkpoint_completion_target0.5
checkpoint_warning  30s
work_memory 1G
effective_cache_size5 GB


Just to point out, your memory settings are set to allow *at least*

 shared-buffers 2GB + (5000 * 1GB) = 5TB+

You don't have that much memory. You probably don't have that much disk. 
This is never going to work.


As has been said, there's no way you can do useful work simultaneously 
with 1000 threads if you only have 4 cores - use a connection pooler. 
You'll also need to reduce work_mem to 1MB or so.


--
  Richard Huxton
  Archonet Ltd


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres DB crashing

2013-06-20 Thread Merlin Moncure
On Thu, Jun 20, 2013 at 5:17 AM, Richard Huxton d...@archonet.com wrote:
 On 18/06/13 18:31, bhanu udaya wrote:

 Hello,
 Greetings.

 My PostgresSQL (9.2) is crashing after certain load tests. Currently,
 postgressql is crashing when simulatenously 800 to 1000 threads are run
 on a 10 million records schema. Not sure, if we have to tweak some more
 parameters of postgres. Currently, the postgressql is configured as
 below on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this postgres
 limitation to support only 800 threads or any other configuration
 required. Please look at the log as below with errors. Please reply


 max_connections 5000
 shared_buffers  2024 MB
 synchronous_commit  off
 wal_buffers 100 MB
 wal_writer_delays   1000ms
 checkpoint_segments 512
 checkpoint_timeout  5 min
 checkpoint_completion_target0.5
 checkpoint_warning  30s
 work_memory 1G
 effective_cache_size5 GB


 Just to point out, your memory settings are set to allow *at least*

  shared-buffers 2GB + (5000 * 1GB) = 5TB+

 You don't have that much memory. You probably don't have that much disk.
 This is never going to work.

 As has been said, there's no way you can do useful work simultaneously with
 1000 threads if you only have 4 cores - use a connection pooler. You'll also
 need to reduce work_mem to 1MB or so.

aside: if you have particular query that needs extra work_mem, you can
always temporarily raise it at run time (unlike shared buffers).

OP needs to explore use of connection pooler, in particular pgbouncer.
 Anyways none of this explains why the server is actually crashing.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres DB crashing

2013-06-20 Thread Alan Hodgson
On Thursday, June 20, 2013 07:52:21 AM Merlin Moncure wrote:
 OP needs to explore use of connection pooler, in particular pgbouncer.
  Anyways none of this explains why the server is actually crashing.

It might be hitting file descriptor limits. I didn't dig into the earlier part 
of this thread much, though.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [pgadmin-support] [GENERAL] Postgres DB crashing

2013-06-19 Thread bhanu udaya
Thanks for the quick response. These errors are after disabling the autovacuum. 
 auto_vacuum parameter was set to off.
 Can find the exact reason for this crash. 
 
Thanks and Regards
Radha Krishna
Date: Tue, 18 Jun 2013 13:54:09 -0400
Subject: Re: [pgadmin-support] [GENERAL] Postgres DB crashing
From: rumman...@gmail.com
To: udayabhanu1...@hotmail.com
CC: kgri...@mail.com; adrian.kla...@gmail.com; pgsql-general@postgresql.org; 
pgadmin-supp...@postgresql.org; laurenz.a...@wien.gv.at; 
chris.trav...@gmail.com; mag...@hagander.net

Stop the autovacuum process and try again.

On Tue, Jun 18, 2013 at 1:31 PM, bhanu udaya udayabhanu1...@hotmail.com wrote:







Hello,
Greetings. 
My PostgresSQL (9.2) is crashing after certain load tests. Currently, 
postgressql is crashing when simulatenously 800 to 1000 threads are run on a 10 
million records schema. Not sure, if we have to tweak some more parameters of 
postgres. Currently, the postgressql is configured as below on a 7GB Ram on an 
Intel Xeon CPU E5507 2.27 GZ. Is this postgres limitation to support only 800 
threads or any other configuration required. Please look at the log as below 
with errors. Please reply

 
 


 
 
 
  max_connections 
  5000
 
 
  shared_buffers 
  2024 MB
 
 
  synchronous_commit
  off
 
 
  wal_buffers
  100 MB
 
 
  wal_writer_delays
  1000ms
 
 
  checkpoint_segments
  512
 
 
  checkpoint_timeout
  5 min
 
 
  checkpoint_completion_target
  0.5
 
 
  checkpoint_warning
  30s
 
 
  work_memory
  1G
 
 
  effective_cache_size
  5 GB
 

 
 
 
2013-06-11 15:11:17 GMT [26201]: [1-1]ERROR:  canceling autovacuum task


2013-06-11 15:11:17 GMT [26201]: [2-1]CONTEXT:  automatic vacuum of table 
newrelic.tenant1.customer


2013-06-11 15:11:17 GMT [25242]: [1-1]LOG:  sending cancel to blocking 
autovacuum PID 26201


2013-06-11 15:11:17 GMT [25242]: [2-1]DETAIL:  Process 25242 waits for 
ExclusiveLock on extension of relation 679054 of database 666546.


2013-06-11 15:11:17 GMT [25242]: [3-1]STATEMENT:  UPDATE tenant1.customer SET 
lastmodifieddate = $1 WHERE id IN ( select random_range((select min(id) from 
tenant1.customer ), (select max(id) from tenant1.customer )) as id ) AND 
softdeleteflag IS NOT TRUE



2013-06-11 15:11:17 GMT [25242]: [4-1]WARNING:  could not send signal to 
process 26201: No such process


2013-06-11 15:22:29 GMT [9]: [11-1]WARNING:  worker took too long to start; 
canceled


2013-06-11 15:24:10 GMT [26511]: [1-1]WARNING:  autovacuum worker started 
without a worker entry


2013-06-11 16:03:33 GMT [23092]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:06:05 GMT [23222]: [5-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:07:06 GMT [26869]: [1-1]FATAL:  canceling authentication due to 
timeout


2013-06-11 16:23:16 GMT [25128]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:23:20 GMT [25128]: [2-1]LOG:  unexpected EOF on client connection 
with an open transaction


2013-06-11 16:30:56 GMT [23695]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:43:55 GMT [24618]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:44:29 GMT [25204]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:54:14 GMT [6]: [1-1]PANIC:  stuck spinlock (0x2aaab54279d4) 
detected at bufmgr.c:1239


2013-06-11 16:54:14 GMT [32521]: [8-1]LOG:  checkpointer process (PID 6) 
was terminated by signal 6: Aborted


2013-06-11 16:54:14 GMT [32521]: [9-1]LOG:  terminating any other active server 
processes


2013-06-11 16:54:14 GMT [26931]: [1-1]WARNING:  terminating connection because 
of crash of another server process


2013-06-11 16:54:14 GMT [26931]: [2-1]DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared memory.



2013-06-11 16:54:14 GMT [26931]: [3-1]HINT:  In a moment you should be able to 
reconnect to the database and repeat your command.


2013-06-11 16:54:14 GMT [26401]: [1-1]WARNING:  terminating connection because 
of crash of another server process


2013-06-11 16:54:14 GMT [26401]: [2-1]DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared memory.

 
2013-06-11 16:55:08 GMT [27579]: [1-1]FATAL:  the database system is in 
recovery mode


2013-06-11 16:55:08 GMT [24041]: [1-1]WARNING:  terminating connection because 
of crash of another server process


2013-06-11 16:55:08 GMT [24041]: [2-1]DETAIL:  The postmaster has commanded 
this server process to roll back the current 
 

  

  

[GENERAL] Postgres DB crashing

2013-06-18 Thread bhanu udaya



Hello,
Greetings. 
My PostgresSQL (9.2) is crashing after certain load tests. Currently, 
postgressql is crashing when simulatenously 800 to 1000 threads are run on a 10 
million records schema. Not sure, if we have to tweak some more parameters of 
postgres. Currently, the postgressql is configured as below on a 7GB Ram on an 
Intel Xeon CPU E5507 2.27 GZ. Is this postgres limitation to support only 800 
threads or any other configuration required. Please look at the log as below 
with errors. Please reply
 
 


 
 
 
  max_connections 
  5000
 
 
  shared_buffers 
  2024 MB
 
 
  synchronous_commit
  off
 
 
  wal_buffers
  100 MB
 
 
  wal_writer_delays
  1000ms
 
 
  checkpoint_segments
  512
 
 
  checkpoint_timeout
  5 min
 
 
  checkpoint_completion_target
  0.5
 
 
  checkpoint_warning
  30s
 
 
  work_memory
  1G
 
 
  effective_cache_size
  5 GB
 

 
 
 
2013-06-11 15:11:17 GMT [26201]: [1-1]ERROR:  canceling autovacuum task


2013-06-11 15:11:17 GMT [26201]: [2-1]CONTEXT:  automatic vacuum of table 
newrelic.tenant1.customer


2013-06-11 15:11:17 GMT [25242]: [1-1]LOG:  sending cancel to blocking 
autovacuum PID 26201


2013-06-11 15:11:17 GMT [25242]: [2-1]DETAIL:  Process 25242 waits for 
ExclusiveLock on extension of relation 679054 of database 666546.


2013-06-11 15:11:17 GMT [25242]: [3-1]STATEMENT:  UPDATE tenant1.customer SET 
lastmodifieddate = $1 WHERE id IN ( select random_range((select min(id) from 
tenant1.customer ), (select max(id) from tenant1.customer )) as id ) AND 
softdeleteflag IS NOT TRUE


2013-06-11 15:11:17 GMT [25242]: [4-1]WARNING:  could not send signal to 
process 26201: No such process


2013-06-11 15:22:29 GMT [9]: [11-1]WARNING:  worker took too long to start; 
canceled


2013-06-11 15:24:10 GMT [26511]: [1-1]WARNING:  autovacuum worker started 
without a worker entry


2013-06-11 16:03:33 GMT [23092]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:06:05 GMT [23222]: [5-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:07:06 GMT [26869]: [1-1]FATAL:  canceling authentication due to 
timeout


2013-06-11 16:23:16 GMT [25128]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:23:20 GMT [25128]: [2-1]LOG:  unexpected EOF on client connection 
with an open transaction


2013-06-11 16:30:56 GMT [23695]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:43:55 GMT [24618]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:44:29 GMT [25204]: [1-1]LOG:  could not receive data from client: 
Connection timed out


2013-06-11 16:54:14 GMT [6]: [1-1]PANIC:  stuck spinlock (0x2aaab54279d4) 
detected at bufmgr.c:1239


2013-06-11 16:54:14 GMT [32521]: [8-1]LOG:  checkpointer process (PID 6) 
was terminated by signal 6: Aborted


2013-06-11 16:54:14 GMT [32521]: [9-1]LOG:  terminating any other active server 
processes


2013-06-11 16:54:14 GMT [26931]: [1-1]WARNING:  terminating connection because 
of crash of another server process


2013-06-11 16:54:14 GMT [26931]: [2-1]DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared memory.


2013-06-11 16:54:14 GMT [26931]: [3-1]HINT:  In a moment you should be able to 
reconnect to the database and repeat your command.


2013-06-11 16:54:14 GMT [26401]: [1-1]WARNING:  terminating connection because 
of crash of another server process


2013-06-11 16:54:14 GMT [26401]: [2-1]DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, because 
another server process exited abnormally and possibly corrupted shared memory.
 
2013-06-11 16:55:08 GMT [27579]: [1-1]FATAL:  the database system is in 
recovery mode


2013-06-11 16:55:08 GMT [24041]: [1-1]WARNING:  terminating connection because 
of crash of another server process


2013-06-11 16:55:08 GMT [24041]: [2-1]DETAIL:  The postmaster has commanded 
this server process to roll back the current 
 

  

Re: [GENERAL] Postgres DB crashing

2013-06-18 Thread AI Rumman
Stop the autovacuum process and try again.


On Tue, Jun 18, 2013 at 1:31 PM, bhanu udaya udayabhanu1...@hotmail.comwrote:

  Hello,
 Greetings.

 My PostgresSQL (9.2) is crashing after certain load tests. Currently,
 postgressql is crashing when simulatenously 800 to 1000 threads are run on
 a 10 million records schema. Not sure, if we have to tweak some more
 parameters of postgres. Currently, the postgressql is configured as below
 on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this postgres
 limitation to support only 800 threads or any other configuration required.
 Please look at the log as below with errors. Please reply


   max_connections  5000  shared_buffers  2024 MB  synchronous_commit off
 wal_buffers 100 MB  wal_writer_delays 1000ms  checkpoint_segments 512
 checkpoint_timeout 5 min  checkpoint_completion_target 0.5
 checkpoint_warning 30s  work_memory 1G  effective_cache_size 5 GB



 2013-06-11 15:11:17 GMT [26201]: [1-1]ERROR: canceling autovacuum task

 2013-06-11 15:11:17 GMT [26201]: [2-1]CONTEXT: automatic vacuum of table
 newrelic.tenant1.customer

 2013-06-11 15:11:17 GMT [25242]: [1-1]LOG: sending cancel to blocking
 autovacuum PID 26201

 2013-06-11 15:11:17 GMT [25242]: [2-1]DETAIL: Process 25242 waits for
 ExclusiveLock on extension of relation 679054 of database 666546.

 2013-06-11 15:11:17 GMT [25242]: [3-1]STATEMENT: UPDATE tenant1.customer
 SET lastmodifieddate = $1 WHERE id IN ( select random_range((select min(id)
 from tenant1.customer ), (select max(id) from tenant1.customer )) as id )
 AND softdeleteflag IS NOT TRUE

 2013-06-11 15:11:17 GMT [25242]: [4-1]WARNING: could not send signal to
 process 26201: No such process

 2013-06-11 15:22:29 GMT [9]: [11-1]WARNING: worker took too long to
 start; canceled

 2013-06-11 15:24:10 GMT [26511]: [1-1]WARNING: autovacuum worker started
 without a worker entry

 2013-06-11 16:03:33 GMT [23092]: [1-1]LOG: could not receive data from
 client: Connection timed out

 2013-06-11 16:06:05 GMT [23222]: [5-1]LOG: could not receive data from
 client: Connection timed out

 2013-06-11 16:07:06 GMT [26869]: [1-1]FATAL: canceling authentication due
 to timeout

 2013-06-11 16:23:16 GMT [25128]: [1-1]LOG: could not receive data from
 client: Connection timed out

 2013-06-11 16:23:20 GMT [25128]: [2-1]LOG: unexpected EOF on client
 connection with an open transaction

 2013-06-11 16:30:56 GMT [23695]: [1-1]LOG: could not receive data from
 client: Connection timed out

 2013-06-11 16:43:55 GMT [24618]: [1-1]LOG: could not receive data from
 client: Connection timed out

 2013-06-11 16:44:29 GMT [25204]: [1-1]LOG: could not receive data from
 client: Connection timed out

 2013-06-11 16:54:14 GMT [6]: [1-1]PANIC: stuck spinlock
 (0x2aaab54279d4) detected at bufmgr.c:1239

 2013-06-11 16:54:14 GMT [32521]: [8-1]LOG: checkpointer process (PID
 6) was terminated by signal 6: Aborted

 2013-06-11 16:54:14 GMT [32521]: [9-1]LOG: terminating any other active
 server processes

 2013-06-11 16:54:14 GMT [26931]: [1-1]WARNING: terminating connection
 because of crash of another server process

 2013-06-11 16:54:14 GMT [26931]: [2-1]DETAIL: The postmaster has commanded
 this server process to roll back the current transaction and exit, because
 another server process exited abnormally and possibly corrupted shared
 memory.

 2013-06-11 16:54:14 GMT [26931]: [3-1]HINT: In a moment you should be able
 to reconnect to the database and repeat your command.

 2013-06-11 16:54:14 GMT [26401]: [1-1]WARNING: terminating connection
 because of crash of another server process

 2013-06-11 16:54:14 GMT [26401]: [2-1]DETAIL: The postmaster has commanded
 this server process to roll back the current transaction and exit, because
 another server process exited abnormally and possibly corrupted shared
 memory.

 2013-06-11 16:55:08 GMT [27579]: [1-1]FATAL: the database system is in
 recovery mode

 2013-06-11 16:55:08 GMT [24041]: [1-1]WARNING: terminating connection
 because of crash of another server process

 2013-06-11 16:55:08 GMT [24041]: [2-1]DETAIL: The postmaster has commanded
 this server process to roll back the current




Re: [GENERAL] Postgres DB crashing

2013-06-18 Thread John R Pierce

On 6/18/2013 10:31 AM, bhanu udaya wrote:
My PostgresSQL (9.2) is crashing after certain load tests. Currently, 
postgressql is crashing when simulatenously 800 to 1000 threads are 
run on a 10 million records schema. Not sure, if we have to tweak some 
more parameters of postgres. Currently, the postgressql is configured 
as below on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this 
postgres limitation to support only 800 threads or any other 
configuration required. Please look at the log as below with errors. 
Please reply


thats an insanely high number of connections on a quad core processor.   
in general, any more than 2-4X the number of cpu hardware threads in 
concurrent queries is counterproductive and will result in higher 
overhead and less throughput.   if you have a real world use case for 
800-1000 client threads and your workload is predominately short fast 
transactions (OLTP), you should use a connection pooler like pgbouncer 
and limit the number of active connections to something like 32 on your 
4 core/8 thread CPU.




--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general