Re: [GENERAL] Postgres DB crashing
Alan Hodgson ahodg...@simkin.ca writes: On Thursday, June 20, 2013 07:52:21 AM Merlin Moncure wrote: OP needs to explore use of connection pooler, in particular pgbouncer. Anyways none of this explains why the server is actually crashing. It might be hitting file descriptor limits. I didn't dig into the earlier part of this thread much, though. The disturbing part of the original report was this: 2013-06-11 16:54:14 GMT [6]: [1-1]PANIC: stuck spinlock (0x2aaab54279d4) detected at bufmgr.c:1239 which implies that something was holding a buffer header spinlock for an unreasonably long time (roughly 2 minutes, when no operation that holds such a lock should take more than a few nanoseconds). But if you were running a load test that absolutely mashed the machine into the ground, as the OP seems to have been doing, maybe that could happen --- perhaps some unlucky backend got interrupted and then swapped out during the narrow window where it held such a lock, and the machine was too overloaded to give that process any more cycles for a very long time. As has been noted already, this test setup seems to have overloaded the machine by at least two orders of magnitude compared to useful settings for the available hardware. The stuck spinlock error would only come out if a lock had been held for quite a lot more than two orders of magnitude more time than expected, though. So I'm not entirely sure that I buy this theory; but it's hard to see another one. (I discount the obvious other theory that there's a software bug, because I just looked through 9.2's bufmgr.c very carefully, and there are no code paths where it fails to release a buffer header lock within a very few instructions from where it took the lock.) regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgres DB crashing
On 18/06/13 18:31, bhanu udaya wrote: Hello, Greetings. My PostgresSQL (9.2) is crashing after certain load tests. Currently, postgressql is crashing when simulatenously 800 to 1000 threads are run on a 10 million records schema. Not sure, if we have to tweak some more parameters of postgres. Currently, the postgressql is configured as below on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this postgres limitation to support only 800 threads or any other configuration required. Please look at the log as below with errors. Please reply max_connections 5000 shared_buffers 2024 MB synchronous_commit off wal_buffers 100 MB wal_writer_delays 1000ms checkpoint_segments 512 checkpoint_timeout 5 min checkpoint_completion_target0.5 checkpoint_warning 30s work_memory 1G effective_cache_size5 GB Just to point out, your memory settings are set to allow *at least* shared-buffers 2GB + (5000 * 1GB) = 5TB+ You don't have that much memory. You probably don't have that much disk. This is never going to work. As has been said, there's no way you can do useful work simultaneously with 1000 threads if you only have 4 cores - use a connection pooler. You'll also need to reduce work_mem to 1MB or so. -- Richard Huxton Archonet Ltd -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgres DB crashing
On Thu, Jun 20, 2013 at 5:17 AM, Richard Huxton d...@archonet.com wrote: On 18/06/13 18:31, bhanu udaya wrote: Hello, Greetings. My PostgresSQL (9.2) is crashing after certain load tests. Currently, postgressql is crashing when simulatenously 800 to 1000 threads are run on a 10 million records schema. Not sure, if we have to tweak some more parameters of postgres. Currently, the postgressql is configured as below on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this postgres limitation to support only 800 threads or any other configuration required. Please look at the log as below with errors. Please reply max_connections 5000 shared_buffers 2024 MB synchronous_commit off wal_buffers 100 MB wal_writer_delays 1000ms checkpoint_segments 512 checkpoint_timeout 5 min checkpoint_completion_target0.5 checkpoint_warning 30s work_memory 1G effective_cache_size5 GB Just to point out, your memory settings are set to allow *at least* shared-buffers 2GB + (5000 * 1GB) = 5TB+ You don't have that much memory. You probably don't have that much disk. This is never going to work. As has been said, there's no way you can do useful work simultaneously with 1000 threads if you only have 4 cores - use a connection pooler. You'll also need to reduce work_mem to 1MB or so. aside: if you have particular query that needs extra work_mem, you can always temporarily raise it at run time (unlike shared buffers). OP needs to explore use of connection pooler, in particular pgbouncer. Anyways none of this explains why the server is actually crashing. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Postgres DB crashing
On Thursday, June 20, 2013 07:52:21 AM Merlin Moncure wrote: OP needs to explore use of connection pooler, in particular pgbouncer. Anyways none of this explains why the server is actually crashing. It might be hitting file descriptor limits. I didn't dig into the earlier part of this thread much, though. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [pgadmin-support] [GENERAL] Postgres DB crashing
Thanks for the quick response. These errors are after disabling the autovacuum. auto_vacuum parameter was set to off. Can find the exact reason for this crash. Thanks and Regards Radha Krishna Date: Tue, 18 Jun 2013 13:54:09 -0400 Subject: Re: [pgadmin-support] [GENERAL] Postgres DB crashing From: rumman...@gmail.com To: udayabhanu1...@hotmail.com CC: kgri...@mail.com; adrian.kla...@gmail.com; pgsql-general@postgresql.org; pgadmin-supp...@postgresql.org; laurenz.a...@wien.gv.at; chris.trav...@gmail.com; mag...@hagander.net Stop the autovacuum process and try again. On Tue, Jun 18, 2013 at 1:31 PM, bhanu udaya udayabhanu1...@hotmail.com wrote: Hello, Greetings. My PostgresSQL (9.2) is crashing after certain load tests. Currently, postgressql is crashing when simulatenously 800 to 1000 threads are run on a 10 million records schema. Not sure, if we have to tweak some more parameters of postgres. Currently, the postgressql is configured as below on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this postgres limitation to support only 800 threads or any other configuration required. Please look at the log as below with errors. Please reply max_connections 5000 shared_buffers 2024 MB synchronous_commit off wal_buffers 100 MB wal_writer_delays 1000ms checkpoint_segments 512 checkpoint_timeout 5 min checkpoint_completion_target 0.5 checkpoint_warning 30s work_memory 1G effective_cache_size 5 GB 2013-06-11 15:11:17 GMT [26201]: [1-1]ERROR: canceling autovacuum task 2013-06-11 15:11:17 GMT [26201]: [2-1]CONTEXT: automatic vacuum of table newrelic.tenant1.customer 2013-06-11 15:11:17 GMT [25242]: [1-1]LOG: sending cancel to blocking autovacuum PID 26201 2013-06-11 15:11:17 GMT [25242]: [2-1]DETAIL: Process 25242 waits for ExclusiveLock on extension of relation 679054 of database 666546. 2013-06-11 15:11:17 GMT [25242]: [3-1]STATEMENT: UPDATE tenant1.customer SET lastmodifieddate = $1 WHERE id IN ( select random_range((select min(id) from tenant1.customer ), (select max(id) from tenant1.customer )) as id ) AND softdeleteflag IS NOT TRUE 2013-06-11 15:11:17 GMT [25242]: [4-1]WARNING: could not send signal to process 26201: No such process 2013-06-11 15:22:29 GMT [9]: [11-1]WARNING: worker took too long to start; canceled 2013-06-11 15:24:10 GMT [26511]: [1-1]WARNING: autovacuum worker started without a worker entry 2013-06-11 16:03:33 GMT [23092]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:06:05 GMT [23222]: [5-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:07:06 GMT [26869]: [1-1]FATAL: canceling authentication due to timeout 2013-06-11 16:23:16 GMT [25128]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:23:20 GMT [25128]: [2-1]LOG: unexpected EOF on client connection with an open transaction 2013-06-11 16:30:56 GMT [23695]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:43:55 GMT [24618]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:44:29 GMT [25204]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:54:14 GMT [6]: [1-1]PANIC: stuck spinlock (0x2aaab54279d4) detected at bufmgr.c:1239 2013-06-11 16:54:14 GMT [32521]: [8-1]LOG: checkpointer process (PID 6) was terminated by signal 6: Aborted 2013-06-11 16:54:14 GMT [32521]: [9-1]LOG: terminating any other active server processes 2013-06-11 16:54:14 GMT [26931]: [1-1]WARNING: terminating connection because of crash of another server process 2013-06-11 16:54:14 GMT [26931]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-06-11 16:54:14 GMT [26931]: [3-1]HINT: In a moment you should be able to reconnect to the database and repeat your command. 2013-06-11 16:54:14 GMT [26401]: [1-1]WARNING: terminating connection because of crash of another server process 2013-06-11 16:54:14 GMT [26401]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-06-11 16:55:08 GMT [27579]: [1-1]FATAL: the database system is in recovery mode 2013-06-11 16:55:08 GMT [24041]: [1-1]WARNING: terminating connection because of crash of another server process 2013-06-11 16:55:08 GMT [24041]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current
[GENERAL] Postgres DB crashing
Hello, Greetings. My PostgresSQL (9.2) is crashing after certain load tests. Currently, postgressql is crashing when simulatenously 800 to 1000 threads are run on a 10 million records schema. Not sure, if we have to tweak some more parameters of postgres. Currently, the postgressql is configured as below on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this postgres limitation to support only 800 threads or any other configuration required. Please look at the log as below with errors. Please reply max_connections 5000 shared_buffers 2024 MB synchronous_commit off wal_buffers 100 MB wal_writer_delays 1000ms checkpoint_segments 512 checkpoint_timeout 5 min checkpoint_completion_target 0.5 checkpoint_warning 30s work_memory 1G effective_cache_size 5 GB 2013-06-11 15:11:17 GMT [26201]: [1-1]ERROR: canceling autovacuum task 2013-06-11 15:11:17 GMT [26201]: [2-1]CONTEXT: automatic vacuum of table newrelic.tenant1.customer 2013-06-11 15:11:17 GMT [25242]: [1-1]LOG: sending cancel to blocking autovacuum PID 26201 2013-06-11 15:11:17 GMT [25242]: [2-1]DETAIL: Process 25242 waits for ExclusiveLock on extension of relation 679054 of database 666546. 2013-06-11 15:11:17 GMT [25242]: [3-1]STATEMENT: UPDATE tenant1.customer SET lastmodifieddate = $1 WHERE id IN ( select random_range((select min(id) from tenant1.customer ), (select max(id) from tenant1.customer )) as id ) AND softdeleteflag IS NOT TRUE 2013-06-11 15:11:17 GMT [25242]: [4-1]WARNING: could not send signal to process 26201: No such process 2013-06-11 15:22:29 GMT [9]: [11-1]WARNING: worker took too long to start; canceled 2013-06-11 15:24:10 GMT [26511]: [1-1]WARNING: autovacuum worker started without a worker entry 2013-06-11 16:03:33 GMT [23092]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:06:05 GMT [23222]: [5-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:07:06 GMT [26869]: [1-1]FATAL: canceling authentication due to timeout 2013-06-11 16:23:16 GMT [25128]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:23:20 GMT [25128]: [2-1]LOG: unexpected EOF on client connection with an open transaction 2013-06-11 16:30:56 GMT [23695]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:43:55 GMT [24618]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:44:29 GMT [25204]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:54:14 GMT [6]: [1-1]PANIC: stuck spinlock (0x2aaab54279d4) detected at bufmgr.c:1239 2013-06-11 16:54:14 GMT [32521]: [8-1]LOG: checkpointer process (PID 6) was terminated by signal 6: Aborted 2013-06-11 16:54:14 GMT [32521]: [9-1]LOG: terminating any other active server processes 2013-06-11 16:54:14 GMT [26931]: [1-1]WARNING: terminating connection because of crash of another server process 2013-06-11 16:54:14 GMT [26931]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-06-11 16:54:14 GMT [26931]: [3-1]HINT: In a moment you should be able to reconnect to the database and repeat your command. 2013-06-11 16:54:14 GMT [26401]: [1-1]WARNING: terminating connection because of crash of another server process 2013-06-11 16:54:14 GMT [26401]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-06-11 16:55:08 GMT [27579]: [1-1]FATAL: the database system is in recovery mode 2013-06-11 16:55:08 GMT [24041]: [1-1]WARNING: terminating connection because of crash of another server process 2013-06-11 16:55:08 GMT [24041]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current
Re: [GENERAL] Postgres DB crashing
Stop the autovacuum process and try again. On Tue, Jun 18, 2013 at 1:31 PM, bhanu udaya udayabhanu1...@hotmail.comwrote: Hello, Greetings. My PostgresSQL (9.2) is crashing after certain load tests. Currently, postgressql is crashing when simulatenously 800 to 1000 threads are run on a 10 million records schema. Not sure, if we have to tweak some more parameters of postgres. Currently, the postgressql is configured as below on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this postgres limitation to support only 800 threads or any other configuration required. Please look at the log as below with errors. Please reply max_connections 5000 shared_buffers 2024 MB synchronous_commit off wal_buffers 100 MB wal_writer_delays 1000ms checkpoint_segments 512 checkpoint_timeout 5 min checkpoint_completion_target 0.5 checkpoint_warning 30s work_memory 1G effective_cache_size 5 GB 2013-06-11 15:11:17 GMT [26201]: [1-1]ERROR: canceling autovacuum task 2013-06-11 15:11:17 GMT [26201]: [2-1]CONTEXT: automatic vacuum of table newrelic.tenant1.customer 2013-06-11 15:11:17 GMT [25242]: [1-1]LOG: sending cancel to blocking autovacuum PID 26201 2013-06-11 15:11:17 GMT [25242]: [2-1]DETAIL: Process 25242 waits for ExclusiveLock on extension of relation 679054 of database 666546. 2013-06-11 15:11:17 GMT [25242]: [3-1]STATEMENT: UPDATE tenant1.customer SET lastmodifieddate = $1 WHERE id IN ( select random_range((select min(id) from tenant1.customer ), (select max(id) from tenant1.customer )) as id ) AND softdeleteflag IS NOT TRUE 2013-06-11 15:11:17 GMT [25242]: [4-1]WARNING: could not send signal to process 26201: No such process 2013-06-11 15:22:29 GMT [9]: [11-1]WARNING: worker took too long to start; canceled 2013-06-11 15:24:10 GMT [26511]: [1-1]WARNING: autovacuum worker started without a worker entry 2013-06-11 16:03:33 GMT [23092]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:06:05 GMT [23222]: [5-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:07:06 GMT [26869]: [1-1]FATAL: canceling authentication due to timeout 2013-06-11 16:23:16 GMT [25128]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:23:20 GMT [25128]: [2-1]LOG: unexpected EOF on client connection with an open transaction 2013-06-11 16:30:56 GMT [23695]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:43:55 GMT [24618]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:44:29 GMT [25204]: [1-1]LOG: could not receive data from client: Connection timed out 2013-06-11 16:54:14 GMT [6]: [1-1]PANIC: stuck spinlock (0x2aaab54279d4) detected at bufmgr.c:1239 2013-06-11 16:54:14 GMT [32521]: [8-1]LOG: checkpointer process (PID 6) was terminated by signal 6: Aborted 2013-06-11 16:54:14 GMT [32521]: [9-1]LOG: terminating any other active server processes 2013-06-11 16:54:14 GMT [26931]: [1-1]WARNING: terminating connection because of crash of another server process 2013-06-11 16:54:14 GMT [26931]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-06-11 16:54:14 GMT [26931]: [3-1]HINT: In a moment you should be able to reconnect to the database and repeat your command. 2013-06-11 16:54:14 GMT [26401]: [1-1]WARNING: terminating connection because of crash of another server process 2013-06-11 16:54:14 GMT [26401]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2013-06-11 16:55:08 GMT [27579]: [1-1]FATAL: the database system is in recovery mode 2013-06-11 16:55:08 GMT [24041]: [1-1]WARNING: terminating connection because of crash of another server process 2013-06-11 16:55:08 GMT [24041]: [2-1]DETAIL: The postmaster has commanded this server process to roll back the current
Re: [GENERAL] Postgres DB crashing
On 6/18/2013 10:31 AM, bhanu udaya wrote: My PostgresSQL (9.2) is crashing after certain load tests. Currently, postgressql is crashing when simulatenously 800 to 1000 threads are run on a 10 million records schema. Not sure, if we have to tweak some more parameters of postgres. Currently, the postgressql is configured as below on a 7GB Ram on an Intel Xeon CPU E5507 2.27 GZ. Is this postgres limitation to support only 800 threads or any other configuration required. Please look at the log as below with errors. Please reply thats an insanely high number of connections on a quad core processor. in general, any more than 2-4X the number of cpu hardware threads in concurrent queries is counterproductive and will result in higher overhead and less throughput. if you have a real world use case for 800-1000 client threads and your workload is predominately short fast transactions (OLTP), you should use a connection pooler like pgbouncer and limit the number of active connections to something like 32 on your 4 core/8 thread CPU. -- john r pierce 37N 122W somewhere on the middle of the left coast -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general