RE : RE : [HACKERS] Stability problems

2002-11-15 Thread Verger Nicolas
  Scott you're right, it was a hardware problem.
  Thanks for your help.
 
 
 Glad to be of help.  What was the problem?  Bad memory or bad hard drive?
 Just curious.

It was a bad 512 MB memory module and a bad memory slot on the
motherboard.
Our hosting provider never used to check memory, but from now on it will
run the test systematically.

Nicolas VERGER


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



RE : [HACKERS] Stability problems

2002-11-15 Thread Verger Nicolas
You're right, it was a hardware problem.
Thanks for your help.

Nicolas VERGER



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



RE : [HACKERS] Stability problems

2002-11-12 Thread Nicolas VERGER
Scott you're right, it was a hardware problem.
Thanks for your help.

Nicolas VERGER



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: RE : [HACKERS] Stability problems

2002-11-12 Thread scott.marlowe
On Tue, 12 Nov 2002, Nicolas VERGER wrote:

 Scott you're right, it was a hardware problem.
 Thanks for your help.
 

Glad to be of help.  What was the problem?  Bad memory or bad hard drive?  
Just curious.





Re: [HACKERS] Stability problems

2002-11-06 Thread Tom Lane
Nicolas VERGER [EMAIL PROTECTED] writes:
 2002-11-05 14:46:44 [5768]   FATAL 2:  failed to add item with len = 191
 to page 150 (free space 4294967096, nusd 0, noff 0)

 template1=# select version();
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96

Hmm.  This looks a lot like the bug I recently noted in vacuum's
free-space calculations --- but that bug only affects machines where
MAXALIGN > 4, which I would not expect for an Intel machine.  Anyway
you might try this patch:

*** pgsql-server/src/backend/commands/vacuum.c  2002/10/21 22:06:19 1.243
--- pgsql-server/src/backend/commands/vacuum.c  2002/10/31 19:25:29 1.244
***************
*** 1753,1759 ****
                      }
                      to_vacpage->free -= MAXALIGN(tlen);
                      if (to_vacpage->offsets_used >= to_vacpage->offsets_free)
!                         to_vacpage->free -= MAXALIGN(sizeof(ItemIdData));
                      (to_vacpage->offsets_used)++;
                      if (free_vtmove == 0)
                      {
--- 1753,1759 ----
                      }
                      to_vacpage->free -= MAXALIGN(tlen);
                      if (to_vacpage->offsets_used >= to_vacpage->offsets_free)
!                         to_vacpage->free -= sizeof(ItemIdData);
                      (to_vacpage->offsets_used)++;
                      if (free_vtmove == 0)
                      {

(Line numbers are for recent CVS tip and are off a little for 7.2, but
there's only one occurrence of MAXALIGN(sizeof(... in vacuum.c; you
can't miss it.)

While you are at it, be sure to update to 7.2.3.  There are some
*critical* bug fixes in 7.2.3.

regards, tom lane




Re: [HACKERS] Stability problems

2002-11-06 Thread scott.marlowe
I would recommend checking your memory (look for memtest86 online 
somewhere.  Good tool.)  Anytime a machine seems to act flaky there's a 
better than even chance it has a bad bit of memory in it.

On Wed, 6 Nov 2002, Nicolas VERGER wrote:

 Hi,
 I have strange stability problems.
 I can't access a table (the table is different each time I get the
 problem; it can be a system table (pg_am) or a user-defined one):
 I can't select * from the whole table, but I can select * with limit x
 offset y, so it appears that only one tuple is in a bad state. I can't
 vacuum or pg_dump this table either.
 The error disappears after waiting some time.
 
 I get the following error in the log when I select the 'bad' row:
 
 
 2002-11-05 11:26:42 [3062]   DEBUG:  server process (pid 4551) was
 terminated by signal 11
 2002-11-05 11:26:42 [3062]   DEBUG:  terminating any other active server
 processes
 2002-11-05 11:26:42 [4555]   FATAL 1:  The database system is in
 recovery mode
 2002-11-05 11:26:42 [3062]   DEBUG:  all server processes terminated;
 reinitializing shared memory and semaphores
 2002-11-05 11:26:42 [4557]   DEBUG:  database system was interrupted at
 2002-11-05 11:23:00 CET
 
 
 
 I get the following error in log when vacuuming the 'bad' table: 
 
 
 2002-11-05 14:46:44 [5768]   FATAL 2:  failed to add item with len = 191
 to page 150 (free space 4294967096, nusd 0, noff 0)
 2002-11-05 14:46:44 [5569]   DEBUG:  server process (pid 5768) exited
 with exit code 2
 2002-11-05 14:46:44 [5569]   DEBUG:  terminating any other active server
 processes
 2002-11-05 14:46:44 [5771]   NOTICE:  Message from PostgreSQL backend:
 The Postmaster has informed me that some other backend
 died abnormally and possibly corrupted shared memory.
 I have rolled back the current transaction and am
 going to terminate your database system connection and exit.
 Please reconnect to the database system and repeat your query.
 2002-11-05 14:46:44 [5772]   NOTICE:  Message from PostgreSQL backend:
 The Postmaster has informed me that some other backend
 died abnormally and possibly corrupted shared memory.
 I have rolled back the current transaction and am
 going to terminate your database system connection and exit.
 Please reconnect to the database system and repeat your query.
 2002-11-05 14:46:44 [5569]   DEBUG:  all server processes terminated;
 reinitializing shared memory and semaphores
 2002-11-05 14:46:44 [5774]   DEBUG:  database system was interrupted at
 2002-11-05 14:46:40 CET
 
 
 
 template1=# select version();
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
 
 Is it a lock problem? Is there a way to log it?
 
 
 Thanks to all for doing such a good job.
 
 Nicolas VERGER
 
 