Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

2005-10-07 Thread Alvaro Herrera
On Fri, Oct 07, 2005 at 11:19:25AM -0400, Jean-Pierre Pelletier wrote:

> Our only remaining PostgreSQL problem is with pg_stat_activity
> being unreliable and the statistics collector being restarted many times
> every day.

The stats collector (which maintains pg_stat_activity among other things)
uses a UDP socket to receive info from the backends, so if UDP
communication is crippled, it's going to be unreliable.  Maybe there are
too many lost packets.  I don't know what could cause it to die, though
-- certainly not lost packets.  (The postmaster restarts it
automatically if it detects it's not running.)
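To illustrate why a UDP channel can silently lose statistics: the sender only learns that the kernel accepted a datagram, never whether anyone received it. This is a minimal sketch using POSIX sockets on a Unix-like system, not the Winsock code the Windows port uses, and not PostgreSQL's pgstat.c; the StatMsg layout and all helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical message layout, for illustration only. */
typedef struct
{
	int			backend_pid;
	char		activity[64];
} StatMsg;

/* Open a UDP socket bound to an ephemeral loopback port; fill in its address. */
static int
open_collector_socket(struct sockaddr_in *addr)
{
	socklen_t	len = sizeof(*addr);
	int			sock = socket(AF_INET, SOCK_DGRAM, 0);

	if (sock < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr->sin_port = 0;			/* let the kernel pick a free port */
	if (bind(sock, (struct sockaddr *) addr, sizeof(*addr)) < 0 ||
		getsockname(sock, (struct sockaddr *) addr, &len) < 0)
	{
		close(sock);
		return -1;
	}
	return sock;
}

/*
 * Backend side: fire and forget.  Returns 0 if the kernel accepted the
 * datagram.  That says nothing about whether the collector received it.
 */
static int
send_stat_msg(int sock, const struct sockaddr_in *collector, const StatMsg *msg)
{
	ssize_t		n = sendto(sock, msg, sizeof(*msg), 0,
						   (const struct sockaddr *) collector,
						   sizeof(*collector));

	return n == (ssize_t) sizeof(*msg) ? 0 : -1;
}

/* Collector side: wait up to timeout_ms for one message; 1 if received, else 0. */
static int
recv_stat_msg(int sock, StatMsg *out, int timeout_ms)
{
	struct pollfd pfd = {.fd = sock, .events = POLLIN};

	if (poll(&pfd, 1, timeout_ms) <= 0)
		return 0;				/* timed out or failed: nothing arrived */
	return recv(sock, out, sizeof(*out), 0) == (ssize_t) sizeof(*out);
}
```

For example, bind a receiver with open_collector_socket(), send with send_stat_msg() from a second socket, close the receiver, and send again: on Linux the second send_stat_msg() still returns 0, because an unconnected UDP sender is not told that nobody received the datagram.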

-- 
Alvaro Herrera                 http://www.advogato.org/person/alvherre
"Everybody understands Mickey Mouse. Few understand Hermann Hesse.
Hardly anybody understands Einstein. And nobody understands Emperor Norton."

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

2005-10-07 Thread Jean-Pierre Pelletier

Turning off the antivirus fixed the problem.
We haven't had any read/write/open errors in more
than two days.

Thank you very much for your help and keep up the good work.

Our only remaining PostgreSQL problem is with pg_stat_activity
being unreliable and the statistics collector being restarted many times
every day.

Any idea what might be causing that?

Jean-Pierre Pelletier

----- Original Message ----- 
From: "Jean-Pierre Pelletier" <[EMAIL PROTECTED]>
To: "Qingqing Zhou" <[EMAIL PROTECTED]>
Cc: 
Sent: Wednesday, October 05, 2005 2:58 PM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

I'll recompile with the trace (that's no problem)
and install the patched release tonight.

After your last email, I excluded the PostgreSQL
directory from the antivirus, because I could do that without
rebooting.

I was also sometimes getting read/write or open
errors ("Invalid argument") without the server crashing.
If I haven't seen any of these error messages after two
days, there is a very high chance that the problem has
been fixed by turning off the antivirus.

Jean-Pierre Pelletier

----- Original Message ----- 
From: "Qingqing Zhou" <[EMAIL PROTECTED]>
To: 
Sent: Wednesday, October 05, 2005 5:16 PM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000


"Jean-Pierre Pelletier" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]


Yes, there is antivirus software on the machine, and a reboot is needed
when it's turned off. I'll be allowed to reboot it tonight, or I'll do it
sooner if it crashes before that.

There are around 15 connections to PostgreSQL when it crashes, but most
are idle. There may be a few inserts but no bulk inserts; the biggest load
would come from select statements.



We haven't determined whether the failed reads/writes are caused by the
anti-virus software or by intensive I/O. If you can compile the source, can
you patch smgrread()/smgrwrite() like this to capture the native Windows
error:

void
smgrwrite(SMgrRelation reln, BlockNumber blocknum, char *buffer, bool isTemp)
{
	if (!(*(smgrsw[reln->smgr_which].smgr_write)) (reln, blocknum, buffer,
												   isTemp))
	{
		/* capture the native Windows error before any other call resets it */
		DWORD		win32err = GetLastError();

		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not write block %u of relation %u/%u/%u: %m (Windows error %lu)",
						blocknum,
						reln->smgr_rnode.spcNode,
						reln->smgr_rnode.dbNode,
						reln->smgr_rnode.relNode,
						(unsigned long) win32err)));
	}
}
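The patch above is Windows-specific, but the same discipline applies on POSIX systems, where errno plays the role of GetLastError(): save the error code right after the failing call, before anything else (formatting, logging, allocation) can overwrite it. A minimal sketch, assuming a hypothetical write_block() helper modeled loosely on a block write; it is not PostgreSQL's md.c code, BLCKSZ is hard-coded to the default 8192, and treating a short write as "disk full" is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192				/* PostgreSQL's default block size */

/*
 * Hypothetical helper: write one block at offset blocknum * BLCKSZ.
 * Returns 0 on success, otherwise the error code captured immediately
 * after the failing call.
 */
static int
write_block(int fd, unsigned int blocknum, const char *buffer)
{
	ssize_t		n;

	assert(buffer != NULL);
	n = pwrite(fd, buffer, BLCKSZ, (off_t) blocknum * BLCKSZ);
	if (n == BLCKSZ)
		return 0;
	if (n >= 0)
		return ENOSPC;			/* assumption: a short write means the disk is full */
	return errno;				/* save it before any other call can change it */
}

/* Report a failure using the saved code, not whatever errno holds now. */
static void
report_write_failure(unsigned int blocknum, int saved_errno)
{
	fprintf(stderr, "could not write block %u: %s\n",
			blocknum, strerror(saved_errno));
}
```

Writing through a descriptor opened read-only, for instance, makes write_block() return EBADF, and report_write_failure() prints the message for that saved code even if later calls have changed errno in the meantime.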

Regards,
Qingqing


Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

2005-10-07 Thread Qingqing Zhou

"Jean-Pierre Pelletier" <[EMAIL PROTECTED]> wrote
> Turning off the antivirus fixed the problem.
> We haven't had any read/write/open errors in more
> than two days.
>
> Thank you very much for your help and keep up the good work.
>

You are welcome :-) But I still doubt that this really solves the problem.
By the way, may I ask which anti-virus software you are using? And, if
possible, could you turn the anti-virus software on again and check the
GetLastError() value?

A more detailed "guess" of the problem is here:
http://archives.postgresql.org/pgsql-hackers/2005-07/msg00489.php

Thanks a lot,
Qingqing 


