Re: [PATCHES] pgstat: remove delayed destroy / pipe: socket

2006-05-09 Thread Peter Brant
Yep, the pipe.c patch is unnecessary now.

Pete

>>> Bruce Momjian  05/07/06 3:44 am >>>
Now that we know the cause of the Win32 failure (FRONTEND), we don't
need the Win32 part of this patch anymore right?

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] pgstat: remove delayed destroy / pipe: socket error fix

2006-05-06 Thread Bruce Momjian

Now that we know the cause of the Win32 failure (FRONTEND), we don't
need the Win32 part of this patch anymore right?  (The stats display
part was already applied.)

---

Peter Brant wrote:
> Hi all,
> 
> Attached are two patches which in combination make pg_stat_activity
> work reliably for us on Windows.
> 
> The mysterious socket error turned out to be WSAEWOULDBLOCK.  Per
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/windows_sockets_error_codes_2.asp
> , it seems the thing to do is loop and try again.  pipe.patch does
> that.
> 
> pgstat.patch removes the delayed destroy code for backends, databases,
> and tables.  Database and table entries are cleaned up immediately upon
> receipt of the appropriate message.
> 
> Both patches were necessary to make pg_stat_activity work reliably. 
> With no changes, with a connection pool size of 31, under load, we'd
> typically see < 5 rows in pg_stat_activity.  With pgstat.patch applied,
> the number of rows would typically be between 15 and 20.  With
> pipe.patch also applied, the number of rows in pg_stat_activity was
> accurate.
> 
> The test server withstood an approximately four hour test stress test
> which replays captured Web traffic, but at full blast.  The machine was
> completely swamped, but there were no socket errors over the test run
> (compared to a frequency of once every couple minutes before).
> 
> The one remaining problem is that there seems to be a race condition
> when installing the temporary stats file on Windows.  As we were
> monitoring pg_stat_activity during the test run, occasionally we'd get a
> response with zero rows.  This may not be much of a problem during
> normal conditions (the server was completely overloaded and we were
> banging away with "Up Arrow", "Enter" watching pg_stat_activity).
> 
> What's the best way to do an atomic rename on Windows?  Alternatively,
> would it make sense to sleep and try again (up to some limit) when
> trying to open the stats file on Windows?
> 
> Pete
> 

[ Attachment, skipping... ]

[ Attachment, skipping... ]

> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] pgstat: remove delayed destroy / pipe: socket

2006-04-06 Thread Peter Brant
Also, do we want to move the retry loop to pgwin32_recv?  That seems
like a good idea.  I'm not sure users of recv should ever have to deal
with WSAEWOULDBLOCK as it's not really an error.

Pete

>>> "Magnus Hagander" <[EMAIL PROTECTED]> 04/06/06 9:58 pm >>>
> > Attached are two patches which in combination make pg_stat_activity

> > work reliably for us on Windows.
> > ...
> > pgstat.patch removes the delayed destroy code for backends, 
> databases, 
> > and tables.  Database and table entries are cleaned up immediately

> > upon receipt of the appropriate message.
> 
> I'll go ahead and apply the delayed-destroy-removal part 
> (just to HEAD for the time being --- seems a bit risky to 
> apply it to the stable branches).  The Windows-specific 
> change sounds like it may need more review.

Actually, I think that's mostly me being confused and taking others
with
me ;-)

It's definitly a problem, and we have a solution there. The one thing
I'm still a bit concerned about is: Do we need a CHECK_FOR_INTERRUPTS,
and do we need an upper limit on the spinning. In theory we can spin
with 100% CPU usage, which is not good. So we should either spin a
limited amount of times, or we should perhaps introduce a tiny delay?

//Magnus

---(end of
broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that
your
   message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] pgstat: remove delayed destroy / pipe: socket error fix

2006-04-06 Thread Magnus Hagander
> > Attached are two patches which in combination make pg_stat_activity 
> > work reliably for us on Windows.
> > ...
> > pgstat.patch removes the delayed destroy code for backends, 
> databases, 
> > and tables.  Database and table entries are cleaned up immediately 
> > upon receipt of the appropriate message.
> 
> I'll go ahead and apply the delayed-destroy-removal part 
> (just to HEAD for the time being --- seems a bit risky to 
> apply it to the stable branches).  The Windows-specific 
> change sounds like it may need more review.

Actually, I think that's mostly me being confused and taking others with
me ;-)

It's definitly a problem, and we have a solution there. The one thing
I'm still a bit concerned about is: Do we need a CHECK_FOR_INTERRUPTS,
and do we need an upper limit on the spinning. In theory we can spin
with 100% CPU usage, which is not good. So we should either spin a
limited amount of times, or we should perhaps introduce a tiny delay?

//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] pgstat: remove delayed destroy / pipe: socket error fix

2006-04-06 Thread Tom Lane
"Peter Brant" <[EMAIL PROTECTED]> writes:
> Attached are two patches which in combination make pg_stat_activity
> work reliably for us on Windows.
> ...
> pgstat.patch removes the delayed destroy code for backends, databases,
> and tables.  Database and table entries are cleaned up immediately upon
> receipt of the appropriate message.

I'll go ahead and apply the delayed-destroy-removal part (just to HEAD
for the time being --- seems a bit risky to apply it to the stable
branches).  The Windows-specific change sounds like it may need more
review.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] pgstat: remove delayed destroy / pipe: socket

2006-04-06 Thread Peter Brant
>>> "Magnus Hagander" <[EMAIL PROTECTED]> 04/06/06 7:26 pm >>>

> BTW, what's with the change to all the error msgs?
Ah, I'd assumed the %ui was a typo in the format string.  If the intent
was to print e.g. 10022i, I'll change it back.

> And finally, the error handling looks a bit off? We specifically
*don't*
> want it to log an error for the WSAECONNRESET state - it's a normal
> state. Or am I reading the patch wrong?
If error is WSAECONNRESET, the return value is reset to zero.  That
will prevent an error message from being displayed (unless I'm reading
something wrong).

> Do you get anything at all in the logs when this happens? Are you
sure
> the reason is that it picks up an empty file, or could it be
something
> else?
There is nothing in the logs.  It could definitely be something else. 
A rename race condition was just our best theory.

Pete  


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] pgstat: remove delayed destroy / pipe: socket

2006-04-06 Thread Kevin Grittner
>>> On Thu, Apr 6, 2006 at 12:26 pm, in message
<[EMAIL PROTECTED]>, "Magnus
Hagander"
<[EMAIL PROTECTED]> wrote: 
> And finally, the error handling looks a bit off? We specifically
*don't*
> want it to log an error for the WSAECONNRESET state -  it's a normal
> state. Or am I reading the patch wrong?

It looks to me like error is never referenced except when ret ==
SOCKET_ERROR.  When ret == SOCKET_ERROR and error == WSAEWOULDBLOCK it
loops back without generating an error message.

-Kevin


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] pgstat: remove delayed destroy / pipe: socket error fix

2006-04-06 Thread Magnus Hagander
> Oh, and checking the code go pgwin32_recv, I think I see 
> where this is coming from: pgwin32_recv calls 
> pgwin32_waitforsinglesocket(). If that one succeeds *and a 
> signal is delivered while it's waiting*, we'll get out og 
> pgwin32_waitforsinglesocket() without clearing the WSA code. 
> The rest of the pg code uses errno which is properly set to 
> EINTR, but the pipe code uses WSAGetLastError() directly. 
> 
> The fix for that is probably to add a 
> WSASetLastError(WSAEINTR)  right after the errno=EINTR in 
> pgwin32_waitforsinglesocket().
> 
> Does this seem right? Can you try by adding this, and then 
> backing out your pgstats patch, and see if this alone is enough?

Nevermind, this is not going to work. Looking closer at the stats code,
it shows that the stats code will retry based on what's in errno, not
WSAGetLastError(). I still think we need to set the WSA error as well,
but that alone won't fix it.

The question still remains - how did we get  WSAEWOULDBLOCK. It must be
WSARecv returning WSAEWOULDBLOCK even if pgwin32_waitforsinglesocket()
says it's available to be read from. 

Perhaps the loop needs to be in pgwin32_recv instead?

//Magnus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] pgstat: remove delayed destroy / pipe: socket error fix

2006-04-06 Thread Magnus Hagander
> Hi all,
> 
> Attached are two patches which in combination make 
> pg_stat_activity work reliably for us on Windows.
> 
> The mysterious socket error turned out to be WSAEWOULDBLOCK.  
> Per 
> http://msdn.microsoft.com/library/default.asp?url=/library/en-
> us/winsock/winsock/windows_sockets_error_codes_2.asp
> , it seems the thing to do is loop and try again.  pipe.patch 
> does that.

Good catch. I always knew there was something afoot in that code, but
never thought it was that simple ;-) I kinda got snowed in on the idea
that it had to do with inheritance :-(

Also, might we want a CHECK_FOR_INTERRUPTS in the loop? Since it's a
potential stuck-forever place.

Oh, and checking the code go pgwin32_recv, I think I see where this is
coming from: pgwin32_recv calls pgwin32_waitforsinglesocket(). If that
one succeeds *and a signal is delivered while it's waiting*, we'll get
out og pgwin32_waitforsinglesocket() without clearing the WSA code. The
rest of the pg code uses errno which is properly set to EINTR, but the
pipe code uses WSAGetLastError() directly. 

The fix for that is probably to add a WSASetLastError(WSAEINTR)  right
after the errno=EINTR in pgwin32_waitforsinglesocket().

Does this seem right? Can you try by adding this, and then backing out
your pgstats patch, and see if this alone is enough?


BTW, what's with the change to all the error msgs?

And finally, the error handling looks a bit off? We specifically *don't*
want it to log an error for the WSAECONNRESET state - it's a normal
state. Or am I reading the patch wrong?


> The one remaining problem is that there seems to be a race 
> condition when installing the temporary stats file on 
> Windows.  As we were monitoring pg_stat_activity during the 
> test run, occasionally we'd get a response with zero rows.  
> This may not be much of a problem during normal conditions 
> (the server was completely overloaded and we were banging 
> away with "Up Arrow", "Enter" watching pg_stat_activity).
> 
> What's the best way to do an atomic rename on Windows?  
> Alternatively, would it make sense to sleep and try again (up 
> to some limit) when trying to open the stats file on Windows?

rename() should be atomic. We do a special version on win32 (see
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/port/dirmod.c?rev=1
.42), but I don't see how that would change the atomicity of it.

Do you get anything at all in the logs when this happens? Are you sure
the reason is that it picks up an empty file, or could it be something
else?

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match