Re: [GENERAL] Stats collector frozen?

2007-01-27 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 Attached patch seems to solve the problem on my machine at least. Uses
 pgwin32_waitforsinglesocket() instead of pgwin32_select(). Changes
 pgwin32_waitforsinglesocket() to accept the timeout as a parameter (this
 is why it touches files outside of the stats area).
 
 Applied to HEAD and 8.2 --- assuming the Windows buildfarm machines go
 green, we should probably consider back-porting this to 8.1 and 8.0.

Seems HEAD now passes thie check stage on at least snake and bandicoot.
They're now exposing another problem with ECPG though...

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Stats collector frozen?

2007-01-27 Thread Magnus Hagander
Jeremy Haile wrote:
 Using standard build (none of the things you mentioned) on 8.2.1
 currently.
 
 I really appreciate it!

Ok. I've built a binary for you at
http://www.hagander.net/download/pgsql_8.2.1_win32select.zip

Note that while I named the file 8.2.1 it really isn't - it's current
head of the REL_8_2_STABLE branch. Which means it has a few other fixes
as well.

I think it *should* work to just replace the postgres.exe file and
ignore the rest, so I didn't put them in the ZIP. But as always, make
sure you take a backup first :-)

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Stats collector frozen?

2007-01-27 Thread Jeremy Haile
Will do - thanks Magnus!  I'll test it for a while and post the results
here.  

Jeremy Haile

On Sat, 27 Jan 2007 17:21:23 +0100, Magnus Hagander
[EMAIL PROTECTED] said:
 Jeremy Haile wrote:
  Using standard build (none of the things you mentioned) on 8.2.1
  currently.
  
  I really appreciate it!
 
 Ok. I've built a binary for you at
 http://www.hagander.net/download/pgsql_8.2.1_win32select.zip
 
 Note that while I named the file 8.2.1 it really isn't - it's current
 head of the REL_8_2_STABLE branch. Which means it has a few other fixes
 as well.
 
 I think it *should* work to just replace the postgres.exe file and
 ignore the rest, so I didn't put them in the ZIP. But as always, make
 sure you take a backup first :-)
 
 //Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Stats collector frozen?

2007-01-27 Thread Jeremy Haile
So far, the patched binary seems to have fixed the statistics issue in
my environment.  pgstat.stat is now updating constantly,
pg_stat_user_tables is updating correctly, and autovacuum is running!

Thanks for your quick response to this issue!  Let me know if there is
anything else I can do to assist.

Jeremy Haile


On Sat, 27 Jan 2007 11:23:39 -0500, Jeremy Haile [EMAIL PROTECTED]
said:
 Will do - thanks Magnus!  I'll test it for a while and post the results
 here.  
 
 Jeremy Haile
 
 On Sat, 27 Jan 2007 17:21:23 +0100, Magnus Hagander
 [EMAIL PROTECTED] said:
  Jeremy Haile wrote:
   Using standard build (none of the things you mentioned) on 8.2.1
   currently.
   
   I really appreciate it!
  
  Ok. I've built a binary for you at
  http://www.hagander.net/download/pgsql_8.2.1_win32select.zip
  
  Note that while I named the file 8.2.1 it really isn't - it's current
  head of the REL_8_2_STABLE branch. Which means it has a few other fixes
  as well.
  
  I think it *should* work to just replace the postgres.exe file and
  ignore the rest, so I didn't put them in the ZIP. But as always, make
  sure you take a backup first :-)
  
  //Magnus
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Stats collector frozen?

2007-01-27 Thread Michael Meskes
On Sat, Jan 27, 2007 at 03:17:14PM +0100, Magnus Hagander wrote:
 Seems HEAD now passes thie check stage on at least snake and bandicoot.
 They're now exposing another problem with ECPG though...

This should be fixed now. I just missed to change MinGW expected files
when I changed the tests. That's why I prefer to have no platform
specific expected files.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Stats collector frozen?

2007-01-27 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 On Fri, Jan 26, 2007 at 09:55:39AM -0500, Tom Lane wrote:
 Keep in mind also that we have seen the stats-test failure on
 non-Windows machines, so we still need to explain that ...

 Yeah. But it *could* be two different stats issues lurking. Perhaps the
 issue we've seen on non-windows can be fixed by the settings Alvaro had
 me try (increasing autovacuum_vacuum_cost_delay or the delay in the
 regression test).

I had a sudden thought about that: the stats machinery is designed to be
non-reliable, ie, drop messages under load.  Maybe the occasional stats
failures we see are just an artifact of that happening.  It would be
pretty unfortunate if the stats test and autovacuum together were
sufficient load to cause message drops, but I doubt that's the
explanation.  I think the important change here has been the default
enablement of stats_row_level.  That means that some of the tests
terminating just before the stats test starts may still be trying to
dump statistics out to the collector at the same time the stats test is.
(Keep in mind that psql does not wait around for the backend to be
actually gone before it exits, hence backend-exit cleanup is very likely
to happen in parallel with the start of the next test.)  This idea
explains why we mostly see the failure in parallel tests not serial:
in the serial schedule there's no opportunity to have a gang of backends
all exiting at the critical time.

If this theory is correct, then we can improve the reliability of the
stats test a good deal if we put a sleep() at the *start* of the test,
to let any old backends get out of the way.  It seems worth a try
anyway.  I'll add this to HEAD and if the stats failure noise seems to
go down, we can back-port it.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Magnus Hagander
On Thu, Jan 25, 2007 at 04:29:58PM -0500, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Jeremy Haile wrote:
  If anyone else is experiencing similar problems, please post your
  situation.
 
  All the Windows buildfarm machines are, apparently.
 
 Can't anyone with a debugger duplicate this and get a stack trace for
 us?  If the stats collector is indeed freezing up, a stack trace showing
 where it's stuck would be exceedingly helpful.

Done some checking here. What happens is that suddenly the pgstats
socket stops receiving data. select() (pgstat.c line 1802) returns after
timeout, so got_data is always zero.

Interesting note: I just ran the serial regression tests, and they pass
fine. With the parallel tests, it always stops receiving data somewhere
during the first parallel group.

//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Magnus Hagander
On Thu, Jan 25, 2007 at 04:29:58PM -0500, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Jeremy Haile wrote:
  If anyone else is experiencing similar problems, please post your
  situation.
 
  All the Windows buildfarm machines are, apparently.
 
 Can't anyone with a debugger duplicate this and get a stack trace for
 us?  If the stats collector is indeed freezing up, a stack trace showing
 where it's stuck would be exceedingly helpful.

Another update - what broke it was not the enabling of autovacuum, it
was the enabling of row level stats. If I disable stats_row_level,
parallel tests pass again.

That doesn't actually tell us *why* it's broken, I think, but it does
tell us why the autovac patch caused it.

I don't have 8.1 or 8.2 around on win32, but it'd be interesting to see if the
same issue happens if you run the tests on that with stats_row_level
enabled. Most likely the same thing happens.

//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Magnus Hagander
On Thu, Jan 25, 2007 at 04:29:58PM -0500, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Jeremy Haile wrote:
  If anyone else is experiencing similar problems, please post your
  situation.
 
  All the Windows buildfarm machines are, apparently.
 
 Can't anyone with a debugger duplicate this and get a stack trace for
 us?  If the stats collector is indeed freezing up, a stack trace showing
 where it's stuck would be exceedingly helpful.

Maybe I should finish testing before I send my emails.

Apparantly there is a bug lurking somewhere in pgwin32_select(). Because
if I put a #undef select right before the select in pgstat.c, the
regression tests pass. 

I guess the bug is shown because with row level stats we simply have
more data to process. And it appears only to happen on UDP sockets from
what I can tell.

Now, what Iwould *like* to do is to re-implement that part of the code
using the Win32 APIs instead of going through select(). Since it's very
isolated code. I'm going to try that and see how invasive it is, then
see if it'll get accepted :-)

(This would of course give us better performance in general in that
codepath, since all the emulation stuff wouldn't be needed, so there's a
point to doing that other than finding the obscure UDP-related bug in
pgwin23_select)

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Teodor Sigaev

Apparantly there is a bug lurking somewhere in pgwin32_select(). Because
if I put a #undef select right before the select in pgstat.c, the
regression tests pass. 


May be, problem is related to fixed bug in pgwin32_waitforsinglesocket() ?
WaitForMultipleObjectsEx might sleep indefinitely while waiting socket to write, 
so, may be there is symmetrical problem with read? Or pgwin32_select() is used 
for waiting write too?




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Magnus Hagander
On Fri, Jan 26, 2007 at 01:11:00PM +0300, Teodor Sigaev wrote:
 Apparantly there is a bug lurking somewhere in pgwin32_select(). Because
 if I put a #undef select right before the select in pgstat.c, the
 regression tests pass. 
 
 May be, problem is related to fixed bug in pgwin32_waitforsinglesocket() ?
 WaitForMultipleObjectsEx might sleep indefinitely while waiting socket to 
 write, so, may be there is symmetrical problem with read? Or 
 pgwin32_select() is used for waiting write too?


pgwin32_waitforsinglesocket() appears to work fine. And we only use
FD_READ, so it's not affected by your patch from what I can tell.

I've got it passnig tests with select replaced with waitforsinglesocket
- now I just need to implement timeout in that one :-)

//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Magnus Hagander
On Fri, Jan 26, 2007 at 12:26:58PM +0100, Magnus Hagander wrote:
 On Fri, Jan 26, 2007 at 01:11:00PM +0300, Teodor Sigaev wrote:
  Apparantly there is a bug lurking somewhere in pgwin32_select(). Because
  if I put a #undef select right before the select in pgstat.c, the
  regression tests pass. 
  
  May be, problem is related to fixed bug in pgwin32_waitforsinglesocket() ?
  WaitForMultipleObjectsEx might sleep indefinitely while waiting socket to 
  write, so, may be there is symmetrical problem with read? Or 
  pgwin32_select() is used for waiting write too?
 
 
 pgwin32_waitforsinglesocket() appears to work fine. And we only use
 FD_READ, so it's not affected by your patch from what I can tell.
 
 I've got it passnig tests with select replaced with waitforsinglesocket
 - now I just need to implement timeout in that one :-)

Attached patch seems to solve the problem on my machine at least. Uses
pgwin32_waitforsinglesocket() instead of pgwin32_select(). Changes
pgwin32_waitforsinglesocket() to accept the timeout as a parameter (this
is why it touches files outside of the stats area).

//Magnus
Index: src/backend/libpq/be-secure.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/libpq/be-secure.c,v
retrieving revision 1.75
diff -c -r1.75 be-secure.c
*** src/backend/libpq/be-secure.c   5 Jan 2007 22:19:29 -   1.75
--- src/backend/libpq/be-secure.c   26 Jan 2007 11:32:07 -
***
*** 275,281 
  #ifdef WIN32

pgwin32_waitforsinglesocket(SSL_get_fd(port-ssl),

(err == SSL_ERROR_WANT_READ) ?
!  FD_READ | 
FD_CLOSE : FD_WRITE | FD_CLOSE);
  #endif
goto rloop;
case SSL_ERROR_SYSCALL:
--- 275,281 
  #ifdef WIN32

pgwin32_waitforsinglesocket(SSL_get_fd(port-ssl),

(err == SSL_ERROR_WANT_READ) ?
!  FD_READ | 
FD_CLOSE : FD_WRITE | FD_CLOSE, INFINITE);
  #endif
goto rloop;
case SSL_ERROR_SYSCALL:
***
*** 374,380 
  #ifdef WIN32

pgwin32_waitforsinglesocket(SSL_get_fd(port-ssl),

(err == SSL_ERROR_WANT_READ) ?
!  FD_READ | 
FD_CLOSE : FD_WRITE | FD_CLOSE);
  #endif
goto wloop;
case SSL_ERROR_SYSCALL:
--- 374,380 
  #ifdef WIN32

pgwin32_waitforsinglesocket(SSL_get_fd(port-ssl),

(err == SSL_ERROR_WANT_READ) ?
!  FD_READ | 
FD_CLOSE : FD_WRITE | FD_CLOSE, INFINITE);
  #endif
goto wloop;
case SSL_ERROR_SYSCALL:
***
*** 889,895 
  #ifdef WIN32

pgwin32_waitforsinglesocket(SSL_get_fd(port-ssl),

(err == SSL_ERROR_WANT_READ) ?
!  FD_READ | FD_CLOSE | FD_ACCEPT : 
FD_WRITE | FD_CLOSE);
  #endif
goto aloop;
case SSL_ERROR_SYSCALL:
--- 889,895 
  #ifdef WIN32

pgwin32_waitforsinglesocket(SSL_get_fd(port-ssl),

(err == SSL_ERROR_WANT_READ) ?
!  FD_READ | FD_CLOSE | FD_ACCEPT : 
FD_WRITE | FD_CLOSE, INFINITE);
  #endif
goto aloop;
case SSL_ERROR_SYSCALL:
Index: src/backend/port/win32/socket.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/port/win32/socket.c,v
retrieving revision 1.16
diff -c -r1.16 socket.c
*** src/backend/port/win32/socket.c 5 Jan 2007 22:19:35 -   1.16
--- src/backend/port/win32/socket.c 26 Jan 2007 11:35:16 -
***
*** 114,120 
  }

  int
! pgwin32_waitforsinglesocket(SOCKET s, int what)
  {
static HANDLE waitevent = INVALID_HANDLE_VALUE;
static SOCKET current_socket = -1;
--- 114,120 
  }

  int
! pgwin32_waitforsinglesocket(SOCKET s, int what, int timeout)
  {
static HANDLE waitevent = INVALID_HANDLE_VALUE;
static SOCKET current_socket = -1;
***
*** 

Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Jeremy Haile
   Apparantly there is a bug lurking somewhere in pgwin32_select(). Because
   if I put a #undef select right before the select in pgstat.c, the
   regression tests pass. 
   
   May be, problem is related to fixed bug in pgwin32_waitforsinglesocket() ?
   WaitForMultipleObjectsEx might sleep indefinitely while waiting socket to 
   write, so, may be there is symmetrical problem with read? Or 
   pgwin32_select() is used for waiting write too?
  
  
  pgwin32_waitforsinglesocket() appears to work fine. And we only use
  FD_READ, so it's not affected by your patch from what I can tell.
  
  I've got it passnig tests with select replaced with waitforsinglesocket
  - now I just need to implement timeout in that one :-)
 
 Attached patch seems to solve the problem on my machine at least. Uses
 pgwin32_waitforsinglesocket() instead of pgwin32_select(). Changes
 pgwin32_waitforsinglesocket() to accept the timeout as a parameter (this
 is why it touches files outside of the stats area).

Magnus - thanks for your investigation and work!  Any chance I could get
a patched exe for win32 and test on my servers?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Alvaro Herrera
Jeremy Haile wrote:

 Magnus - thanks for your investigation and work!  Any chance I could get
 a patched exe for win32 and test on my servers?

In the meantime, could you please confirm that turning row_level_stats
off makes the regression test pass?  And whatever tests you were doing?
(Note that by doing that, autovacuum won't work at all, because it needs
those stats.)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Apparantly there is a bug lurking somewhere in pgwin32_select(). Because
 if I put a #undef select right before the select in pgstat.c, the
 regression tests pass. 
 I guess the bug is shown because with row level stats we simply have
 more data to process. And it appears only to happen on UDP sockets from
 what I can tell.

Hmm ... if this theory is correct, then statistics collection has
never worked at all on Windows, at least not under more than the most
marginal load; and hence neither has autovacuum.

Does that conclusion agree with reality?  You'd think we'd have heard
a whole lot of complaints about it, not just Jeremy's; and I don't
remember it being a sore point.  (But then again I just woke up.)
What seems somewhat more likely is that we broke pgwin32_select
recently, in which case we oughta find out why.  Or else remove it
entirely (does your patch make that possible?).

Keep in mind also that we have seen the stats-test failure on
non-Windows machines, so we still need to explain that ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Magnus Hagander
On Fri, Jan 26, 2007 at 09:55:39AM -0500, Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  Apparantly there is a bug lurking somewhere in pgwin32_select(). Because
  if I put a #undef select right before the select in pgstat.c, the
  regression tests pass. 
  I guess the bug is shown because with row level stats we simply have
  more data to process. And it appears only to happen on UDP sockets from
  what I can tell.
 
 Hmm ... if this theory is correct, then statistics collection has
 never worked at all on Windows, at least not under more than the most
 marginal load; and hence neither has autovacuum.

We have had lots of reports of issues with the stats collector on
Windows. Some were definitly fixed by the patch by OT, but I don't
think all.
The thing is, since it didn't give any error messages at all, most users
wouldn't notice. Other than their tables getting bloated, in which case
they would do a manual vacuum and conlcude autovacuum wasn't good
enough. Or something.


 Does that conclusion agree with reality?  You'd think we'd have heard
 a whole lot of complaints about it, not just Jeremy's; and I don't
 remember it being a sore point.  (But then again I just woke up.)
 What seems somewhat more likely is that we broke pgwin32_select
 recently, in which case we oughta find out why.  Or else remove it
 entirely (does your patch make that possible?).

AFAIK, it only affects UDP connections, and this patch takes
pgwin32_select out of the loop for all UDP stuff.
But if we get this in, pgwin32_select is only used in the postmaster
accept-new-connections loop (from what I can tell by a quick look), so
I'd definitly want to rewrite that one as well to use a better way than
select-emulation. Then it could go away completely.


 Keep in mind also that we have seen the stats-test failure on
 non-Windows machines, so we still need to explain that ...

Yeah. But it *could* be two different stats issues lurking. Perhaps the
issue we've seen on non-windows can be fixed by the settings Alvaro had
me try (increasing autovacuum_vacuum_cost_delay or the delay in the
regression test).

//Magnus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Jeremy Haile
 We have had lots of reports of issues with the stats collector on
 Windows. Some were definitly fixed by the patch by OT, but I don't
 think all.

Here were a couple of other reports I found:
http://archives.postgresql.org/pgsql-hackers/2006-09/msg00415.php
http://archives.postgresql.org/pgsql-hackers/2006-04/msg00127.php


 The thing is, since it didn't give any error messages at all, most users
 wouldn't notice. Other than their tables getting bloated, in which case
 they would do a manual vacuum and conlcude autovacuum wasn't good
 enough. Or something.

This is indeed what I assumed at first.  I started running vacuum
analyze hourly and turned off autovacuum.  Later, I decided to try the
autovacuum route again and investigated why it wasn't working well.

Magnus - could you send me a patched exe to try in my environment? 
Would it be compatible with 8.2.1?

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Attached patch seems to solve the problem on my machine at least. Uses
 pgwin32_waitforsinglesocket() instead of pgwin32_select(). Changes
 pgwin32_waitforsinglesocket() to accept the timeout as a parameter (this
 is why it touches files outside of the stats area).

Applied to HEAD and 8.2 --- assuming the Windows buildfarm machines go
green, we should probably consider back-porting this to 8.1 and 8.0.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Jeremy Haile
 Applied to HEAD and 8.2 --- assuming the Windows buildfarm machines go
 green, we should probably consider back-porting this to 8.1 and 8.0.

Not trying to be a nuisance, but I'd really like to try this out in my
environment and see if my problems disappear.  Is there anyone out there
who could provide me with a patched exe for Win32?  If not, I could try
to get my system setup to build for Windows, but I'm not sure what all
that would involve.

Thanks,
Jeremy Haile

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Magnus Hagander
Jeremy Haile wrote:
 Applied to HEAD and 8.2 --- assuming the Windows buildfarm machines go
 green, we should probably consider back-porting this to 8.1 and 8.0.
 
 Not trying to be a nuisance, but I'd really like to try this out in my
 environment and see if my problems disappear.  Is there anyone out there
 who could provide me with a patched exe for Win32?  If not, I could try
 to get my system setup to build for Windows, but I'm not sure what all
 that would involve.

I'll see if I can build you something tomorrow. You're on 8.2, right? Do
you use any features like Kerberos, SSL or NLS? I don't think I have
them set up properly in my mingw build env, so it'd be easier if I could
build without them.

Unless beaten by someone who has a complete env ;-)

//Magnus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Jeremy Haile
Using standard build (none of the things you mentioned) on 8.2.1
currently.

I really appreciate it!


On Fri, 26 Jan 2007 21:24:09 +0100, Magnus Hagander
[EMAIL PROTECTED] said:
 Jeremy Haile wrote:
  Applied to HEAD and 8.2 --- assuming the Windows buildfarm machines go
  green, we should probably consider back-porting this to 8.1 and 8.0.
  
  Not trying to be a nuisance, but I'd really like to try this out in my
  environment and see if my problems disappear.  Is there anyone out there
  who could provide me with a patched exe for Win32?  If not, I could try
  to get my system setup to build for Windows, but I'm not sure what all
  that would involve.
 
 I'll see if I can build you something tomorrow. You're on 8.2, right? Do
 you use any features like Kerberos, SSL or NLS? I don't think I have
 them set up properly in my mingw build env, so it'd be easier if I could
 build without them.
 
 Unless beaten by someone who has a complete env ;-)
 
 //Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Jeremy Haile
Tom,

Did this information shed any light on what the problem might be?  Any
solution or workaround?

Thanks!
Jeremy Haile

On Wed, 24 Jan 2007 14:19:05 -0500, Jeremy Haile [EMAIL PROTECTED]
said:
 pgstat.stat was last updated 1/22 12:25pm - there is no pgstat.tmp.  
 
 Coincidentally (I think not) - the last auto-analyze was performed at
 2007-01-22 12:24:11.424-05.
 
 The logs for 1/22 are empty - so no errors or anything like that to give
 clues...
 
 Thanks!
 Jeremy Haile
 
 
 On Wed, 24 Jan 2007 14:00:52 -0500, Tom Lane [EMAIL PROTECTED] said:
  Jeremy Haile [EMAIL PROTECTED] writes:
   The stats collector appears to still be running, since I can see a
   postgres.exe process with -forkcol.  However, I never notice it using
   I/O or CPU usage.  Also, querying the pg_stat_user_tables view shows no
   change in table stats even for tables that change very frequently.
  
  Is $PGDATA/global/pgstat.stat getting updated?  (watch the file mod time
  at least)  Do you see any pgstat.tmp file in there?
  
  regards, tom lane
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 Did this information shed any light on what the problem might be?

It seems to buttress Magnus' theory that the intermittent (or not so
intermittent) stats-test buildfarm failures we've been seeing have to
do with the stats collector actually freezing up, rather than just
not reacting fast enough as most of us (or me anyway) thought.  But
why that is happening remains anyone's guess.  I don't suppose you
have debugging tools that would let you get a stack trace from the
collector process?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Jeremy Haile
Unfortunately I don't have any debugging tools installed that would work
against postgres - although I'd be glad to do something if you could
tell me the steps involved.  I can reproduce the issue quite easily on
two different Windows machines (one is XP, the other is 2003).

Please let me know if there is anything else I can do to help debug this
problem.  

Do you know of any workaround other than restarting the whole server? 
Can the collector be restarted individually? 

Thanks,
Jeremy Haile


On Thu, 25 Jan 2007 12:42:11 -0500, Tom Lane [EMAIL PROTECTED] said:
 Jeremy Haile [EMAIL PROTECTED] writes:
  Did this information shed any light on what the problem might be?
 
 It seems to buttress Magnus' theory that the intermittent (or not so
 intermittent) stats-test buildfarm failures we've been seeing have to
 do with the stats collector actually freezing up, rather than just
 not reacting fast enough as most of us (or me anyway) thought.  But
 why that is happening remains anyone's guess.  I don't suppose you
 have debugging tools that would let you get a stack trace from the
 collector process?
 
   regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 Unfortunately I don't have any debugging tools installed that would work
 against postgres - although I'd be glad to do something if you could
 tell me the steps involved.  I can reproduce the issue quite easily on
 two different Windows machines (one is XP, the other is 2003).

Sorry, I don't know anything about Windows debugging either.  Can you
put together a test case that would let one of the Windows-using hackers
reproduce it?

 Do you know of any workaround other than restarting the whole server? 
 Can the collector be restarted individually? 

On Unix you can just 'kill -TERM' the collector process and the postmaster
will start a new one without engaging in a database panic cycle.  Dunno
what the equivalent is on Windows but it's probably possible.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Magnus Hagander
Tom Lane wrote:
 Jeremy Haile [EMAIL PROTECTED] writes:
 Unfortunately I don't have any debugging tools installed that would work
 against postgres - although I'd be glad to do something if you could
 tell me the steps involved.  I can reproduce the issue quite easily on
 two different Windows machines (one is XP, the other is 2003).
 
 Sorry, I don't know anything about Windows debugging either.  Can you
 put together a test case that would let one of the Windows-using hackers
 reproduce it?

That would help a lot.


 Do you know of any workaround other than restarting the whole server? 
 Can the collector be restarted individually? 
 
 On Unix you can just 'kill -TERM' the collector process and the postmaster
 will start a new one without engaging in a database panic cycle.  Dunno
 what the equivalent is on Windows but it's probably possible.

You can use pg_ctl to send the int signal. If it's completely hung, that
may not work. In that case you can kill it from task manager, but that's
equiv of a kill -9, which means that the postmaster will restart all
backends.

//Magnus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Jeremy Haile [EMAIL PROTECTED] writes:
 Do you know of any workaround other than restarting the whole server? 
 Can the collector be restarted individually? 

 You can use pg_ctl to send the int signal. If it's completely hung, that
 may not work. In that case you can kill it from task manager, but that's
 equiv of a kill -9, which means that the postmaster will restart all
 backends.

No, the postmaster does not care how badly the stats collector croaks,
because the collector's not connected to shared memory, so there's no
risk of collateral damage.  It'll just start a new one without bothering
the backends.  So you can do whatever you have to do to kill the stuck
collector --- I was just not sure how to do that under Windows.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Jeremy Haile
 Then just pick it up in Task Manager or Process Explorer or whatever and
 kill it off. Just make sure you pick the right process.

I mentioned earlier that killing off the collector didn't work - however
I was wrong.  I just wasn't giving it enough time.  If I kill the
postgres.exe -forkcol process, it does gets restarted, although
sometimes it takes a minute.

Since it only seems to update pgstat.stat once after restarting, I'd
need to kill it once-a-minute to keep my statistics up to date =)  So,
unfortunately it's not a great workaround to my problem.


Jeremy Haile

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Jeremy Haile
I'll try to put together a test case for hackers, although I'm not sure
what exactly causes it.  

Basically, when I fire up PostgreSQL - after about a minute the stats
collector runs once (pgstat.stat is updated, autovacuum fires up, etc.)
- and then the collector seems to hang.  If I watch it's performance
information, it does not read or write to disk again and pgstat.stat is
never updated again.  It never updates pgstat.stat more than once after
restart.  There are no errors in the log

I tried killing the collector a variety of ways on Windows, but it seems
to terminate indefinitely.  I don't see a kill program for windows that
lets me specify the signal to use.  So other than restarting PostgreSQL,
I'm not sure how to workaround this problem.

If anyone else is experiencing similar problems, please post your
situation.

On Thu, 25 Jan 2007 12:51:31 -0500, Tom Lane [EMAIL PROTECTED] said:
 Jeremy Haile [EMAIL PROTECTED] writes:
  Unfortunately I don't have any debugging tools installed that would work
  against postgres - although I'd be glad to do something if you could
  tell me the steps involved.  I can reproduce the issue quite easily on
  two different Windows machines (one is XP, the other is 2003).
 
 Sorry, I don't know anything about Windows debugging either.  Can you
 put together a test case that would let one of the Windows-using hackers
 reproduce it?
 
  Do you know of any workaround other than restarting the whole server? 
  Can the collector be restarted individually? 
 
 On Unix you can just 'kill -TERM' the collector process and the
 postmaster
 will start a new one without engaging in a database panic cycle.  Dunno
 what the equivalent is on Windows but it's probably possible.
 
   regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Alvaro Herrera
Jeremy Haile wrote:
 I'll try to put together a test case for hackers, although I'm not sure
 what exactly causes it.  
 
 Basically, when I fire up PostgreSQL - after about a minute the stats
 collector runs once (pgstat.stat is updated, autovacuum fires up, etc.)
 - and then the collector seems to hang.  If I watch it's performance
 information, it does not read or write to disk again and pgstat.stat is
 never updated again.  It never updates pgstat.stat more than once after
 restart.  There are no errors in the log
 
 I tried killing the collector a variety of ways on Windows, but it seems
 to terminate indefinitely.  I don't see a kill program for windows that
 lets me specify the signal to use.  So other than restarting PostgreSQL,
 I'm not sure how to workaround this problem.
 
 If anyone else is experiencing similar problems, please post your
 situation.

All the Windows buildfarm machines are, apparently.

We verified this with Magnus.  He found that the tenk2 table does not
seem to get stat updates -- the numbers are all zero, at all times.  I
thought I had blogged about this ... oh yeah, it's here

http://www.advogato.org/person/alvherre/diary.html?start=11

AFAIR (Magnus can surely confirm) there were some other tables that
weren't showing stats as all zeros -- but there's no way to know whether
those numbers were put there before the collector had frozen (if
that's really what's happening).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Jeremy Haile wrote:
 If anyone else is experiencing similar problems, please post your
 situation.

 All the Windows buildfarm machines are, apparently.

Can't anyone with a debugger duplicate this and get a stack trace for
us?  If the stats collector is indeed freezing up, a stack trace showing
where it's stuck would be exceedingly helpful.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Jeremy Haile
 AFAIR (Magnus can surely confirm) there were some other tables that
 weren't showing stats as all zeros -- but there's no way to know whether
 those numbers were put there before the collector had frozen (if
 that's really what's happening).

Yeah - I have numbers that updated before the stats collector started
freezing.  Do you know which version of PG this started with?  I have
upgraded 8.1.3, 8.1.4, 8.2, and 8.2.1 in the past months and I didn't
have the collector enabled until 8.2.1 - so I'm not sure how long this
has been a problem.  

I might try rolling back to a previous version - it's either that or
setup a scheduled vacuum analyze until we figure out this problem.  I'm
having to manually run it every day now... =)  I think this is a pretty
critical problem since it cripples autovacuum on Windows.

Are you guys in a position to debug the collector process and see where
it is freezing (ala Tom's earlier comment)?  Anything I can do to help
debug this problem faster?

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Stats collector frozen?

2007-01-25 Thread Magnus Hagander
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 Jeremy Haile wrote:
 If anyone else is experiencing similar problems, please post your
 situation.
 
 All the Windows buildfarm machines are, apparently.
 
 Can't anyone with a debugger duplicate this and get a stack trace for
 us?  If the stats collector is indeed freezing up, a stack trace showing
 where it's stuck would be exceedingly helpful.

Must've been asleep when reading and writing in this thread. Didn't
realize it was the same issue as the  buildfarm-killer. Will do the
debugger+stacktrace tomorrow on my VC install.

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Stats collector frozen?

2007-01-24 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 The stats collector appears to still be running, since I can see a
 postgres.exe process with -forkcol.  However, I never notice it using
 I/O or CPU usage.  Also, querying the pg_stat_user_tables view shows no
 change in table stats even for tables that change very frequently.

Is $PGDATA/global/pgstat.stat getting updated?  (watch the file mod time
at least)  Do you see any pgstat.tmp file in there?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Stats collector frozen?

2007-01-24 Thread Jeremy Haile
pgstat.stat was last updated 1/22 12:25pm - there is no pgstat.tmp.  

Coincidentally (I think not) - the last auto-analyze was performed at
2007-01-22 12:24:11.424-05.

The logs for 1/22 are empty - so no errors or anything like that to give
clues...

Thanks!
Jeremy Haile


On Wed, 24 Jan 2007 14:00:52 -0500, Tom Lane [EMAIL PROTECTED] said:
 Jeremy Haile [EMAIL PROTECTED] writes:
  The stats collector appears to still be running, since I can see a
  postgres.exe process with -forkcol.  However, I never notice it using
  I/O or CPU usage.  Also, querying the pg_stat_user_tables view shows no
  change in table stats even for tables that change very frequently.
 
 Is $PGDATA/global/pgstat.stat getting updated?  (watch the file mod time
 at least)  Do you see any pgstat.tmp file in there?
 
   regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Stats collector frozen?

2007-01-24 Thread Jeremy Haile
Searching the archives, I found a couple of 2006 posts that seem
somewhat related to my problem (although I don't see any solutions
listed...):
http://archives.postgresql.org/pgsql-hackers/2006-09/msg00415.php
http://archives.postgresql.org/pgsql-hackers/2006-04/msg00127.php

Tom, since you were involved in these - did you ever figure out how to
resolve the issues of the stats collector getting stuck in Windows?

Thanks, Jeremy Haile

On Wed, 24 Jan 2007 14:19:05 -0500, Jeremy Haile [EMAIL PROTECTED]
said:
 pgstat.stat was last updated 1/22 12:25pm - there is no pgstat.tmp.  
 
 Coincidentally (I think not) - the last auto-analyze was performed at
 2007-01-22 12:24:11.424-05.
 
 The logs for 1/22 are empty - so no errors or anything like that to give
 clues...
 
 Thanks!
 Jeremy Haile
 
 
 On Wed, 24 Jan 2007 14:00:52 -0500, Tom Lane [EMAIL PROTECTED] said:
  Jeremy Haile [EMAIL PROTECTED] writes:
   The stats collector appears to still be running, since I can see a
   postgres.exe process with -forkcol.  However, I never notice it using
   I/O or CPU usage.  Also, querying the pg_stat_user_tables view shows no
   change in table stats even for tables that change very frequently.
  
  Is $PGDATA/global/pgstat.stat getting updated?  (watch the file mod time
  at least)  Do you see any pgstat.tmp file in there?
  
  regards, tom lane
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq