Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-05 Thread Jim Nasby

On Mar 2, 2007, at 4:14 PM, Larry Rosenman wrote:
Any guess on when we'd see an 8.2.4?  I have a business reason for  
asking.


If history is any guide, probably when a 'reasonably large fix' to  
something comes around, though reviewing early 8.1 release history  
might prove enlightening...

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-05 Thread Tom Lane
Jim Nasby [EMAIL PROTECTED] writes:
 On Mar 2, 2007, at 4:14 PM, Larry Rosenman wrote:
 Any guess on when we'd see an 8.2.4?  I have a business reason for  
 asking.

 If history is any guide, probably when a 'reasonably large fix' to  
 something comes around,

Well, what with Bruce on vacation for the next ten days, and tremendous
pressure to review various people's random 8.3 patches, you should not
expect to see any new back-branch releases in March unless the sky
starts falling on us.  I'm not real happy about that, because we've
accumulated several moderately significant bug fixes in the 8.2 branch
since 8.2.3, but I don't currently see a sufficient case for expending
a day on a new release...

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-02 Thread Darcy Buskermolen
On Thursday 01 March 2007 11:57, Tom Lane wrote:
 Darcy Buskermolen [EMAIL PROTECTED] writes:
  I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
  collector on an 8.2.3 box  investigation has lead me to belive that the
  stats file is written a lot more often that once every 500ms  the
  following shows this behavior.

 Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
 and 8.2 ... it's waiting 500 microseconds, not the intended 500
 milliseconds.

I can confirm that rev 1.140.2.3 of pgstat.c does fix this issue.

The stats collector CPU usage has dropped from inexcess of 95% to 5%

Thanks.


   regards, tom lane

 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

http://archives.postgresql.org

-- 


Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.
http://www.commandprompt.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-02 Thread Larry Rosenman

On Fri, 2 Mar 2007, Darcy Buskermolen wrote:


On Thursday 01 March 2007 11:57, Tom Lane wrote:

Darcy Buskermolen [EMAIL PROTECTED] writes:

I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
collector on an 8.2.3 box  investigation has lead me to belive that the
stats file is written a lot more often that once every 500ms  the
following shows this behavior.


Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
and 8.2 ... it's waiting 500 microseconds, not the intended 500
milliseconds.


I can confirm that rev 1.140.2.3 of pgstat.c does fix this issue.

The stats collector CPU usage has dropped from inexcess of 95% to 5%

Thanks.


Any guess on when we'd see an 8.2.4?  I have a business reason for asking.

Thanks!

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-01 Thread Darcy Buskermolen
On Wednesday 28 February 2007 14:31, Joshua D. Drake wrote:
 Joshua D. Drake wrote:
  Darcy Buskermolen wrote:
  I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
  collector on an 8.2.3 box  investigation has lead me to belive that the
  stats file is written a lot more often that once every 500ms  the
  following shows this behavior.

And just so that noone thinks it's linux specific, I tried the same test on a 
FreeBSD box with 8.2 on it and got the following.  Ignore the time portion of 
the results, but what is important is the number of calls occuring in the 
timeframe (given this test took about 38:30, it should have resulted in about 
4620 calls to rename)

% time seconds  usecs/call  calls  err syscall
-- --- --- - - 
149.231170797349.233781   46415  2519188 2write
 57.71 452768432.575458  2036010019   71515   20238  poll
 17.59 138017435.049569  -1944030463  20766getppid
-136.71  -1072628999.112811   -1254412922  20317rename
-136.71  -1072629000.340448   -1254412960  20317close
-136.71  -1072629001.605166   -1254412996  20317setitimer
-146.81  -1151850347.801977   -86150023020316 open
-146.81  -1151850349.967150   -86150032020316 fstat
-226.75  -1779034289.1666093 -63926395950831 recvfrom
-- --- --- - - 

real38m29.897s
user0m40.641s
sys 0m47.487s





 I have just done a test separate from darcy on my workstation:


 [EMAIL PROTECTED]:~/82$ strace -c -p 16130
 Process 16130 attached - interrupt to quit
 Process 16130 detached
 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  50.000.167103   3 60264 10225 poll
  28.310.094600   9 10398   rename
  16.890.056457   5 10398   open
   2.480.008278   0 50023   recvfrom
   1.140.003804   0 10398   setitimer
   0.530.001781   0 20796   write
   0.430.001432   0 10398   close
   0.210.000690   0 10398   munmap
   0.020.57   0 10398   mmap
   0.000.00   0 10398   fstat
   0.000.00   0 10398 10225 rt_sigreturn
   0.000.00   0 10414   getppid
 -- --- --- - - 
 100.000.334202224681 20450 total


 Query:

 time for i in `bin/psql -p8000 -d postgres -c select
 generate_series(1,1)`; do bin/psql -p8000 -d postgres -qc select 1
 from pg_database where datname = 'postgres'; done;

 Time:

 real2m5.077s
 user0m28.414s
 sys 0m39.762s


 PostgreSQL 8.2.0, Ubuntu Edgy 64bit.

 Seems like something is extremely wonky here.

 Joshua D. Drake

  PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
  3.4.6 20060404 (Red Hat 3.4.6-3)
 
  I ran a
  time for i in `psql -qtc select generate_series(1,1)`; do psql 
  -qtc select 1 from test where msg_id=$i ; done
 
  which took
  real1m23.288s
  user0m24.142s
  sys 0m21.536s
 
 
  to execute, during which time I ran a strace on the stats collector
  which produces the following output.  From this it looks like the stats
  file is getting rewritten for each connection teardown, not just every
  500ms.
 
  Process 10061 attached - interrupt to quit
  Process 10061 detached
  % time seconds  usecs/call callserrors syscall
  -- --- --- - - 
   68.14   28.811963  17   1663827   write
   18.227.701885 123 62808 12793 poll
   11.314.783082 365 13101   rename
0.580.246169   5 50006   recvfrom
0.570.241073  18 13101   open
0.430.182816  14 13101   munmap
0.180.076176   6 13101   mmap
0.170.072746   6 13101   close
0.140.060483   5 13101   setitimer
0.100.041344   3 13101 12793 rt_sigreturn
0.090.039240   3 13101   fstat
0.060.024041   2 13110   getppid
  -- --- --- - - 
  100.00   42.281018   1894559 25586 total
 
  As you can see rename was called more than the theroitcal 167  times for
  500ms slices that elapsed during the test
 
  Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC
  gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 

Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-01 Thread Tom Lane
Darcy Buskermolen [EMAIL PROTECTED] writes:
 I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
 collector on an 8.2.3 box  investigation has lead me to belive that the
 stats file is written a lot more often that once every 500ms  the
 following shows this behavior.

Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
and 8.2 ... it's waiting 500 microseconds, not the intended 500
milliseconds.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-01 Thread Andrew Dunstan

Tom Lane wrote:

Darcy Buskermolen [EMAIL PROTECTED] writes:
  

I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
collector on an 8.2.3 box  investigation has lead me to belive that the
stats file is written a lot more often that once every 500ms  the
following shows this behavior.



Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
and 8.2 ... it's waiting 500 microseconds, not the intended 500
milliseconds.
  


Good catch. I am also a bit dubious about this code:

   input_fd.fd = pgStatSock;
   input_fd.events = POLLIN | POLLERR;
   input_fd.revents = 0;

   if (poll(input_fd, 1, PGSTAT_SELECT_TIMEOUT * 1000)  0)
   {
   if (errno == EINTR)
   continue;
   ereport(ERROR,
   (errcode_for_socket_access(),
errmsg(poll() failed in statistics collector: %m)));
   }

   got_data = (input_fd.revents != 0);


AIUI you are not supposed to put POLLERR in the events field. We should 
probably be setting POLLIN | POLLPRI,  and we should also probably check 
exactly what event was returned in revents.


cheers

andrew




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-01 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Good catch. I am also a bit dubious about this code:

 input_fd.fd = pgStatSock;
 input_fd.events = POLLIN | POLLERR;
 input_fd.revents = 0;

Hm.  The Single Unix Spec saith that POLLERR is ignored in the events
field, but it is not clear to me that older systems might not treat it
as a condition bit.  For instance on HPUX the poll man page says only

 The condition flags POLLERR, POLLHUP, and POLLNVAL are always set in
 revents if the conditions they indicate are true for the specified
 file descriptor, whether or not these flags are set in events.

 AIUI you are not supposed to put POLLERR in the events field. We should 
 probably be setting POLLIN | POLLPRI,  and we should also probably check 
 exactly what event was returned in revents.

We don't need to check what was returned because the action is the same
either way: do a recv().  I'm not seeing the point of setting POLLPRI.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-01 Thread Andrew Dunstan

Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
  

Good catch. I am also a bit dubious about this code:



  

input_fd.fd = pgStatSock;
input_fd.events = POLLIN | POLLERR;
input_fd.revents = 0;



Hm.  The Single Unix Spec saith that POLLERR is ignored in the events
field, but it is not clear to me that older systems might not treat it
as a condition bit.  For instance on HPUX the poll man page says only

 The condition flags POLLERR, POLLHUP, and POLLNVAL are always set in
 revents if the conditions they indicate are true for the specified
 file descriptor, whether or not these flags are set in events.
  



yeah.

http://www.opengroup.org/onlinepubs/009695399/functions/poll.html says:

An error has occurred on the device or stream. This flag is only valid 
in the /revents/ bitmask; it shall be ignored in the /events/ member.


  
AIUI you are not supposed to put POLLERR in the events field. We should 
probably be setting POLLIN | POLLPRI,  and we should also probably check 
exactly what event was returned in revents.



We don't need to check what was returned because the action is the same
either way: do a recv().  I'm not seeing the point of setting POLLPRI.


  


Maybe none in this case - I guess we're not sending priority data.

This might all be OK - I just noticed it as I was looking at the 
problem, so I though I'd mention it. I have no doubt your fix is the 
correct one, though.



cheers

andrew


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-03-01 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 This might all be OK - I just noticed it as I was looking at the 
 problem, so I though I'd mention it.

I'm inclined to leave that code alone unless someone can point to a
platform where setting POLLERR in events actually causes a problem.
The pgstat code was modeled on libpq, which has been passing POLLERR
ever since its poll() support was written (for 7.4), so I think we
have lots of evidence that this way works everywhere, and none that
the other one does.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-02-28 Thread Joshua D. Drake
Darcy Buskermolen wrote:
 I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector 
 on an 8.2.3 box  investigation has lead me to belive that the stats file is 
 written a lot more often that once every 500ms  the following shows this 
 behavior.
 
 PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 
 20060404 (Red Hat 3.4.6-3)
 
 I ran a 
 time for i in `psql -qtc select generate_series(1,1)`; do psql  -qtc 
 select 1 from test where msg_id=$i ; done
 
 which took
 real1m23.288s
 user0m24.142s
 sys 0m21.536s
 
 
 to execute, during which time I ran a strace on the stats collector which 
 produces the following output.  From this it looks like the stats file is 
 getting rewritten for each connection teardown, not just every 500ms.
 
 Process 10061 attached - interrupt to quit
 Process 10061 detached
 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  68.14   28.811963  17   1663827   write
  18.227.701885 123 62808 12793 poll
  11.314.783082 365 13101   rename
   0.580.246169   5 50006   recvfrom
   0.570.241073  18 13101   open
   0.430.182816  14 13101   munmap
   0.180.076176   6 13101   mmap
   0.170.072746   6 13101   close
   0.140.060483   5 13101   setitimer
   0.100.041344   3 13101 12793 rt_sigreturn
   0.090.039240   3 13101   fstat
   0.060.024041   2 13110   getppid
 -- --- --- - - 
 100.00   42.281018   1894559 25586 total
 
 As you can see rename was called more than the theroitcal 167  times for 
 500ms 
 slices that elapsed during the test
 
 Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC 
 gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) 
 [All be it this is slower hardware..]
 
 time for i in `psql -qtc select generate_series(1,1)`; do psql  -qtc 
 select 1 from test where msg_id=$i ; done
 
 which took
 real9m25.380s
 user6m51.254s
 sys 1m47.687s
 (and therefor should be about 1130 stat write cycles)
 
 and yielded the following strace
 
 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  93.64   20.422006 334 61212   select
   3.490.760963   7110192   read
   1.820.396654  19 21128   write
   0.640.139679 126  1112   rename
   0.270.057970  52  1112   open
   0.060.012177  11  1112   munmap
   0.040.008901   8  1112   mmap
   0.030.006402   6  1112   close
   0.020.004282   4  1112   fstat
 -- --- --- - - 
 100.00   21.809034199204   total
 
 
 
 During this run the stats collector does not even show and CPU usage 
 according 
 to top.
 
 
 both 8.1 and 8.2 have the following postgresql.conf parameters
 
 stats_command_string =  off
 stats_start_collector = on
 stats_block_level = on
 stats_row_level = on
 
 
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Possible Bug: high CPU usage for stats collector in 8.2

2007-02-28 Thread Joshua D. Drake
Joshua D. Drake wrote:
 Darcy Buskermolen wrote:
 I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats 
 collector 
 on an 8.2.3 box  investigation has lead me to belive that the stats file is 
 written a lot more often that once every 500ms  the following shows this 
 behavior.

I have just done a test separate from darcy on my workstation:


[EMAIL PROTECTED]:~/82$ strace -c -p 16130
Process 16130 attached - interrupt to quit
Process 16130 detached
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 50.000.167103   3 60264 10225 poll
 28.310.094600   9 10398   rename
 16.890.056457   5 10398   open
  2.480.008278   0 50023   recvfrom
  1.140.003804   0 10398   setitimer
  0.530.001781   0 20796   write
  0.430.001432   0 10398   close
  0.210.000690   0 10398   munmap
  0.020.57   0 10398   mmap
  0.000.00   0 10398   fstat
  0.000.00   0 10398 10225 rt_sigreturn
  0.000.00   0 10414   getppid
-- --- --- - - 
100.000.334202224681 20450 total


Query:

time for i in `bin/psql -p8000 -d postgres -c select
generate_series(1,1)`; do bin/psql -p8000 -d postgres -qc select 1
from pg_database where datname = 'postgres'; done;

Time:

real2m5.077s
user0m28.414s
sys 0m39.762s


PostgreSQL 8.2.0, Ubuntu Edgy 64bit.

Seems like something is extremely wonky here.

Joshua D. Drake







 PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 
 20060404 (Red Hat 3.4.6-3)

 I ran a 
 time for i in `psql -qtc select generate_series(1,1)`; do psql  -qtc 
 select 1 from test where msg_id=$i ; done

 which took
 real1m23.288s
 user0m24.142s
 sys 0m21.536s


 to execute, during which time I ran a strace on the stats collector which 
 produces the following output.  From this it looks like the stats file is 
 getting rewritten for each connection teardown, not just every 500ms.

 Process 10061 attached - interrupt to quit
 Process 10061 detached
 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  68.14   28.811963  17   1663827   write
  18.227.701885 123 62808 12793 poll
  11.314.783082 365 13101   rename
   0.580.246169   5 50006   recvfrom
   0.570.241073  18 13101   open
   0.430.182816  14 13101   munmap
   0.180.076176   6 13101   mmap
   0.170.072746   6 13101   close
   0.140.060483   5 13101   setitimer
   0.100.041344   3 13101 12793 rt_sigreturn
   0.090.039240   3 13101   fstat
   0.060.024041   2 13110   getppid
 -- --- --- - - 
 100.00   42.281018   1894559 25586 total

 As you can see rename was called more than the theroitcal 167  times for 
 500ms 
 slices that elapsed during the test

 Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC 
 gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) 
 [All be it this is slower hardware..]

 time for i in `psql -qtc select generate_series(1,1)`; do psql  -qtc 
 select 1 from test where msg_id=$i ; done

 which took
 real9m25.380s
 user6m51.254s
 sys 1m47.687s
 (and therefor should be about 1130 stat write cycles)

 and yielded the following strace

 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  93.64   20.422006 334 61212   select
   3.490.760963   7110192   read
   1.820.396654  19 21128   write
   0.640.139679 126  1112   rename
   0.270.057970  52  1112   open
   0.060.012177  11  1112   munmap
   0.040.008901   8  1112   mmap
   0.030.006402   6  1112   close
   0.020.004282   4  1112   fstat
 -- --- --- - - 
 100.00   21.809034199204   total



 During this run the stats collector does not even show and CPU usage 
 according 
 to top.


 both 8.1 and 8.2 have the following postgresql.conf parameters

 stats_command_string =  off
 stats_start_collector = on
 stats_block_level = on
 stats_row_level = on



 
 


-- 

  === The PostgreSQL