[HACKERS] pgstat wait timeout (RE: contrib/cache_scan)

2014-03-12 Thread Kouhei Kaigai
This is a separate topic from the main thread.

I noticed the following message while running the test case with a
heavy INSERT workload that Haribabu provided.

[kaigai@iwashi ~]$ createdb mytest
[kaigai@iwashi ~]$ psql -af ~/cache_scan.sql mytest
\timing
Timing is on.
--cache scan select 5 million
create table test(f1 int, f2 char(70), f3 float, f4 char(100));
CREATE TABLE
Time: 22.373 ms
truncate table test;
TRUNCATE TABLE
Time: 17.705 ms
insert into test values (generate_series(1,500), 'fujitsu', 1.1, 'Australia software tech pvt ltd');
WARNING:  pgstat wait timeout
WARNING:  pgstat wait timeout
WARNING:  pgstat wait timeout
WARNING:  pgstat wait timeout
   :

Once these messages appeared, write performance degraded dramatically,
although I have not investigated in detail.

I could reproduce it on the latest master branch without my
enhancement, so I guess the problem is not specific to my patch.
Another oddity is that, so far, the problem happens only
in my virtual machine environment (VMware ESXi 5.5.0).
I couldn't reproduce it in my physical environment
(Fedora 20, Core i5-4570S).
Any ideas?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
 Sent: Wednesday, March 12, 2014 3:26 PM
 To: Haribabu Kommi; Kohei KaiGai
 Cc: Tom Lane; PgHacker; Robert Haas
 Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for cache-only
 table scan?)
 
 Thanks for your efforts!
                    Head   Patched    Diff
  Select - 500K    772ms    2659ms   -200%
  Insert - 400K   3429ms    1948ms     43% (I am not sure how it improved in this case)
  delete - 200K   2066ms    3978ms    -92%
  update - 200K   3915ms    5899ms    -50%
 
  This patch shows how the custom scan can be used very well, but the
  patch as it stands has some performance problems that need to be
  investigated.
 
  I attached the test script file used for the performance test.
 
 First of all, it seems to me that your test data set is so small that
 all of the data fits in memory: 500K records of 200 bytes each
 consume only about 100MB. Your configuration allocates 512MB of
 shared_buffers, and about 3GB of OS-level page cache is available.
 (Note that Linux adaptively uses free memory as disk cache.)
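The size estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: it uses the row count, record width, and shared_buffers figure quoted in the thread, and it ignores tuple headers, page overhead, and alignment, so the real table is somewhat larger.

```python
# Rough estimate only: ignores tuple headers, page overhead, and alignment.
ROWS = 500_000            # row count of the SELECT test, from the thread
BYTES_PER_ROW = 200       # approximate record width quoted in the thread
SHARED_BUFFERS_MB = 512   # from the posted configuration


def table_size_mb(rows: int, bytes_per_row: int) -> float:
    """Raw row data in (decimal) megabytes."""
    return rows * bytes_per_row / 1_000_000


size_mb = table_size_mb(ROWS, BYTES_PER_ROW)
fits = size_mb < SHARED_BUFFERS_MB
print(f"~{size_mb:.0f} MB of row data; fits in shared_buffers: {fits}")
```

Even with generous overhead, the data set stays well below 512MB, which is the point being made: the test never has to touch the disk.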
 
 This cache is designed to hide the latency of disk accesses, so this test
 case does not fit its purpose.
 (Also, the primary purpose of this module is to demonstrate
 heap_page_prune_hook for hooking vacuuming, so simple code was preferred
 over a more complicated implementation with better performance.)
 
 I could reproduce the overall trend: a scan without the cache is faster
 than a cached scan if the buffers are in memory. This probably comes from
 the cost of walking down the T-tree index by ctid for each reference.
 The performance penalty on UPDATE and DELETE likely comes from the
 per-row trigger invocation.
 I could observe a small performance gain on INSERT.
 That is strange to me, too. :-(
 
 On the other hand, the discussion around the custom-plan interface affects
 this module, because it uses that API as its foundation.
 Please wait a few days for me to rebase the cache_scan module onto the
 newer custom-plan interface that I submitted just a moment ago.
 
 Also, is it really necessary to tune the performance stuff in this example
 module of the heap_page_prune_hook?
 Even though I have a few ideas to improve the cache performance, like
 insertion of multiple rows at once or local chunk copy instead of t-tree
 walk down, I'm not sure whether it is productive in the current v9.4
 timeframe. ;-(
 
 Thanks,
 --
 NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
 kai...@ak.jp.nec.com
 
 
  -Original Message-
  From: pgsql-hackers-ow...@postgresql.org
  [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Haribabu
  Kommi
  Sent: Wednesday, March 12, 2014 1:14 PM
  To: Kohei KaiGai
  Cc: Kaigai Kouhei(海外 浩平); Tom Lane; PgHacker; Robert Haas
  Subject: Re: contrib/cache_scan (Re: [HACKERS] What's needed for
  cache-only table scan?)
 
  On Thu, Mar 6, 2014 at 10:15 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
   2014-03-06 18:17 GMT+09:00 Haribabu Kommi kommi.harib...@gmail.com:
   I will update you later regarding the performance test results.
  
 
  I ran the performance test on the cache scan patch and below are the
 readings.
 
  Configuration:
 
  Shared_buffers - 512MB
  cache_scan.num_blocks - 600
  checkpoint_segments - 255
 
  Machine:
  OS - centos - 6.4
  CPU - 4 core 2.5 GHZ
  Memory - 4GB
 
                    Head   Patched    Diff
  Select - 500K    772ms    2659ms   -200%
  Insert - 400K   3429ms    1948ms     43% (I am not sure how it 

Re: [HACKERS] pgstat wait timeout (RE: contrib/cache_scan)

2014-03-12 Thread Tom Lane
Kouhei Kaigai kai...@ak.jp.nec.com writes:
 WARNING:  pgstat wait timeout
 WARNING:  pgstat wait timeout
 WARNING:  pgstat wait timeout
 WARNING:  pgstat wait timeout

 Once I got above messages, write performance is dramatically
 degraded, even though I didn't take detailed investigation.

 I could reproduce it on the latest master branch without my
 enhancement, so I guess it is not a problem something special
 to me.
 One other strangeness is, right now, this problem is only
 happen on my virtual machine environment - VMware ESXi 5.5.0.
 I couldn't reproduce the problem on my physical environment
 (Fedora20, core i5-4570S).

We've seen sporadic reports of that sort of behavior for years, but no
developer has ever been able to reproduce it reliably.  Now that you've
got a reproducible case, do you want to poke into it and see what's going
on?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgstat wait timeout (RE: contrib/cache_scan)

2014-03-12 Thread Tomas Vondra
On 12 March 2014, 14:54, Kouhei Kaigai wrote:
 It is another topic from the main thread,

 I noticed the following message under the test cases that
 takes heavy INSERT workload; provided by Haribabu.

 [kaigai@iwashi ~]$ createdb mytest
 [kaigai@iwashi ~]$ psql -af ~/cache_scan.sql mytest
 \timing
 Timing is on.
 --cache scan select 5 million
 create table test(f1 int, f2 char(70), f3 float, f4 char(100));
 CREATE TABLE
 Time: 22.373 ms
 truncate table test;
 TRUNCATE TABLE
 Time: 17.705 ms
 insert into test values (generate_series(1,500), 'fujitsu', 1.1,
 'Australia software tech pvt ltd');
 WARNING:  pgstat wait timeout
 WARNING:  pgstat wait timeout
 WARNING:  pgstat wait timeout
 WARNING:  pgstat wait timeout
:

 Once I got above messages, write performance is dramatically
 degraded, even though I didn't take detailed investigation.

 I could reproduce it on the latest master branch without my
 enhancement, so I guess it is not a problem something special
 to me.
 One other strangeness is, right now, this problem is only
 happen on my virtual machine environment - VMware ESXi 5.5.0.
 I couldn't reproduce the problem on my physical environment
 (Fedora20, core i5-4570S).
 Any ideas?

I've seen this happen in cases where it was impossible to write
the stat file for some reason. IIRC there were three basic causes I've seen
in the past:

(1) writing the stat copy failed - for example when the temporary stat
directory was placed on tmpfs, but it was too small

(2) writing the stat copy took too long - e.g. with tmpfs and memory
pressure, forcing the system to swap to free space for the stat copy

(3) IIRC the inquiry (backend -> stats collector) to write the file is sent
using UDP, which may be dropped in some cases (e.g. when the system is
overloaded), so the collector does not even know it should write the file

I'm not familiar with VMware ESXi virtualization, but I suppose it might
be relevant to all three causes.
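To make the three causes concrete, here is a minimal simulation of a waiter that sends a fire-and-forget request and then polls for a fresh file. It is a sketch under stated assumptions, not PostgreSQL's code: the function name, the 5-second budget, and the polling interval are all hypothetical, and the callbacks stand in for the UDP inquiry and the stat file timestamp.

```python
import time
from typing import Callable, Optional


def wait_for_fresh_stats(
    request_write: Callable[[], None],
    file_timestamp: Callable[[], Optional[float]],
    min_timestamp: float,
    max_wait: float = 5.0,       # hypothetical budget, not PostgreSQL's constant
    poll_interval: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Ask for a new stats file, then poll until it is fresh enough.

    Returns True if a file with timestamp >= min_timestamp showed up,
    False if the wait budget ran out (the "pgstat wait timeout" case).
    """
    request_write()  # fire-and-forget, like a UDP datagram: no delivery report
    waited = 0.0
    while waited < max_wait:
        ts = file_timestamp()
        if ts is not None and ts >= min_timestamp:
            return True
        sleep(poll_interval)
        waited += poll_interval
    return False


def no_sleep(_seconds: float) -> None:
    """Keep the demo instantaneous."""


# Healthy case: the request arrives and the file is rewritten promptly.
state = {"ts": 0.0}
ok = wait_for_fresh_stats(
    lambda: state.update(ts=10.0), lambda: state["ts"], 10.0, sleep=no_sleep
)

# Dropped request, failed write, or a write slower than the budget: from the
# waiter's side they all look the same, because the file never becomes fresh.
stale = {"ts": 0.0}
dropped = wait_for_fresh_stats(
    lambda: None, lambda: stale["ts"], 10.0, sleep=no_sleep
)
print(ok, dropped)
```

The waiter cannot tell the three causes apart; all it sees is a file that stays stale, which is why the warning by itself says little about the root cause.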

regards
Tomas





Re: [HACKERS] pgstat wait timeout (RE: contrib/cache_scan)

2014-03-12 Thread Jeff Janes
On Wed, Mar 12, 2014 at 7:42 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Kouhei Kaigai kai...@ak.jp.nec.com writes:
  WARNING:  pgstat wait timeout
  WARNING:  pgstat wait timeout
  WARNING:  pgstat wait timeout
  WARNING:  pgstat wait timeout

  Once I got above messages, write performance is dramatically
  degraded, even though I didn't take detailed investigation.

  I could reproduce it on the latest master branch without my
  enhancement, so I guess it is not a problem something special
  to me.
  One other strangeness is, right now, this problem is only
  happen on my virtual machine environment - VMware ESXi 5.5.0.
  I couldn't reproduce the problem on my physical environment
  (Fedora20, core i5-4570S).

 We've seen sporadic reports of that sort of behavior for years, but no
 developer has ever been able to reproduce it reliably.  Now that you've
 got a reproducible case, do you want to poke into it and see what's going
 on?


I didn't know we were trying to reproduce it, nor that it was a mystery.
 Do anything that causes serious IO constipation, and you will probably see
that message.  For example, turn off synchronous_commit and run the default
pgbench transaction at a large scale but that still comfortably fits in
RAM, and wait for a checkpoint sync phase to kick in.

The pgstat wait timeout is a symptom, not the cause.

Cheers,

Jeff


Re: [HACKERS] pgstat wait timeout (RE: contrib/cache_scan)

2014-03-12 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 On Wed, Mar 12, 2014 at 7:42 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 We've seen sporadic reports of that sort of behavior for years, but no
 developer has ever been able to reproduce it reliably.  Now that you've
 got a reproducible case, do you want to poke into it and see what's going
 on?

 I didn't know we were trying to reproduce it, nor that it was a mystery.
  Do anything that causes serious IO constipation, and you will probably see
 that message.

The cases that are a mystery to me are where there's no reason to believe
that I/O is particularly overloaded.  But perhaps Kaigai-san's example is
only that ...

regards, tom lane




Re: [HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-12 Thread Magnus Hagander
On Fri, May 11, 2012 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Thu, May 10, 2012 at 6:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh ... while hacking win32 PGSemaphoreLock I saw that it has a *seriously*
 nasty bug: it does not reset ImmediateInterruptOK before returning.
 How is it that Windows machines aren't falling over constantly?

 Hmm. the commit you made to fix it says it changes how
 ImmediateInterruptOK is handled, but there was not a single line of
 code that actually changed that? Or am I misreading this completely?

 Exit is now out the bottom of the loop, not by a raw return;.

oh, d'uh. Sorry, missed that one completely.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-11 Thread Magnus Hagander
On Thu, May 10, 2012 at 6:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Hence I think we oughta swap the order of those two array
 elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
 pgwin32_select.)

 Oh ... while hacking win32 PGSemaphoreLock I saw that it has a *seriously*
 nasty bug: it does not reset ImmediateInterruptOK before returning.
 How is it that Windows machines aren't falling over constantly?

Hmm. the commit you made to fix it says it changes how
ImmediateInterruptOK is handled, but there was not a single line of
code that actually changed that? Or am I misreading this completely?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-11 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, May 10, 2012 at 6:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh ... while hacking win32 PGSemaphoreLock I saw that it has a *seriously*
 nasty bug: it does not reset ImmediateInterruptOK before returning.
 How is it that Windows machines aren't falling over constantly?

 Hmm. the commit you made to fix it says it changes how
 ImmediateInterruptOK is handled, but there was not a single line of
 code that actually changed that? Or am I misreading this completely?

Exit is now out the bottom of the loop, not by a raw return;.

regards, tom lane



Re: [HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-10 Thread Tom Lane
I wrote:
 Last night I changed the stats collector process to use
 WaitLatchOrSocket instead of a periodic forced wakeup to see whether
 the postmaster has died.  This morning I observe that several Windows
 buildfarm members are showing regression test failures caused by
 unexpected pgstat wait timeout warnings.  Everybody else is fine.

 This suggests that there is something broken in the Windows
 implementation of WaitLatchOrSocket.  I wonder whether it also
 tells us something we did not know about the underlying cause of
 those messages.  Not sure what though.  Ideas?  Can anyone who
 knows Windows take another look at WaitLatchOrSocket?

Anybody have any clues about that?  If not, I think I'll have to revert
the pgstat changes for beta1, which isn't really forward progress.

I spent some time staring at the Windows WaitLatchOrSocket code myself.
The only thing I could find that seemed wrong is that in the event
array, we list the latch's event before pgwin32_signal_event.  The
Microsoft documentation I looked at says that if more than one event
is ready, WaitForMultipleObjects reports the first such array member.
This means that if the latch is already set when control gets here,
signal handlers will not be serviced.  That doesn't match what would
happen on a Unix machine, so it seems like at least a violation of the
POLA.  Hence I think we oughta swap the order of those two array
elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
pgwin32_select.)  I do not however see a way that that would explain the
pgstat failures, because the stats collector's latch really shouldn't
ever get set during normal regression test runs.
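The ordering concern can be illustrated with a toy model of a wait-any call that reports only the lowest ready index, which is the behavior described above for the Windows call. Everything here is a hypothetical sketch: the function names and the event labels "latch" and "signal" are made up for illustration and are not PostgreSQL or Win32 code.

```python
from typing import Optional, Sequence, Set


def first_signaled(ready: Sequence[bool]) -> Optional[int]:
    """Model of a wait-any call that reports only the lowest ready index."""
    for index, is_ready in enumerate(ready):
        if is_ready:
            return index
    return None


def serviced_event(order: Sequence[str], signaled: Set[str]) -> Optional[str]:
    """Which event does the waiter see, given the array order?"""
    ready = [name in signaled for name in order]
    index = first_signaled(ready)
    return None if index is None else order[index]


both = {"latch", "signal"}
# Latch listed first: an already-set latch hides the pending signal event.
print(serviced_event(["latch", "signal"], both))
# Signal listed first: signals are noticed even when the latch is set.
print(serviced_event(["signal", "latch"], both))
```

With the signal event first, a set latch can no longer starve signal handling, which is the motivation for swapping the two array elements.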

regards, tom lane



Re: [HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-10 Thread Magnus Hagander
On May 10, 2012 4:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  Last night I changed the stats collector process to use
  WaitLatchOrSocket instead of a periodic forced wakeup to see whether
  the postmaster has died.  This morning I observe that several Windows
  buildfarm members are showing regression test failures caused by
  unexpected pgstat wait timeout warnings.  Everybody else is fine.

  This suggests that there is something broken in the Windows
  implementation of WaitLatchOrSocket.  I wonder whether it also
  tells us something we did not know about the underlying cause of
  those messages.  Not sure what though.  Ideas?  Can anyone who
  knows Windows take another look at WaitLatchOrSocket?

 Anybody have any clues about that?  If not, I think I'll have to revert
 the pgstat changes for beta1, which isn't really forward progress.

Haven't had time to look at the code itself, and won't before wrap time.
Sorry.

 I spent some time staring at the Windows WaitLatchOrSocket code myself.
 The only thing I could find that seemed wrong is that in the event
 array, we list the latch's event before pgwin32_signal_event.  The
 Microsoft documentation I looked at says that if more than one event
 is ready, WaitForMultipleObjects reports the first such array member.
 This means that if the latch is already set when control gets here,
 signal handlers will not be serviced.

Yeah, that does seem wrong.

  That doesn't match what would
 happen on a Unix machine, so it seems like at least a violation of the
 POLA.  Hence I think we oughta swap the order of those two array
 elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
 pgwin32_select.)  I do not however

Maybe we need a loop that checks for all events?

 see a way that that would explain the
 pgstat failures, because the stats collector's latch really shouldn't
 ever get set during normal regression test runs.

So could there be something wrong in the other end, meaning the latch
*does* get set?

/Magnus


Re: [HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-10 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On May 10, 2012 4:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I spent some time staring at the Windows WaitLatchOrSocket code myself.
 The only thing I could find that seemed wrong is that in the event
 array, we list the latch's event before pgwin32_signal_event.  The
 Microsoft documentation I looked at says that if more than one event
 is ready, WaitForMultipleObjects reports the first such array member.
 This means that if the latch is already set when control gets here,
 signal handlers will not be serviced.

 Yeah, that does seem wrong.

 That doesn't match what would
 happen on a Unix machine, so it seems like at least a violation of the
 POLA.  Hence I think we oughta swap the order of those two array
 elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
 pgwin32_select.)  I do not however

 Maybe we need a loop that checks for all events?

I don't think so.  It's already the case that WaitLatch doesn't
guarantee that all possible flags are set in its result.  In connection
with Peter G's observation that we could simplify the API by rechecking
PostmasterIsAlive for WL_POSTMASTER_DEATH, I was planning to clarify
the API spec as "result bits that are set are guaranteed to reflect
reality, but it's not guaranteed that we set every bit that could
possibly be set".  This should not break any caller since the same
result could occur given a slight change in timing anyway; the caller
has to be prepared to come back and check for more conditions after it
services whatever WaitLatch does report.  However, signal service is
not a condition the caller is supposed to deal with, so I think we
want a guarantee that that happens inside WaitLatch.
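The contract described above (every reported bit is true, but not every true condition is reported) is safe as long as the caller loops. The following is a minimal sketch of such a caller, assuming a hypothetical wait function that reports only one pending condition per call. The flag names follow PostgreSQL's latch flags, but the bit values and both helper functions are illustrative assumptions.

```python
# Illustrative bit values; the numeric values are not taken from PostgreSQL.
WL_LATCH_SET = 1 << 0
WL_SOCKET_READABLE = 1 << 1
WL_POSTMASTER_DEATH = 1 << 2


def partial_wait(pending: int) -> int:
    """Hypothetical wait call: reports one true condition, never a false one."""
    return pending & -pending  # isolate the lowest set bit


def service_all(pending: int) -> list:
    """Caller loop: handle what was reported, then come back and wait again."""
    handled = []
    while pending:
        result = partial_wait(pending)
        handled.append(result)
        pending &= ~result  # servicing a condition clears it
    return handled


handled = service_all(WL_LATCH_SET | WL_SOCKET_READABLE)
print(handled)
```

No condition is lost even though each call under-reports, because the unreported one is still pending on the next pass through the loop.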

 see a way that that would explain the
 pgstat failures, because the stats collector's latch really shouldn't
 ever get set during normal regression test runs.

 So could there be something wrong in the other end, meaning the latch
 *does* get set?

Even if it did, it'd get cleared at the top of the loop, so that the
next call ought to handle things.  Tis a puzzlement.  AFAICS the only
condition WaitForMultipleObjects is going to see in these tests is
read-ready on the socket; surely it wouldn't fail to notice that?

regards, tom lane



Re: [HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-10 Thread Tom Lane
I wrote:
 Hence I think we oughta swap the order of those two array
 elements.  (Same issue in PGSemaphoreLock, btw, and I'm suspicious of
 pgwin32_select.)

Oh ... while hacking win32 PGSemaphoreLock I saw that it has a *seriously*
nasty bug: it does not reset ImmediateInterruptOK before returning.
How is it that Windows machines aren't falling over constantly?

regards, tom lane



[HACKERS] pgstat wait timeout just got a lot more common on Windows

2012-05-09 Thread Tom Lane
Last night I changed the stats collector process to use
WaitLatchOrSocket instead of a periodic forced wakeup to see whether
the postmaster has died.  This morning I observe that several Windows
buildfarm members are showing regression test failures caused by
unexpected pgstat wait timeout warnings.  Everybody else is fine.

This suggests that there is something broken in the Windows
implementation of WaitLatchOrSocket.  I wonder whether it also
tells us something we did not know about the underlying cause of
those messages.  Not sure what though.  Ideas?  Can anyone who
knows Windows take another look at WaitLatchOrSocket?

regards, tom lane



Re: [HACKERS] pgstat wait timeout

2012-01-31 Thread pratikchirania
Hi,

I disabled autovacuum and the warnings stopped appearing in the log.
After re-enabling autovacuum, the warnings started appearing again.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5444033.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] pgstat wait timeout

2012-01-23 Thread pratikchirania
Hi,

Any ideas on this?



Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread pratikchirania
I have installed a RAM disk and pointed the parameter at it:

#stats_temp_directory = 'B:\pg_stat_tmp'
I also tried #stats_temp_directory = 'B:/pg_stat_tmp'

But still no file is created on the RAM disk.
The previous stat file is still being touched even after the change. (I
restarted the service after making the change.)



Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread Tomas Vondra
On 4 January 2012, 13:17, pratikchirania wrote:
 I have installed RAMdisk and pointed the parameter:

 #stats_temp_directory = 'B:\pg_stat_tmp'
 I also tried #stats_temp_directory = 'B:/pg_stat_tmp'

 But, still there is no file created in the RAM disk.
 The previous stat file is touched even after the change is made. (I have
 restarted the service after effecting the change)

You have to remove the '#' at the beginning, this way it's commented out.
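A quick way to catch this mistake is to check whether a setting appears on an uncommented line. The helper below is a simplified sketch, not a full postgresql.conf parser: the function name is made up, and it assumes the `name = 'value'` form with no '#' inside the value.

```python
from typing import Optional


def active_setting(conf_text: str, name: str) -> Optional[str]:
    """Return the value of `name` if an uncommented line sets it, else None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplified)
        if not line:
            continue
        key, sep, rest = line.partition("=")
        if sep and key.strip() == name:
            value = rest.strip().strip("'")  # the last assignment wins
    return value


commented = "#stats_temp_directory = 'B:/pg_stat_tmp'\n"
enabled = "stats_temp_directory = 'B:/pg_stat_tmp'\n"

print(active_setting(commented, "stats_temp_directory"))
print(active_setting(enabled, "stats_temp_directory"))
```

For the commented-out line the helper finds no active value, so the server keeps using its default location, which matches what was observed.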

Tomas




Re: [HACKERS] pgstat wait timeout

2012-01-04 Thread pratikchirania
Thanks, I missed that.

After making this change, I observe the following:

1. The size of the pgstat file is 86KB. It was last modified when I moved the
file location to the RAM disk.
2. The issue persists. I am seeing continuous log messages:

2012-01-05 00:00:06 JST WARNING:  pgstat wait timeout
2012-01-05 00:00:14 JST WARNING:  pgstat wait timeout
2012-01-05 00:00:26 JST WARNING:  pgstat wait timeout
.
.
.
2012-01-05 15:36:25 JST WARNING:  pgstat wait timeout
2012-01-05 15:36:37 JST WARNING:  pgstat wait timeout
2012-01-05 15:36:45 JST WARNING:  pgstat wait timeout




Re: [HACKERS] pgstat wait timeout

2011-12-28 Thread Alvaro Herrera

Excerpts from Steve Crawford's message of Tue Dec 27 22:51:06 -0300 2011:
 I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
 currently in test/dev mode. I'm currently seeing the following messages
 occurring every few seconds:

 ...
 Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
 Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait timeout
 Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
 Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait timeout

Hm, so can you strace the stats collector to see what it's doing?  Maybe
grab a backtrace with GDB from it before anything else.

My guess is 27324 is the autovac launcher and the others are autovac
workers just as they die.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pgstat wait timeout

2011-12-28 Thread Steve Crawford

On 12/28/2011 05:05 AM, Alvaro Herrera wrote:

Excerpts from Steve Crawford's message of Tue Dec 27 22:51:06 -0300 2011:

I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
currently in test/dev mode. I'm currently seeing the following messages
occurring every few seconds:

...
Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait timeout

Hm, so can you strace the stats collector to see what it's doing?  Maybe
grab a backtrace with GDB from it before anything else.

My guess is 27324 is the autovac launcher and the others are autovac
workers just as they die.

You are correct. 27324 is the launcher and the others are autovac 
workers. Here's the strace of the stats collector process:


getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
rinse...lather...repeat...ad nauseam...

And the backtrace:

#0  0x7ff4d2e80f58 in poll () from /lib/libc.so.6
#1  0x7ff4d4e6f465 in ?? ()
#2  0x7ff4d4e6fd83 in pgstat_start ()
#3  0x7ff4d4e73475 in ?? ()
#4  <signal handler called>
#5  0x7ff4d2e85fd3 in select () from /lib/libc.so.6
#6  0x7ff4d4e71b93 in ?? ()
#7  0x7ff4d4e74b01 in PostmasterMain ()
#8  0x7ff4d4e193b3 in main ()

Cheers,
Steve




Re: [HACKERS] pgstat wait timeout

2011-12-28 Thread Alvaro Herrera

Excerpts from Steve Crawford's message of Wed Dec 28 13:24:37 -0300 2011:
 On 12/28/2011 05:05 AM, Alvaro Herrera wrote:
  Excerpts from Steve Crawford's message of Tue Dec 27 22:51:06 -0300 2011:
  I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
  currently in test/dev mode. I'm currently seeing the following messages
  occurring every few seconds:

  ...
  Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
  Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait timeout
  Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
  Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait timeout
  Hm, so can you strace the stats collector to see what it's doing?  Maybe
  grab a backtrace with GDB from it before anything else.
 
  My guess is 27324 is the autovac launcher and the others are autovac
  workers just as they die.
 
 You are correct. 27324 is the launcher and the others are autovac 
 workers. Here's the strace of the stats collector process:
 
 getppid()   = 27320
 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
 getppid()   = 27320
 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
 getppid()   = 27320
 poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
 rinse...lather...repeat...ad nauseam...

Weird ... even across more pgstat wait timeout messages?  It's like
it's not getting the inquiry messages that would tell it to write the
file ... something wrong with the UDP socket perhaps?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pgstat wait timeout

2011-12-28 Thread Steve Crawford

On 12/28/2011 09:34 AM, Alvaro Herrera wrote:

Excerpts from Steve Crawford's message of Wed Dec 28 13:24:37 -0300 2011:

On 12/28/2011 05:05 AM, Alvaro Herrera wrote:

Excerpts from Steve Crawford's message of Tue Dec 27 22:51:06 -0300 2011:

I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
currently in test/dev mode. I'm currently seeing the following messages
occurring every few seconds:

...
Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait timeout

Hm, so can you strace the stats collector to see what it's doing?  Maybe
grab a backtrace with GDB from it before anything else.

My guess is 27324 is the autovac launcher and the others are autovac
workers just as they die.


You are correct. 27324 is the launcher and the others are autovac
workers. Here's the strace of the stats collector process:

getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
getppid()   = 27320
poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
rinse...lather...repeat...ad nauseam...

Weird ... even across more pgstat wait timeout messages?  It's like
it's not getting the inquiry messages that would tell it to write the
file ... something wrong with the UDP socket perhaps?


Bingo!

postgres  27325 postgres    8u  IPv6 5379428      0t0  UDP localhost:47204->localhost:47204


In working on diagnosing a network timeout issue over an IPv4 to IPv4 
VPN I disabled IPv6 via sysctl on this machine and pretty much forgot 
about it since we are still IPv4 internally. But PostgreSQL had already 
established a (now non-functional) IPv6 local connection. Re-enabling 
IPv6, as it was not related to the VPN timeouts, corrected the pgstat 
wait timeout issue.


Cheers,
Steve
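
For anyone wanting to check the loopback path that Alvaro suspected, here
is a minimal Python sketch (not from the thread). It binds a UDP socket on
the loopback address of each address family, connects it to itself and
sends one byte. The function names are made up for illustration. A freshly
created socket may work even when a long-lived one, like the collector's
socket in Steve's case, has become unusable, so a passing result is only a
hint.

```python
import socket


def udp_loopback_ok(family, host, timeout=1.0):
    """Return True if a UDP socket bound to `host` can send a byte to itself."""
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.bind((host, 0))        # port 0: let the kernel pick a free port
            sock.connect(sock.getsockname())  # connect the socket to itself
            sock.send(b"x")
            return sock.recv(1) == b"x"
    except OSError:
        # Covers unsupported address families, bind failures and timeouts.
        return False


def check_loopback():
    """Probe both loopback address families and report which ones work."""
    return {
        "IPv4": udp_loopback_ok(socket.AF_INET, "127.0.0.1"),
        "IPv6": udp_loopback_ok(socket.AF_INET6, "::1"),
    }


if __name__ == "__main__":
    for name, ok in check_loopback().items():
        print(f"{name} UDP loopback: {'ok' if ok else 'FAILED'}")
```

If the IPv6 probe fails on a host where lsof shows the collector holding
an IPv6 UDP socket, that would be consistent with the situation described
above.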


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pgstat wait timeout

2011-12-27 Thread Steve Crawford
I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is 
currently in test/dev mode. I'm currently seeing the following messages 
occurring every few seconds:


...
Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING:  pgstat wait timeout
Dec 27 17:43:59 foo postgres[23697]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:04 foo postgres[27324]: [71402-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:09 foo postgres[23715]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:17 foo postgres[27324]: [71403-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:22 foo postgres[23716]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:27 foo postgres[27324]: [71404-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:33 foo postgres[23718]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:54 foo postgres[27324]: [71405-1] : WARNING:  pgstat wait timeout
Dec 27 17:44:59 foo postgres[23824]: [6-1] : WARNING:  pgstat wait timeout
Dec 27 17:45:04 foo postgres[27324]: [71406-1] : WARNING:  pgstat wait timeout


I can't correlate events exactly, but the messages seem to have started 
shortly after I dropped a pgbench user and database. My Googling turned 
up various requests for debugging info on hackers. Since the system 
isn't live, I haven't touched it in case anyone wants me to collect 
debugging info.


Otherwise, I plan on just blowing the install away and replacing it with 9.1

Cheers,
Steve




Re: [HACKERS] pgstat wait timeout

2011-12-20 Thread pratikchirania
> Would this be alleviated by setting stats_temp_directory to point to a ramdisk?

I don't know how to do this. I am using a Windows Server OS.
The conf file has the entry: #stats_temp_directory = 'pg_stat_tmp'

What do I change it to? Please elucidate.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5088497.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] pgstat wait timeout

2011-12-20 Thread Andrew Dunstan



On 12/20/2011 05:13 AM, pratikchirania wrote:
>> Would this be alleviated by setting stats_temp_directory to point to a ramdisk?
>
> I don't know how to do this. I am using a Windows Server OS.
> The conf file has the entry: #stats_temp_directory = 'pg_stat_tmp'
>
> What do I change it to? Please elucidate.



On Windows it appears you need third party software for a ramdisk. 
Search google for info.


cheers

andrew
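
To make the question above concrete: the commented-out entry has to be
uncommented and pointed at a directory on the RAM disk. The Python sketch
below (not from the thread) rewrites that one line in the text of a
postgresql.conf. The helper name and the R:/pg_stat_tmp path are
hypothetical, since the drive letter depends on whatever RAM disk software
is installed. The server still has to re-read its configuration
afterwards.

```python
import re


def set_conf_param(conf_text, name, value):
    """Return conf_text with `name` set to the single-quoted `value`.

    The first line that sets `name`, commented out or not, is replaced.
    If no such line exists, the setting is appended at the end.
    """
    new_line = f"{name} = '{value}'"
    pattern = re.compile(
        rf"^[ \t]*#?[ \t]*{re.escape(name)}[ \t]*=.*$", re.MULTILINE
    )
    if pattern.search(conf_text):
        # A function replacement keeps backslashes in `value` literal.
        return pattern.sub(lambda m: new_line, conf_text, count=1)
    sep = "" if (not conf_text or conf_text.endswith("\n")) else "\n"
    return conf_text + sep + new_line + "\n"


if __name__ == "__main__":
    before = "#stats_temp_directory = 'pg_stat_tmp'\n"
    # Hypothetical RAM disk mounted as drive R:
    print(set_conf_param(before, "stats_temp_directory", "R:/pg_stat_tmp"), end="")
```

Forward slashes are used in the example path to sidestep backslash
escaping inside the quoted value. A value that itself contains a single
quote would need extra handling that this sketch leaves out.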



Re: [HACKERS] pgstat wait timeout

2011-12-19 Thread pratikchirania
OS: I am using Windows server 2003

Version upgrade: hmm.. Is there any fix/change related to this issue in
9.0.6?
If yes, I will upgrade in next scheduled downtime (I am using this as
production server)...

Postgres queries are run only occasionally (a set of calls once every 30
minutes), so I guess I am not calling my DB component heavily.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5086379.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] pgstat wait timeout

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 10:02 AM, pratikchirania pratik.chira...@hp.com wrote:
 Version upgrade: hmm.. Is there any fix/change related to this issue in
 9.0.6?

You could read the release notes for those minor version upgrades.

Based on a quick look through the commit logs, and a quick grep of
release-9-0.sgml, I don't think so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgstat wait timeout

2011-12-19 Thread Andrew Dunstan



On 12/19/2011 11:45 AM, Robert Haas wrote:
> On Mon, Dec 19, 2011 at 10:02 AM, pratikchirania pratik.chira...@hp.com wrote:
>> Version upgrade: hmm.. Is there any fix/change related to this issue in
>> 9.0.6?
>
> You could read the release notes for those minor version upgrades.
>
> Based on a quick look through the commit logs, and a quick grep of
> release-9-0.sgml, I don't think so.

Would this be alleviated by setting stats_temp_directory to point to a ramdisk?

cheers

andrew



[HACKERS] pgstat wait timeout

2011-12-15 Thread pratikchirania
Hi,

I am having a scenario where I get consistent warnings in the pglog folder:

2011-12-11 00:00:03 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:14 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:24 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:31 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:44 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:52 JST WARNING:  pgstat wait timeout
2011-12-11 00:01:03 JST WARNING:  pgstat wait timeout
2011-12-11 00:01:11 JST WARNING:  pgstat wait timeout
.
.
.

This is impacting database performance.

The issue persists even when I use the database minimally.

I have tried fine-tuning the autovacuum parameters:

Changed autovacuum_vacuum_cost_delay to 40ms: the issue was reproduced
4 hours after this change.

Changed autovacuum_max_workers to 20: I got the warning message twice,
at 17:20:12 and 17:20:20. After this, no warnings for 5 hours. Then I tried:

Changed autovacuum_vacuum_cost_delay to 60ms and autovacuum_max_workers
to 10: I got the warning message twice, at 17:20:12 and 17:20:20. After
this, no warnings for 5 hours.

Any Ideas?
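
To put a number on "consistent", the gaps between the warnings above can
be computed with a short Python sketch (not from the thread). The regular
expression assumes the exact line layout shown in this message, that is
timestamp, time zone, then the warning. A different log_line_prefix would
need a different pattern.

```python
import re
from datetime import datetime

# Matches lines such as: 2011-12-11 00:00:03 JST WARNING:  pgstat wait timeout
STAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ WARNING:\s+pgstat wait timeout"
)


def warning_gaps(log_text):
    """Return the gaps, in whole seconds, between consecutive warnings."""
    times = []
    for line in log_text.splitlines():
        match = STAMP.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


LOG = """\
2011-12-11 00:00:03 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:14 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:24 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:31 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:44 JST WARNING:  pgstat wait timeout
2011-12-11 00:00:52 JST WARNING:  pgstat wait timeout
2011-12-11 00:01:03 JST WARNING:  pgstat wait timeout
2011-12-11 00:01:11 JST WARNING:  pgstat wait timeout
"""

if __name__ == "__main__":
    print(warning_gaps(LOG))  # prints [11, 10, 7, 13, 8, 11, 8]
```

For the eight lines quoted here the warnings arrive 7 to 13 seconds apart.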

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5078125.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] pgstat wait timeout

2011-12-15 Thread Tomas Vondra
On 15 December 2011, 17:55, pratikchirania wrote:
 Hi,

 I am having a scenario where I get consistent warnings in the pglog
 folder:

 2011-12-11 00:00:03 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:14 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:24 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:31 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:44 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:52 JST WARNING:  pgstat wait timeout
 2011-12-11 00:01:03 JST WARNING:  pgstat wait timeout
 2011-12-11 00:01:11 JST WARNING:  pgstat wait timeout

 This is impacting database performance.

It's rather a sign that the I/O is overloaded, although in some cases it
may actually be the cause.

 The issue persists even when I use the database minimally.

Yes, because the file is written periodically - twice a second IIRC. If
the file is large, this may be an issue. What is the size of pgstat.stat
(it should be in data/global)?

 I have tried fine-tuning the Auto-vacuum parameters:

Autovacuum has nothing to do with this.

 Any Ideas?

Move the file to a RAM drive - there's even a config parameter
'stats_temp_directory' to do that. See
http://www.postgresql.org/docs/9.1/static/runtime-config-statistics.html

Tomas




Re: [HACKERS] pgstat wait timeout

2011-12-15 Thread Magnus Hagander
On Thu, Dec 15, 2011 at 18:13, Tomas Vondra t...@fuzzy.cz wrote:
 On 15 December 2011, 17:55, pratikchirania wrote:
 Hi,

 I am having a scenario where I get consistent warnings in the pglog
 folder:

 2011-12-11 00:00:03 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:14 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:24 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:31 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:44 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:52 JST WARNING:  pgstat wait timeout
 2011-12-11 00:01:03 JST WARNING:  pgstat wait timeout
 2011-12-11 00:01:11 JST WARNING:  pgstat wait timeout

 This is impacting database performance.

 It's rather a sign that the I/O is overloaded, although in some cases it
 may actually be the cause.

 The issue persists even when I use the database minimally.

 Yes, because the file is written periodically - twice a second IIRC. If
 the file is large, this may be an issue. What is the pgstat.stat size
 (should be in data/global).

That was only true prior to 8.4. As of 8.4 it's only written when
necessary, which is usually a lot less than twice / second.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pgstat wait timeout

2011-12-15 Thread Tomas Vondra
On 15 December 2011, 18:19, Magnus Hagander wrote:
 On Thu, Dec 15, 2011 at 18:13, Tomas Vondra t...@fuzzy.cz wrote:
 On 15 December 2011, 17:55, pratikchirania wrote:
 Hi,

 I am having a scenario where I get consistent warnings in the pglog
 folder:

 2011-12-11 00:00:03 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:14 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:24 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:31 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:44 JST WARNING:  pgstat wait timeout
 2011-12-11 00:00:52 JST WARNING:  pgstat wait timeout
 2011-12-11 00:01:03 JST WARNING:  pgstat wait timeout
 2011-12-11 00:01:11 JST WARNING:  pgstat wait timeout

 This is impacting database performance.

 It's rather a sign that the I/O is overloaded, although in some cases it
 may actually be the cause.

 The issue persists even when I use the database minimally.

 Yes, because the file is written periodically - twice a second IIRC. If
 the file is large, this may be an issue. What is the pgstat.stat size
 (should be in data/global).

 That was only true prior to 8.4. As of 8.4 it's only written when
 necessary, which is usually a lot less than twice / second.

Thanks for the correction. Nevertheless, it would be useful to know how
large the file is and how heavy the I/O usage is.

Pratik, can you post the size of the pgstat.stat file and a few lines of
"iostat -x 1" output collected while the pgstat wait timeout happens?

Tomas
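
On a system without iostat, the first half of that request can still be
answered with a few lines of Python (a sketch, not from the thread). It
reports the size of a file and how long ago it was last written. The
latter also gives a rough idea of how often the file is rewritten. The
function name is made up, and the path is left as a parameter because the
location of the stats file depends on the installation.

```python
import os
import time


def stats_file_info(path, now=None):
    """Return the size in KB and the age in seconds of the file at `path`."""
    st = os.stat(path)
    now = time.time() if now is None else now
    return {
        "size_kb": st.st_size / 1024,
        "age_s": now - st.st_mtime,   # seconds since the last write
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        info = stats_file_info(sys.argv[1])
        print(f"{info['size_kb']:.1f} KB, last written {info['age_s']:.1f} s ago")
```

Running it repeatedly against the same file while the warnings appear
shows whether the file is being refreshed at all.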




Re: [HACKERS] pgstat wait timeout

2011-12-15 Thread pratikchirania
Size of pgstat.stat file: 86KB

I did not understand the second part. Where do I get the iostat -x 1 message?
(It's not present in any file in the pg_log folder)


I am using postgres 9.0.1

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5078391.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] pgstat wait timeout

2011-12-15 Thread Tomas Vondra
On 15 December 2011, 19:42, pratikchirania wrote:
 Size of pgstat.stat file: 86KB

That's pretty small.

 I did not understand the second part. Where do I get iostat -x 1
 message?
 (Its not present in any file in the pg_log folder)

iostat is not part of PostgreSQL, it's a tool used to display various I/O
metrics in Linux (and Unix in general). What OS are you using?

It seems the I/O subsystem is so busy it can't write the pgstat.stat on
time, so a warning is printed. You need to find out why the I/O is so
overutilized.

 I am using postgres 9.0.1

That's way too old. Upgrade to 9.0.6.

Tomas




Re: [HACKERS] pgstat wait timeout warnings

2011-08-11 Thread Bernd Helmle



--On 10. August 2011 21:54:06 +0300 Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:

> So my theory is that if the I/O is really busy, write() on the stats file
> blocks for more than 5 seconds, and you get the timeout.

I've seen it on customer instances with very high INSERT peak loads (several
dozen backends INSERTing/UPDATEing data concurrently). We have been using a
RAM disk for stats_temp_directory for a while now, and the timeout has never
occurred again.


--
Thanks

Bernd



Re: [HACKERS] pgstat wait timeout warnings

2011-08-11 Thread Andres Freund
On Thursday, August 11, 2011 11:49:12 Bernd Helmle wrote:
 --On 10. August 2011 21:54:06 +0300 Heikki Linnakangas
 
 heikki.linnakan...@enterprisedb.com wrote:
  So my theory is that if the I/O is really busy, write() on the stats
  file
  blocks for more than 5 seconds, and you get the timeout.
 
 I've seen it on customer instances with very high INSERT peak loads (several
 dozen backends INSERTing/UPDATEing data concurrently). We have been using a
 RAM disk for stats_temp_directory for a while now, and the timeout has never
 occurred again.

Yes, I have seen it several times as well. I can actually reproduce it without
much trouble, so if you have some idea to test...

I also routinely use stats_temp_directory + tmpfs to solve this (and related 
issues). I really think the stats file mechanism should be improved 
fundamentally.

Andres
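
When using this workaround on Linux it is easy to get the mount wrong and
end up with stats_temp_directory on ordinary disk after all. The Python
sketch below (not from the thread) looks up the filesystem type of a path
in text laid out like /proc/mounts. The function name and the example
paths are hypothetical, and mount points containing spaces or symlinked
paths are not handled.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing `path`.

    `mounts_text` is expected in /proc/mounts layout:
    device, mount point, filesystem type, options, ...
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


# Made-up mount table for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw 0 0
tmpfs /run tmpfs rw 0 0
tmpfs /var/lib/pgsql/stats_tmp tmpfs rw,size=64m 0 0
"""

if __name__ == "__main__":
    for p in ("/var/lib/pgsql/stats_tmp", "/var/lib/pgsql/data/pg_stat_tmp"):
        print(p, "->", fs_type_for(p, SAMPLE_MOUNTS))
```

On a real host the second argument would be the contents of /proc/mounts,
for example open("/proc/mounts").read().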



Re: [HACKERS] pgstat wait timeout warnings

2011-08-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 --On 10. August 2011 21:54:06 +0300 Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 So my theory is that if the I/O is really busy, write() on the stats file
 blocks for more than 5 seconds, and you get the timeout.

 Yes, I have seen it several times as well. I can actually reproduce it
 without much problems, so if you have some idea to test...

It doesn't surprise me that it's possible to reproduce it under extreme
I/O load.  What I am wondering about is whether there's some bug/effect
that allows it to happen without that.

regards, tom lane



Re: [HACKERS] pgstat wait timeout warnings

2011-08-11 Thread Robert Haas
On Thu, Aug 11, 2011 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@anarazel.de writes:
 --On 10. August 2011 21:54:06 +0300 Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 So my theory is that if the I/O is really busy, write() on the stats file
 blocks for more than 5 seconds, and you get the timeout.

 Yes, I have seen it several times as well. I can actually reproduce it
 without much problems, so if you have some idea to test...

 It doesn't surprise me that it's possible to reproduce it under extreme
 I/O load.  What I am wondering about is whether there's some bug/effect
 that allows it to happen without that.

I got it several times during a pgbench -i -s 5000 run this morning.
I guess that's a lot of I/O, but I'm not sure I'd refer to one process
filling a table with data as extreme.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] pgstat wait timeout warnings

2011-08-10 Thread Tom Lane
We occasionally see $SUBJECT in the buildfarm, and I've also recently
had reports of them from Red Hat customers.  The obvious theory is that
these reflect high load preventing the stats collector from responding,
but it would really take pretty crushing load to make that happen if
there were not anything funny going on.

It struck me just now while reviewing the latch code that pg_usleep
could sleep for less than the expected time if a signal happened, and
if that happened repeatedly for some reason, perhaps the loop could
complete in much less than the nominal time.  I'm not sure I believe
that idea either, but anyway I'm feeling motivated to try to gather more
data.

Does anyone have a problem with sticking a lot of debugging printout
into backend_read_statsfile() in HEAD only?  I'm envisioning it starting
to dump assorted information including elapsed time, errno values, etc
once the loop counter is more than halfway to expiration, which is
already a situation that we shouldn't see under normal conditions.

regards, tom lane
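
The short-sleep effect described above can be illustrated with a small
Python simulation (not from the thread, and not the actual
backend_read_statsfile() code). It compares a wait loop that gives up
after a fixed number of sleeps with one that gives up at a wall-clock
deadline. The 500 loops of 10 ms, the 5 second budget and the FakeClock
class are assumptions chosen for the demonstration. The fake sleep only
lasts a tenth of the requested time, as if a signal interrupted it every
time.

```python
class FakeClock:
    """Simulated clock whose sleep is always cut short, as if by a signal."""

    def __init__(self, fraction):
        self.now = 0.0
        self.fraction = fraction   # share of each requested sleep really slept

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds * self.fraction


def wait_by_count(is_ready, sleep, max_loops, delay):
    """Give up after max_loops sleeps, however much time really passed."""
    for _ in range(max_loops):
        if is_ready():
            return True
        sleep(delay)
    return False


def wait_by_deadline(is_ready, sleep, clock, timeout, delay):
    """Give up only once `timeout` seconds have elapsed on `clock`."""
    deadline = clock() + timeout
    while clock() < deadline:
        if is_ready():
            return True
        sleep(delay)
    return is_ready()


if __name__ == "__main__":
    never = lambda: False

    counted = FakeClock(0.1)
    wait_by_count(never, counted.sleep, max_loops=500, delay=0.01)
    print(f"counter loop gave up after {counted.now:.2f} s")

    timed = FakeClock(0.1)
    wait_by_deadline(never, timed.sleep, timed.time, timeout=5.0, delay=0.01)
    print(f"deadline loop gave up after {timed.now:.2f} s")
```

In this simulation the counter-based loop reports a timeout after about
half a second of simulated time, a tenth of its nominal five seconds,
while the deadline-based loop waits the full period.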



Re: [HACKERS] pgstat wait timeout warnings

2011-08-10 Thread Heikki Linnakangas

On 10.08.2011 21:45, Tom Lane wrote:
> We occasionally see $SUBJECT in the buildfarm, and I've also recently
> had reports of them from Red Hat customers.  The obvious theory is that
> these reflect high load preventing the stats collector from responding,
> but it would really take pretty crushing load to make that happen if
> there were not anything funny going on.
>
> It struck me just now while reviewing the latch code that pg_usleep
> could sleep for less than the expected time if a signal happened, and
> if that happened repeatedly for some reason, perhaps the loop could
> complete in much less than the nominal time.  I'm not sure I believe
> that idea either, but anyway I'm feeling motivated to try to gather more
> data.

I've also seen this on my laptop occasionally. The most recent case I
remember was when I COPYed a lot of data, so that the hard disk was
really busy. The system was a bit unresponsive anyway, because of all
the I/O happening.

So my theory is that if the I/O is really busy, write() on the stats
file blocks for more than 5 seconds, and you get the timeout.

> Does anyone have a problem with sticking a lot of debugging printout
> into backend_read_statsfile() in HEAD only?  I'm envisioning it starting
> to dump assorted information including elapsed time, errno values, etc
> once the loop counter is more than halfway to expiration, which is
> already a situation that we shouldn't see under normal conditions.

No objections here.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
