Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-29 Thread Anders K. Pedersen
scott.marlowe wrote:
Just to add to the clutch here, also check your bdflush settings (if 
you're on linux) or equivalent (if you're not.)

Many times the swapping algo in linux can be quite bursty if you have it 
set to move too many pages at a time during cleanup / flush.
According to vmstat it doesn't swap near the spikes, so I don't think 
this is the problem. I posted a vmstat sample in another reply, where 
you can see an example of what happens.

Regards,
Anders K. Pedersen
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Anders K. Pedersen
Hello,

We're running a set of Half-Life based game servers that lookup user 
privileges from a central PostgreSQL 7.3.4 database server (I recently 
ported the MySQL code in Adminmod to PostgreSQL to be able to do this).

The data needed by the game servers are combined from several different 
tables, so we have some views set up to provide the data in the format 
needed.

Currently there's only a few users in the database for testing purposes, 
and most of the time the user lookup's take 2-3 ms (I have syslog'ing of 
queries and duration turned on), but several times per hour the duration 
for one of the queries is 2-3 seconds (1000 times larger), while the 
surrounding lookups take the usual 2-3 ms.

This is rather critical, as the game server software isn't asynchonous 
and thus waits for a reply before continuing, so when someone connects, 
and the user lookup happens to have one of these very long durations, 
the players on this server experience a major lag spike, which isn't 
very popular :-(

All the game servers and the database server are connected to the same 
switch, so I don't think, that it is a network problem.

So far I've been unable to locate the problem, so any suggestions are 
very welcome.

Regards,
Anders K. Pedersen
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Rod Taylor
 Currently there's only a few users in the database for testing purposes, 
 and most of the time the user lookup's take 2-3 ms (I have syslog'ing of 
 queries and duration turned on), but several times per hour the duration 
 for one of the queries is 2-3 seconds (1000 times larger), while the 
 surrounding lookups take the usual 2-3 ms.

Are there any other jobs running at the time of these excessive queries?


signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Shridhar Daithankar
On 28 Aug 2003 at 1:07, Anders K. Pedersen wrote:

 Hello,
 
 We're running a set of Half-Life based game servers that lookup user 
 privileges from a central PostgreSQL 7.3.4 database server (I recently 
 ported the MySQL code in Adminmod to PostgreSQL to be able to do this).
 
 The data needed by the game servers are combined from several different 
 tables, so we have some views set up to provide the data in the format 
 needed.
 
 Currently there's only a few users in the database for testing purposes, 
 and most of the time the user lookup's take 2-3 ms (I have syslog'ing of 
 queries and duration turned on), but several times per hour the duration 
 for one of the queries is 2-3 seconds (1000 times larger), while the 
 surrounding lookups take the usual 2-3 ms.

Check vmstat during the same period if it is syncing at that point as Tom 
suggested.

Are you using pooled connections? If yes you could shorten life of a connection 
and force making a new connection every 10-15 minutes say. That would avoid IO 
avelanche at the end of the hour types.

HTH.

Bye
 Shridhar

--
ignorance, n.:  When you don't know anything, and someone else finds out.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Russell Garrett
We have a somewhat similar situation - we're running a fairly constant, but
low priority, background load of about 70 selects and 40 inserts per second
(batched into fairly large transactions), and on top of that we're trying to
run time-sensitive queries for a web site (well two). I should emphasize
that this is low low priority - if a query is delayed by an hour here, it
doesn't matter.

The web site queries will jump up one or two orders of magnitude (I have
seen a normally 100ms query take in excess of 30 seconds) in duration at
seemingly random points. It's not always when the transactions are
committing, and it doesn't seem to be during checkpointing either. The same
thing happens with WAL switched off. It appears to happen the first time the
query runs after a while. If I run the same query immediately afterwards, it
will take the normal amount of time.

Any ideas?

Cheers,

Russ Garrett

[EMAIL PROTECTED] wrote:
 Subject: [PERFORM] Queries sometimes take 1000 times the normal time


 Hello,

 We're running a set of Half-Life based game servers that lookup user
 privileges from a central PostgreSQL 7.3.4 database server (I recently
 ported the MySQL code in Adminmod to PostgreSQL to be able to do
 this).

 The data needed by the game servers are combined from several
 different tables, so we have some views set up to provide the data in
 the format needed.

 Currently there's only a few users in the database for testing
 purposes, and most of the time the user lookup's take 2-3 ms (I have
 syslog'ing of queries and duration turned on), but several times per
 hour the duration for one of the queries is 2-3 seconds (1000 times
 larger), while the surrounding lookups take the usual 2-3 ms.

 This is rather critical, as the game server software isn't asynchonous
 and thus waits for a reply before continuing, so when someone
 connects, and the user lookup happens to have one of these very long
 durations, the players on this server experience a major lag spike,
 which isn't very popular :-(

 All the game servers and the database server are connected to the same
 switch, so I don't think, that it is a network problem.

 So far I've been unable to locate the problem, so any suggestions are
 very welcome.

 Regards,
 Anders K. Pedersen


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Shridhar Daithankar
On 28 Aug 2003 at 10:02, Russell Garrett wrote:

 The web site queries will jump up one or two orders of magnitude (I have
 seen a normally 100ms query take in excess of 30 seconds) in duration at
 seemingly random points. It's not always when the transactions are
 committing, and it doesn't seem to be during checkpointing either. The same
 thing happens with WAL switched off. It appears to happen the first time the
 query runs after a while. If I run the same query immediately afterwards, it
 will take the normal amount of time.

Looks like it got flushed out of every type of cache and IO scheduler could not 
deliver immediately because of other loads...


Bye
 Shridhar

--
Abstainer, n.:  A weak person who yields to the temptation of denying himself a 
pleasure.   -- Ambrose Bierce, The Devil's Dictionary


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Russell Garrett
 The web site queries will jump up one or two orders of magnitude (I
 have seen a normally 100ms query take in excess of 30 seconds) in
 duration at seemingly random points. It's not always when the
 transactions are committing, and it doesn't seem to be during
 checkpointing either. The same thing happens with WAL switched off.
 It appears to happen the first time the query runs after a while. If
 I run the same query immediately afterwards, it will take the normal
 amount of time.

 Looks like it got flushed out of every type of cache and IO scheduler
 could not deliver immediately because of other loads...

Yeah, I wasn't sure what (or how) Postgres caches. The db server does have
2Gb of memory, but then again the database amounts to more than 2Gb, so it's
fairly possible it's getting pushed out of cache. It's also fairly possible
that it's not tuned completely optimally. I wonder if FreeBSD/kernel 2.6
would perform better in such a situation...

Russ


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Anders K. Pedersen
Shridhar Daithankar wrote:
On 28 Aug 2003 at 1:07, Anders K. Pedersen wrote:
We're running a set of Half-Life based game servers that lookup user 
privileges from a central PostgreSQL 7.3.4 database server (I recently 
ported the MySQL code in Adminmod to PostgreSQL to be able to do this).

The data needed by the game servers are combined from several different 
tables, so we have some views set up to provide the data in the format 
needed.

Currently there's only a few users in the database for testing purposes, 
and most of the time the user lookup's take 2-3 ms (I have syslog'ing of 
queries and duration turned on), but several times per hour the duration 
for one of the queries is 2-3 seconds (1000 times larger), while the 
surrounding lookups take the usual 2-3 ms.


Check vmstat during the same period if it is syncing at that point as Tom 
suggested.
I've been running a vmstat 1 logging process for a while now, and the 
sample below shows what happende around one of these spikes - at 
18:18:03 specifically (actually there were two 1 second long queries, 
that finished at 18:18:03).

Thu Aug 28 18:17:53 2003 0  0  0  40904   4568  22288 404352   0   0 
12 0  181   362  2   1  97
Thu Aug 28 18:17:54 2003 0  0  0  40904   4580  22260 404380   0   0 
128 0  205   330  2   3  95
Thu Aug 28 18:17:55 2003 0  0  2  40904   4576  22264 404380   0   0 
 0   284  224   127  0   1  99
Thu Aug 28 18:17:56 2003 0  0  2  40904   5008  22268 404512   0   0 
128   728  571   492  2   3  95
Thu Aug 28 18:17:57 2003 0  0  1  40904   5000  22276 404512   0   0 
 0   120  201   181  1   0  99
Thu Aug 28 18:17:58 2003 0  0  1  40904   4936  22284 404528   0   0 
 8 0 1147  2204 12   3  85
Thu Aug 28 18:17:59 2003 0  0  0  40904   4784  22304 404660   0   0 
148 0 2112  3420  2   3  95
Thu Aug 28 18:18:00 2003 1  1  3  40904   4760  22324 404664   0   0 
20   456 2374  3277  2   1  97
Thu Aug 28 18:18:01 2003 0  2 10  40904   4436  22000 401456   0   0 
144   540  510   457 11   6  83
Thu Aug 28 18:18:02 2003 1  1  2  40904   8336  22032 401512   0   0 
68   676 1830  2540  4   3  93
Thu Aug 28 18:18:04 2003 1  0  1  40904   8160  22052 401664   0   0 
140   220 2308  3253  2   3  95
Thu Aug 28 18:18:05 2003 0  0  1  40904   7748  22064 402064   0   0 
288 0 1941  2856  1   3  96
Thu Aug 28 18:18:06 2003 0  0  3  40904   6704  22064 403100   0   0 
496   992 2326  3510  0   5  95
Thu Aug 28 18:18:07 2003 1  0  0  40904   6324  22088 402716   0   0 
260   188 1984  2927 11   4  85
Thu Aug 28 18:18:08 2003 0  0  0  40904   6920  22088 402828   0   0 
72 0  419  1473 17   5  78
Thu Aug 28 18:18:09 2003 0  0  0  40904   6784  22088 402964   0   0 
128 0  235   476  2   1  97
Thu Aug 28 18:18:10 2003 0  0  1  40904   6404  22088 402980   0   0 
 0 0  343   855 14   2  84

As this shows, some disk I/O and an increased amount of interrupts and 
context switches is taking place at this time, and this also happens at 
the same time as all the other long queries I examined. However, vmstat 
also shows this pattern at a lot of other times, where the queries 
aren't affected by it.

Are you using pooled connections? If yes you could shorten life of a connection 
and force making a new connection every 10-15 minutes say. That would avoid IO 
avelanche at the end of the hour types.
I'm not quite sure, what you mean by pooled connections. Each game 
server has one connection to the PostgreSQL server, which is opened, 
when the server is first started, and then never closed (until the game 
server terminates, but there's days between this happens).

Regards,
Anders K. Pedersen
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Anders K. Pedersen
Tom Lane wrote:
Anders K. Pedersen [EMAIL PROTECTED] writes:

Currently there's only a few users in the database for testing purposes, 
and most of the time the user lookup's take 2-3 ms (I have syslog'ing of 
queries and duration turned on), but several times per hour the duration 
for one of the queries is 2-3 seconds (1000 times larger), while the 
surrounding lookups take the usual 2-3 ms.


One thing that comes to mind is that the slow query could be occurring
at the same time as a checkpoint, or some other cycle-chewing background
operation.  It's not clear why a checkpoint would slow things down that
much, though.  Anyway I'd suggest looking for such activities; once we
know if that's the issue or not, we can make some progress.
One of my colleagues suggested looking for checkpoints as well; I 
searched the log, but only the following messages turned up:

Aug 11 15:21:04 gs1 postgres[5447]: [2] LOG:  checkpoint record is at 
0/80193C
Aug 23 13:59:51 gs1 postgres[16451]: [2] LOG:  checkpoint record is at 
0/201EB74
Aug 25 02:48:17 gs1 postgres[1059]: [2] LOG:  checkpoint record is at 
0/2B787D0

Currently there are only relatively few changes to the database - one 
INSERT everytime one of our game admins executes an administrative 
command (like ban or kick), and this happens at most 10 times per hour. 
As I understand checkpoints, this should mean, that they aren't 
happening very often, and when they do, should be able to finish almost 
immediately.

Regards,
Anders K. Pedersen
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Rod Taylor
 With regards to other jobs on the server, there is a MySQL server on it 
 as well, which from time to time has some multi-second queries generated 
 from a webserver also on this host, but the MySQL is running with nice 
 10 (PostgreSQL isn't nice'd).

Do those MySQL queries hit disk hard?

I've never seen PostgreSQL have hicups like you describe when running on
a machine by itself.  I have experienced similar issues when another
process (cron job in my case) caused brief swapping to occur.


signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] Queries sometimes take 1000 times the normal time

2003-08-28 Thread Anders K. Pedersen
Rod Taylor wrote:
With regards to other jobs on the server, there is a MySQL server on it 
as well, which from time to time has some multi-second queries generated 
from a webserver also on this host, but the MySQL is running with nice 
10 (PostgreSQL isn't nice'd).
Do those MySQL queries hit disk hard?
I guess they may be able to do so - the MySQL database is 450 MB, and 
the server has 512 MB RAM, and some of the queries pretty summarizes 
everything in the database.

However, I just cross-referenced the access logs from the webserver with 
the duration logs, and although some of the spikes did happen, while 
there would have been some MySQL activity (I can't tell for sure, if it 
was simple queries or the long ones), other spikes happened without any 
website activity in the surrounding minutes.

I've never seen PostgreSQL have hicups like you describe when running on
a machine by itself.  I have experienced similar issues when another
process (cron job in my case) caused brief swapping to occur.
OK. I may have to try to put the database on a separate server.

Regards,
Anders K. Pedersen
---(end of broadcast)---
TIP 8: explain analyze is your friend