Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Tomas Vondra
On 6 Září 2011, 10:26, Gerhard Wohlgenannt wrote:
 Thanks a lot to everybody for their helpful hints!!!

 I am running all these benchmarks while the VMs are up .. with the
 system under something like typical loads ..

 The RAID is hardware based. One of my colleagues will check today
 whether there is any hardware problem with the RAID (the disks), but so
 far no errors have been reported.

 please find below the results of
 iostat -x 2
 vmstat 2

 hmm, looks like we definitely do have a problem with I/O load?!
 btw: dm-19 is the logical volume where the /var (postgresql) is on ..

Well, it definitely looks like that. Something is doing a lot of writes on
that drive - the drive is 100% utilized, i.e. it's a bottleneck. You need
to find out what is writing the data - try iotop or something like that.
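
For example, something like this (a sketch; iotop needs root and a kernel
with per-task I/O accounting, which 2.6.32 has):

$ iotop -o -P

The -o switch limits the display to tasks that are actually doing I/O, and
-P folds threads into per-process totals.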

And it's probably the reason why the bonnie results were so poor.

Tomas




Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Gerhard Wohlgenannt

Thanks a lot to everybody for their helpful hints!!!

I am running all these benchmarks while the VMs are up .. with the 
system under something like typical loads ..


The RAID is hardware based. One of my colleagues will check today 
whether there is any hardware problem with the RAID (the disks), but so 
far no errors have been reported.


please find below the results of
iostat -x 2
vmstat 2

hmm, looks like we definitely do have a problem with I/O load?!
btw: dm-19 is the logical volume where the /var (postgresql) is on ..

cheers gerhard

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1 16 1370892 434996  33840 2893834811   615   31294  5  2 81 12
 0 15 1370892 440832  33840 28938380    0    0     4   136 2086 3899  0  4 12 84
 1 16 1370892 447008  33864 28938380    0    0     0    27 2442 4252  1  5 10 83
 1 11 1370892 452272  33864 28938380    0    0    12     5 2106 3886  0  4 12 83
 2  4 1370892 315880  33888 28941396    0    0  1522  3084 2213 4120  4  3 57 37
 0 10 1370892 240900  33628 28934060    0    0  1060 17275 3396 4793  3  3 55 40
 1  5 1370892 238172  33044 28905652    0    0   148   267 3943 5284  2  3 26 69
 2  2 1370916 232932  31960 28694024    0   12  1170  5625 3037 6336  6  7 61 26
 1  2 1370912 232788  27588 28697216   10    0  1016  3848 2780 5669  8  5 56 31
 1  4 1370908 2392224  27608 28144712    0    0   936  8811 2514 5244  8  6 61 25
 0  1 1370908 2265428  27612 28153188    0    0  4360  1598 2822 4784 13  3 69 15
 1  2 1370908 2041260  27612 28176788    0    0 11842   474 3679 4255 12  4 78  6
 0  3 1370908 2199880  27624 28272112    0    0 47638   569 7798 5495 11  4 70 14
 0  3 1370908 2000752  27624 28318692    0    0 23492   275 5084 5161 10  3 71 17
 1  0 1370908 1691000  27624 28365060    0    0 22920   117 4961 5426 12  5 69 15
 1  0 1370908 2123512  27624 28367576    0    0  1244   145 2053 3728 12  3 83  2
 2  0 1370908 1740724  27636 28403748    0    0 18272   190 2920 4188 12  4 76  8
 2  0 1370908 1305856  27636 28460172    0    0 28174   493 3744 4750 11  6 68 15
 1  2 1370908 973412  27644 28529640    0    0 34614   305 3419 4522 12  5 69 13
 2  2 1370904 1790820  27656 28659080    2    0 64376   389 5527 5374 12  7 69 12
 1  2 1370904 1384100  27656 28750336    0    0 45740   351 4898 5381 13  6 68 13
 1  0 1370904 954200  27656 28864252    0    0 56544   413 4596 5470 13  7 66 14
 1  0 1370904 1597264  27656 28865756    0    0   926   391 2009 3502 11  4 81  4
 3  2 1370904 1219180  27668 28868244    0    0  1160   500 2180 3772 11  5 80  4
 2  7 1370900 809128  27680 28869020    0    0   298 21875 2417 3936 11  5 49 35
 0  9 1370900 1693360  27680 28869032    0    0     8     0 2756 4174  8  5 28 59
 1  2 1370900 1531100  27688 28871104    0    0  1034  7849 2646 4571 10  3 72 15




iostat -x 2:

Linux 2.6.32-33-server (voyager)  06.09.2011  _x86_64_  (8 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5,02    0,00    2,41   11,60    0,00   80,97

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               3,05     5,22    1,05    0,67   117,54    45,72    95,37     0,01    3,94   0,75   0,13
sdb              10,02   148,15  157,91   93,49 10019,50  5098,93    60,14     4,53   18,04   2,30  57,75
dm-0              0,00     0,00    3,03    4,87    24,21    38,96     8,00     0,45   56,83   0,06   0,05
dm-1              0,00     0,00    1,07    0,87    93,32     6,77    51,59     0,01    2,71   0,42   0,08
dm-2              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    6,30   6,30   0,00
dm-3              0,00     0,00    0,19    0,32     1,54     2,55     8,00     0,03   63,61   2,72   0,14
dm-4              0,00     0,00    0,19    0,88     1,54     7,05     8,00     0,04   33,91  12,84   1,38
dm-5              0,00     0,00    0,10    0,04     0,83     0,33     8,00     0,00   16,22   2,63   0,04
dm-6              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,88   4,88   0,00
dm-7              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,37   4,37   0,00
dm-8              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,69   4,69   0,00
dm-9              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    5,71   5,71   0,00
dm-10             0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,65   4,65   0,00
dm-11             0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,17   4,17   0,00
dm-12             0,00     0,00    0,11    1,34     0,90    10,73     8,00     0,12   76,31  12,61   1,83
dm-13             0,00     0,00    0,01    0,00     0,09     0,01     8,00     0,00   18,70   1,26   0,00
dm-14             0,00     0,00   

Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Gerhard Wohlgenannt

hi,


What does a normal load mean? Does that mean a time when the queries are
slow?


yes, we have slow queries (according to postgresql.log) under such load

Are you sure the machine really has 48GB of RAM? Because from the vmstat
output it seems like there's just 32GB.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
   0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426  2  1 93  4


1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB


strange.
we paid for 48G :-) and top and free show 48G
/root# free
              total       used       free     shared    buffers     cached
Mem:      49564860   49310444     254416          0      30908   30329576
-/+ buffers/cache:   18949960   30614900
Swap:     20971512    1370960   19600552



Otherwise I don't see anything wrong in the output. What is the size of
the database (use pg_database_size to get it)? Did it grow significantly
recently?



there are a number of databases in the cluster on that machine;
in the filesystem they add up to 271G
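
To get the per-database numbers from within the cluster, something like this
works (a sketch built on the pg_database_size() function mentioned above):

db=# SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
db-#   FROM pg_database ORDER BY pg_database_size(datname) DESC;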

cheers gerhard



Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Gerhard Wohlgenannt



That's why I love dstat, just do this

$ dstat -C 0,1,2,3,4,5,6,7

and you know all you need.


dstat looks like a very nice tool, results below ..
(now the system load seems a bit lower than before, when generating 
results for vmstat and iostat)

Good catch, thanks Scott.

Yes, good catch.

Still, this does not explain why the queries were running fast before, and
why the RAID array is so sluggish. Not to mention that we don't know what
were the conditions when collecting those numbers (were the VMs off or
running?).

the VMs were running. they are in something like production use, so I 
shouldn't just turn them off ..  :-)
and the processes in the VMs cause a big portion of the DB load, so 
turning them off would distort the results ...


and thanks again for all the replies!!! :-)


~# dstat -C 0,1,2,3,4,5,6,7
---cpu0-usage--cpu1-usage--cpu2-usage--cpu3-usage--cpu4-usage--cpu5-usage--cpu6-usage--cpu7-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  7   1  75  17   0   0:  4   5  84   7   0   0:  5   3  80  12   0   0:  4   3  85   9   0   0:  7   2  75  16   0   0:  4   2  87   8   0   0:  7   2  75  16   0   0:  4   1  87   8   0   0|5071k 2578k|   0     0 |9760B 9431B|2468  4126
  0   0  98   2   0   0:  0   0  98   2   0   0:  6   2  22  71   0   0:  5   0  76  19   0   0:  0  12  82   6   0   0:  3   7  88   2   0   0:  3   1  84  12   0   0:  2   0  94   4   0   0|5160k 1376k|  60k  225k|   0     0 |2101  3879
 11   1  84   4   0   0:  2   0  93   6   0   0:  3   4  72  22   0   0:  2   2  92   3   0   1: 10  13  22  54   0   1:  6   7  75  12   0   0:  3   0  87  10   0   0: 12   0  81   7   0   0|6640k 1683k| 140k  240k|   0     0 |2860  4617
  1   1  29  68   0   1: 12   0  80   8   0   0:  6   0  78  16   0   1:  3   1  80  16   0   0: 14  14  57  16   0   0:  0  11  78  12   0   0:  9   1  83   7   0   0:  0   0  96   4   0   0|4448k 1266k| 102k  336k|   0     0 |2790  4645
  0   0  89  11   0   0:  1   0  98   1   0   0: 14   0  57  29   0   0:  1   1  89   9   0   0:  1  15  41  43   0   0:  3  15  75   7   0   0:  3   2  60  35   0   0:  0   0  95   5   0   0|  18M 1622k|  97k  285k|   0     0 |3303  4764
  0   0  96   4   0   0:  0   0  99   0   0   1:  1   2  14  83   0   0:  1  25  17  57   0   0:  1   0  87  12   0   0:  1   0  19  80   0   0:  3   3   0  94   0   0:  0   0  48  52   0   0|1320k   19M|  40k  113k|   0     0 |2909  4709
  1   0  63  36   0   0:  5   2  88   5   0   0: 34   2   0  63   1   0:  8   8  72  12   0   0:  0   9  85   6   0   0:  1   2  84  13   0   0:  2   1  60  37   0   0:  1   1  62  36   0   0|9160k 5597k|  52k  143k|  32k    0 |2659  4650
  4   0  43  53   0   0:  2   0  93   5   0   0:  9   0  63  28   0   0:  3   1  89   7   0   0:  2   9  72  16   0   1:  0  13  81   6   0   0:  9   1  52  38   0   0:  3   0  84  13   0   0|4980k 1358k| 106k  239k|   0     0 |2993  5158
  2   1  90   7   0   0:  2   0  95   3   0   0:  2   3  82  13   0   0:  0   0  87  13   0   0:  6  10  32  52   0   0:  2  10  82   6   0   0:  5   0  86   9   0   0: 10   5  81   4   0   0|4376k 2949k| 119k  295k|   0     0 |2729  4630
  1   0  93   6   0   0:  2   0  91   6   1   0: 15   4  71  11   0   0:  7   2  90   1   0   0: 13  10  12  65   0   0:  2  13  41  45   0   0:  1   0  97   2   0   0:  1   0  94   5   0   0|3896k   15M|  87k  242k|   0     0 |2809  5514
  2   0  98   0   0   0:  0   0  73  27   0   0:  0   0 100   0   0   0:  2   1  29  68   0   0:  4   5   0  92   0   0:  2   5  92   2   0   0:  0   0 100   0   0   0:  1   0  77  22   0   0| 172k   19M|  40k  127k|   0     0 |2221  4069
  0   0  48  52   0   0:  0   0  97   3   0   0:  0   0  92   8   0   0:  3   0  91   6   0   0:  2  10  10  78   0   0:  4  10  81   6   0   0:  2   0  29  69   0   0:  1   0  26  73   0   0| 652k 6931k|  66k  233k|   0     0 |2416  4389
  6   2  72  21   0   0:  3   1  86  10   0   0:  7   0  60  34   0   0:  2   2  91   6   0   0:  1  13  78   9   0   0:  2   8  84   6   0   0:  2   0  79  19   0   0:  0   2  87  11   0   0|2784k 1456k|  96k  206k|   0     0 |2854  5226
  9   4  50  37   0   0:  3   3  84  10   0   0:  4   0  84  12   0   0:  2   3  86   9   0   0: 10   2  73  15   0   0:  3   5  84   8   0   0:  8   4  81   6   0   0:  1   2  84  13   0   0|2952k 1374k| 133k  305k|   0     0 |3249  5076
  9   1  78  13   0   0:  4   4  83   9   0   0:  3   1  68  28   0   0:  3   3  82  12   0   0:  9   0  64  26   0   1:  2   1  83  13   0   1:  9   3  63  24   0   1:  3   1  91   5   0   0|3648k 1420k| 188k  444k|   0     0 |3560  5981
  3   1  63  33   0   0:  0   1  86  13   0   0:  1   0  

Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Gerhard Wohlgenannt


That's a good thought,  maybe the stats are old and you have bad 
plans?  It could also be major updates to the data too (as opposed to 
growth).


we have made checks for the number of dead tuples etc. recently, and it looks 
ok. And as everything in the database seems to be very slow atm, I 
guess the problem is not caused by bad plans for specific tables/queries.
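
One way to double-check both points from the statistics collector (a sketch;
the n_dead_tup and last_autoanalyze columns are available in
pg_stat_user_tables on 8.4):

db=# SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
db-#   FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;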


Gerhard, have you done an 'explain analyze' on any of your slow 
queries?   Have you done an analyze lately?




yes, we added the 'auto_explain' module to log/analyze queries >= 5000ms.
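
For reference, the postgresql.conf bits for that look roughly like this (a
sketch for 8.4; on 8.4 the auto_explain.* settings also require
custom_variable_classes, and log_analyze adds timing overhead to every
statement):

shared_preload_libraries = 'auto_explain'
custom_variable_classes  = 'auto_explain'
auto_explain.log_min_duration = '5s'
auto_explain.log_analyze = on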
a sample result from the logs (there is lots of stuff in the logs; I 
selected this query because it is very simple):


2011-09-06 04:00:35 CEST ANWEISUNG:  INSERT into keywords.table_x_site_impact (content_id, site_impact_id, site_impact) VALUES (199083087, 1, 1.00)
2011-09-06 04:00:35 CEST LOG:  Dauer: 15159.723 ms  Anweisung: INSERT into keywords.table_x_site_impact (content_id, site_impact_id, site_impact) VALUES (199083087, 1, 1.00)

2011-09-06 04:00:35 CEST LOG:  duration: 15159.161 ms  plan:
        Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.017..0.019 rows=1 loops=1)
          Output: nextval('keywords.table_x_site_impact_internal_id_seq'::regclass), 199083087::bigint, 1::smallint, 1::double precision







Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Tomas Vondra
On 6 Září 2011, 10:55, Gerhard Wohlgenannt wrote:

 That's why I love dstat, just do this

 $ dstat -C 0,1,2,3,4,5,6,7

 and you know all you need.

 dstat looks like a very nice tool, results below ..
 (now the system load seems a bit lower than before, when generating
 results for vmstat and iostat)
 Good catch, thanks Scott.
 Yes, good catch.

 Still, this does not explain why the queries were running fast before,
 and
 why the RAID array is so sluggish. Not to mention that we don't know
 what
 were the conditions when collecting those numbers (were the VMs off or
 running?).

 the VMs were running. they are in something like production use, so I
 shouldn't just turn them off ..  :-)
 and the processes in the VMs cause a big portion of the DB load, so
 turning them off would distort the results ...

Distort the results? If you want to measure the RAID performance, you have
to do that when there are no other processes using it.
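
For a crude standalone check, a couple of dd runs against the array are often
enough (a sketch; it writes a 4GB scratch file, so pick a path on the RAID
volume with enough free space):

$ dd if=/dev/zero of=/var/tmp/ddtest bs=1M count=4096 conv=fdatasync
$ dd if=/var/tmp/ddtest of=/dev/null bs=1M iflag=direct
$ rm /var/tmp/ddtest

conv=fdatasync makes dd flush to disk before reporting the write rate, and
iflag=direct bypasses the page cache for the read pass.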

 and thanks again for all the replies!!! :-)

Please use something like pastebin.com to post these results. It was
bearable for the vmstat output, but this is almost unreadable due to the
wrapping.

 ~# dstat -C 0,1,2,3,4,5,6,7
 ---cpu0-usage--cpu1-usage--cpu2-usage--cpu3-usage--cpu4-usage--cpu5-usage--cpu6-usage--cpu7-usage-- -dsk/total- -net/total- ---paging-- ---system--
 usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
   7   1  75  17   0   0:  4   5  84   7   0   0:  5   3  80  12   0   0:  4   3  85   9   0   0:  7   2  75  16   0   0:  4   2  87   8   0   0:  7   2  75  16   0   0:  4   1  87   8   0   0|5071k 2578k|   0     0 |9760B 9431B|2468  4126
...

But if I read that correctly, the wait for the cores is 17%, 7%, 12%, 9%,
16%, 8%, 16% and 8%, and the cores are mostly idle (idl is about 85%). So
it seems there's a low number of processes, switched between the cpus and
most of the time they're waiting for the I/O.

Given the low values for disk I/O and the iostat output we've seen before,
it's obvious there's a lot of random I/O (mostly writes).

Let's speculate for a while what could cause this (in arbitrary order):

1) Checkpoints. Something is doing a lot of writes, and with a DB that often
means a checkpoint is in progress. I'm not sure about your
checkpoint_timeout, but you do have 192 segments and about 7GB of shared
buffers. That means there may be a lot of dirty buffers (even 100% of the
buffers).
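
(To put a number on it: 192 checkpoint_segments × 16MB per WAL segment allows
roughly 3GB of WAL between forced checkpoints, so a single checkpoint may have
gigabytes of dirty buffers to flush in one burst.)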

You're using RAID5, and that really is not a write-friendly RAID level.
We don't know the actual performance, as bonnie was run with the VMs accessing
the volume, but RAID10 usually performs much better.
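
(Rough arithmetic: a small random write on RAID5 typically costs four disk
operations - read old data, read old parity, write new data, write new parity.
Assuming something like 150 random IOPS per SAS spindle, which is just a
ballpark figure, 8 disks give only about 8 × 150 / 4 = 300 random write IOPS
for the whole array.)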

Enable log_checkpoints in the config and see what's going on. You can also
use iotop to see which processes are doing the writes (it might be the
background writer, ...).
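
A minimal sketch of that change (log_checkpoints only needs a reload, not a
restart):

log_checkpoints = on

db=# SELECT pg_reload_conf();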

2) The RAID is broken and can't handle the load it handled fine before.
This is not very likely, as you've mentioned that there were no warnings
etc.

3) There are some new jobs that do a lot of I/O. Is there anything new
that wasn't running before? I guess you'd mention that.

4) The database significantly grew in a short period of time, and the
active part now does not fit into the RAM (page cache), so the data has to
be retrieved from the disk. And it's not just about the database size,
it's about the active part of the database - if you're suddenly accessing
more data, the cache may not be large enough.

This is usually a gradual process (the cache hit ratio slowly decreases as the
database grows), but if the database grew rapidly ... This could also be
caused by MVCC, i.e. there may be a lot of dead tuples - have you done a
big UPDATE / DELETE or something like that recently?
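
One rough way to watch for that is the cluster-wide buffer hit ratio (a sketch
based on the pg_stat_database counters; it only sees shared_buffers, not the
OS page cache):

db=# SELECT sum(blks_hit)::float8 / (sum(blks_hit) + sum(blks_read)) AS hit_ratio
db-#   FROM pg_stat_database;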

regards
Tomas




[PERFORM] Sudden drop in DB performance

2011-09-05 Thread Gerhard Wohlgenannt

Dear list,

we are encountering serious performance problems with our database. 
Queries which took around 100ms or less last week now take several seconds.


The database runs on Ubuntu Server 10.04.3 (kernel: 2.6.32-33) on 
hardware as follows:

8-core Intel Xeon CPU with 2.83GHz
48 GB RAM
RAID 5 with 8 SAS disks
PostgreSQL 8.4.8 (installed from the Ubuntu repository).

In addition to the DB, the machine also hosts a few virtual machines. In 
the past everything worked very well, and the described problem occurred 
just out of the blue. We don't know of any postgresql config changes or 
anything else which might explain the performance reduction.
We have a number of DBs running in the cluster, and the problem seems to 
affect all of them.


We checked the performance of the RAID .. which is reasonable for e.g. 
hdparm -tT. Memory is well used, but not swapping.
vmstat shows that the machine isn't using the swap and the load 
shouldn't be too high either:

  root@host:~# vmstat
  procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
   0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9


Bonnie++ results given below, I am no expert at interpreting those :-)


Activating log_min_duration shows for instance this query --- there are 
now constantly queries which take absurdly long.


2011-09-02 22:38:18 CEST LOG:  Dauer: 25520.374 ms  Anweisung: SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'


db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten';

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
   Index Cond: ((keyword)::text = 'diplomaten'::text)
 Total runtime: 0.087 ms
(3 Zeilen)

db=# \d keywords.table_x
                Tabelle »keywords.table_x«
   Spalte   |        Typ        |                      Attribute
------------+-------------------+---------------------------------------------------------------------
 keyword_id | integer           | not null Vorgabewert nextval('keywords.table_x_keyword_id_seq'::regclass)
 keyword    | character varying |
 so         | double precision  |
Indexe:
    "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
    "idx_table_x_keyword" btree (keyword)
Fremdschlüsselverweise von:
    TABLE "keywords.table_x_has" CONSTRAINT "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE





Could you be so kind as to give us any advice on how to track down the 
problem, or comment on possible reasons???


Thank you very much in advance!!!

Regards,
 heinz + gerhard





            name             |                                   current_setting
-----------------------------+------------------------------------------------------------------------------------
 version                     | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
 archive_command             | /usr/local/sbin/weblyzard-wal-archiver.sh %p %f
 archive_mode   | on
 checkpoint_segments| 192
 effective_cache_size   | 25000MB
 external_pid_file  | /var/run/postgresql/8.4-main.pid
 full_page_writes   | on
 geqo   | on
 lc_collate | de_AT.UTF-8
 lc_ctype   | de_AT.UTF-8
 listen_addresses   | *
 log_line_prefix| %t
 log_min_duration_statement | 3s
 maintenance_work_mem   | 500MB
 max_connections| 250
 max_stack_depth| 2MB
 port   | 5432
 server_encoding| UTF8
 shared_buffers | 7000MB
 ssl| on
 TimeZone   | localtime
 unix_socket_directory  | /var/run/postgresql
 work_mem   | 256MB


Results of Bonnie++

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3 374.9   4
Latency              7576us      233s      164s   15647us   13120ms    3302ms
Version  1.96       ------Sequential Create------ --------Random Create--------
voyager             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  

Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread pasman pasmański
Hi.
The auto_explain module allows logging the plans and statistics of live queries. Try it.

2011/9/3, Gerhard Wohlgenannt wo...@ai.wu.ac.at:
 Dear list,

 we are encountering serious performance problems with our database.
 Queries which took around 100ms or less last week now take several seconds.

 The database runs on Ubuntu Server 10.04.3 (kernel: 2.6.32-33) on
 hardware as follows:
 8-core Intel Xeon CPU with 2.83GHz
 48 GB RAM
 RAID 5 with 8 SAS disks
 PostgreSQL 8.4.8 (installed from the Ubuntu repository).

 In addition to the DB, the machine also hosts a few virtual machines. In
 the past everything worked very well, and the described problem occurred
 just out of the blue. We don't know of any postgresql config changes or
 anything else which might explain the performance reduction.
 We have a number of DBs running in the cluster, and the problem seems to
 affect all of them.

 We checked the performance of the RAID .. which is reasonable for e.g.
 hdparm -tT. Memory is well used, but not swapping.
 vmstat shows that the machine isn't using the swap and the load
 shouldn't be too high either:
    root@host:~# vmstat
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9

 Bonnie++ results given below, I am no expert at interpreting those :-)


 Activating log_min_duration shows for instance this query --- there are
 now constantly queries which take absurdly long.

 2011-09-02 22:38:18 CEST LOG:  Dauer: 25520.374 ms  Anweisung: SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'

 db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten';

                                                           QUERY PLAN
 ------------------------------------------------------------------------------------------------------------------------------
  Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
    Index Cond: ((keyword)::text = 'diplomaten'::text)
  Total runtime: 0.087 ms
 (3 Zeilen)

 db=# \d keywords.table_x
                 Tabelle »keywords.table_x«
    Spalte   |        Typ        |                      Attribute
 ------------+-------------------+---------------------------------------------------------------------
  keyword_id | integer           | not null Vorgabewert nextval('keywords.table_x_keyword_id_seq'::regclass)
  keyword    | character varying |
  so         | double precision  |
 Indexe:
     "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
     "idx_table_x_keyword" btree (keyword)
 Fremdschlüsselverweise von:
     TABLE "keywords.table_x_has" CONSTRAINT "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE




 Could you be so kind as to give us any advice on how to track down the
 problem, or comment on possible reasons???

 Thank you very much in advance!!!

 Regards,
   heinz + gerhard





             name             |                                   current_setting
 -----------------------------+------------------------------------------------------------------------------------
  version                     | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
  archive_command             | /usr/local/sbin/weblyzard-wal-archiver.sh %p %f
   archive_mode   | on
   checkpoint_segments| 192
   effective_cache_size   | 25000MB
   external_pid_file  | /var/run/postgresql/8.4-main.pid
   full_page_writes   | on
   geqo   | on
   lc_collate | de_AT.UTF-8
   lc_ctype   | de_AT.UTF-8
   listen_addresses   | *
   log_line_prefix| %t
   log_min_duration_statement | 3s
   maintenance_work_mem   | 500MB
   max_connections| 250
   max_stack_depth| 2MB
   port   | 5432
   server_encoding| UTF8
   shared_buffers | 7000MB
   ssl| on
   TimeZone   | localtime
   unix_socket_directory  | /var/run/postgresql
   work_mem   | 256MB


 Results of Bonnie++

 Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
 Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
 Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
 voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3 374.9   4
 Latency              7576us      233s      164s   15647us   13120ms    3302ms
 Version  1.96       ------Sequential Create------ Random

Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Tomas Vondra
On 3 Září 2011, 9:26, Gerhard Wohlgenannt wrote:
 Dear list,

 we are encountering serious performance problems with our database.
 Queries which took around 100ms or less last week now take several
 seconds.

 The database runs on Ubuntu Server 10.04.3 (kernel: 2.6.32-33) on
 hardware as follows:
 8-core Intel Xeon CPU with 2.83GHz
 48 GB RAM
 RAID 5 with 8 SAS disks
 PostgreSQL 8.4.8 (installed from the Ubuntu repository).

 In addition to the DB, the machine also hosts a few virtual machines. In
 the past everything worked very well, and the described problem occurred
 just out of the blue. We don't know of any postgresql config changes or
 anything else which might explain the performance reduction.
 We have a number of DBs running in the cluster, and the problem seems to
 affect all of them.

What are the virtual machines doing? Are you sure they are not doing a lot
of IO?


 We checked the performance of the RAID .. which is reasonable for e.g.
 hdparm -tT. Memory is well used, but not swapping.
 vmstat shows that the machine isn't using the swap and the load
 shouldn't be too high either:
    root@host:~# vmstat
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9

 Bonnie++ results given below, I am no expert at interpreting those :-)


 Activating log_min_duration shows for instance this query --- there are
 now constantly queries which take absurdly long.

 2011-09-02 22:38:18 CEST LOG:  Dauer: 25520.374 ms  Anweisung: SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'

 db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten';

                                                           QUERY PLAN
 ------------------------------------------------------------------------------------------------------------------------------
  Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
    Index Cond: ((keyword)::text = 'diplomaten'::text)
  Total runtime: 0.087 ms
 (3 Zeilen)

 db=# \d keywords.table_x
                 Tabelle »keywords.table_x«
    Spalte   |        Typ        |                      Attribute
 ------------+-------------------+---------------------------------------------------------------------
  keyword_id | integer           | not null Vorgabewert nextval('keywords.table_x_keyword_id_seq'::regclass)
  keyword    | character varying |
  so         | double precision  |
 Indexe:
     "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
     "idx_table_x_keyword" btree (keyword)
 Fremdschlüsselverweise von:
     TABLE "keywords.table_x_has" CONSTRAINT "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE

But in this explain analyze the query finished in 0.041 ms. Use the auto_explain
contrib module to see the explain plan of the slow execution.

 Could you be so kind and give us any advice how to track down the
 problem or comment on possible reasons???

One of the things


 Thank you very much in advance!!!

 Regards,
   heinz + gerhard





             name             |                                   current_setting
 -----------------------------+------------------------------------------------------------------------------------
  version                     | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
  archive_command             | /usr/local/sbin/weblyzard-wal-archiver.sh %p %f
   archive_mode   | on
   checkpoint_segments| 192
   effective_cache_size   | 25000MB
   external_pid_file  | /var/run/postgresql/8.4-main.pid
   full_page_writes   | on
   geqo   | on
   lc_collate | de_AT.UTF-8
   lc_ctype   | de_AT.UTF-8
   listen_addresses   | *
   log_line_prefix| %t
   log_min_duration_statement | 3s
   maintenance_work_mem   | 500MB
   max_connections| 250
   max_stack_depth| 2MB
   port   | 5432
   server_encoding| UTF8
   shared_buffers | 7000MB
   ssl| on
   TimeZone   | localtime
   unix_socket_directory  | /var/run/postgresql
   work_mem   | 256MB


 Results of Bonnie++

 Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
 Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
 Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP

Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Andy Colson

On 09/05/2011 02:48 AM, Tomas Vondra wrote:

On 3 Září 2011, 9:26, Gerhard Wohlgenannt wrote:

Dear list,

we are encountering serious performance problems with our database.
Queries which took around 100ms or less last week now take several
seconds.



Results of Bonnie++

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3 374.9   4
Latency              7576us      233s      164s   15647us   13120ms    3302ms
Version  1.96       ------Sequential Create------ --------Random Create--------
voyager             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   141   0 +++++ +++   146   0   157   0 +++++ +++   172   0
Latency              1020ms     128us    9148ms     598ms      37us     485ms



That seems a bit slow ... 27MB/s for writes and 41MB/s for reads is a bit
slow with 8 drives.

Tomas




Agreed, that's really slow.  A single SATA drive will get 60 MB/s.  Did you run 
Bonnie while the VM's were up and running?

  root@host:~# vmstat
  procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
   0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9


Only one line?  That does not help much.  Can you run it as 'vmstat 2' and let 
it run while a few slow queries are performed?  Then paste all the lines?


-Andy



Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Tomas Vondra
On 5 Září 2011, 15:51, Andy Colson wrote:
 On 09/05/2011 02:48 AM, Tomas Vondra wrote:
 That seems a bit slow ... 27MB/s for writes and 41MB/s for reads is a bit
 slow with 8 drives.

 Tomas



 Agreed, that's really slow.  A single SATA drive will get 60 MB/s.  Did
 you run Bonnie while the VM's were up and running?

root@host:~# vmstat
 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9


 Only one line?  That does not help much.  Can you run it as 'vmstat 2' and
 let it run while a few slow queries are performed?  Then paste all the
 lines?

And maybe a few lines from iostat -x 2 too.

BTW what kind of RAID is it? Is it hw or sw based? Have you checked the health
of the drives?

Are you sure there's nothing else using the drives (e.g. one of the VMs,
rebuild of the array or something like that)?
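
For the drive-health check, smartmontools is one option (a sketch; with
hardware RAID the individual disks usually sit behind the controller, so
smartctl may need a controller-specific -d option, or the controller vendor's
CLI, to see them):

$ smartctl -a /dev/sda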

Tomas




Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Tomas Vondra
On 5 Září 2011, 16:08, Gerhard Wohlgenannt wrote:
 Below please find the results of vmstat 2 over some period of time ..
 with normal database / system load.

What does a normal load mean? Does that mean a time when the queries are
slow?

Are you sure the machine really has 48GB of RAM? Because from the vmstat
output it seems like there's just 32GB.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426  2  1 93  4


1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB

BTW there's 1.3GB of swap, although it's not used heavily (according to
the vmstat output).

Otherwise I don't see anything wrong in the output. What is the size of
the database (use pg_database_size to get it)? Did it grow significantly
recently?

Tomas




Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Andy Colson

On 09/05/2011 09:39 AM, Tomas Vondra wrote:

On 5 Září 2011, 16:08, Gerhard Wohlgenannt wrote:

Below please find the results of vmstat 2 over some period of time ..
with normal database / system load.


What does a normal load mean? Does that mean a time when the queries are
slow?

Are you sure the machine really has 48GB of RAM? Because from the vmstat
output it seems like there's just 32GB.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
   0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426  2  1 93  4


1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB

BTW there's 1.3GB of swap, although it's not used heavily (according to
the vmstat output).

Otherwise I don't see anything wrong in the output. What is the size of
the database (use pg_database_size to get it)? Did it grow significantly
recently?

Tomas



Yeah, it's interesting that it swapped memory in, but never out.  Looking at this vmstat, 
it does not look like a hardware problem. (Assuming normal load means slow 
queries.)


Did it grow significantly recently?


That's a good thought, maybe the stats are old and you have bad plans?  It 
could also be major updates to the data (as opposed to growth).

Gerhard, have you done an 'explain analyze' on any of your slow queries?   Have 
you done an analyze lately?

-Andy



Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Gerhard Wohlgenannt

On 09/05/2011 03:51 PM, Andy Colson wrote:

On 09/05/2011 02:48 AM, Tomas Vondra wrote:

On 3 Září 2011, 9:26, Gerhard Wohlgenannt wrote:

Dear list,

we are encountering serious performance problems with our database.
Queries which took around 100ms or less last week now take several
seconds.



Results of Bonnie++

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
voyager         95G  1400  93 27804   3 16324   2  2925  96 41636   3 374.9   4
Latency              7576us      233s      164s   15647us   13120ms    3302ms
Version  1.96       ------Sequential Create------ --------Random Create--------
voyager             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   141   0 +++++ +++   146   0   157   0 +++++ +++   172   0
Latency              1020ms     128us    9148ms     598ms      37us     485ms



That seems a bit slow ... 27MB/s for writes and 41MB/s for reads is a bit
slow with 8 drives.

Tomas




Agreed, that's really slow.  A single SATA drive will get 60 MB/s.  
Did you run Bonnie while the VM's were up and running?


  root@host:~# vmstat
  procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
   0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9



Only one line?  That does not help much.  Can you run it as 'vmstat 2' 
and let it run while a few slow queries are performed?  Then paste all 
the lines?



-Andy


Hi Andy,

thanks a lot for your help.

Below please find the results of vmstat 2 over some period of time .. 
with normal database / system load.


 0  0 1344332 237196 104140 31468412    0    0   330   102 4322 7130  4  2 90  4
 1  1 1344332 236708 104144 31469000    0    0   322   105 2096 3723  1  2 92  5
 2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512 12  3 77  9
 0  0 1344200 238372 104168 31462452    0    0     8   109 4050 8376  8  3 86  3
 0  0 1344200 232668 104168 31462468    0    0    12   158 2036 3633  2  2 92  3
 0  3 1344196 282784 104180 31413384    4    0  1768   343 2490 4391  1  2 84 13
 1  1 1344196 278188 104192 31416080    0    0  1392   341 2215 3850  1  2 82 15
 0  0 1344120 276964 104608 31416904   90    0   634   304 2390 3949  4  2 86  8
 1  1 1344120 277096 104628 31417752    0    0   492   378 2394 3866  2  1 87 10
 0  1 1344120 274476 104628 31418620    0    0   260   233 1997 3255  2  1 91  6
 1  1 1344120 276584 104628 31418808    0    0   128   208 2015 3266  2  1 91  6
 0  0 1343672 272352 106288 31418788  694    0  1346   344 2170 3660  3  1 89  6
 0  1 1343632 270220 107648 31419152   48    0   468   490 2356 3622  4  2 88  5
 0  0 1343624 270708 107660 31419344   20    0   228   138 2086 3518  2  3 91  4
 0  1 1343612 268732 107660 31419584   12    0   168   112 2100 3585  3  2 91  3
 0  0 1343544 266616 107660 31420112   14    0   154    73 2059 3719  3  2 93  3
 0  1 1343540 267368 107684 31420168    0    0    78   260 2256 3970  3  2 90  6
 0  1 1343540 268352 107692 31420356    0    0    94   284 2239 4086  2  2 89  6
 0  0 1343540 274064 107692 31423584    0    0  1622   301 2322 4258  2  3 83 13
 0  2 1343440 273064 107704 31423696   96    0   106   180 2158 3795  3  2 90  5
 0  0 1342184 262888 107708 31426040  840    0  2014   146 2309 3713  5  3 83  9
 0  0 1342184 261904 107732 31426128    0    0    60   158 1893 3510  1  3 91  5
 2  0 1342184 258680 107732 31427436    0    0   794   114 2160 3647  2  3 90  5
 0  2 1342176 258184 107744 31428308   24    0   310   116 1943 3335  2  2 91  4
 1  0 1342172 259068 107756 31428700    2    0   138   143 1976 3468  1  1 93  5
 0  0 1342172 258084 107756 31429948    0    0   620    88 2117 3565  3  1 90  6
 0  0 1342172 258456 107952 31430028    0    0    62   305 2174 3827  1  2 91  6
 1  0 1342172 257480 107952 31430636    0    0   300   256 2316 3959  3  2 86  8
 0  0 1342172 257720 107952 31430772    0    0    46   133 2411 4047  3  2 91  3
 1  2 1342172 257844 107976 31430776    0    0   136   184 2111 3841  1  1 92  6
 1  2 1342172 338376 107576 31349412    0    0   462  8615 2655 5508  5  3 79 13
 1  2 1342172 340772 107580 31351080    0    0   682   377 2503 4022  2  1 87 10
 1  2 1342172 335688 107596 31351992    0    0   548   306 2480 3867  4  1 86  9
 0  2 1342168 337432 107608 31352704    0    0   224   188 1919 3158  1  1 93  6
 0  0 1342168 337804 107608 31353020    0    0   154   249 1933 3175  1  1 92  6
 0  1 1342168 335944 107636 31353464    0    0   212   173 1912 3280  4  2 89  5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   

Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Gerhard Wohlgenannt

hi,

thanks a lot for your help!


Dear list,

we are encountering serious performance problems with our database.
Queries which took around 100ms or less last week now take several
seconds.

The database runs on Ubuntu Server 10.04.3 (kernel: 2.6.32-33) on
hardware as follows:
8-core Intel Xeon CPU with 2.83GHz
48 GB RAM
RAID 5 with 8 SAS disks
PostgreSQL 8.4.8 (installed from the Ubuntu repository).

In addition to the DB, the machine also hosts a few virtual machines. In
the past everything worked very well, and the described problem occurred
just out of the blue. We don't know of any postgresql config changes or
anything else which might explain the performance reduction.
We have a number of DBs running in the cluster, and the problem seems to
affect all of them.

What are the virtual machines doing? Are you sure they are not doing a lot
of IO?


we also have an SSD disk in the machine, and the virtual machines do most 
of their IO on that. But there surely is also some amount of IO going to the 
system's RAID array. Maybe we should consider having a dedicated database 
server.



We checked the performance of the RAID .. which is reasonable for e.g.
hdparm -tT. Memory is well used, but not swapping.
vmstat shows that the machine isn't using the swap and the load
shouldn't be too high either:
    root@host:~# vmstat
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9

Bonnie++ results given below, I am no expert at interpreting those :-)


Activating log_min_duration shows for instance this query --- there are
now constantly queries which take absurdly long.

2011-09-02 22:38:18 CEST LOG:  Dauer: 25520.374 ms  Anweisung: SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'

db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten';

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
   Index Cond: ((keyword)::text = 'diplomaten'::text)
 Total runtime: 0.087 ms
(3 Zeilen)

db=# \d keywords.table_x
                Tabelle »keywords.table_x«
   Spalte   |        Typ        |                      Attribute
------------+-------------------+---------------------------------------------------------------------
 keyword_id | integer           | not null Vorgabewert nextval('keywords.table_x_keyword_id_seq'::regclass)
 keyword    | character varying |
 so         | double precision  |
Indexe:
    "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
    "idx_table_x_keyword" btree (keyword)
Fremdschlüsselverweise von:
    TABLE "keywords.table_x_has" CONSTRAINT "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE

But in this explain analyze the query finished in 0.041 ms. Use the auto_explain
contrib module to see the explain plan of the slow execution.


thanks. we will use auto_explain as soon as some long running updates 
are finished (don't want to kill them)


cheers gerhard



Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Gerhard Wohlgenannt

hi,

thanks a lot for your help!


Dear list,

we are encountering serious performance problems with our database.
Queries which took around 100ms or less last week now take several
seconds.

The database runs on Ubuntu Server 10.04.3 (kernel: 2.6.32-33) on
hardware as follows:
8-core Intel Xeon CPU with 2.83GHz
48 GB RAM
RAID 5 with 8 SAS disks
PostgreSQL 8.4.8 (installed from the Ubuntu repository).

In addition to the DB, the machine also hosts a few virtual machines. In
the past everything worked very well, and the described problem occurred
just out of the blue. We don't know of any postgresql config changes or
anything else which might explain the performance reduction.
We have a number of DBs running in the cluster, and the problem seems to
affect all of them.

What are the virtual machines doing? Are you sure they are not doing a lot
of IO?


we also have an SSD disk in the machine, and the virtual machines do most 
of their I/O on that. But there surely is also some amount of I/O going to the 
system's RAID array from the virtual machines. Maybe we should 
consider having a dedicated database server.



We checked the performance of the RAID .. which is reasonable for e.g.
hdparm -tT. Memory is well used, but not swapping.
vmstat shows that the machine isn't using the swap and the load
shouldn't be too high either:
    root@host:~# vmstat
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     0  0      0 308024 884812 40512932    0    0   464   168  353   92  4  2 84  9

Bonnie++ results given below, I am no expert at interpreting those :-)


Activating log_min_duration shows for instance this query --- there are
now constantly queries which take absurdly long.

2011-09-02 22:38:18 CEST LOG:  Dauer: 25520.374 ms  Anweisung: SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten'

db=# explain analyze SELECT keyword_id FROM keywords.table_x WHERE keyword=E'diplomaten';

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_table_x_keyword on table_x  (cost=0.00..8.29 rows=1 width=4) (actual time=0.039..0.041 rows=1 loops=1)
   Index Cond: ((keyword)::text = 'diplomaten'::text)
 Total runtime: 0.087 ms
(3 Zeilen)

db=# \d keywords.table_x
                Tabelle »keywords.table_x«
   Spalte   |        Typ        |                      Attribute
------------+-------------------+---------------------------------------------------------------------
 keyword_id | integer           | not null Vorgabewert nextval('keywords.table_x_keyword_id_seq'::regclass)
 keyword    | character varying |
 so         | double precision  |
Indexe:
    "table_x_pkey" PRIMARY KEY, btree (keyword_id) CLUSTER
    "idx_table_x_keyword" btree (keyword)
Fremdschlüsselverweise von:
    TABLE "keywords.table_x_has" CONSTRAINT "table_x_has_keyword_id_fkey" FOREIGN KEY (keyword_id) REFERENCES keywords.table_x(keyword_id) ON UPDATE CASCADE ON DELETE CASCADE

But in this explain analyze the query finished in 0.041 ms. Use the auto_explain
contrib module to see the explain plan of the slow execution.


thanks. we will use auto_explain as soon as some long running updates 
are finished (don't want to kill them)


cheers gerhard



Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Scott Marlowe
On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt wo...@ai.wu.ac.at wrote:
 Below please find the results of vmstat 2 over some period of time .. with
 normal database / system load.

 0  0 1344332 237196 104140 31468412    0    0   330   102 4322 7130  4  2 90  4
 1  1 1344332 236708 104144 31469000    0    0   322   105 2096 3723  1  2 92  5
 2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512 12  3 77  9
 0  0 1344200 238372 104168 31462452    0    0     8   109 4050 8376  8  3 86  3
 0  0 1344200 232668 104168 31462468    0    0    12   158 2036 3633  2  2 92  3
 0  3 1344196 282784 104180 31413384    4    0  1768   343 2490 4391  1  2 84 13
 1  1 1344196 278188 104192 31416080    0    0  1392   341 2215 3850  1  2 82 15
 0  0 1344120 276964 104608 31416904   90    0   634   304 2390 3949  4  2 86  8
 1  1 1344120 277096 104628 31417752    0    0   492   378 2394 3866  2  1 87 10
 0  1 1344120 274476 104628 31418620    0    0   260   233 1997 3255  2  1 91  6
 1  1 1344120 276584 104628 31418808    0    0   128   208 2015 3266  2  1 91  6
 0  0 1343672 272352 106288 31418788  694    0  1346   344 2170 3660  3  1 89  6
 0  1 1343632 270220 107648 31419152   48    0   468   490 2356 3622  4  2 88  5
 0  0 1343624 270708 107660 31419344   20    0   228   138 2086 3518  2  3 91  4
 0  1 1343612 268732 107660 31419584   12    0   168   112 2100 3585  3  2 91  3
 0  0 1343544 266616 107660 31420112   14    0   154    73 2059 3719  3  2 93  3
 0  1 1343540 267368 107684 31420168    0    0    78   260 2256 3970  3  2 90  6
 0  1 1343540 268352 107692 31420356    0    0    94   284 2239 4086  2  2 89  6
 0  0 1343540 274064 107692 31423584    0    0  1622   301 2322 4258  2  3 83 13
 0  2 1343440 273064 107704 31423696   96    0   106   180 2158 3795  3  2 90  5
 0  0 1342184 262888 107708 31426040  840    0  2014   146 2309 3713  5  3 83  9
 0  0 1342184 261904 107732 31426128    0    0    60   158 1893 3510  1  3 91  5
 2  0 1342184 258680 107732 31427436    0    0   794   114 2160 3647  2  3 90  5
 0  2 1342176 258184 107744 31428308   24    0   310   116 1943 3335  2  2 91  4
 1  0 1342172 259068 107756 31428700    2    0   138   143 1976 3468  1  1 93  5
 0  0 1342172 258084 107756 31429948    0    0   620    88 2117 3565  3  1 90  6
 0  0 1342172 258456 107952 31430028    0    0    62   305 2174 3827  1  2 91  6
 1  0 1342172 257480 107952 31430636    0    0   300   256 2316 3959  3  2 86  8
 0  0 1342172 257720 107952 31430772    0    0    46   133 2411 4047  3  2 91  3
 1  2 1342172 257844 107976 31430776    0    0   136   184 2111 3841  1  1 92  6
 1  2 1342172 338376 107576 31349412    0    0   462  8615 2655 5508  5  3 79 13
 1  2 1342172 340772 107580 31351080    0    0   682   377 2503 4022  2  1 87 10
 1  2 1342172 335688 107596 31351992    0    0   548   306 2480 3867  4  1 86  9
 0  2 1342168 337432 107608 31352704    0    0   224   188 1919 3158  1  1 93  6
 0  0 1342168 337804 107608 31353020    0    0   154   249 1933 3175  1  1 92  6
 0  1 1342168 335944 107636 31353464    0    0   212   173 1912 3280  4  2 89  5
 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 1342168 336936 107636 31353512    0    0    14   183 1911 3426  2  1 93  4
 0  1 1342168 334440 107656 31353736    0    0   264   372 2119 3400  6  2 84  8
 0  0 1342164 334084 107680 31354468    0    0   302   413 2361 3613  2  1 87 10
 2  0 1342160 342764 107680 31354916    8    0   184   332 2142 3117  1  1 90  7
 0  1 1342160 343788 107680 31355808    0    0   360   211 2247 3249  1  2 91  5
 2  1 1342156 340804 107704 31355904    0    0    88   280 2287 3448  2  2 90  6
 0  1 1342156 344276 107704 31356464    0    0   316   276 2050 3298  1  2 90  7
 0  0 1342156 344160 107712 31356576    0    0     4   225 1884 3194  1  3 90  6
 0  0 1342152 342548 107724 31356688    0    0    52   231 1963 3232  1  3 89  6
 2  1 1342152 343664 107724 31356764    0    0   104   348 2643 3614  3  2 88  8
 1  1 1342144 341060 107760 31357080   16    0   120   307 2511 3474  4  3 87  7
 1  0 1342140 342332 107780 31357500    8    0   206   193 2243 3448  4  2 89  5
 1  0 1342136 339472 107780 31357508    0    0    32   142 4290 3799  6  3 87  4
 0  0 1342136 341160 107780 31357992    0    0   216   171 2613 3995  4  2 88  5
 0  0 1342136 342168 107820 31357988    0    0    26   140 2347 3753  3  4 89  4
 0  0 1342136 342532 107820 31358128    0    0    36   155 2119 3653  2  1 91  5
 2  0 1342136 341564 107828 31358144    0    0     0   151 1973 3486  4  2 90  4
 1  1 1342136 342076 107852 31358416    0    0   148   284 2251 3857  6  2 84  8
 0  1 1342136 339944 107852 31359284    0    0   482   478 2902 5210  4  2 84 10
 0  1 1342136 342184 107852 31359836    0    0   238   372 2292 4063  2  1 88  9


Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Andy Colson

On 09/05/2011 01:45 PM, Scott Marlowe wrote:

On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt wo...@ai.wu.ac.at  wrote:

Below please find the results of vmstat 2 over some period of time .. with
normal database / system load.


2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512 12  3 77  9


Your IO Wait is actually pretty high.  On an 8 core machine, 12.5%
means one core is doing nothing but waiting for IO.



My server is 2-core, so these numbers looked fine to me.  I need to remember 
core count when I look at these.

So the line above, for 2 cores, would not worry me a bit, but on 8 cores it 
pretty much means one core was pegged (with 9% wait?  Or is it one core was 
pegged, and another was 72% io wait?)

I have always loved the vmstat output, but it's starting to get confusing when 
you have to take cores into account.  (And my math was never strong in the 
first place :-) )

Good catch, thanks Scott.

-Andy



Re: [PERFORM] Sudden drop in DB performance

2011-09-05 Thread Tomas Vondra
On 5 Září 2011, 21:07, Andy Colson wrote:
 On 09/05/2011 01:45 PM, Scott Marlowe wrote:
 On Mon, Sep 5, 2011 at 8:08 AM, Gerhard Wohlgenannt wo...@ai.wu.ac.at
 wrote:
 Below please find the results of vmstat 2 over some period of time ..
 with normal database / system load.

 2  1 1344204 240924 104156 31462484  350    0  1906   234 3687 4512 12  3 77  9

 Your IO Wait is actually pretty high.  On an 8 core machine, 12.5%
 means one core is doing nothing but waiting for IO.


 My server is 2-core, so these numbers looked fine by me.  I need to
 remember core count when I look at these.

 So the line above, for 2 cores, would not worry me a bit, but on 8 cores,
 it pretty much means one core was pegged (with 9% wait?  Or is it one core
 was pegged, and another was 72% io wait?)

AFAIK it's as if one core was 72% io wait. Anyway that's exactly why I was
asking for iostat -x because the util% gives a better idea of what's
going on.
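
(The arithmetic: vmstat averages the wa column over all CPUs, so 9% wait on
an 8-core box corresponds to 9 × 8 = 72% of a single core spent waiting.)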

 I have always loved the vmstat output, but it's starting to get confusing
 when you have to take cores into account.  (And my math was never strong
 in the first place :-) )

That's why I love dstat, just do this

$ dstat -C 0,1,2,3,4,5,6,7

and you know all you need.

 Good catch, thanks Scott.

Yes, good catch.

Still, this does not explain why the queries were running fast before, and
why the RAID array is so sluggish. Not to mention that we don't know what
were the conditions when collecting those numbers (were the VMs off or
running?).

Tomas

