Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Tomas Vondra
On 6 September 2011, 10:26, Gerhard Wohlgenannt wrote:
 Thanks a lot to everybody for their helpful hints!!!

 I am running all these benchmarks while the VMs are up .. with the
 system under something like typical loads ..

 The RAID is hardware based. One of my colleagues will check if there is
 any hardware problem on the RAID (the disks) today, but no errors have
 been reported so far.

 please find below the results of
 iostat -x 2
 vmstat 2

 hmm, looks like we definitely do have a problem with I/O load?!
 btw: dm-19 is the logical volume where the /var (postgresql) is on ..

Well, it definitely looks like that. Something is doing a lot of writes on
that drive - the drive is 100% utilized, i.e. it's a bottleneck. You need
to find out what is writing the data - try iotop or something like that.
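
For example (illustrative invocation; iotop needs root and a reasonably recent kernel):

   # show only the processes that are actually doing I/O, refreshing every 5 seconds
   iotop -o -d 5

   # or capture a few batch samples to a file that can be posted to the list
   iotop -o -b -n 3 -d 5 > iotop.log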

And it's probably the reason why the bonnie results were so poor.

Tomas




Fwd: [PERFORM] Summaries on SSD usage?

2011-09-06 Thread Stefan Keller
Shaun,

2011/9/2 Shaun Thomas stho...@peak6.com:
 Ironically, this is actually the topic of my presentation at Postgres Open.

Do you think my problem would now be solved with an NVRAM PCI card?

Stefan

-- Forwarded message --
From: Stefan Keller sfkel...@gmail.com
Date: 2011/9/3
Subject: Re: [PERFORM] Summaries on SSD usage?
To: Jesper Krogh jes...@krogh.cc
Cc: pgsql-performance@postgresql.org


2011/9/3 Jesper Krogh jes...@krogh.cc:
 On 2011-09-03 00:04, Stefan Keller wrote:
 It's not that hard to figure out. Take some of your typical queries,
 say the one above, and change the search term to something you'd expect
 a user to enter in a minute but that hasn't been run yet (could be 'museum'
 instead of 'zoo'). Then run it with \timing, twice. If the two runs are
 close to each other in timing, then you only hit memory anyway and
 neither SSD, NVRAM nor more RAM will buy you anything - faster memory
 and faster CPU cores will. If you see a significant speedup on the
 second run, then more RAM, NVRAM or SSD is a good fix.
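
 For instance (hypothetical query and made-up timings, just to show the pattern):

   db=# \timing
   Timing is on.
   db=# SELECT * FROM osm_point WHERE name = 'museum';
   Time: 28000.000 ms      (first run: mostly reads from disk)
   db=# SELECT * FROM osm_point WHERE name = 'museum';
   Time: 300.000 ms        (second run: served from cache)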

 Typically I have slow-query logging turned on, permanently set to around
 250 ms.
 If I find queries in the log that I didn't expect to take above 250 ms,
 then I'd start to investigate whether the query plans are correct .. and so on..
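
 That threshold is just log_min_duration_statement in postgresql.conf, e.g.:

   log_min_duration_statement = 250   # log any statement running longer than 250 ms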

 The above numbers are raw-data size and not how PG uses them, right?
 And you haven't told us anything about the size of your current system.

It's definitely the case that the second query run is much faster
(first runs go up to 30 seconds and more...).

PG uses the raw data for Switzerland like this: 10 GB total disk space
based on 2 GB of raw XML input. Table osm_point is one of the four big
tables and uses 984 MB for the table and 1321 MB for its indexes (of which
the hstore index is the biggest, ahead of those on id, name and geometry).
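
For reference, such sizes can be checked with something like:

   SELECT pg_size_pretty(pg_relation_size('osm_point'))       AS table_size,
          pg_size_pretty(pg_total_relation_size('osm_point')) AS table_plus_indexes_and_toast;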

Stefan



Re: Fwd: [PERFORM] Summaries on SSD usage?

2011-09-06 Thread Shaun Thomas

On 09/06/2011 08:45 AM, Stefan Keller wrote:


Do you think my problem would now be solved with an NVRAM PCI card?


That's a tough call. Part of the reason I'm doing the presentation is 
because there are a lot of other high-volume OLTP databases out there which 
have reached (or will reach) the critical mass where cache can't fulfill generic 
database requests anymore.


As an example, we were around 11k database transactions per second on 
250GB of data with 32GB of RAM. The first thing we tried was bumping it 
up to 64GB, and that kinda worked. But what you'll find is that an 
autovacuum, or a nightly vacuum, will occasionally hit a large table and 
flush all of that handy cached data down the tubes, and then your 
database starts choking trying to keep up with the requests.


Even a large, well equipped RAID can only really offer 2500-ish TPS 
before you start getting into the larger and more expensive SANs, so you 
either have to pre-load your memory with dd or pgfincore, or if your 
random access patterns actually exceed your RAM, you need a bigger disk 
pool or tiered storage. And by tiered storage, I mean tablespaces, with 
critical high-TPS tables located on a PCIe card or a pool of modern 
(capacitor-backed, firmware GC) SSDs.
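
As a sketch of what that tiering looks like in practice (paths and object names made up):

   -- put the hottest tables and indexes on the fast device
   CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';
   ALTER TABLE hot_orders SET TABLESPACE fast_ssd;
   ALTER INDEX hot_orders_pkey SET TABLESPACE fast_ssd;

Pre-warming after a restart can be as crude as cat'ing or dd'ing the relation files
to /dev/null, or done selectively with pgfincore.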


Your case looks more like you have just a couple big-ass queries/tables 
that occasionally give you trouble. If optimizing the queries, index 
tweaks, and other sundry tools can't help anymore, you may have to start 
dragging out the bigger guns. But if you can afford it, having some NVRAM 
storage around as a top-tier tablespace for critical-need data is 
probably good practice these days.


They're expensive, though. Even the cheap ones start around $5k. Just 
remember you're paying for the performance in this case, and not storage 
capacity. Some vendors have demo hardware they'll let you use to 
determine if it applies to your case, so you might want to contact 
FusionIO, RAMSAN, Virident, or maybe OCZ.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com




Re: [PERFORM] Rather large LA

2011-09-06 Thread Richard Shaw

It's mounted as / - OS and Postgres are on the same mount point.

On 6 Sep 2011, at 00:31, Scott Marlowe wrote:

 On Mon, Sep 5, 2011 at 4:36 PM, Richard Shaw rich...@aggress.net wrote:
 Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
 sda               1.00   143.00  523.50  108.00  8364.00  2008.00    16.42     2.78    4.41   1.56  98.35
 sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
 sda2              1.00   143.00  523.50  108.00  8364.00  2008.00    16.42     2.78    4.41   1.56  98.35
 
 So what is /dev/sda2 mounted as?




Re: [PERFORM] Rather large LA

2011-09-06 Thread Andres Freund
On Monday 05 Sep 2011 22:23:32 Scott Marlowe wrote:
 On Mon, Sep 5, 2011 at 11:24 AM, Andres Freund and...@anarazel.de wrote:
  On Monday, September 05, 2011 14:57:43 Richard Shaw wrote:
  Autovacuum has been disabled and set to run manually via cron during a
  quiet period, and fsync has recently been turned off to gauge any real-
  world performance increase; there is battery backup on the RAID card
  providing some level of resilience.
  
  That doesn't help you against a failure with fsync off, as the BBU can
  only protect data that has actually been written to disk. Without
  fsync=on there is no guarantee of that.
 
 Further, if you've got a BBU cache on the RAID card, the gains from
 fsync=off will be low / nonexistent.
That's not necessarily true. If you have a mixed load of many small writes and 
some parallel huge writes (especially in combination with big indexes), 
fsync=off can still give you quite big performance increases, even in the 
presence of synchronous_commit=off.
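
For reference, the settings being discussed are plain postgresql.conf parameters; the values
below only illustrate the trade-off:

   fsync = off                 # fast but unsafe: a power loss can corrupt the whole cluster, BBU or not
   synchronous_commit = off    # usually the saner knob: only the last few commits can be lost, no corruption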

Andres



Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Gerhard Wohlgenannt

Thanks a lot to everybody for their helpful hints!!!

I am running all these benchmarks while the VMs are up .. with the 
system under something like typical loads ..


The RAID is hardware based. One of my colleagues will check if there is 
any hardware problem on the RAID (the disks) today, but no errors have 
been reported so far.


please find below the results of
iostat -x 2
vmstat 2

hmm, looks like we definitely do have a problem with I/O load?!
btw: dm-19 is the logical volume where the /var (postgresql) is on ..

cheers gerhard

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1 16 1370892 434996  33840 2893834811   615   31294   5  2 81 12
 0 15 1370892 440832  33840 2893838000 4   136 2086 3899   0  4 12 84
 1 16 1370892 447008  33864 2893838000 027 2442 4252   1  5 10 83
 1 11 1370892 452272  33864 289383800012 5 2106 3886   0  4 12 83
 2  4 1370892 315880  33888 2894139600  1522  3084 2213 4120   4  3 57 37
 0 10 1370892 240900  33628 2893406000  1060 17275 3396 4793   3  3 55 40
 1  5 1370892 238172  33044 2890565200   148   267 3943 5284   2  3 26 69
 2  2 1370916 232932  31960 286940240   12  1170  5625 3037 6336   6  7 61 26
 1  2 1370912 232788  27588 28697216   100  1016  3848 2780 5669   8  5 56 31
 1  4 1370908 2392224  27608 2814471200   936  8811 2514 5244   8  6 61 25
 0  1 1370908 2265428  27612 2815318800  4360  1598 2822 4784  13  3 69 15
 1  2 1370908 2041260  27612 2817678800 11842   474 3679 4255  12  4 78  6
 0  3 1370908 2199880  27624 2827211200 47638   569 7798 5495  11  4 70 14
 0  3 1370908 2000752  27624 2831869200 23492   275 5084 5161  10  3 71 17
 1  0 1370908 1691000  27624 2836506000 22920   117 4961 5426  12  5 69 15
 1  0 1370908 2123512  27624 2836757600  1244   145 2053 3728  12  3 83  2
 2  0 1370908 1740724  27636 2840374800 18272   190 2920 4188  12  4 76  8
 2  0 1370908 1305856  27636 2846017200 28174   493 3744 4750  11  6 68 15
 1  2 1370908 973412  27644 2852964000 34614   305 3419 4522  12  5 69 13
 2  2 1370904 1790820  27656 2865908020 64376   389 5527 5374  12  7 69 12
 1  2 1370904 1384100  27656 2875033600 45740   351 4898 5381  13  6 68 13
 1  0 1370904 954200  27656 2886425200 56544   413 4596 5470  13  7 66 14
 1  0 1370904 1597264  27656 2886575600   926   391 2009 3502  11  4 81  4
 3  2 1370904 1219180  27668 2886824400  1160   500 2180 3772  11  5 80  4
 2  7 1370900 809128  27680 2886902000   298 21875 2417 3936  11  5 49 35
 0  9 1370900 1693360  27680 2886903200 8 0 2756 4174   8  5 28 59
 1  2 1370900 1531100  27688 2887110400  1034  7849 2646 4571  10  3 72 15




iostat -x 2:

Linux 2.6.32-33-server (voyager)    06.09.2011    _x86_64_    (8 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5,02    0,00    2,41   11,60    0,00   80,97

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               3,05     5,22    1,05    0,67   117,54    45,72    95,37     0,01    3,94   0,75   0,13
sdb              10,02   148,15  157,91   93,49 10019,50  5098,93    60,14     4,53   18,04   2,30  57,75
dm-0              0,00     0,00    3,03    4,87    24,21    38,96     8,00     0,45   56,83   0,06   0,05
dm-1              0,00     0,00    1,07    0,87    93,32     6,77    51,59     0,01    2,71   0,42   0,08
dm-2              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    6,30   6,30   0,00
dm-3              0,00     0,00    0,19    0,32     1,54     2,55     8,00     0,03   63,61   2,72   0,14
dm-4              0,00     0,00    0,19    0,88     1,54     7,05     8,00     0,04   33,91  12,84   1,38
dm-5              0,00     0,00    0,10    0,04     0,83     0,33     8,00     0,00   16,22   2,63   0,04
dm-6              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,88   4,88   0,00
dm-7              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,37   4,37   0,00
dm-8              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,69   4,69   0,00
dm-9              0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    5,71   5,71   0,00
dm-10             0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,65   4,65   0,00
dm-11             0,00     0,00    0,00    0,00     0,00     0,00     8,00     0,00    4,17   4,17   0,00
dm-12             0,00     0,00    0,11    1,34     0,90    10,73     8,00     0,12   76,31  12,61   1,83
dm-13             0,00     0,00    0,01    0,00     0,09     0,01     8,00     0,00   18,70   1,26   0,00
dm-14             0,00     0,00

Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Gerhard Wohlgenannt

hi,


What does a normal load mean? Does that mean a time when the queries are
slow?


Yes, we do have slow queries (according to postgresql.log) under such load.

Are you sure the machine really has 48GB of RAM? Because from the vmstat
output it seems like there's just 32GB.

 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  0  0 1342168 336936 107636 313535120014   183 1911 3426  2  1 93  4


1342168 + 336936 + 107636 + 31353512 = 33140252 ~ 31GB


strange.
we paid for 48G :-) and top and free show 48G
/root# free
             total       used       free     shared    buffers     cached
Mem:      49564860   49310444     254416          0      30908   30329576
-/+ buffers/cache:   18949960   30614900
Swap:     20971512    1370960   19600552



Otherwise I don't see anything wrong in the output. What is the size of
the database (use pg_database_size to get it)? Did it grow significantly
recently?



There are a number of databases in the cluster on that machine;
in the filesystem they add up to 271G.
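
For a per-database breakdown (which is what pg_database_size reports), something like this
would do:

   SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
     FROM pg_database
    ORDER BY pg_database_size(datname) DESC;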

cheers gerhard



Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Gerhard Wohlgenannt



That's why I love dstat, just do this

$ dstat -C 0,1,2,3,4,5,6,7

and you know all you need.


dstat looks like a very nice tool, results below ..
(now the system load seems a bit lower than before, when the results for 
vmstat and iostat were generated)

Good catch, thanks Scott.

Yes, good catch.

Still, this does not explain why the queries were running fast before, and
why the RAID array is so sluggish. Not to mention that we don't know what
were the conditions when collecting those numbers (were the VMs off or
running?).

The VMs were running. They are in something like production use, so I 
shouldn't just turn them off ..  :-)
And the processes in the VMs cause a big portion of the DB load, so 
turning them off would distort the results ...


and thanks again for all the replies!!! :-)


~# dstat -C 0,1,2,3,4,5,6,7
---cpu0-usage--cpu1-usage--cpu2-usage--cpu3-usage--cpu4-usage--cpu5-usage--cpu6-usage--cpu7-usage-- 
-dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq 
siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq 
siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ| recv  
send|  in   out | int   csw
  7   1  75  17   0   0:  4   5  84   7   0   0:  5   3  80  12   0   
0:  4   3  85   9   0   0:  7   2  75  16   0   0:  4   2  87   8   0   
0:  7   2  75  16   0   0:  4   1  87   8   0   0|5071k 2578k|   0 0 
|9760B 9431B|2468  4126
  0   0  98   2   0   0:  0   0  98   2   0   0:  6   2  22  71   0   
0:  5   0  76  19   0   0:  0  12  82   6   0   0:  3   7  88   2   0   
0:  3   1  84  12   0   0:  2   0  94   4   0   0|5160k 1376k|  60k  
225k|   0 0 |2101  3879
 11   1  84   4   0   0:  2   0  93   6   0   0:  3   4  72  22   0   
0:  2   2  92   3   0   1: 10  13  22  54   0   1:  6   7  75  12   0   
0:  3   0  87  10   0   0: 12   0  81   7   0   0|6640k 1683k| 140k  
240k|   0 0 |2860  4617
  1   1  29  68   0   1: 12   0  80   8   0   0:  6   0  78  16   0   
1:  3   1  80  16   0   0: 14  14  57  16   0   0:  0  11  78  12   0   
0:  9   1  83   7   0   0:  0   0  96   4   0   0|4448k 1266k| 102k  
336k|   0 0 |2790  4645
  0   0  89  11   0   0:  1   0  98   1   0   0: 14   0  57  29   0   
0:  1   1  89   9   0   0:  1  15  41  43   0   0:  3  15  75   7   0   
0:  3   2  60  35   0   0:  0   0  95   5   0   0|  18M 1622k|  97k  
285k|   0 0 |3303  4764
  0   0  96   4   0   0:  0   0  99   0   0   1:  1   2  14  83   0   
0:  1  25  17  57   0   0:  1   0  87  12   0   0:  1   0  19  80   0   
0:  3   3   0  94   0   0:  0   0  48  52   0   0|1320k   19M|  40k  
113k|   0 0 |2909  4709
  1   0  63  36   0   0:  5   2  88   5   0   0: 34   2   0  63   1   
0:  8   8  72  12   0   0:  0   9  85   6   0   0:  1   2  84  13   0   
0:  2   1  60  37   0   0:  1   1  62  36   0   0|9160k 5597k|  52k  
143k|  32k0 |2659  4650
  4   0  43  53   0   0:  2   0  93   5   0   0:  9   0  63  28   0   
0:  3   1  89   7   0   0:  2   9  72  16   0   1:  0  13  81   6   0   
0:  9   1  52  38   0   0:  3   0  84  13   0   0|4980k 1358k| 106k  
239k|   0 0 |2993  5158
  2   1  90   7   0   0:  2   0  95   3   0   0:  2   3  82  13   0   
0:  0   0  87  13   0   0:  6  10  32  52   0   0:  2  10  82   6   0   
0:  5   0  86   9   0   0: 10   5  81   4   0   0|4376k 2949k| 119k  
295k|   0 0 |2729  4630
  1   0  93   6   0   0:  2   0  91   6   1   0: 15   4  71  11   0   
0:  7   2  90   1   0   0: 13  10  12  65   0   0:  2  13  41  45   0   
0:  1   0  97   2   0   0:  1   0  94   5   0   0|3896k   15M|  87k  
242k|   0 0 |2809  5514
  2   0  98   0   0   0:  0   0  73  27   0   0:  0   0 100   0   0   
0:  2   1  29  68   0   0:  4   5   0  92   0   0:  2   5  92   2   0   
0:  0   0 100   0   0   0:  1   0  77  22   0   0| 172k   19M|  40k  
127k|   0 0 |2221  4069
  0   0  48  52   0   0:  0   0  97   3   0   0:  0   0  92   8   0   
0:  3   0  91   6   0   0:  2  10  10  78   0   0:  4  10  81   6   0   
0:  2   0  29  69   0   0:  1   0  26  73   0   0| 652k 6931k|  66k  
233k|   0 0 |2416  4389
  6   2  72  21   0   0:  3   1  86  10   0   0:  7   0  60  34   0   
0:  2   2  91   6   0   0:  1  13  78   9   0   0:  2   8  84   6   0   
0:  2   0  79  19   0   0:  0   2  87  11   0   0|2784k 1456k|  96k  
206k|   0 0 |2854  5226
  9   4  50  37   0   0:  3   3  84  10   0   0:  4   0  84  12   0   
0:  2   3  86   9   0   0: 10   2  73  15   0   0:  3   5  84   8   0   
0:  8   4  81   6   0   0:  1   2  84  13   0   0|2952k 1374k| 133k  
305k|   0 0 |3249  5076
  9   1  78  13   0   0:  4   4  83   9   0   0:  3   1  68  28   0   
0:  3   3  82  12   0   0:  9   0  64  26   0   1:  2   1  83  13   0   
1:  9   3  63  24   0   1:  3   1  91   5   0   0|3648k 1420k| 188k  
444k|   0 0 |3560  5981
  3   1  63  33   0   0:  0   1  86  13   0   0:  1   0  

Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Gerhard Wohlgenannt


That's a good thought,  maybe the stats are old and you have bad 
plans?  It could also be major updates to the data too (as opposed to 
growth).


We have checked the number of dead tuples etc. recently, and it looks 
OK. And as everything in the database seems to be very slow at the moment, I 
guess the problem is not caused by bad plans for specific tables/queries.
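
For the record, a quick way to re-check dead tuples and the last (auto)vacuum/analyze times
is something like:

   SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
     FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;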


Gerhard, have you done an 'explain analyze' on any of your slow 
queries?   Have you done an analyze lately?




Yes, we added the 'auto_explain' module to log/analyze queries >= 5000 ms.
A sample result from the logs (there is lots of stuff in the logs; I 
selected this query because it is very simple):


2011-09-06 04:00:35 CEST STATEMENT:  INSERT into keywords.table_x_site_impact (content_id, site_impact_id, site_impact) VALUES (199083087, 1, 1.00)
2011-09-06 04:00:35 CEST LOG:  duration: 15159.723 ms  statement: INSERT into keywords.table_x_site_impact (content_id, site_impact_id, site_impact) VALUES (199083087, 1, 1.00)

2011-09-06 04:00:35 CEST LOG:  duration: 15159.161 ms  plan:
        Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.017..0.019 rows=1 loops=1)
          Output: nextval('keywords.table_x_site_impact_internal_id_seq'::regclass), 199083087::bigint, 1::smallint, 1::double precision
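
The auto_explain setup mentioned above boils down to a couple of postgresql.conf lines,
roughly:

   shared_preload_libraries = 'auto_explain'
   auto_explain.log_min_duration = 5000   # ms; log plans only for slower statements
   auto_explain.log_analyze = on          # include actual timings/row counts (adds overhead)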







Re: [PERFORM] Sudden drop in DB performance

2011-09-06 Thread Tomas Vondra
On 6 September 2011, 10:55, Gerhard Wohlgenannt wrote:

 That's why I love dstat, just do this

 $ dstat -C 0,1,2,3,4,5,6,7

 and you know all you need.

 dstat looks like a very nice tool, results below ..
 (now the system load seems a bit lower then before when generating
 results for vmstat and iostat)
 Good catch, thanks Scott.
 Yes, good catch.

 Still, this does not explain why the queries were running fast before,
 and
 why the RAID array is so sluggish. Not to mention that we don't know
 what
 were the conditions when collecting those numbers (were the VMs off or
 running?).

 the VMs were running. they are in something like production use, so i
 shouldn't just turn them off ..  :-)
 and the processes in the VMs cause a big portion of the DB load, so
 turning them off would distort the results ...

Distort the results? If you want to measure the RAID performance, you have
to do that when there are no other processes using it.

 and thanks again for all the replies!!! :-)

Please use something like pastebin.com to post these results. It was
bearable for the vmstat output but this is almost unreadable due to the
wrapping.

 ~# dstat -C 0,1,2,3,4,5,6,7
 ---cpu0-usage--cpu1-usage--cpu2-usage--cpu3-usage--cpu4-usage--cpu5-usage--cpu6-usage--cpu7-usage--
 -dsk/total- -net/total- ---paging-- ---system--
 usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq
 siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq
 siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| read  writ| recv
 send|  in   out | int   csw
7   1  75  17   0   0:  4   5  84   7   0   0:  5   3  80  12   0
 0:  4   3  85   9   0   0:  7   2  75  16   0   0:  4   2  87   8   0
 0:  7   2  75  16   0   0:  4   1  87   8   0   0|5071k 2578k|   0 0
 |9760B 9431B|2468  4126
...

But if I read that correctly, the wait for the cores is 17%, 7%, 12%, 9%,
16%, 8%, 16% and 8%, and the cores are mostly idle (idl is about 85%). So
it seems there is a small number of processes, switching between the CPUs,
and most of the time they're waiting for I/O.

Given the low values for disk I/O and the iostat output we've seen before,
it's obvious there's a lot of random I/O (mostly writes).

Let's speculate for a while what could cause this (in arbitrary order):

1) Checkpoints. Something is doing a lot of writes, and with a database that
often means a checkpoint is in progress. I'm not sure about your
checkpoint_timeout, but you do have 192 segments and about 7GB of shared
buffers. That means there may be a lot of dirty buffers (even 100% of the
buffers).

You're using RAID5, and that really is not a write-friendly RAID level.
We don't know the actual performance, as bonnie was run with the VMs accessing
the volume, but RAID10 usually performs much better.

Enable log_checkpoints in the config and see what's going on. You can also
use iotop to see what processes are doing the writes (it might be a
background writer, ...).
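
The relevant postgresql.conf bits for that (values shown only as an illustration):

   log_checkpoints = on                  # log when checkpoints start/finish and how much they write
   checkpoint_timeout = 5min             # the default; your current value may differ
   checkpoint_completion_target = 0.9    # spread checkpoint writes over a longer period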

2) The RAID is broken and can't handle the load it handled fine before.
This is not very likely, as you've mentioned that there were no warnings
etc.

3) There are some new jobs that do a lot of I/O. Is there anything new
that wasn't running before? I guess you'd mention that.

4) The database significantly grew in a short period of time, and the
active part now does not fit into the RAM (page cache), so the data has to
be retrieved from the disk. And it's not just about the database size,
it's about the active part of the database - if you're suddenly accessing
more data, the cache may not be large enough.

This is usually a gradual process (the cache hit ratio slowly decreases as the
database grows), but if the database grew rapidly ... This could also be
caused by MVCC, i.e. there may be a lot of dead tuples - have you done a
big UPDATE / DELETE or something like that recently?

regards
Tomas




[PERFORM] how fast index works?

2011-09-06 Thread Anibal David Acosta
Hi everyone, 

 

My question is: if I have a table with 500,000 rows and a SELECT of one row
is returned in 10 milliseconds, and the table instead has 6,000,000 rows with
everything OK (statistics, vacuum etc.),

can I assume that the elapsed time will still be near 10 ms?

 

 

 

 



Re: [PERFORM] Rather large LA

2011-09-06 Thread Alan Hodgson
On September 5, 2011 03:36:09 PM you wrote:
 After Restart
 
 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
  r  b   swpd    free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
  2 34   2332 5819012  75632 258553680089   4200  7  5 85  3  0
  4 39   2332 5813344  75628 2583358800  5104   324  5480 27047  3  1 84 11  0
  2 47   2332 5815212  75336 25812064  00  4356  1664  5627 28695  3  1 84 12  0
  2 40   2332 5852452  75340 2581749600  5632   828  5817 28832  3  1 84 11  0
  1 45   2332 5835704  75348 2581707200  4960  1004  5111 25782  2  1 88  9  0
  2 42   2332 5840320  75356 2581163200  3884   492  5405 27858  3  1 88  8  0
  0 47   2332 5826648  75348 2580529600  4432  1268  5888 29556  3  1 83 13  0

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            3.26    0.00    1.69   25.21    0.00   69.84

 Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
 sda               0.50    45.00  520.00    2.50  8316.00   380.00    16.64    71.70  118.28   1.92 100.10
 sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
 sda2              0.50    45.00  520.00    2.50  8316.00   380.00    16.64    71.70  118.28   1.92 100.10
 sda3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
 sdb               0.00   196.50    0.00   10.50     0.00  1656.00   157.71     0.01    0.67   0.52   0.55
 sdb1              0.00   196.50    0.00   10.50     0.00  1656.00   157.71     0.01    0.67   0.52   0.55

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            3.97    0.00    1.71   20.88    0.00   73.44

Yeah 20% I/O wait I imagine feels pretty slow. 8 cores? 



Re: [PERFORM] Rather large LA

2011-09-06 Thread Richard Shaw

24 :)

4 x Intel Xeon-NehalemEX E7540-HexCore [2GHz] 

On 6 Sep 2011, at 20:07, Alan Hodgson wrote:

 On September 5, 2011 03:36:09 PM you wrote:
 After Restart
 
  procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
   r  b   swpd    free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
   2 34   2332 5819012  75632 258553680089   4200  7  5 85  3  0
   4 39   2332 5813344  75628 2583358800  5104   324  5480 27047  3  1 84 11  0
   2 47   2332 5815212  75336 25812064  00  4356  1664  5627 28695  3  1 84 12  0
   2 40   2332 5852452  75340 2581749600  5632   828  5817 28832  3  1 84 11  0
   1 45   2332 5835704  75348 2581707200  4960  1004  5111 25782  2  1 88  9  0
   2 42   2332 5840320  75356 2581163200  3884   492  5405 27858  3  1 88  8  0
   0 47   2332 5826648  75348 2580529600  4432  1268  5888 29556  3  1 83 13  0
 
  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             3.26    0.00    1.69   25.21    0.00   69.84
 
  Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
  sda               0.50    45.00  520.00    2.50  8316.00   380.00    16.64    71.70  118.28   1.92 100.10
  sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
  sda2              0.50    45.00  520.00    2.50  8316.00   380.00    16.64    71.70  118.28   1.92 100.10
  sda3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
  sdb               0.00   196.50    0.00   10.50     0.00  1656.00   157.71     0.01    0.67   0.52   0.55
  sdb1              0.00   196.50    0.00   10.50     0.00  1656.00   157.71     0.01    0.67   0.52   0.55
 
  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             3.97    0.00    1.71   20.88    0.00   73.44
 
 Yeah 20% I/O wait I imagine feels pretty slow. 8 cores? 




Re: [PERFORM] how fast index works?

2011-09-06 Thread Craig James

On 9/6/11 11:31 AM, Anibal David Acosta wrote:


Hi everyone,

My question is, if I have a table with 500,000 rows, and a SELECT of one row is 
returned in 10 milliseconds, if the table has 6,000,000 of rows and everything 
is OK (statistics, vacuum etc)

can i suppose that elapsed time will be near to 10?



Theoretically the index is a B-tree with log(N) performance, so a larger table 
could be slower.  But in a real database, the entire subtree might fall 
together in one spot on the disk, so retrieving a record from a 500,000 record 
database could take the same time as a 6,000,000 record database.
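
To put numbers on the theory: a B-tree lookup costs roughly log(N) comparisons, and
log2(500,000) is about 19 while log2(6,000,000) is about 22.5, so the bigger index only adds
a few comparisons - in practice usually one extra index level at most.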

On the other hand, if you do a lot of updates and don't have your autovacuum 
parameters set right, a 500,000 record index might get quite bloated and slow 
as it digs through several disk blocks to find one record.

There is no simple answer to your question.  In a well-maintained database, 
6,000,000 records are not a problem.

Craig


Re: [PERFORM] Rather large LA

2011-09-06 Thread Alan Hodgson
On September 6, 2011 12:11:10 PM Richard Shaw wrote:
 24 :)
 
 4 x Intel Xeon-NehalemEX E7540-HexCore [2GHz]
 

Nice box.

Still I/O-bound, though. SSDs would help a lot, I would think.



Re: [PERFORM] Rather large LA

2011-09-06 Thread Richard Shaw
Thanks for the advice; it's one under consideration at the moment. What are 
your thoughts on increasing RAM and shared_buffers?


On 6 Sep 2011, at 20:21, Alan Hodgson wrote:

 On September 6, 2011 12:11:10 PM Richard Shaw wrote:
 24 :)
 
 4 x Intel Xeon-NehalemEX E7540-HexCore [2GHz]
 
 
 Nice box.
 
 Still I/O-bound, though. SSDs would help a lot, I would think.




Re: [PERFORM] Rather large LA

2011-09-06 Thread Alan Hodgson
On September 6, 2011 12:35:35 PM Richard Shaw wrote:
 Thanks for the advice, It's one under consideration at the moment.   What
 are your thoughts on increasing RAM and shared_buffers?
 

If it's running OK after the startup rush, and it seems to be, I would leave 
them alone. More RAM is always good, but I don't see it helping with this 
particular issue.



Re: [PERFORM] how fast index works?

2011-09-06 Thread Anibal David Acosta
Thanks!

 

 

From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On behalf of Craig James
Sent: Tuesday, 6 September 2011, 03:18 p.m.
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] how fast index works?

 

On 9/6/11 11:31 AM, Anibal David Acosta wrote: 

Hi everyone, 

 

My question is, if I have a table with 500,000 rows, and a SELECT of one row
is returned in 10 milliseconds, if the table has 6,000,000 of rows and
everything is OK (statistics, vacuum etc) 

can i suppose that elapsed time will be near to 10?


Theoretically the index is a B-tree with log(N) performance, so a larger
table could be slower.  But in a real database, the entire subtree might
fall together in one spot on the disk, so retrieving a record from a 500,000
record database could take the same time as a 6,000,000 record database.

On the other hand, if you do a lot of updates and don't have your autovacuum
parameters set right, a 500,000 record index might get quite bloated and
slow as it digs through several disk blocks to find one record.

There is no simple answer to your question.  In a well-maintained database,
6,000,000 records are not a problem.

Craig



Re: [PERFORM] Rather large LA

2011-09-06 Thread Damon Snyder
If you are not doing so already, another approach to preventing the slam at
startup would be to implement some form of caching either in memcache or an
http accelerator such as varnish (https://www.varnish-cache.org/). Depending
on your application and the usage patterns, you might be able to fairly
easily insert varnish into your web stack.

Damon

On Tue, Sep 6, 2011 at 12:47 PM, Alan Hodgson ahodg...@simkin.ca wrote:

 On September 6, 2011 12:35:35 PM Richard Shaw wrote:
  Thanks for the advice, It's one under consideration at the moment.   What
  are your thoughts on increasing RAM and shared_buffers?
 

 If it's running OK after the startup rush, and it seems to be, I would
 leave
 them alone. More RAM is always good, but I don't see it helping with this
 particular issue.




Re: [PERFORM] how fast index works?

2011-09-06 Thread Craig Ringer

On 7/09/2011 2:31 AM, Anibal David Acosta wrote:


Hi everyone,

My question is, if I have a table with 500,000 rows, and a SELECT of 
one row is returned in 10 milliseconds, if the table has 6,000,000 of 
rows and everything is OK (statistics, vacuum etc)


can i suppose that elapsed time will be near to 10?




It's not that simple. In addition to the performance scaling Craig James 
mentioned, there are cache effects.


Your 500,000 row index might fit entirely in RAM. This means that no 
disk access is required to query and search it, making it extremely 
fast. If the index on the larger table does NOT fit entirely in RAM, or 
competes for cache space with other things so it isn't always cached in 
RAM, then it might be vastly slower.


This is hard to test, because it's not easy to empty the caches. On 
Linux you can use the VM's drop_caches feature, but that drops *all* 
caches, including cached disk data from running programs, the PostgreSQL 
system catalogs, etc. That makes it a rather unrealistic test when the 
only thing you really want to remove from cache is your index and the 
table associated with it.
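
For completeness, the knob in question is (as root):

   sync
   echo 3 > /proc/sys/vm/drop_caches   # drops the OS page cache, dentries and inodes

PostgreSQL's own shared_buffers are unaffected by this; to start truly cold you would also
restart the server.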


The best way to test whether data of a certain size will perform well is 
to create dummy data of that size and test with it. Anything else is 
guesswork.
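
A throwaway test set is easy to generate, e.g. (columns and sizes purely illustrative):

   CREATE TABLE dummy AS
     SELECT g AS id, md5(g::text) AS name
       FROM generate_series(1, 6000000) g;
   CREATE INDEX dummy_name_idx ON dummy (name);
   ANALYZE dummy;

Then compare indexed lookups against the same table built with 500,000 rows.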


--
Craig Ringer