Re: [PERFORM] FSM - per database or per installation?

2009-12-24 Thread Craig Ringer

On 20/11/2009 2:33 AM, Heikki Linnakangas wrote:

Craig James wrote:

Are the FSM parameters for each database, or the entire Postgres
system?  In other words, if I have 100 databases, do I need to increase
max_fsm_pages and max_fsm_relations by a factor of 100, or keep them the
same as if I just have one database?

I suspect they're per-database, i.e. as I add databases, I don't have to
increase the FSM parameters, but the documentation isn't 100% clear on
this point.


It's per cluster, ie *not* per-database.

The parameter is gone in 8.4, BTW.


See:

  http://www.postgresql.org/docs/8.4/static/release-8-4.html#AEN95067

for why they've been removed, which boils down to PostgreSQL manages 
the fsm automatically now and no longer requires all that RAM to do it, 
either.


Thanks Heikki - the fsm _really_ simplify admin and remove a bunch of 
common gotchas for Pg users.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Optimizer use of index slows down query by factor

2009-12-24 Thread Michael Ruf

Hi,

we experience some strange performance problems, we've already found a 
workaround for us, but are curious if it's a known problem of the optimizer.


Tested with the following Postgres Version: 8.2.15 and 8.3.9
AUTOVACUUM is enabled, explicit VACUUM and REINDEX both tables and the
whole DB.

effective_cache_size = 3096MB
default_statistics_target = 100
shared_buffers = 1024MB
work_mem = 64MB

Table Schema:

Table click
 Column  |  Type   | Modifiers
-+-+---
 click_id| integer | not null
 member_id   | integer |
 link_id | integer |
 click_timestamp | bigint  |
 remote_host | text|
 user_agent  | text|
Indexes:
click_pkey PRIMARY KEY, btree (click_id)
idx_click_1 btree (link_id)
idx_click_2 btree (click_timestamp)

Table link
   Column   |  Type   | Modifiers
+-+---
 link_id| integer | not null
 link_url   | text|
 task_id| integer |
 link_type  | integer |
 action_id  | integer |
 link_alias | text|
 deleted| boolean |
 deletable  | boolean |
Indexes:
link_pkey PRIMARY KEY, btree (link_id)
idx_link_1 btree (task_id)

Rows in click table contains:   22874089
Rows in link table contains:4220601


The following query is slow when index scan is enabled:

SELECT
link.link_alias,link.link_type,COUNT(click.click_id),COUNT(distinct
click.member_id) FROM link LEFT JOIN click ON link.link_id=click.link_id
WHERE (link.link_type=8 OR link.link_type=9) AND link.task_id=1556 AND
(link.deletable IS NULL OR link.deletable=false)GROUP BY
link.link_type,link.link_alias LIMIT 1000


Explain with index scan enabled:

explain analyze SELECT
link.link_alias,link.link_type,COUNT(click.click_id),COUNT(distinct
click.member_id) FROM link LEFT JOIN click ON link.link_id=click.link_id
WHERE (link.link_type=8 OR link.link_type=9) AND link.task_id=1556 AND
(link.deletable IS NULL OR link.deletable=false)GROUP BY
link.link_type,link.link_alias LIMIT 1000;

QUERY PLAN
-
 Limit  (cost=1416936.47..1417849.48 rows=1 width=30) (actual
time=277062.951..277073.144 rows=12 loops=1)
   -  GroupAggregate  (cost=1416936.47..1417849.48 rows=1 width=30)
(actual time=277062.949..277073.126 rows=12 loops=1)
 -  Sort  (cost=1416936.47..1417119.07 rows=73040 width=30)
(actual time=277062.820..277066.219 rows=6445 loops=1)
   Sort Key: link.link_type, link.link_alias
   Sort Method:  quicksort  Memory: 696kB
   -  Merge Right Join  (cost=1604.91..1411036.15
rows=73040 width=30) (actual time=277027.644..277050.946 rows=6445 loops=1)
 Merge Cond: (click.link_id = link.link_id)
 -  Index Scan using idx_click_1 on click
(cost=0.00..1351150.42 rows=22874088 width=12) (actual
time=6.915..263327.439 rows=22409997 loops=1)
 -  Sort  (cost=1604.91..1638.61 rows=13477
width=26) (actual time=12.172..15.640 rows=6445 loops=1)
   Sort Key: link.link_id
   Sort Method:  quicksort  Memory: 33kB
   -  Index Scan using idx_link_1 on link
(cost=0.00..680.51 rows=13477 width=26) (actual time=5.707..12.043
rows=126 loops=1)
 Index Cond: (task_id = 1556)
 Filter: (((deletable IS NULL) OR (NOT
deletable)) AND ((link_type = 8) OR (link_type = 9)))
 Total runtime: 277082.204 ms
(15 rows)


Explain with set enable_indexscan=false;

explain analyze SELECT
link.link_alias,link.link_type,COUNT(click.click_id),COUNT(distinct
click.member_id) FROM link LEFT JOIN click ON link.link_id=click.link_id
WHERE (link.link_type=8 OR link.link_type=9) AND link.task_id=1556 AND
(link.deletable IS NULL OR link.deletable=false)GROUP BY
link.link_type,link.link_alias LIMIT 1000;

   QUERY PLAN
--
 Limit  (cost=2577764.28..2578677.29 rows=1 width=30) (actual
time=51713.324..51723.517 rows=12 loops=1)
   -  GroupAggregate  (cost=2577764.28..2578677.29 rows=1 width=30)
(actual time=51713.322..51723.499 rows=12 loops=1)
 -  Sort  (cost=2577764.28..2577946.88 rows=73040 width=30)
(actual time=51713.191..51716.600 rows=6445 loops=1)
   Sort Key: link.link_type, link.link_alias
   Sort Method:  quicksort  Memory: 696kB
   -  Hash Left Join  (cost=1140942.18..2571863.96
rows=73040 width=30) (actual time=45276.194..51702.053 rows=6445 loops=1)
 Hash Cond: (link.link_id = click.link_id)
 -  Bitmap Heap Scan on link
(cost=253.20..34058.86 rows=13477 width=26) (actual time=0.044..0.168
rows=126 

[PERFORM] Multicolumn index - WHERE ... ORDER BY

2009-12-24 Thread Lucas Maystre

Hi there,

I've got a small question about multicolumn indexes.

I have a table with ~5M rows (43 bytes per column -  is that relevant?) 
(but eventually it could grow up to 50M rows), used to store e-mail 
logs. I am trying to build a web frontend to search mails in this table.


I usually want only the last mails processed by my mail system, so 
typically all my queries end with:

... ORDER BY time LIMIT 50;

Before that, I have usually have a WHERE clause on a indexed column. 
Example of a query I might have:


SELECT id FROM mail WHERE from_address LIKE 'bill%'
ORDER BY time DESC LIMIT 50;

I observed that the ordering adds a significant overhead to my queries - 
this seems quite logical, because of the ORDER BY which has to inspect 
every row matching the WHERE clause.


The approach taken by the query planner is one of the following:

1) if it thinks there are not so much rows containg 'bill' as prefix 
of the 'from_address' column, it performs an index scan (or a bitmap 
index scan) using my index on 'from_address', then sorts all results 
according to the 'time' column.


2) if it thinks there are many rows containing 'bill' as prefix of the 
'from_address' column, it performs an reverse index scan using my index 
on 'time', and looks sequentially if the 'from_address' column 
contains 'bill' as prefix.


The problem is that not so much is in my case approx 10K rows 
sometimes. It seems to be pretty costly to perform an (bitmap) index 
scan over all these rows. As I only want the first few rows anyway 
(LIMIT 50), I thought that there had to be some better solution.


The solution I had in mind was to create a multicolumn index over 
'from_address' and 'time':


CREATE INDEX idx_from_time ON mail (from_address, time DESC);

so that it could directly use the 'time' ordering and lookup only the 
first 50 rows using the index.


but... it doesn't work :-) i.e. my multicolumn index is never used. So:
- do you guys have any ideas why it doesn't work?
- do you see an alternative solution?

Infos:
- I use PostgreSQL 8.4.2
- I regularly VACUUM and ANALYZE my db. Statistics look OK.
- I'm relatively new to PostgreSQL, so maybe this question is trivial?

Thanks in advance, and happy holidays!

--
lucas maystre
trainee

open systems ag
raeffelstrasse 29
ch-8045 zurich
t: +41 44 455 74 00
f: +41 44 455 74 01
l...@open.ch

http://www.open.ch

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] SATA drives performance

2009-12-24 Thread Ognjen Blagojevic

Hi all,

I'm trying to figure out which HW configuration with 3 SATA drives is 
the best in terms of reliability and performance for Postgres database.


I'm thinking to connect two drives in RAID 0, and to keep the database 
(and WAL) on these disks - to improve the write performance of the SATA 
drives.


The third drive will be used to reduce the cost of the RAID 0 failure 
without reducing the performance. Say, I could configure Postgres to use 
the third drive as backup for WAL files, with archive_timeout set to 15 
minutes. Daily backups will be created on different server. Loss of last 
15 minute updates is something the customer can afford. Also, one day 
restore time is case of failure is also affordable (to reinstall the OS, 
Postgres, restore backup, and load WALs).


The server will be remotely administered, that is why I'm not going for 
RAID 1, 1+0 or some other solution for which, I beleive, the local 
administion is crucial.


Server must be low budget, that is why I'm avoiding SAS drives. We will 
use CentOS Linux and Postgres 8.4. The database will have 90% of read 
actions, and 10% of writes.


I would like to hear your opinion, is this reasonable or I should 
reconsider RAID 1?


Regards,
Ognjen

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Performance with partitions/inheritance and multiple tables

2009-12-24 Thread Radhika S
Hi,
We currently have a large table (9 million rows) of which only the last
couple of days worth of data is queried on a regular basis.
To improve performance we are thinking of partitioning the table.

One idea is:
Current_data = last days worth
archive_data  today (goes back to 2005)

The idea being proposed at work is:
current_data = today's data
prior years data - be broken down into one table per day
archive_data - data older than a year.

My question is:
a) Does postgres suffer a performance hit say if there are 200 child tables.
b) What about aggregation between dates in the last year. eg total sales for
firm a  for the last year. It will need to look up n number of tables.

Any ideas, tips, gotchas in implementing partitioning would be welcome. It
is a somewhat mission critical (not trading, so not as mission critical)
system.

How expensive is maintaining so many partitions both in terms of my writing
/ maintaining scripts and performance.

Thanks in advance.
Radhika


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Scott Marlowe
2009/12/24 Ognjen Blagojevic ogn...@etf.bg.ac.yu:
 Hi all,

 I'm trying to figure out which HW configuration with 3 SATA drives is the
 best in terms of reliability and performance for Postgres database.

 I'm thinking to connect two drives in RAID 0, and to keep the database (and
 WAL) on these disks - to improve the write performance of the SATA drives.

 The third drive will be used to reduce the cost of the RAID 0 failure
 without reducing the performance. Say, I could configure Postgres to use the
 third drive as backup for WAL files, with archive_timeout set to 15 minutes.
 Daily backups will be created on different server. Loss of last 15 minute
 updates is something the customer can afford. Also, one day restore time is
 case of failure is also affordable (to reinstall the OS, Postgres, restore
 backup, and load WALs).

 The server will be remotely administered, that is why I'm not going for RAID
 1, 1+0 or some other solution for which, I beleive, the local administion is
 crucial.

 Server must be low budget, that is why I'm avoiding SAS drives. We will use
 CentOS Linux and Postgres 8.4. The database will have 90% of read actions,
 and 10% of writes.

 I would like to hear your opinion, is this reasonable or I should reconsider
 RAID 1?

If you're running RAID-0 and suffer a drive failure, the system
becomes somewhat less cheaper because you now have to rescue it and
get it up and running again.  I.e. you've moved your cost from
hardware to your time.

I'd recommend RAID-1 with a 3 disk mirror.  Linux now knows to read
from  1 drive at a time even for a single user to get very good read
bandwidth ( I routinely see read speeds on a pair of WD Black 7200 RPM
SATA drives approaching 200MB/s (they are ~100MB/s each).  Your
redundancy is increased, so that should one drive fail you're still
completely redundant.  Also 1TB drives are CHEAP nowadays, even the WD
blacks and similar drives from other manufacturers.  If you need more
storage than a single 1TB drive can provide, then you'll need some
other answer.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill

A couple of thoughts occur to me:

1. For reads, RAID 1 should also be good: it will allow a read to occur
from whichever disk can provide the data fastest.

2. Also, for reads, the more RAM you have, the better (for caching). I'd
suspect that another 8GB of RAM is a better expenditure than a 2nd drive
in many cases.

3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1
intead. If you use the Linux software mdraid, remote admin is easy.

4. If you can tolerate the risk of the most recent transactions being
lost, look at asynchronous commit. Likewise, you *might* consider
operating with a write cache enabled. Otherwise, the time for
fdatasync() is what's critical.

5. For a 2-disk setup, I think that main DB on one, with WAL on the
other will beat having everything on a single RAID0.

6. The WAL is relatively small: you might consider a (cheap) solid-state
disk for it.

7. If you have 3 equal disks, try doing some experiments. My inclination
would be to set them all up with ext4, then have the first disk set up
as a split between OS and WAL; the 2nd disk set up for
/var/lib/postgresql,  and the 3rd disk as a backup for everything (and a
spare OS with SSH access).

8. Lastly, if you need remote administration, and can justify another
£100 or so, the HP iLO (integrated lights out) cards are rather
useful: these effectively give you VNC without OS support, even for the
BIOS.

Best wishes,

Richard


Ognjen Blagojevic wrote:

Hi all,

I'm trying to figure out which HW configuration with 3 SATA drives is 
the best in terms of reliability and performance for Postgres database.


I'm thinking to connect two drives in RAID 0, and to keep the database 
(and WAL) on these disks - to improve the write performance of the SATA 
drives.


The third drive will be used to reduce the cost of the RAID 0 failure 
without reducing the performance. Say, I could configure Postgres to use 
the third drive as backup for WAL files, with archive_timeout set to 15 
minutes. Daily backups will be created on different server. Loss of last 
15 minute updates is something the customer can afford. Also, one day 
restore time is case of failure is also affordable (to reinstall the OS, 
Postgres, restore backup, and load WALs).


The server will be remotely administered, that is why I'm not going for 
RAID 1, 1+0 or some other solution for which, I beleive, the local 
administion is crucial.


Server must be low budget, that is why I'm avoiding SAS drives. We will 
use CentOS Linux and Postgres 8.4. The database will have 90% of read 
actions, and 10% of writes.


I would like to hear your opinion, is this reasonable or I should 
reconsider RAID 1?


Regards,
Ognjen




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimizer use of index slows down query by factor

2009-12-24 Thread Tom Lane
Michael Ruf m...@inxmail.de writes:
 we experience some strange performance problems, we've already found a 
 workaround for us, but are curious if it's a known problem of the optimizer.

I think you need to see about getting this rowcount estimate to be more
accurate:

 -  Index Scan using idx_link_1 on link
 (cost=0.00..680.51 rows=13477 width=26) (actual time=5.707..12.043
 rows=126 loops=1)
   Index Cond: (task_id = 1556)
   Filter: (((deletable IS NULL) OR (NOT
 deletable)) AND ((link_type = 8) OR (link_type = 9)))

If it realized there'd be only 126 rows out of that scan, it'd probably
have gone for a nestloop join against the big table, which I think would
be noticeably faster than either of the plans you show here.

You already did crank up default_statistics_target, so I'm not sure if
raising it further would help any.  What I'd suggest is trying to avoid
using non-independent AND/OR conditions.  For instance recasting the
first OR as just deletable is not true would probably result in a
better estimate.  The size of the error seems to be more than that would
account for though, so I suspect that the deletable and link_type
conditions are interdependent.  Is it practical to recast your data
representation to avoid that?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance with partitions/inheritance and multiple tables

2009-12-24 Thread Shrirang Chitnis
Radhika,

If the data is 9 million rows, then I would suggest that you leave it as it is, 
unless the server configuration and the number of users firing queries 
simultaneously is a matter of concern.

Try creating indexes on often used fields and use EXPLAIN to speed performance 
of the queries ... and of course proper configuration of autovacuum. I have 
seen query results within a few ms. on similar amount of data on a 2GB RHEL 
RAID 5 system, so it should not have been an issue.

HTH,

Shrirang Chitnis
--

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Radhika S
Sent: Thursday, December 24, 2009 8:12 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Performance with partitions/inheritance and multiple tables

Hi,
We currently have a large table (9 million rows) of which only the last couple 
of days worth of data is queried on a regular basis.
To improve performance we are thinking of partitioning the table.

One idea is:
Current_data = last days worth
archive_data  today (goes back to 2005)

The idea being proposed at work is:
current_data = today's data
prior years data - be broken down into one table per day
archive_data - data older than a year.

My question is:
a) Does postgres suffer a performance hit say if there are 200 child tables.
b) What about aggregation between dates in the last year. eg total sales for 
firm a  for the last year. It will need to look up n number of tables.

Any ideas, tips, gotchas in implementing partitioning would be welcome. It is a 
somewhat mission critical (not trading, so not as mission critical) system.

How expensive is maintaining so many partitions both in terms of my writing / 
maintaining scripts and performance.

Thanks in advance.
Radhika

The information contained in this message, including any attachments, is 
attorney privileged and/or confidential information intended only for the use 
of the individual or entity named as addressee.  The review, dissemination, 
distribution or copying of this communication by or to anyone other than the 
intended addressee is strictly prohibited.  If you have received this 
communication in error, please immediately notify the sender by replying to the 
message and destroy all copies of the original message.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Greg Smith

Richard Neill wrote:

3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1
intead. If you use the Linux software mdraid, remote admin is easy.


The main thing to be wary of with Linux software RAID-1 is that you 
configure things so that both drives are capable of booting the system.  
It's easy to mirror the data, but not the boot loader and the like.




7. If you have 3 equal disks, try doing some experiments. My inclination
would be to set them all up with ext4...


I have yet to yet a single positive thing about using ext4 for 
PostgreSQL.  Stick with ext3, where the problems you might run into are 
at least well understood and performance is predictable.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Multicolumn index - WHERE ... ORDER BY

2009-12-24 Thread Tom Lane
Lucas Maystre l...@open.ch writes:
 Example of a query I might have:
 SELECT id FROM mail WHERE from_address LIKE 'bill%'
 ORDER BY time DESC LIMIT 50;

 The solution I had in mind was to create a multicolumn index over 
 'from_address' and 'time':
 CREATE INDEX idx_from_time ON mail (from_address, time DESC);
 so that it could directly use the 'time' ordering and lookup only the 
 first 50 rows using the index.

 but... it doesn't work :-) i.e. my multicolumn index is never used. So:
 - do you guys have any ideas why it doesn't work?

The from_address condition isn't simple equality, so the output of a
scan wouldn't be sorted by time --- it would have subranges that are
sorted, but that's no help overall.  You still have to read the whole
scan output and re-sort.  So this index has no advantage over the
smaller index on just from_address.

 - do you see an alternative solution?

There might be some use in an index on (time, from_address).  That
gives the correct time ordering, and at least the prefix part of the
from_address condition can be checked in the index without visiting the
heap.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Gaël Le Mignot

Hello,

Instead of using 3 disks in RAID-0 and one without RAID for archive, I
would rather  invest into one  extra disk and  have either a  RAID 1+0
setup or use two  disks in RAID-1 for the WAL and  two disks in RAID-1
for the main database (I'm not sure which perform better between those
two solutions).

RAID-1 will  give you about twice as  fast reads as no  RAID (and RAID
1+0  will give  you twice  as  fast as  RAID 0),  with no  significant
penalty for writing, and it'll save  a lot of manpower in case on disk
dies.

If you can  afford hot-swappable disks, you can  even replace a failed
disk live, in a few minutes, with no failure at software level.

Everything  can be  remotely setup,  including adding/removing  a disk
from RAID  array, if  you use Linux  software RAID (mdadm),  except of
course  the physical  swap of  the disk,  but that  can be  done  by a
non-technician.

This  solution  costs  only  one  extra disk  (which  is  quite  cheap
nowadays) and  will deliver  enhanced performances and  save a  lot of
manpower and downtime in case of disk breaking.

Regards,

-- 
Gaël Le Mignot - g...@pilotsystems.net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Greg Smith

Gaël Le Mignot wrote:

This  solution  costs  only  one  extra disk  (which  is  quite  cheap
nowadays)


I would wager that the system being used here only has enough space to 
house 3 drives, thus the question, which means that adding a fourth 
drive probably requires buying a whole new server.  Nowadays the drives 
themselves are rarely the limiting factor on how many people use, since 
you can get a stack of them for under $100 each.  Instead the limit for 
small servers is always based on the physical enclosure and then 
potentially the number of available drive ports.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Mark Mielke

On 12/24/2009 10:51 AM, Greg Smith wrote:

7. If you have 3 equal disks, try doing some experiments. My inclination
would be to set them all up with ext4...


I have yet to yet a single positive thing about using ext4 for 
PostgreSQL.  Stick with ext3, where the problems you might run into 
are at least well understood and performance is predictable.




Hi Greg:

Can you be more specific? I am using ext4 without problems than I have 
discerned - but mostly for smaller databases (~10 databases, one almost 
about 1 Gbyte, most under 500 Mbytes).


Is it the delayed allocation feature that is of concern? I believe this 
feature is in common with other file systems such as XFS, and provided 
that the caller is doing things properly according to POSIX and/or the 
file system authors understanding of POSIX, which includes 
fsync()/fdatasync()/O_DIRECT (which PostgreSQL does?), everything is fine?


File systems failures have been pretty rare for me lately, so it's hard 
to say for sure whether my setup is really running well until it does 
fail one day and I find out. (Not too concerned, though, as I keep off 
site pg_dump backups of the database on a regular schedule - the 
databases are small enough to afford this :-) )


Cheers,
mark

--
Mark Mielkem...@mielke.cc


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Greg Smith

Mark Mielke wrote:
Can you be more specific? I am using ext4 without problems than I have 
discerned - but mostly for smaller databases (~10 databases, one 
almost about 1 Gbyte, most under 500 Mbytes).


Every time I do hear about ext4, so far it's always in the context of 
something that doesn't work well--not hearing about improvements yet.  
For example, there was a thread on this list earlier this month titled 
8.4.1 ubuntu karmic slow createdb that had a number of people chime 
saying they weren't happy with ext4 for various reasons.


Also, I have zero faith in the ability of the Linux kernel development 
process to produce stable code anymore, they're just messing with too 
many things every single day.  Any major new features that come out of 
there I assume need a year or two to stabilize before I'll put a 
production server on them and feel safe, because that this point a 
stable release means nothing in terms of kernel QA.  Something major 
like a filesystem introduction would be closer to the two year estimate 
side.  We're not even remotely close to stable yet with ext4 when stuff 
like http://bugzilla.kernel.org/show_bug.cgi?id=14354 is still going 
on.  My rough estimate is that ext4 becomes usable and free of major 
bugs in late 2010, best case.  At this point anyone who deploys it is 
still playing with fire.


File systems failures have been pretty rare for me lately, so it's 
hard to say for sure whether my setup is really running well until it 
does fail one day and I find out.


All of the ext4 issues I've heard of that worry me are either a) 
performance related and due to the barrier code not doing what was 
expected, or b) crash related.  No number of anecdotal it works for me 
reports can make up for those classes of issue because you will only see 
both under very specific circumstances.  I'm glad you have a good backup 
plan though.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill


Greg Smith wrote:

Richard Neill wrote:

3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1
intead. If you use the Linux software mdraid, remote admin is easy.


The main thing to be wary of with Linux software RAID-1 is that you 
configure things so that both drives are capable of booting the system.  
It's easy to mirror the data, but not the boot loader and the like.


Good point. I actually did this on a home PC (2 disks in RAID 1). The
solution is simple: just grub-install /dev/sda; grub-install /dev/sdb
and that's all you have to do, provided that /boot is on the raid array.

Of course, with a server machine, it's nearly impossible to use mdadm
raid: you are usually compelled to use a hardware raid card. Those are a
pain, and less configurable, but it will take care of the bootloader issue.

Obviously, test it both ways.






7. If you have 3 equal disks, try doing some experiments. My inclination
would be to set them all up with ext4...


I have yet to yet a single positive thing about using ext4 for 
PostgreSQL.  Stick with ext3, where the problems you might run into are 
at least well understood and performance is predictable.


I did some measurements on fdatasync() performance for ext2,ext3,ext4.

I found ext2 was fastest, ext4 was twice as slow as ext2, and ext3 was
about 5 times slower than ext2. Also, ext4 is doesn't having an
appallingly slow fsck.

We've had pretty good results from ext4.

Richard





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Jeremy Harris

On 12/24/2009 05:12 PM, Richard Neill wrote:

Of course, with a server machine, it's nearly impossible to use mdadm
raid: you are usually compelled to use a hardware raid card.


Could you expand on that?

- Jeremy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill



Jeremy Harris wrote:

On 12/24/2009 05:12 PM, Richard Neill wrote:

Of course, with a server machine, it's nearly impossible to use mdadm
raid: you are usually compelled to use a hardware raid card.


Could you expand on that?


Both of the last machines I bought (an IBM X3550 and an HP DL380) come 
with hardware raid solutions. These are an utter nuisance because:


  - they can only be configured from the BIOS (or with a
bootable utility CD). Linux has very basic monitoring tools,
but no way to reconfigure the array, or add disks to empty
hot-swap slots while the system is running.

  - If there is a Linux raid config program, it's not part of the
main packaged distro, but usually a pre-built binary, available
for only one release/kernel of the wrong distro.

  - the IBM one had dodgy firmware, which, until updated, caused the
disk to totally fail after a few days.

  - you pay a lot of money for something effectively pointless, and
have less control and less flexibility.

After my experience with the X3550, I hunted for any server that would 
ship without hardware raid, i.e. connect the 8 SATA hotswap slots direct 
to the motherboard, or where the hardware raid could be de-activated 
completely, and put into pass-through mode. Neither HP nor IBM make such 
a thing.


Richard






- Jeremy



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Scott Marlowe
On Thu, Dec 24, 2009 at 11:09 AM, Richard Neill rn...@cam.ac.uk wrote:


 Jeremy Harris wrote:

 On 12/24/2009 05:12 PM, Richard Neill wrote:

 Of course, with a server machine, it's nearly impossible to use mdadm
 raid: you are usually compelled to use a hardware raid card.

 Could you expand on that?

 Both of the last machines I bought (an IBM X3550 and an HP DL380) come with
 hardware raid solutions. These are an utter nuisance because:

  - they can only be configured from the BIOS (or with a
    bootable utility CD). Linux has very basic monitoring tools,
    but no way to reconfigure the array, or add disks to empty
    hot-swap slots while the system is running.

  - If there is a Linux raid config program, it's not part of the
    main packaged distro, but usually a pre-built binary, available
    for only one release/kernel of the wrong distro.

  - the IBM one had dodgy firmware, which, until updated, caused the
    disk to totally fail after a few days.

  - you pay a lot of money for something effectively pointless, and
    have less control and less flexibility.

 After my experience with the X3550, I hunted for any server that would ship
 without hardware raid, i.e. connect the 8 SATA hotswap slots direct to the
 motherboard, or where the hardware raid could be de-activated completely,
 and put into pass-through mode. Neither HP nor IBM make such a thing.

Yep.  And that's why I never order servers from them.  There are
dozens of reputable white box builders (I use Aberdeen who give me a 5
year all parts warranty and incredible customer service, but there are
plenty to choose from) and they build the machine I ask them to build.
 For hardware RAID I use Areca 1680 series, and they also provide me
with machines with software RAID for lighter loads (slave dbs,
reporting dbs, and stats dbs)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Ognjen Blagojevic

Richard and others, thank you all for your answers.

My comments inline.

Richard Neill wrote:
 2. Also, for reads, the more RAM you have, the better (for caching). I'd
 suspect that another 8GB of RAM is a better expenditure than a 2nd drive
 in many cases.

The size of the RAM is already four time of the database size, so I 
believe I won't get any more benefit if it is increased. The number of 
simultaneous connections to the database is small -- around 5.


What I'm trying to do with the hard disk configuration is to increase 
the write speed.



 3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1
 intead. If you use the Linux software mdraid, remote admin is easy.

No, actually it is HP ML series server with HW RAID. I don't have too 
much experience with it, but I believe that the remote administration 
might be hard. And that was the main reason I was avoiding RAID 1.



 5. For a 2-disk setup, I think that main DB on one, with WAL on the
 other will beat having everything on a single RAID0.

 6. The WAL is relatively small: you might consider a (cheap) solid-state
 disk for it.

These are exactly the thing I was also considering. -- but needed advice 
from people who tried it already.


Regards,
Ognjen



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Adam Tauno Williams
This isn't true.  IBMs IPS series controllers can the checked and configured 
via the ipssend utility that works very well in 2.6.x LINUX.

Scott Marlowe scott.marl...@gmail.com wrote:

On Thu, Dec 24, 2009 at 11:09 AM, Richard Neill rn...@cam.ac.uk wrote:


 Jeremy Harris wrote:

 On 12/24/2009 05:12 PM, Richard Neill wrote:

 Of course, with a server machine, it's nearly impossible to use mdadm
 raid: you are usually compelled to use a hardware raid card.

 Could you expand on that?

 Both of the last machines I bought (an IBM X3550 and an HP DL380) come with
 hardware raid solutions. These are an utter nuisance because:

  - they can only be configured from the BIOS (or with a
    bootable utility CD). Linux has very basic monitoring tools,
    but no way to reconfigure the array, or add disks to empty
    hot-swap slots while the system is running.

  - If there is a Linux raid config program, it's not part of the
    main packaged distro, but usually a pre-built binary, available
    for only one release/kernel of the wrong distro.

  - the IBM one had dodgy firmware, which, until updated, caused the
    disk to totally fail after a few days.

  - you pay a lot of money for something effectively pointless, and
    have less control and less flexibility.

 After my experience with the X3550, I hunted for any server that would ship
 without hardware raid, i.e. connect the 8 SATA hotswap slots direct to the
 motherboard, or where the hardware raid could be de-activated completely,
 and put into pass-through mode. Neither HP nor IBM make such a thing.

Yep.  And that's why I never order servers from them.  There are
dozens of reputable white box builders (I use Aberdeen who give me a 5
year all parts warranty and incredible customer service, but there are
plenty to choose from) and they build the machine I ask them to build.
 For hardware RAID I use Areca 1680 series, and they also provide me
with machines with software RAID for lighter loads (slave dbs,
reporting dbs, and stats dbs)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

--
Message composed using K-9 mail on Android.
Apologies for improper reply quoting (not supported) by client.
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill



Adam Tauno Williams wrote:

This isn't true.  IBMs IPS series controllers can the checked and configured 
via the ipssend utility that works very well in 2.6.x LINUX.



Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl 
card. This one is atrocious - it shipped with a hideous firmware bug. 
And there is no way to bypass it.


The HP have the P400 cards, which are decent in themselves, just not as 
good as software raid.


Richard



Scott Marlowe scott.marl...@gmail.com wrote:


On Thu, Dec 24, 2009 at 11:09 AM, Richard Neill rn...@cam.ac.uk wrote:


Jeremy Harris wrote:

On 12/24/2009 05:12 PM, Richard Neill wrote:

Of course, with a server machine, it's nearly impossible to use mdadm
raid: you are usually compelled to use a hardware raid card.

Could you expand on that?

Both of the last machines I bought (an IBM X3550 and an HP DL380) come with
hardware raid solutions. These are an utter nuisance because:

 - they can only be configured from the BIOS (or with a
   bootable utility CD). Linux has very basic monitoring tools,
   but no way to reconfigure the array, or add disks to empty
   hot-swap slots while the system is running.

 - If there is a Linux raid config program, it's not part of the
   main packaged distro, but usually a pre-built binary, available
   for only one release/kernel of the wrong distro.

 - the IBM one had dodgy firmware, which, until updated, caused the
   disk to totally fail after a few days.

 - you pay a lot of money for something effectively pointless, and
   have less control and less flexibility.

After my experience with the X3550, I hunted for any server that would ship
without hardware raid, i.e. connect the 8 SATA hotswap slots direct to the
motherboard, or where the hardware raid could be de-activated completely,
and put into pass-through mode. Neither HP nor IBM make such a thing.

Yep.  And that's why I never order servers from them.  There are
dozens of reputable white box builders (I use Aberdeen who give me a 5
year all parts warranty and incredible customer service, but there are
plenty to choose from) and they build the machine I ask them to build.
For hardware RAID I use Areca 1680 series, and they also provide me
with machines with software RAID for lighter loads (slave dbs,
reporting dbs, and stats dbs)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


--
Message composed using K-9 mail on Android.
Apologies for improper reply quoting (not supported) by client.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Scott Marlowe
On Thu, Dec 24, 2009 at 3:51 PM, Richard Neill rn...@cam.ac.uk wrote:


 Adam Tauno Williams wrote:

 This isn't true.  IBMs IPS series controllers can the checked and
 configured via the ipssend utility that works very well in 2.6.x LINUX.


 Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl card.
 This one is atrocious - it shipped with a hideous firmware bug. And there is
 no way to bypass it.

 The HP have the P400 cards, which are decent in themselves, just not as good
 as software raid.

Yeah, the HP400 gets pretty meh reviews here on the lists.  The P600
is adequate and the P800 seems to be a good performer.

Can you replace the IBM RAID controller with some other controller?
Even just a simple 4 or 8 port SATA card with no RAID capability would
be better than something that locks up.

Personally I'd call my rep and ask him to come pick up his crap server
and give me a check to replace it if it was that bad.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill



Scott Marlowe wrote:

On Thu, Dec 24, 2009 at 3:51 PM, Richard Neill rn...@cam.ac.uk wrote:


Adam Tauno Williams wrote:

This isn't true.  IBMs IPS series controllers can the checked and
configured via the ipssend utility that works very well in 2.6.x LINUX.


Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl card.
This one is atrocious - it shipped with a hideous firmware bug. And there is
no way to bypass it.


Can you replace the IBM RAID controller with some other controller?
Even just a simple 4 or 8 port SATA card with no RAID capability would
be better than something that locks up.


A replacement would have been nice, however the 8kl is very tightly 
integrated with the motherboard and the backplane. We'd have had to buy 
a PCI-X card, and then get out the soldering iron to fix the cables.


To be fair, the 8kl is now working OK; also there was a note in the box 
mentioning that firmware updates should be applied if available. What I 
found unbelievable was that IBM shipped the server to me in a state with 
known crashing firmware (a sufficiently bad bug imho to merit a product 
recall), and hadn't bothered to flash it themselves in the factory. 
Usually BIOS updates are only applied by the end user if there is a 
specific issue to fix, and if the product line has been out for years, 
but that particular server was only assembled 3 weeks ago, why would one 
expect a company of IBM's standing to ship it in that state.


Richard

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Scott Marlowe
On Thu, Dec 24, 2009 at 5:15 PM, Richard Neill rn...@cam.ac.uk wrote:


 Scott Marlowe wrote:

 On Thu, Dec 24, 2009 at 3:51 PM, Richard Neill rn...@cam.ac.uk wrote:

 Adam Tauno Williams wrote:

 This isn't true.  IBMs IPS series controllers can the checked and
 configured via the ipssend utility that works very well in 2.6.x LINUX.

 Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl
 card.
 This one is atrocious - it shipped with a hideous firmware bug. And there
 is
 no way to bypass it.

 Can you replace the IBM RAID controller with some other controller?
 Even just a simple 4 or 8 port SATA card with no RAID capability would
 be better than something that locks up.

 A replacement would have been nice, however the 8kl is very tightly
 integrated with the motherboard and the backplane. We'd have had to buy a
 PCI-X card, and then get out the soldering iron to fix the cables.

 To be fair, the 8kl is now working OK; also there was a note in the box
 mentioning that firmware updates should be applied if available. What I
 found unbelievable was that IBM shipped the server to me in a state with
 known crashing firmware (a sufficiently bad bug imho to merit a product
 recall), and hadn't bothered to flash it themselves in the factory. Usually
 BIOS updates are only applied by the end user if there is a specific issue
 to fix, and if the product line has been out for years, but that particular
 server was only assembled 3 weeks ago, why would one expect a company of
 IBM's standing to ship it in that state.

It does kind of knock the stuffing out of the argument that buying
from the big vendors ensures good hardware experiences.  I've had
similar problems from all the big vendors in the past.  I can't
imagine getting treated that way by my current supplied.  It's one
thing for some obscure bug in a particular ubuntu kernel to interact
poorly with a piece of equipment, but when a hardware RAID controller
arrives in a basically broken state, that's inexcusable.  It's really
not too much to expect working hardware on arrival.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance