Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
You should search the archives for Luke Lonergan's posting about how I/O in PostgreSQL is significantly bottlenecked because it's not async. A 12-disk array is going to max out PostgreSQL's theoretical write capacity to disk, and therefore BigRDBMS is always going to win in such a config. You can also look toward Bizgres, which allegedly eliminates some of these problems and is cheaper than most BigRDBMS products. Alex.

On 12/28/06, Guy Rouillier [EMAIL PROTECTED] wrote: I don't want to violate any license agreement by discussing performance, so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as BigDBMS here. I'm trying to convince my employer to replace BigDBMS with PostgreSQL for at least some of our Java applications. As a proof of concept, I started with a high-volume (but conceptually simple) network data collection application. This application collects files of 5-minute usage statistics from our network devices, and stores a raw form of these stats in one table and a normalized form in a second table. We are currently storing about 12 million rows a day in the normalized table, and each month we start new tables. For the normalized data, the app inserts rows initialized to zero for the entire current day first thing in the morning, then executes updates against existing rows throughout the day as stats are received. So the app has very high update activity.

In my test environment, I have a dual-x86 Linux platform running the application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and PostgreSQL 8.2.0 (only one at a time). The Sun box has 4 disk arrays attached, each with 12 SCSI hard disks (a D1000 and 3 A1000s, for those familiar with these devices). The arrays are set up with RAID 5. So I'm working with a consistent hardware platform for this comparison. I'm only processing a small subset of files (144). BigDBMS processed this set of data in 2 seconds, with all foreign keys in place.
With all foreign keys in place, PG took 54000 seconds to complete the same job. I've tried various approaches to autovacuum (none, 30 seconds) and it doesn't seem to make much difference. What does seem to make a difference is eliminating all the foreign keys; in that configuration, PG takes about 3 seconds. Better, but BigDBMS still has it beat significantly.

I've got PG configured so that the system database is on disk array 2, as are the transaction log files. The default tablespace for the test database is disk array 3. I've got all the reference tables (the tables to which the foreign keys in the stats tables refer) on this array. I also store the stats tables on this array. Finally, I put the indexes for the stats tables on disk array 4. I don't use disk array 1 because I believe it is a software array.

I'm out of ideas on how to improve this picture any further. I'd appreciate some suggestions. Thanks. -- Guy Rouillier ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
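As a rough sense of scale, the sustained update rate the workload above implies can be sketched with a quick calculation (Python used purely as a calculator; the 12 million rows/day figure is from the post, and the assumption that updates are spread evenly across the day is mine):

```python
# Average update rate implied by 12 million normalized rows per day,
# assuming (my assumption) the updates are spread evenly across the day.
rows_per_day = 12_000_000
seconds_per_day = 24 * 60 * 60
avg_updates_per_second = rows_per_day / seconds_per_day
print(round(avg_updates_per_second))  # ~139 updates/second on average
```

Real bursts will be much higher than the average, which is why the foreign-key checks on every update matter so much here.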
Re: [PERFORM] Bad iostat numbers
The problem I see with software RAID is the issue of a battery-backed unit: if the computer loses power, then the 'cache', which is held in system memory, goes away and fubars your RAID. Alex

On 12/5/06, Michael Stone [EMAIL PROTECTED] wrote: On Tue, Dec 05, 2006 at 01:21:38AM -0500, Alex Turner wrote: My other and most important point is that I can't find any solid recommendations for a SCSI card that can perform optimally in Linux or *BSD. Off by a factor of 3x is pretty sad IMHO. (And yes, we know the Adaptec cards suck worse; that doesn't bring us to a _good_ card.) This gets back to my point about terminology. As a SCSI HBA the Adaptec is decent: I can sustain about 300MB/s off a single channel of the 39320A using an external RAID controller. As a RAID controller I can't even imagine using the Adaptec; I'm fairly certain they put that functionality on there just so they could charge more for the card. It may be that there's not much market for on-board SCSI RAID controllers; between SATA on the low end and SAS/FC on the high end, there isn't a whole lotta space left for SCSI. I definitely don't think much R&D is going into SCSI controllers any more, compared to other solutions like SATA or SAS RAID (the 39320 hasn't changed in at least 3 years, IIRC). Anyway, since the Adaptec part is a decent SCSI controller and a lousy RAID controller, have you tried just using software RAID? Mike Stone
Re: [PERFORM] Hardware advice
The test that I did - which was somewhat limited - showed no benefit from splitting disks into separate partitions for large bulk loads. The program read from one very large file and wrote the input out to two other large files. The total throughput on a single partition was close to the maximum theoretical for that logical drive, even though the process was reading and writing to three separate places on the disk. I don't know what this means for PostgreSQL setups directly, but I would postulate that the benefit from splitting pg_xlog onto a separate spindle is not as great as it might once have been for large bulk transactions. I am therefore going to go with a single 6-drive RAID 5 for my data warehouse application because I want the read speed to be available. I can benefit from fast reads when I want to do large data scans, at the expense of slightly slower insert speed. Alex.

On 12/5/06, Alexandru Coseru [EMAIL PROTECTED] wrote: Hello.. Thanks for the advice.. Actually, I'm waiting for the Clovertown to show up on the market... Regards Alex - Original Message - From: Sven Geisler [EMAIL PROTECTED] To: Alexandru Coseru [EMAIL PROTECTED] Cc: pgsql-performance@postgresql.org Sent: Tuesday, December 05, 2006 11:57 AM Subject: Re: [PERFORM] Hardware advice

Hi Alexandru, Alexandru Coseru schrieb: [...] Question 1: The RAID layout should be: a) 2 hdd in RAID 1 for system and pg_xlog and 6 hdd in RAID 10 for data? b) 8 hdd in RAID 10 for all? c) 2 hdd in RAID 1 for system, 2 hdd in RAID 1 for pg_xlog, 4 hdd in RAID 10 for data? Obs: I'm going for setup a), but I want to hear your thoughts as well. This depends on your data size. I think options a and c are good. The potential bottleneck may be the RAID 1 for pg_xlog if you have a huge amount of updates and inserts. What about another setup: 4 hdd in RAID 10 for system and pg_xlog - system partitions are normally not in heavy use, and pg_xlog should be fast for writing; 4 hdd in RAID 10 for data.
Question 2: (Don't want to start a flame here, but here it goes.) What filesystem should I run for data? ext3 or xfs? The tables have ~15,000 rel_pages each. The biggest table now has over 30,000 pages. We have a database running with 60,000+ tables. The table sizes are between a few kBytes for the small tables and up to 30 GB for the largest one. We had no issues with ext3 in the past.

Question 3: The block size in PostgreSQL is 8 kB. The stripe size in the RAID controller is 64 kB. Should I increase the PostgreSQL block size to 16 or 32 or even 64 kB? You should keep in mind that the filesystem also has a block size. Ext3 has a maximum of 4 kB. I would set up the partitions aligned to the stripe size to prevent unaligned reads. I guess you can imagine that a larger PostgreSQL block size may also end up in unaligned reads, because the filesystem has a smaller block size.

RAID volume and filesystem setup: 1. Make all partitions aligned to the RAID stripe size. The first partition should start at 128 kByte. You can do this with fdisk: after you create the partition, switch to expert mode (type x) and modify the beginning of the partition (type b). You should change this value to 128 (default is 63). All other partitions should also start on a multiple of 128 kByte. 2. Give the filesystem a hint that you work with larger block sizes. Ext3: mke2fs -b 4096 -j -R stride=2 /dev/sda1 -L LABEL. I made an I/O test with PostgreSQL on a RAID system with a stripe size of 64 kByte and a block size of 8 kByte in the RAID system. stride=2 was the best value.

PS: You should have a second XEON in your budget plan. Sven. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
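The alignment rule above can be sanity-checked with a few lines (a sketch; the 64 kB stripe size, the suggested 128 kByte partition start, and fdisk's default start at sector 63 are all taken from Sven's description):

```python
# Check whether a partition's byte offset is a multiple of the RAID
# stripe size, per the partitioning advice above.
STRIPE_BYTES = 64 * 1024  # 64 kB stripe size on the RAID controller

def is_stripe_aligned(offset_bytes: int) -> bool:
    return offset_bytes % STRIPE_BYTES == 0

print(is_stripe_aligned(128 * 1024))  # True: the suggested 128 kByte start
print(is_stripe_aligned(63 * 512))    # False: fdisk's default start (sector 63)
```

With the default start at sector 63 (31.5 kB), every stripe-sized read straddles two stripes; starting at a 128 kByte boundary avoids that.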
Re: [PERFORM] Bad iostat numbers
The RAID 10 was in there merely for filling in, not really as a comparison; indeed it would be ludicrous to compare a RAID 1 to a 6-drive RAID 10!! How do I find out if it has version 2 of the driver? This discussion I think is important, as I think it would be useful for this list to have a list of RAID cards that _do_ work well under Linux/BSD as recommended hardware for PostgreSQL. So far, all I can recommend is what I've found to be good, which is 3ware 9500 series cards with 10k SATA drives. Throughput was great until you reached higher levels of RAID 10 (the bonnie++ mark I posted showed write speed is a bit slow). But that doesn't solve the problem for SCSI. What cards in the SCSI arena solve the problem optimally? Why should we settle for sub-optimal performance in SCSI when there are a number of almost optimally performing cards in the SATA world (Areca, 3ware/AMCC, LSI)? Thanks, Alex

On 12/4/06, Scott Marlowe [EMAIL PROTECTED] wrote: On Mon, 2006-12-04 at 01:17, Alex Turner wrote: People recommend LSI MegaRAID controllers on here regularly, but I have found that they do not work that well. I have bonnie++ numbers that show the controller is not performing anywhere near the disks' saturation level in a simple RAID 1 on Red Hat Linux EL4, on two separate machines provided by two different hosting companies. In one case I asked them to replace the card, and the numbers got a bit better, but still not optimal. LSI MegaRAID has proved to be a bit of a disappointment. I have seen better numbers from the HP SmartArray 6i, and from 3ware cards with 7200RPM SATA drives. For the output: http://www.infoconinc.com/test/bonnie++.html (the first line is a six-drive RAID 10 on a 3ware 9500S; the next three are all RAID 1s on LSI MegaRAID controllers, verified by lspci). Wait, you're comparing a MegaRAID running a RAID 1 against another controller running a 6-disk RAID 10? That's hardly fair.
My experience with the LSI was that with the 1.18 series drivers, they were slow but stable. With the version 2.x drivers, I found that the performance was very good with RAID-5 and fair with RAID-1 and that layered RAID was not any better than unlayered (i.e. layering RAID0 over RAID1 resulted in basic RAID-1 performance). OTOH, with the choice at my last place of employment being LSI or Adaptec, LSI was a much better choice. :) I'd ask which LSI megaraid you've tested, and what driver was used. Does RHEL4 have the megaraid 2 driver?
Re: [PERFORM] Bad iostat numbers
http://en.wikipedia.org/wiki/RAID_controller Alex On 12/4/06, Michael Stone [EMAIL PROTECTED] wrote: On Mon, Dec 04, 2006 at 12:37:29PM -0500, Alex Turner wrote: This discussion I think is important, as I think it would be useful for this list to have a list of RAID cards that _do_ work well under Linux/BSD for people as recommended hardware for Postgresql. So far, all I can recommend is what I've found to be good, which is 3ware 9500 series cards with 10k SATA drives. Throughput was great until you reached higher levels of RAID 10 (the bonnie++ mark I posted showed write speed is a bit slow). But that doesn't solve the problem for SCSI. What cards in the SCSI arena solve the problem optimally? Why should we settle for sub-optimal performance in SCSI when there are a number of almost optimally performing cards in the SATA world (Areca, 3Ware/AMCC, LSI). Well, one factor is to be more precise about what you're looking for; a HBA != RAID controller, and you may be comparing apples and oranges. (If you have an external array with an onboard controller you probably want a simple HBA rather than a RAID controller.) Mike Stone ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Bad iostat numbers
I agree that the MegaRAID is very stable, and it's very appealing from that perspective. And two years ago I would have never even mentioned cciss-based cards on this list, because they sucked wind big time, but I believe some people have started seeing better numbers from the 6i. 20MB/sec write, when the number should be closer to 60 - that's off by a factor of 3. For my data warehouse application, that's a big difference, and if I can get a better number from 7200RPM drives and a good SATA controller, I'm gonna do that, because my data isn't OLTP and I don't care if the whole system shits itself and I have to restore from backup one day. My other and most important point is that I can't find any solid recommendations for a SCSI card that can perform optimally in Linux or *BSD. Off by a factor of 3x is pretty sad IMHO. (And yes, we know the Adaptec cards suck worse; that doesn't bring us to a _good_ card.) Alex.

On 12/4/06, Greg Smith [EMAIL PROTECTED] wrote: On Mon, 4 Dec 2006, Alex Turner wrote: People recommend LSI MegaRAID controllers on here regularly, but I have found that they do not work that well. I have bonnie++ numbers that show the controller is not performing anywhere near the disks' saturation level in a simple RAID 1 on Red Hat Linux EL4, on two separate machines provided by two different hosting companies. http://www.infoconinc.com/test/bonnie++.html I don't know what's going on with your www-september-06 machine, but the other two are giving 32-40MB/s writes and 53-68MB/s reads. For a RAID-1 volume, these aren't awful numbers, but I agree they're not great. My results are no better. For your comparison, here's a snippet of bonnie++ results from one of my servers: RHEL 4, P4 3GHz, MegaRAID firmware 1L37, write-thru cache setup, RAID 1; I think the drives are 10K RPM Seagate Cheetahs.
This is from the end of the drive where performance is the worst (I partitioned the important stuff at the beginning where it's fastest and don't have enough free space to run bonnie there):

    --Sequential Output--             --Sequential Input-    --Random-
    -Per Chr- --Block-- -Rewrite-     -Per Chr- --Block--    --Seeks--
    K/sec %CP K/sec %CP K/sec %CP     K/sec %CP K/sec %CP    /sec  %CP
    20708  50 21473   9  9603   3     34419  72 55799   7    467.1   1

21MB/s writes, 56MB/s reads. Not too different from yours (especially if your results were from the beginning of the disk), and certainly nothing special. I might be able to tune the write performance higher if I cared; the battery-backed cache sits unused and everything is tuned for paranoia rather than performance. On this machine it doesn't matter. The thing is, even though it's rarely the top-performing card even when set up perfectly, the LSI SCSI MegaRAID just works. The driver is stable, caching behavior is well defined, and it's a pleasure to administer. I'm never concerned that it's lying to me or doing anything to put data at risk. The command-line tools for Linux work perfectly, let me look at or control whatever I want, and it was straightforward for me to make my own customized monitoring script using them. LSI MegaRAID has proved to be a bit of a disappointment. I have seen better numbers from the HP SmartArray 6i, and from 3ware cards with 7200RPM SATA drives. Whereas although I use 7200RPM SATA drives, I always try to keep an eye on them because I never really trust them. The performance list archives here also have plenty of comments about people having issues with the SmartArray controllers; search the archives for cciss and you'll see what I'm talking about. The MegaRAID controller is very boring. That's why I like it. As a Linux distribution, Red Hat has similar characteristics. If I were going for a performance setup, I'd dump that, too, for something sexier with a newish kernel.
It all depends on which side of the performance/stability tradeoff you're aiming at. On Mon, 4 Dec 2006, Scott Marlowe wrote: Does RHEL4 have the megaraid 2 driver? This is from the moderately current RHEL4 installation I had results from above. Redhat has probably done a kernel rev since I last updated back in September, haven't needed or wanted to reboot since then: megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005) megaraid: 2.20.4.6-rh2 (Release Date: Wed Jun 28 12:27:22 EST 2006) -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Bad iostat numbers
People recommend LSI MegaRAID controllers on here regularly, but I have found that they do not work that well. I have bonnie++ numbers that show the controller is not performing anywhere near the disks' saturation level in a simple RAID 1 on Red Hat Linux EL4, on two separate machines provided by two different hosting companies. In one case I asked them to replace the card, and the numbers got a bit better, but still not optimal. LSI MegaRAID has proved to be a bit of a disappointment. I have seen better numbers from the HP SmartArray 6i, and from 3ware cards with 7200RPM SATA drives. For the output: http://www.infoconinc.com/test/bonnie++.html (the first line is a six-drive RAID 10 on a 3ware 9500S; the next three are all RAID 1s on LSI MegaRAID controllers, verified by lspci). Alex.

On 12/4/06, Greg Smith [EMAIL PROTECTED] wrote: On Thu, 30 Nov 2006, Carlos H. Reimer wrote: I would like to discover how much cache is present in the controller; how can I find this value from Linux? As far as I know there is no cache on an Adaptec 39320. The write-back cache Linux was reporting on was the one in the drives, which is 8MB; see http://www.seagate.com/cda/products/discsales/enterprise/tech/1,1593,541,00.html Be warned that running your database with the combination of an uncached controller plus disks with write caching is dangerous to your database integrity. There is a common problem with the Linux driver for this card (aic7902) where it enters what they're calling an Infinite Interrupt Loop. That seems to match your readings. Here is a typical iostat -x:

    Device: rrqm/s wrqm/s  r/s  w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz  avgqu-sz await  svctm  %util
    sda       0.00   7.80 0.40 6.40  41.60 113.60 20.80 56.80    22.82 570697.50 10.59 147.06 100.00

An avgqu-sz of 570697.50 is extremely large. That explains why the utilization is 100%: there's a massive number of I/O operations queued up that aren't getting flushed out.
The read and write data says these drives are barely doing anything, as 20kB/s and 57kB/s are practically idle; they're not even remotely close to saturated. See http://lkml.org/lkml/2005/10/1/47 for a suggested workaround that may reduce the magnitude of this issue; lowering the card's speed to U160 in the BIOS was also listed as a useful workaround. You might get better results by upgrading to a newer Linux kernel, and just rebooting to clear out the garbage might help if you haven't tried that yet. On the pessimistic side, other people reporting issues with this controller are: http://lkml.org/lkml/2005/12/17/55 http://www.ussg.iu.edu/hypermail/linux/kernel/0512.2/0390.html http://www.linuxforums.org/forum/peripherals-hardware/59306-scsi-hangs-boot.html and even under FreeBSD at http://lists.freebsd.org/pipermail/aic7xxx/2003-August/003973.html This Adaptec card just barely works under Linux, which happens regularly with their controllers, and my guess is that you've run into one of the ways it goes crazy sometimes. I just chuckled when checking http://linux.adaptec.com/ again and noticing they can't even be bothered to keep that server up at all. According to http://www.adaptec.com/en-US/downloads/linux_source/linux_source_code?productId=ASC-39320-R&dn=Adaptec+SCSI+Card+39320-R the driver for your card is *minimally tested* for Linux Kernel v2.6 on all platforms. Adaptec doesn't care about Linux support on their products; if you want a SCSI controller that actually works under Linux, get an LSI MegaRAID. If this were really a Postgres problem, I wouldn't expect %iowait=1.10. Were the database engine waiting to read/write data, that number would be dramatically higher. Whatever is generating all these I/O requests, it's not waiting for them to complete like the database would be. Besides the driver problems that I'm very suspicious of, I'd suspect a runaway process writing garbage to the disks might also cause this behavior.
I've taken a look in /var/log/messages and found some temperature messages about the disk drives: Nov 30 11:08:07 totall smartd[1620]: Device: /dev/sda, Temperature changed 2 Celsius to 51 Celsius since last report. Can this temperature influence the performance? That's close to the upper tolerance for this drive (55 degrees), which means the drive is being cooked and will likely wear out quickly. But that won't slow it down, and you'd get much scarier messages out of smartd if the drives had a real problem. You should improve cooling in this case if you want the drives to have a healthy life; odds are low this is relevant to your performance issue, though. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[PERFORM] Performance of Perc 5i
Does anyone have any performance experience with the Dell Perc 5i controllers in RAID 10/RAID 5? Thanks, Alex
[PERFORM] Confusion and Questions about blocks read
The query explain analyze looks like this:

click-counter=# explain analyze select count(*) as count, to_char(date_trunc('day',c.datestamp),'DD-Mon') as day from impression c, url u, handle h where c.url_id=u.url_id and c.handle_id=h.handle_id and h.handle like '1.19%' group by date_trunc('day',c.datestamp) order by date_trunc('day',c.datestamp);

    QUERY PLAN
    ------------------------------------------------------------------
    Sort (cost=530282.76..530283.04 rows=113 width=8) (actual time=191887.059..191887.131 rows=114 loops=1)
      Sort Key: date_trunc('day'::text, c.datestamp)
      -> HashAggregate (cost=530276.65..530278.91 rows=113 width=8) (actual time=191886.081..191886.509 rows=114 loops=1)
         -> Hash Join (cost=128.41..518482.04 rows=2358921 width=8) (actual time=17353.281..190568.890 rows=625212 loops=1)
            Hash Cond: ("outer".handle_id = "inner".handle_id)
            -> Merge Join (cost=0.00..444641.52 rows=5896746 width=12) (actual time=34.582..183154.561 rows=5896746 loops=1)
               Merge Cond: ("outer".url_id = "inner".url_id)
               -> Index Scan using url_pkey on url u (cost=0.00..106821.10 rows=692556 width=8) (actual time=0.078..83432.380 rows=692646 loops=1)
               -> Index Scan using impression_url_i on impression c (cost=0.00..262546.95 rows=5896746 width=16) (actual time=34.473..86701.410 rows=5896746 loops=1)
            -> Hash (cost=123.13..123.13 rows=2115 width=4) (actual time=40.159..40.159 rows=2706 loops=1)
               -> Bitmap Heap Scan on handle h (cost=24.69..123.13 rows=2115 width=4) (actual time=20.362..36.819 rows=2706 loops=1)
                  Filter: (handle ~~ '1.19%'::text)
                  -> Bitmap Index Scan on handles_i (cost=0.00..24.69 rows=2115 width=0) (actual time=20.264..20.264 rows=2706 loops=1)
                     Index Cond: ((handle >= '1.19'::text) AND (handle < '1.1:'::text))
    Total runtime: 191901.868 ms

(Looks like it sped up a bit the second time I did it.)

When I query relpages for the tables involved:

click-counter=# select relpages from pg_class where relname='impression';
 relpages
----------
    56869
(1 row)

click-counter=# select relpages from pg_class where relname='url';
 relpages
----------
    66027
(1 row)

click-counter=# select relpages from pg_class where relname='handle';
 relpages
----------
       72
(1 row)

they only total 122968. How come the query statistics showed that 229066 blocks were read, given that all the blocks in all the tables put together only total 122968?

LOG: QUERY STATISTICS
DETAIL: ! system usage stats:
! 218.630786 elapsed 24.16 user 13.93 system sec
! [261.00 user 85.61 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 65/47 [20176/99752] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 229066 read, 2 written, buffer hit rate = 55.61%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written

Alex.
Re: [PERFORM] Confusion and Questions about blocks read
Ahh, good point. Thanks.

On 9/22/06, Tom Lane [EMAIL PROTECTED] wrote: Alex Turner [EMAIL PROTECTED] writes: How come the query statistics showed that 229066 blocks were read given that all the blocks in all the tables put together only total 122968? You forgot to count the indexes. Also, the use of index scans in the merge joins probably causes multiple re-reads of some table blocks, depending on just what the physical ordering of the rows is. regards, tom lane
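The arithmetic behind Tom's answer, spelled out (the numbers are taken from the earlier message; attributing the entire remainder to index pages and re-reads is an approximation):

```python
# Heap pages alone (relpages from pg_class) cannot account for all the
# blocks read, so the remainder must come from index pages plus re-reads
# of heap blocks during the index scans.
heap_pages = 56869 + 66027 + 72  # impression + url + handle
blocks_read = 229066             # from the QUERY STATISTICS output
print(heap_pages)                # 122968
print(blocks_read - heap_pages)  # 106098 blocks from indexes and re-reads
```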
Re: [PERFORM] Confusion and Questions about blocks read
Ok - so I have another mystery: I insert virtually the same rows into two different tables:

trend=# insert into fish select 2, nextval('result_entry_order_seq'), property_id from property;
INSERT 0 59913
trend=# insert into result_entry select 0, nextval('result_entry_order_seq'), property_id from property;
INSERT 0 59913

but the stats show one as having written 20x as many blocks:

LOG: statement: insert into fish select 2, nextval('result_entry_order_seq'), property_id from property;
LOG: QUERY STATISTICS
DETAIL: ! system usage stats:
! 2.098067 elapsed 0.807877 user 1.098833 system sec
! [23.875370 user 27.789775 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/1 [5/62269] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 72/6 [18464/1126] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 79106 read, 420 written, buffer hit rate = 79.39%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written

LOG: statement: insert into result_entry select 0, nextval('result_entry_order_seq'), property_id from property;
LOG: QUERY STATISTICS
DETAIL: ! system usage stats:
! 16.963729 elapsed 3.533463 user 1.706740 system sec
! [27.408833 user 29.497515 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/1186 [5/63455] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 59/139 [18525/1265] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 100744 read, 7352 written, buffer hit rate = 89.71%
! Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written

I understand the read blocks difference; the second had to check indexes matching the foreign keys. The table definitions are given below:

trend=# \d fish
Table "public.fish"
       Column       |  Type   | Modifiers
--------------------+---------+-----------
 result_id          | bigint  |
 result_entry_order | bigint  |
 property_id        | integer |
Indexes: fish_pkey UNIQUE, btree (result_id, result_entry_order)

trend=# \d result_entry
Table "public.result_entry"
       Column       |  Type   | Modifiers
--------------------+---------+-----------
 result_id          | bigint  |
 result_entry_order | bigint  |
 property_id        | integer |
Indexes: fish_pkey UNIQUE, btree (result_id, result_entry_order)

The explain analyzes are kind of interesting:

trend=# explain analyze insert into fish select 2, nextval('result_entry_order_seq'), property_id from property;
    QUERY PLAN
    ------------------------------------------------------------------
    Seq Scan on property (cost=0.00..79295.70 rows=59913 width=8) (actual time=0.275..1478.681 rows=59913 loops=1)
    Total runtime: 2178.600 ms
(2 rows)

trend=# explain analyze insert into result_entry select 0, nextval('result_entry_order_seq'), property_id from property;
    QUERY PLAN
    ------------------------------------------------------------------
    Seq Scan on property (cost=0.00..79295.70 rows=59913 width=8) (actual time=0.118..1473.352 rows=59913 loops=1)
    Trigger for constraint result_entry_result_fk: time=2037.351 calls=59913
    Trigger for constraint result_entry_property_fk: time=8622.260 calls=59913
    Total runtime: 12959.716 ms
(4 rows)

I don't understand the time for the FK checks given the size of the tables they are checking against (and I understand it's the indexes, not the tables, that the actual check is made against):

trend=# select count(*) from result_cache;
 count
-------
     8
(1 row)

trend=# select count(*) from property;
 count
-------
 59913
(1 row)

The database was just re-indexed, no changes beyond this insert were made in that time, and result_entry has recently been vacuumed. Any insight would be greatly appreciated. Alex

On 9/22/06, Alex Turner [EMAIL PROTECTED] wrote: ahh good point. Thanks. On 9/22/06, Tom Lane [EMAIL PROTECTED] wrote: Alex Turner [EMAIL PROTECTED] writes: How come the query
statistics showed that 229066 blocks were read given that all the blocks in all the tables put together only total 122968? You forgot to count the indexes. Also, the use of index scans in the merge joins probably causes multiple re-reads of some table blocks, depending on just what the physical ordering of the rows is. regards, tom lane
Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)
Do the basic math: if you have a table with 100 million records, each of which is 200 bytes long, that gives you roughly 20 gig of data (assuming it was all written neatly and hasn't been updated much). If you have to do a full table scan, then it will take roughly 400 seconds with a single 10k RPM SCSI drive with an average read speed of 50MB/sec. If you are going to read indexes, figure out how big your index is, how many blocks will be returned, and how many blocks this will require transferring from the main table; make an estimate of the seeks, add in the transfer total, and you have a time to get your data. A big array with a good controller can pass 1000MB/sec transfer on the right bus if you buy the right technologies. But be warned: if you buy the wrong ones, your big array can end up being slower than a single drive for sequential transfer. At 1000MB/sec your scan would take 20 seconds. Be warned, the tech specs page http://www.sun.com/servers/x64/x4500/specs.xml#anchor3 doesn't mention RAID 10 as a possibility, and this is probably what most would recommend for fast data access if you are doing both read and write operations. If you are doing mostly reads, then RAID 5 is passable, but its redundancy with large numbers of drives is not so great. Alex.

On 9/18/06, Bucky Jordan [EMAIL PROTECTED] wrote: good normalization skills are really important for large databases, along with materialization strategies for 'denormalized sets'. Good points - thanks. I'm especially curious what others have done for the materialization. The matview project on gborg appears dead, and I've only found a smattering of references on google. My guess is, you roll your own for optimal performance...
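The "basic math" above works out as follows (a sketch using the post's own assumed figures; real scans add seek and controller overhead on top):

```python
# Full-table-scan time estimate from the figures in the post above.
rows = 100_000_000
bytes_per_row = 200
table_mb = rows * bytes_per_row / 1_000_000  # ~20,000 MB, i.e. about 20 GB

single_drive_mb_s = 50   # one 10k RPM SCSI drive, average read speed
big_array_mb_s = 1000    # well-chosen array on the right bus

print(table_mb / single_drive_mb_s)  # 400.0 seconds
print(table_mb / big_array_mb_s)     # 20.0 seconds
```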
regarding the number of rows, there is no limit to how much pg can handle per se, just some practical limitations, especially vacuum and reindex times. these are important because they are required to keep a handle on mvcc bloat, and it's very nice to be able to vacuum bits of your database at a time. I was hoping for some actual numbers on "practical". Hardware isn't too much of an issue (within reason - we're not talking an Amazon or Google here... the SunFire X4500 looks interesting...) - if a customer wants to store that much data, and pay for it, we'll figure out how to do it. I'd just rather not have to re-design the database. Say the requirement is to keep 12 months of data accessible, each scan produces 100M records, and I run one per month. What happens if the customer wants to run it once a week? I was more trying to figure out at what point (ballpark) I'm going to have to look into archive tables and things of that nature (or at Bizgres/MPP). It's easier for us to add more/bigger hardware, but not so easy to redesign/add history tables... just another fyi, if you have a really big database, you can forget about doing pg_dump for backups (unless you really don't care about being x day or days behind)... you simply have to do some type of replication/failover strategy. i would start with pitr. merlin I was originally thinking replication, but I did notice some nice PITR features in 8.x - I'll have to look into that some more. Thanks for the pointers though... - Bucky ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)
Sweet - that's good - RAID 10 support seems like an odd thing to leave out. Alex On 9/18/06, Luke Lonergan [EMAIL PROTECTED] wrote: Alex, On 9/18/06 4:14 PM, Alex Turner [EMAIL PROTECTED] wrote: Be warned, the tech specs page: http://www.sun.com/servers/x64/x4500/specs.xml#anchor3 doesn't mention RAID 10 as a possibility, and this is probably what most would recommend for fast data access if you are doing both read and write operations. If you are doing mostly reads, then RAID 5 is passable, but its redundancy with large numbers of drives is not so great. RAID10 works great on the X4500 - we get 1.6GB/s+ per X4500 using RAID10 in ZFS. We worked with the Sun Solaris kernel team to make that happen and the patches are part of Solaris 10 Update 3 due out in November. - Luke
Re: [PERFORM] How to get higher tps
First things first, run a bonnie++ benchmark, and post the numbers. That will give a good indication of raw IO performance, and is often the first indication of problems separate from the DB. We have seen pretty bad performance from SANs in the past. How many FC lines do you have running to your server? Remember each line is limited to about 200MB/sec; to get good throughput, you will need multiple connections. When you run pgbench, run iostat also and see what the numbers say. Alex. On 8/22/06, Mark Lewis [EMAIL PROTECTED] wrote: Well, at least on my test machines running gnome-terminal, my pgbench runs tend to get throttled by gnome-terminal's lousy performance to no more than 300 tps or so. Running with 2>/dev/null to throw away all the detailed logging gives me 2-3x improvement in scores. Caveat: in my case the db is on the local machine, so who knows what all the interactions are. Also, when you initialized the pgbench db what scaling factor did you use? And does running pgbench with -v improve performance at all? -- Mark On Tue, 2006-08-22 at 09:19 -0400, Marty Jia wrote: Joshua, Here is shared_buffers = 8 fsync = on max_fsm_pages = 35 max_connections = 1000 work_mem = 65536 effective_cache_size = 61 random_page_cost = 3 Here is pgbench I used: pgbench -c 10 -t 1 -d HQDB Thanks Marty -Original Message- From: Joshua D. Drake [mailto:[EMAIL PROTECTED]] Sent: Monday, August 21, 2006 6:09 PM To: Marty Jia Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] How to get higher tps Marty Jia wrote: I'm exhausted after trying all the performance tuning ideas, like the following parameters: shared_buffers fsync max_fsm_pages max_connections shared_buffers work_mem max_fsm_pages effective_cache_size random_page_cost I believe all the above have the right sizes and values, but I just can not get tps higher than 300 as tested by pgbench. What values did you use?
Here is our hardware: Dual Intel Xeon 2.8GHz, 6GB RAM, Linux 2.4 kernel, RedHat Enterprise Linux AS 3, 200GB for PGDATA on 3Par, ext3, 50GB for WAL on 3Par, ext3, with PostgreSql 8.1.4. We don't have an i/o bottleneck. Are you sure? What does iostat say during a pgbench? What parameters are you passing to pgbench? Well in theory, upgrading to the 2.6 kernel will help, as well as making your WAL ext2 instead of ext3. What else can I try to get better tps? Someone told me I should get tps over 1500; it is hard to believe. 1500? Hmmm... I don't know about that, I can get 470tps or so on my measly dual core 3800 with 2gig of ram though. Joshua D. Drake Thanks Marty ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
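For reference, the "about 200MB/sec per FC line" figure quoted in this thread falls out of the encoding: 2Gb Fibre Channel signals at roughly 2.125 Gbaud and uses 8b/10b encoding, i.e. 10 line bits per data byte. A sketch of that arithmetic (the 800 MB/s target is just an illustrative number, not from the thread):

```python
# Usable bandwidth of one 2Gb Fibre Channel link, before protocol overhead.
line_rate_baud = 2.125e9                 # 2Gb FC signaling rate
bytes_per_sec = line_rate_baud / 10      # 8b/10b: 10 wire bits per payload byte
mb_per_sec = bytes_per_sec / 1e6         # ~212 MB/s raw, ~200 in practice

target_mb = 800                          # hypothetical throughput goal
links_needed = -(-target_mb // 200)      # ceiling division at ~200 MB/s per link
print(f"per link: ~{mb_per_sec:.0f} MB/s")
print(f"links for {target_mb} MB/s: {links_needed}")
```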
Re: [PERFORM] How to get higher tps
Oh - and it's useful to know if you are CPU bound or IO bound. Check top or vmstat to get an idea of that. Alex On 8/22/06, Alex Turner [EMAIL PROTECTED] wrote: First things first, run a bonnie++ benchmark, and post the numbers. [snip - rest of quote identical to the message above]
Re: [PERFORM] Hardware upgraded but performance still ain't good enough
First off - very few third party tools support debian. Debian is a sure-fire way to have an unsupported system. Use RedHat or SuSe (flame me all you want, it doesn't make it less true). Second, run a bonnie++ benchmark against your disk array(s) to see what performance you are getting, and make sure it's reasonable. Single drives for stuff is not a great way to go for either speed or reliability; highly not recommended for a production system. Use SAS or SATA for the best speed for your $$s; don't buy a SAN, they are overpriced and often don't perform. RAM could be more, to be honest, too. Diagnosing the bottleneck can be done with combinations of top, iostat and vmstat. If you have high iowait numbers then your system is waiting on the disks. If you have high system CPU usage, then disks are also worth a look, but not as bad as high iowait. If you have high user CPU with little iowait and little system CPU, and very little io activity in iostat, then you are CPU bound. If you are IO bound, you need to figure out if it's reads or writes. If it's reads, then more RAM will help. If it's writes, then you need more spindles and more controller cache with RAID (please think carefully before using RAID 5 in a write intensive environment, it's not ideal). The other thing is you will probably want to turn on stats in postgres to figure out which queries are the bad ones (does anyone have good docs posted for this?). Once you have identified the bad queries, you can explain analyze them, and figure out why they suck. Alex. On 8/7/06, Alvaro Nunes Melo [EMAIL PROTECTED] wrote: Hi, First of all I must tell that my reality in a southern brazilian city is way different than what we read in the list. I was looking for ways to find the HW bottleneck and saw a configuration like: we recently upgraded our dual Xeon Dell to a brand new Sun v40z with 4 opterons, 16GB of memory and MegaRAID with enough disks. OS is Debian Sarge amd64, PostgreSQL is 8.0.3.
on (http://archives.postgresql.org/pgsql-performance/2005-07/msg00431.php) Our old server was a very modest Dell Xeon 2.8 (512 Kb Cache), with 1 GB RAM and one SCSI disc. This server runs PostgreSQL (8.1.4), Apache (PHP) and other minor services. We managed to get a test machine, a HP Xeon 3.2 (2 MB cache), also with 1 GB RAM but 4 SCSI discs (on one single array controller). They're organized in the following way: disk 0: Linux root; disk 1: database cluster; disk 2: pg_xlog; disk 3: a dir that suffers constant read/write operations. The database size stands around 10 GB. The new server has better performance than the old one, but sometimes it still gets stuck. We tried to use a HP proprietary tool to monitor the server, and find out what is the bottleneck, but it's been difficult to install it on Debian. The tool is only certified for SuSe and RedHat. So we tried to use some Linux tools to see what's going on, like vmstat and iostat. Are these tools (vmstat and iostat) enough? Should we use something else? Is there any specific material about finding bottlenecks in Linux/PostgreSQL machines? Is our disk layout proper? I really apologize for my lack of knowledge in this area, and for the excessive number of questions in a single e-mail. Best regards, Alvaro ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
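For the "turn on stats" suggestion above, here is a hedged sketch of the relevant postgresql.conf knobs as they were named in the 8.0/8.1 era (the 100 ms threshold is illustrative, not a recommendation):

```
# Log any statement running longer than 100 ms, so slow queries surface
log_min_duration_statement = 100

# Feed the pg_stat_* views (pre-8.3 parameter names)
stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
```

With those on, the server log and the pg_stat views point at the queries worth an explain analyze.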
Re: [PERFORM] Postgresql Performance on an HP DL385 and
These numbers are pretty darn good for a four disk RAID 10, pretty close to perfect in fact. Nice advert for the 642 - I guess we have a hardware RAID controller that will read independently from mirrors. Alex On 8/8/06, Steve Poe [EMAIL PROTECTED] wrote: Luke, Here are the results of two runs of 16GB file tests on XFS. scsi disc array xfs ,16G,81024,99,153016,24,73422,10,82092,97,243210,17,1043.1,0,16,3172,7,+,+++,2957,9,3197,10,+,+++,2484,8 scsi disc array xfs ,16G,83320,99,155641,25,73662,10,81756,96,243352,18,1029.1,0,16,3119,10,+,+++,2789,7,3263,11,+,+++,2014,6 Thanks. Steve Can you run bonnie++ version 1.03a on the machine and report the results here? It could be OK if you have the latest Linux driver for cciss; someone has reported good results to this list with the latest, bleeding edge version of Linux (2.6.17). - Luke ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Performance with 2 AMD/Opteron 2.6Ghz and 8gig
Although I for one have yet to see a controller that actually does this (I believe software RAID on linux doesn't either). Alex. On 8/7/06, Markus Schaber [EMAIL PROTECTED] wrote: Hi, Charles, Charles Sprickman wrote: I've also got a 1U with a 9500SX-4 and 4 drives. I like how the 3Ware card scales there - started with 2 drives and got drive speed mirroring. Added two more and most of the bonnie numbers doubled. This is not what I'm used to with the Adaptec SCSI junk. Well, for sequential reading, you should be able to get double drive speed on a 2-disk mirror with a good controller, as it can balance the reads among the drives. Markus -- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Re: [PERFORM] RAID stripe size question
This is a great testament to the fact that very often software RAID will seriously outperform hardware RAID because the OS guys who implemented it took the time to do it right, as compared with some controller manufacturers who seem to think it's okay to provide sub-standard performance. Based on the bonnie++ numbers coming back from your array, I would also encourage you to evaluate software RAID, as you might see significantly better performance as a result. RAID 10 is also a good candidate as it's not so heavy on the cache and CPU as RAID 5. Alex. On 7/18/06, Luke Lonergan [EMAIL PROTECTED] wrote: Mikael, On 7/18/06 6:34 AM, Mikael Carneholm [EMAIL PROTECTED] wrote: However, what's more important is the seeks/s - ~530/s on a 28 disk array is quite lousy compared to the 1400/s on a 12 x 15K disk array. I'm getting 2500 seeks/second on a 36 disk SATA software RAID (ZFS, Solaris 10) on a Sun X4500: === Single Stream With a very recent update to the zfs module that improves I/O scheduling and prefetching, I get the following bonnie++ 1.03a results with a 36 drive RAID10, Solaris 10 U2 on an X4500 with 500GB Hitachi drives (zfs checksumming is off): Version 1.03 --Sequential Output-- --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP thumperdw-i-1 32G 120453 99 467814 98 290391 58 109371 99 993344 94 1801 4 --Sequential Create-- Random Create -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 + +++ + +++ + +++ 30850 99 + +++ + +++ === Two Streams Bumping up the number of concurrent processes to 2, we get about 1.5x speed reads of RAID10 with a concurrent workload (you have to add the rates together): Version 1.03 --Sequential Output-- --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
/sec %CP thumperdw-i-1 32G 111441 95 212536 54 171798 51 106184 98 719472 88 1233 2 --Sequential Create-- Random Create -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 26085 90 + +++ 5700 98 21448 97 + +++ 4381 97 Version 1.03 --Sequential Output-- --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP thumperdw-i-1 32G 116355 99 212509 54 171647 50 106112 98 715030 87 1274 3 --Sequential Create-- Random Create -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 26082 99 + +++ 5588 98 21399 88 + +++ 4272 97 So that's 2500 seeks per second, 1440MB/s sequential block read, 212MB/s per character sequential read. === - Luke
Re: [PERFORM] RAID stripe size question
With 18 disks dedicated to data, you could make 1000/7*9 seeks/second (7ms average seek time, 9 independent mirror pairs), which is ~1285 seeks/second, writing on average 64KB of data per seek, which is roughly 80MB/sec random-write throughput worst case; coalesced sequential writes should do considerably better. You might want to take more disks for your data and less for your WAL. Someone check my math here... And as always - run benchmarks with your app to verify. Alex. On 7/16/06, Mikael Carneholm [EMAIL PROTECTED] wrote: I have finally gotten my hands on the MSA1500 that we ordered some time ago. It has 28 x 10K 146Gb drives, currently grouped as 10 (for wal) + 18 (for data). There's only one controller (an emulex), but I hope performance won't suffer too much from that. Raid level is 0+1, filesystem is ext3. Now to the interesting part: would it make sense to use different stripe sizes on the separate disk arrays? In theory, a smaller stripe size (8-32K) should increase sequential write throughput at the cost of decreased positioning performance, which sounds good for WAL (assuming WAL is never searched during normal operation). And for disks holding the data, a larger stripe size (32K) should provide for more concurrent (small) reads/writes at the cost of decreased raw throughput. This is with an OLTP type application in mind, so I'd rather have high transaction throughput than high sequential read speed. The interface is a 2Gb FC so I'm throttled to (theoretically) 192MB/s, anyway. So, does this make sense? Has anyone tried it and seen any performance gains from it? Regards, Mikael.
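Checking the math above (as invited): under the same assumptions - 7 ms average seek, 9 independent mirror pairs, ~64 KB written per seek - a quick sketch gives:

```python
# Worst-case random-write throughput for the 18-disk data array:
# each RAID 0+1 mirror pair acts as one independent seek unit.
avg_seek_ms = 7.0
pairs = 9                                   # 18 data disks -> 9 mirror pairs
write_kb = 64

seeks_per_pair = 1000 / avg_seek_ms         # ~143 seeks/s per pair
total_seeks = seeks_per_pair * pairs        # ~1286 seeks/s across the array
mb_per_sec = total_seeks * write_kb / 1024  # ~80 MB/s worst case

print(f"~{total_seeks:.0f} seeks/s, ~{mb_per_sec:.0f} MB/s random writes")
```

This is the seek-bound floor; sequential or cache-coalesced writes will do considerably better.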
Re: [PERFORM] RAID stripe size question
On 7/17/06, Mikael Carneholm [EMAIL PROTECTED] wrote: This is something I'd also like to test, as a common best practice these days is to go for a SAME (stripe all, mirror everything) setup. From a development perspective it's easier to use SAME as the developers won't have to think about physical location for new tables/indices, so if there's no performance penalty with SAME I'll gladly keep it that way. Usually, it's not the developer's task to care about that, but the DBA's responsibility. As we don't have a full-time dedicated DBA (although I'm the one who does most DBA related tasks) I would aim for making physical location as transparent as possible, otherwise I'm afraid I won't be doing anything else than supporting developers with that - and I *do* have other things to do as well :) In a previous test, using cd=5000 and cs=20 increased transaction throughput by ~20%, so I'll definitely fiddle with that in the coming tests as well. How many parallel transactions do you have? That was when running BenchmarkSQL (http://sourceforge.net/projects/benchmarksql) with 100 concurrent users (terminals), which I assume means 100 parallel transactions at most. The target application for this DB has 3-4 times as many concurrent connections so it's possible that one would have to find other cs/cd numbers better suited for that scenario.
Tweaking bgwriter is another task I'll look into as well.. Btw, here's the bonnie++ results from two different array sets (10+18, 4+24) on the MSA1500, columns as in the standard bonnie++ 1.03 report:

Version 1.03   --Sequential Output-- --Sequential Input- --Random-
               -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine  Size  K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
               --Sequential Create-- --------Random Create--------
         files -Create- --Read- -Delete- -Create- --Read- -Delete-

LUN: WAL, 10 disks, stripe size 32K
sesell01  32G  56139 93  73250 22  16530  3  30488 45  57489  5  477.3  1
           16   2458 90    + +++    + +++   3121 99    + +++  10469 98

LUN: WAL, 4 disks, stripe size 8K
sesell01  32G  49170 82  60108 19  13325  2  15778 24  21489  2  266.4  0
           16   2432 86    + +++    + +++   3106 99    + +++  10248 98

LUN: DATA, 18 disks, stripe size 32K
sesell01  32G  59990 97  87341 28  19158  4  30200 46  57556  6  495.4  1
           16   1640 92    + +++    + +++   1736 99    + +++  10919 99

LUN: DATA, 24 disks, stripe size 64K
sesell01  32G  59443 97 118515 39  25023  5  30926 49  60835  6  531.8  1
           16   2499 90    + +++    + +++   2817 99    + +++  10971 100

These bonnie++ numbers are very worrying. Your controller should easily max out your FC interface on these tests, passing 192MB/sec with ease on anything more than a 6 drive RAID 10. This is a bad omen if you want high performance... Each mirror pair can do 60-80MB/sec. A 24 disk RAID 10 can do 12*60MB/sec which is 720MB/sec - I have seen this performance, it's not unreachable, but time and again we see these bad perf numbers from FC and SCSI systems alike. Consider a different controller, because this one is not up to snuff. A single drive would get better numbers than your 4 disk RAID 10; 21MB/sec read speed is really pretty sorry, it should be closer to 120MB/sec. If you can't swap out, software RAID may turn out to be your friend. The only saving grace is that this is OLTP, and perhaps, just maybe, the controller will be better at ordering IOs, but I highly doubt it. Please people, do the numbers, benchmark before you buy; many many HBAs really suck under Linux/FreeBSD, and you may end up paying vast sums of money for very sub-optimal performance (I'd say sub-standard, but alas, it seems that this kind of poor performance is tolerated, even though it's way off where it should be). There's no point having a 40 disk cab, if your controller can't handle
Re: [PERFORM] RAID stripe size question
On 7/17/06, Ron Peacetree [EMAIL PROTECTED] wrote: -Original Message- From: Mikael Carneholm [EMAIL PROTECTED] Sent: Jul 17, 2006 5:16 PM To: Ron Peacetree [EMAIL PROTECTED], pgsql-performance@postgresql.org Subject: RE: [PERFORM] RAID stripe size question 15Krpm HDs will have average access times of 5-6ms. 10Krpm ones of 7-8ms. Average seek time for that disk is listed as 4.9ms, maybe sounds a bit optimistic? Ah, the games vendors play. Average seek time for a 10Krpm HD may very well be 4.9ms. However, what matters to you the user is average =access= time. The 1st is how long it takes to position the heads to the correct track. The 2nd is how long it takes to actually find and get data from a specified HD sector. 28 HDs as above set up as 2 RAID 10's = ~75MBps*5 = ~375MB/s, ~75*9 = ~675MB/s. I guess it's still limited by the 2Gbit FC (192MB/s), right? No. A decent HBA has multiple IO channels on it. So for instance Areca's ARC-6080 (8/12/16-port 4Gbps Fibre-to-SATA II Controller) has 2 4Gbps FCs in it (...and can support up to 4GB of BB cache!). Nominally, this card can push 8Gbps = 800MBps. ~600-700MBps is the RW number.
Assuming ~75MBps ASTR per HD, that's ~enough bandwidth for a 16 HD RAID 10 set per ARC-6080. Very, very few RAID controllers can do >= 1GBps. One thing that helps greatly with bursty IO patterns is to up your battery backed RAID cache as high as you possibly can. Even multiple GBs of BBC can be worth it. Another reason to have multiple controllers ;-) I use 90% of the raid cache for writes, don't think I could go higher than that. Too bad the emulex only has 256Mb though :/ If your RAID cache hit rates are in the 90+% range, you probably would find it profitable to make it greater. I've definitely seen access patterns that benefited from increased RAID cache for any size I could actually install. For those access patterns, no amount of RAID cache commercially available was enough to find the flattening point of the cache percentage curve. 256MB of BB RAID cache per HBA is just not that much for many IO patterns. 90% as in 90% of the RAM, not 90% hit rate, I'm imagining. The controller is a FC2143 (http://h71016.www7.hp.com/dstore/MiddleFrame.asp?page=configProductLineId=450FamilyId=1449BaseId=17621oi=E9CEDBEID=19701SBLID= ), which uses PCI-E. Don't know how it compares to other controllers, haven't had the time to search for / read any reviews yet. This is a relatively low end HBA with 1 4Gb FC on it. Max sustained IO on it is going to be ~320MBps. Or ~enough for an 8 HD RAID 10 set made of 75MBps ASTR HDs. 28 such HDs are =definitely= IO choked on this HBA. No, they aren't. This is OLTP, not data warehousing. I already posted math for OLTP throughput, which is on the order of 8-80MB/second actual data throughput based on maximum theoretical seeks/second. The arithmetic suggests you need a better HBA or more HBAs or both.
WALs are basically appends that are written in bursts of your chosen log chunk size and that are almost never read afterwards. Big DB pages and big RAID stripes make sense for WALs. Unless of course you are running OLTP, in which case a big stripe isn't necessary; spend the disks on your data partition, because your WAL activity is going to be small compared with your random IO. According to http://www.pcguide.com/ref/hdd/perf/raid/concepts/perfStripe-c.html, it seems to be the other way around? (As stripe size is decreased, files are broken into smaller and smaller pieces. This increases the number of drives that an average file will use to hold all the blocks containing the data of that file, theoretically increasing transfer performance, but decreasing positioning performance.) I guess I'll have to find out which theory holds by good ol' trial and error... :) IME, stripe sizes of 64, 128, or 256 are the most common found to be optimal for most access patterns + SW + FS + OS + HW. New records will be posted at the end of a file, and will only increase the file by the number of blocks in the transactions posted at write time. Updated records are modified in place unless they have grown too big to be in place. If you are updating multiple tables on each transaction, a 64kb stripe size or lower is probably going to be best as block sizes are just 8kb. How much data does your average transaction write? How many xacts per second? This will help determine how many writes your cache will queue up before it flushes, and therefore what the optimal stripe size will be. Of course, the fastest and most accurate way is probably just to try different settings and see how it works. Alas some controllers seem to handle some stripe sizes more efficiently in defiance of any logic. Work out how big your xacts are, how many xacts/second you can post, and you will figure out how fast WAL will be writing.
Allocate enough disk for peak load plus planned expansion on WAL and then put the rest to tablespace. You may well find that a single RAID 1 is enough for WAL (if you achieve theoretical performance levels,
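Ron's seek-time vs access-time distinction is easy to quantify: average access time is average seek time plus average rotational latency, which is half a revolution. A sketch using commonly quoted figures:

```python
# Average access time = seek time + rotational latency (half a revolution).
def avg_access_ms(seek_ms: float, rpm: int) -> float:
    half_rev_ms = 60_000 / rpm / 2      # ms per half revolution
    return seek_ms + half_rev_ms

# The vendor's "4.9 ms" seek figure lands in the 7-8 ms access range
# quoted at the top of the thread once rotation is included.
print(f"10Krpm, 4.9 ms seek: {avg_access_ms(4.9, 10_000):.1f} ms access")
print(f"15Krpm, 3.5 ms seek: {avg_access_ms(3.5, 15_000):.1f} ms access")
```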
[PERFORM] Quick question about top...
I have a really stupid question about top, what exactly is iowait CPU time?Alex
Re: [PERFORM] SAN performance mystery
Given the fact that most SATA drives have only an 8MB cache, and your RAID controller should have at least 64MB, I would argue that the system with the RAID controller should always be faster. If it's not, you're getting short-changed somewhere, which is typical on linux, because the drivers just aren't there for a great many controllers that are out there. Alex. On 6/15/06, Mark Lewis [EMAIL PROTECTED] wrote: On Thu, 2006-06-15 at 18:24 -0400, Tom Lane wrote: I agree with Brian's suspicion that the SATA drive isn't properly fsync'ing to disk, resulting in bogusly high throughput. However, ISTM a well-configured SAN ought to be able to match even the bogus throughput, because it should be able to rely on battery-backed cache to hold written blocks across a power failure, and hence should be able to report write-complete as soon as it's got the page in cache rather than having to wait till it's really down on magnetic platter. Which is what the SATA drive is doing ... only it can't keep the promise it's making for lack of any battery backup on its on-board cache. It really depends on your SAN RAID controller. We have an HP SAN; I don't remember the model number exactly, but we ran some tests and with the battery-backed write cache enabled, we got some improvement in write performance, but it wasn't NEARLY as fast as an SATA drive which lied about write completion. The write-and-fsync latency was only about 2-3 times better than with no write cache at all. So I wouldn't assume that just because you've got a write cache on your SAN, you're getting the same speed as fsync=off, at least for some cheap controllers. -- Mark Lewis
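One crude way to spot the "lying about write completion" behavior discussed above is to time fsync'd writes yourself. A lone 7200rpm disk that honestly flushes is bounded by rotation (~120 rev/s), so rates of many thousands of fsyncs per second imply a write-back cache somewhere, battery-backed or not. A hedged sketch, not from the thread:

```python
# Probe for write caches that acknowledge before data is on the platter:
# time small write+fsync cycles and report the achieved rate.
import os
import tempfile
import time

def fsync_rate(iterations: int = 200) -> float:
    """Return achieved write+fsync cycles per second on a temp file."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, b"x" * 512)
            os.fsync(fd)
        return iterations / (time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    # ~100-200/s is plausible for an honest single spindle;
    # thousands/s means something is caching the flush.
    print(f"~{fsync_rate():.0f} fsyncs/sec")
```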
Re: [PERFORM] 64-bit vs 32-bit performance ... backwards?
Anyone who has tried x86-64 linux knows what a royal pain in the ass it is. They didn't do anything sensible, like just make the whole OS 64 bit; no, they had to split it up, and put 64-bit libs in a new directory /lib64. This means that a great many applications don't know to check in there for libs, and don't compile pleasantly; php is one among them. I forget what others, it's been awhile now. Of course if you actually want to use more than 4gig RAM in a pleasant way, it's pretty much essential. Alex. On 6/12/06, Steve Atkins [EMAIL PROTECTED] wrote: On Jun 12, 2006, at 6:15 PM, Joshua D. Drake wrote: Empirically... postgresql built for 64 bits is marginally slower than that built for a 32 bit api on sparc. None of my customers have found 64 bit x86 systems to be suitable for production use, yet, so I've not tested on any of those architectures. Really? All of our customers are migrating to Opteron and I have many that have been using Opteron for over 12 months happily. An Opteron is 64 bit capable; that doesn't mean you have to run 64 bit code on it. Mine're mostly reasonably conservative users, with hundreds of machines to support. Using 64 bit capable hardware, such as Opterons, is one thing, but using an entirely different linux installation and userspace code, say, is a much bigger change in support terms. In the extreme case it makes no sense to double your OS support overheads to get a single digit percentage performance improvement on one database system. That's not to say that linux/x86-64 isn't production ready for some users, just that it's not necessarily a good operational decision for my customers. Given my internal workloads aren't really stressing the hardware they're on, I don't have much incentive to benchmark x86-64 yet - by the time the numbers might be useful to me we'll be on a different postgresql, likely a different gcc/icc and so on. Cheers, Steve
Re: [PERFORM] 3WARE Card performance boost?
He's talking about RAID 1 here, not a gargantuan RAID 6. Onboard RAM on the controller card is going to make very little difference. All it will do is allow the card to re-order writes to a point (not all cards even do this). Alex. On 1/18/06, William Yu [EMAIL PROTECTED] wrote: Benjamin Arai wrote: Obviously, I have done this to improve write performance for the update each week. My question is if I install a 3ware or similar card to replace my current software RAID 1 configuration, am I going to see a very large improvement? If so, what would be a ball park figure? The key is getting a card with the ability to upgrade the onboard ram. Our previous setup was a LSI MegaRAID 320-1 (128MB), 4xRAID10, fsync=off. Replaced it with a ARC-1170 (1GB) w/ 24x7200RPM SATA2 drives (split into 3 8-drive RAID6 arrays) and performance for us is through the ceiling. For OLTP type updates, we've gotten about +80% increase. For massive 1-statement updates, performance increase is in the +triple digits.
Re: [PERFORM] 3WARE Card performance boost?
http://www.3ware.com/products/serial_ata2-9000.asp Check their data sheet - the cards are BBU ready - all you have to do is order a BBU, which you can do from here: http://www.newegg.com/Product/Product.asp?Item=N82E16815999601 Alex. On 1/18/06, Joshua D. Drake [EMAIL PROTECTED] wrote: Obviously, I have done this to improve write performance for the update each week. My question is if I install a 3ware or similar card to replace my current software RAID 1 configuration, am I going to see a very large improvement? If so, what would be a ball park figure? Well that entirely depends on what level... 1. I would suggest LSI 150-6 not 3ware Why? Because 3ware does not make a midrange card that has a battery backed cache :). That is the only reason. 3ware makes good stuff. So anyway... LSI150-6 with Battery Backed cache option. Put 6 drives on it with a RAID 10 array, turn on write cache and you should have a hauling drive. Joshua D. Drake *Benjamin Arai* [EMAIL PROTECTED] http://www.benjaminarai.com -- The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] 3WARE Card performance boost?
A 3ware card will re-order your writes to put them more in disk order, which will probably improve performance a bit, but just going from a software RAID 1 to a hardware RAID 1, I would not imagine that you will see much of a performance boost. Really, to get better performance you will need to add more drives, or faster drives. If you are currently running 7200 RPM consumer drives, going to a 10,000 RPM WD Raptor drive will probably increase performance by about 30%, again not all that much. Alex On 1/18/06, Benjamin Arai [EMAIL PROTECTED] wrote: Hi, I am currently doing large weekly updates with fsync=off. My updates involve SELECT, UPDATE, DELETE, etc. Setting fsync=off works for me since I take a complete backup before the weekly update and run a sync and CHECKPOINT after each weekly update has completed to ensure the data is all written to disk. Obviously, I have done this to improve write performance for the update each week. My question is if I install a 3ware or similar card to replace my current software RAID 1 configuration, am I going to see a very large improvement? If so, what would be a ball park figure? Benjamin Arai [EMAIL PROTECTED] http://www.benjaminarai.com
Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?
It doesn't matter which controller; you still have to actually write the parity blocks, which slows down your write speed because you have to write n+n/2 blocks instead of just n blocks, making the system write 50% more data. RAID 5 must write 50% more data to disk, therefore it will always be slower. Alex. On 12/25/05, Michael Stone [EMAIL PROTECTED] wrote: On Sat, Dec 24, 2005 at 05:45:20PM -0500, Ron wrote: Caches help, and the bigger the cache the better, but once you are doing enough writes fast enough (and that doesn't take much even with a few GBs of cache) the recalculate-checksums-and-write-new-ones overhead will decrease the write speed of real data. Bear in mind that the HD's _raw_ write speed hasn't been decreased. Those HD's are pounding away as fast as they can for you. Your _effective_ or _data level_ write speed is what decreases due to overhead. You're overgeneralizing. Assuming a large cache and a sequential write, there need be no penalty for raid 5. (For random writes you may need to read unrelated blocks in order to calculate parity, but for large sequential writes the parity blocks should all be read from cache.) A modern cpu can calculate parity for raid 5 on the order of gigabytes per second, and even crummy embedded processors can do hundreds of megabytes per second. You may have run into some lousy implementations, but you should be much more specific about what hardware you're talking about instead of making sweeping generalizations. Side Note: people often forget the other big reason to use RAID 10 over RAID 5. RAID 5 is always only 2 HD failures from data loss. RAID 10 can lose up to 1/2 the HD's in the array w/o data loss unless you get unlucky and lose both members of a RAID 1 set. IOW, your RAID 10 is only 2 HD failures from data loss also. If that's an issue you need to go with RAID 6 or add another disk to each mirror.
Mike Stone
Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?
Yes - they work excellently. I have several medium and large servers running 3ware 9500S series cards with great success. We have rebuilt many failed RAID 10s over the years with no problems. Alex On 12/26/05, Benjamin Arai [EMAIL PROTECTED] wrote: Have you had any experience rebuilding arrays in linux using the 3Ware utilities? If so, did it work well? Luke Lonergan wrote: Benjamin, Have you done any benchmarking of the 9550SX against a software raid configuration? Interesting - no, not on SATA, mostly because I've had awful luck with Linux drivers and SATA. The popular manufacturers of SATA to PCI bridge chipsets are Silicon Image and Highpoint, and I've not seen Linux work with them at any reasonable performance yet. I've also had problems with Adaptec's cards - I think they manufacture their own SATA to PCI chipset as well. So far, I've only had good luck with the on-chipset Intel SATA implementation. I think the problems I've had could be entirely driver-related, but in the end it doesn't matter if you can't find drivers that work for Linux. The other problem is getting enough SATA connections for the number of disks we want. I do have two new Areca SATA RAID cards and I'm going to benchmark those against the 3Ware 9550SX with 2 x 8 = 16 disks on one host. I guess we could run the HW RAID controllers in JBOD mode to get a good driver / chipset configuration for software RAID, but frankly I prefer HW RAID if it performs well. So far the SATA host-based RAID is blowing the doors off of every other HW RAID solution I've tested. - Luke -- Benjamin Arai [EMAIL PROTECTED] http://www.benjaminarai.com
Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?
Yes, but those blocks in RAID 10 are largely irrelevant as they go to independent disks. In RAID 5 you have to write parity to an 'active' drive that is part of the stripe. (They are irrelevant unless of course you are maxing out your SCSI bus - yet another reason why SATA can be faster than SCSI, particularly in RAID 10, where every channel is independent.) Sorry - my math for RAID 5 was a bit off - I don't know why I was considering only a three-drive situation, which is the worst case. It's n+1; you are right. Still, for small arrays that's a big penalty, and there is definitely a penalty, contrary to the assertion of the original poster. I agree totally that the read+parity-calc+write in the worst case is totally bad, which is why I always recommend people should _never ever_ use RAID 5. In this day and age of large-capacity chassis and large-capacity SATA drives, RAID 5 is totally inappropriate IMHO for _any_ application, least of all databases. In reality I have yet to benchmark a system where RAID 5 with 8 drives or less in a single array beat a RAID 10 with the same number of drives. I would definitely be interested in a SCSI card that could actually achieve the theoretical performance of RAID 5, especially under Linux. With RAID 5 you get to watch your system crumble and fail when a drive fails and the array goes into a degraded state. It's just not worth it. Alex. On 12/26/05, David Lang [EMAIL PROTECTED] wrote: On Mon, 26 Dec 2005, Alex Turner wrote: It's irrelevant what controller, you still have to actually write the parity blocks, which slows down your write speed because you have to write n+n/2 blocks instead of just n blocks, making the system write 50% more data. RAID 5 must write 50% more data to disk, therefore it will always be slower. raid5 writes n+1 blocks, not n+n/2 (unless n=2 for a 3-disk raid). you can have a 15+1 disk raid5 array, for example. however raid1 (and raid10) have to write 2*n blocks to disk.
so if you are talking about pure I/O needed, raid5 wins hands down. (the same 16 drives would be a 8+8 array) what slows down raid 5 is that to modify a block you have to read blocks from all your drives to re-calculate the parity. this interleaving of reads and writes when all you are logically doing is writes can really hurt. (this is why I asked the question that got us off on this tangent: when doing new writes to an array you don't have to read the blocks as they are blank, assuming your caching is enough so that you can write blocksize*n before the system starts actually writing the data) David Lang Alex. On 12/25/05, Michael Stone [EMAIL PROTECTED] wrote: On Sat, Dec 24, 2005 at 05:45:20PM -0500, Ron wrote: Caches help, and the bigger the cache the better, but once you are doing enough writes fast enough (and that doesn't take much even with a few GBs of cache) the recalculate-checksums-and-write-new-ones overhead will decrease the write speed of real data. Bear in mind that the HD's _raw_ write speed hasn't been decreased. Those HD's are pounding away as fast as they can for you. Your _effective_ or _data level_ write speed is what decreases due to overhead. You're overgeneralizing. Assuming a large cache and a sequential write, there need be no penalty for raid 5. (For random writes you may need to read unrelated blocks in order to calculate parity, but for large sequential writes the parity blocks should all be read from cache.) A modern cpu can calculate parity for raid 5 on the order of gigabytes per second, and even crummy embedded processors can do hundreds of megabytes per second. You may have run into some lousy implementations, but you should be much more specific about what hardware you're talking about instead of making sweeping generalizations. Side Note: people often forget the other big reason to use RAID 10 over RAID 5. RAID 5 is always only 2 HD failures from data loss.
RAID 10 can lose up to 1/2 the HD's in the array w/o data loss unless you get unlucky and lose both members of a RAID 1 set. IOW, your RAID 10 is only 2 HD failures from data loss also. If that's an issue you need to go with RAID 6 or add another disk to each mirror. Mike Stone
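The block-count arithmetic that this exchange settles on (n+1 blocks per RAID 5 stripe versus 2n blocks for RAID 10) can be sketched in a few lines. This is a hedged illustration of the simple model only: it ignores caching and the read-modify-write cycles that, as David Lang notes, dominate in practice.

```python
# Write-amplification model from the thread, for full-stripe writes only:
# RAID 5 writes n data blocks + 1 parity block per stripe of width n;
# RAID 1/10 writes every data block twice (mirroring).

def raid5_blocks_written(data_blocks: int, stripe_width: int) -> int:
    """Blocks hitting disk for full-stripe RAID 5 writes (n data + 1 parity per stripe)."""
    stripes = -(-data_blocks // stripe_width)  # ceiling division
    return data_blocks + stripes               # one parity block per stripe

def raid10_blocks_written(data_blocks: int) -> int:
    """RAID 10 mirrors every block, so 2n blocks hit disk."""
    return 2 * data_blocks

if __name__ == "__main__":
    # 15+1 RAID 5 vs 8+8 RAID 10, as in David Lang's 16-drive example
    data = 1500
    print(raid5_blocks_written(data, 15))  # 1600 blocks
    print(raid10_blocks_written(data))     # 3000 blocks
```

On pure block counts RAID 5 does win, exactly as the thread says; the performance loss comes from the reads interleaved into the write stream, not from the parity blocks themselves.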
Re: [PERFORM] Is RAID10 the best choice?
Personally I would split into two RAID 1s: one for pg_xlog, one for the rest. This gives probably the best performance/reliability combination. Alex. On 12/10/05, Carlos Benkendorf [EMAIL PROTECTED] wrote: Hello, I would like to know which is the best configuration to use 4 SCSI drives with a pg 8.1 server. Configuring them as a RAID10 set seems a good choice but now I'm considering another configuration: SCSI drive 1: operating system SCSI drive 2: pg_xlog SCSI drive 3: data SCSI drive 4: index I know the difference between them when you analyze the risks of losing data, but how about performance? Which should be better? Obs.: Our system always uses an index for every access. (enable_seqscan(false)) Thanks in advance! Benkendorf
Re: [PERFORM] LVM and Postgres
I would argue that you almost certainly won't gain anything by doing that, as you will create a new place even further away for the disk head to seek to, instead of just another file on the same FS that is probably closer to the current head position. Alex On 12/6/05, Michael Stone [EMAIL PROTECTED] wrote: On Tue, Dec 06, 2005 at 09:36:23PM +0000, Rory Campbell-Lange wrote: Yes, we don't have any spare disks unfortunately. We have enabled the BBU write, so we are hoping for good performance. Even if you don't use separate disks you'll probably get better performance by putting the WAL on a separate ext2 partition. xfs gives good performance for the table data, but is not particularly good for the WAL. Mike Stone
Re: [PERFORM] Hardware/OS recommendations for large databases (
Ok - so I ran the same test on my system and get a total speed of 113MB/sec. Why is this? Why is the system so limited to around just 110MB/sec? I tuned read ahead up a bit, and my results improve a bit.. Alex On 11/18/05, Luke Lonergan [EMAIL PROTECTED] wrote: Dave, On 11/18/05 5:00 AM, Dave Cramer [EMAIL PROTECTED] wrote: Now there's an interesting line drawn in the sand. I presume you have numbers to back this up? This should draw some interesting posts. Part 2: The answer System A: This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel. On a single table with 15 columns (the Bizgres IVP) at a size double memory (2.12GB), Postgres 8.0.3 with Bizgres enhancements takes 32 seconds to scan the table: that's 66 MB/s. Not the efficiency I'd hoped for from the onboard SATA controller; I would have expected to get 85% of the 100MB/s raw read performance. So that's $1,200 / 66 MB/s (without adjusting for 2003 price versus now) = 18.2 $/MB/s Raw data: [EMAIL PROTECTED] IVP]$ cat scan.sh #!/bin/bash time psql -c "select count(*) from ivp.bigtable1" dgtestdb [EMAIL PROTECTED] IVP]$ cat sysout1 count -- 1000 (1 row) real 0m32.565s user 0m0.002s sys 0m0.003s Size of the table data: [EMAIL PROTECTED] IVP]$ du -sk dgtestdb/base 2121648 dgtestdb/base System B: This system is running an XFS filesystem, and has been tuned to use very large (16MB) readahead. It's running the Centos 4.1 distro, which uses a Linux 2.6.9 kernel. Same test as above, but with 17GB of data takes 69.7 seconds to scan (!) That's 244.2MB/s, which is obviously double my earlier point of 110-120MB/s. This system is running with a 16MB Linux readahead setting; let's try it with the default (I think) setting of 256KB - AHA! Now we get 171.4 seconds or 99.3MB/s. So, using the tuned setting of blockdev --setra 16384 we get $6,000 / 244MB/s = 24.6 $/MB/s If we use the default Linux setting it's 2.5x worse.
Raw data: [EMAIL PROTECTED] IVP]$ cat scan.sh #!/bin/bash time psql -c "select count(*) from ivp.bigtable1" dgtestdb [EMAIL PROTECTED] IVP]$ cat sysout3 count -- 8000 (1 row) real 1m9.875s user 0m0.000s sys 0m0.004s [EMAIL PROTECTED] IVP]$ !du du -sk dgtestdb/base 17021260 dgtestdb/base Summary: cough, cough OK - you can get more I/O bandwidth out of the current I/O path for sequential scan if you tune the filesystem for large readahead. This is a cheap alternative to overhauling the executor to use asynch I/O. Still, there is a CPU limit here - this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings. If the filesystem could do 1GB/s, you wouldn't go any faster than 244MB/s. - Luke
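Luke's price-performance arithmetic generalizes to a one-liner. A minimal sketch, using the scan times and `du -sk` sizes from his raw data (the dollar figures are his; his 18.2 result comes from rounding the scan rate to 66 MB/s first):

```python
def dollars_per_mb_per_sec(system_cost: float, table_mb: float, scan_seconds: float) -> float:
    """Cost per MB/s of sequential-scan bandwidth, as in Luke's comparison."""
    return system_cost / (table_mb / scan_seconds)

# System A: $1,200, 2,121,648 KB table scanned in 32.565 s
print(round(dollars_per_mb_per_sec(1200, 2121.648, 32.565), 1))   # ~18.4 $/MB/s
# System B: $6,000, 17,021,260 KB table scanned in 69.875 s
print(round(dollars_per_mb_per_sec(6000, 17021.260, 69.875), 1))  # ~24.6 $/MB/s
```

The metric makes the trade-off explicit: System B is nearly 4x faster in absolute terms but costs more per unit of scan bandwidth.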
Re: [PERFORM] Hardware/OS recommendations for large databases (
On 11/16/05, William Yu [EMAIL PROTECTED] wrote: Alex Turner wrote: Spend a fortune on dual core CPUs and then buy crappy disks... I bet for most applications this system will be IO bound, and you will see a nice lot of drive failures in the first year of operation with consumer grade drives. Spend your money on better Disks, and don't bother with Dual Core IMHO unless you can prove the need for it. I would say the opposite -- you always want Dual Core nowadays. DC Opterons simply give you better bang for the buck than single core Opterons. Price out a 1xDC system against a 2x1P system -- the 1xDC will be cheaper. Do the same for 2xDC versus 4x1P, 4xDC versus 8x1P, 8xDC versus 16x1P, etc. -- DC gets cheaper by wider and wider margins because those mega-CPU motherboards are astronomically expensive. Opteron 242 - $178.00 Opteron 242 - $178.00 Tyan S2882 - $377.50 Total: $733.50 Opteron 265 - $719.00 Tyan K8E - $169.00 Total: $888.00 The Tyan K8E doesn't have any PCI-X, so it's also not apples to apples. In fact I couldn't find a single-CPU board that did, so you pretty much have to buy a dual CPU board to get PCI-X. 1xDC is _not_ cheaper. Our DB application does about 5 queries/second peak, plus a heavy insert job once per day. We only _need_ two CPUs, which is true for a great many DB applications. Unless you like EJB of course, which will thrash the crap out of your system. Consider the two most used regions for DBs: a) OLTP - probably IO bound, large number of queries/sec updating info on _disks_, not requiring much CPU activity except to retrieve item information which is well indexed and normalized. b) Data warehouse - needs CPU, but probably still IO bound, large data set that won't fit in RAM will require large amounts of disk reads. CPU can easily keep up with disk reads. I have yet to come across a DB system that wasn't IO bound. DC also gives you a better upgrade path. Let's say you do testing and figure 2x246 is the right setup to handle the load.
Well instead of getting 2x1P, use the same 2P motherboard but only populate 1 CPU w/ a DC/270. Now you have a server that can be upgraded to +80% more CPU by popping in another DC/270 versus throwing out the entire thing to get a 4x1P setup. No argument there. But it's pointless if you are IO bound. The only questions would be: (1) Do you need a SMP server at all? I'd claim yes -- you always need 2+ cores whether it's DC or 2P to avoid IO interrupts blocking other processes from running. At least 2 CPUs is always good for precisely those reasons. More than 2 CPUs gives diminishing returns. (2) Does a DC system perform better than its Nx1P cousin? My experience is yes. Did some rough tests in a drop-in-replacement 1x265 versus 2x244 and saw about +10% for DC. All the official benchmarks (Spec, Java, SAP, etc) from AMD/Sun/HP/IBM show DCs outperforming the Nx1P setups. Maybe true, but the 265 does have a 25% faster FSB than the 244, which might perhaps play a role. (3) Do you need an insane amount of memory? Well here's the case where the more expensive motherboard will serve you better since each CPU slot has its own bank of memory. Spend more money on memory, get cheaper single-core CPUs. Remember - a large DB is going to be IO bound. Memory will get thrashed for file block buffers; even if you have large amounts, it's all gonna be cycled in and out again. Of course, this doesn't apply if you are an Intel/Dell-only shop. Xeon DCs, while cheaper than their corresponding single-core SMPs, don't have the same performance profile as Opteron DCs. Basically, you're paying a bit extra so your server can generate a ton more heat. Dell/Xeon/Postgres is just a bad combination any day of the week ;) Alex.
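The price totals Alex quotes against William's claim check out as simple sums; a sketch using the 2005 list prices from the thread (prices are the posters' figures, not current ones):

```python
# 2x single-core Opteron build vs 1x dual-core build, prices from the thread.
dual_single_core = {
    "Opteron 242 (first)":  178.00,
    "Opteron 242 (second)": 178.00,
    "Tyan S2882":           377.50,
}
one_dual_core = {
    "Opteron 265": 719.00,
    "Tyan K8E":    169.00,
}

print(sum(dual_single_core.values()))  # 733.5 -> 2x1P total
print(sum(one_dual_core.values()))     # 888.0 -> 1xDC total
```

Which supports Alex's point at these particular price points: the single dual-core build is about $155 more, before even considering the PCI-X mismatch he raises.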
Re: [PERFORM] Hardware/OS recommendations for large databases (
Just pick up a SCSI drive and a consumer ATA drive. Feel their weight. You don't have to look inside to tell the difference. Alex On 11/16/05, David Boreham [EMAIL PROTECTED] wrote: I suggest you read this on the difference between enterprise/SCSI and desktop/IDE drives: http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf This is exactly the kind of vendor propaganda I was talking about, and it proves my point quite well: there's nothing specific relating to reliability that is different between SCSI and SATA drives cited in that paper. It does have a bunch of FUD such as 'oh yeah, we do a lot more drive characterization during manufacturing'.
Re: [PERFORM] Hardware/OS recommendations for large databases (
On 11/16/05, Joshua D. Drake [EMAIL PROTECTED] wrote: The only questions would be: (1) Do you need a SMP server at all? I'd claim yes -- you always need 2+ cores whether it's DC or 2P to avoid IO interrupts blocking other processes from running. I would back this up. Even for smaller installations (single RAID 1, 1 gig of RAM). Why? Well because many applications are going to be CPU bound. For example we have a PHP application that is a CMS. On a single CPU machine, RAID 1, it takes about 300ms to deliver a single page, point to point. We are not IO bound. So what happens is that under reasonable load we are actually waiting for the CPU to process the code. This is the performance profile for PHP, not for PostgreSQL. This is the PostgreSQL mailing list. A simple upgrade to an SMP machine literally doubles our performance because we are still not IO bound. I strongly suggest that everyone use at least a single dual core because of this experience. Performance of PHP, not PostgreSQL. (3) Do you need an insane amount of memory? Well here's the case where the more expensive motherboard will serve you better since each CPU slot has its own bank of memory. Spend more money on memory, get cheaper single-core CPUs. Agreed. A lot of times the slowest dual-core is 5x what you actually need. So get the slowest, and bulk up on memory. If nothing else memory is cheap today and it might not be tomorrow. [snip] Running PostgreSQL on a single-drive RAID 1 with PHP on the same machine is not a typical installation. 300ms for PHP in CPU time? wow dude - that's quite a page. PHP can typically handle up to 30-50 pages per second for a typical OLTP application on a single CPU box. Something is really wrong with that system if it takes 300ms per page. Alex.
Re: [PERFORM] Hardware/OS recommendations for large databases
Yes - that very benchmark shows that for a MySQL data drive in RAID 10, the 3ware controllers beat the Areca card. Alex. On 11/16/05, Ron [EMAIL PROTECTED] wrote: Got some hard numbers to back your statement up? IME, the Areca 1160's with >= 1GB of cache beat any other commodity RAID controller. This seems to be in agreement with at least one independent testing source: http://print.tweakers.net/?reviews/557 RAID HW from Xyratex, Engino, or Dot Hill will _destroy_ any commodity HW solution, but their price point is considerably higher. ...on another note, I completely agree with the poster who says we need more cache on RAID controllers. We should all be beating on the RAID HW manufacturers to use standard DIMMs for their caches and to provide 2 standard DIMM slots in their full-height cards (allowing for up to 8GB of cache using 2 4GB DIMMs as of this writing). It should also be noted that 64-drive chassis are going to become possible once 2.5 inch 10Krpm SATA II and FC HDs become the standard next year (48's are the TOTL now). We need controller technology to keep up. Ron At 12:16 AM 11/16/2005, Alex Turner wrote: Not at random access in RAID 10 they aren't, and anyone with their head screwed on right is using RAID 10. The 9500S will still beat the Areca cards at the RAID 10 database access pattern. Alex. On 11/15/05, Dave Cramer [EMAIL PROTECTED] wrote: Luke, Have you tried the Areca cards? They are slightly faster yet. Dave On 15-Nov-05, at 7:09 AM, Luke Lonergan wrote: I agree - you can get a very good one from www.acmemicro.com or www.rackable.com with 8x 400GB SATA disks and the new 3Ware 9550SX SATA RAID controller for about $6K with two Opteron 272 CPUs and 8GB of RAM on a Tyan 2882 motherboard. We get about 400MB/s sustained disk read performance on these (with tuning) on Linux using the xfs filesystem, which is one of the most critical factors for large databases.
Note that you want to have your DBMS use all of the CPU and disk channel bandwidth you have on each query, which takes a parallel database like Bizgres MPP to achieve. Regards,
Re: [PERFORM] Hardware/OS recommendations for large databases (
On 11/15/05, Luke Lonergan [EMAIL PROTECTED] wrote: Adam, -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Claus Guttesen Sent: Tuesday, November 15, 2005 12:29 AM To: Adam Weisberg Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB) Does anyone have recommendations for hardware and/or OS to work with around 5TB datasets? Hardware-wise I'd say dual core Opterons. One dual-core Opteron performs better than two single-cores at the same speed. Tyan makes some boards that have four sockets, thereby giving you 8 CPUs (if you need that many). Sun and HP also make nice hardware, although the Tyan board is more competitively priced. OS-wise I would choose the FreeBSD amd64 port, but partitions larger than 2 TB need some special care (using gpt rather than disklabel etc.); tools like fsck may not be able to completely check partitions larger than 2 TB. Linux or Solaris with either LVM or Veritas FS sound like candidates. I agree - you can get a very good one from www.acmemicro.com or www.rackable.com with 8x 400GB SATA disks and the new 3Ware 9550SX SATA RAID controller for about $6K with two Opteron 272 CPUs and 8GB of RAM on a Tyan 2882 motherboard. We get about 400MB/s sustained disk read performance on these (with tuning) on Linux using the xfs filesystem, which is one of the most critical factors for large databases. Spend a fortune on dual core CPUs and then buy crappy disks... I bet for most applications this system will be IO bound, and you will see a nice lot of drive failures in the first year of operation with consumer grade drives. Spend your money on better Disks, and don't bother with Dual Core IMHO unless you can prove the need for it. Alex
Re: [PERFORM] Hardware/OS recommendations for large databases (
Not at random access in RAID 10 they aren't, and anyone with their head screwed on right is using RAID 10. The 9500S will still beat the Areca cards at the RAID 10 database access pattern. Alex. On 11/15/05, Dave Cramer [EMAIL PROTECTED] wrote: Luke, Have you tried the Areca cards? They are slightly faster yet. Dave On 15-Nov-05, at 7:09 AM, Luke Lonergan wrote: I agree - you can get a very good one from www.acmemicro.com or www.rackable.com with 8x 400GB SATA disks and the new 3Ware 9550SX SATA RAID controller for about $6K with two Opteron 272 CPUs and 8GB of RAM on a Tyan 2882 motherboard. We get about 400MB/s sustained disk read performance on these (with tuning) on Linux using the xfs filesystem, which is one of the most critical factors for large databases. Note that you want to have your DBMS use all of the CPU and disk channel bandwidth you have on each query, which takes a parallel database like Bizgres MPP to achieve. Regards,
Re: [PERFORM] Sort performance on large tables
We use this memory in all our servers (well - the 512 sticks). 0 problems to date: http://www.newegg.com/Product/Product.asp?Item=N82E16820145513 $163 for 1GB. This stuff is probably better than the Samsung RAM Dell is selling you for 3 times the price. Alex On 11/10/05, Ron Peacetree [EMAIL PROTECTED] wrote: My original post did not take into account VAT, I apologize for that oversight. However, unless you are naive, or made of gold, or have some sort of special relationship that requires you to, _NEVER_ buy RAM from your computer HW OEM. For at least two decades it's been a provable fact that OEMs like DEC, Sun, HP, Compaq, Dell, etc, etc charge far more per GB for the RAM they sell. Same goes for HDs. Buy your memory and HDs direct from reputable manufacturers, you'll get at least the same quality and pay considerably less. Your Dell example is evidence that supports my point. As of this writing, decent RAM should cost $75-$150 per GB (not including VAT ;-) ). Don't let yourself be conned into paying more. I'm talking about decent RAM from reputable direct suppliers like Corsair and Kingston (_not_ their Value RAM, the actual Kingston branded stuff), OCZ, etc. Such companies sell via multiple channels, including reputable websites like dealtime.com, pricewatch.com, newegg.com, etc, etc. You are quite correct that there's poor quality junk out there. I was not talking about it, only reasonable quality components. Ron -Original Message- From: Kurt De Grave [EMAIL PROTECTED] Sent: Nov 10, 2005 5:40 AM To: Ron Peacetree [EMAIL PROTECTED] Cc: Charlie Savage [EMAIL PROTECTED], pgsql-performance@postgresql.org Subject: Re: [PERFORM] Sort performance on large tables On Wed, 9 Nov 2005, Ron Peacetree wrote: At this writing, 4 1GB DIMMs (4GB) should set you back ~$300 or less. 4 2GB DIMMs (8GB) should cost ~$600. As of now, very few mainboards support 4GB DIMMs and I doubt the D3000 has such a mainboard.
If you can use them, 4 4GB DIMMs (16GB) will currently set you back ~$1600-$2400. Sorry, but every time the buy-more-memory argument passes by, I see unrealistic memory prices quoted. What kind of memory are you buying for your servers? Non-ECC no-name memory that doesn't even pass a one-hour memtest86 for 20% of the items you buy? Just checked at Dell's web page: adding 4 1GB DIMMs to a PowerEdge 2850 sets you back _1280 EURO_ excluding VAT. And that's after they already charged you 140 euro for replacing the obsolete standard 4 512MB DIMMs with the same capacity in 1GB DIMMs. So the 4GB upgrade actually costs 1420 euro plus VAT, which is quite a bit more than $300. Okay, few people will happily buy at those prices. You can get the exact same goods much cheaper elsewhere, but it'll still cost you way more than the number you gave, plus you'll have to drive to the server's location, open up the box yourself, and risk incompatibilities and support problems if there's ever something wrong with that memory. Disclaimers: I know that you're talking about a desktop in this particular case. I wouldn't see a need for ECC in a development box either. I know a Dell hasn't been the smartest choice for a database box lately (but politics...). kurt.
Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware Raid5 / Debian??
Where are the pg_xlog and data directories with respect to each other? From this iostat it looks like they might be on the same partition, which is not ideal, and it's actually surprising that throughput is this good. You need to separate the pg_xlog and data directories to get any kind of reasonable performance. Also don't use RAID 5 - RAID 5 bites, no really - it bites. Use multiple RAID 1s, or RAID 10s; you will get better performance. 50MB/70MB is about the same as you get from a single disk or a RAID 1. We use 2x 9506S8MI controllers, and have maintained excellent performance with 2x RAID 10 and 2x RAID 1. Make sure you get the firmware update if you have these controllers though. Alex Turner NetEconomist On 11/6/05, Joost Kraaijeveld [EMAIL PROTECTED] wrote: Hi, I am experiencing very long update queries and I want to know if it is reasonable to expect them to perform better. The query below has been running for more than 1.5 hours (5500 seconds) now, while the rest of the system does nothing (I don't even type or move a mouse...). - Is that to be expected? - Is 180-200 tps with ~9000 KB (see output of iostat below) not low, given the fact that fsync is off? (Note: with bonnie++ I get write performance of 50 MB/sec and read performance of 70 MB/sec, with 2000 read/write ops/sec.) - Does anyone else have any experience with the 3Ware RAID controller (which is my suspect)? - Any good idea how to determine the real bottleneck if this is not the performance I can expect?
My hard- and software: - PostgreSQL 8.0.3 - Debian 3.1 (Sarge) AMD64 - Dual Opteron - 4GB RAM - 3ware RAID 5 with 5 disks Pieces of my postgresql.conf (all else is default): shared_buffers = 7500 work_mem = 260096 fsync=false effective_cache_size = 32768 The query with explain (amount and orderbedrag_valuta are float8, ordernummer and ordernumber int4): explain update prototype.orders set amount = odbc.orders.orderbedrag_valuta from odbc.orders where ordernumber = odbc.orders.ordernummer; QUERY PLAN - Hash Join (cost=50994.74..230038.17 rows=1104379 width=466) Hash Cond: (outer.ordernumber = inner.ordernummer) - Seq Scan on orders (cost=0.00..105360.68 rows=3991868 width=455) - Hash (cost=48233.79..48233.79 rows=1104379 width=15) - Seq Scan on orders (cost=0.00..48233.79 rows=1104379 width=15) Sample output from iostat during the query (about average): Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn hdc 0.00 0.00 0.00 0 0 sda 0.00 0.00 0.00 0 0 sdb 187.13 23.76 8764.36 24 8852 -- Regards, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 e-mail: [EMAIL PROTECTED] web: www.askesis.nl
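One quick sanity check on the iostat sample above: dividing write throughput by tps shows how much data each I/O transaction moves, which hints at whether the workload is small random writes (where RAID 5 hurts most) rather than sequential streaming. A rough back-of-envelope, assuming the sampled figures are representative:

```python
def kb_per_transaction(kb_written_per_sec: float, tps: float) -> float:
    """Average KB written per I/O transaction, from iostat's tps and kB_wrtn/s columns."""
    return kb_written_per_sec / tps

# From the iostat sample for sdb: 187.13 tps at 8764.36 kB/s written
print(round(kb_per_transaction(8764.36, 187.13), 1))  # ~46.8 KB per op
```

At roughly 47 KB per operation these are not large sequential writes, consistent with the diagnosis in the reply: a big hash-join update scattering small writes across a RAID 5 array, with pg_xlog competing for the same spindles.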
Re: [PERFORM] browsing table with 2 million records
You could also create your own index, so to speak: a table that simply contains a list of primary keys and an order value field that you can use as your offset. This can be kept in sync with the master table using triggers pretty easily. 2 million is not very much; if you only have an integer pkey and an integer order value, then you can join it against the main table.

create table my_index_table ( primary_key_value int, order_val int, primary key (primary_key_value));
create index my_index_table_order_val_i on my_index_table (order_val);
select * from main_table a, my_index_table b where b.order_val >= 25 and b.order_val < 50 and a.primary_key_value = b.primary_key_value;

If the data updates a lot then this won't work as well though, as the index table will require frequent updates to a potentially large number of records (although a small number of pages, so it still won't be horrible).

Alex Turner NetEconomist

On 10/26/05, Joshua D. Drake [EMAIL PROTECTED] wrote: We have a GUI that lets users browse through the records page by page at about 25 records at a time. (Don't ask me why but we have to have this GUI). This translates to something like select count(*) from table -- to give feedback about the DB size Do you have an integer field that is an ID that increments? E.g. serial? select * from table order by date limit 25 offset 0 You could use a cursor. Sincerely, Joshua D. Drake Tables seem properly indexed, with vacuum and analyze run regularly. Still these very basic SQLs take up to a minute to run. I read some recent messages saying that select count(*) needs a table scan in Postgres. That's disappointing. But I can accept an approximation if there is some way to do so. But how can I optimize select * from table order by date limit x offset y? One minute response time is not acceptable. Any help would be appreciated. Wy -- The PostgreSQL Company - Command Prompt, Inc.
1.503.667.4564 PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 6: explain analyze is your friend
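The trigger-based sync mentioned above can be sketched in SQL. All table and column names here are illustrative (the original post gives no real schema), and only the insert path is shown; deletes and reordering need similar handling:

```sql
-- Slim "index table" of (pkey, position), kept alongside the master table.
create table my_index_table (
    primary_key_value int primary key,
    order_val         int
);
create index my_index_table_order_val_i on my_index_table (order_val);

create sequence my_index_seq;

-- Keep it in sync on insert; deriving order_val is application-specific
-- (a sequence works if rows only ever arrive in display order).
create or replace function sync_index_table() returns trigger as $$
begin
    insert into my_index_table
        values (new.id, nextval('my_index_seq'));
    return new;
end;
$$ language plpgsql;

create trigger main_table_sync after insert on main_table
    for each row execute procedure sync_index_table();

-- Page 2 (rows 25-49) then becomes a cheap range scan instead of OFFSET 25:
select a.*
  from main_table a
  join my_index_table b on a.id = b.primary_key_value
 where b.order_val >= 25 and b.order_val < 50;
```

The point of the range predicate is that OFFSET must still read and discard all the skipped rows, while the index on order_val jumps straight to the page.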
Re: [PERFORM] What gets cached?
Just to play devil's advocate here for a second: if we have an algorithm that is substantially better than just plain old LRU, which is what I believe the kernel is going to use to cache pages (I'm no kernel hacker), then why don't we apply that and have a significantly larger page cache a la Oracle?

Alex

On 10/21/05, Neil Conway [EMAIL PROTECTED] wrote: On Fri, 2005-21-10 at 07:34 -0500, Martin Nickel wrote: Let's say I do the same thing in Postgres. I'm likely to have my very fastest performance for the first few queries until memory gets filled up. No, you're not: if a query doesn't hit the cache (both the OS cache and the Postgres userspace cache), it will run slower. If the caches are empty when Postgres starts up (which is true for the userspace cache and might be true of the OS cache), the first queries that are run should be slower, not faster. The only time Postgres seems to take advantage of cached data is when I repeat the same (or substantially the same) query. Caching is done on a page-by-page basis -- the source text of the query itself is not relevant. If two different queries happen to hit a similar set of pages, they will probably both benefit from the same set of cached pages. I don't know of any way to view what is actually cached at any point in time, but it seems like most recently used rather than most frequently used. The cache replacement policy in 7.4 and older releases is simple LRU. The policy in 8.0 is ARC (essentially a version of LRU modified to try to retain hot pages more accurately). The policy in 8.1 is a clock-based algorithm. -Neil
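The clock-based replacement Neil mentions for 8.1 can be sketched in a few lines. This is a toy model, not PostgreSQL's actual bufmgr code: each buffer carries a usage count that is bumped on access and decremented as the clock hand sweeps, and a buffer is evicted only once its count hits zero.

```python
class ClockCache:
    """Toy clock-sweep buffer cache (illustrative, not PostgreSQL's bufmgr)."""

    def __init__(self, nbuffers):
        self.pages = [None] * nbuffers   # page id held in each buffer slot
        self.usage = [0] * nbuffers      # usage count per slot
        self.hand = 0                    # clock hand position

    def access(self, page):
        """Touch a page; return True on cache hit, False on miss."""
        if page in self.pages:           # hit: bump the usage count
            self.usage[self.pages.index(page)] += 1
            return True
        # Miss: sweep, decrementing counts, until a zero-usage slot appears.
        while self.usage[self.hand] > 0:
            self.usage[self.hand] -= 1
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page     # evict the cold buffer and load
        self.usage[self.hand] = 1
        self.hand = (self.hand + 1) % len(self.pages)
        return False
```

A frequently touched page keeps a high usage count and survives several sweeps, which is exactly the "retain hot pages" behavior that distinguishes this from plain LRU.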
Re: [PERFORM] Is There Any Way ....
This is possible with Oracle utilizing the keep pool: alter table t_name storage ( buffer_pool keep ); If Postgres were to implement its own caching system, this seems like it would be easy to implement (beyond the initial caching effort).

Alex

On 10/24/05, Craig A. James [EMAIL PROTECTED] wrote: Jim C. Nasby jnasby ( at ) pervasive ( dot ) com wrote: Stefan Weiss wrote: ... IMO it would be useful to have a way to tell PG that some tables were needed frequently, and should be cached if possible. This would allow application developers to consider joins with these tables as cheap, even when querying on columns that are not indexed. Why do you think you'll know better than the database how frequently something is used? At best, your guess will be correct and PostgreSQL (or the kernel) will keep the table in memory. Or, your guess is wrong and you end up wasting memory that could have been used for something else. It would probably be better if you describe why you want to force this table (or tables) into memory, so we can point you at more appropriate solutions. Or perhaps we could explain why we NEED to force these tables into memory, so we can point you at a more appropriate implementation. ;-) Ok, wittiness aside, here's a concrete example. I have an application with one critical index that MUST remain in memory at all times. The index's tablespace is about 2 GB. As long as it's in memory, performance is excellent - a user's query takes a fraction of a second. But if it gets swapped out, the user's query might take up to five minutes as the index is re-read from disk. Now here's the rub. The only performance I care about is response to queries from the web application. Everything else is low priority. But there is other activity going on. Suppose, for example, that I'm updating tables, performing queries, doing administration, etc., etc., for a period of an hour, during which no customer visits the site. Then another customer comes along and performs a query.
At this point, no heuristic in the world could have guessed that I DON'T CARE ABOUT PERFORMANCE for anything except my web application. The performance of all the other stuff, the administration, the updates, etc., is utterly irrelevant compared to the performance of the customer's query. What actually happens is that the other activities have swapped out the critical index, and my customer waits, and waits, and waits... and goes away after a minute or two. To solve this, we've been forced to purchase two computers, and mirror the database on both. All administration and modification happens on the offline database, and the web application only uses the online database. At some point, we swap the two servers, sync the two databases, and carry on. It's a very unsatisfactory solution. There is ONLY one way to convey this sort of information to Postgres, which is to provide the application developer a mechanism to explicitly name the tables that should be locked in memory. Look at tsearchd that Oleg is working on. It's a direct response to this problem. It's been recognized for decades that, as kernel developers (whether a Linux kernel or a database kernel), our ability to predict the behavior of an application is woefully inadequate compared with the application developer's knowledge of the application. Computer Science simply isn't a match for the human brain yet, not even close. To give you perspective, since I posted a question about this problem (regarding tsearch2/GIST indexes), half of the responses I received told me that they had encountered this problem, and their solution was to use an external full-text engine. They all confirmed that Postgres can't deal with this problem yet, primarily for the reasons outlined above.
Craig ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] What gets cached?
Oracle uses an LRU caching algorithm also, not LFU.

Alex

On 10/21/05, Martin Nickel [EMAIL PROTECTED] wrote: I was reading a comment in another posting and it started me thinking about this. Let's say I start up an Oracle server. All my queries are a little bit (sometimes a lot) slow until it gets its normal things in memory, then it's up to speed. The normal things would include some small lookup tables and the indexes for the most frequently used tables. Let's say I do the same thing in Postgres. I'm likely to have my very fastest performance for the first few queries until memory gets filled up. The only time Postgres seems to take advantage of cached data is when I repeat the same (or substantially the same) query. I don't know of any way to view what is actually cached at any point in time, but it seems like most recently used rather than most frequently used. Does this seem true? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Used Memory
[snip] to the second processor in my dual Xeon eServer) has got me to the point that the perpetually high memory usage doesn't affect my application server. I'm curious - how does the high memory usage affect your application server? Alex
Re: [PERFORM] Performance on SUSE w/ reiserfs
Realise also that unless you are running the 1.5 x86-64 build, Java will not use more than 1 gig, and if the app server requests more than 1 gig, Java will die (I've been there) with an out of memory error, even though there is plenty of free mem available. This can easily be caused by a lazy GC thread if the application is running high on CPU usage. The kernel will not report memory used for caching pages as being unavailable; if a program calls malloc, the kernel will just swap out the oldest disk page and give the memory to the application. Your free -mo shows 3 gig free even with cached disk pages. It looks to me more like either a Java problem or a kernel problem...

Alex Turner NetEconomist

On 10/10/05, Jon Brisbin [EMAIL PROTECTED] wrote: Tom Lane wrote: Are you sure it's not cached data pages, rather than cached inodes? If so, the above behavior is *good*. People often have a mistaken notion that having near-zero free RAM means they have a problem. In point of fact, that is the way it is supposed to be (at least on Unix-like systems). This is just a reflection of the kernel doing what it is supposed to do, which is to use all spare RAM for caching recently accessed disk pages. If you're not swapping then you do not have a problem. Except for the fact that my Java app server crashes when all the available memory is being used by caching and not reclaimed :-) If it wasn't for the app server going down, I probably wouldn't care. -- Jon Brisbin Webmaster NPC International, Inc.
Re: [PERFORM] Performance on SUSE w/ reiserfs
Well - to each his own I guess - we did extensive testing on 1.4, and it refused to allocate much past 1 gig on both Linux x86/x86-64 and Windows.

Alex

On 10/11/05, Alan Stange [EMAIL PROTECTED] wrote: Alex Turner wrote: Perhaps this is true for 1.5 on x86-32 (I've only used it on x86-64) but I was more thinking 1.4 which many folks are still using. The 1.4.x JVMs will also work just fine with much more than 1GB of memory. Perhaps you'd like to try again? -- Alan On 10/11/05, *Alan Stange* [EMAIL PROTECTED] wrote: Alex Turner wrote: Realise also that unless you are running the 1.5 x86-64 build, java will not use more than 1Gig, and if the app server requests more than 1gig, Java will die (I've been there) with an out of memory error, even though there is plenty of free mem available. This can easily be caused by a lazy GC thread if the application is running high on CPU usage. On my side of Planet Earth, the standard non-x64 1.5 JVM will happily use more than 1G of memory (on Linux and Solaris, can't speak for Windows). If you're running larger programs, it's probably a good idea to use the -server compiler in the JVM as well. I regularly run with -Xmx1800m and regularly have 1GB heap sizes. The standard GC will not cause an OOM error if space remains for the requested object. The GC thread blocks all other threads during its activity, whatever else is happening on the machine. The newer/experimental GCs did have some potential race conditions, but I believe those have been resolved in the 1.5 JVMs. Finally, note that the latest _05 release of the 1.5 JVM also now supports large page sizes on Linux and Windows: -XX:+UseLargePages; this can be quite beneficial depending on the memory patterns in your programs. -- Alan
Re: [PERFORM] Indexes on ramdisk
Talk about your IO system a bit. There might be obvious ways to improve. What system/motherboard are you using? What controller cards are you using? What kind of disks do you have (SATA, SCSI, 7.2k 10k 15k)? What denominations (9, 18, 36, 72, 143, 80, 160, 200, 240 gig)? What kind of RAIDs do you have set up (how many drives, what stripe sizes, how many used for what)? What levels of RAID are you using (0, 1, 10, 5, 50)?

With a good setup, a dual PCI-X bus motherboard can hit 2GB/sec and thousands of transactions to disk if you have controllers/disks that can keep up. That is typically enough for most people without resorting to SSD.

Alex Turner NetEconomist

On 10/4/05, Emil Briggs [EMAIL PROTECTED] wrote: I have an application that has a table that is both read and write intensive. Data from iostat indicates that the write speed of the system is the factor that is limiting performance. The table has around 20 columns and most of the columns are indexed. The data and the indices for the table are distributed over several mirrored disk partitions and pg_xlog is on another. I'm looking at ways to improve performance and besides the obvious one of getting an SSD I thought about putting the indices on a ramdisk. That means that after a power failure or shutdown I would have to recreate them but that is acceptable for this application. What I am wondering though is whether or not I would see much performance benefit and if there would be any startup problems after a power down event due to the indices not being present. Any insight would be appreciated. Emil
Re: [PERFORM] RAID Stripe size
I have benched different stripe sizes with different file systems, and the performance differences can be quite dramatic. Theoretically a smaller stripe is better for OLTP, as you can write more small transactions independently to more different disks more often than not, but a large stripe size is good for data warehousing, as you are often doing very large sequential reads and a larger stripe size is going to exploit the on-drive cache as you request larger single chunks from the disk at a time. It also seems that different controllers are partial to different defaults that can affect their performance, so I would suggest that testing this on two different controller cards may be less than optimal. I would also recommend looking at the file system. For us JFS worked significantly faster than reiser for large read loads and large write loads, so we chose JFS over ext3 and reiser. I found that lower stripe sizes impacted performance badly, as did overly large stripe sizes.

Alex Turner NetEconomist

On 16 Sep 2005 04:51:43 -0700, bmmbn [EMAIL PROTECTED] wrote: Hi everyone. The machine is an IBM x345 with ServeRAID 6i 128mb cache and 6 SCSI 15k disks. 2 disks are in RAID1 and hold the OS, swap and pg_xlog. 4 disks are in RAID10 and hold the cluster itself. The DB will have two major tables, one with 10 million rows and one with 100 million rows. All the activity against these tables will be SELECT. Currently the stripe size is 8k. I read in many places this is a poor setting. Am I right? ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] RAID Stripe size
I have found JFS to be just fine. We have been running a medium load on this server for 9 months with no unscheduled downtime. The database is about 30 gig on disk, and we get about 3-4 requests per second that generate result sets in the thousands from about 8am to about 11pm. I have found that JFS barfs if you put a million files in a directory and try to do an 'ls', but then so did reiser; only ext3 handled this test successfully. Fortunately with a database this is an atypical situation, so JFS has been fine for the DB for us so far. We have had severe problems with ext3 when file systems hit 100% usage - they get all kinds of unhappy - but we haven't had the same problem with JFS.

Alex Turner NetEconomist

On 9/20/05, Welty, Richard [EMAIL PROTECTED] wrote: Alex Turner wrote: I would also recommend looking at the file system. For us JFS worked significantly faster than reiser for large read loads and large write loads, so we chose JFS over ext3 and reiser. has jfs been reliable for you? there seems to be a lot of conjecture about instability, but i find jfs a potentially attractive alternative for a number of reasons. richard
Re: [PERFORM] Need for speed 2
I have found that while the OS may flush to the controller fast with fsync=true, the controller does as it pleases (it has a BBU, so I'm not too worried), so you get great performance because your controller is determining the read/write sequence outside of what is being demanded by an fsync.

Alex Turner NetEconomist

On 8/25/05, Kelly Burkhart [EMAIL PROTECTED] wrote: On Thu, 2005-08-25 at 11:16 -0400, Ron wrote: # - Settings - fsync = false # turns forced synchronization on or off #wal_sync_method = fsync # the default varies across platforms: # fsync, fdatasync, open_sync, or I hope you have a battery backed write buffer! A battery backed write buffer will do nothing here, because the OS is taking its sweet time flushing to the controller's battery backed write buffer! Isn't the reason for battery backed controller cache to make fsync()s fast? -K
Re: [PERFORM] Performance considerations for very heavy INSERT traffic
Split your system into multiple partitions of RAID 10s. For max performance: a ten drive RAID 10 for pg_xlog (this will max out a PCI-X bus) on bus A, and multiple 4/6 drive RAID 10s for tablespaces on bus B. For max performance I would recommend using one RAID 10 for raw data tables, one for aggregate tables and one for indexes. More RAM will only help you with queries against your data; if you are pre-aggregating, then you may not need all that much RAM. You can easily get 100 transactions per second with even less hardware with a little data partitioning. Choose your controller carefully as many don't co-operate with Linux well.

Alex Turner NetEconomist

On 9/12/05, Brandon Black [EMAIL PROTECTED] wrote: I'm in the process of developing an application which uses PostgreSQL for data storage. Our database traffic is very atypical, and as a result it has been rather challenging to figure out how to best tune PostgreSQL on what development hardware we have, as well as to figure out exactly what we should be evaluating and eventually buying for production hardware. The vast, overwhelming majority of our database traffic is pretty much a non-stop stream of INSERTs filling up tables. It is akin to data acquisition. Several thousand clients are sending once-per-minute updates full of timestamped numerical data at our central server, which in turn performs INSERTs into several distinct tables as part of the transaction for that client. We're talking on the order of ~100 transactions per second, each containing INSERTs to multiple tables (which contain only integer and floating point columns and a timestamp column - the primary key (and only index) is on a unique integer ID for the client and the timestamp). The transaction load is spread evenly over time by having the clients send their per-minute updates at random times rather than on the exact minute mark.
There will of course be users using a web-based GUI to extract data from these tables and display them in graphs and whatnot, but the SELECT query traffic will always be considerably less frequent and intensive than the incessant INSERTs, and it's not that big a deal if the large queries take a little while to run. This data also expires - rows with timestamps older than X days will be DELETEd periodically (once an hour or faster), such that the tables will reach a relatively stable size (pg_autovacuum is handling vacuuming for now, but considering our case, we're thinking of killing pg_autovacuum in favor of having the periodic DELETE process also do a vacuum of affected tables right after the DELETE, and then have it vacuum the other low traffic tables once a day while it's at it). There is an aggregation layer in place which proxies the inbound data from the clients into a small(er) number of persistent postgresql backend processes. Right now we're doing one aggregator per 128 clients (so instead of 128 separate database connections over the course of a minute for a small transaction each, there is a single database backend that is constantly committing transactions at a rate of ~2/second). At a test load of ~1,000 clients, we would have 8 aggregators running and 8 postgresql backends. Testing has seemed to indicate we should aggregate even harder - the planned production load is ~5,000 clients initially, but will grow to almost double that in the not-too-distant future, and that would mean ~40 backends at 128 clients each initially. Even on 8 CPUs, I'm betting 40 concurrent backends doing 2 tps is much worse off than 10 backends doing 8 tps. Test hardware right now is a dual Opteron with 4G of ram, which we've barely gotten 1,000 clients running against.
Current disk hardware in testing is whatever we could scrape together (4x 3ware PCI hardware RAID controllers, with 8 SATA drives in a RAID10 array off of each - aggregated up in a 4-way stripe with the linux md driver and then formatted as ext3 with an appropriate stride parameter and a data= mount option). Production will hopefully be a 4-8-way Opteron, 16 or more G of RAM, and a fibre channel hardware raid array or two (~1TB available RAID10 storage) with 15krpm disks and battery-backed write cache. I know I haven't provided a whole lot of application-level detail here, but does anyone have any general advice on tweaking postgresql to deal with a very heavy load of concurrent and almost exclusively write-only transactions? Increasing shared_buffers seems to always help, even out to half of the dev box's ram (2G). A 100ms commit_delay seemed to help, but tuning it (and commit_siblings) has been difficult. We're using 8.0 with the default 8k blocksize, but are strongly considering both developing against 8.1 (seems it might handle the heavy concurrency better), and re-compiling with 32k blocksize since our storage arrays will inevitably be using fairly wide stripes. Any advice on any of this (other than drop the project while you're still a little bit sane)? --Brandon
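The knobs Brandon mentions can be collected into a postgresql.conf fragment. These values are a hypothetical starting point for a write-heavy 8.0 box with 4GB RAM, not recommendations - every one of them should be benchmarked against the actual workload:

```
# Hypothetical starting values for a heavy-INSERT workload (PostgreSQL 8.0):
shared_buffers      = 25000     # ~200MB at 8k pages; the poster saw gains
                                # all the way out to half of RAM
wal_buffers         = 64        # larger WAL buffers absorb commit bursts
checkpoint_segments = 64        # spread checkpoint I/O out under a
                                # constant insert stream
commit_delay        = 100000    # microseconds; the 100ms tried above
commit_siblings     = 5         # only delay when other backends are
                                # also about to commit
```

commit_delay and commit_siblings together let several concurrent small transactions share one WAL flush, which is exactly the pattern of many aggregator backends committing ~2/second each.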
Re: [PERFORM] sustained update load of 1-2k/sec
I have managed tx speeds that high from postgresql, going even as high as 2500/sec for small tables, but it does require a good RAID controller card (yes, I'm even running with fsync on). I'm using a 3ware 9500S-8MI with Raptor drives in multiple RAID 10s. The box wasn't too $$$ at just around $7k. I have two independent controllers on two independent PCI buses to give max throughput, one with a 6 drive RAID 10 and the other with two 4 drive RAID 10s.

Alex Turner NetEconomist

On 8/19/05, Mark Cotner [EMAIL PROTECTED] wrote: Hi all, I bet you get tired of the same ole questions over and over. I'm currently working on an application that will poll thousands of cable modems per minute and I would like to use PostgreSQL to maintain state between polls of each device. This requires a very heavy amount of updates in place on a reasonably large table (100k-500k rows, ~7 columns, mostly integers/bigint). Each row will be refreshed every 15 minutes, or at least that's how fast I can poll via SNMP. I hope I can tune the DB to keep up. The app is threaded and will likely have well over 100 concurrent db connections. Temp tables for storage aren't a preferred option since this is designed to be a shared-nothing approach and I will likely have several polling processes. Here are some of my assumptions so far . . . HUGE WAL. Vacuum hourly if not more often. I'm getting 1700 tx/sec from MySQL and I would REALLY prefer to use PG. I don't need to match the number, just get close. Is there a global temp table option? In-memory tables would be very beneficial in this case. I could just flush it to disk occasionally with an insert into blah select from memory table. Any help or creative alternatives would be greatly appreciated. :) 'njoy, Mark -- Writing software requires an intelligent person, creating functional art requires an artist. -- Unknown
Re: [PERFORM] Need for speed
Are you calculating aggregates, and if so, how are you doing it? (I ask the question from experience with a similar application where I found that my aggregating PL/pgSQL triggers were bogging the system down, and changed them to scheduled jobs instead.)

Alex Turner NetEconomist

On 8/16/05, Ulrich Wisser [EMAIL PROTECTED] wrote: Hello, one of our services is click counting for online advertising. We do this by importing Apache log files every five minutes. This results in a lot of insert and delete statements. At the same time our customers shall be able to do online reporting. We have a box with Linux Fedora Core 3, Postgres 7.4.2, Intel(R) Pentium(R) 4 CPU 2.40GHz, 2 SCSI 76GB disks (15,000RPM, 2ms). I did put pg_xlog on another file system on other discs. Still when several users are online the reporting gets very slow. Queries can take more than 2 min. I need some ideas how to improve performance by some orders of magnitude. I already thought of a box with the whole database on a ram disc. So really any idea is welcome. Ulrich -- Ulrich Wisser / System Developer RELEVANT TRAFFIC SWEDEN AB, Riddarg 17A, SE-114 57 Sthlm, Sweden Direct (+46)86789755 || Cell (+46)704467893 || Fax (+46)86789769 http://www.relevanttraffic.com
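The trigger-to-scheduled-job switch described above can be sketched in SQL. All names here are hypothetical (the post gives no schema): instead of a per-row trigger updating a totals table on every insert, a job run from cron every five minutes (matching the log-import interval) recomputes the recent buckets in one pass:

```sql
-- Hypothetical schema: clicks(click_time, ad_id) raw rows,
-- click_totals(bucket, ad_id, clicks) the reporting aggregate.
begin;

-- Throw away and rebuild only the recent window, so old buckets
-- are never touched.
delete from click_totals
 where bucket >= date_trunc('hour', now()) - interval '1 hour';

insert into click_totals (bucket, ad_id, clicks)
select date_trunc('hour', click_time), ad_id, count(*)
  from clicks
 where click_time >= date_trunc('hour', now()) - interval '1 hour'
 group by 1, 2;

commit;
```

One set-based GROUP BY every few minutes does the same work as thousands of per-row trigger firings, without serializing every insert behind an update to the same totals row.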
Re: [PERFORM] Cheap RAM disk?
Also it seems pretty silly to put it on a regular SATA connection, when all that can manage is 150MB/sec. If you made it connect directly to 66MHz/64-bit PCI then it could actually _use_ the speed of the RAM, not to mention PCI-X.

Alex Turner NetEconomist

On 7/26/05, John A Meinel [EMAIL PROTECTED] wrote: I saw a review of a relatively inexpensive RAM disk over at anandtech.com, the Gigabyte i-RAM http://www.anandtech.com/storage/showdoc.aspx?i=2480 Basically, it is a PCI card, which takes standard DDR RAM, and has a SATA port on it, so that to the system, it looks like a normal SATA drive. The card costs about $100-150, and you fill it with your own ram, so for a 4GB (max size) disk, it costs around $500. Looking for solid state storage devices, the cheapest I found was around $5k for 2GB. Gigabyte claims that the battery backup can last up to 16h, which seems decent, if not really long (the $5k solution has a built-in harddrive so that if the power goes out, it uses the battery power to copy the ramdisk onto the harddrive for more permanent storage). Anyway, would something like this be reasonable as a drive for storing pg_xlog? With 4GB you could have as many as 256 checkpoint segments. I'm a little leery as it is definitely a version 1.0 product (it is still using an FPGA as the controller, so they were obviously pushing to get the card into production). But it seems like this might be a decent way to improve insert performance, without setting fsync=false. Probably it should see some serious testing (as in power spikes/pulled plugs, etc). I know the article made some claim that if you actually pull out the card it goes into high consumption mode which is somehow greater than if you leave it in the slot with the power off. Which to me seems like a lot of bull, and really means the 16h is only under best-case circumstances. But even 1-2h is sufficient to handle a simple power outage.
And if you had a UPS with detection of power failure, you could always sync the ramdisk to a local partition before the power goes out. Though you could do that with a normal in-memory ramdisk (tmpfs) without having to buy the card. Though it does give you up to an extra 4GB of ram, for machines which have already maxed out their slots. Anyway, I thought I would mention it to the list, to see if anyone else has heard of it, or has any thoughts on the matter. I'm sure there are some people who are using more expensive ram disks; maybe they have some ideas about what this device is missing. (other than costing about 1/10th the price) John =:-
Re: [PERFORM] Cheap RAM disk?
Please see: http://www.newegg.com/Product/Product.asp?Item=N82E16820145309 and http://www.newegg.com/Product/Product.asp?Item=N82E16820145416 The price of registered ECC is not significantly higher than regular ram at this point. Plus if you go with super fast 2-2-2-6 then it's actually more than good ol' 2.5 registered ECC.

Alex Turner NetEconomist

On 7/26/05, PFC [EMAIL PROTECTED] wrote: I'm a little leery as it is definitely a version 1.0 product (it is still using an FPGA as the controller, so they were obviously pushing to get the card into production). Not necessarily. FPGAs have become a sensible choice now. My RME studio soundcard uses a big FPGA. The performance in the test doesn't look that good, though, but don't forget it was run under Windows. For instance they get 77s to copy the Firefox source tree on their Athlon 64/Raptor; my Duron / 7200rpm IDE drive does it in 30 seconds, but not with Windows of course. However it doesn't use ECC so... That's a pity, because they could have implemented ECC in software inside the chip, and have the benefits of error correction with normal, cheap RAM. Well, wait and see...
Re: [PERFORM] Multiple disks: RAID 5 or PG Cluster
Of course these numbers are not true as soon as you exceed the stripe size for a read operation, which is often only 128k. Typically a stripe of mirrors will not read from separate halves of the mirrors either, so RAID 10 is only N/2 best case in my experience; RAID 0+1 is a mirror of stripes and will read from independent halves, but gives worse redundancy.

Alex Turner NetEconomist

On 6/18/05, Jacques Caron [EMAIL PROTECTED] wrote: Hi, At 18:00 18/06/2005, PFC wrote: I don't know what I'm talking about, but wouldn't mirroring be faster than striping for random reads like you often get on a database? (i.e. the reads can be dispatched to any disk) (of course, not for writes, but if you won't use fsync, random writes should be reduced, no?) Roughly, for random reads, the performance (in terms of operations/s) compared to a single disk setup, with N being the number of drives, is:

RAID 0 (striping): read = N, write = N, capacity = N, redundancy = 0
RAID 1 (mirroring, N=2): read = N, write = 1, capacity = 1, redundancy = 1
RAID 5 (striping + parity, N>=3): read = N-1, write = 1/2, capacity = N-1, redundancy = 1
RAID 10 (mirroring + striping, N=2n, N>=4): read = N, write = N/2, capacity = N/2, redundancy = N/2

So depending on your app, i.e. your read/write ratio, how much data can be cached, whether the data is important or not, how much data you have, etc., one or the other option might be better. Jacques.
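Jacques' rough multipliers can be captured in a small function. This is purely the idealized model from his table (random-I/O throughput relative to a single disk), not measured data, and it ignores the stripe-size caveats above:

```python
def raid_profile(level, n):
    """Idealized random-I/O multipliers vs. one disk, per Jacques' table.

    n is the number of drives; returns read/write/capacity multipliers.
    """
    if level == "raid0":                         # striping
        return {"read": n, "write": n, "capacity": n}
    if level == "raid1":                         # mirroring, n == 2
        return {"read": n, "write": 1, "capacity": 1}
    if level == "raid5":                         # striping + parity, n >= 3
        return {"read": n - 1, "write": 0.5, "capacity": n - 1}
    if level == "raid10":                        # mirrored stripes, even n >= 4
        return {"read": n, "write": n / 2, "capacity": n / 2}
    raise ValueError("unknown RAID level: %s" % level)
```

For a 12-disk array this model gives RAID 10 six drives' worth of random write throughput versus half a drive for RAID 5, which is the gap behind the "RAID 5 bites" advice earlier in the thread.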
Re: [PERFORM] Filesystem
We have been using XFS for about 6 months now and it has even tolerated a controller card crash. So far we have mostly good things to report about XFS. I benchmarked raw throughputs at various stripe sizes, and XFS came out on top for us against reiser and ext3. I also used it because of its supposed good support for large files, which was verified somewhat by the benchmarks. I have noticed a problem though - if you have 80 files in a directory, it seems that XFS chokes on simple operations like 'ls' or 'chmod -R ...' where ext3 doesn't; don't know about reiser, I went straight back to default after that problem (that partition is not on a DB server though). Alex Turner netEconomist On 6/3/05, Martin Fandel [EMAIL PROTECTED] wrote: Hi @ all, I have only a little question. Which filesystem is preferred for postgresql? I plan to use xfs (before I used reiserfs). The reason is the xfs_freeze tool to make filesystem snapshots. Is the performance better than reiserfs, is it reliable? best regards, Martin
Re: [PERFORM] Select performance vs. mssql
Until you start worrying about MVCC - we have had problems with the MSSQL implementation of read consistency because of this 'feature'. Alex Turner NetEconomist On 5/24/05, Bruno Wolff III [EMAIL PROTECTED] wrote: On Tue, May 24, 2005 at 08:36:36 -0700, mark durrant [EMAIL PROTECTED] wrote: --MSSQL's ability to hit the index only and not having to go to the table itself results in a _big_ performance/efficiency gain. If someone who's in development wants to pass this along, it would be a nice addition to PostgreSQL sometime in the future. I'd suspect that as well as making one query faster, it would make everything else faster/more scalable as the server load is so much less. This gets brought up a lot. The problem is that the index doesn't include information about whether the current transaction can see the referenced row. Putting this information in the index would add significant overhead to every update, and the opinion of the developers is that this would be a net loss overall.
Re: [PERFORM] Partitioning / Clustering
Having local sessions is unnecessary, and here is my logic: Generally most people have less than 100Mb of bandwidth to the internet. If you make the assertion that you are transferring equal or less session data between your session server (let's say an RDBMS) and the app server than you are between the app server and the client, an out-of-band 100Mb network for session information is plenty of bandwidth. This also represents OLTP-style traffic, which postgresql is pretty good at. You should easily be able to get over 100 TPS. 100 hits per second is an awful lot of traffic, more than any website I've managed will ever see. Why solve the complicated clustered sessions problem when you don't really need to? Alex Turner netEconomist On 5/11/05, PFC [EMAIL PROTECTED] wrote: However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost. But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though. On Google, their distributed system spans a huge number of PCs and it has redundancy, i.e. individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage.
Re: [PERFORM] Partitioning / Clustering
Ok - my common sense alarm is going off here... There are only 6.446 billion people worldwide. 100 billion page views would require every person in the world to view about 16 pages of Yahoo every day. Not very likely. http://www.internetworldstats.com/stats.htm suggests that there are around 1 billion people actually on the internet. That means each and every person on the internet has to view 100 pages per day of Yahoo. Pretty unlikely, IMHO. I for one don't even use Yahoo ;) 100 million page views per day suggests that 1 in 100 people on the internet each viewed 10 pages of a site. That's a pretty high percentage if you ask me. If I visit 20 web sites in a day, and see an average of 10 pages per site, that means only about 2000 or so sites generate 100 million page views in a day or better. 100 million pageviews averages to 1157/sec, which we'll double for peak load to 2314. I can easily see a system doing 2314 hash lookups per second. Hell, I wrote a system that could do a thousand times that four years ago on a single 1GHz Athlon. Heck - you can get 2314 lookups/sec on a 486 ;) Given that session information doesn't _have_ to persist to storage and can be kept in RAM, a single server could readily manage session information for even very large sites (of course, over a million concurrent users could really start chewing into RAM, but if you are Yahoo, you can probably afford a box with 100GB of RAM ;). We get over 1000 tps on a dual Opteron with a couple of mid-size RAID arrays on 10k discs with fsync on for small transactions. I'm sure that could easily be bettered with a few more dollars. Maybe my numbers are off, but somehow it doesn't seem like that many people need a highly complex session solution to me. Alex Turner netEconomist On 5/12/05, Alex Stapleton [EMAIL PROTECTED] wrote: On 12 May 2005, at 15:08, Alex Turner wrote: Having local sessions is unnecessary, and here is my logic: Generally most people have less than 100Mb of bandwidth to the internet.
If you make the assertion that you are transferring equal or less session data between your session server (let's say an RDBMS) and the app server than you are between the app server and the client, an out-of-band 100Mb network for session information is plenty of bandwidth. This also represents OLTP-style traffic, which postgresql is pretty good at. You should easily be able to get over 100 TPS. 100 hits per second is an awful lot of traffic, more than any website I've managed will ever see. Why solve the complicated clustered sessions problem when you don't really need to? 100 hits a second = 8,640,000 hits a day. I work on a site which does 100 million dynamic pages a day. In comparison, Yahoo probably does 100,000,000,000 (100 billion) views a day if I am interpreting Alexa's charts correctly. Which is about 1,150,000 a second. Now considering the site I work on is not even in the top 1000 on Alexa, there's a lot of sites out there which need to solve this problem, I would assume. There are also only so many hash table lookups a single machine can do, even if it's a Quad Opteron behemoth. Alex Turner netEconomist On 5/11/05, PFC [EMAIL PROTECTED] wrote: However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost. But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though. On Google, their distributed system spans a huge number of PCs and it has redundancy, i.e.
individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage.
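The traffic arithmetic traded back and forth in this thread is easy to sanity-check; all of the figures below are taken from the posts above.

```python
# Sanity-check the rates quoted in the thread.
SECONDS_PER_DAY = 24 * 60 * 60                 # 86,400

# "100 hits a second = 8,640,000 hits a day"
print(100 * SECONDS_PER_DAY)                   # 8640000

# "100 million pageviews averages to 1157/sec, which we'll
#  double for peak load to 2314"
avg = 100_000_000 // SECONDS_PER_DAY
print(avg, avg * 2)                            # 1157 2314

# Yahoo at a claimed 100 billion views/day:
print(100_000_000_000 // SECONDS_PER_DAY)      # 1157407 -- "about 1,150,000 a second"
```

So both sides' per-second numbers check out; the disagreement is really about whether 100 billion views/day is a plausible input, not about the division.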
Re: [PERFORM] Index bloat problem?
Is: REINDEX DATABASE blah supposed to rebuild all indices in the database, or must you specify each table individually? (I'm asking because I just tried it and it only did system tables.) Alex Turner netEconomist On 4/21/05, Josh Berkus josh@agliodbs.com wrote: Bill, What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to the normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation? Yes. You'd want to run REINDEX after an event like that. As you should now. -- Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Index bloat problem?
Same thing happens in Oracle: ALTER INDEX blah REBUILD to force a rebuild. It will mark the free blocks as 'free' below the PCTFREE value for the tablespace. Basically, if you build an index with entries, and each entry is 1/4 of a block, the database will write 2500 blocks to the disk. If you delete a random 75% of the index values, you will now have 2500 blocks that have 75% free space. The database will reuse that free space in those blocks as you insert new values, but until then, you still have 2500 blocks worth of data on disk that is only 25% full. Rebuilding the index forces the system to physically re-allocate all that data space, and now you have just 2500 entries that use 625 blocks. I'm not sure that 'blocks' is the correct term in postgres - it's segments in Oracle - but the concept remains the same. Alex Turner netEconomist On 4/21/05, Bill Chandler [EMAIL PROTECTED] wrote: --- Josh Berkus josh@agliodbs.com wrote: Bill, What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to the normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation? Yes. You'd want to run REINDEX after an event like that. As you should now. -- Josh Berkus Aglio Database Solutions San Francisco Thank you. Though I must say, that is very discouraging. REINDEX is a costly operation, timewise, and due to the fact that it locks out other processes from proceeding. Updates are constantly coming in and queries are occurring continuously. A REINDEX could potentially bring the whole thing to a halt. Honestly, this seems like an inordinate amount of babysitting for a production application. I'm not sure if the client will be willing to accept it. Admittedly my knowledge of the inner workings of an RDBMS is limited, but could somebody explain to me why this would be so? If you delete a bunch of rows why doesn't the index get updated at the same time?
Is this a common issue among all RDBMSs or is it something that is PostgreSQL-specific? Is there any way around it? thanks, Bill
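The block arithmetic in Alex's Oracle example can be checked directly. This sketch takes the figures as quoted: entries at 1/4 of a block each filling 2500 blocks, with a random 75% of them then deleted.

```python
import math

blocks = 2500                       # blocks the index initially occupies
entries_per_block = 4               # each entry is 1/4 of a block
entries = blocks * entries_per_block

# After deleting a random 75%, the index still occupies all 2500 blocks,
# each now only ~25% full, until a rebuild repacks the survivors.
surviving = entries // 4
rebuilt = math.ceil(surviving / entries_per_block)
print(surviving, rebuilt)           # 2500 entries packed into 625 blocks
```

That 2500-block vs 625-block gap is exactly the bloat that REINDEX (or Oracle's ALTER INDEX ... REBUILD) reclaims.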
Re: [PERFORM] How to improve db performance with $7K?
I wonder if that's something to think about adding to Postgresql? A setting for a multiblock read count like Oracle (although having said that, I believe that Oracle natively caches pages much more aggressively than postgresql, which allows the OS to do the file caching). Alex Turner netEconomist P.S. Oracle changed this with 9i: you can change the database block size on a tablespace by tablespace basis, making it smaller for OLTP tablespaces and larger for warehousing tablespaces (at least I think it's per tablespace, might be on a whole DB). On 4/19/05, Jim C. Nasby [EMAIL PROTECTED] wrote: On Mon, Apr 18, 2005 at 06:41:37PM -, Mohan, Ross wrote: Don't you think optimal stripe width would be a good question to research the binaries for? I'd think that drives the answer, largely. (uh oh, pun alert) EG, oracle issues IO requests (this may have changed _just_ recently) in 64KB chunks, regardless of what you ask for. So when I did my striping (many moons ago, when the Earth was young...) I did it in 128KB widths, and set the oracle multiblock read count accordingly. For oracle, any stripe size under 64KB=stupid, anything much over 128K/256K=wasteful. I am eager to find out how PG handles all this. AFAIK PostgreSQL requests data one database page at a time (normally 8k). Of course the OS might do something different. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what?
Re: [PERFORM] How to improve db performance with $7K?
Whilst I admire your purist approach, I would say that if it is beneficial to performance for a kernel to understand drive geometry, then it is worth investigating teaching it how to deal with that! I was referring less to the kernel than to the controller. Let's say we invented a new protocol that included the drive telling the controller how it was laid out at initialization time, so that the controller could make better decisions about re-ordering seeks. It would be more cost effective to have that set of electronics just once in the controller than 8 times on each drive in an array, which would yield a better performance to cost ratio. Therefore I would suggest it is something that should be investigated. After all, why implement TCQ on each drive if it can be handled more efficiently at the other end by the controller for less money?! Alex Turner netEconomist On 4/19/05, Dave Held [EMAIL PROTECTED] wrote: -Original Message- From: Alex Turner [mailto:[EMAIL PROTECTED] Sent: Monday, April 18, 2005 5:50 PM To: Bruce Momjian Cc: Kevin Brown; pgsql-performance@postgresql.org Subject: Re: [PERFORM] How to improve db performance with $7K? Does it really matter at which end of the cable the queueing is done (assuming both ends know as much about drive geometry etc.)? [...] The parenthetical is an assumption I'd rather not make. If my performance depends on my kernel knowing how my drive is laid out, I would always be wondering if a new drive is going to break any of the kernel's geometry assumptions. Drive geometry doesn't seem like a kernel's business any more than a kernel should be able to decode the CCD signal of an optical mouse. The kernel should queue requests at a level of abstraction that doesn't depend on intimate knowledge of drive geometry, and the drive should queue requests on the concrete level where geometry matters.
A drive shouldn't guess whether a process is trying to read a file sequentially, and a kernel shouldn't guess whether sector 30 is contiguous with sector 31 or not. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
Re: [PERFORM] How to improve db performance with $7K?
This is fundamentally untrue. A mirror is still a mirror. At most in a RAID 10 you can have two simultaneous seeks. You are always going to be limited by the seek time of your drives. It's a stripe, so you have to read from all members of the stripe to get data, requiring all drives to seek. There is no seek-time advantage to adding more drives. By adding more drives you can increase throughput, but the max throughput of the PCI-X bus isn't that high (I think around 400MB/sec). You can easily get this with a six or seven drive RAID 5, or a ten drive RAID 10. At that point you start having to factor in the cost of a bigger chassis to hold more drives, which can be big bucks. Alex Turner netEconomist On 18 Apr 2005 10:59:05 -0400, Greg Stark [EMAIL PROTECTED] wrote: William Yu [EMAIL PROTECTED] writes: Using the above prices for a fixed budget for RAID-10, you could get: SATA 7200 -- 680MB per $1000 SATA 10K -- 200MB per $1000 SCSI 10K -- 125MB per $1000 What a lot of these analyses miss is that cheaper == faster because cheaper means you can buy more spindles for the same price. I'm assuming you picked equal-sized drives to compare, so that 200MB/$1000 for SATA is almost twice as many spindles as the 125MB/$1000. That means it would have almost double the bandwidth. And the 7200 RPM case would have more than 5x the bandwidth. While 10k RPM drives have lower seek times, and SCSI drives have a natural seek time advantage, under load a RAID array with fewer spindles will start hitting contention sooner, which results in higher latency. If the controller works well, the larger SATA arrays above should be able to maintain their mediocre latency much better under load than the SCSI array with fewer drives would maintain its low latency response time despite its drives' lower average seek time.
-- greg
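Greg's price-per-spindle argument can be restated numerically using the capacity-per-$1000 figures he quotes. Since he assumes equal-sized drives, relative capacity per dollar is proportional to relative spindle count for a fixed budget.

```python
# MB of capacity per $1000, as quoted in the post above.
mb_per_1000 = {"SATA 7200": 680, "SATA 10K": 200, "SCSI 10K": 125}

# Spindles (and hence aggregate bandwidth) relative to SCSI 10K
# for the same fixed budget.
relative = {name: mb / mb_per_1000["SCSI 10K"] for name, mb in mb_per_1000.items()}
print(relative)
# SATA 10K buys 1.6x the spindles of SCSI 10K ("almost twice" in the
# post is a little generous); SATA 7200 buys about 5.4x.
```

Whether those extra spindles translate into real throughput is exactly what Alex's reply above disputes for seek-bound OLTP workloads.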
Re: [PERFORM] How to improve db performance with $7K?
[snip] Adding drives will not let you get lower response times than the average seek time on your drives*. But it will let you reach that response time more often. [snip] I believe your assertion is fundamentally flawed. Adding more drives will not let you reach that response time more often. All drives are required to fill every request in all RAID levels (except possibly 0+1, but that isn't used for enterprise applications). Most requests in OLTP require most of the request time to seek, not to read. Only in single large block data transfers will you get any benefit from adding more drives, which is atypical in most database applications. For most database applications, the only way to increase transactions/sec is to decrease request service time, which is generally achieved with better seek times or a better controller card, or possibly spreading your database across multiple tablespaces on separate partitions. My assertion therefore is that simply adding more drives to an already competent* configuration is about as likely to increase your database effectiveness as swiss cheese is to make your car run faster. Alex Turner netEconomist *Assertion here is that the DBA didn't simply configure all tables and xlog on a single 7200 RPM disk, but has separate physical drives for xlog and tablespace, at least on 10k drives.
Re: [PERFORM] How to improve db performance with $7K?
Not true - the recommended RAID level is RAID 10, not RAID 0+1 (at least I would never recommend 1+0 for anything). RAID 10 and RAID 0+1 are _quite_ different. One gives you very good redundancy, the other is only slightly better than RAID 5, but operates faster in degraded mode (single drive). Alex Turner netEconomist On 4/18/05, John A Meinel [EMAIL PROTECTED] wrote: Alex Turner wrote: [snip] Adding drives will not let you get lower response times than the average seek time on your drives*. But it will let you reach that response time more often. [snip] I believe your assertion is fundamentally flawed. Adding more drives will not let you reach that response time more often. All drives are required to fill every request in all RAID levels (except possibly 0+1, but that isn't used for enterprise applications). Actually 0+1 is the recommended configuration for postgres databases (both for xlog and for the bulk data), because the write speed of RAID 5 is quite poor. Hence your base assumption is not correct, and adding drives *does* help. Most requests in OLTP require most of the request time to seek, not to read. Only in single large block data transfers will you get any benefit from adding more drives, which is atypical in most database applications. For most database applications, the only way to increase transactions/sec is to decrease request service time, which is generally achieved with better seek times or a better controller card, or possibly spreading your database across multiple tablespaces on separate partitions. This is probably true. However, if you are doing lots of concurrent connections, and things are properly spread across multiple spindles (using RAID 0+1, or possibly tablespaces across multiple RAIDs), then each seek occurs on a separate drive, which allows them to occur at the same time rather than sequentially. Having 2 processes competing for seeking on the same drive is going to be worse than having them on separate drives.
John =:-
Re: [PERFORM] How to improve db performance with $7K?
I think the add more disks thing is really from the point of view that one disk isn't enough ever. You should really have at least four drives configured into two RAID 1s. Most DBAs will know this, but most average Joes won't. Alex Turner netEconomist On 4/18/05, Steve Poe [EMAIL PROTECTED] wrote: Alex, In the situation of the animal hospital server I oversee, their application is OLTP. Adding hard drives (6-8) does help performance. Benchmarks like pgbench and OSDB agree with it, but in reality users could not see a noticeable change. However, moving the top 5/10 tables and indexes to their own space made a greater impact. Someone who reads the PostgreSQL 8.0 Performance Checklist is going to see point #1 add more disks is the key. How about adding a subpoint explaining when more disks isn't enough or applicable? I may be generalizing the complexity of tuning an OLTP application, but some clarity could help. Steve Poe
Re: [PERFORM] How to improve db performance with $7K?
Ok - well - I am partially wrong... If your stripe size is 64Kb, and you are reading 256k worth of data, it will be spread across four drives, so you will need to read from four devices to get your 256k of data (RAID 0 or 5 or 10), but if you are only reading 64kb of data, I guess you would only need to read from one disk. So my assertion that adding more drives doesn't help is pretty wrong... particularly with OLTP, because it's always dealing with blocks that are smaller than the stripe size. Alex Turner netEconomist On 4/18/05, Jacques Caron [EMAIL PROTECTED] wrote: Hi, At 18:56 18/04/2005, Alex Turner wrote: All drives are required to fill every request in all RAID levels No, this is definitely wrong. In many cases, most drives don't actually have the data requested, so how could they handle the request? When reading one random sector, only *one* drive out of N is ever used to service any given request, be it RAID 0, 1, 0+1, 1+0 or 5. When writing: - in RAID 0, 1 drive - in RAID 1, RAID 0+1 or 1+0, 2 drives - in RAID 5, you need to read on all drives and write on 2. Otherwise, what would be the point of RAID 0, 0+1 or 1+0? Jacques.
Re: [PERFORM] How to improve db performance with $7K?
So I wonder if one could take this stripe size thing further and say that a larger stripe size is more likely to result in requests getting served parallelized across disks, which would lead to increased performance? Again, thanks to all people on this list, I know that I have learnt a _hell_ of a lot since subscribing. Alex Turner netEconomist On 4/18/05, Alex Turner [EMAIL PROTECTED] wrote: Ok - well - I am partially wrong... If your stripe size is 64Kb, and you are reading 256k worth of data, it will be spread across four drives, so you will need to read from four devices to get your 256k of data (RAID 0 or 5 or 10), but if you are only reading 64kb of data, I guess you would only need to read from one disk. So my assertion that adding more drives doesn't help is pretty wrong... particularly with OLTP, because it's always dealing with blocks that are smaller than the stripe size. Alex Turner netEconomist On 4/18/05, Jacques Caron [EMAIL PROTECTED] wrote: Hi, At 18:56 18/04/2005, Alex Turner wrote: All drives are required to fill every request in all RAID levels No, this is definitely wrong. In many cases, most drives don't actually have the data requested, so how could they handle the request? When reading one random sector, only *one* drive out of N is ever used to service any given request, be it RAID 0, 1, 0+1, 1+0 or 5. When writing: - in RAID 0, 1 drive - in RAID 1, RAID 0+1 or 1+0, 2 drives - in RAID 5, you need to read on all drives and write on 2. Otherwise, what would be the point of RAID 0, 0+1 or 1+0? Jacques.
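The drives-per-request reasoning in these posts can be sketched as a small function. This is a simplification that assumes each request starts on a stripe boundary and ignores RAID 5 parity reads.

```python
import math

def disks_touched(request_kb: int, stripe_kb: int, n_disks: int) -> int:
    """How many member disks a single aligned request spans."""
    return min(math.ceil(request_kb / stripe_kb), n_disks)

print(disks_touched(256, 64, 8))  # 256k over 64k stripes -> 4 disks must seek
print(disks_touched(64, 64, 8))   # a 64k request -> 1 disk
print(disks_touched(8, 64, 8))    # an 8k Postgres page -> 1 disk, leaving 7 free
```

This is why the thread lands where it does: for OLTP's small random reads, a stripe at least as large as the typical request keeps each request on one disk, so the remaining spindles can serve other requests concurrently.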
Re: [PERFORM] How to improve db performance with $7K?
Mistyped... I meant 0+1 in the second instance :( On 4/18/05, Joshua D. Drake [EMAIL PROTECTED] wrote: Alex Turner wrote: Not true - the recommended RAID level is RAID 10, not RAID 0+1 (at least I would never recommend 1+0 for anything). Uhmm, I was under the impression that 1+0 was RAID 10 and that 0+1 is NOT RAID 10. Ref: http://www.acnc.com/raid.html Sincerely, Joshua D. Drake
Re: [PERFORM] How to improve db performance with $7K?
On 4/18/05, Jacques Caron [EMAIL PROTECTED] wrote: Hi, At 20:21 18/04/2005, Alex Turner wrote: So I wonder if one could take this stripe size thing further and say that a larger stripe size is more likely to result in requests getting served parallelized across disks, which would lead to increased performance? Actually, it would be pretty much the opposite. The smaller the stripe size, the more evenly distributed data is, and the more disks can be used to serve requests. If your stripe size is too large, many random accesses within one single file (whose size is smaller than the stripe size/number of disks) may all end up on the same disk, rather than being split across multiple disks (the extreme case being stripe size = total size of all disks, which means concatenation). If all accesses had the same cost (i.e. no seek time, only transfer time), the ideal would be to have a stripe size equal to the number of disks. [snip] Ahh yes - but the critical distinction is this: the smaller the stripe size, the more disks will be used to serve _a_ request - which is bad for OLTP, because you want fewer disks per request so that you can have more requests per second, because the cost is mostly seek. If more than one disk has to seek to serve a single request, you are preventing that disk from serving a second request at the same time. To have more throughput in MB/sec, you want a smaller stripe size so that you have more disks serving a single request, allowing you to multiply by the number of effective drives to get total bandwidth. Because OLTP is made up of small reads and writes to a small number of different files, I would guess that you want those files split up across your RAID, but not so much that a single small read or write operation would traverse more than one disk. That would infer that your optimal stripe size is somewhere on the right side of the bell curve that represents your database read and write block count distribution.
If on average the db writer never flushes less than 1MB to disk at a time, then I guess your best stripe size would be 1MB, but that seems very large to me. So I think therefore that I may be contending the exact opposite of what you are postulating! Alex Turner netEconomist
Re: [PERFORM] How to improve db performance with $7K?
Does it really matter at which end of the cable the queueing is done (assuming both ends know as much about drive geometry etc.)? Alex Turner netEconomist On 4/18/05, Bruce Momjian pgman@candle.pha.pa.us wrote: Kevin Brown wrote: Greg Stark wrote: I think you're being misled by analyzing the write case. Consider the read case. When a user process requests a block and that read makes its way down to the driver level, the driver can't just put it aside and wait until it's convenient. It has to go ahead and issue the read right away. Well, strictly speaking it doesn't *have* to. It could delay for a couple of milliseconds to see if other requests come in, and then issue the read if none do. If there are already other requests being fulfilled, then it'll schedule the request in question just like the rest. The idea with SCSI or any command queuing is that you don't have to wait for another request to come in --- you can send the request as it arrives, then if another shows up, you send that too, and the drive optimizes the grouping at a later time, knowing what the drive is doing, rather than queueing in the kernel. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [PERFORM] Intel SRCS16 SATA raid?
No offense to that review, but it really wasn't that good, and drew bad conclusions from the data. I posted it originally and immediately regretted it. See http://www.tweakers.net/reviews/557/18 Amazingly, the controller with 1Gig cache manages a write throughput of 750MB/sec on a single drive. quote: Floating high above the crowd, the ARC-1120 has a perfect view on the struggles of the other adapters. It's because the adapter has 1Gig of RAM - nothing to do with the RAID architecture, it's clearly caching the entire dataset. The drive can't physically run that fast. These guys really don't know what they are doing. Curiously: http://www.tweakers.net/reviews/557/25 The 3ware does very well as a data drive for MySQL. The size of your cache is going to _directly_ affect RAID 5 performance. Put a gig of memory in a 3ware 9500S and benchmark it against the Areca then. Also - folks don't run data partitions on RAID 5 because the write speed is too low. When you look at the results for RAID 10, the 3ware leads the pack. See also: http://www20.tomshardware.com/storage/20041227/areca-raid6-06.html I trust Tom's Hardware a little more to set up a good review, to be honest. The 3ware trounces the Areca in all the IO/sec tests. Alex Turner netEconomist On 4/15/05, Marinos Yannikos [EMAIL PROTECTED] wrote: Joshua D. Drake wrote: Well, I have never even heard of it. 3ware is the de facto authority of reasonable SATA RAID. no! 3ware was rather early in this business, but there are plenty of (IMHO, and some other people's opinion) better alternatives available. 3ware has good Linux drivers, but the performance of their current controllers isn't that good.
Have a look at this: http://www.tweakers.net/reviews/557/1 especially the sequential writes with RAID-5 on this page: http://www.tweakers.net/reviews/557/19 We have been a long-time user of a 3ware 8506 controller (8 disks, RAID-5) and have purchased 2 Areca ARC-1120 now since we weren't satisfied with the performance and the 2TB per array limit... -mjy
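Alex's "caching the entire dataset" objection is just arithmetic. As a back-of-envelope check (the 60 MB/s platter rate is an assumed figure for a single 2004-era SATA disk, not from the review):

```python
# Back-of-envelope check on the review's headline number: a 2004-era
# SATA disk sustains perhaps 60 MB/s at the platter (assumed figure),
# so a measured 750 MB/s "write throughput" on a single drive can only
# be the controller's 1 GB of RAM absorbing a benchmark working set
# that fits entirely in cache.
platter_mb_s = 60      # assumed sustained media rate for one drive
measured_mb_s = 750    # figure quoted from the tweakers.net review
speedup = measured_mb_s / platter_mb_s
print(f"measured rate is {speedup:.1f}x what the media can physically do")
```

A result more than an order of magnitude above the media rate is measuring the controller's RAM, not the drive — which is the point of the "benchmark it with a gig of memory in the 3ware" challenge.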
Re: [PERFORM] Intel SRCS16 SATA raid?
The original thread was how much can I get for $7k. You can't fit a 15k RPM SCSI solution into $7K ;) Some of us are on a budget! 10k RPM SATA drives give acceptable performance at a good price; that's really the point here. I have never really argued that SATA is going to match SCSI performance on multidrive arrays for IO/sec. But it's all about the benjamins, baby. If I told my boss we need $25k for a database machine, he'd tell me that was impossible and that I have $5k to do it. If I tell him $7k - he will swallow that. We don't _need_ the amazing performance of a 15k RPM drive config. Our biggest hit is reads, so we can buy 3x SATA machines and load balance. It's all about the application, and buying what is appropriate. I don't buy a Corvette if all I need is a Malibu. Alex Turner netEconomist On 4/15/05, Dave Held [EMAIL PROTECTED] wrote: -Original Message- From: Alex Turner [mailto:[EMAIL PROTECTED] Sent: Thursday, April 14, 2005 6:15 PM To: Dave Held Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Intel SRCS16 SATA raid? Looking at the numbers, the raptor with TCQ enabled was close or beat the Atlas III 10k drive on most benchmarks. And I would be willing to bet that the Atlas 10k is not using the same generation of technology as the Raptors. Naturally a 15k drive is going to be faster in many areas, but it is also much more expensive. It was only 44% better on the server tests than the raptor with TCQ, but it costs nearly 300% more ($538 cdw.com, $180 newegg.com). State that in terms of cars. Would you be willing to pay 300% more for a car that is 44% faster than your competitor's? Of course you would, because we all recognize that the cost of speed/performance does not scale linearly. Naturally, you buy the best speed that you can afford, but when it comes to hard drives, the only major feature whose price tends to scale anywhere close to linearly is capacity.
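The budget argument can be put in numbers using the prices quoted in the thread. One nit: ~3x the price ($538 vs $180) is strictly a ~200% premium, though the thread calls it "nearly 300% more"; the performance figures are the thread's, the normalization is mine.

```python
# The budget argument in numbers, using the prices quoted in the
# thread (Atlas 15k at $538 from cdw.com, Raptor at $180 from
# newegg.com) and the cited 44% server-benchmark advantage. Strictly,
# ~3x the price is a ~200% premium, not 300%.
atlas_price, raptor_price = 538.0, 180.0
atlas_perf, raptor_perf = 1.44, 1.00      # 44% faster, normalized
price_premium = atlas_price / raptor_price - 1.0
perf_per_dollar_ratio = (raptor_perf / raptor_price) / (atlas_perf / atlas_price)
print(f"premium {price_premium:.0%}; Raptor delivers "
      f"{perf_per_dollar_ratio:.2f}x the performance per dollar")
```

On these figures the Raptor delivers roughly twice the performance per dollar — which is the whole "benjamins" point, and also why Dave's reply is right that nobody actually expects speed to scale linearly with price.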
Note also that the 15k drive was the only drive that kept up with the raptor on raw transfer speed, which is going to matter for WAL. So get a Raptor for your WAL partition. ;) [...] The Raptor drives can be had for as little as $180/ea, which is quite a good price point considering they can keep up with their SCSI 10k RPM counterparts on almost all tests with NCQ enabled (Note that 3ware controllers _don't_ support NCQ, although they claim their HBA based queueing is 95% as good as NCQ on the drive). Just keep in mind the points made by the Seagate article. You're buying much more than just performance for that $500+. You're also buying vibrational tolerance, high MTBF, better internal environmental controls, and a pretty significant margin on seek time, which is probably your most important feature for disks storing tables. An interesting test would be to stick several drives in a cabinet and graph how performance is affected at the different price points/ technologies/number of drives. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
Re: [PERFORM] Intel SRCS16 SATA raid?
I stand corrected! Maybe I should re-evaluate our own config! Alex T (The Dell PERC controllers do pretty much suck on Linux) On 4/15/05, Vivek Khera [EMAIL PROTECTED] wrote: On Apr 15, 2005, at 11:01 AM, Alex Turner wrote: You can't fit a 15k RPM SCSI solution into $7K ;) Some of us are on a budget! I just bought a pair of Dual Opteron, 4GB RAM, LSI 320-2X RAID dual channel with 8 36GB 15kRPM Seagate drives. Each one of these boxes set me back just over $7k, including onsite warranty. They totally blow away the Dell Dual XEON with external 14 disk RAID (also 15kRPM drives, manufacturer unknown) which also has 4GB RAM and a Dell PERC 3/DC controller, the whole of which set me back over $15k. Vivek Khera, Ph.D. +1-301-869-4449 x806
Re: [PERFORM] Intel SRCS16 SATA raid?
I have read a large chunk of this, and I would highly recommend it to anyone who has been participating in the drive discussions. It is most informative!! Alex Turner netEconomist On 4/14/05, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Greg, I posted this link under a different thread (the $7k server thread). It is a very good read on why SCSI is better for servers than ATA. I didn't note bias, though it is from a drive manufacturer. YMMV. There is an interesting, though dated, appendix on different manufacturers' drive characteristics. http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf Enjoy, Rick [EMAIL PROTECTED] wrote on 04/14/2005 09:54:45 AM: Our vendor is trying to sell us on an Intel SRCS16 SATA raid controller instead of the 3ware one. Poking around it seems this does come with Linux drivers and there is a battery backup option. So it doesn't seem to be completely insane. Anyone have any experience with these controllers? I'm also wondering about whether I'm better off with one of these SATA raid controllers or just going with SCSI drives. -- greg
Re: [PERFORM] Intel SRCS16 SATA raid?
I have put together a little head-to-head performance comparison of a 15k SCSI, 10k SCSI, 10K SATA w/TCQ, 10K SATA wo/TCQ and 7.2K SATA drive at storage review http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=10testbedID=3osID=4raidconfigID=1numDrives=1devID_0=232devID_1=40devID_2=259devID_3=267devID_4=261devID_5=248devCnt=6 It does illustrate some of the weaknesses of SATA drives, but all in all the Raptor drives put on a good show. Alex Turner netEconomist On 4/14/05, Alex Turner [EMAIL PROTECTED] wrote: I have read a large chunk of this, and I would highly recommend it to anyone who has been participating in the drive discussions. It is most informative!! Alex Turner netEconomist
Re: [PERFORM] Intel SRCS16 SATA raid?
Looking at the numbers, the raptor with TCQ enabled was close or beat the Atlas III 10k drive on most benchmarks. Naturally a 15k drive is going to be faster in many areas, but it is also much more expensive. It was only 44% better on the server tests than the raptor with TCQ, but it costs nearly 300% more ($538 cdw.com, $180 newegg.com). Note also that the 15k drive was the only drive that kept up with the raptor on raw transfer speed, which is going to matter for WAL. For those of us on a budget, a quality controller card with lots of RAM is going to be our biggest friend because it can cache writes, and improve performance. The 3ware controllers seem to be universally benchmarked as the best SATA RAID 10 controllers where database performance is concerned. Even the crappy tweakers.net review had the 3ware as the fastest controller for a MySQL data partition in RAID 10. The Raptor drives can be had for as little as $180/ea, which is quite a good price point considering they can keep up with their SCSI 10k RPM counterparts on almost all tests with NCQ enabled (Note that 3ware controllers _don't_ support NCQ, although they claim their HBA based queueing is 95% as good as NCQ on the drive). Alex Turner netEconomist On 4/14/05, Dave Held [EMAIL PROTECTED] wrote: -Original Message- From: Alex Turner [mailto:[EMAIL PROTECTED] Sent: Thursday, April 14, 2005 12:14 PM To: [EMAIL PROTECTED] Cc: Greg Stark; pgsql-performance@postgresql.org; [EMAIL PROTECTED] Subject: Re: [PERFORM] Intel SRCS16 SATA raid? I have put together a little head-to-head performance comparison of a 15k SCSI, 10k SCSI, 10K SATA w/TCQ, 10K SATA wo/TCQ and 7.2K SATA drive at storage review http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=10testbedID=3osID=4raidconfigID=1numDrives=1devID_0=232devID_1=40devID_2=259devID_3=267devID_4=261devID_5=248devCnt=6 It does illustrate some of the weaknesses of SATA drives, but all in all the Raptor drives put on a good show. [...]
I think it's a little misleading that your tests show 0ms seek times for some of the write tests. The environmental test also selects a missing data point as the winner. Besides that, it seems to me that seek time is one of the most important features for a DB server, which means that the SCSI drives are the clear winners and the non-WD SATA drives are the embarrassing losers. Transfer rate is important, but perhaps less so because DBs tend to read/write small blocks rather than large files. On the server suite, which seems to me to be the most relevant for DBs, the Atlas 15k spanks the other drives by a fairly large margin (especially the lesser SATA drives). When you ignore the consumer app benchmarks, I wouldn't be so confident in saying that the Raptors put on a good show. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
Re: [PERFORM] How to improve db performance with $7K?
3ware claim that their 'software' implemented command queueing performs at 95% effectiveness compared to the hardware queueing on a SCSI drive, so I would say that they agree with you. I'm still learning, but as I read it, the bits are split across the platters and there is only 'one' head, but it happens to be reading from multiple platters. The 'further' in linear distance the data is from the current position, the longer it's going to take to get there. This seems to be true based on a document that was circulated. A hard drive takes a considerable amount of time to 'find' a track on the platter compared to the rotational speed, which would agree with the fact that you can read 70MB/sec, but it takes up to 13ms to seek. The ATA protocol is just how the HBA communicates with the drive; there is no reason why the HBA can't reschedule reads and writes just like the SCSI drive would do natively, and this is in fact what 3ware claims. I get the feeling based on my own historical experience that generally drives don't just have a bunch of bad blocks. This all leads me to believe that you can predict with pretty good accuracy how expensive it is to retrieve a given block knowing its linear increment. Alex Turner netEconomist On 4/14/05, Kevin Brown [EMAIL PROTECTED] wrote: Tom Lane wrote: Kevin Brown [EMAIL PROTECTED] writes: I really don't see how this is any different between a system that has tagged queueing to the disks and one that doesn't. The only difference is where the queueing happens. In the case of SCSI, the queueing happens on the disks (or at least on the controller). In the case of SATA, the queueing happens in the kernel. That's basically what it comes down to: SCSI lets the disk drive itself do the low-level I/O scheduling whereas the ATA spec prevents the drive from doing so (unless it cheats, ie, caches writes). Also, in SCSI it's possible for the drive to rearrange reads as well as writes --- which AFAICS is just not possible in ATA.
(Maybe in the newest spec...) The reason this is so much more of a win than it was when ATA was designed is that in modern drives the kernel has very little clue about the physical geometry of the disk. Variable-size tracks, bad-block sparing, and stuff like that make for a very hard-to-predict mapping from linear sector addresses to actual disk locations. Yeah, but it's not clear to me, at least, that this is a first-order consideration. A second-order consideration, sure, I'll grant that. What I mean is that when it comes to scheduling disk activity, knowledge of the specific physical geometry of the disk isn't really important. What's important is whether or not the disk conforms to a certain set of expectations. Namely, that the general organization is such that addressing the blocks in block number order guarantees maximum throughput. Now, bad block remapping destroys that guarantee, but unless you've got a LOT of bad blocks, it shouldn't destroy your performance, right? Combine that with the fact that the drive controller can be much smarter than it was twenty years ago, and you can see that the case for doing I/O scheduling in the kernel and not in the drive is pretty weak. Well, I certainly grant that allowing the controller to do the I/O scheduling is faster than having the kernel do it, as long as it can handle insertion of new requests into the list while it's in the middle of executing a request. The most obvious case is when the head is in motion and the new request can be satisfied by reading from the media between where the head is at the time of the new request and where the head is being moved to. My argument is that a sufficiently smart kernel scheduler *should* yield performance results that are reasonably close to what you can get with that feature. Perhaps not quite as good, but reasonably close. It shouldn't be an orders-of-magnitude type difference. 
-- Kevin Brown [EMAIL PROTECTED]
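The 70 MB/sec vs 13 ms figures from upthread already explain why seek ordering is the whole game. A quick sketch using those numbers (the 8 KB page size is an assumption, matching PostgreSQL's default block size):

```python
# Using the thread's own figures (70 MB/s sequential transfer, up to
# 13 ms per seek) to show why random I/O, not raw bandwidth, is what
# scheduling is fighting over. The 8 KB page size is an assumption
# (PostgreSQL's default block size).
seq_mb_s = 70.0
seek_s = 0.013
page_kb = 8.0
transfer_s = (page_kb / 1024) / seq_mb_s       # ~0.1 ms to move 8 KB
random_reads_s = 1.0 / (seek_s + transfer_s)   # seek utterly dominates
random_mb_s = random_reads_s * page_kb / 1024
print(f"~{random_reads_s:.0f} random page reads/s "
      f"(~{random_mb_s:.2f} MB/s) vs {seq_mb_s:.0f} MB/s sequential")
```

Worst-case random access delivers well under 1% of sequential bandwidth on these figures, which is why either the drive or the kernel reordering requests to shorten seeks matters so much more than bus speed.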
Re: [PERFORM] How to improve db performance with $7K?
Based on the reading I'm doing, and somebody please correct me if I'm wrong, it seems that SCSI drives contain an on-disk controller that has to process the tagged queue. SATA-I doesn't have this. This additional controller is basically an on-board computer that figures out the best order in which to process commands. I believe you are also paying for the increased tolerance that generates a better speed. If you compare an 80Gig 7200RPM IDE drive to a WD Raptor 76G 10k RPM to a Seagate 10k.6 drive to a Seagate Cheetah 15k drive, each one represents a step up in parts and technology, thereby generating a cost increase (at least that's what the manufacturers tell us). I know if you've ever held a 15k drive in your hand, you can notice a considerable weight difference between it and a 7200RPM IDE drive. Alex Turner netEconomist On Apr 7, 2005 11:37 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Another simple question: Why is SCSI more expensive? After the eleventy-millionth controller is made, it seems like SCSI and SATA are using a controller board and a spinning disk. Is somebody still making money by licensing SCSI technology? Rick [EMAIL PROTECTED] wrote on 04/06/2005 11:58:33 PM: You asked for it! ;-) If you want cheap, get SATA. If you want fast under *load* conditions, get SCSI. Everything else at this time is marketing hype, either intentional or learned. Ignoring dollars, expect to see SCSI beat SATA by 40%. * * * What I tell you three times is true * * * Also, compare the warranty you get with any SATA drive with any SCSI drive. Yes, you still have some change leftover to buy more SATA drives when they fail, but... it fundamentally comes down to some actual implementation and not what is printed on the cardboard box. Disk systems are bound by the rules of queueing theory. You can hit the sales rep over the head with your queueing theory book. Ultra320 SCSI is king of the hill for high concurrency databases.
If you're only streaming or serving files, save some money and get a bunch of SATA drives. But if you're reading/writing all over the disk, the simple first-come-first-serve SATA heuristic will hose your performance under load conditions. Next year, they will *try* to bring out some SATA cards that improve on first-come-first-serve, but they ain't here now. There are a lot of rigged performance tests out there... Maybe by the time they fix the queueing problems, Serial Attached SCSI (a/k/a SAS) will be out. Looks like Ultra320 is the end of the line for parallel SCSI, as Ultra640 SCSI (a/k/a SPI-5) is dead in the water. Ultra320 SCSI. Ultra320 SCSI. Ultra320 SCSI. Serial Attached SCSI. Serial Attached SCSI. Serial Attached SCSI. For future trends, see: http://www.incits.org/archive/2003/in031163/in031163.htm douglas p.s. For extra credit, try comparing SATA and SCSI drives when they're 90% full. On Apr 6, 2005, at 8:32 PM, Alex Turner wrote: I guess I'm setting myself up here, and I'm really not being ignorant, but can someone explain exactly how SCSI is supposed to be better than SATA? Both systems use drives with platters. Each drive can physically only read one thing at a time. SATA gives each drive its own channel, but you have to share in SCSI. A SATA controller typically can do 3Gb/sec (384MB/sec) per drive, but SCSI can only do 320MB/sec across the entire array. What am I missing here? Alex Turner netEconomist
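The "hit the sales rep with your queueing theory book" point can be made concrete with the textbook M/M/1 result, mean time in system W = 1/(μ − λ). Disks are not literally M/M/1 queues and the service rate below is an assumption; this only shows the qualitative shape of "fine when idle, hosed under load":

```python
# The "queueing theory" point made concrete with the textbook M/M/1
# result: mean time in system W = 1/(mu - lambda). As offered load
# approaches service capacity, response time explodes -- and anything
# that raises the effective service rate mu (e.g. command reordering)
# pushes that wall out. Disks are not literally M/M/1 queues; this is
# only the qualitative shape, with an assumed service rate.

def mm1_time_in_system(service_rate, arrival_rate):
    assert arrival_rate < service_rate, "queue is unstable"
    return 1.0 / (service_rate - arrival_rate)

mu = 100.0  # assumed: requests/s one drive can service
for lam in (50.0, 80.0, 90.0, 99.0):
    w_ms = mm1_time_in_system(mu, lam) * 1000
    print(f"load {lam / mu:.0%}: ~{w_ms:.0f} ms per request")
```

Response time grows hyperbolically as utilization approaches 100%, which is why a first-come-first-serve SATA drive that services requests even slightly slower than a reordering SCSI drive falls apart disproportionately under concurrent load.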
Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ?
I think everyone was scared off by the 5000 inserts per second number. I've never seen even Oracle do this on a top end Dell system with copious SCSI attached storage. Alex Turner netEconomist On Apr 6, 2005 3:17 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Unfortunately. But we are in the process to choose Postgresql with pgcluster. I'm currently running some tests (performance, stability...) Save the money on the license fees, you get it for your hardware ;-) I still welcome any advice or comments and I'll let you know how the project is going on. Benjamin. Mohan, Ross [EMAIL PROTECTED] 05/04/2005 20:48 Pour : [EMAIL PROTECTED] cc : Objet : RE: [PERFORM] Postgresql vs SQLserver for this application ? You never got answers on this? Apologies, I don't have one, but'd be curious to hear about any you did get thx Ross -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Monday, April 04, 2005 4:02 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Postgresql vs SQLserver for this application ? hi all. We are designing a quite big application that requires a high-performance database backend. The rates we need to obtain are at least 5000 inserts per second and 15 selects per second for one connection. There should only be 3 or 4 simultaneous connections. I think our main concern is to deal with the constant flow of data coming from the inserts that must be available for selection as fast as possible. (kind of real time access ...) As a consequence, the database should rapidly increase up to more than one hundred gigs. We still have to determine how and when we should back up old data to prevent the application from a performance drop. We intend to develop some kind of real-time partitioning on our main table to keep the flows up.
At first, we were planning to use SQL Server as it has features that in my opinion could help us a lot: - replication - clustering Recently we started to study Postgresql as a solution for our project: - it also has replication - Postgis module can handle geographic datatypes (which would facilitate our developments) - We do have a strong knowledge on Postgresql administration (we use it for production processes) - it is free (!) and we could save money for hardware purchase. Is SQL Server clustering a real asset? How reliable are Postgresql replication tools? Should I trust Postgresql performance for this kind of needs? My question is a bit fuzzy but any advice is most welcome... hardware, tuning or design tips as well :)) Thanks a lot. Benjamin.
Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ?
I guess I was thinking more in the range of 5000 transactions/sec, less so 5000 rows on bulk import... Alex On Apr 6, 2005 12:47 PM, Mohan, Ross [EMAIL PROTECTED] wrote: snip good stuff... 31Million tuples were loaded in approx 279 seconds, or approx 112k rows per second. I'd love to see PG get into this range..i am a big fan of PG (just a rank newbie) but I gotta think the underlying code to do this has to be not-too-complex. I'd say we're there. || CLAPPING!! Yes! PG is there, assuredly! So VERY cool! I made a newbie error of conflating COPY with INSERT. I don't know if I could get Oracle to do much more than about 500-1500 rows/sec... PG is quite impressive. Makes one wonder why corporations positively insist on giving Oracle yearly. shrug -Original Message- From: Rod Taylor [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 06, 2005 12:41 PM To: Mohan, Ross Cc: pgsql-performance@postgresql.org Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ? On Wed, 2005-04-06 at 16:12 +, Mohan, Ross wrote: I wish I had a Dell system and run case to show you Alex, but I don't... however... using Oracle's direct path feature, it's pretty straightforward. We've done 110,000 rows per second into index-less tables on a big system (IBM Power5 chips, Hitachi SAN). ( Yes, I am sure: over 100K a second. Sustained for almost 9 minutes. ) Just for kicks I did a local test on a desktop machine (single CPU, single IDE drive) using COPY from STDIN for a set of integers in via a single transaction, no indexes. 1572864 tuples were loaded in 13715.613ms, which is approx 115k rows per second. Okay, no checkpoints and I didn't cross an index boundary, but I also haven't tuned the config file beyond bumping up the buffers. Let's try again with more data this time. 31Million tuples were loaded in approx 279 seconds, or approx 112k rows per second.
I'd love to see PG get into this range..i am a big fan of PG (just a rank newbie) but I gotta think the underlying code to do this has to be not-too-complex. I'd say we're there. -Original Message- From: Alex Turner [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 06, 2005 11:38 AM To: [EMAIL PROTECTED] Cc: pgsql-performance@postgresql.org; Mohan, Ross Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ? I think everyone was scared off by the 5000 inserts per second number. I've never seen even Oracle do this on a top end Dell system with copious SCSI attached storage. Alex Turner netEconomist
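The COPY-vs-INSERT confusion upthread comes down to per-statement round trips: a synchronous single-row INSERT pays a full network/commit round trip per row, while COPY amortizes that over the whole batch. A hedged sketch of the arithmetic — both constants below are assumed for illustration, not measurements from the thread:

```python
# Why single-row INSERTs top out near 1000/s while COPY reaches
# 100k+ rows/s in the numbers upthread: every synchronous statement
# pays a full round trip (network latency + commit). Both constants
# below are assumptions for illustration, not measurements.
round_trip_s = 0.001   # assumed per-statement round-trip/commit cost
per_row_s = 0.000005   # assumed marginal cost of shipping one row

def rows_per_sec(batch_size):
    # One round trip amortized over batch_size rows.
    return batch_size / (round_trip_s + batch_size * per_row_s)

for n in (1, 100, 10000):
    print(f"batch of {n:>5}: ~{rows_per_sec(n):,.0f} rows/s")
```

With a 1 ms round trip you cannot exceed ~1000 single-row transactions/sec no matter how fast the server is, which matches Rod's observed per-process limit; large batches land in the same ballpark as his ~112k rows/sec COPY result.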
Re: [PERFORM] Réf
I think his point was that 900 * 4 != 2400 Alex Turner netEconomist On Apr 6, 2005 2:23 PM, Rod Taylor [EMAIL PROTECTED] wrote: On Wed, 2005-04-06 at 19:42 +0200, Steinar H. Gunderson wrote: On Wed, Apr 06, 2005 at 01:18:29PM -0400, Rod Taylor wrote: Yeah, I think that can be done provided there is more than one worker. My limit seems to be about 1000 transactions per second each with a single insert for a single process (round trip time down the Fibre Channel is large) but running 4 simultaneously only drops throughput to about 900 per process (total of 2400 transactions per second) and the machine still seemed to have lots of oomph to spare. Erm, have I missed something here? 900 * 4 = 2400? Nope. You've not missed anything. If I ran 10 processes, the requirement would be met. --
Re: [PERFORM] How to improve db performance with $7K?
Well - unfortunately software RAID isn't appropriate for everyone, and some of us need a hardware RAID controller. The LSI Megaraid 320-2 card is almost exactly the same price as the 3ware 9500S-12 card (although I will concede that a 320-2 card can handle at most 2x14 devices compared with the 12 on the 9500S). If someone can come up with a test, I will be happy to run it and see how it goes. I would be _very_ interested in the results having just spent $7k on a new DB server!! I have also seen really bad performance out of SATA. It was with either an on-board controller, or a cheap RAID controller from HighPoint. As soon as I put in a decent controller, things went much better. I think it's unfair to base your opinion of SATA on a test that had a poor controller. I know I'm not the only one here running SATA RAID and being very satisfied with the results. Thanks, Alex Turner netEconomist On Apr 6, 2005 4:01 PM, William Yu [EMAIL PROTECTED] wrote: It's the same money if you factor in the 3ware controller. Even without a caching controller, SCSI works good in multi-threaded IO (notwithstanding crappy shit from Dell or Compaq). You can get such cards from LSI for $75. And of course, many server MBs come with LSI controllers built-in. Our older 32-bit production servers all use Linux software RAID w/ SCSI and there's no issues when multiple users/processes hit the DB. *Maybe* a 3ware controller w/ onboard cache + battery backup might do much better for multi-threaded IO than just plain-jane SATA. Unfortunately, I have not been able to find anything online that can confirm or deny this. Hence, the choice is spend $$$ on the 3ware controller and hope it meets your needs -- or spend $$$ on SCSI drives and be sure. Now if you want to run such tests, we'd all be delighted to see the results so we have another option for building servers. Alex Turner wrote: It's hardly the same money, the drives are twice as much.
It's all about the controller, baby, with any kind of drive. A bad SCSI controller will give sucky performance too, believe me. We had a Compaq Smart Array 5304, and its performance was _very_ sub par. If someone has a simple benchmark test database to run, I would be happy to run it on our hardware here. Alex Turner On Apr 6, 2005 3:30 AM, William Yu [EMAIL PROTECTED] wrote: Alex Turner wrote: I'm no drive expert, but it seems to me that our write performance is excellent. I think what most are concerned about is OLTP where you are doing heavy write _and_ heavy read performance at the same time. Our system is mostly read during the day, but we do a full system update every night that is all writes, and it's very fast compared to the smaller SCSI system we moved off of. Nearly a 6x speed improvement, as fast as 900 rows/sec with a 48 byte record, one row per transaction. I've started with SATA in a multi-read/multi-write environment. While it ran pretty good with 1 thread writing, the addition of a 2nd thread (whether reading or writing) would cause exponential slowdowns. I suffered through this for a week and then switched to SCSI. Single threaded performance was pretty similar but with the advanced command queueing SCSI has, I was able to do multiple reads/writes simultaneously with only a small performance hit for each thread. Perhaps having a SATA caching raid controller might help this situation. I don't know. It's pretty hard justifying buying a $$$ 3ware controller just to test it when you could spend the same money on SCSI and have a guarantee it'll work good under multi-IO scenarios.
Re: [PERFORM] How to improve db performance with $7K?
I guess I'm setting myself up here, and I'm really not being ignorant, but can someone explain exactly how SCSI is supposed to be better than SATA? Both systems use drives with platters. Each drive can physically only read one thing at a time. SATA gives each drive its own channel, but you have to share in SCSI. A SATA controller typically can do 3Gb/sec (384MB/sec) per drive, but SCSI can only do 320MB/sec across the entire array. What am I missing here? Alex Turner netEconomist On Apr 6, 2005 5:41 PM, Jim C. Nasby [EMAIL PROTECTED] wrote: Sorry if I'm pointing out the obvious here, but it seems worth mentioning. AFAIK all 3ware controllers are set up so that each SATA drive gets its own SATA bus. My understanding is that by and large, SATA still suffers from a general inability to have multiple outstanding commands on the bus at once, unlike SCSI. Therefore, to get good performance out of SATA you need to have a separate bus for each drive. Theoretically, it shouldn't really matter that it's SATA over ATA, other than I certainly wouldn't want to try and cram 8 ATA cables into a machine... Incidentally, when we were investigating storage options at a previous job we talked to someone who deals with RS/6000 storage. He had a bunch of info about their serial controller protocol (which I can't think of the name of) vs SCSI. SCSI had a lot more overhead, so you could end up saturating even a 160MB SCSI bus with only 2 or 3 drives. People are finally realizing how important bandwidth has become in modern machines. Memory bandwidth is why RS/6000 was (and maybe still is) cleaning Sun's clock, and it's why the Opteron blows Itaniums out of the water. Likewise it's why SCSI is so much better than IDE (unless you just give each drive its own dedicated bandwidth). -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
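Alex's bandwidth arithmetic can be put in a quick sketch. The figures below are the ones quoted in the thread (the 3Gb/sec raw line rate ignores 8b/10b encoding overhead, so usable SATA throughput is actually lower, and the 70MB/sec per-drive number is the single-U320-drive figure cited later in this thread); the point is simply that dedicated links scale with drive count while a shared bus does not:

```python
# Rough model of the thread's claim: each SATA drive gets a dedicated link,
# while U320 SCSI drives share one 320 MB/sec bus. Figures are illustrative.

def sata_aggregate_mb_s(drives, link_gbit_s=3.0):
    """Dedicated link per drive: aggregate scales with drive count."""
    per_drive_mb_s = link_gbit_s * 1000 / 8  # raw line rate, ignoring encoding
    return drives * per_drive_mb_s

def scsi_aggregate_mb_s(drives, bus_mb_s=320.0, per_drive_mb_s=70.0):
    """Shared bus: aggregate is capped by the bus no matter how many drives."""
    return min(bus_mb_s, drives * per_drive_mb_s)

for n in (2, 4, 8):
    print(n, sata_aggregate_mb_s(n), scsi_aggregate_mb_s(n))
```

As Jim's reply argues, though, link bandwidth isn't the real differentiator (per-platter transfer rates are well below either bus figure); command queuing is.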
Re: [PERFORM] How to improve db performance with $7K?
Ok - so I found this fairly good online review of various SATA cards out there, with 3ware not doing too hot on RAID 5, but ok on RAID 10. http://www.tweakers.net/reviews/557/ Very interesting stuff. Alex Turner netEconomist On Apr 6, 2005 7:32 PM, Alex Turner [EMAIL PROTECTED] wrote: [snip - same message quoted in full above]
Re: [PERFORM] How to improve db performance with $7K?
Ok - I take it back - I'm reading through this now, and realising that the reviews are pretty clueless in several places... On Apr 6, 2005 8:12 PM, Alex Turner [EMAIL PROTECTED] wrote: Ok - so I found this fairly good online review of various SATA cards out there, with 3ware not doing too hot on RAID 5, but ok on RAID 10. http://www.tweakers.net/reviews/557/ Very interesting stuff. Alex Turner netEconomist On Apr 6, 2005 7:32 PM, Alex Turner [EMAIL PROTECTED] wrote: [snip - same messages quoted in full above]
Re: [PERFORM] How to improve db performance with $7K?
Yeah - the more reading I'm doing, the more I'm finding out. Allegedly the Western Digital Raptor drives implement a version of ATA-4 Tagged Queuing, which allows reordering of commands. Some controllers support this. The 3ware docs say that the controller supports reordering both on the controller and down at the drive. *shrug* This of course is all supposed to go away with SATA II, which has NCQ, Native Command Queuing. Of course the 3ware controllers don't support SATA II, but a few others do, and I'm sure 3ware will come out with a controller that does. Alex Turner netEconomist On 06 Apr 2005 23:00:54 -0400, Greg Stark [EMAIL PROTECTED] wrote: Alex Turner [EMAIL PROTECTED] writes: SATA gives each drive its own channel, but you have to share in SCSI. A SATA controller can typically do 3Gb/sec (384MB/sec) per drive, but SCSI can only do 320MB/sec across the entire array. SCSI controllers often have separate channels for each device too. In any case the issue with the IDE protocol is that fundamentally you can only have a single command pending. SCSI can have many commands pending. This is especially important for a database like Postgres that may be busy committing one transaction while another is trying to read. Having several commands queued on the drive gives it a chance to execute any that are on the way to the committing transaction. However, I'm under the impression that 3ware has largely solved this problem. Also, if you save a few dollars and can afford one additional drive, that additional drive may improve your array speed enough to overcome that inefficiency. -- greg
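The benefit of the command queuing Greg describes can be illustrated with a toy seek model (the track numbers are made up, and a real drive also reorders for rotational position, not just head travel):

```python
# FIFO vs. reordered servicing of a queue of outstanding requests.
# A drive that can only hold one command services requests in arrival
# order; a queuing drive can sweep across the platter elevator-style.

def total_seek_distance(requests, start=0):
    """Sum of head movements when servicing requests in the given order."""
    pos, travel = start, 0
    for track in requests:
        travel += abs(track - pos)
        pos = track
    return travel

pending = [980, 10, 990, 20, 1000, 30]                # hypothetical tracks
fifo_travel = total_seek_distance(pending)            # one command at a time
queued_travel = total_seek_distance(sorted(pending))  # reordered sweep
print(fifo_travel, queued_travel)                     # 5850 vs 1000
```

The gap grows with queue depth, which is why queuing matters most for a busy database mixing WAL commits with random reads.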
Re: [PERFORM] How to improve db performance with $7K?
To be honest, I've yet to run across a SCSI configuration that can touch the 3ware SATA controllers. I have yet to see one top 80MB/sec, let alone 180MB/sec read or write, which is why we moved _away_ from SCSI. I've seen Compaq, Dell and LSI controllers all do pathetically badly on RAID 1, RAID 5 and RAID 10. 35MB/sec for a three-drive RAID 0 is not bad, it's appalling. The hardware manufacturer should be publicly embarrassed for this kind of speed. A single U320 10k drive can do close to 70MB/sec sustained. If someone can offer benchmarks to the contrary (particularly on Linux), I would be greatly interested. Alex Turner netEconomist On Mar 29, 2005 8:17 AM, Dave Cramer [EMAIL PROTECTED] wrote: Yeah, 35MB per sec is slow for a RAID controller; the 3ware mirrored is about 50MB/sec, and striped is about 100. Dave PFC wrote: With hardware tuning, I am sure we can do better than 35MB per sec. Also WTF? My laptop does 19 MB/s (reading 10 KB files, reiser4)! A recent desktop 7200rpm IDE drive:
# hdparm -t /dev/hdc1
/dev/hdc1: Timing buffered disk reads: 148 MB in 3.02 seconds = 49.01 MB/sec
# ll "DragonBall 001.avi"
-r--r--r-- 1 peufeu users 218M mar 9 20:07 DragonBall 001.avi
# time cat "DragonBall 001.avi" > /dev/null
real 0m4.162s
user 0m0.020s
sys 0m0.510s
(the file was not in the cache) = about 52 MB/s (reiser3.6) So, you have a problem with your hardware... -- Dave Cramer http://www.postgresintl.com 519 939 0336 ICQ#14675561
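PFC's `time cat` measurement translates directly into a small script. This is only a sketch, and like the original test it measures the page cache unless the file is cold:

```python
# Time a sequential read of a file and report MB/sec, roughly what
# "time cat bigfile > /dev/null" measures.
import os
import time

def sequential_read_mb_s(path, chunk_size=1 << 20):
    """Read the file front to back in 1MB chunks and compute MB/sec."""
    size_mb = os.path.getsize(path) / (1 << 20)
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass
    return size_mb / (time.perf_counter() - start)

# e.g. print(sequential_read_mb_s("/tmp/bigfile"))
```

For a real baseline, use a file larger than RAM or drop the cache first; otherwise, as PFC notes, you're benchmarking memory, not the disk.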
Re: [PERFORM] How to improve db performance with $7K?
I'm no drive expert, but it seems to me that our write performance is excellent. I think what most are concerned about is OLTP, where you are doing heavy write _and_ heavy read activity at the same time. Our system is mostly read during the day, but we do a full system update every night that is all writes, and it's very fast compared to the smaller SCSI system we moved off of. Nearly a 6x speed improvement, as fast as 900 rows/sec with a 48 byte record, one row per transaction. I don't know enough about how SATA works to really comment on its performance as a protocol compared with SCSI. If anyone has a useful link on that, it would be greatly appreciated. More drives will give more throughput/sec, but not necessarily more transactions/sec. For that you will need more RAM on the controller, and definitely a BBU to keep your data safe. Alex Turner netEconomist On Apr 4, 2005 10:39 AM, Steve Poe [EMAIL PROTECTED] wrote: Alex Turner wrote: To be honest, I've yet to run across a SCSI configuration that can touch the 3ware SATA controllers. I have yet to see one top 80MB/sec, let alone 180MB/sec read or write, which is why we moved _away_ from SCSI. I've seen Compaq, Dell and LSI controllers all do pathetically badly on RAID 1, RAID 5 and RAID 10. Alex, How does the 3ware controller do on heavy writes back to the database? It may have been Josh, but someone said that SATA does well with reads but not writes. Would not an equal number of SCSI drives outperform SATA? I don't want to start a 'who's better' war, I am just trying to learn here. It would seem the more drives you could place in a RAID configuration, the more performance would increase. Steve Poe
Re: [PERFORM] How to improve db performance with $7K?
I'm doing some research on SATA vs SCSI right now, but to be honest I'm not turning up much at the protocol level. A lot of stupid benchmarks comparing 10k Raptor drives against top-of-the-line 15k drives, where unsurprisingly the SCSI drives win but of course cost 4 times as much. Although even in some, SATA wins, or draws. I'm trying to find something more apples-to-apples: 10k to 10k. Alex Turner netEconomist On Apr 4, 2005 3:23 PM, Vivek Khera [EMAIL PROTECTED] wrote: On Apr 4, 2005, at 3:12 PM, Alex Turner wrote: Our system is mostly read during the day, but we do a full system update every night that is all writes, and it's very fast compared to the smaller SCSI system we moved off of. Nearly a 6x speed improvement, as fast as 900 rows/sec with a 48 byte record, one row per transaction. Well, if you're not heavily multitasking, the advantage of SCSI is lost on you. Vivek Khera, Ph.D. +1-301-869-4449 x806
Re: [PERFORM] Sustained inserts per sec ... ?
Oh - this is with a separate transaction per command. fsync is on. Alex Turner netEconomist On Apr 1, 2005 4:17 PM, Alex Turner [EMAIL PROTECTED] wrote: 1250/sec with record size averaging 26 bytes. 800/sec with record size averaging 48 bytes. 250/sec with record size averaging 618 bytes. Data from pg_stats and our own job monitoring. The system has four partitions: two RAID 1s, a four-disk RAID 10 and a six-disk RAID 10. pg_xlog is on the four-disk RAID 10, the database is on the six-disk RAID 10. Data is very spread out because database turnover time is very high, so our performance is about double this with a fresh DB (the data half-life is probably measurable in days or weeks). Alex Turner netEconomist On Apr 1, 2005 1:06 PM, Marc G. Fournier [EMAIL PROTECTED] wrote: Just curious, but does anyone have an idea of what we are capable of? I realize that size of record would affect things, as well as hardware, but if anyone has some ideas on max, with 'record size', that would be appreciated ... Thanks ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
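The one-row-per-transaction rates above are in the range you'd expect from commit latency rather than bandwidth. A hedged rule of thumb (my framing, not Alex's): with fsync on and no write-back cache, each commit waits for a WAL write to reach the platter, so a single stream commits at most about once per disk revolution:

```python
# Upper bound on single-stream commits/sec when every commit must wait
# for a physical WAL write (no battery-backed cache, no group commit).

def max_commits_per_sec(rpm, commits_per_revolution=1):
    """One synchronous WAL flush per commit, at most one per revolution."""
    return rpm / 60.0 * commits_per_revolution

print(max_commits_per_sec(7200))   # 120.0
print(max_commits_per_sec(10000))  # ~166.7
print(max_commits_per_sec(15000))  # 250.0
```

Alex's 800-1250/sec figures sit well above any single-spindle bound, consistent with the dedicated pg_xlog RAID 10 plus controller cache absorbing the flushes.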
Re: [PERFORM] Postgres on RAID5
A 14-drive stripe will max out the PCI bus long before anything else; the only reason for a stripe this size is to get the total accessible size up. A 6-drive RAID 10 on a good controller can get up to 400MB/sec, which is pushing the limit of the PCI bus (taken from official 3ware 9500S-8MI benchmarks). 14 drives is not going to beat 6 drives because you've run out of bandwidth on the PCI bus. The debate on RAID 5 rages onward. The benchmarks I've seen suggest that RAID 5 is consistently slower than RAID 10 with the same number of drives, but others suggest that RAID 5 can be much faster than RAID 10 (see arstechnica.com). (Theoretical performance of RAID 5 is in line with a RAID 0 stripe of N-1 drives; RAID 10 has only N/2 drives in a stripe, so RAID 5 performance should be nearly double - in theory, of course.) 35 trans/sec is pretty slow, particularly if they are only one row at a time. I typically get 200-400/sec on our DB server on a bad day, up to 1100 on a fresh database. I suggested running a bonnie benchmark, or some other IO perf test, to determine if it's the array itself performing badly, or if there is something wrong with PostgreSQL. If the array isn't kicking out at least 50MB/sec read/write performance, something is wrong. Until you've isolated the problem to either Postgres or the array, everything else is simply speculation. In a perfect world, you would have two 6-drive RAID 10s on two PCI busses, with system tables on a third partition, and archive logging on a fourth. Unsurprisingly this looks a lot like the Oracle recommended minimum config. Also of note is that this is _software_ RAID... Alex Turner netEconomist On 13 Mar 2005 23:36:13 -0500, Greg Stark [EMAIL PROTECTED] wrote: Arshavir Grigorian [EMAIL PROTECTED] writes: Hi, I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has an Ext3 filesystem which is used by Postgres. People are going to suggest moving to RAID1+0. 
I'm unconvinced that RAID5 across 14 drives shouldn't be able to keep up with RAID1 across 7 drives, though. It would be interesting to see empirical data. One thing that does scare me is the Postgres transaction log and the ext3 journal both sharing these disks with the data. Ideally both of these things should get (mirrored) disks of their own, separate from the data files. But 2-3s pauses seem disturbing. I wonder whether ext3 is issuing a cache flush on every fsync to get the journal pushed out. This is a new Linux feature that's necessary with IDE but shouldn't be necessary with SCSI. It would be interesting to know whether Postgres performs differently with fsync=off. This would even be a reasonable mode to run under for initial database loads. It shouldn't make much of a difference with hardware like this, though. And you should be aware that running in this mode in production would put your data at risk. -- greg
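Alex's parenthetical theory (RAID 5 streams like a stripe of N-1 drives, RAID 10 like a stripe of N/2) is easy to put in numbers. The 70MB/sec per-drive figure is the single-drive number quoted earlier in this thread and is purely illustrative; real RAID 5 write throughput usually lands far below this because of the parity read-modify-write penalty, which this model ignores:

```python
# Theoretical large-sequential throughput for N drives, ignoring bus
# limits, parity computation, and RAID 5 write penalties.

def raid5_stream_mb_s(n_drives, per_drive_mb_s=70.0):
    """RAID 5: data striped across N-1 drives (one drive's worth is parity)."""
    return (n_drives - 1) * per_drive_mb_s

def raid10_stream_mb_s(n_drives, per_drive_mb_s=70.0):
    """RAID 10: mirrored pairs, so only N/2 drives' worth of stripe width."""
    return (n_drives // 2) * per_drive_mb_s

print(raid5_stream_mb_s(14), raid10_stream_mb_s(14))  # 910.0 490.0
```

Both 14-drive figures blow past the ~400MB/sec bus ceiling Alex cites, which is his point: past roughly 6 drives, the PCI bus, not the spindles, is the bottleneck.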
Re: [PERFORM] Postgres on RAID5
He doesn't have a RAID controller, it's software RAID... Alex Turner netEconomist On Mon, 14 Mar 2005 16:18:00 -0500, Merlin Moncure [EMAIL PROTECTED] wrote: Alex Turner wrote: 35 trans/sec is pretty slow, particularly if they are only one row at a time. I typically get 200-400/sec on our DB server on a bad day, up to 1100 on a fresh database. Well, don't rule out that his RAID controller is not caching his writes. His WAL sync method may be overriding his RAID cache policy and flushing his writes to disk, always. Win32 has the same problem, and before Magnus's O_DIRECT patch, there was no way to easily work around it without turning fsync off. I'd suggest playing with different WAL sync methods before trying anything else. Merlin
Re: [PERFORM] Postgres on RAID5
I would recommend running a bonnie++ benchmark on your array to see whether it's the array/controller/RAID being crap, or whether it's Postgres. I have had some very surprising results from arrays that theoretically should be fast, but turned out to be very slow. I would also seriously have to recommend against a 14-drive RAID 5! This is statistically as likely to fail as a 7-drive RAID 0 (not counting the spare, but rebuilding onto a spare is very hard on the existing drives). Alex Turner netEconomist On Fri, 11 Mar 2005 16:13:05 -0500, Arshavir Grigorian [EMAIL PROTECTED] wrote: Hi, I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has an Ext3 filesystem which is used by Postgres. Currently we are loading a 50G database on this server from a Postgres dump (copy, not insert) and are experiencing very slow write performance (35 records per second). Top shows that the Postgres process (postmaster) is being constantly put into D state for extended periods of time (2-3 seconds), which I assume is because it's waiting for disk IO. I have just started gathering system statistics and here is what sar -b shows (this is while the db is being loaded - pg_restore):

             tps     rtps    wtps    bread/s  bwrtn/s
01:35:01 PM  275.77  76.12   199.66  709.59   2315.23
01:45:01 PM  287.25  75.56   211.69  706.52   2413.06
01:55:01 PM  281.73  76.35   205.37  711.84   2389.86

[snip]
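The reliability warning can be sketched with a simple independent-failure model. The 3% annual failure rate below is a made-up illustrative figure, and the model deliberately ignores the rebuild-stress and correlated-failure effects Alex alludes to, which are exactly what makes wide RAID 5 sets risky in practice:

```python
# P(at least one of n drives fails within a year), assuming independent
# failures with a per-drive annual failure rate (AFR). Illustrative only.

def p_any_failure(n_drives, afr=0.03):
    return 1 - (1 - afr) ** n_drives

# A 14-drive set sees a failure event nearly twice as often as a 7-drive set.
# For RAID 0 that first failure is data loss outright; for RAID 5 it starts
# a rebuild during which the array runs with no redundancy, under the heavy
# read load that rebuilds put on every surviving drive.
print(p_any_failure(7))   # ~0.19
print(p_any_failure(14))  # ~0.35
```
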
Re: [PERFORM] multi billion row tables: possible or insane?
Not true - with fsync on I get nearly 500 tx/s; with it off I'm as high as 1600/sec, with dual Opterons, 14 SATA drives and 4GB RAM on a 3ware Escalade. The database has 3 million rows. As long as queries use indexes, multi-billion rows shouldn't be too bad. Full table scans will suck, though. Alex Turner netEconomist On Tue, 1 Mar 2005 16:40:29 +0100, Vig, Sandor (G/FI-2) [EMAIL PROTECTED] wrote: 385 transactions/sec? fsync = false - risky but fast. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of John Arbash Meinel Sent: Tuesday, March 01, 2005 4:19 PM To: Ramon Bastiaans Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] multi billion row tables: possible or insane? Ramon Bastiaans wrote: Hi all, I am doing research for a project of mine where I need to store several billion values for a monitoring and historical tracking system for a big computer system. My current estimate is that I have to store (somehow) around 1 billion values each month (possibly more). If you have that 1 billion perfectly distributed over all hours of the day, then you need 1e9/30/24/3600 = 385 transactions per second. Which I'm pretty sure is possible with postgres, you just need pretty beefy hardware. And like Jeff said, lots of disks for lots of IO. Like a quad Opteron with 16GB of RAM and around 14-20 very fast disks. RAID 10, not RAID 5, etc. To improve query performance, you can do some load balancing by adding replica machines using Slony. Or if you can do batch processing, you could split up the work into a few update machines, which then do bulk updates on the master database. This lets you get more machines into the job, since you can't share a database across multiple machines. I was wondering if anyone has had any experience with these kind of big numbers of data in a postgres sql database and how this affects database design and optimization. 
Well, one of the biggest things is whether you can get bulk updates, or whether clients can handle data being slightly out of date, so you can use caching. Can you segregate your data into separate tables as much as possible? Are your clients okay if aggregate information takes a little while to update? One trick is to use semi-lazy materialized views to get your updates to be fast. What would be important issues when setting up a database this big, and is it at all doable? Or would it be insane to think about storing up to 5-10 billion rows in a postgres database. I think if you can design the db properly, it is doable. But if you have clients saying I need up-to-the-second information on 1 billion rows, you're never going to get it. The database's performance is important. There would be no use in storing the data if a query will take ages. Queries should be quite fast if possible. Again, it depends on the queries being done. There are some nice tricks you can use, like doing month-by-month partitioning (if you are getting 1G inserts a month, you might want week-by-week partitioning), and then with a date column index and a UNION ALL view you should be able to get pretty good insert speed and still keep fast *recent* queries. Going through 1 billion rows is always going to be expensive. I would really like to hear people's thoughts/suggestions or go see a shrink, you must be mad statements ;) Kind regards, Ramon Bastiaans I think it would be possible, but there are a lot of design issues with a system like this. You can't go into it thinking that you can design a multi-billion-row database the same way you would design a million-row db. John =:-
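John's sizing arithmetic checks out (he truncates rather than rounds):

```python
# 1 billion rows/month, spread perfectly evenly over a 30-day month.
rows_per_month = 1_000_000_000
seconds_per_month = 30 * 24 * 3600  # 2,592,000
tx_per_sec = rows_per_month / seconds_per_month
print(tx_per_sec)  # ~385.8, the "385 transactions per second" in the thread
```

As the thread notes, that is an average; real monitoring loads are bursty, so the peak rate the hardware must sustain will be higher.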