Re: [PERFORM] best arrangement of 3 disks for (insert) performance

2003-09-12 Thread Matt Clark
> the machine will be dealing with lots of inserts, basically as many as we can
> throw at it

If you mean lots of _transactions_ with few inserts per transaction, you should get a
RAID controller w/ battery-backed write-back cache.  Nothing else will improve your
write performance by nearly as much.  You could sell the RAM and one of the CPUs to
pay for it ;-)

If you have lots of inserts but all in a few transactions then it's not quite so 
critical.
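
A back-of-the-envelope sketch of why the transaction count matters. It assumes (a simplification, not a measurement) that without a write-back cache each commit waits for roughly one platter rotation before its WAL record is on disk. The function name and the rows-per-transaction figures are illustrative only.

```python
def max_rows_per_second(rpm, rows_per_transaction):
    """Rough upper bound on insert rate for a single uncached disk.

    Assumes one commit per platter rotation, which is a simplification:
    real throughput also depends on the filesystem, the controller and
    how many backends commit concurrently.
    """
    commits_per_second = rpm / 60  # rotations per second
    return commits_per_second * rows_per_transaction


for rows in (1, 100, 10000):
    print(rows, "rows/txn ->", max_rows_per_second(15000, rows), "rows/s at most")
```

Under that assumption a 15k rpm disk tops out near 250 commits per second, so the row rate scales almost directly with how many inserts share a transaction.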

M



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[PERFORM] software vs hw hard on linux

2003-09-12 Thread Jeff
Due to various third party issues, and the fact PG rules, we're planning
on migrating our deplorable informix db to PG.  It is a rather large DB
with a rather high amount of activity (mostly updates).  So I'm going to
be acquiring a dual (or quad if they'll give me money) box. (In my testing
my glorious P2 with a 2 spindle raid0 is able to handle it fairly well)

What I'm wondering about is what folks experience with software raid vs
hardware raid on linux is.  A friend of mine ran a set of benchmarks at
work and found sw raid was running obscenely faster than the mylex and
(some other brand that isn't 3ware) raids..

On the pro-hw side you have ones with battery backed cache, chances are
they are less likely to fail..

On the pro-sw side you have lots of speed and less cost (unfortunately,
there is a pathetic budget so spending $15k on a raid card is out of the
question really).

any thoughts?

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] best arrangement of 3 disks for (insert) performance

2003-09-12 Thread Richard Jones
Hi all,
I have some new hardware on the way and would like some advice on how to get 
the most out of it..

It's a dual Xeon 2.4, 4 GB RAM and 3x identical 15k rpm SCSI disks.

Should I mirror 2 of the disks for postgres data, and use the 3rd disk for the
o/s and the pg logs, or raid5 the 3 disks, or even stripe 2 disks for pg and
use the 3rd for o/s, logs, backups?

the machine will be dealing with lots of inserts, basically as many as we can 
throw at it

thanks,
Richard

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] [GENERAL] how to get accurate values in pg_statistic

2003-09-12 Thread scott.marlowe
On Thu, 11 Sep 2003, Christopher Browne wrote:

> [EMAIL PROTECTED] ("scott.marlowe") writes:
> > On Thu, 11 Sep 2003, Tom Lane wrote:
> >
> >> Christopher Browne <[EMAIL PROTECTED]> writes:
> >> > The "right answer" for most use seems likely to involve:
> >> >  a) Getting an appropriate number of bins (I suspect 10 is a bit
> >> > small, but I can't justify that mathematically), and
> >> 
> >> I suspect that also, but I don't have real evidence for it either.
> >> We've heard complaints from a number of people for whom it was indeed
> >> too small ... but that doesn't prove it's not appropriate in the
> >> majority of cases ...
> >> 
> >> > Does the sample size change if you increase the number of bins?
> >> 
> >> Yes, read the comments in backend/commands/analyze.c.
> >> 
> >> > Do we also need a parameter to control sample size?
> >> 
> >> Not if the paper I read before writing that code is correct.
> >
> > I was just talking to a friend of mine who does statistical analysis, and 
> > he suggested a different way of looking at this.  I know little of the 
> > analyze.c, but I'll be reading it some today.
> >
> > His theory was that we can figure out the number of target bins by 
> > basically running analyze twice with two different random seeds, and 
> > initially setting the bins to 10.
> >
> > Then, compare the variance of the two runs.  If the variance is great, 
> > increase the target by X, and run two again.  repeat, wash, rinse, until 
> > the variance drops below some threshold.
> >
> > I like the idea, I'm not at all sure if it's practical for Postgresql to 
> > implement it.
> 
> It may suffice to do some analytic runs on some "reasonable datasets"
> in order to come up with a better default than 10.
> 
> If you run this process a few times on some different databases and
> find that the variance keeps dropping pretty quickly, then that would
> be good material for arguing that 10 should change to 17 or 23 or 31
> or some such value.  (The only interesting pattern in that is that
> those are all primes :-).)

That's a good intermediate solution, but it really doesn't solve 
everyone's issue.  If one table/field has a nice even distribution (i.e. 
10 rows with id 1, 10 rows with id 2, and so on) then it won't need 
nearly as high a default target as a field with lots of weird spikes and 
such in it.  

That's why Joe (my statistics friend) made the point about iterating over 
each table with higher targets until the variance drops to something 
reasonable.

I would imagine a simple script would be a good proof of concept of this, 
but in the long run, it would be a huge win if the analyze.c code did this 
automagically eventually, so that you don't have a target that's still too 
low for some complex data sets and too high for simple ones.
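
A minimal Python sketch of the iteration described above, as a proof of concept only. It is not how analyze.c works: the equi-depth histogram, the `disagreement` measure (standing in for "variance between two runs"), the 300-rows-per-bin sample size, and every function name, step and threshold here are assumptions chosen for illustration.

```python
import random
import statistics


def histogram_bounds(values, bins, seed, rows_per_bin=300):
    """Equi-depth histogram boundaries estimated from a random sample."""
    rng = random.Random(seed)
    sample_size = min(len(values), rows_per_bin * bins)
    sample = sorted(rng.sample(values, sample_size))
    # bins + 1 boundaries, evenly spaced through the sorted sample
    return [sample[(len(sample) - 1) * i // bins] for i in range(bins + 1)]


def disagreement(a, b):
    """Mean absolute difference between two boundary lists, scaled by the data range."""
    span = max(a[-1], b[-1]) - min(a[0], b[0]) or 1
    return statistics.fmean([abs(x - y) for x, y in zip(a, b)]) / span


def pick_target(values, start=10, step=10, threshold=0.01, max_target=1000):
    """Raise the bin count until two differently seeded runs agree closely enough."""
    target = start
    while target < max_target:
        first = histogram_bounds(values, target, seed=1)
        second = histogram_bounds(values, target, seed=2)
        if disagreement(first, second) <= threshold:
            break
        target += step
    return target


# Synthetic skewed column, purely for demonstration.
rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(50000)]
print("suggested target:", pick_target(skewed))
```

A perfectly even or constant column settles at the starting target straight away, while a spiky one keeps climbing until the two samples agree or the cap is reached.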

Well, time for me to get to work on a proof of concept...




Re: [PERFORM] best arrangement of 3 disks for (insert) performance

2003-09-12 Thread Christopher Browne
[EMAIL PROTECTED] (Richard Jones) writes:
> I have some new hardware on the way and would like some advice on
> how to get the most out of it..
>
> its a dual xeon 2.4,  4gb ram and 3x identical 15k rpm scsi disks
>
> should i mirror 2 of the disks for postgres data, and use the 3rd
> disk for the o/s and the pg logs or raid5 the 3 disks or even stripe
> 2 disks for pg and use the 3rd for o/s,logs,backups ?
>
> the machine will be dealing with lots of inserts, basically as many
> as we can throw at it

Having WAL on a separate drive from the database would be something of
a win.  I'd buy that 1 disk for OS+WAL and then RAID [something]
across the other two drives for the database would be pretty helpful.

After doing some [loose] benchmarking, the VERY best way to improve
performance would involve a RAID controller with battery-backed cache.

On a box with similar configuration to yours, it took ~3h for a
particular set of data to load; on another one with battery-backed
cache (and a dozen fast SCSI drives :-)), the same data took as little
as 6 minutes to load.  The BIG effect seemed to come from the
controller.
-- 
(reverse (concatenate 'string "ofni.smrytrebil" "@" "enworbbc"))

Christopher Browne
(416) 646 3304 x124 (land)

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] best arrangement of 3 disks for (insert) performance

2003-09-12 Thread Richard Jones
The machine is coming from Dell, and I have the option of a 
PERC 3/SC RAID Controller (32MB)
or software RAID.

Does anyone have any experience of this controller? 
It's an additional £345 for this controller, I'd be interested to know what 
people think - my other option is to buy the RAID controller separately, 
which appeals to me but I wouldn't know what to look for in a RAID controller.

that raid controller review site sounds like a good idea :)

Richard.

On Friday 12 September 2003 4:24 pm, Christopher Browne wrote:
> [EMAIL PROTECTED] (Richard Jones) writes:
> > I have some new hardware on the way and would like some advice on
> > how to get the most out of it..
> >
> > its a dual xeon 2.4,  4gb ram and 3x identical 15k rpm scsi disks
> >
> > should i mirror 2 of the disks for postgres data, and use the 3rd
> > disk for the o/s and the pg logs or raid5 the 3 disks or even stripe
> > 2 disks for pg and use the 3rd for o/s,logs,backups ?
> >
> > the machine will be dealing with lots of inserts, basically as many
> > as we can throw at it
>
> Having WAL on a separate drive from the database would be something of
> a win.  I'd buy that 1 disk for OS+WAL and then RAID [something]
> across the other two drives for the database would be pretty helpful.
>
> After doing some [loose] benchmarking, the VERY best way to improve
> performance would involve a RAID controller with battery-backed cache.
>
> On a box with similar configuration to yours, it took ~3h for a
> particular set of data to load; on another one with battery-backed
> cache (and a dozen fast SCSI drives :-)), the same data took as little
> as 6 minutes to load.  The BIG effect seemed to come from the
> controller.




Re: [PERFORM] software vs hw hard on linux

2003-09-12 Thread Josh Berkus
Jeff,

> What I'm wondering about is what folks experience with software raid vs
> hardware raid on linux is.  A friend of mine ran a set of benchmarks at
> work and found sw raid was running obscenely faster than the mylex and
> (some other brand that isn't 3ware) raids..

Our company has stopped recommending hardware raid for all low-to-medium end 
systems.   Our experience is that Linux SW RAID does as good a job as any 
$700 to $1000 RAID card, and has the advantage of not having lots of driver 
issues (for example, we still have one system running Linux 2.2.19 because 
the Mylex driver maintainer passed away in early 2002).

The exception to this is if you are expecting to frequently max out your CPU 
and/or RAM with your application, in which case the SW RAID might not be so 
good because you would get query-vs.-RAID CPU contention.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] best arrangement of 3 disks for (insert) performance - Dell

2003-09-12 Thread Thom Dyson

The Dell PERC controllers have a very strong reputation for terrible
performance.  If you search the archives of the Dell Linux Power Edge list
(dell.com/linux), you will find many, many people who get better
performance from software RAID, rather than the hw RAID on the PERC.
Having said that, the 3/SC might be one of the better PERC controllers.  I
would spend an hour or two and benchmark hw vs. sw before I committed to
either one.

Thom Dyson
Director of Information Services
Sybex, Inc.

On 9/12/2003 9:55:40 AM, Richard Jones <[EMAIL PROTECTED]> wrote:
> The machine is coming from dell, and i have the option of a
> PERC 3/SC RAID Controller (32MB)
> or software raid.
>
> does anyone have any experience of this controller?
> its an additional £345 for this controller, i'd be interested to know what
> people think - my other option is to buy the raid controller separately,
> which appeals to me but i wouldnt know what to look for in a raid
> controller.
>
> that raid controller review site sounds like a good idea :)
>
> Richard.





Re: [PERFORM] best arrangement of 3 disks for (insert) performance

2003-09-12 Thread Will LaShell
I would like to point out, though, that on the PERC controllers that are LSI
based (MegaRAID) there -are- settings that can be changed to fix any
of the performance issues. Check the linux megaraid driver list archives
to see the full description. I've seen it come up many times and
basically all the problems have turned out to be resolved.

Will


On Fri, 2003-09-12 at 10:03, Thom Dyson wrote:
> 
> The Dell PERC controllers have a very strong reputation for terrible
> performance.  If you search the archives of the Dell Linux Power Edge list
> (dell.com/linux), you will find many, many people who get better
> performance from software RAID, rather than the hw RAID on the PERC.
> Having said that, the 3/SC might be one of the better PERC controllers.  I
> would spend an hour or two and benchmark hw vs. sw before I committed to
> either one.
> 
> Thom Dyson
> Director of Information Services
> Sybex, Inc.
> 
> On 9/12/2003 9:55:40 AM, Richard Jones <[EMAIL PROTECTED]> wrote:
> > The machine is coming from dell, and i have the option of a
> > PERC 3/SC RAID Controller (32MB)
> > or software raid.
> >
> > does anyone have any experience of this controller?
> > its an additional £345 for this controller, i'd be interested to know what
> > people think - my other option is to buy the raid controller separately,
> > which appeals to me but i wouldnt know what to look for in a raid
> > controller.
> >
> > that raid controller review site sounds like a good idea :)
> >
> > Richard.





Re: [PERFORM] software vs hw hard on linux

2003-09-12 Thread Vivek Khera
> "a" == aturner  <[EMAIL PROTECTED]> writes:

a> you need a good size cache too.  If you don't have it, RAID 5
a> performance will suck big time.  If you need speed, RAID 10 seems
a> to be the only way to go, but of course that means you are gonna
a> spend $$s on drives and chassis.  I wish someone would start a

I disagree on your RAID level assertions.  Check back about 10 or 15
days on this list for some numbers I posted on restore times for a 20+
GB database with different RAID levels.  RAID5 came out fastest
compared with RAID10 and RAID50 across 14 disks.  On my 5 disk system,
I run RAID10 plus a spare in preference to RAID5 as it is faster for
that.  So the answer is "it depends". ;-)

Both systems use SCSI hardware RAID controllers, one is LSI and the
other Adaptec, all hardware from Dell.

But if you're budget limited, spend every last penny you have on the
fastest disks you can get, and then boost memory.  Any current CPU
will be more than enough for Postgres.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] software vs hw hard on linux

2003-09-12 Thread Vivek Khera
> "J" == Jeff  <[EMAIL PROTECTED]> writes:

J> Due to various third party issues, and the fact PG rules, we're planning
J> on migrating our deplorable informix db to PG.  It is a rather large DB
J> with a rather high amount of activity (mostly updates).  So I'm going to

If at all possible, batch your updates within transactions containing
as many of those updates as you can.  You will get *much* better
performance.
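
As a sketch of what batching can look like on the client side, the helper below wraps groups of statements in explicit BEGIN/COMMIT pairs. The helper name, the `accounts` table and the batch size are hypothetical; a real batch size would be picked by testing.

```python
def batch_in_transactions(statements, batch_size):
    """Yield SQL scripts that wrap up to batch_size statements in one transaction."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(statements), batch_size):
        chunk = statements[start:start + batch_size]
        yield "\n".join(["BEGIN;", *chunk, "COMMIT;"])


# Hypothetical table and column names, for illustration only.
updates = [f"UPDATE accounts SET balance = balance + 1 WHERE id = {i};" for i in range(5)]
scripts = list(batch_in_transactions(updates, batch_size=2))
print(len(scripts), "transactions instead of", len(updates))
print(scripts[0])
```

Each script pays for one commit instead of one per statement. The trade-off is that a failure rolls back the whole batch, so the application has to be able to retry it.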

More than 2 procs is probably overkill.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] best arrangement of 3 disks for (insert) performance - Dell

2003-09-12 Thread Vivek Khera
> "TD" == Thom Dyson <[EMAIL PROTECTED]> writes:

TD> The Dell PERC controllers have a very strong reputation for terrible
TD> performance.  If you search the archives of the Dell Linux Power Edge list
TD> (dell.com/linux), you will find many, many people who get better
TD> performance from software RAID, rather than the hw RAID on the PERC.

The PERC controllers are just a fancy name for a whole host of
different hardware.  I have several, and some are made by LSI and some
are made by Adaptec.  My latest is PERC3/DC which is an LSI MegaRAID
and is pretty darned fast.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] best arrangement of 3 disks for (insert) performance - Dell

2003-09-12 Thread Christopher Browne
[EMAIL PROTECTED] ("Thom Dyson") writes:
> The Dell PERC controllers have a very strong reputation for terrible
> performance.  If you search the archives of the Dell Linux Power
> Edge list (dell.com/linux), you will find many, many people who get
> better performance from software RAID, rather than the hw RAID on
> the PERC.  Having said that, the 3/SC might be one of the better
> PERC controllers.  I would spend an hour or two and benchmark hw
> vs. sw before I committed to either one.

I can't agree with that.

1.  If you search the archives for messages dated a couple of years
ago, you can find lots of messages indicating terrible performance.

Drivers are not cast in concrete; there has been a LOT of change to
them since then.

2.  The second MAJOR merit to hardware RAID is the ability to hot-swap
drives.  Software RAID doesn't help with that at all.

3.  The _immense_ performance improvement that can be gotten out of
these controllers comes from having fsync() turn into a near no-op
since changes can be committed to the 128K battery-backed cache REALLY
QUICKLY.

That is something you should avoid doing with software RAID in any
case where you actually care about your data.

That third part is where Big Wins come.  It is the very same sort of
"big win from caching" that we saw, years ago, when we improved
system performance _immensely_ by adding a mere 16 bytes of cache by
buying serial controller cards with caching UARTs.  It is akin to
the way SCSI controllers got pretty big performance improvements by
adding 256 bytes of tagged command cache.
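
One way to see the fsync() effect on a given box is a small probe like the following Python sketch. It is not a PostgreSQL benchmark: the function name, the 8 kB block size and the write count are arbitrary assumptions, and a drive whose own volatile write cache is enabled may report flattering numbers that are not actually safe.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, writes=100, block=b"x" * 8192):
    """Append small blocks to a scratch file, calling fsync() after each one."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        started = time.perf_counter()
        for _ in range(writes):
            os.write(fd, block)
            os.fsync(fd)  # wait until the OS reports the block as durable
        elapsed = time.perf_counter() - started
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Point this at a directory on the array you want to compare.
    print(f"{fsyncs_per_second(tempfile.gettempdir()):.0f} fsyncs/s")
```

Running it once against a directory on the hardware array and once against software RAID gives a quick first comparison before committing to either.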
-- 
output = ("cbbrowne" "@" "libertyrms.info")

Christopher Browne
(416) 646 3304 x124 (land)

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] software vs hw hard on linux

2003-09-12 Thread Christopher Browne
[EMAIL PROTECTED] (Jeff) writes:
> On the pro-sw side you have lots of speed and less cost (unfortunately,
> there is a pathetic budget so spending $15k on a raid card is out of the
> question really).

I have been playing with a PERC3/QC card, which isn't anywhere near $15K,
and which certainly seems to provide the characteristic improved performance.

PriceWatch is showing several LSI Logic cards in the $300-$400 range
with battery backed cache, which doesn't seem too out of line.  

It would seem a good tradeoff to buy one of these cards and drop a
SCSI drive off the array.
-- 
output = ("cbbrowne" "@" "libertyrms.info")

Christopher Browne
(416) 646 3304 x124 (land)



Re: [PERFORM] best arrangement of 3 disks for (insert) performance

2003-09-12 Thread Rod Taylor
On Fri, 2003-09-12 at 12:55, Richard Jones wrote:
> The machine is coming from dell, and i have the option of a 
> PERC 3/SC RAID Controller (32MB)
> or software raid.
> 
> does anyone have any experience of this controller? 
> its an additional £345 for this controller, i'd be interested to know what 
> people think - my other option is to buy the raid controller separately, 
> which appeals to me but i wouldnt know what to look for in a raid controller.

Hardware raid with the write cache, and sell a CPU if necessary to buy
it (don't sell the ram though!).




Re: [PERFORM] software vs hw hard on linux

2003-09-12 Thread aturner
My personal experience with RAID cards is that you have to spend money to get good 
performance.  You need battery backed cache because RAID 5 only works well with write 
to cache turned on, and you need a good size cache too.  If you don't have it, RAID 5 
performance will suck big time.  If you need speed, RAID 10 seems to be the only way 
to go, but of course that means you are gonna spend $$s on drives and chassis.  I wish 
someone would start a website like storagereview.com for RAID cards because I have had 
_vastly_ differing experience with different cards.  We currently have a Compaq ML370 
with a Compaq Smart Array 5300, and quite frankly it sucks (8MB/sec write).  I get 
better performance numbers off my new Tyan Thunder s2469UGN board with a single U320 
10k RPM drive (50MB/sec) than we get off our RAID 5 array, including seeks/sec.  
Definitely shop around, and hopefully some other folks can give some suggestions of a 
good RAID card, and a good config.

Alex Turner

P.S. If there is movement for a RAID review site, I would be willing to start one, I'm 
pretty disappointed at the lack of resources out there for this.

On Fri, Sep 12, 2003 at 10:34:26AM -0400, Jeff wrote:
> Due to various third party issues, and the fact PG rules, we're planning
> on migrating our deplorable informix db to PG.  It is a rather large DB
> with a rather high amount of activity (mostly updates).  So I'm going to
> be acquiring a dual (or quad if they'll give me money) box. (In my testing
> my glorious P2 with a 2 spindle raid0 is able to handle it fairly well)
> 
> What I'm wondering about is what folks experience with software raid vs
> hardware raid on linux is.  A friend of mine ran a set of benchmarks at
> work and found sw raid was running obscenely faster than the mylex and
> (some other brand that isn't 3ware) raids..
> 
> On the pro-hw side you have ones with battery backed cache, chances are
> they are less likely to fail..
> 
> On the pro-sw side you have lots of speed and less cost (unfortunately,
> there is a pathetic budget so spending $15k on a raid card is out of the
> question really).
> 
> any thoughts?
> 
> --
> Jeff Trout <[EMAIL PROTECTED]>
> http://www.jefftrout.com/
> http://www.stuarthamm.net/
> 



Re: [PERFORM] best arrangement of 3 disks for (insert) performance

2003-09-12 Thread Richard Jones
The dual xeon arrangement is because the machine will also have to do some 
collaborative filtering which is very cpu intensive and very disk 
un-intensive, after loading the data into ram.

On Friday 12 September 2003 5:49 pm, you wrote:
> Richard,
>
> > its a dual xeon 2.4,  4gb ram and 3x identical 15k rpm scsi disks
> >
> > should i mirror 2 of the disks for postgres data, and use the 3rd disk
> > for the o/s and the pg logs or raid5 the 3 disks or even stripe 2 disks
> > for pg and use the 3rd for o/s,logs,backups ?
>
> I'd mirror 2.   Stripey RAID with few disks imposes a heavy performance
> penalty on data writes (particularly updates), sometimes as much as 50% for
> a RAID5-3disk config.
>
> I am a little curious why you've got a dual-xeon, but could only afford 3
> disks 




Re: [PERFORM] best arrangement of 3 disks for (insert) performance

2003-09-12 Thread Vivek Khera
> "WL" == Will LaShell <[EMAIL PROTECTED]> writes:

WL> of the performance issues. Check the linux megaraid driver list archives
WL> to see the full description. I've seen it come up many times and
WL> basically all the problems have turned up resolved.

I've seen this advice a couple of times, but perhaps I'm just not a
good archive searcher because I can't find such recommendations on the
linux-megaraid-devel list archives...

Anyone have a direct pointer to the right info?

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] best arrangement of 3 disks for (insert) performance

2003-09-12 Thread Josh Berkus
Richard,

> its a dual xeon 2.4,  4gb ram and 3x identical 15k rpm scsi disks
> 
> should i mirror 2 of the disks for postgres data, and use the 3rd disk for the
> o/s and the pg logs or raid5 the 3 disks or even stripe 2 disks for pg and
> use the 3rd for o/s,logs,backups ?

I'd mirror 2.   Stripey RAID with few disks imposes a heavy performance 
penalty on data writes (particularly updates), sometimes as much as 50% for a 
RAID5-3disk config.  
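
For a rough feel of the write cost of each layout, the sketch below uses the textbook small-random-write penalties (physical I/Os per logical write). This is a rule-of-thumb model rather than the measurement behind the 50% figure above. It ignores controller cache entirely, and the per-disk IOPS number is a hypothetical placeholder.

```python
# Textbook small-random-write costs: physical I/Os issued per logical write.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid5": 4}


def write_iops(level, disks, iops_per_disk):
    """Estimate random-write IOPS for an array, ignoring any controller cache."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


PER_DISK = 150  # hypothetical figure for one 15k rpm drive, not a measurement
for level, disks in (("raid1", 2), ("raid0", 2), ("raid5", 3)):
    print(level, disks, "disks ->", write_iops(level, disks, PER_DISK), "writes/s")
```

In this model a 3-disk RAID5 comes out slower for small writes than a 2-disk mirror despite having an extra spindle, because every logical write costs two reads and two writes for the parity update.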

I am a little curious why you've got a dual-xeon, but could only afford 3 
disks 

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org