[PERFORM] fast DISTINCT or EXIST

2007-04-07 Thread Tilo Buschmann
Hello,

I am trying to build an application to search CDs and their tracks, and I
am experiencing some performance difficulties.

The database is very simple at the moment: two tables, "cd" and "tracks",
contain the CD information and their respective tracks. The column
"cd_id" in public.tracks is the foreign key to the cd table.

#v+
  Table "public.cd"
   Column|   Type| Modifiers
-+---+
 revision| integer   | not null default 0
 disc_length | integer   |
 via | character varying |
 cd_id   | integer   | not null default 
nextval('cd_cd_id_seq'::regclass)
 discid  | integer   | not null
 title   | character varying | not null
 artist  | character varying | not null
 year| smallint  |
 genre   | character varying |
 ext | character varying |
 tstitle | tsvector  |
 tsartist| tsvector  |
Indexes:
"cd_id_key" PRIMARY KEY, btree (cd_id)
"discid_key" UNIQUE, btree (discid)
"tsartist_cd_idx" gist (tsartist)
"tstitle_cd_idx" gist (tstitle)
Check constraints:
"year_check" CHECK ("year" IS NULL OR "year" >= 0 AND "year" <= 1)
Tablespace: "d_separate"

   Table "public.tracks"
  Column  |   Type| Modifiers   
  
--+---+---
 track_id | integer   | not null default 
nextval('tracks_track_id_seq'::regclass)
 cd_id| integer   | not null
 title| character varying | 
 artist   | character varying | 
 ext  | character varying | 
 length   | integer   | 
 number   | smallint  | not null default 0
 tstitle  | tsvector  | 
 tsartist | tsvector  | 
Indexes:
"tracks_pkey" PRIMARY KEY, btree (track_id)
"cdid_tracks_idx" btree (cd_id)
"tsartist_tracks_idx" gist (tsartist)
"tstitle_tracks_idx" gin (tstitle)
Foreign-key constraints:
"tracks_cd_id_fkey" FOREIGN KEY (cd_id) REFERENCES cd(cd_id) ON UPDATE 
RESTRICT ON DELETE RESTRICT
Tablespace: "d_separate"

#v-

I am using tsearch2 so that I can search CD and track artists and titles
very quickly.
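
The tsvector columns are populated from the plain-text title/artist
columns, roughly as in the sketch below (not the original load script; the
exact function spelling may differ slightly between the tsearch2 contrib
module and later core releases):

#v+
-- sketch: populate the full-text columns from title/artist
UPDATE cd     SET tstitle  = to_tsvector('simple', title),
                  tsartist = to_tsvector('simple', artist);
UPDATE tracks SET tstitle  = to_tsvector('simple', title),
                  tsartist = to_tsvector('simple', artist);
#v-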

The database is created only once and I expect SELECTs to happen very
often, so the indexes will not hurt performance. I also ran a
VACUUM FULL ANALYSE.

The query that I want to optimise at the moment is "give me all CDs, with
their tracks, that contain a track with the title 'foobar'". The query is
very expensive, so I try to limit it to 10 CDs at a time.

My first idea was:

#v+
cddb=# EXPLAIN ANALYSE SELECT cd.cd_id,cd.title,cd.artist,tracks.title
       FROM tracks
       JOIN (SELECT cd.cd_id,cd.artist,cd.title FROM cd JOIN tracks USING (cd_id)
             WHERE tracks.tstitle @@ plainto_tsquery('simple','education') LIMIT 10) AS cd
       USING (cd_id);
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..3852.42 rows=11974 width=91) (actual time=310.983..972.739 rows=136 loops=1)
   ->  Limit  (cost=0.00..121.94 rows=10 width=46) (actual time=264.797..650.178 rows=10 loops=1)
         ->  Nested Loop  (cost=0.00..227602.43 rows=18665 width=46) (actual time=264.793..650.165 rows=10 loops=1)
               ->  Index Scan using tstitle_tracks_idx on tracks  (cost=0.00..73402.74 rows=18665 width=4) (actual time=155.516..155.578 rows=10 loops=1)
                     Index Cond: (tstitle @@ '''education'''::tsquery)
               ->  Index Scan using cd_id_key on cd  (cost=0.00..8.25 rows=1 width=46) (actual time=49.452..49.453 rows=1 loops=10)
                     Index Cond: (public.cd.cd_id = public.tracks.cd_id)
   ->  Index Scan using cdid_tracks_idx on tracks  (cost=0.00..358.08 rows=1197 width=27) (actual time=29.588..32.239 rows=14 loops=10)
         Index Cond: (public.tracks.cd_id = cd.cd_id)
 Total runtime: 972.917 ms
(10 rows)
#v-


The query is fast enough, but erroneous. If a CD contains more than one
track that matches the condition, the inner SELECT will return that CD
more than once, and therefore the whole query will yield duplicate CDs.

The solution is to either insert DISTINCT into the above query or use
EXISTS as the condition, but both queries show terrible performance:

#v+
cddb=# EXPLAIN ANALYSE SELECT cd.cd_id,cd.title,cd.artist,tracks.title
       FROM tracks
       JOIN (SELECT DISTINCT cd.cd_id,cd.artist,cd.title FROM cd JOIN tracks USING (cd_id)
             WHERE tracks.tstitle @@ plainto_tsquery('simple','education') LIMIT 10) AS cd
       USING (cd_id);
#v-
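
For reference, the EXISTS variant mentioned above would look roughly like
the sketch below (illustrative only, written against the schema above; the
post reports that it performs just as badly as the DISTINCT version):

#v+
SELECT cd.cd_id, cd.title, cd.artist, tracks.title
  FROM tracks
  JOIN (SELECT cd.cd_id, cd.artist, cd.title
          FROM cd
         WHERE EXISTS (SELECT 1
                         FROM tracks t
                        WHERE t.cd_id = cd.cd_id
                          AND t.tstitle @@ plainto_tsquery('simple','education'))
         LIMIT 10) AS cd USING (cd_id);
#v-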

Re: [PERFORM] fast DISTINCT or EXIST

2007-04-07 Thread Arjen van der Meijden
Can't you use something like this? Or is the distinct on the t.cd_id 
still causing the major slowdown here?


SELECT ... FROM cd
  JOIN tracks ...
WHERE cd.id IN (SELECT DISTINCT t.cd_id FROM tracks t
 WHERE t.tstitle @@ plainto_tsquery('simple','education') LIMIT 10)

If that is your main culprit, you could also use two limits, based on the
fact that there will be at most X songs per cd which would match your
title (my not very educated guess is 3x). It's a bit ugly... but if that
is what it takes to make PostgreSQL not scan your entire index, so be it...


SELECT ... FROM cd
  JOIN tracks ...
WHERE cd.id IN (SELECT DISTINCT cd_id FROM (SELECT t.cd_id FROM tracks t
 WHERE t.tstitle @@ plainto_tsquery('simple','education') LIMIT 30) 
as foo LIMIT 10)



Best regards,

Arjen

Re: [PERFORM] SCSI vs SATA

2007-04-07 Thread Ron
Given all the data I have personally, plus all that I have from NOC
personnel, sysadmins, network engineers, operations managers, etc. (I do
systems architecture consulting that requires me to interface with many of
these on a regular basis), my experience supports a variation of
hypothesis 2.  Let's call it 2a:


2a= Drive manufacturers _do_ design server drives to be more reliable
than consumer drives.
This is easily provable by opening the clam shells of a Seagate
consumer HD and a Seagate enterprise HD of the same generation and
comparing them.
In addition to non-visible quality differences in the actual media
(which result in warranty differences), there are notable differences
in the design and materials of the clam shells.
HOWEVER, there are at least two complicating factors in actually being
able to obtain the increased benefits from the better design:


* HDs are often used in environments and use cases so far outside
their manufacturer's suggested norms that the beating they take
overwhelms the initial quality difference.  For instance, dirty power
events or 100+F room temperatures will age HDs so fast that even if
the enterprise HDs survive better, it's only going to be a bit better
in the worst cases.


* The pace of innovation in this business is so brisk that HDs from 4
years ago, of all types, are of considerably lower quality than those made now.
Someone mentioned FDB and the difference they made.  Very much
so.  If you compare HDs from 4 years ago to ones made 8 years ago you
get a similar quality difference.  Ditto 8 vs 12 years ago.  Etc.


The reality is that all modern HDs are so good that it's actually 
quite rare for someone to suffer a data loss event.  The consequences 
of such are so severe that the event stands out more than just the 
statistics would imply.  For those using small numbers of HDs, HDs just work.


OTOH, for those of us doing work that involves DBMSs and relatively 
large numbers of HDs per system, both the math and the RW conditions 
of service require us to pay more attention to quality details.

Like many things, one can decide on one of multiple ways to "pay the piper".

a= The choice made by many, for instance in the studies mentioned, is 
to minimize initial acquisition cost and operating overhead and 
simply accept having to replace HDs more often.


b= For those in fields where this is not a reasonable option (financial
services, health care, etc.), or for those literally using hundreds of HDs
per system (where statistical failure rates are so likely that TLC is
required), policies and procedures like those mentioned in this thread
(paying close attention to environment and use factors, sector remap
detection, rotating HDs into and out of roles based on age, etc.) are
necessary.


Anyone who does some close variation of "b" directly above =will= see 
the benefits of using better HDs.


At least in my supposedly unqualified anecdotal 25 years of 
professional experience.

Ron Peacetree



At 10:35 PM 4/6/2007, Bruce Momjian wrote:


In summary, it seems one of these is true:

1.  Drive manufacturers don't design server drives to be more
reliable than consumer drives.

2.  Drive manufacturers _do_ design server drives to be more
reliable than consumer drives, but the design doesn't yield significantly
better reliability.

3.  Server drives are significantly more reliable than consumer
drives.






Re: [PERFORM] fast DISTINCT or EXIST

2007-04-07 Thread Tom Lane
Arjen van der Meijden <[EMAIL PROTECTED]> writes:
> If that is your main culprit, you could also use two limits based on the 
> fact that there will be at most X songs per cd which would match your 
> title (my not very educated guess is 3x). Its a bit ugly... but if that 
> is what it takes to make postgresql not scan your entire index, so be it...

> SELECT ... FROM cd
>JOIN tracks ...
> WHERE cd.id IN (SELECT DISTINCT cd_id FROM (SELECT t.cd_id FROM tracks t
>   WHERE t.tstitle @@ plainto_tsquery('simple','education') LIMIT 30) 
> as foo LIMIT 10)

I think that's the only way.  There is no plan type in Postgres that
will generate unique-ified output without scanning the whole input
first, except for Uniq on pre-sorted input, which we can't use here
because the tsearch scan isn't going to deliver the rows in cd_id order.

I can see how to build one: make a variant of HashAggregate that returns
each input row immediately after hashing it, *if* it isn't a duplicate
of one already in the hash table.  But it'd be a lot of work for what
seems a rather specialized need.

regards, tom lane



Re: [PERFORM] fast DISTINCT or EXIST

2007-04-07 Thread Tilo Buschmann
Hi everyone,

On Sat, 07 Apr 2007 11:54:08 -0400
Tom Lane <[EMAIL PROTECTED]> wrote:

> Arjen van der Meijden <[EMAIL PROTECTED]> writes:
> > If that is your main culprit, you could also use two limits based on the 
> > fact that there will be at most X songs per cd which would match your 
> > title (my not very educated guess is 3x). Its a bit ugly... but if that 
> > is what it takes to make postgresql not scan your entire index, so be it...
> 
> > SELECT ... FROM cd
> >JOIN tracks ...
> > WHERE cd.id IN (SELECT DISTINCT cd_id FROM (SELECT t.cd_id FROM tracks t
> >   WHERE t.tstitle @@ plainto_tsquery('simple','education') LIMIT 30) 
> > as foo LIMIT 10)
> 
> I think that's the only way.  There is no plan type in Postgres that
> will generate unique-ified output without scanning the whole input
> first, except for Uniq on pre-sorted input, which we can't use here
> because the tsearch scan isn't going to deliver the rows in cd_id order.

Unfortunately, the query above will definitely not work correctly if
someone searches for "a" or "the".

The correct query does not perform as well as I hoped.

#v+
cddb=# EXPLAIN ANALYSE SELECT cd.cd_id,cd.artist,cd.title,tracks.title
       FROM cd
       JOIN tracks USING (cd_id)
       WHERE cd_id IN (SELECT DISTINCT tracks.cd_id FROM tracks
                       WHERE tracks.tstitle @@ plainto_tsquery('simple','sympathy') LIMIT 10);
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=61031.41..64906.58 rows=139 width=69) (actual time=31236.562..31810.940 rows=166 loops=1)
   ->  Nested Loop  (cost=61031.41..61176.20 rows=10 width=50) (actual time=31208.649..31388.289 rows=10 loops=1)
         ->  Limit  (cost=61031.41..61089.74 rows=10 width=4) (actual time=31185.972..31186.024 rows=10 loops=1)
               ->  Unique  (cost=61031.41..61124.74 rows=16 width=4) (actual time=31185.967..31186.006 rows=10 loops=1)
                     ->  Sort  (cost=61031.41..61078.07 rows=18665 width=4) (actual time=31185.961..31185.977 rows=11 loops=1)
                           Sort Key: public.tracks.cd_id
                           ->  Bitmap Heap Scan on tracks  (cost=536.76..59707.31 rows=18665 width=4) (actual time=146.222..30958.057 rows=1677 loops=1)
                                 Recheck Cond: (tstitle @@ '''sympathy'''::tsquery)
                                 ->  Bitmap Index Scan on tstitle_tracks_idx  (cost=0.00..532.09 rows=18665 width=0) (actual time=126.328..126.328 rows=1677 loops=1)
                                       Index Cond: (tstitle @@ '''sympathy'''::tsquery)
         ->  Index Scan using cd_id_key on cd  (cost=0.00..8.62 rows=1 width=46) (actual time=20.218..20.219 rows=1 loops=10)
               Index Cond: (cd.cd_id = "IN_subquery".cd_id)
   ->  Index Scan using cdid_tracks_idx on tracks  (cost=0.00..358.08 rows=1197 width=27) (actual time=39.935..42.247 rows=17 loops=10)
         Index Cond: (cd.cd_id = public.tracks.cd_id)
 Total runtime: 31811.256 ms
(15 rows)
#v-

It gets better when the rows are in memory (down to 10.452 ms), but
Murphy tells me that the content I need will never be in memory.

I think I disregarded this variant at first because it limits the
possibility of additionally restricting on the cd artist and title.
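
Concretely, with the IN-subquery form an artist restriction can only be
applied outside the limited subquery, i.e. after the 10-cd cutoff, so the
query may return fewer than 10 cds. A sketch (illustrative only; the
'rolling stones' search term is made up):

#v+
SELECT cd.cd_id, cd.artist, cd.title, tracks.title
  FROM cd
  JOIN tracks USING (cd_id)
 WHERE cd_id IN (SELECT DISTINCT t.cd_id
                   FROM tracks t
                  WHERE t.tstitle @@ plainto_tsquery('simple','sympathy')
                  LIMIT 10)
   AND cd.tsartist @@ plainto_tsquery('simple','rolling stones');
#v-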

> I can see how to build one: make a variant of HashAggregate that returns
> each input row immediately after hashing it, *if* it isn't a duplicate
> of one already in the hash table.  But it'd be a lot of work for what
> seems a rather specialized need.

D'oh.

Actually, I hoped to find an alternative that does not involve
DISTINCT.

Best Regards,

Tilo



Re: [PERFORM] fast DISTINCT or EXIST

2007-04-07 Thread Tom Lane
Tilo Buschmann <[EMAIL PROTECTED]> writes:
>> Arjen van der Meijden <[EMAIL PROTECTED]> writes:
>>> SELECT ... FROM cd
>>> JOIN tracks ...
>>> WHERE cd.id IN (SELECT DISTINCT cd_id FROM (SELECT t.cd_id FROM tracks t
>>> WHERE t.tstitle @@ plainto_tsquery('simple','education') LIMIT 30) 
>>> as foo LIMIT 10)

> Unfortunately, the query above will definitely not work correctly, if
> someone searches for "a" or "the". 

Well, the "incorrectness" is only that it might deliver fewer than the
hoped-for ten CDs ... but that was a completely arbitrary cutoff anyway,
no?  I think in practice this'd give perfectly acceptable results.

> Actually, I hoped to find an alternative, that does not involve
> DISTINCT.

You could try playing around with GROUP BY rather than DISTINCT; those
are separate code paths and will probably give you different plans.
But I don't think you'll find that GROUP BY does any better on this
particular measure of yielding rows before the full input has been
scanned.
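
For concreteness, the GROUP BY formulation of the earlier IN-subquery might
look roughly like the sketch below (written against the schema posted
above, not a tested query):

#v+
SELECT cd.cd_id, cd.artist, cd.title, tracks.title
  FROM cd
  JOIN tracks USING (cd_id)
 WHERE cd_id IN (SELECT t.cd_id
                   FROM tracks t
                  WHERE t.tstitle @@ plainto_tsquery('simple','sympathy')
                  GROUP BY t.cd_id
                  LIMIT 10);
#v-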

regards, tom lane



Re: [PERFORM] fast DISTINCT or EXIST

2007-04-07 Thread Arjen van der Meijden

On 7-4-2007 18:24 Tilo Buschmann wrote:

Unfortunately, the query above will definitely not work correctly, if
someone searches for "a" or "the". 


Those are two words you may want to consider not searching on at all.

As Tom said, it's not very likely to be fixed in PostgreSQL. But you can
always consider using application logic (or a pl/pgsql function; you could
even use a set-returning function to replace the double-limit subselects
in your IN-statement) which will automatically fetch more records when the
initial guess turns out to be wrong, obviously using something like a NOT
IN to remove the initially returned cd.id's for the next batches (see the
sketch below).

Then again, even 'a' or 'the' will not likely be in *all* tracks of a cd,
so you can also use the 'average amount of tracks per cd' (about 10 or
11?) as your multiplier rather than my initial 3. Obviously you'll lose
performance with each increment of that value.
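
A rough sketch of such a set-returning function (the function name, the 3x
initial guess and the batch-doubling heuristic are illustrative
assumptions, not something tested against this schema):

#v+
CREATE OR REPLACE FUNCTION matching_cd_ids(search_text text, want integer)
RETURNS SETOF integer AS $$
DECLARE
    found_ids integer[] := '{}';
    batch     integer   := want * 3;  -- initial guess: ~3 matching tracks per cd
    r         record;
BEGIN
    LOOP
        FOR r IN
            SELECT DISTINCT t.cd_id
              FROM (SELECT cd_id
                      FROM tracks
                     WHERE tstitle @@ plainto_tsquery('simple', search_text)
                       AND NOT (cd_id = ANY (found_ids))   -- skip cds already returned
                     LIMIT batch) AS t
        LOOP
            found_ids := array_append(found_ids, r.cd_id);
            RETURN NEXT r.cd_id;
            EXIT WHEN array_length(found_ids, 1) >= want;
        END LOOP;
        -- stop when we have enough cds, or when the last batch came back empty
        EXIT WHEN array_length(found_ids, 1) >= want OR NOT FOUND;
        batch := batch * 2;  -- the guess was too low, fetch a bigger batch next time
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql STABLE;

-- usage:
-- SELECT cd.cd_id, cd.artist, cd.title, tracks.title
--   FROM cd JOIN tracks USING (cd_id)
--  WHERE cd_id IN (SELECT * FROM matching_cd_ids('education', 10));
#v-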


Best regards,

Arjen



Re: [PERFORM] SCSI vs SATA

2007-04-07 Thread david

On Sat, 7 Apr 2007, Ron wrote:

The reality is that all modern HDs are so good that it's actually quite rare 
for someone to suffer a data loss event.  The consequences of such are so 
severe that the event stands out more than just the statistics would imply. 
For those using small numbers of HDs, HDs just work.


OTOH, for those of us doing work that involves DBMSs and relatively large 
numbers of HDs per system, both the math and the RW conditions of service 
require us to pay more attention to quality details.

Like many things, one can decide on one of multiple ways to "pay the piper".

a= The choice made by many, for instance in the studies mentioned, is to 
minimize initial acquisition cost and operating overhead and simply accept 
having to replace HDs more often.


b= For those in fields were this is not a reasonable option (financial 
services, health care, etc), or for those literally using 100's of HD per 
system (where statistical failure rates are so likely that TLC is required), 
policies and procedures like those mentioned in this thread (paying close 
attention to environment and use factors, sector remap detecting, rotating 
HDs into and out of roles based on age, etc) are necessary.


Anyone who does some close variation of "b" directly above =will= see the 
benefits of using better HDs.


At least in my supposedly unqualified anecdotal 25 years of professional 
experience.


Ron, why is it that you assume that anyone who disagrees with you doesn't
work in an environment where they care about the datacenter environment,
and isn't in fields like financial services? And why do you think that we
are just trying to save a few pennies? (the costs do factor in, but it's
not a matter of pennies, it's a matter of tens of thousands of dollars)


I actually work in the financial services field, and I do have a good
datacenter environment that's well cared for.


while I don't personally maintain machines with hundreds of drives each, I
do maintain hundreds of machines with a small number of drives in each,
and a handful of machines with a few dozen drives. (the database
machines are maintained by others; I do see their failed drives however)


it's also true that my experience is only over the last 10 years, so I've
only been working with a few generations of drives, but my experience is
different from yours.


my experience is that until the drives get to be 5+ years old the failure
rate seems to be about the same for the 'cheap' drives as for the 'good'
drives. I won't say that they are exactly the same, but they are close
enough that I don't believe that there is a significant difference.


in other words, these studies do seem to match my experience.

this is why, when I recently had to create some large capacity arrays, I'm
only ending up with machines with a few dozen drives in them instead of
hundreds. I've got two machines with 6TB of disk, one with 8TB, one with
10TB, and one with 20TB. I'm building these systems for ~$1K/TB for the
disk arrays. other departments who choose $bigname 'enterprise' disk
arrays are routinely paying 50x that price


I am very sure that they are not getting 50x the reliability; I'm sure
that they aren't getting 2x the reliability.


I believe that the biggest cause of data loss for people using the
'cheap' drives is the fact that one 'cheap' drive holds the capacity of 5
or so 'expensive' drives, and since people don't realize this they don't
realize that the time to rebuild the failed drive onto a hot-spare is
correspondingly longer.


in the thread 'Sunfire X4500 recommendations' we recently had a discussion
on this topic, starting from a guy who was asking the best way to configure
the drives in his Sun X4500 (48 drive) system for safety. in that
discussion I took some numbers from the CMU study and as a working figure
I said a 10% chance for a drive to fail in a year (the study said 5-7% in
most cases, but some third year drives were around 10%). combining this
with the time needed to write 750G using ~10% of the system's capacity
results in a rebuild time of about 5 days. it turns out that there is
almost a 5% chance of a second drive failing in a 48 drive array in this
time. If I were to build a single array with 142G 'enterprise' drives
instead of with 750G 'cheap' drives the rebuild time would be only 1 day
instead of 5, but you would have ~250 drives instead of 48 and so your
chance of a problem would be the same (I acknowledge that it's unlikely to
use 250 drives in a single array, and yes that does help, however if you
had 5 arrays of 50 drives each you would still have a 1% chance of a
second failure)
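
As a rough sanity check of those figures (assuming independent failures
and a flat 10%/year failure rate, which is itself a simplification), the
arithmetic can be redone in a one-line query:

#v+
SELECT 1 - power(1 - 0.10 * 5/365.0,  47) AS p_2nd_failure_48_drives_5_day_rebuild,
       1 - power(1 - 0.10 * 1/365.0, 249) AS p_2nd_failure_250_drives_1_day_rebuild,
       1 - power(1 - 0.10 * 1/365.0,  49) AS p_2nd_failure_50_drives_1_day_rebuild;
-- roughly 6%, 7% and 1% respectively, i.e. in the same ballpark as the
-- figures quoted above
#v-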


when I look at these numbers, my reaction isn't that it's wrong to go with
the 'cheap' drives, my reaction is that single redundancy isn't good
enough. depending on how valuable the data is, you need to either replicate
the data to another system, or go with dual-parity redundancy (or both)


while dr

Re: [PERFORM] SCSI vs SATA

2007-04-07 Thread Ron

At 05:42 PM 4/7/2007, [EMAIL PROTECTED] wrote:

On Sat, 7 Apr 2007, Ron wrote:

The reality is that all modern HDs are so good that it's actually 
quite rare for someone to suffer a data loss event.  The 
consequences of such are so severe that the event stands out more 
than just the statistics would imply. For those using small numbers 
of HDs, HDs just work.


OTOH, for those of us doing work that involves DBMSs and relatively 
large numbers of HDs per system, both the math and the RW 
conditions of service require us to pay more attention to quality details.

Like many things, one can decide on one of multiple ways to "pay the piper".

a= The choice made by many, for instance in the studies mentioned, 
is to minimize initial acquisition cost and operating overhead and 
simply accept having to replace HDs more often.


b= For those in fields were this is not a reasonable option 
(financial services, health care, etc), or for those literally 
using 100's of HD per system (where statistical failure rates are 
so likely that TLC is required), policies and procedures like those 
mentioned in this thread (paying close attention to environment and 
use factors, sector remap detecting, rotating HDs into and out of 
roles based on age, etc) are necessary.


Anyone who does some close variation of "b" directly above =will= 
see the benefits of using better HDs.


At least in my supposedly unqualified anecdotal 25 years of 
professional experience.


Ron, why is it that you assume that anyone who disagrees with you 
doesn't work in an environment where they care about the datacenter 
environment, and aren't in fields like financial services? and why 
do you think that we are just trying to save a few pennies? (the 
costs do factor in, but it's not a matter of pennies, it's a matter 
of tens of thousands of dollars)

I don't assume that.  I didn't make any assumptions.  I (rightfully
IMHO) criticized everyone jumping on the "See, cheap =is= good!"
bandwagon that the Google and CMU studies seem to have ignited w/o
thinking critically about them.

I've never mentioned or discussed specific financial amounts, so
you're making an (erroneous) assumption when you think my concern is
over people "trying to save a few pennies".


In fact, "saving pennies" is at the =bottom= of my priority list for 
the class of applications I've been discussing.  I'm all for 
economical, but to paraphrase Einstein "Things should be as cheap as 
possible; but no cheaper."


My biggest concern is that something I've seen over and over again in 
my career will happen again:
People tend to jump at the _slightest_ excuse to believe a story that 
will save them short term money and resist even _strong_ reasons to 
pay up front for quality.  Even if paying more up front would lower 
their lifetime TCO.


The Google and CMU studies are =not= based on data drawn from 
businesses where the lesser consequences of an outage are losing 
$10Ks or $100K per minute... ...and where the greater consequences 
include the chance of loss of human life.
Nor are they based on businesses that must rely exclusively on highly 
skilled and therefore expensive labor.


In the case of the CMU study, people are even extrapolating an 
economic conclusion the original author did not even make or intend!
Is it any wonder I'm expressing concern regarding inappropriate 
extrapolation of those studies?



I actually work in the financial services field, I do have a good 
datacenter environment that's well cared for.


while I don't personally maintain machines with hundreds of drives 
each, I do maintain hundreds of machines with a small number of 
drives in each, and a handful of machines with a few dozens of 
drives. (the database machines are maintained by others, I do see 
their failed drives however)


it's also true that my expericance is only over the last 10 years, 
so I've only been working with a few generations of drives, but my 
experiance is different from yours.


my experiance is that until the drives get to be 5+ years old the 
failure rate seems to be about the same for the 'cheap' drives as 
for the 'good' drives. I won't say that they are exactly the same, 
but they are close enough that I don't believe that there is a 
significant difference.


in other words, these studies do seem to match my experiance.

Fine.  Let's pretend =You= get to build Citibank's or Humana's next
mission critical production DBMS using exclusively HDs with 1 year warranties.

(never would be allowed ITRW)

Even if you RAID 6 them, I'll bet you anything that a system with 32+ 
HDs on it is likely enough to spend a high enough percentage of its 
time operating in degraded mode that you are likely to be looking for 
a job as a consequence of such a decision.
...and if you actually suffer data loss or, worse, data corruption, 
that's a Career Killing Move.

(and it should be given the likely consequences to the public of such a F* up).


this is why, when I recently had to creat

Re: [PERFORM] SCSI vs SATA

2007-04-07 Thread david

On Sat, 7 Apr 2007, Ron wrote:

Ron, why is it that you assume that anyone who disagrees with you doesn't 
work in an environment where they care about the datacenter environment, 
and aren't in fields like financial services? and why do you think that we 
are just trying to save a few pennies? (the costs do factor in, but it's 
not a matter of pennies, it's a matter of tens of thousands of dollars)

I don't assume that.  I didn't make any assumptions.  I (rightfully IMHO)
criticized everyone jumping on the "See, cheap =is= good!" bandwagon that the 
Google and CMU studies seem to have ignited w/o thinking critically about 
them.


Ron, I think that many people aren't saying cheap==good, what we are
doing is arguing against the idea that expensive==good (and its
corollary, cheap==bad)


I've never mentioned or discussed specific financial amounts, so you're 
making an (erroneous) assumption when you think my concern is over people 
"trying to save a few pennies".


In fact, "saving pennies" is at the =bottom= of my priority list for the 
class of applications I've been discussing.  I'm all for economical, but to 
paraphrase Einstein "Things should be as cheap as possible; but no cheaper."


this I fully agree with, I have no problem spending money if I believe
that there's a corresponding benefit.


My biggest concern is that something I've seen over and over again in my 
career will happen again:
People tend to jump at the _slightest_ excuse to believe a story that will 
save them short term money and resist even _strong_ reasons to pay up front 
for quality.  Even if paying more up front would lower their lifetime TCO.


on the other hand, it's easy for people to blow $bigbucks with this
argument with no significant reduction in their maintenance costs.


The Google and CMU studies are =not= based on data drawn from businesses 
where the lesser consequences of an outage are losing $10Ks or $100K per 
minute... ...and where the greater consequences include the chance of loss of 
human life.
Nor are they based on businesses that must rely exclusively on highly skilled 
and therefore expensive labor.


hmm, I didn't see the CMU study document what businesses it used.

In the case of the CMU study, people are even extrapolating an economic 
conclusion the original author did not even make or intend!
Is it any wonder I'm expressing concern regarding inappropriate extrapolation 
of those studies?


I missed the posts where people were extrapolating economic conclusions;
what I saw was people stating that 'you better buy the SCSI drives as
they are more reliable', and other people pointing out that recent studies
indicate that there's not a significant difference in drive reliability
between the two types of drives


I actually work in the financial services field, I do have a good 
datacenter environment that's well cared for.


while I don't personally maintain machines with hundreds of drives each, I 
do maintain hundreds of machines with a small number of drives in each, and 
a handful of machines with a few dozens of drives. (the database machines 
are maintained by others, I do see their failed drives however)


it's also true that my expericance is only over the last 10 years, so I've 
only been working with a few generations of drives, but my experiance is 
different from yours.


my experiance is that until the drives get to be 5+ years old the failure 
rate seems to be about the same for the 'cheap' drives as for the 'good' 
drives. I won't say that they are exactly the same, but they are close 
enough that I don't believe that there is a significant difference.


in other words, these studies do seem to match my experiance.

Fine.  Let's pretend =You= get to build Citibank's or Humana's next mission
critical production DBMS using exclusively HDs with 1 year warranties.

(never would be allowed ITRW)


who is arguing that you should use drives with 1 year warranties? in case
you blinked, consumer drive warranties are back up to 5 years.


Even if you RAID 6 them, I'll bet you anything that a system with 32+ HDs on 
it is likely enough to spend a high enough percentage of its time operating 
in degraded mode that you are likely to be looking for a job as a consequence 
of such a decision.
...and if you actually suffer data loss or, worse, data corruption, that's a 
Career Killing Move.
(and it should be given the likely consequences to the public of such a F* 
up).


so now it's "nobody got fired for buying SCSI?"

this is why, when I recently had to create some large capacity arrays, I'm 
only ending up with machines with a few dozen drives in them instead of 
hundreds. I've got two machines with 6TB of disk, one with 8TB, one with 
10TB, and one with 20TB. I'm building these sytems for ~$1K/TB for the disk 
arrays. other departments sho shoose $bigname 'enterprise' disk arrays are 
routinely paying 50x that price


I am very sure that they are not getting 50x the reliability, I'm sure that 

Re: [PERFORM] SCSI vs SATA

2007-04-07 Thread Joshua D. Drake


I believe that the biggest cause for data loss from people useing the 
'cheap' drives is due to the fact that one 'cheap' drive holds the 
capacity of 5 or so 'expensive' drives, and since people don't 
realize this they don't realize that the time to rebuild the failed 
drive onto a hot-spare is correspondingly longer.

Commodity HDs get 1 year warranties for the same reason enterprise HDs
get 5+ year warranties: the vendor's confidence that they are not 
going to lose money honoring the warranty in question.


at least seagate gives 5 year warranties on their consumer drives.


Hitachi 3 years
Maxtor  3 years
Samsung 1-3 years depending on drive (but who buys samsung drives)
Seagate 5 years (300 Gig, 7200 RPM perpendicular recording... 89 bucks)
Western Digital 3-5 years depending on drive

Joshua D. Drake





--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

