Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-18 Thread James Mansion

[EMAIL PROTECTED] wrote:


WAL is on a RAID 0 drive along with the OS

Isn't that just as unsafe as having the whole lot on RAID 0?





Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Andrej Ricnik-Bay
On 18/03/2008, Peter Koczan [EMAIL PROTECTED] wrote:


  available, and RAID 5 and RAID 6 are just huge pains and terribly slow
  for writes.
RAID 5 and RAID 6 are just huge pains and terribly slow for writes
with small numbers of spindles  ;}

In my testing I found that once you hit 10 spindles in a RAID 5 the
differences between it and a RAID 10 started to become negligible
(around 6% slower on writes on average, over 10 runs of bonnie++ on
10 spindles), while the read speed (if you're doing similar amounts
of reads & writes it's a fair criterion) was about 10% faster.  With
24 spindles I couldn't see any difference at all.  Those were 73GB
15K SCAs, btw, and the SAN was connected via 2Gb fibre.


  Peter
Cheers,
Andrej


-- 
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Craig Ringer
Andrej Ricnik-Bay wrote:

 In my testing I found that once you hit 10 spindles in a RAID 5 the
 differences between it and a RAID 10 started to become negligible
 (around 6% slower on writes on average, over 10 runs of bonnie++ on
 10 spindles), while the read speed (if you're doing similar amounts
 of reads & writes it's a fair criterion) was about 10% faster.  With
 24 spindles I couldn't see any difference at all.  Those were 73GB
 15K SCAs, btw, and the SAN was connected via 2Gb fibre.

Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
double disk failure (such as during the load imposed by rebuild onto a
spare)?

I guess if you have good backups - as you must - it's not that big a
deal, but I'd be pretty nervous with anything less than RAID 6 or RAID 10.

--
Craig Ringer



Re: [PERFORM] TB-sized databases

2008-03-18 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes:
 Would another possible condition for considering
 Cartesian joins be:

* Consider Cartesian joins when a unique constraint can prove
  that at most one row will be pulled from one of the tables
  that would be part of this join?

What for?  That would still lead us to consider large numbers of totally
useless joins.

regards, tom lane



Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-18 Thread Michael Stone

On Sun, Mar 16, 2008 at 12:04:44PM -0700, Craig James wrote:
Just out of curiosity: Last time I did research, the word seemed to be that 
xfs was better than ext2 or ext3.  Is that not true?  Why use ext2/3 at all 
if xfs is faster for Postgres?


For the WAL, the filesystem is largely irrelevant. (It's relatively 
small, the files are preallocated, the data is synced to disk so there's 
no advantage from write buffering, etc.) The best filesystem is one 
that does almost nothing and stays out of the way--ext2 is a good choice 
for that. The data is a different story, and a different filesystem is 
usually a better choice. (If for no other reason than to avoid long 
fsck times.)


Mike Stone



[PERFORM] Planner mis-estimation using nested loops followup

2008-03-18 Thread Chris Kratz
A number of weeks ago, I posted a request for help regarding join
estimates in pg 8.2.6.  In moderately complex to very complex ad hoc queries
in our system, we were consistently having the system massively
underestimate the number of rows coming out of a join at a low level, making
these queries very slow and inefficient.  At times the mis-estimation was
1000:1, i.e. when it should have been 2000 returned rows from a join, the
planner assumed 1 or 2 rows.  Modifying stats on the join columns up to the
max made little difference (yes, we analyzed the tables in question after each
change).  Since the planner sees only one row coming out of the low-level
join, it uses nested loops all the way up the chain when it would be more
efficient to use another join type.  In our informal testing, we found that
by disabling nested loops and forcing other join types, we could get
fantastic speedups.  The queries that seem to benefit most from this have
a lot of sub-queries being built up into a final query set, as well as a fair
number of joins in the sub-queries.  Since these are user-created and are
then generated via our tools, they can be quite messy at times.
After doing this testing, we have since added some functionality to our ad hoc
reporting tool to allow us to tune individual queries by turning individual
join types on and off at runtime.  As we hear of slow reports, we've been
individually turning off the nested loops on those reports.  Almost always,
this has increased the performance of the reports, sometimes in a completely
amazing fashion (many, many minutes down to seconds at times).  It of course
doesn't help everything, and turning off nested loops in general causes an
overall slowdown in other parts of the system.

As this has gone on over the last couple of weeks, it feels like we either
have a misconfiguration on the server, or we are tickling a mis-estimation
bug in the planner.  I'm hoping it's the former.  The db server has 8G of
memory and a RAID 1 (WAL), RAID 10 (data) disk configuration; the OS is
Linux 2.6.9 and the db is 8.2.6.  The db is UTF-8, if that is of any
bearing, and autovac and bgwriter are on.

Nondefault settings of interest from postgresql.conf


shared_buffers = 1024MB # min 128kB or max_connections*16kB
work_mem = 256MB# min 64kB
maintenance_work_mem = 256MB# min 1MB
random_page_cost = 1.75 # same scale as above
effective_cache_size = 4096MB
default_statistics_target = 100 # range 1-1000


If nothing else, perhaps this will help somebody else who has run into the
same problem.  If explain analyze of a query shows a large mis-estimation of
rows returned on a join (estimate=1, actual=2k), causing the planner to
choose nested loops instead of another join type, you might try running the
query with nested loops set to off and see if that helps w/ performance.
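
As a concrete illustration, here's a minimal sketch of that per-query
workaround (the table and column names below are made up; enable_nestloop
is the actual planner setting being toggled):

BEGIN;
-- discourage nested-loop joins for this query only; SET LOCAL reverts
-- automatically at COMMIT or ROLLBACK
SET LOCAL enable_nestloop = off;

EXPLAIN ANALYZE
SELECT r.report_id, count(*)
  FROM report_rows r
  JOIN report_groups g ON g.group_id = r.group_id
 GROUP BY r.report_id;

COMMIT;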

Thanks,

-Chris


Re: [PERFORM] Planner mis-estimation using nested loops followup

2008-03-18 Thread Joshua D. Drake

On Tue, 18 Mar 2008 11:35:08 -0400
Chris Kratz [EMAIL PROTECTED] wrote:

 Nondefault settings of interest from postgresql.conf

 shared_buffers = 1024MB            # min 128kB or max_connections*16kB
 work_mem = 256MB                   # min 64kB
 maintenance_work_mem = 256MB       # min 1MB
 random_page_cost = 1.75            # same scale as above
 effective_cache_size = 4096MB
 default_statistics_target = 100    # range 1-1000
 
 
 If nothing else, perhaps this will help somebody else who has run
 into the same problem.  If explain analyze of a query shows a large
 mis-estimation of rows returned on a join (estimate=1, actual=2k)
 causing the planner to choose nested loops instead of another join
 type, you might try running the query with nested loops set to off
 and see if that helps w/ performance.

Did you try that? Did it work?

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
  PostgreSQL political pundit | Mocker of Dolphins




Re: [PERFORM] Planner mis-estimation using nested loops followup

2008-03-18 Thread Chris Kratz
Yes, turning nested loops off in specific cases has increased performance
greatly.  It didn't fix the planner mis-estimation, just the plan it chose.
It's certainly not a panacea, but it's something we now try early on when
trying to speed up a query that matches these characteristics.
-Chris

On 3/18/08, Joshua D. Drake [EMAIL PROTECTED] wrote:



 On Tue, 18 Mar 2008 11:35:08 -0400
 Chris Kratz [EMAIL PROTECTED] wrote:

  Nondefault settings of interest from postgresql.conf

  shared_buffers = 1024MB            # min 128kB or max_connections*16kB
  work_mem = 256MB                   # min 64kB
  maintenance_work_mem = 256MB       # min 1MB
  random_page_cost = 1.75            # same scale as above
  effective_cache_size = 4096MB
  default_statistics_target = 100    # range 1-1000
 
 
  If nothing else, perhaps this will help somebody else who has run
  into the same problem.  If explain analyze of a query shows a large
  mis-estimation of rows returned on a join (estimate=1, actual=2k)
  causing the planner to choose nested loops instead of another join
  type, you might try running the query with nested loops set to off
  and see if that helps w/ performance.


 Did you try that? Did it work?

 Joshua D. Drake





Re: [PERFORM] Planner mis-estimation using nested loops followup

2008-03-18 Thread Matthew

On Tue, 18 Mar 2008, Chris Kratz wrote:
In moderately complex to very complex ad hoc queries in our system, we
were consistently having the system massively underestimate the number
of rows coming out of a join at a low level, making these queries very
slow and inefficient.


I have long thought that perhaps Postgres should be a little more cautious 
about its estimates, and assume the worst case scenario sometimes, rather 
than blindly following the estimates from the statistics. The problem is 
that Postgres uses the statistics to generate best estimates of the cost. 
However, it does not take into account the consequences of being wrong. If 
it were more clever, it might be able to decide to use a non-optimal 
algorithm according to the best estimate, if the optimal algorithm has the 
possibility of blowing up to 1000 times the work when the estimates are off 
by a bit.


Such cleverness would be very cool, but (I understand) a lot of work. It 
would hopefully solve this problem.


Matthew

--
Taking apron off And now you can say honestly that you have been to a
lecture where you watched paint dry.
- Computer Graphics Lecturer



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Andrej Ricnik-Bay
On 18/03/2008, Craig Ringer [EMAIL PROTECTED] wrote:
 Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
  double disk failure (such as during the load imposed by rebuild onto a
  spare)?

I never said that we actually USED that set-up.  I just said
I did extensive testing with varied RAID setups. ;}  We did go
with the RAID 10 in the end because of that very consideration.

It's just that the mantra RAID 5 = slow writes isn't quite true.


Cheers,
Andrej


-- 
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm



Re: [PERFORM] TB-sized databases

2008-03-18 Thread Ron Mayer

Tom Lane wrote:

Ron Mayer [EMAIL PROTECTED] writes:

Would another possible condition for considering
Cartesian joins be:



   * Consider Cartesian joins when a unique constraint can prove
 that at most one row will be pulled from one of the tables
 that would be part of this join?


What for?  That would still lead us to consider large numbers of totally
useless joins.

regards, tom lane


Often I get order-of-magnitude better query times by forcing the cartesian
join, even without multi-column indexes.

Explain analyze results below.



Here's an example with your typical star schema.
  fact is the central fact table.
  d_ref is a dimension table for the referrer
  d_uag is a dimension table for the useragent.

Forcing the cartesian join using OFFSET 0 makes
the query take 14 ms (estimated cost 7575).

If I don't force the cartesian join the query takes
over 100ms (estimated cost 398919).

Indexes are on each dimension; but no multi-column
indexes (since the ad-hoc queries can hit any permutation
of dimensions).

logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' 
and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = 
'?ID=370DC=LANG=PN=RealOne%20PlayerPV=6.0.11.818PT=OS=CM=CMV=LS=RE=RA=RV='
 and useragent = 'Mozilla/4.08 [en] (WinNT; U ;Nav)' offset 0 ) as a;

                                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=14.152..14.192 rows=4 loops=1)
   ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=0.084..0.102 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=0.082..0.096 rows=1 loops=1)
               ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.056..0.058 rows=1 loops=1)
                     Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370DC=LANG=PN=RealOne%20PlayerPV=6.0.11.818PT=OS=CM=CMV=LS=RE=RA=RV='::text))
               ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.020..0.029 rows=1 loops=1)
                     Index Cond: ((useragent)::text = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'::text)
   ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=14.053..14.066 rows=4 loops=1)
         Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
         ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=14.016..14.016 rows=0 loops=1)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=2.258..2.258 rows=7960 loops=1)
                     Index Cond: (fact.uag_id = a.uag_id)
               ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual time=9.960..9.960 rows=13751 loops=1)
                     Index Cond: (fact.ref_id = a.ref_id)
 Total runtime: 14.332 ms
(15 rows)

logs=#



logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' 
and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = 
'?ID=370DC=LANG=PN=RealOne%20PlayerPV=6.0.11.818PT=OS=CM=CMV=LS=RE=RA=RV='
 and useragent = 'Mozilla/4.08 [en] (WinNT; U ;Nav)' ) as a;

                                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=78.777..107.038 rows=4 loops=1)
   Hash Cond: (fact.ref_id = d_ref.ref_id)
   ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=6.311..101.843 rows=7960 loops=1)
         ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.021..0.029 rows=1 loops=1)
               Index Cond: ((useragent)::text = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'::text)
         ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=6.273..91.645 rows=7960 loops=1)
               Recheck Cond: (fact.uag_id = d_uag.uag_id)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83

Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread david

On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote:



On 18/03/2008, Craig Ringer [EMAIL PROTECTED] wrote:

Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
 double disk failure (such as during the load imposed by rebuild onto a
 spare)?


that's why you should use raid6 (allowing for dual failures)

David Lang




Re: [PERFORM] Planner mis-estimation using nested loops followup

2008-03-18 Thread Scott Marlowe
On Tue, Mar 18, 2008 at 9:58 AM, Chris Kratz [EMAIL PROTECTED] wrote:
 Y, turning nested loops off in specific cases has increased performance
 greatly.  It didn't fix the planner mis-estimation, just the plan it chose.
 It's certainly not a panacea, but it's something we now try early on when
 trying to speed up a query that matches these characteristics.

I have to admit I've had one or two reporting queries in the past where
turning off nested_loop was the only reasonable fix due to
mis-estimation.  I'd tried changing the stats targets etc. and nothing
really worked reliably to prevent the nested loop from showing up in
the wrong places.
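
For reference, a sketch of what that stats-target tuning looks like (the
table and column names are hypothetical; a per-column setting overrides
default_statistics_target for that column only):

-- sample the troublesome join column much more heavily, then re-ANALYZE
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;

Sometimes even the maximum target (1000 in 8.2) isn't enough when the
columns are correlated, which is what the original poster saw.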



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread david

On Tue, 18 Mar 2008, Gregory Stark wrote:


[EMAIL PROTECTED] writes:


On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote:



On 18/03/2008, Craig Ringer [EMAIL PROTECTED] wrote:

Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
 double disk failure (such as during the load imposed by rebuild onto a
 spare)?


that's why you should use raid6 (allowing for dual failures)


You can have as many parity drives as you want with RAID 5 too.


You can?  I've never seen a RAID 5 setup with more than a single parity
drive (or even the option of having more than one drive's worth of
redundancy).  You can have hot-spare drives, but that's a different thing.


What controller/software lets you do this?

David Lang



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Gregory Stark
[EMAIL PROTECTED] writes:

 On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote:


 On 18/03/2008, Craig Ringer [EMAIL PROTECTED] wrote:
 Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
  double disk failure (such as during the load imposed by rebuild onto a
  spare)?

 that's why you should use raid6 (allowing for dual failures)

You can have as many parity drives as you want with RAID 5 too.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



[PERFORM] question on TRUNCATE vs VACUUM FULL

2008-03-18 Thread Mark Steben
Hi folks,

We are running Postgres 8.2.5.

I have 3 tables, call them A, B, and C.

Table A houses info on all emails that have ever been created for the
purpose of being delivered to our end customers.  Big table, about 23
million rows.

Table B, the 'holding' table, is populated with Table A key information via
an after trigger whenever Table A is updated or inserted to.

Table C, the 'work' table, is populated by function D from table B.  It is
configured exactly like table B.

PL/pgSQL function D inserts a predefined number of rows from table B into
table C.  For purposes of discussion, say 500.

Function D, after it does its thing, then deletes the 500 rows it processed
from table B, and ALL 500 rows from table C.

This entire process, after a sleep period of 10 seconds, repeats itself all
day.  After each fifth iteration of function D, we perform a VACUUM FULL on
both tables B and C.  It takes less than 5 seconds.

In terms of transaction processing:

  Table A is processed by many transactions (some read, some update).

  Table B is processed by
    - any transaction updating or inserting to Table A via the after
      trigger (insert, update)
    - function D (insert, update, delete)

  Table C is processed ONLY by function D (insert, update, delete).  Nothing
  else touches it; the PG_LOCKS table verifies that this table is totally
  free of any transaction between iterations of function D.

So my question is this: shouldn't VACUUM FULL clean Table C and reclaim all
its space?  It doesn't.  It usually reports the same number of pages before
and after the vacuum.  We have to resort to TRUNCATE to clean and reclaim
this table, which must be empty at the beginning of function D.

Any insights appreciated. Thanks,

 

Mark Steben

Senior Database Administrator
@utoRevenueT 
A Dominion Enterprises Company
480 Pleasant Street
Suite B200
Lee, MA 01238
413-243-4800 Home Office 
413-243-4809 Corporate Fax

msteben [EMAIL PROTECTED]

Visit our new website at http://www.autorevenue.com/

 




Re: [PERFORM] Planner mis-estimation using nested loops followup

2008-03-18 Thread Stephen Denne
Scott Marlowe wrote
 On Tue, Mar 18, 2008 at 9:58 AM, Chris Kratz [EMAIL PROTECTED] wrote:
  Yes, turning nested loops off in specific cases has increased performance
  greatly.  It didn't fix the planner mis-estimation, just the plan it chose.
  It's certainly not a panacea, but it's something we now try early on when
  trying to speed up a query that matches these characteristics.

 I have to admit I've had one or two reporting queries in the past where
 turning off nested_loop was the only reasonable fix due to
 mis-estimation.  I'd tried changing the stats targets etc. and nothing
 really worked reliably to prevent the nested loop from showing up in
 the wrong places.

One cause of planner mis-estimation I've seen quite frequently is when there 
are a number of predicates on the data that filter the results in roughly the 
same manner.  PostgreSQL, not knowing that the filters are highly correlated, 
multiplies the fractions of selected rows together.

Making up an example using pseudo-code, if this is one of the subqueries:

select * from orders where
order_date is recent
and
order_fulfilled is false

Used in an application where the unfulfilled orders are the recent ones.

If postgresql estimates that 1% of the orders are recent, and 1% are 
unfulfilled, then it will assume that 1% * 1% = 0.01% are both recent and 
unfulfilled.  If in reality it's more like 0.9%, then your actual row count 
will be 90 times the estimate.

The only kind of simple behind-the-scenes fix for these situations that I know 
of is to add more indexes (such as a partial index on order_date where 
order_fulfilled is false), which slows down all your updates and only works 
for the simplest situations.
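
A sketch of that partial index for the pseudo-code example above (assuming
order_fulfilled is a boolean column on the hypothetical orders table):

-- covers only the rows the correlated predicates actually select
CREATE INDEX orders_unfulfilled_by_date
    ON orders (order_date)
    WHERE order_fulfilled = false;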

A general fix would need to calculate, store, and look up a huge amount of 
correlation data, probably equal to the square of the number of rows in 
pg_stats, though this could possibly be generated as needed.

Perhaps the ANALYZE command could be extended to take a command like:
ANALYZE CARTESIAN CORRELATION orders(order_date,order_fulfilled);
which would store the fraction for each combination of most frequent values 
and domain buckets from order_date and order_fulfilled.
The difficulty is whether the planner could quickly and easily determine 
whether appropriate correlation data exists for the query plan it is 
estimating.

Regards,
Stephen Denne.






Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Craig Ringer

Gregory Youngblood wrote:

Also, a very informative read:
http://research.google.com/archive/disk_failures.pdf
In short, best thing to do is watch SMART and be prepared to try and
swap a drive out before it fails completely. :)
  
I currently have four brand new 1TB disks (7200RPM SATA - they're for 
our backup server). Two of them make horrible clicking noises - they're 
rapidly parking and unparking or doing constant seeks. One of those two 
also spins up very loudly, and on spin down rattles and buzzes.


Their internal SMART health check reports the two problem drives to be just 
fine, and both pass a short SMART self test (smartctl -d ata -t short). 
Both have absurdly huge seek_error_rate values, but the SMART thresholds 
see nothing wrong with this.


The noisy spin-down one is so defective that I can't even write data to 
it successfully, and the other problem disk has regular I/O errors and 
fails an extended SMART self test (though the short test passes).



I see this sort of thing regularly. Vendors are obviously setting the 
SMART health thresholds so that there's absolutely no risk of reporting 
an issue with a working drive, and in the process making it basically 
useless for detecting failing or faulty drives.


I rely on manual examination of the vendor attributes like the seek 
error rate, ECC recovered sectors, offline uncorrectable sectors 
(usually a REALLY bad sign if this grows), etc combined with regular 
extended SMART tests (which do a surface scan). Just using SMART - say, 
the basic health check - really isn't enough.


--
Craig Ringer



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Craig Ringer

[EMAIL PROTECTED] wrote:
You can?  I've never seen a RAID 5 setup with more than a single parity 
drive (or even the option of having more than one drive's worth of 
redundancy).  You can have hot-spare drives, but that's a different thing.


With RAID 4, where the parity drives are in fact dedicated to parity 
information, the controller could just store the parity data mirrored on 
more than one drive. Unfortunately write performance on RAID 4 is 
absolutely horrible, and a second or third parity disk would not help 
with that.


I suppose there's nothing stopping a controller adding a second disk's 
worth of duplicate parity information when striping a four or more disk 
RAID 5 array, but I thought that's basically what RAID 6 was.


--
Craig Ringer



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread david

On Wed, 19 Mar 2008, Craig Ringer wrote:


[EMAIL PROTECTED] wrote:
You can?  I've never seen a RAID 5 setup with more than a single parity 
drive (or even the option of having more than one drive's worth of 
redundancy).  You can have hot-spare drives, but that's a different thing.


With RAID 4, where the parity drives are in fact dedicated to parity 
information, the controller could just store the parity data mirrored on more 
than one drive. Unfortunately write performance on RAID 4 is absolutely 
horrible, and a second or third parity disk would not help with that.


I suppose there's nothing stopping a controller adding a second disk's worth 
of duplicate parity information when striping a four or more disk RAID 5 
array, but I thought that's basically what RAID 6 was.


Just duplicating the RAID 4 or 5 parity information will not help you if 
the parity drive is not one of the drives that fail.


RAID 6 uses a different parity algorithm, so that any two drives in the 
array can fail with no data loss.


Even this isn't completely error-proof.  I just went through a scare with a 
15-disk array where it reported 3 dead drives after a power outage.  One of 
the dead drives ended up being the hot-spare, and another drive that acted 
up (seek errors) worked well enough to let me eventually recover all the 
data, but it was a very scary week while I worked through this.


David Lang



Re: [PERFORM] Planner mis-estimation using nested loops followup

2008-03-18 Thread KC ESL

At 00:24 08/03/19, Matthew wrote:

On Tue, 18 Mar 2008, Chris Kratz wrote:
In moderately complex to very complex ad hoc queries in our system, 
we were consistently having the system massively underestimate the 
number of rows coming out of a join at a low level, making these 
queries very slow and inefficient.


I have long thought that perhaps Postgres should be a little more 
cautious about its estimates, and assume the worst case scenario 
sometimes, rather than blindly following the estimates from the 
statistics. The problem is that Postgres uses the statistics to 
generate best estimates of the cost. However, it does not take into 
account the consequences of being wrong. If it were more clever, 
it might be able to decide to use a non-optimal algorithm according to 
the best estimate, if the optimal algorithm has the possibility of 
blowing up to 1000 times the work when the estimates are off by a bit.


Such cleverness would be very cool, but (I understand) a lot of 
work. It would hopefully solve this problem.


Matthew


Just a crazy thought. If Postgres could check its own estimates or 
set some limits while executing the query and, if it found that the 
estimates were way off, fall back to a less optimal plan immediately 
or the next time, that would be cool.


KC 





Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Gregory Stark

[EMAIL PROTECTED] writes:

 On Tue, 18 Mar 2008, Gregory Stark wrote:

 You can have as many parity drives as you want with RAID 5 too.

 You can?  I've never seen a RAID 5 setup with more than a single parity drive
 (or even the option of having more than one drive's worth of redundancy).  You
 can have hot-spare drives, but that's a different thing.

 What controller/software lets you do this?

Hm, some research shows I may have completely imagined this.  I don't see why
you couldn't, but I can't find any evidence that this feature exists.  I could
have sworn I've seen it before, though.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Craig Ringer
[EMAIL PROTECTED] wrote:

 Just duplicating the RAID 4 or 5 parity information will not help you
 if the parity drive is not one of the drives that fail.

Good point - and no doubt why nothing supports extra disks' worth of
parity on RAID 5, which would be entirely useless (still only protecting
against a 1-disk failure but wasting more space).

Except, apparently, the earlier poster's RAID 5 controller that DOES
support extra parity disks.

It must just be hot spares; nothing else makes any sense.

 Even this isn't completely error-proof.  I just went through a scare with
 a 15-disk array where it reported 3 dead drives after a power outage.  One
 of the dead drives ended up being the hot-spare, and another drive that
 acted up (seek errors) worked well enough to let me eventually recover all
 the data, but it was a very scary week while I worked through this.

As file systems can be corrupted, files deleted, etc, I try to make sure
that all my data is sufficiently well backed up that a week's worth of
recovery effort is never needed. Dead array? Rebuild and restore from
backups. Admittedly this practice has arisen because of a couple of
scares much like you describe, but at least now it happens.

I even test the backups ;-)

Big SATA 7200rpm disks are so cheap compared to high performance SAS or
even 10kRPM SATA disks that it seems like a really bad idea not to have
a disk-based backup server with everything backed up quick to hand.

For that reason I'm absolutely loving PostgreSQL's WAL archiving feature
and support for a warm spare server. I can copy the WAL files to another
machine and immediately restore them there (providing a certain level of
inherent testing) as well as writing them to tape. It's absolutely
wonderful. Sure, the warm spare will run like a man in knee-deep mud,
but it'll do in an emergency.
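
For anyone wanting to try the same thing, a minimal sketch of the two
settings involved (the paths here are made up; archive_command goes in the
primary's postgresql.conf, restore_command in the warm spare's
recovery.conf, and in 8.2 the restore script must cope with segments that
haven't arrived yet):

# primary postgresql.conf: ship each completed WAL segment to the archive
archive_command = 'cp %p /mnt/wal_archive/%f'

# warm spare recovery.conf: replay archived segments as they arrive
restore_command = 'cp /mnt/wal_archive/%f %p'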

The existing database used by the app I'm working to replace is an
ISAM-based single host shared-file DB where all the user processes
access the DB directly. Concurrency is only supported through locking,
there's no transaction support, referential integrity checking, data
typing, no SQL of any sort, AND it's prone to corruption and data loss
if a user process is killed. User processes are killed when the user's
terminal is closed or loses its connection. Backups are only possible
once a day when all users are logged off. It's not an application where
losing half a day of data is fun. On top of all that it runs on SCO
OpenServer 5.0.5 (which has among other things the most broken C
toolchain I've ever seen).

So ... hooray for up-to-date, well tested backups and how easy
PostgreSQL makes them.

--
Craig Ringer



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Craig Ringer

Gregory Youngblood wrote:

In my way of thinking, and what I was referring to above, was using
those error conditions to identify drives to change before the reported
complete failures. Yes, that will mean changing drives before SMART
actually says there is a full failure, and you may have to fight to get
a drive replaced under warranty when you do so, but you are protecting
your data.

  
I actually find it surprisingly easy to get a disk replaced based on a 
printed SMART report showing uncorrectable sectors or just very high 
reallocated sector counts etc. Almost suspiciously easy. I would not be 
at all surprised if the disk vendors are, at least for their 7200rpm 
SATA disks, recording a black mark against the serial number, doing a 
low level reformat and sending them back out as a new disk to another 
customer. Some of the new disks I've received have lifetimes and logs 
that suggest they might be such refurbs - much longer test logs than 
most new drives for example, as well as earlier serial numbers than 
others ordered at the same time. They're also much, much more likely to 
be DOA or develop defects early.

I agree with you completely that waiting for SMART to actually indicate
a true failure is pointless due to the thresholds set by mfrs. But using
SMART for early warning signs still has value IMO.
  
I could not agree more.  smartmontools is right up there with tools like 
wireshark, mtr, and tcptraceroute in my most-vital toolbox, and it's 
mostly because of its ability to examine the vendor attributes and kick 
off scheduled self tests.


I've saved a great deal of dead-disk-replacement hassle by ensuring that 
smartd is configured to run extended self tests on the disks in all the 
machines I operate at least fortnightly, and short tests at least 
weekly. Being able to plan ahead to swap a dying disk is very nice indeed.


--
Craig Ringer



Re: [PERFORM] TB-sized databases

2008-03-18 Thread Ron Mayer

Ron Mayer wrote:

Tom Lane wrote:

Ron Mayer [EMAIL PROTECTED] writes:

Would another possible condition for considering
Cartesian joins be:
   * Consider Cartesian joins when a unique constraint can prove
 that at most one row will be pulled from one of the tables
 that would be part of this join?


What for?  That would still lead us to consider large numbers of totally
useless joins.


Often I get order-of-magnitude better query times by forcing the cartesian
join, even without multi-column indexes.


Ah - and sometimes even two orders of magnitude improvement.

1.1 seconds with the Cartesian join, 200 seconds if it
doesn't use it.



logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' 
and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = 
'?ID=370DC=LANG=PN=RealOne%20PlayerPV=6.0.11.818PT=OS=CM=CMV=LS=RE=RA=RV='
 and useragent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)' offset 0 ) as a;

                                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=1118.741..1119.207 rows=122 loops=1)
   ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=0.526..0.542 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=0.524..0.537 rows=1 loops=1)
               ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.168..0.170 rows=1 loops=1)
                     Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370DC=LANG=PN=RealOne%20PlayerPV=6.0.11.818PT=OS=CM=CMV=LS=RE=RA=RV='::text))
               ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.347..0.355 rows=1 loops=1)
                     Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)'::text)
   ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=1118.196..1118.491 rows=122 loops=1)
         Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
         ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=1115.565..1115.565 rows=0 loops=1)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=813.859..813.859 rows=1183470 loops=1)
                     Index Cond: (fact.uag_id = a.uag_id)
               ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual time=8.667..8.667 rows=13751 loops=1)
                     Index Cond: (fact.ref_id = a.ref_id)
 Total runtime: 1122.245 ms
(15 rows)

logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' 
and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = 
'?ID=370DC=LANG=PN=RealOne%20PlayerPV=6.0.11.818PT=OS=CM=CMV=LS=RE=RA=RV='
 and useragent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)' ) as a;

                                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=114138.193..200622.416 rows=122 loops=1)
   Hash Cond: (fact.ref_id = d_ref.ref_id)
   ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=1524.600..199522.182 rows=1183470 loops=1)
         ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.023..0.033 rows=1 loops=1)
               Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)'::text)
         ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=1524.562..197627.135 rows=1183470 loops=1)
               Recheck Cond: (fact.uag_id = d_uag.uag_id)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=758.888..758.888 rows=1183470 loops=1)
                     Index Cond: (fact.uag_id = d_uag.uag_id)
   ->  Hash  (cost=7.83..7.83 rows=1 width=127) (actual time=0.067..0.067 rows=1 loops=1)
         ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual

Re: [PERFORM] question on TRUNCATE vs VACUUM FULL

2008-03-18 Thread Chris




So my question is this: shouldn't VACUUM FULL clean Table C and reclaim
all its space?


You've got concepts mixed up.

TRUNCATE deletes all of the data from a particular table (and works in 
all DBMSs).


http://www.postgresql.org/docs/8.3/interactive/sql-truncate.html



VACUUM FULL is a postgres-specific command which does work behind the 
scenes to clean up MVCC left-overs. It does not remove any live rows 
from the table; it reclaims the space held by dead row versions.


http://www.postgresql.org/docs/current/interactive/sql-vacuum.html


The two have completely different uses and nothing to do with each other 
whatsoever.
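
To make the distinction concrete, a minimal sketch for the cycle described
(b and c here stand in for the 'holding' and 'work' tables):

-- empties the work table entirely and gives its disk space back immediately
TRUNCATE TABLE c;

-- marks dead row versions in the holding table as reusable space
-- (cannot remove rows still visible to any open transaction)
VACUUM ANALYZE b;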


--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Gregory Youngblood
On Wed, 2008-03-19 at 07:44 +0900, Craig Ringer wrote:

 Gregory Youngblood wrote:
  Also, a very informative read:
  http://research.google.com/archive/disk_failures.pdf
  In short, best thing to do is watch SMART and be prepared to try and
  swap a drive out before it fails completely. :)

 I currently have four brand new 1TB disks (7200RPM SATA - they're for 
 our backup server). Two of them make horrible clicking noises - they're 
 rapidly parking and unparking or doing constant seeks. One of those two 
 also spins up very loudly, and on spin down rattles and buzzes.
 
 Their internal SMART health check reports the two problem drives to be just
 fine, and both pass a short SMART self test (smartctl -d ata -t short).
 Both have absurdly huge seek_error_rate values, but the SMART thresholds
 see nothing wrong with this.


[snip]

In that Google report, one of their conclusions was that after the first
scan error, drives were 39 times more likely to fail within the next 60
days. And first errors in reallocations, etc. also correlated with higher
failure probabilities.

In my way of thinking, and what I was referring to above, was using
those error conditions to identify drives to change before the reported
complete failures. Yes, that will mean changing drives before SMART
actually says there is a full failure, and you may have to fight to get
a drive replaced under warranty when you do so, but you are protecting
your data.

I agree with you completely that waiting for SMART to actually indicate
a true failure is pointless due to the thresholds set by mfrs. But using
SMART for early warning signs still has value IMO.