Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-17 Thread Simon Riggs
 Iain
 Joe's example wasn't excluding partitions, as he didn't use a
 predicated UNION ALL view to select from. His queries use an indexed
 column that allows the various partitions to be probed at low cost,
 and he was satisfied with that.

Agreed - very very interesting design though.

 My point in my previous post was that you could still do all of that if
 you wanted to, by building the predicated view with UNION ALL of each of
 the child tables.


AFAICS, of all the designs proposed, there is still only one design *using
current PostgreSQL* that allows partitions to be excluded from queries as a
way of speeding up queries against very large tables: UNION ALL with
appended constants.
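
Roughly, such a view looks like the sketch below (the table names, the
partitioning column, and the constant values are only illustrative):

CREATE VIEW sales_all AS
  SELECT *, 2003 AS year FROM sales_2003
  UNION ALL
  SELECT *, 2004 AS year FROM sales_2004;

-- Constraining on the appended constant is what lets the planner skip
-- the arms whose constant can never match - the partition exclusion
-- being discussed here:
SELECT * FROM sales_all WHERE year = 2004;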

Best Regards, Simon Riggs



Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-17 Thread Manfred Koizar
On Thu, 16 Sep 2004 20:51:11 -0500, Stephen Crowley
[EMAIL PROTECTED] wrote:
explain analyze select * from history where date='2004-09-07' and
stock='ORCL' LIMIT 10;

  ->  Index Scan using island_history_date_stock_time on
island_history  (cost=0.00..183099.72 rows=102166 width=83) (actual
time=1612.000..1702.000 rows=10 loops=1)
  ^^
LIMIT 10 hides what would be the most interesting info here.  I don't
believe that
EXPLAIN ANALYSE SELECT * FROM history WHERE ...
consumes lots of memory.  Please try it.

And when you post the results please include your Postgres version, some
info about hardware and OS, and your non-default settings, especially
random_page_cost and effective_cache_size.

May I guess that the correlation of the physical order of tuples in your
table to the contents of the date column is pretty good (examine
correlation in pg_stats) and that island_history_date_stock_time is a
3-column index?
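
(If you want to check the correlation, something along these lines should
show it; I'm assuming the table is the island_history shown in the plan:)

SELECT attname, n_distinct, correlation
  FROM pg_stats
 WHERE tablename = 'island_history';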

It is well known that the optimizer overestimates the cost of index
scans in those situations.  This can be compensated to a certain degree
by increasing effective_cache_size and/or decreasing random_page_cost
(which might harm other planner decisions).
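
For a quick experiment you could change them for just your session before
re-running the query; the values below are purely illustrative, not
recommendations:

SET effective_cache_size = 50000;  -- measured in 8 kB pages, i.e. roughly 400 MB
SET random_page_cost = 2;          -- tell the planner random I/O is cheaper
EXPLAIN ANALYZE SELECT * FROM history WHERE date='2004-09-07' AND stock='ORCL';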

You could also try
CREATE INDEX history_date_stock ON history(date, stock);

This will slow down INSERTs and UPDATEs, though.

Servus
 Manfred


Re: [PERFORM] Tryint to match Solaris-Oracle performance with directio?

2004-09-17 Thread Gaetano Mendola
Mischa Sandberg wrote:
In the meantime, what I gather from browsing mail archives is that 
postgresql on Solaris seems to get hung up on IO rather than CPU.
Furthermore, I notice that Oracle and now MySQL use directio to bypass 
the system cache, when doing heavy writes to the disk; and Postgresql 
does not.

Not wishing to alter backend/storage/file for this test, I figured I could 
get a customer to mount the UFS volume for pg_xlog with the option 
forcedirectio.

Any comment on this? I haven't considered what the wal_sync_method is at 
this point; presumably it's defaulting to fdatasync on Solaris.

BTW this is Postgres 7.4.1, and our customers are Solaris 8 and 9.
If you care about your data, upgrade to the more recent 7.4.5.
Find out which sync method works best for you using src/tools/fsync, and do
experiment with changing the sync method. You can also avoid updating the
access time of the inodes by mounting the partition with the noatime option
(this, however, has more of an impact on performance for read activities).
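
For what it's worth, rather than presuming the default, you can check which
sync method is actually in effect from psql:

SHOW wal_sync_method;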
Regards
Gaetano Mendola


Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-17 Thread Stephen Crowley
Ok.. now I ran VACUUM FULL and things seem to be working as they should.

explain analyze select * from history where date='2004-09-07' and stock='MSFT';

Seq Scan on island_history  (cost=0.00..275359.13 rows=292274
width=83) (actual time=50.000..411683.000 rows=265632 loops=1)
  Filter: ((date = '2004-09-07'::date) AND ((stock)::text = 'MSFT'::text))
Total runtime: 412703.000 ms

random_page_cost and effective_cache_size are both default, 8 and 1000

explain analyze select * from history where date='2004-09-07' and stock='ORCL';

Index Scan using island_history_date_stock_time on island_history 
(cost=0.00..181540.07 rows=102166 width=83) (actual
time=551.000..200268.000 rows=159618 loops=1)
  Index Cond: ((date = '2004-09-07'::date) AND ((stock)::text = 'ORCL'::text))
Total runtime: 201009.000 ms

So now this is all in proportion and works as expected. The question
is, why would the fact that it needed to be vacuumed cause such a huge
hit in performance? When I vacuumed, it did free up nearly 25% of the
space.
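
(For anyone trying to reproduce this: a plain VACUUM VERBOSE on the table
reports how many dead row versions and pages it cleans up, which gives a
rough idea of how much extra data the earlier scans had to read through.
A sketch:)

VACUUM VERBOSE island_history;
-- the INFO lines report removed row versions and page counts per table/index
ANALYZE island_history;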

--Stephen

On Fri, 17 Sep 2004 22:44:05 +0200, Manfred Koizar [EMAIL PROTECTED] wrote:
 On Thu, 16 Sep 2004 20:51:11 -0500, Stephen Crowley
 [EMAIL PROTECTED] wrote:
 explain analyze select * from history where date='2004-09-07' and
 stock='ORCL' LIMIT 10;
 
   ->  Index Scan using island_history_date_stock_time on
 island_history  (cost=0.00..183099.72 rows=102166 width=83) (actual
 time=1612.000..1702.000 rows=10 loops=1)
   ^^
 LIMIT 10 hides what would be the most interesting info here.  I don't
 believe that
 EXPLAIN ANALYSE SELECT * FROM history WHERE ...
 consumes lots of memory.  Please try it.
 
 And when you post the results please include your Postgres version, some
 info about hardware and OS, and your non-default settings, especially
 random_page_cost and effective_cache_size.
 
 May I guess that the correlation of the physical order of tuples in your
 table to the contents of the date column is pretty good (examine
 correlation in pg_stats) and that island_history_date_stock_time is a
 3-column index?
 
 It is well known that the optimizer overestimates the cost of index
 scans in those situations.  This can be compensated to a certain degree
 by increasing effective_cache_size and/or decreasing random_page_cost
 (which might harm other planner decisions).
 
 You could also try
 CREATE INDEX history_date_stock ON history(date, stock);
 
 This will slow down INSERTs and UPDATEs, though.
 
 Servus
  Manfred

