[PERFORM] slow result

2007-01-23 Thread Laurent Manchon

Hi,

I am getting a slow response from my PostgreSQL 7.4 database using the query
below on a table with 800,000 rows:

select count(*) from tbl;

PostgreSQL returns the result in 28 seconds every time,
while MS-SQL returns it in 0.02 seconds every time.

My server is a DELL PowerEdge 2600 with two Xeon processors at 3.2 GHz
and 3 GB of RAM.


My PostgreSQL conf is:
log_connections = yes
syslog = 2
effective_cache_size = 5
sort_mem = 1
max_connections = 200
shared_buffers = 3000
vacuum_mem = 32000
wal_buffers = 8
max_fsm_pages = 2000
max_fsm_relations = 100

Can you tell me if there is a way to enhance performance?

Thank you





Laurent Manchon
Email: [EMAIL PROTECTED]


Re: [PERFORM] slow result

2007-01-23 Thread Heiko W.Rupp


On 23.01.2007, at 11:34, Laurent Manchon wrote:


Hi,

I am getting a slow response from my PostgreSQL 7.4 database using the
query below on a table with 800,000 rows:

select count(*) from tbl;


count(*) is doing a full table scan over all your 800,000 rows.  This is
a well-known feature of postgres :-/

So enhancing the performance is currently only possible by getting
faster disk drives.
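
(A widely used workaround, not from this mail: read the planner's row
estimate from the system catalogs instead of counting.  It is approximate
and only as fresh as the last VACUUM/ANALYZE:)

select reltuples::bigint as approx_rows
from pg_class
where relname = 'tbl';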

--
Heiko W.Rupp
   [EMAIL PROTECTED], http://www.dpunkt.de/buch/3-89864-429-4.html







Re: [PERFORM] slow result

2007-01-23 Thread A. Kretschmer
On Tue, 23.01.2007, at 11:34:52 +0100, Laurent Manchon wrote the following:
 Hi,
 
 I am getting a slow response from my PostgreSQL 7.4 database using the query
 below on a table with 800,000 rows:
 
 select count(*) from tbl;

If I remember correctly, I saw this question yesterday on another
list...


Answer:

Because PG forces a sequential scan. You can read a lot about this in the
archives. Here are some links to explanations:

http://www.pervasive-postgres.com/instantkb13/article.aspx?id=10117cNode=0T1L6L
http://sql-info.de/postgresql/postgres-gotchas.html#1_7
http://www.varlena.com/GeneralBits/49.php


Hope that helps, Andreas
-- 
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: see headers)
GnuPG-ID:   0x3FFF606C, private 0x7F4584DA   http://wwwkeys.de.pgp.net



Re: [PERFORM] slow result

2007-01-23 Thread Steinar H. Gunderson
On Tue, Jan 23, 2007 at 11:34:52AM +0100, Laurent Manchon wrote:
 I am getting a slow response from my PostgreSQL 7.4 database using the query
 below on a table with 800,000 rows:
 
 select count(*) from tbl;

Contrary to your expectations, this is _not_ a query you'd expect to be fast
in Postgres. Try real queries from your application instead -- most likely,
you'll find them to be much faster. (If not, come back with the query, the
schema and the EXPLAIN ANALYZE output of your query, and you'll usually get
help nailing down the issues. :-) )

/* Steinar */
-- 
Homepage: http://www.sesse.net/





[PERFORM] slow result

2007-01-23 Thread Laurent Manchon

Hi,

I am getting a slow response from my PostgreSQL 7.4 database using the query
below on a table with 800,000 rows:

select count(*) from tbl;

PostgreSQL returns the result in 28 seconds every time,
while MS-SQL returns it in 0.02 seconds every time.

My server is a DELL PowerEdge 2600 with two Xeon processors at 3.2 GHz
and 3 GB of RAM.


My PostgreSQL conf is:
log_connections = yes
syslog = 2
effective_cache_size = 5
sort_mem = 1
max_connections = 200
shared_buffers = 3000
vacuum_mem = 32000
wal_buffers = 8
max_fsm_pages = 2000
max_fsm_relations = 100

Can you tell me if there is a way to enhance performance?

Thank you





Laurent Manchon
Email: [EMAIL PROTECTED]


Re: [PERFORM] slow result

2007-01-23 Thread A. Kretschmer
On Tue, 23.01.2007, at 13:34:19 +0100, Laurent Manchon wrote the following:
 Hi,
 
 I am getting a slow response from my PostgreSQL 7.4 database using the query
 below on a table with 800,000 rows:
 
 select count(*) from tbl;

PLEASE READ THE ANSWERS FOR YOUR OTHER MAILS.


Andreas
-- 
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: see headers)
GnuPG-ID:   0x3FFF606C, private 0x7F4584DA   http://wwwkeys.de.pgp.net



[PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Tobias Brox
Does anyone have experience with using postgres for data warehousing?
Right, I saw one post suggesting to use mysql for a mostly read-only
database ... but anyway, changing the database platform is not an option
for this project, at least not today ;-)

Ralph Kimball seems to be some kind of guru on data warehousing, and
in his books he strongly recommends having a date dimension -
simply a table describing all dates in the system, with
attributes for what day of the week it is, month, day of the month,
week number, bank holiday, anything special, etc.  Well, it does make
sense if you add lots of information there that cannot easily be pulled
out from elsewhere - but as for now, I'm mostly only interested in
grouping turnover/profit by weeks/months/quarters/years/weekdays.  It
seems bloated to store this information; my gut feeling tells me it
should be better to generate it on the fly.  Postgres even allows
creating an index on an expression, as sketched below.
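
(A minimal sketch of such an expression index - the names are mine, and
"created" is assumed to be timestamp without time zone so that the indexed
expression is immutable:)

create index some_table_week_idx
    on some_table ((extract(week from created)));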

The question is ... I'm curious about what would yield the highest
performance - when choosing between:

  select extract(week from created), ...
  from some_table
  where ...
  group by extract(week from created), ...
  order by extract(week from created), ...

and:

  select date_dim.week_num, ...
  from some_table join date_dim ...
  where ...
  group by date_dim.week_num, ...
  order by date_dim.week_num, ...

The date_dim table would eventually cover ~3 years of operation, that
is less than 1000 rows.




Re: [PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Chad Wagner

On 1/23/07, Tobias Brox [EMAIL PROTECTED] wrote:

 Ralph Kimball seems to be some kind of guru on data warehousing, and
 in his books he strongly recommends having a date dimension -
 simply a table describing all dates in the system ...

I would tend to agree with this line of thought.

 ... but as for now, I'm mostly only interested in
 grouping turnover/profit by weeks/months/quarters/years/weekdays.  It
 seems bloated to store this information; my gut feeling tells me it
 should be better to generate it on the fly.  Postgres even allows
 creating an index on an expression.

I guess go with your gut, but at some point the expressions are going to be
too complicated to maintain, and inefficient.

Calendar tables are very, very common, because traditional date functions
simply can't express business logic (especially things like month-end close,
quarter-end close, and year-end close) that doesn't follow any repeating
pattern (every 4th Friday, 1st Monday in the quarter, etc.).  Sure, you can
stuff it into a function, but it just isn't as maintainable as a table -
see the sketch below.
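
(A sketch of the calendar-table idea - the table, its columns, and the
sales query are invented for illustration, not taken from this thread:)

create table calendar (
    cal_date             date primary key,
    week_num             int not null,
    fiscal_quarter       int not null,
    is_month_end_close   boolean not null default false,
    is_quarter_end_close boolean not null default false
);

-- however irregular the rule that sets a flag, using it is a plain lookup:
select sum(s.amount)
from sales s
join calendar c on c.cal_date = s.sale_date
where c.is_quarter_end_close;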




--
Chad
http://www.postgresqlforums.com/


Re: [PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Tobias Brox
[Chad Wagner - Tue at 08:24:34AM -0500]
 I guess go with your gut, but at some point the expressions are going to be
 too complicated to maintain, and inefficient.

The layout of my system is quite flexible, so it should eventually be
fairly trivial to throw in a date dimension at a later stage.

 Calendar tables are very, very common, because traditional date functions
 simply can't express business logic (especially things like month-end close,
 quarter-end close, and year-end close) that doesn't follow any repeating
 pattern (every 4th Friday, 1st Monday in the quarter, etc.).  Sure, you can
 stuff it into a function, but it just isn't as maintainable as a table.

So far I haven't been bothered with anything more complex than clean
weeks, months, quarters, etc.

I suppose the strongest argument for introducing date dimensions right
away is that I will probably benefit from having conformed and
well-designed dimensions when I introduce more data marts.  As for now
I have only one fact table and a few dimensions in the system.




Re: [PERFORM] slow result

2007-01-23 Thread Bill Moran
In response to Laurent Manchon [EMAIL PROTECTED]:
 
 I am getting a slow response from my PostgreSQL 7.4 database using the query
 below on a table with 800,000 rows:
 
 select count(*) from tbl;
 
 PostgreSQL returns the result in 28 seconds every time,
 while MS-SQL returns it in 0.02 seconds every time.
 
 My server is a DELL PowerEdge 2600 with two Xeon processors at 3.2 GHz
 and 3 GB of RAM.

While there's truth in everything that's been said by others, the query
should not take _that_ long.  I just tried a count(*) on a table with
460,000 rows, and it took less than a second.  count(*) in PostgreSQL
is not likely to compare well to most other RDBMSes for the reasons others
have stated, but counting 800,000 rows shouldn't take 28 seconds.

The standard question applies: have you vacuumed recently?
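
(If not, something along these lines - standard commands, with the table
name taken from the original post - would be the first thing to try:)

vacuum analyze tbl;       -- reclaims dead row versions and refreshes stats
vacuum full verbose tbl;  -- much slower and takes an exclusive lock, but
                          -- shrinks a badly bloated table on disk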

 My PostgreSQL conf is:
 log_connections = yes
 syslog = 2
 effective_cache_size = 5
 sort_mem = 1
 max_connections = 200
 shared_buffers = 3000
 vacuum_mem = 32000
 wal_buffers = 8
 max_fsm_pages = 2000
 max_fsm_relations = 100
 
 Can you tell me if there is a way to enhance performance?

On our 4G machines, we use shared_buffers=240000 (which equates to about
2G).  The only reason I don't set it higher is that FreeBSD has a limit on
shared memory of 2G.

The caveat here is that I'm running a mix of 8.1 and 8.2.  There have been
significant improvements in both the usage of shared memory, and the
optimization of count(*) since 7.4, so the first suggestion I have is to
upgrade your installation.

-- 
Bill Moran
Collaborative Fusion Inc.



Re: [PERFORM] slow result

2007-01-23 Thread Ron

At 07:34 AM 1/23/2007, Laurent Manchon wrote:

Hi,

I am getting a slow response from my PostgreSQL 7.4 database using the query
below on a table with 800,000 rows:


1= Upgrade to the latest stable version of pg.  That would be
8.2.x.  You are very much in the Dark Ages, pg-version-wise.

pg 8.x has significant IO enhancements, especially compared to 7.4.


select count(*) from tbl;

PostgreSQL returns the result in 28 seconds every time,
while MS-SQL returns it in 0.02 seconds every time.


2= pg actually counts how many rows there are in a table.  MS-SQL
looks up a count value from an internal data table... which can be
wrong in extraordinarily rare circumstances in an MVCC DBMS (which
MS-SQL is !not!.  MS-SQL uses the older hierarchical locking strategy
for data protection.)
Since pg actually scans the table for the count, pg's count will
always be correct.  No matter what.


Since MS-SQL does not use MVCC, it does not have to worry about the
corner MVCC cases that pg does.
OTOH, MVCC _greatly_ reduces the number of cases where one
transaction can block another, compared to the locking strategy used in MS-SQL.
This means that in real day-to-day operation, pg is very likely to handle
OLTP loads and heavy loads better than MS-SQL will.


In addition, MS-SQL is a traditional Codd & Date table-oriented
DBMS.  pg is an object-oriented DBMS.


Two very different products with very different considerations and 
goals (and initially designed at very different times historically.)


Compare them under real loads using real queries if you are going to 
compare them.  Comparing pg and MS-SQL using fluff queries like 
count(*) is both misleading and a waste of effort.




My server is a DELL PowerEdge 2600 with two Xeon processors at 3.2 GHz
and 3 GB of RAM.


My PostgreSQL conf is:
log_connections = yes
syslog = 2
effective_cache_size = 5
sort_mem = 1
max_connections = 200
shared_buffers = 3000
vacuum_mem = 32000
wal_buffers = 8
max_fsm_pages = 2000
max_fsm_relations = 100

Can you tell me if there is a way to enhance performance?

There are extensive FAQs on what the above values should be for
pg.  The lore is very different for pg 8.x vs. pg 7.x.



Thank you

You're welcome.

Ron Peacetree




[PERFORM] Postgres processes have a burst of CPU usage

2007-01-23 Thread Subramaniam Aiylam
Hello all,

  I have a setup in which four client machines access
a Postgres database (8.1.1) (on a Linux box). So,
there are connections from each machine to the
database; hence, the Linux box has about 2 postgres
processes associated with each machine.

  I am using the JDBC driver
(postgresql-8.1-404.jdbc3.jar) to talk to the
database. I am also using the Spring framework(1.2.2)
and Hibernate (3.0.5) on top of JDBC. I use Apache's
DBCP database connection pool (1.2.1).

  Now, there is one particular update that I make from
one of the client machines - this involves a
reasonably large object graph (from the Java point of
view). It deletes a bunch of rows (around 20 rows in
all) in 4-5 tables and inserts another bunch into the
same tables.

  When I do this, I see a big spike in the CPU usage
of postgres processes that are associated with ALL the
client machines, not just the one I executed the
delete/insert operation on. The spike seems to happen
a second or two AFTER the original update completes
and lasts for a few seconds.

  Is it that this operation is forcibly clearing some
client cache on ALL the postgres processes? Why is
there such an interdependency? Can I set some
parameter to turn this off?

Regards and thanks,
S.Aiylam




 




Re: [PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Merlin Moncure

On 1/23/07, Tobias Brox [EMAIL PROTECTED] wrote:

 Does anyone have experience with using postgres for data warehousing?
 Right, I saw one post suggesting to use mysql for a mostly read-only
 database ... but anyway, changing the database platform is not an option
 for this project, at least not today ;-)

 Ralph Kimball seems to be some kind of guru on data warehousing, and
 in his books he strongly recommends having a date dimension -
 simply a table describing all dates in the system, with
 attributes for what day of the week it is, month, day of the month,
 week number, bank holiday, anything special, etc.  Well, it does make
 sense if you add lots of information there that cannot easily be pulled
 out from elsewhere - but as for now, I'm mostly only interested in
 grouping turnover/profit by weeks/months/quarters/years/weekdays.  It
 seems bloated to store this information; my gut feeling tells me it
 should be better to generate it on the fly.  Postgres even allows
 creating an index on an expression.

 The question is ... I'm curious about what would yield the highest
 performance - when choosing between:

   select extract(week from created), ...
   from some_table
   where ...
   group by extract(week from created), ...
   order by extract(week from created), ...

 and:

   select date_dim.week_num, ...
   from some_table join date_dim ...
   where ...
   group by date_dim.week_num, ...
   order by date_dim.week_num, ...

 The date_dim table would eventually cover ~3 years of operation, that
 is less than 1000 rows.



In my opinion, I would make a date_dim table for this case.  I would
strongly advise against making a date_id field; just use the date
itself as the primary key (I wouldn't bother with RI links to the table
though).

I would also, however, make a function and use it to make the record:
create or replace function make_date_dim(in_date date) returns
date_dim as $$ [...]

And make date_dim records this way:
insert into date_dim select * from make_date_dim('01/01/2001'::date);

(or pre-insert with generate_series; a possible completion of the sketch
follows below).
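
(Specifically, a minimal completion - the column list is my own assumption,
not Merlin's, and the function is marked immutable so it can also be used
in an index:)

create table date_dim (
    date_key       date primary key,
    week_num       int not null,
    month_num      int not null,
    fiscal_quarter int not null,
    day_of_week    int not null
);

create or replace function make_date_dim(in_date date) returns date_dim as $$
    select $1,
           extract(week    from $1)::int,
           extract(month   from $1)::int,
           extract(quarter from $1)::int,
           extract(dow     from $1)::int
$$ language sql immutable;

-- pre-populating three years via generate_series:
insert into date_dim
select (make_date_dim(d::date)).*
from generate_series(date '2007-01-01', date '2009-12-31',
                     interval '1 day') d;
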
Now you get the best of both worlds: you can join to the table for the
general case, or index via the function for special-case indexes.  For
example, suppose you had to frequently count an account's sales by
fiscal quarter irrespective of year:

create index q_sales_idx on account_sale (account_no,
    ((make_date_dim(sale_date)).fiscal_quarter));

Also, you can use the function in place of a join if you want.  In some
cases the join may be better, though.

merlin



Re: [PERFORM] slow result

2007-01-23 Thread Bruno Wolff III
On Tue, Jan 23, 2007 at 11:34:52 +0100,
  Laurent Manchon [EMAIL PROTECTED] wrote:
 Hi,
 
 I am getting a slow response from my PostgreSQL 7.4 database using the query
 below on a table with 800,000 rows:
 
 select count(*) from tbl;
 
 PostgreSQL returns the result in 28 seconds every time,
 while MS-SQL returns it in 0.02 seconds every time.

Besides the other advice mentioned in this thread, check that you don't
have a lot of dead tuples in that table.  28 seconds seems a bit high
for even a sequential scan of 800,000 tuples, unless they are pretty large.



[PERFORM] Bad Row Count Estimate on View with 8.2

2007-01-23 Thread Dave Dutcher
Hello,

I discovered a query which is taking 70 seconds on 8.2.1 which used to take
under a second on 8.1.2.  I was digging into what was causing it and I
believe the problem is a view which the planner estimates will return 1 row
when it actually returns 3500.  When I join with the view, the planner ends
up using a nested loop because it thinks the right branch will run once
instead of 3500 times.  I've analyzed all the tables and played around with
the default_statistics_target (sketched below), but still the planner
estimates 1 row.  I was wondering if anybody else has any other ideas?
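
(For reference, a sketch of that kind of experiment - illustrative
commands only, the 1000 being an arbitrary example value:)

set default_statistics_target = 1000;  -- affects subsequent ANALYZE runs
analyze om_position;

-- or per column:
alter table om_position alter column as_of_date set statistics 1000;
analyze om_position;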

Here is the query the view is defined as:

SELECT foo.fund_id, foo.owner_trader_id, foo.strategy_id, foo.cf_account_id,
foo.instrument_id, sum(foo.pos) AS pos, sum(foo.cost) AS cost
FROM 
( 
SELECT om_position.fund_id, om_position.owner_trader_id,
om_position.strategy_id, om_position.cf_account_id,
om_position.instrument_id, om_position.pos, om_position.cost
FROM om_position
WHERE om_position.as_of_date = date(now())
UNION ALL 
SELECT om_trade.fund_id, om_trade.owner_trader_id,
om_trade.strategy_id, om_trade.cf_account_id, om_trade.instrument_id,
om_trade.qty::numeric(22,9) AS pos, om_trade.cost
FROM om_trade
WHERE om_trade.process_state = 0 OR om_trade.process_state = 2
) foo
GROUP BY foo.fund_id, foo.owner_trader_id, foo.strategy_id,
foo.cf_account_id, foo.instrument_id;



Here is explain analyze from both 8.1.2 and 8.2.1 with
default_statistics_target=10 and tables freshly analyzed:




8.1.2
HashAggregate  (cost=4760.33..4764.95 rows=308 width=168) (actual
time=56.873..71.293 rows=3569 loops=1)
  ->  Append  (cost=0.00..4675.85 rows=3072 width=54) (actual
time=0.037..38.261 rows=3715 loops=1)
        ->  Index Scan using as_of_date_om_position_index on om_position
(cost=0.00..4637.10 rows=3071 width=54) (actual time=0.031..14.722 rows=3559
loops=1)
              Index Cond: (as_of_date = date(now()))
        ->  Bitmap Heap Scan on om_trade  (cost=4.01..8.03 rows=1 width=48)
(actual time=0.118..0.917 rows=156 loops=1)
              Recheck Cond: ((process_state = 0) OR (process_state = 2))
              ->  BitmapOr  (cost=4.01..4.01 rows=1 width=0) (actual
time=0.079..0.079 rows=0 loops=1)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..2.00 rows=1 width=0)
(actual time=0.060..0.060 rows=156 loops=1)
                          Index Cond: (process_state = 0)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..2.00 rows=1 width=0)
(actual time=0.008..0.008 rows=0 loops=1)
                          Index Cond: (process_state = 2)
Total runtime: 82.398 ms

8.2.1
HashAggregate  (cost=6912.51..6912.53 rows=1 width=200) (actual
time=19.005..24.137 rows=3569 loops=1)
  ->  Append  (cost=0.00..6406.73 rows=28902 width=200) (actual
time=0.037..11.569 rows=3715 loops=1)
        ->  Index Scan using as_of_date_om_position_index on om_position
(cost=0.00..4333.82 rows=2964 width=53) (actual time=0.035..4.884 rows=3559
loops=1)
              Index Cond: (as_of_date = date(now()))
        ->  Bitmap Heap Scan on om_trade  (cost=464.40..1783.89 rows=25938
width=49) (actual time=0.060..0.380 rows=156 loops=1)
              Recheck Cond: ((process_state = 0) OR (process_state = 2))
              ->  BitmapOr  (cost=464.40..464.40 rows=308 width=0) (actual
time=0.041..0.041 rows=0 loops=1)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..225.72 rows=154 width=0)
(actual time=0.032..0.032 rows=156 loops=1)
                          Index Cond: (process_state = 0)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..225.72 rows=154 width=0)
(actual time=0.003..0.003 rows=0 loops=1)
                          Index Cond: (process_state = 2)
Total runtime: 27.193 ms





Here is explain analyze from 8.2.1 with default_statistics_target=1000 and
tables freshly analyzed:




HashAggregate  (cost=5344.36..5344.37 rows=1 width=200) (actual
time=18.826..23.950 rows=3569 loops=1)
  ->  Append  (cost=0.00..5280.01 rows=3677 width=200) (actual
time=0.031..11.606 rows=3715 loops=1)
        ->  Index Scan using as_of_date_om_position_index on om_position
(cost=0.00..5224.44 rows=3502 width=54) (actual time=0.029..4.903 rows=3559
loops=1)
              Index Cond: (as_of_date = date(now()))
        ->  Bitmap Heap Scan on om_trade  (cost=9.91..18.79 rows=175
width=49) (actual time=0.069..0.394 rows=156 loops=1)
              Recheck Cond: ((process_state = 0) OR (process_state = 2))
              ->  BitmapOr  (cost=9.91..9.91 rows=2 width=0) (actual
time=0.050..0.050 rows=0 loops=1)
                    ->  Bitmap Index Scan on
om_trade_partial_process_state_index  (cost=0.00..5.57 rows=2 width=0)
(actual time=0.039..0.039 rows=156 loops=1)
                          Index Cond: (process_state = 0)
                    ->  Bitmap Index Scan on

Re: [PERFORM] slow result

2007-01-23 Thread Mark Kirkwood

Laurent Manchon wrote:

Hi,

I am getting a slow response from my PostgreSQL 7.4 database using the query
below on a table with 800,000 rows:

select count(*) from tbl;

PostgreSQL returns the result in 28 seconds every time.



Can you post the results of:

analyze verbose tbl;
explain analyze select count(*) from tbl;

The first will give us some info about how many pages tbl has (in 7.4 
ISTR it does not state the # of dead rows... but anyway), the second 
should help us deduce why it is so slow.


Also as others have pointed out, later versions are quite a bit faster 
for sequential scans...


Cheers

Mark



Re: [PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Mark Kirkwood

Tobias Brox wrote:



I suppose the strongest argument for introducing date dimensions right
away is that I will probably benefit from having conformed and
well-designed dimensions when I introduce more data marts.  As for now
I have only one fact table and a few dimensions in the system.



Another factor to consider is that end-user tools (and end users) may
find a date/time dimension helpful.


Best wishes

Mark



[PERFORM] how to plan for vacuum?

2007-01-23 Thread Galy Lee
Hi,

Since I cannot find much information about how to use vacuum, I want
to ask for some general guidelines on vacuum planning.

1. How do I know if autovacuum is enough for my application, or whether
I should set up a manual vacuum from cron (see the sketch below)?
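
(A sketch of the cron alternative - the schedule and path are invented;
the vacuumdb options are standard:)

# crontab entry: vacuum and analyze all databases nightly at 03:00
0 3 * * * /usr/bin/vacuumdb --all --analyze --quiet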

2. How to set the GUC parameters for autovacuum?
There are two sets of parameters for autovacuum:
- vacuum threshold and scale factor (500/0.2)
- analyze threshold and scale factor (250/0.1)
Is there any guideline for setting these parameters?  When do the
default values need to be changed?
 
3. How to tune the cost-based vacuum delay?
I have searched the performance list; it seems that most of the practice
is based on experience / a trial-and-error approach to meet disk or CPU
utilization requirements.  Is there any other guideline for setting these
parameters?  (The knobs in question are illustrated below.)
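
(For concreteness, these are the parameters the two questions above refer
to, shown with the defaults mentioned there - the comments are illustrative,
not recommendations:)

# autovacuum vacuums a table when
#   dead tuples > autovacuum_vacuum_threshold
#                 + autovacuum_vacuum_scale_factor * reltuples
autovacuum_vacuum_threshold = 500
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_threshold = 250
autovacuum_analyze_scale_factor = 0.1

# cost-based vacuum delay: vacuum sleeps vacuum_cost_delay ms each time
# it has accumulated vacuum_cost_limit cost points
vacuum_cost_delay = 0                # 0 disables the delay
vacuum_cost_limit = 200
autovacuum_vacuum_cost_delay = -1    # -1 means: fall back to vacuum_cost_delay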

Once autovacuum is turned on by default, poorly chosen vacuum parameters
will make the system rather unstable.  So I wonder if we should add a
section to the manual with vacuum tips, so that users can easily set the
vacuum parameters for their systems.

Best Regards
Galy Lee
NTT OSS Center



Re: [PERFORM] Bad Row Count Estimate on View with 8.2

2007-01-23 Thread Tom Lane
Dave Dutcher [EMAIL PROTECTED] writes:
 I discovered a query which is taking 70 seconds on 8.2.1 which used to take
 under a second on 8.1.2.  I was digging into what was causing it and I
 believe the problem is a view which the planner estimates will return 1 row
 when it actually returns 3500.

This is evidently a failure of estimate_num_groups().  However, I do not
see any difference in that code between 8.1 and 8.2 branch tips.  I do
notice a possibly-relevant change that was applied in 8.1.4:

2006-05-02 00:34  tgl

* src/backend/: optimizer/path/allpaths.c, utils/adt/selfuncs.c
(REL8_1_STABLE): Avoid assuming that statistics for a parent
relation reflect the properties of the union of its child relations
as well.  This might have been a good idea when it was originally
coded, but it's a fatally bad idea when inheritance is being used
for partitioning.  It's better to have no stats at all than
completely misleading stats.  Per report from Mark Liberman.

The bug arguably exists all the way back, but I've only patched
HEAD and 8.1 because we weren't particularly trying to support
partitioning before 8.1.

Eventually we ought to look at deriving union statistics instead of
just punting, but for now the drop kick looks good.

I think this was only meant to apply to table inheritance situations,
but on reflection it might affect UNION queries too.  The question is
whether the numbers it was using before really mean anything --- they
seem to have been better-than-nothing in your particular case, but I'm
not sure that translates to a conclusion that we should depend on 'em.

In fact, since there isn't any parent relation in a UNION, I'm not
sure that this patch actually changed your results ... but I'm not
seeing what else would've ...

regards, tom lane



Re: [PERFORM] how to plan for vacuum?

2007-01-23 Thread Galy Lee
Here is one example:

workload:       pgbench running 24x7, all year round
database size:  100GB

the workload distribution:
  06:00-24:00   100 tps
  00:00-06:00    20 tps

How should we plan vacuum for this situation to get the highest performance?

Best regards
Galy

Galy Lee wrote:
 Hi,
 
 Since I cannot find much information about how to use vacuum, I want
 to ask for some general guidelines on vacuum planning.
 
 1. How do I know if autovacuum is enough for my application, or whether
 I should set up a manual vacuum from cron?
 
 2. How to set the GUC parameters for autovacuum?
 There are two sets of parameters for autovacuum:
 - vacuum threshold and scale factor (500/0.2)
 - analyze threshold and scale factor (250/0.1)
 Is there any guideline for setting these parameters?  When do the
 default values need to be changed?
 
 3. How to tune the cost-based vacuum delay?
 I have searched the performance list; it seems that most of the practice
 is based on experience / a trial-and-error approach to meet disk or CPU
 utilization requirements.  Is there any other guideline for setting these
 parameters?
 
 Once autovacuum is turned on by default, poorly chosen vacuum parameters
 will make the system rather unstable.  So I wonder if we should add a
 section to the manual with vacuum tips, so that users can easily set the
 vacuum parameters for their systems.
