Re: [PERFORM] Need indexes on empty tables for good performance ?

2005-08-27 Thread Bruno Wolff III
On Tue, Aug 23, 2005 at 13:41:32 +1000,
  Lenard, Rohan (Rohan) [EMAIL PROTECTED] wrote:
 I've read that indexes aren't used for COUNT(*) and I've noticed (7.3.x)
 with EXPLAIN that indexes never seem to be used on empty tables - is
 there any reason to have indexes on empty tables, or will postgresql
 never use them.

count will use indexes if appropriate. The counts themselves are NOT in the
indexes, so counts of significant fractions of a table (in particular
of the whole table) won't benefit from indexes.

You aren't going to get query speed ups by putting indexes on empty tables.
However, they may be required if you have unique or primary keys declared
in the table. You may want them to enforce some kinds of constraints.
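
(For illustration, a minimal sketch with a made-up table name: the unique
index behind a PRIMARY KEY is created, and enforces the constraint, even
while the table is still empty, although no query against an empty table is
sped up by it.)

CREATE TABLE queue (
    id   integer PRIMARY KEY,   -- implicitly creates a unique index on id
    note text
);

INSERT INTO queue VALUES (1, 'first');
INSERT INTO queue VALUES (1, 'again');   -- rejected by that index: unique violation

-- Once the table has data, a selective count can use an index, while a count
-- of the whole table still cannot:
-- SELECT count(*) FROM queue WHERE id = 1;   -- can use the index
-- SELECT count(*) FROM queue;                -- sequential scan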

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performance for relative large DB

2005-08-27 Thread Chris Travers

tobbe wrote:


Hi Chris.

Thanks for the answer.
Sorry that I was a bit unclear.

1) We update around 20.000 posts per night.

2) What I meant was that we suspect that the DBMS called PervasiveSQL
that we are using today is much too small. That's why we're looking for
alternatives.

Today we base our solution largely on query-specific tables created
at night, so instead of running queries directly against the post table (with
4-6M rows) during the day, we have the data pre-aligned in several much
smaller tables. This is just to make the current DBMS cope with our
amount of data.

What I am particularly interested in is whether we can expect to run all our
select queries directly against the post table with PostgreSQL.
 

20k transactions per day?  Doesn't seem too bad.  That amounts to how 
many transactions per second during peak times?  Personally I don't 
think it will be a problem, but you might want to clarify what sort of 
load you are expecting during its peak time.



3) How well does postgres work in load-balancing environments? Is it
built in?
 

There is no load balancing built in.  You would need to use Slony-I 
and possibly Pg-Pool for that.  I don't know about Pg-Pool, but Slony-I 
was written in large part by member(s?) of the core development team, so 
even if it is not built in, it is not as if it was written by a team of 
outsiders.

If you need something proprietary, there are similar solutions with 
replication built in which are based on PostgreSQL and licensed under 
proprietary licenses.


Best Wishes,
Chris Travers
Metatron Technology Consulting

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Weird performance drop after VACUUM

2005-08-27 Thread Umit Oztosun
On Cum, 2005-08-26 at 19:31 -0400, Tom Lane wrote:
 I think the problem is that the planner is underestimating the cost of
 evaluating this complicated filter condition:
 
  ->  Seq Scan on scf_irsaliye irs  (cost=0.00..30.00 rows=1 
  width=20) (actual time=0.290..0.290 rows=0 loops=1)
Filter: (((karsifirma)::text = 'KENDI'::text) AND 
  (((turu)::text = 'MAI'::text) OR ((turu)::text = 'KGI'::text) OR 
  ((turu)::text = 'PS'::text) OR ((turu)::text = 'TS'::text) OR ((turu)::text 
  = 'KC'::text) OR ((turu)::text = 'KCO'::text)) AND 
  (((_key_sis_depo_dest)::text = '003l$1$$'::text) OR 
  ((_key_sis_depo_dest)::text = '0048$1$$'::text) OR 
  ((_key_sis_depo_dest)::text = '004b$1$$'::text) OR 
  ((_key_sis_depo_dest)::text = '004d$1$$'::text)) AND (tarih = 
  '2005-08-26'::date))
 
 While you could attack that by raising the cpu_operator_cost parameter,
 it would also be worth inquiring *why* the condition is so expensive to
 evaluate.  I am suspicious that you are running the database in a locale
 in which strcoll() is really slow.  Can you run it in C locale instead,
 or do you really need locale-aware behavior?  Can you switch to a
 different database encoding?  (A single-byte encoding such as Latin1
 might be faster than UTF8, for example.)
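
(For reference, a sketch of how one might try that, assuming an 8.0-era
server: the encoding can be picked per database, while the locale is fixed
cluster-wide when the cluster is initialized.)

CREATE DATABASE test_latin1 WITH ENCODING 'LATIN1';
-- the locale, by contrast, is chosen at initdb time, e.g.:
--   initdb -D /path/to/data --locale=C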

Yes, you are perfectly right. We are using UTF8 and tr_TR.UTF8 locale.
However, I tried the same tests with latin1 and the C locale; it is surely
faster, but not dramatically so:

                        Before Vacuum    After Vacuum
UTF8 and tr_TR.UTF8:        ~8 s            ~110 s
latin1 and C:               ~7 s             ~65 s

I also played with the cpu_operator_cost parameter, and it dramatically
reduced query times, but not to the level before vacuum:

                        Before Vacuum    After Vacuum
UTF8 and tr_TR.UTF8:        ~8 s             ~11 s
latin1 and C:               ~7 s              ~9 s

These values are much better, but I really wonder if I can reach the
performance levels we had before the vacuum. I am also worried about the
side-effects that may be caused by the non-default cpu_operator_cost
parameter.
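
(For anyone who wants to experiment the same way: the parameter can be changed
for a single session before touching postgresql.conf. The value below is only
an example, not a recommendation.)

SET cpu_operator_cost = 0.01;   -- server default is 0.0025; 0.01 is just an example
EXPLAIN ANALYZE SELECT ... ;    -- re-run the problem query and compare timings
RESET cpu_operator_cost;        -- back to the default for this session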

 Another possibility is to take a hard look at whether you can't simplify
 the filter condition, but that'd require more knowledge of your
 application than I have.

Yes, that is another option; we are even considering schema changes to
use fewer character types, but these are really costly and error-prone
operations at the moment.

 Or you could just play with the order of the filter conditions ... for
 example, the date condition at the end is probably far cheaper to test
 than the text comparisons, so if that's fairly selective it'd be worth
 putting it first.
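
(For illustration, the kind of reordering being suggested, using the column
names from the plan quoted above -- IN is just shorthand for the OR chains,
and this is a sketch rather than the exact query:)

WHERE tarih = '2005-08-26'::date                       -- cheap, selective test first
  AND karsifirma = 'KENDI'
  AND turu IN ('MAI', 'KGI', 'PS', 'TS', 'KC', 'KCO')
  AND _key_sis_depo_dest IN ('003l$1$$', '0048$1$$',
                             '004b$1$$', '004d$1$$')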

We are experimenting on this.

Thanks for your help!

Best Regards,
Umit Oztosun


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Weird performance drop after VACUUM

2005-08-27 Thread Steinar H. Gunderson
On Fri, Aug 26, 2005 at 07:31:51PM -0400, Tom Lane wrote:
 Or you could just play with the order of the filter conditions ... for
 example, the date condition at the end is probably far cheaper to test
 than the text comparisons, so if that's fairly selective it'd be worth
 putting it first.

That's an interesting approach -- could the planner do such things itself?

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Inefficient queryplan for query with intersectable

2005-08-27 Thread Arjen van der Meijden



On 27-8-2005 0:56, Tom Lane wrote:

> Arjen van der Meijden [EMAIL PROTECTED] writes:
>> As said, it chooses sequential scans or the wrong index plans over a
>> perfectly good plan that is just not selected when the parameters are
>> too well tuned or sequential scanning of the table is allowed.
>
> I think some part of the problem comes from using inconsistent
> datatypes.  For instance, it seems very odd that the thing is not
> using a hash or something to handle
>
>   t_0.Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)
>
> seeing that it correctly guesses there are only going to be about 8 rows
> in the union.  Part of the reason is that cat2 is smallint, whereas the
> output of the union must be at least int, maybe wider depending on the
> datatype of cat.id (which you did not show us); so the comparison isn't
> hashable.  Even a smallint vs int comparison would be mergejoinable,
> though, so I'm really wondering what cat.id is.


cat.id is a smallint. I replaced that subquery with these two:
t_0.Cat2 IN (SELECT '545'::smallint UNION SELECT ID FROM cat WHERE 
ParentID = '545'::smallint)


t_0.Cat2 IN (SELECT '545' UNION SELECT ID FROM cat WHERE ParentID = '545')

But apparently there is a bug in the explain mechanism of the 8.1devel 
I'm using (I downloaded a nightly from 25 August, sometime in the morning 
(CEST)), since it returned:

ERROR:  bogus varno: 9

So I can't see whether the plan changed; execution times didn't change 
much. I also replaced the subselect with the result of that query (i.e. 
('545', '546', ...)), but that didn't seem to make much difference in 
the execution time either. The plan did change, of course: it used a 
BitmapOr of 8 Bitmap Index Scans over the pwprodukten.


By the way, as far as I know, this is the only datatype mismatch in the 
query.



> Another big part of the problem comes from poor result size estimation.
> I'm not sure you can eliminate that entirely given the multiple
> conditions on different columns (which'd require cross-column statistics
> to really do well, which we do not have).  But you could avoid
> constructs like
>
> WHERE ... t_1.recordtimestamp >=
>   (SELECT max_date - 60 FROM last_dates WHERE table_name = 'pricetracker')
>
> The planner is basically going to throw up its hands and make a default
> guess on the selectivity of this; it's not smart enough to decide that
> the sub-select probably represents a constant.  What I'd do with this
> is to define a function marked STABLE for the sub-select result, perhaps
> something like
>
> [...]
>
> need.)  Then write the query like
>
> WHERE ... t_1.recordtimestamp >= get_last_date('pricetracker', 60)
>
> In this formulation the planner will be able to make a reasonable guess
> about how many rows will match ... at least if your statistics are up
> to date ...
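
(For illustration only: this is not the snipped definition, just a minimal
sketch of such a STABLE function, assuming last_dates has the table_name and
max_date columns used in the quoted sub-select and that max_date is a date.)

CREATE OR REPLACE FUNCTION get_last_date(text, integer) RETURNS date AS $$
    SELECT max_date - $2 FROM last_dates WHERE table_name = $1;
$$ LANGUAGE sql STABLE;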


I tried such a function and also tried replacing it with the fixed 
outcome of that subquery itself. Although that gives a considerably more 
accurate estimate of the rows returned, it doesn't seem to affect the 
basic plan much. It does make the sub-query itself use another index 
(the one on recordtimestamp alone, rather than the combined index on 
leverancierid and recordtimestamp).

With that changed subquery it estimates about 4173 rows over 4405 real rows.

Actually with the adjusted or original query, it seems to favor the hash 
join over a nested loop, but the rest of the plan (for the subqueries) 
seems to be exactly the same.


Here is the first part of the explain analyze when it can do any trick 
it wants:
 Hash Join  (cost=7367.43..186630.19 rows=132426 width=12) (actual 
time=191.726..11072.025 rows=58065 loops=1)

   Hash Cond: ("outer".produktid = "inner".id)
   ->  Seq Scan on pwprijs chart_2  (cost=0.00..137491.07 rows=7692207 
width=16) (actual time=0.018..6267.744 rows=7692207 loops=1)
   ->  Hash  (cost=7366.02..7366.02 rows=565 width=4) (actual 
time=123.265..123.265 rows=103 loops=1)
         ->  SetOp Intersect  (cost=7332.10..7360.37 rows=565 width=4) 
(actual time=115.760..123.192 rows=103 loops=1)

[snip]  

And here is the first (and last) part when I disable hash joins or seq 
scans:
 Nested Loop  (cost=7334.92..517159.39 rows=132426 width=12) (actual 
time=111.905..512.575 rows=58065 loops=1)
   ->  SetOp Intersect  (cost=7332.10..7360.37 rows=565 width=4) 
(actual time=111.588..120.035 rows=103 loops=1)

[snip]
   ->  Bitmap Heap Scan on pwprijs chart_2  (cost=2.82..895.85 rows=234 
width=16) (actual time=0.344..2.149 rows=564 loops=103)

         Recheck Cond: (chart_2.produktid = "outer".id)
         ->  Bitmap Index Scan on pwprijs_produktid_idx 
(cost=0.00..2.82 rows=234 width=0) (actual time=0.189..0.189 rows=564 
loops=103)

               Index Cond: (chart_2.produktid = "outer".id)

Is a nested loop normally so much (3x) more costly than a hash join? Or 
is it just this query that gets estimated wrongly?


Best regards,

Arjen

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] Inefficient queryplan for query with intersectable

2005-08-27 Thread Tom Lane
Arjen van der Meijden [EMAIL PROTECTED] writes:
 But apparently there is a bug in the explain mechanism of the 8.1devel 
 I'm using (I downloaded a nightly from 25 August, sometime in the morning 
 (CEST)), since it returned:
 ERROR:  bogus varno: 9

Yeah, someone else sent in a test case for this failure (or at least one
with a similar symptom) yesterday.  I'll try to fix it today.

 Is a nested loop normally so much (3x) more costly than a hash join? Or 
 is it just this query that gets estimated wrongly?

There's been some discussion that we are overestimating the cost of
nestloops in general, because we don't take into account that successive
scans of the inner relation are likely to find many pages already in
cache from the earlier scans.  So far no one's come up with a good cost
model to use for this, though.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Inefficient queryplan for query with

2005-08-27 Thread Ron

At 10:27 AM 8/27/2005, Tom Lane wrote:

> Arjen van der Meijden [EMAIL PROTECTED] writes:
>> But apparently there is a bug in the explain mechanism of the 8.1devel
>> I'm using (I downloaded a nightly from 25 August, sometime in the morning
>> (CEST)), since it returned:
>> ERROR:  bogus varno: 9
>
> Yeah, someone else sent in a test case for this failure (or at least one
> with a similar symptom) yesterday.  I'll try to fix it today.
>
>> Is a nested loop normally so much (3x) more costly than a hash join? Or
>> is it just this query that gets estimated wrongly?
>
> There's been some discussion that we are overestimating the cost of
> nestloops in general, because we don't take into account that successive
> scans of the inner relation are likely to find many pages already in
> cache from the earlier scans.  So far no one's come up with a good cost
> model to use for this, though.
>
> regards, tom lane

It certainly seems common in the EXPLAIN ANALYZE output I see that 
the (estimated) cost of Nested Loop is far higher than the actual 
time measured.


What happened when someone tried the naive approach of telling the 
planner to estimate the cost of a nested loop based on fitting whatever 
entities are involved in the nested loop into RAM as much as 
possible?  When there are multiple such mappings, it would use whichever 
one results in the lowest cost for the NL in question.


Clearly, this should lead to an underestimate of the constant cost of 
the operation involved, but since nested loops have the only 
polynomial growth function among the planner's choices, NLs should 
still have a decent chance of being more expensive than other choices 
under most circumstances.


In addition, if those costs are based on actual measurements of how 
long it takes to do such scans then the estimated cost has a decent 
chance of being fairly accurate under such circumstances.


It might not work well, but it seems like a reasonable first attempt 
at a solution?

Ron Peacetree



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Weird performance drop after VACUUM

2005-08-27 Thread Steinar H. Gunderson
On Sat, Aug 27, 2005 at 11:05:01AM -0400, Tom Lane wrote:
 It could, but it doesn't really have enough information.  We don't
 currently have any model that some operators are more expensive than
 others.  IIRC the only sort of reordering the current code will do
 in a filter condition list is to push clauses involving sub-SELECTs
 to the end.

I was more thinking along the lines of reordering a AND/OR b to b AND/OR
a if b has lower selectivity than a.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Weird performance drop after VACUUM

2005-08-27 Thread Tom Lane
Steinar H. Gunderson [EMAIL PROTECTED] writes:
 On Sat, Aug 27, 2005 at 11:05:01AM -0400, Tom Lane wrote:
 It could, but it doesn't really have enough information.  We don't
 currently have any model that some operators are more expensive than
 others.  IIRC the only sort of reordering the current code will do
 in a filter condition list is to push clauses involving sub-SELECTs
 to the end.

 I was more thinking along the lines of reordering a AND/OR b to b AND/OR
 a if b has lower selectivity than a.

Yeah, but if b is considerably more expensive to evaluate than a, that
could still be a net loss.  To do it correctly you really need to trade
off cost of evaluation against selectivity, and the planner currently
only knows something about the latter (and all too often, not enough :-().

I'd like to do this someday, but until we get some cost info in there
I think it'd be a mistake to do much re-ordering of conditions.
Currently the SQL programmer can determine what happens by writing his
query carefully --- if we reorder based on selectivity only, we could
make things worse, and there'd be no way to override it.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Weird performance drop after VACUUM

2005-08-27 Thread Tom Lane
Steinar H. Gunderson [EMAIL PROTECTED] writes:
 On Fri, Aug 26, 2005 at 07:31:51PM -0400, Tom Lane wrote:
 Or you could just play with the order of the filter conditions ... for
 example, the date condition at the end is probably far cheaper to test
 than the text comparisons, so if that's fairly selective it'd be worth
 putting it first.

 That's an interesting approach -- could the planner do such things itself?

It could, but it doesn't really have enough information.  We don't
currently have any model that some operators are more expensive than
others.  IIRC the only sort of reordering the current code will do
in a filter condition list is to push clauses involving sub-SELECTs
to the end.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Limit + group + join

2005-08-27 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
 joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
 c.id ORDER BY c.id DESC LIMIT 5;
 [ fails to pick an available index-scan-backward plan ]

I looked into this and found that indeed the desirable join plan was
getting generated, but it wasn't picked because query_planner didn't
have an accurate idea of how much of the join needed to be scanned to
satisfy the GROUP BY step.  I've committed some changes that hopefully
will let 8.1 be smarter about GROUP BY ... LIMIT queries.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Limit + group + join

2005-08-27 Thread Mark Kirkwood

Tom Lane wrote:


> I looked into this and found that indeed the desirable join plan was
> getting generated, but it wasn't picked because query_planner didn't
> have an accurate idea of how much of the join needed to be scanned to
> satisfy the GROUP BY step.  I've committed some changes that hopefully
> will let 8.1 be smarter about GROUP BY ... LIMIT queries.



Very nice :-)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id DESC LIMIT 5;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..15.23 rows=5 width=4)
   ->  Group  (cost=0.00..243730.00 rows=8 width=4)
         ->  Nested Loop  (cost=0.00..243530.00 rows=8 width=4)
               ->  Index Scan Backward using c_pkey on c 
                   (cost=0.00..1450.00 rows=8 width=4)
               ->  Index Scan using b_on_c on b  (cost=0.00..3.01 
                   rows=1 width=4)
                     Index Cond: (b.c_id = "outer".id)
(6 rows)

This is 8.1devel from today.

regards

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Limit + group + join

2005-08-27 Thread Tobias Brox
[Tom Lane]
 I looked into this and (...) I've committed some changes that hopefully will
 let 8.1 be smarter about GROUP BY ... LIMIT queries.

[Mark Kirkwood]
 Very nice :-)
(...)
 This is 8.1devel from today.

Splendid :-) Unfortunately we will not be upgrading for some months yet,
but anyway I'm happy.  This provides yet another good argument for upgrading
sooner.  I'm also happy to see such a perfect match:

 - A problem that can be reduced from being complex and
   production-specific, to simple and easily reproducible.
   
 - Enthusiastic people testing it and pinpointing even more precisely what
   conditions will trigger the problem
   
 - Programmers actually fixing the issue
 
 - Testers verifying that it was fixed
 
Long live postgresql! :-) 

-- 
Notice of Confidentiality: This email is sent unencrypted over the network,
and may be stored on several email servers; it can be read by third parties
as easily as a postcard.  Do not rely on email for confidential information.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings