Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-25 Thread Matteo Beccati

Tom Lane ha scritto:

Alvaro Herrera [EMAIL PROTECTED] writes:

Limit (50)
  Sort (key: pse_lastlogin)
Result
   Append
  Limit (50)
 SeqScan tbl_profile_search
  Limit (50)
 Indexscan tbl_profile_search_interest_1
  Limit (50)
 IndexScan on the index mentioned above


is wrong because there's no guarantee that the first 50 elements of a
seqscan will be anything special.  You could imagine dealing with that
by sorting the seqscan results and limiting to 50, or by not
sorting/limiting that data at all but letting the upper sort see all the
seqscan entries.  Offhand I think either of those could win depending on
how many elements the seqscan will yield.  Also, it might be interesting
to consider inventing a merge plan node type that takes N
already-sorted inputs and produces a sorted output stream.  Then we'd
need to trade off this approach versus doing the top-level sort, which
could cope with some of its inputs not being pre-sorted.

This seems to have some aspects in common with the recent discussion
about how to optimize min/max aggregates across an appendrel set.


The plan proposed by Alvaro reminds me of:

http://archives.postgresql.org/pgsql-performance/2005-09/msg00047.php

My proposal was in fact (Alvaro's plan + first Tom's suggested change):

Limit (50)
  Sort (key: pse_lastlogin)
Result
   Append
  Limit (50)
 Sort (key: pse_lastlogin)
SeqScan tbl_profile_search
  Limit (50)
 Indexscan tbl_profile_search_interest_1
  Limit (50)
 IndexScan on the index mentioned above

The plan was generated rewriting the query to use explicit subselect and 
forcing the planner to order by and limit for each subquery.


I've tried a few times to write a patch to handle it, but I wasn't able 
to do it because of my lack of internals knowledge and spare time.



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Joshua D. Drake
Hello,

We have a problem with CE that I want to verify is either expected
behavior, a bug or something else :).

Yes constraint exclusion is on.

I have tried increasing the default_statistics_target (all the way 1000)
no change in behavior.

Query plan with ORDER BY:

 Limit  (cost=47110.19..47110.31 rows=50 width=8) (actual
time=6088.013..6088.269 rows=50 loops=1)
   -  Sort  (cost=47110.19..47943.14 rows=333179 width=8) (actual
time=6088.007..6088.104 rows=50 loops=1)
 Sort Key: public.tbl_profile_search.pse_lastlogin
 -  Result  (cost=0.00..16547.78 rows=333179 width=8) (actual
time=0.020..4339.472 rows=334319 loops=1)
   -  Append  (cost=0.00..16547.78 rows=333179 width=8)
(actual time=0.016..3208.022 rows=334319 loops=1)
 -  Seq Scan on tbl_profile_search
(cost=0.00..2.27 rows=1 width=8) (actual time=0.012..0.047 rows=2 loops=1)
   Filter: (((pse_normalized_text)::text =
'1'::text) AND (pse_interest_type = 10))
 -  Index Scan using index_pse_09_on_part_1 on
tbl_profile_search_interest_1 tbl_profile_search  (cost=0.00..4.73
rows=1 width=8) (actual time=0.202..0.202 rows=0 loops=1)
   Index Cond: ((pse_normalized_text)::text =
'1'::text)
   Filter: (pse_interest_type = 10)
 -  Bitmap Heap Scan on
tbl_profile_search_interest_10 tbl_profile_search
(cost=3579.12..16540.78 rows=333177 width=8) (actual
time=90.619..2116.224 rows=334317 loops=1)
   Recheck Cond: ((pse_normalized_text)::text =
'1'::text)
   Filter: (pse_interest_type = 10)
   -  Bitmap Index Scan on
index_pse_09_on_part_10  (cost=0.00..3579.12 rows=333177 width=0)
(actual time=89.052..89.052 rows=340964 loops=1)
 Index Cond:
((pse_normalized_text)::text = '1'::text)
 Total runtime: 6103.190 ms


Same query, just removed ORDER BY:


---
 Limit  (cost=0.00..2.48 rows=50 width=4) (actual time=0.025..57.146
rows=50 loops=1)
   -  Result  (cost=0.00..16549.78 rows=333179 width=4) (actual
time=0.021..56.993 rows=50 loops=1)
 -  Append  (cost=0.00..16549.78 rows=333179 width=4) (actual
time=0.017..56.835 rows=50 loops=1)
   -  Seq Scan on tbl_profile_search  (cost=0.00..2.27
rows=1 width=4) (actual time=0.013..0.050 rows=2 loops=1)
 Filter: (((pse_normalized_text)::text = '1'::text)
AND (pse_interest_type = 10))
   -  Index Scan using index_pse_09_on_part_1 on
tbl_profile_search_interest_1 tbl_profile_search  (cost=0.00..4.73
rows=1 width=4) (actual time=0.051..0.051 rows=0 loops=1)
 Index Cond: ((pse_normalized_text)::text = '1'::text)
 Filter: (pse_interest_type = 10)
   -  Bitmap Heap Scan on tbl_profile_search_interest_10
tbl_profile_search  (cost=3581.12..16542.78 rows=333177 width=4) (actual
time=56.481..56.573 rows=48 loops=1)
 Recheck Cond: ((pse_normalized_text)::text = '1'::text)
 Filter: (pse_interest_type = 10)
 -  Bitmap Index Scan on index_pse_09_on_part_10
(cost=0.00..3581.12 rows=333177 width=0) (actual time=54.999..54.999
rows=341233 loops=1)
   Index Cond: ((pse_normalized_text)::text =
'1'::text)
 Total runtime: 57.396 ms


-- 

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 We have a problem with CE that I want to verify is either expected
 behavior, a bug or something else :).

Uh, what's your problem exactly?  The example only seems to demonstrate
that if you don't ask for a sort, you don't get one.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Joshua D. Drake
Tom Lane wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:
 We have a problem with CE that I want to verify is either expected
 behavior, a bug or something else :).
 
 Uh, what's your problem exactly?  The example only seems to demonstrate
 that if you don't ask for a sort, you don't get one.

Sorry. The problem is, if I ask for an ORDER BY it scans all partitions
versus only scanning the partition that has the data in it.

Sincerely,

Joshua D. Drake


 
   regards, tom lane
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Alvaro Herrera
Joshua D. Drake wrote:
 Tom Lane wrote:
  Joshua D. Drake [EMAIL PROTECTED] writes:
  We have a problem with CE that I want to verify is either expected
  behavior, a bug or something else :).
  
  Uh, what's your problem exactly?  The example only seems to demonstrate
  that if you don't ask for a sort, you don't get one.
 
 Sorry. The problem is, if I ask for an ORDER BY it scans all partitions
 versus only scanning the partition that has the data in it.

Huh, but that's not what the EXPLAIN ANALYZE you posted says ...

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Joshua D. Drake
Alvaro Herrera wrote:
 Joshua D. Drake wrote:
 Tom Lane wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:
 We have a problem with CE that I want to verify is either expected
 behavior, a bug or something else :).
 Uh, what's your problem exactly?  The example only seems to demonstrate
 that if you don't ask for a sort, you don't get one.
 Sorry. The problem is, if I ask for an ORDER BY it scans all partitions
 versus only scanning the partition that has the data in it.
 
 Huh, but that's not what the EXPLAIN ANALYZE you posted says ...
 

Sorry I realize the error of my ways. It isn't that it is scanning all
partitions, it is that it is scanning all of a single partition (subject
to the WHERE clause). That is correct behavior.

Sincerely,

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Alvaro Herrera

I followed up with Joshua on Jabber.  This is the query:

SELECT pse_userid FROM tbl_profile_search WHERE pse_normalized_text='1'
and pse_interest_type = 10 order by pse_lastlogin DESC limit 50 offset 0

I suggested adding an index on (pse_normalized_text, pse_lastlogin), on
the assumption that the planner would get the sorted output from there
and be able to push the LIMIT clause, just below the indexscan, thus
saving the big heap scan (and a sort across a large result set).  But it
turns out the index is already there.

So it seems to me to be a planner shortcoming.  Is this correct?

My idea of the plan would be (tabs=8 spaces)

Limit (50)
  Sort (key: pse_lastlogin)
Result
   Append
  SeqScan tbl_profile_search
  Indexscan tbl_profile_search_interest_1
  Limit (50)
 IndexScan on the index mentioned above

Is this possible?  It would be very fast.  Maybe it should be like this
instead:

Limit (50)
  Sort (key: pse_lastlogin)
Result
   Append
  Limit (50)
 SeqScan tbl_profile_search
  Limit (50)
 Indexscan tbl_profile_search_interest_1
  Limit (50)
 IndexScan on the index mentioned above

This is the actual plan:

  Limit  (cost=47110.19..47110.31 rows=50 width=8) (actual
 time=6088.013..6088.269 rows=50 loops=1)
-  Sort  (cost=47110.19..47943.14 rows=333179 width=8) (actual
 time=6088.007..6088.104 rows=50 loops=1)
  Sort Key: public.tbl_profile_search.pse_lastlogin
  -  Result  (cost=0.00..16547.78 rows=333179 width=8) (actual
 time=0.020..4339.472 rows=334319 loops=1)
-  Append  (cost=0.00..16547.78 rows=333179 width=8)
 (actual time=0.016..3208.022 rows=334319 loops=1)
  -  Seq Scan on tbl_profile_search
 (cost=0.00..2.27 rows=1 width=8) (actual time=0.012..0.047 rows=2 loops=1)
Filter: (((pse_normalized_text)::text =
 '1'::text) AND (pse_interest_type = 10))
  -  Index Scan using index_pse_09_on_part_1 on
 tbl_profile_search_interest_1 tbl_profile_search  (cost=0.00..4.73
 rows=1 width=8) (actual time=0.202..0.202 rows=0 loops=1)
Index Cond: ((pse_normalized_text)::text =
 '1'::text)
Filter: (pse_interest_type = 10)
  -  Bitmap Heap Scan on
 tbl_profile_search_interest_10 tbl_profile_search
 (cost=3579.12..16540.78 rows=333177 width=8) (actual
 time=90.619..2116.224 rows=334317 loops=1)
Recheck Cond: ((pse_normalized_text)::text =
 '1'::text)
Filter: (pse_interest_type = 10)
-  Bitmap Index Scan on
 index_pse_09_on_part_10  (cost=0.00..3579.12 rows=333177 width=0)
 (actual time=89.052..89.052 rows=340964 loops=1)
  Index Cond:
 ((pse_normalized_text)::text = '1'::text)
  Total runtime: 6103.190 ms


-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Incorrect behavior with CE and ORDER BY

2006-10-24 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Is this possible?  It would be very fast.

It's possible but not exactly simple.  As an example, your proposed
plan:

 Limit (50)
   Sort (key: pse_lastlogin)
 Result
Append
   Limit (50)
SeqScan tbl_profile_search
 Limit (50)
Indexscan tbl_profile_search_interest_1
 Limit (50)
IndexScan on the index mentioned above

is wrong because there's no guarantee that the first 50 elements of a
seqscan will be anything special.  You could imagine dealing with that
by sorting the seqscan results and limiting to 50, or by not
sorting/limiting that data at all but letting the upper sort see all the
seqscan entries.  Offhand I think either of those could win depending on
how many elements the seqscan will yield.  Also, it might be interesting
to consider inventing a merge plan node type that takes N
already-sorted inputs and produces a sorted output stream.  Then we'd
need to trade off this approach versus doing the top-level sort, which
could cope with some of its inputs not being pre-sorted.

This seems to have some aspects in common with the recent discussion
about how to optimize min/max aggregates across an appendrel set.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match