Re: [PERFORM] Planner issue on sorting joining of two tables with limit

Alexander Korotkov Fri, 07 May 2010 07:35:43 -0700

I found my mistake. My supposition is working only if value column in t1
table is unique. But if I replace the index by unique one then plan is the
same.


On Mon, May 3, 2010 at 5:57 PM, Alexander Korotkov <aekorot...@gmail.com>wrote:

> Well, no, because that plan wouldn't produce the specified ordering;
>> or at least it would be a lucky coincidence if it did.  It's only
>> sorting on t1.value.
>>
> I just don't find why it is coincidence. I think that such plan will always
> produce result ordered by two columns, because such nested index scan always
> produce this result.
>
> Let's consider some simple example in order to illustrate how this plan
> works.
>
> t1
> id | value
> ---+------
> 1  | 0.1
> 2  | 0.3
> 3  | 0.2
>
> t2
> id | id1 | value
> ---+-----+------
> 1  |  2  | 0.2
> 2  |  1  | 0.9
> 3  |  2  | 0.6
> 4  |  1  | 0.7
> 5  |  1  | 0.4
> 6  |  3  | 0.2
>
> 1) The outer index scan will find the row of t1 with least value using
> test1_value_idx. It will be row (1, 0.1)
> 2) The inner index scan will find all the rows in t2 where id1 = 1 using
> test2_id1_value_idx. But index test2_id1_value_idx have second order by
> value column and the index scan result will be ordered by value. That's why
> inner index scan will find rows (5, 1, 0.4), (4, 1, 0.7) and (2, 1, 0.9).
> And following query output will be produced:
>
> value1  | value2
> --------+-------
> 0.1     | 0.4
> 0.1     | 0.7
> 0.1     | 0.9
> 3) The outer index scan will find the row of t1 with the second value using
> test1_value_idx. It will be row (3, 0.2)
> 4) The inner index scan will find all the rows in t2 where id1 = 3 using
> test2_id1_value_idx. This row is (6, 3, 0.2). The following query output
> will be produced:
>
> value1  | value2
> --------+-------
> 0.2     | 0.2
> 5) The outer index scan will find the row of t1 with the third value using
> test1_value_idx. It will be row (2, 0.3)
> 6) The inner index scan will find all the rows in t2 where id1 = 2 using
> test2_id1_value_idx. These rows are (1, 2, 0.2) and (3, 2, 0.6). The
> following query output will be produced:
>
> value1  | value2
> --------+-------
> 0.3     | 0.2
> 0.3     | 0.6
>
> And the whole query result is:
> value1  | value2
> --------+-------
> 0.1     | 0.4
> 0.1     | 0.7
> 0.1     | 0.9
> 0.2     | 0.2
> 0.3     | 0.2
> 0.3     | 0.6
>
> And this result is really ordered by t1.value, t2.value.
> I can't find error in my reasoning :)
>
> The query without limit produce similar plan.
>
> EXPLAIN SELECT t1.value AS value1, t2.value AS value2 FROM test1 t1 JOIN
> test2 t2 ON t2.id1 = t1.id ORDER BY t1.value
>
> Nested Loop  (cost=0.00..62109.86 rows=995025 width=16)
>   ->  Index Scan using test1_value_idx on test1 t1  (cost=0.00..19.19
> rows=200 width=12)
>   ->  Index Scan using test2_id1_value_idx on test2 t2  (cost=0.00..248.27
> rows=4975 width=12)
>         Index Cond: (t2.id1 = t1.id)
>
> And I checked that the result is ordered by t1.value and t2.value.
>
> 2010/4/26 Tom Lane <t...@sss.pgh.pa.us>
>
>> =?KOI8-R?B?68/Sz9TLz9cg4czFy9PBzsTS?= <aekorot...@gmail.com> writes:
>>
>> > So PostgreSQL planner can produce the plan I need but it doesn't produce
>> > this plan when I specify particular second ordering column.
>>
>> Well, no, because that plan wouldn't produce the specified ordering;
>> or at least it would be a lucky coincidence if it did.  It's only
>> sorting on t1.value.
>>
>> > So is there any
>> > way to make planner produce desired plan when particular second ordering
>> > column is specified?
>>
>> Not when the ordering columns come from two different tables.  (If they
>> were in the same table then scanning a two-column index could produce
>> the demanded sort order.)  I don't see any way to satisfy this query
>> without an explicit sort step, which means it has to look at the whole
>> join output.
>>
>> If you're willing to make assumptions like "the required 10 rows will be
>> within the first 100 t1.value rows" then you could nest an ORDER BY
>> t1.value LIMIT 100 query inside something that did an ORDER BY with both
>> columns.  But this fails if you have too many duplicate t1.value values,
>> and your test case suggests that you might have a lot of them.  In any
>> case it would stop being fast if you make the inner LIMIT very large.
>>
>>                        regards, tom lane
>>
>
>

Re: [PERFORM] Planner issue on sorting joining of two tables with limit

Reply via email to