[HACKERS] "micro bucket sort" ...

Hans-Jürgen Schönig Wed, 11 Aug 2010 05:21:35 -0700

hello all ...

i am bugged with a small issue which is basically like this ...


test=# create table t_test as select x, x % 5 as y from generate_series(1, 
1000000) AS x;
SELECT
test=# create index idx_aaaaa on t_test (x) ;
CREATE INDEX
test=# ANALYZE ;
ANALYZE
test=# explain analyze select * from t_test order by x;
                                                             QUERY PLAN         
                                                    
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_aaaaa on t_test  (cost=0.00..30408.36 rows=1000000 
width=8) (actual time=0.057..311.832 rows=1000000 loops=1)
 Total runtime: 392.943 ms
(2 rows)

we know that we get sorted output from the index and thus we do the index 
traversal here ...
if you add a condition to the sorting you will naturally get a sort in postgres 
because y is clearly now known to be sorted.

test=# explain analyze select * from t_test order by x, y;
                                                       QUERY PLAN               
                                        
------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=141431.84..143931.84 rows=1000000 width=8) (actual 
time=1086.014..1271.257 rows=1000000 loops=1)
   Sort Key: x, y
   Sort Method:  external sort  Disk: 17608kB
   ->  Seq Scan on t_test  (cost=0.00..14425.00 rows=1000000 width=8) (actual 
time=0.024..143.474 rows=1000000 loops=1)
 Total runtime: 1351.848 ms
(5 rows)


same with limit ...


test=# explain analyze select * from t_test order by x, y limit 20;
                                                          QUERY PLAN            
                                              
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=41034.64..41034.69 rows=20 width=8) (actual time=317.939..317.943 
rows=20 loops=1)
   ->  Sort  (cost=41034.64..43534.64 rows=1000000 width=8) (actual 
time=317.934..317.936 rows=20 loops=1)
         Sort Key: x, y
         Sort Method:  top-N heapsort  Memory: 26kB
         ->  Seq Scan on t_test  (cost=0.00..14425.00 rows=1000000 width=8) 
(actual time=0.019..144.109 rows=1000000 loops=1)
 Total runtime: 317.995 ms
(6 rows)

now, the problem is: i cannot easily create additional indexes as i have too 
many possible "second" conditions here.
what makes it even more funny: i don't have enough space  to do the resort of 
the entire thing (X TB).
so, a more expensive index traversal is my only option.

my question is: is there already a concept out there to make this work or does 
anybody know of a patch out there addressing an issue like that?
some idea is heavily appreciated. it seems our sort key infrastructure is not 
enough for this.

        many thanks,

                hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] "micro bucket sort" ...

Reply via email to