Hello all,

How can I speed up the sorting of a union all query?

Here are two queries which effectively create the same output.
However the first one pipes the output from the union all to the
shell's sort command whereas the second one used an order by clause.
In addition, the second query explodes in the amount of time it takes
for large data sets, becoming unusable after about 500,000 records.
In contrast, the first is still reasonable even past several million
records.

$ time -p { grep -v ^# <<eof
select amp1, tag, rowid
from foo
union all
select amp2, tag, rowid
from foo
;
eof
} | sqlite3 primers.db |
sort -k1,1n -k2,2n |
head -2

1|-1147116725663304346|67713
1|-1253798817002555285|42783
real 4.33
user 4.33
sys 0.13

$ time -p { grep -v ^# <<eof
select amp1 as "as1", tag, rowid
from foo
union all
select amp2 as "as1", tag, rowid
from foo
order by as1, tag
;
eof
} | sqlite3 primers.db |
head -2

1|-9220010096774152842|117488
1|-8636788692624495913|191979
real 12.64
user 7.44
sys 2.16

$ sqlite3 primers.db 'select count(*) from foo ; '
200000

The output for 500,000 records:

# first query
1|-1147116725663304346|67713
1|-1253798817002555285|42783
real 12.26
user 10.99
sys 0.66

# second query
1|-9220010096774152842|117488
1|-8636788692624495913|191979
real 62.82
user 20.22
sys 7.08

I am guessing that some kind of index should be built on the fields in
table foo.  But which?  I've tried the queries with indexes on amp1,
amp2, tag, (amp1+tag), (amp2+tag), (amp1+amp2+tag).  None show a
performance increase, which makes me believe that either I'm not
building the correct index or that the bottleneck is not the index.

How can I go about finding where the bottleneck is?

Thanks in advance for any pointers.

Regards,
- Robert
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to