Re: [SQL] How slow is distinct - 2nd

Michael Contzen Tue, 08 Oct 2002 09:07:46 -0700

Bruno Wolff III schrieb:
> 
> On Tue, Oct 01, 2002 at 14:18:50 +0200,
>   Michael Contzen <[EMAIL PROTECTED]> wrote:
> > Here the table:
> >
> > mc=# \d egal
> >      Table "public.egal"
> >  Column |  Type   | Modifiers
> > --------+---------+-----------
> >  i      | integer |
> >
> > mc=# select count(*) from egal;
> >   count
> > ---------
> >  7227744
> > (1 row)
> >
> > mc=# select count(distinct i) from egal;
> >  count
> > -------
> >     67
> > (1 row)
> 
> This suggests that the best way to do this is with a hash instead of a sort.
> 
> If you have lots of memory you might try increasing the sort memory size.


Hello,

ok, sort_mem was still set to the default (=1024). I've increased it to 
  sort_mem=10240
which results to: (same machine, same data, etc.)

time echo "select distinct i from egal;"|psql mc >/dev/null
 

real    2m30.667s
user    0m0.000s
sys     0m0.010s

If I set sort_mem=1024000:

time echo "select distinct i from egal;"|psql mc >/dev/null
 
real    0m52.274s
user    0m0.020s
sys     0m0.000s

wow, in comparison to nearly 5 minutes before this is quite good
speedup.

But: 

All the work could be done in memory as the processor load shows (output
of top, which shows the following output during all the time)


  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 8310 postgres  17   0  528M 528M  2712 R    99.9 13.5   0:11 postmaster

Even it nearly performs 5 times faster than before with 1M memory,
postgres is still
8 times slower than oracle. Further increasing of sort_mem to 4096000
doesn't reduce the
time, as the cpu load cannot increased any more :-)

But increasing the memory in that way is not realy a solution: Normaly
not all the data
fits into memory. In our application I guess 10%.

Oracle has even less memory:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 8466 oracle    14   0 11052  10M 10440 R    99.9  0.2   0:01 oracle

(this 10M session memory plus 32M shared memory pool not shown here).

This shows to me, that oracle uses a quite different algorithm for this
task. May be it
uses some hashing-like algorithm first without sorting before. I don't
know oracle enough, perhaps this is that "sort unique" step in the
planners output. 
I think, first Postgres sorts all the data, which results to temporary
data of the same
size than before and which needs to be written to disk at least once,
and after that postgres does the unique operation, right? 

If I can do any more tests to oracle or postgres, let me know.

Kind regards,

Michael Contzen

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [SQL] How slow is distinct - 2nd

Reply via email to