Re: [GENERAL] SELECT DISTINCT very slow

2009-07-10 Thread Ben Harper
Thanks for all the feedback.

Using GROUP BY is indeed much faster (about 1 second).

Unfortunately I can't use GROUP BY, because what I'm really doing is
SELECT DISTINCT ON(unique_field) id FROM table;

I'm not familiar with the Postgres internals, but in my own DB system
that I have written, I do the skip-scanning thing, and for my system
it was a really trivial optimization to code. I know, I'm always free
to submit a patch, and hopefully someday I will, if it hasn't already
been done by then.

I can't comment on whether this skip-scan optimization is general
enough to warrant the lines of code, but I might as well explain my
use case:
Inside a GIS application, the user wants to categorize the display of
some information based on, in this case, the suburb name.
He clicks a button that says "Add All Unique Categories". This is a
very common operation in this domain.

Again, thanks for all the feedback. I'll upgrade to 8.4 soon.
Ben Harper

On Fri, Jul 10, 2009 at 2:50 AM, Tom Lane wrote:
> Greg Stark  writes:
>> Not really. The OP doesn't say how wide the record rows are but unless
>> they're very wide it wouldn't pay to use an index for this even if you
>> didn't have to access the heap also. It's going to be faster to scan
>> the whole heap and either sort or use a hash. Currently there aren't
>> many cases where a btree with 6,000 copies of 111 distinct keys is
>> going to be useful.
>
> It was 600,000 not 6,000 ... so a skip-scan might be worth the trouble,
> but as you say we haven't done it.
>
> In any case I think the real issue is that the OP is probably using a
> pre-8.4 release which will always do SELECT DISTINCT via sort-and-unique.
> Hash aggregation would be a whole lot faster for these numbers, even
> if not exactly instantaneous.  He could update to 8.4, or go over to
> using GROUP BY as was recommended upthread.
>
>                        regards, tom lane
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] SELECT DISTINCT very slow

2009-07-09 Thread Ben Harper
Hi,
Can anybody explain this:

Records: 600,000
Field Width: Approximately 20 UTF8 characters - type is VARCHAR(25)
Field is Indexed.

SELECT DISTINCT field FROM table;

Takes about 6 seconds. There are 111 distinct items.

On Sqlite, and another place where I have a B+Tree, this query is
faster than my eye can measure.

Is this a well known issue?

Thanks,
Ben

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Create db with template does not transfer ownership

2009-06-29 Thread Ben Harper
I'm trying to create a PostGIS database by using

CREATE DATABASE mydb TEMPLATE template_postgis;

using the Windows one-click installer package, with PostGIS 1.3 /
Postgres 8.3.7.

But my question is more general:
When a new database is created base on a template, the owner of those
tables is not changed. The owner of the copied tables remains the
original owner in the template. This leaves the owner of the new DB
unable to access those tables. The end result is that you cannot use a
system-wide template database, unless you want to grant full access to
these tables. Should the action of the 'TEMPLATE' option not be to
transfer ownership to the owner of the new DB? I don't understand how
one is supposed to use this mechanism.

Thanks,
Ben

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general