>Sean Shanny
> Hardware:  Apple G5 dual 2.0 with 8GB memory attached via dual fibre
> channel to a fully loaded 3.5TB XRaid.  The XRaid is configured as two
> disk hardware based RAID5 sets software striped to form a RAID50 set.
> The DB, WALS, etc are all on that file set.  Running OSX journaled
> system  Running postgres 7.4.1.  OSX Server 10.3.2  Postgres is
> locally with  '--enable-recode' '--enable-multibyte=UNICODE'
> 'CFLAGS=-mcpu=970 -mtune=970 -mpowerpc64 -O3'

Have you tried altering the blocksize to a higher value? Say 32K?

> max_connections = 100

Why have you set this to 100 when you have typically 1-3 users?

> sort_mem = 256000               # min 64, size in KB

If you have only 1-3 users, then that value seems reasonable.

> The query is
> SELECT t1.id, t2.md5, t2.url FROM referral_temp t2 LEFT OUTER JOIN
> d_referral t1 ON t2.md5 = t1.referral_md5;
> \d d_referral
>  id                 | integer | not null
>  referral_md5       | text    | not null
>  referral_raw_url   | text    | not null
>  referral_host      | text    |
>  referral_path      | text    |
>  referral_query     | text    |
>  job_control_number | integer | not null
> \d referral_temp
>  md5    | text |
>  url    | text |

Have you looked at using reversed indexes, as per recent postings in
[performance]? These seemed to help considerably with lookup speed when
using a large URL database, which seems to be your situation here.

>Jeff Boes writes
> We have a large (several million row) table with a field containing
> URLs. Now, funny thing about URLs: they mostly start with a common
> substring ("http://www.";). But not all the rows start with this, so we
> can't just lop off the first N characters. However, we noticed some
> ago that an index on this field wasn't as effective as an index on the
> REVERSE of the field. So ...
> CREATE OR REPLACE FUNCTION fn_urlrev(text) returns text as '
> return reverse(lc($_[0]))
> ' language 'plperl' with (iscachable,isstrict);
> and then
> CREATE UNIQUE INDEX ix_links_3 ON links
> (fn_urlrev(path_base));

You have 2 CPUs: have you tried splitting your input data file into two
tables, then executing the same query simultaneously, to split the
processing? If you get the correct plan, you should use roughly the same
I/O but use all of the available CPU power.

I'm sure we'd all be interested in your further results!

Best Regards, Simon Riggs

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?


Reply via email to