Re: [PERFORM] General performance questions about postgres on Apple

Sean Shanny Mon, 23 Feb 2004 16:55:40 -0800

Simon Riggs wrote:

Sean Shanny Hardware: Apple G5 dual 2.0 with 8GB memory attached via dual fibre channel to a fully loaded 3.5TB XRaid. The XRaid is configured as two

7

disk hardware based RAID5 sets software striped to form a RAID50 set. The DB, WALS, etc are all on that file set. Running OSX journaled

file

system Running postgres 7.4.1. OSX Server 10.3.2 Postgres is

compiled

locally with '--enable-recode' '--enable-multibyte=UNICODE' 'CFLAGS=-mcpu=970 -mtune=970 -mpowerpc64 -O3'

Have you tried altering the blocksize to a higher value? Say 32K?

That is on our to do list. We had made that change while running on BSD 5.1on a Dell 2650 with 4GB and 5 10K SCSI drive in RAID 0. Did not see a huge improvement.

max_connections = 100

Why have you set this to 100 when you have typically 1-3 users?

Have already addressed that by lowering this to 50. Will drop it lower as time goes on.

sort_mem = 256000 # min 64, size in KB

If you have only 1-3 users, then that value seems reasonable.
The query is
SELECT t1.id, t2.md5, t2.url FROM referral_temp t2 LEFT OUTER JOIN
d_referral t1 ON t2.md5 = t1.referral_md5;
\d d_referral
id                 | integer | not null
referral_md5       | text    | not null
referral_raw_url   | text    | not null
referral_host      | text    |
referral_path      | text    |
referral_query     | text    |
job_control_number | integer | not null
\d referral_temp md5 | text | url | text |
Have you looked at using reversed indexes, as per recent postings in [performance]? These seemed to help considerably with lookup speed when using a large URL database, which seems to be your situation here.

We create an MD5 of the URL and store it as referral_md5. This is our key for lookup. We ran into problems with the URL as the index. The postgres indexing code was complaining about the URL being too long, hence the MD5 which thought longer to compute during the ETL phase is much quicker to match on.

...

Jeff Boes writes We have a large (several million row) table with a field containing URLs. Now, funny thing about URLs: they mostly start with a common substring ("http://www.";). But not all the rows start with this, so we can't just lop off the first N characters. However, we noticed some

time
ago that an index on this field wasn't as effective as an index on the
REVERSE of the field. So ...
CREATE OR REPLACE FUNCTION fn_urlrev(text) returns text as '
return reverse(lc($_[0]))
' language 'plperl' with (iscachable,isstrict);
and then

CREATE UNIQUE INDEX ix_links_3 ON links (fn_urlrev(path_base));
You have 2 CPUs: have you tried splitting your input data file into two tables, then executing the same query simultaneously, to split the processing? If you get the correct plan, you should use roughly the same I/O but use all of the available CPU power.

Have not considered that.

I'm sure we'd all be interested in your further results!

I will post things as I discover them.

--sean

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Re: [PERFORM] General performance questions about postgres on Apple

Reply via email to