On Tue, 18 Oct 2005 08:52:15 +0100, Richard Huxton wrote: > Martin Nickel wrote: >> When I turn of seqscan it does use the index - and it runs 20 to 30% >> longer. Based on that, the planner is correctly choosing a sequential >> scan - but that's just hard for me to comprehend. I'm joining on an >> int4 key, 2048 per index page - I guess that's a lot of reads - then the >> data -page reads. Still, the 8-minute query time seems excessive. > > You'll be getting (many) fewer than 2048 index entries per page. There's a > page header and various pointers involved too, and index pages aren't > going to be full. So - it needs to search the table on dates, fetch the > id's and then assemble them for the hash join. Of course, if you have too > many to join then all this will spill to disk slowing you further. > > Now, you'd rather get down below 8 minutes. There are a number of options: > 1. Make sure your disk i/o is being pushed to its limit We are completely peaked out on disk io. iostat frequently shows 60% iowait time. This is quite an issue for us and I don't have any great ideas. Data is on a 3ware sata raid at raid 10 across 4 disks. I can barely even run vacuums on our largest table (lead) since it runs for a day and a half and kills our online performance while running.
> 2. Look into increasing the sort memory for this one query "set > work_mem..." (see the runtime configuration section of the manual) I haven't tried this, and I will. Thanks for the idea. > 3. Actually - are you happy that your general configuration is OK? I'm not at all. Most of the configuration changes I've tried have made almost no discernable difference. I'll post the relevant numbers in a different post - possibly you'll have some suggestions. > 4. Perhaps use a cursor - I'm guessing you want to process these > mailings in some way and only want them one at a time in any case. Where this first came up was in trying to get aggregate totals per mailing. I gave up on that and created a nightly job to create a summary table since Postgres wasn't up to the job in real time. Still, I frequently need to do the join and limit it by other criteria - and it is incredibly slow - even when the result set is smallish. > 5. Try the query one day at a time and see if the balance tips the > other way - you'll be dealing with substantially less data per query > which might match your system better. Of course, this may not be > practical for your applicaton. It is not useful. > 6. If your lead table is updated only rarely, you could try a CLUSTER > on the table by mailing_id - that should speed the scan. Read the manual > for the cluster command first though. The lead table is one of the most volatle in our system. Each day we insert tens or hundreds of thousands of rows, update almost that many, and delete a few. It is growing, and could reach 100 million rows in 8 or 9 months. We're redesigning the data structure a little so lead is not updated (updates are just too slow), but it will continue to have inserts and deletes, and we'll have to join it with the associated table being updated, which already promises to be a slow operation. We're looking at 15K rpm scsi drives for a replacement raid array. We are getting the place where it may be cheaper to convert to Oracle or DB2 than to try and make Posgres work. ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match