Re: [PERFORM] large tables and simple "= constant" queries using indexes

John Beaver Thu, 10 Apr 2008 07:45:49 -0700

Thanks a lot, all of you - this is excellent advice. With the data clustered and statistics at a more reasonable value of 100, it now reproducibly takes even less time - 20-57 ms per query.

After reading the section on "Statistics Used By the Planner" in the manual, I was a little concerned that, while the statistics sped up the queries I tried immeasurably, the most_common_vals array was where the speedup was happening, and that values which wouldn't fit in this array wouldn't be sped up. Though I couldn't offhand find an example where this occurred, the clustering approach seems intuitively like a much more complete and scalable solution, at least for a read-only table like this.
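
For what it's worth, here is a rough sketch of how I understand the planner estimates the row count for "column = constant", based on the "Row Estimation Examples" section of the manual. It is a simplification, not the planner's actual code: the function name and all the numbers are made up for illustration. A value that falls outside most_common_vals still gets an estimate, derived from n_distinct and whatever frequency the listed values leave over.

```python
def estimate_rows(value, total_rows, mcv_freqs, n_distinct, null_frac=0.0):
    """Estimate rows matching `column = value` (simplified illustration).

    mcv_freqs maps each most-common value to its fraction of the table.
    """
    if value in mcv_freqs:
        # Value is listed in most_common_vals: use its recorded frequency.
        selectivity = mcv_freqs[value]
    else:
        # Otherwise, spread the frequency not covered by the MCV list
        # evenly over the remaining distinct values.
        remaining_fraction = 1.0 - sum(mcv_freqs.values()) - null_frac
        other_distinct = max(n_distinct - len(mcv_freqs), 1)
        selectivity = remaining_fraction / other_distinct
    return selectivity * total_rows


# Hypothetical statistics, not taken from the real table.
mcv = {101: 0.02, 202: 0.01}
print(estimate_rows(101, 1_000_000, mcv, n_distinct=5000))  # value in the MCV list
print(estimate_rows(999, 1_000_000, mcv, n_distinct=5000))  # value not in the list
```

If that reading is right, a larger statistics target helps values outside the array as well, since n_distinct is estimated from a bigger sample.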

As to whether the entire index/table was getting into RAM between my statistics calls, I don't think this was the case. Here's the behavior that I found:
- With statistics at 10, the query took 25 (or so) seconds no matter how many times I tried different values. The query plan was the same as for the 200 and 800 statistics below.
- Trying the same constant a second time gave an instantaneous result, I'm guessing because of query/result caching.
- Immediately on increasing the statistics to 200, the query took reproducibly less time. I tried about 10 different values.
- Immediately on increasing the statistics to 800, the query reproducibly took less than a second every time. I tried about 30 different values.
- Decreasing the statistics to 100 and running the cluster command brought it to 57 ms per query.
- The Activity Monitor (OSX) lists the relevant postgres process as taking a little less than 500 megs.
- I didn't try decreasing the statistics back to 10 before I ran the cluster command, so I can't show the search times going up because of that. But I tried killing the 500 meg process. The new process uses less than 5 megs of RAM, and still reproducibly returns a result in less than 60 ms. Again, this is with a statistics value of 100 and the data clustered by gene_prediction_view_gene_ref_key.

And I'll consider the idea of using triggers with an ancillary table for other purposes; seems like it could be a useful solution for something.


Matthew wrote:

On Thu, 10 Apr 2008, PFC wrote:

... Lots of useful advice ...
- If you often query rows with the same gene_ref, consider using CLUSTER to physically group those rows on disk. This way you can get all rows with the same gene_ref in 1 seek instead of 2000. Clustered tables also make Bitmap scan happy.
In my opinion this is the one that will make the most difference. You will need to run:
CLUSTER gene_prediction_view USING gene_prediction_view_gene_ref_key;
after you insert significant amounts of data into the table. This re-orders the table according to the index, but new data is always written out of order, so after adding lots more data the table will need to be re-clustered again.
- Switch to a RAID10 (4 times the IOs per second, however zero gain if you're single-threaded, but massive gain when concurrent)
Greg Stark has a patch in the pipeline that will change this, for bitmap index scans, by using fadvise(), so a single thread can utilise multiple discs in a RAID array.
Matthew
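
To illustrate the CLUSTER advice quoted above, here is a toy simulation of why grouping rows by key cuts the number of pages a lookup has to read. It is only a sketch under assumed numbers: the key count, rows per key, and rows per page are hypothetical and have nothing to do with the real gene_prediction_view table, and a list of keys stands in for the table's physical row order.

```python
import random


def pages_touched(rows, key, rows_per_page):
    """Count the distinct pages that hold at least one row with `key`.

    `rows` is the table in physical order; row i lives on page i // rows_per_page.
    """
    return len({i // rows_per_page for i, k in enumerate(rows) if k == key})


random.seed(0)

# Hypothetical table: 200 keys, 500 rows per key, 100 rows per page.
rows = [k for k in range(200) for _ in range(500)]

# Rows arrive in arbitrary order, so one key is scattered over many pages.
random.shuffle(rows)
unclustered = pages_touched(rows, 42, 100)

# Sorting by key stands in for CLUSTER: matching rows become contiguous.
clustered = pages_touched(sorted(rows), 42, 100)

print("pages read, unclustered:", unclustered)
print("pages read, clustered:  ", clustered)  # 500 rows / 100 per page = 5
```

Appending new rows to the end of the sorted list would scatter them again, which is the reason given above for re-running CLUSTER after large inserts.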


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
