This table is totally unnormalized. Normalize it and try again. You'll probably see a huge speedup. Maybe even 10x. My mantra has always been less data stored means less data to scan means faster scans.
On Thu, Jan 7, 2010 at 12:57 PM, Gurgel, Flavio <fla...@4linux.com.br>wrote: > ----- "Matthew Wakeling" <matt...@flymine.org> escreveu: > > On Thu, 7 Jan 2010, Gurgel, Flavio wrote: > > Postgres does not change a query plan according to the shared_buffers > > > > setting. It does not anticipate one step contributing to another step > > in > > this way. It does however make use of the effective_cache_size setting > > to > > estimate this effect, and that does affect the planner. > > That was what I was trying to say :) > > > In a situation like this, the opposite will be true. If you were > > accessing > > a very small part of a table, say to order by a field with a small > > limit, > > then an index can be very useful by providing the results in the > > correct > > order. However, in this case, almost the entire table has to be read. > > > > Changing the order in which it is read will mean that the disc access > > is > > no longer sequential, which will slow things down, not speed them up. > > > > The Postgres planner isn't stupid (mostly), there is probably a good > > reason why it isn't using an index scan. > > Sorry but I disagree. This is the typical case where the test has to be > made. > The results are partial, let's say 50% of the table. Considerind that the > disk is fast enough, the cost estimation of sequential and random reads are > in a proportion of 1 to 4, considering default settings in PostgreSQL. > > If, and only if the data is time distributed in the table (which can be > this case during bulk load) there will be some gain in seqscan. > If, let's say 50% of the 50% (25% of the data) is time distributed (which > can be the case in most data warehouses), the cost of random reads * number > of reads can be cheaper then seqscan. > > The volume of data doesn't turn the index generations so deep, let's say > 2^7 or 2^8. This can lead to very fast data retrieval. > > > > Ugh. I don't think so. That's why indexes were invented. PostgreSQL > > is > > > smart enough to "jump" over columns using byte offsets. > > > A better option for this table is to partition it in year (or > > year/month) chunks. > > > > Postgres (mostly) stores the columns for a row together with a row, so > > > > what you say is completely wrong. Postgres does not "jump" over > > columns > > using byte offsets in this way. The index references a row in a page > > on > > disc, and that page is fetched separately in order to retrieve the > > row. > > The expensive part is physically moving the disc head to the right > > part of > > the disc in order to fetch the correct page from the disc - jumping > > over > > columns will not help with that at all. > > If the index point to the right pages, I keep the circumstance of 1 to 4 > cost. > Agreed about seqscans. When I talked about byte offsets I was talking of > data in the same disk page, and this does not help I/O reduction at all. > > > Reducing the width of the table will greatly improve the performance > > of a > > sequential scan, as it will reduce the size of the table on disc, and > > > > therefore the time taken to read the entire table sequentially. > > I just don't understand if you're seeing this situation as OLTP or DW, > sorry. > DW tables are usually wider then OLTP. > > > Moreover, your suggestion of partitioning the table may not help much > > with > > this query. It will turn a single sequential scan into a UNION of many > > > > tables, which may be harder for the planner to plan. Also, for queries > > Partitioned plans are a collection of an independent plan for each table in > the inheritance. > If the data to be retrieved is confined in selected partitions, you won't > seqscan the partitions you don't need. > The cost of the "union" of the aggregations in memory is a lot cheaper then > the avoided seqscans. > > I have at least 3 cases of partitioning in queries exactly like this that > droped from 3min to 5s execution times. > All of that DW tables, with aggregation and huge seqscans. > > I keep my word that the right use of indexes here has to be tested. > > Flavio Henrique A. Gurgel > tel. 55-11-2125.4786 > cel. 55-11-8389.7635 > www.4linux.com.br > FREE SOFTWARE SOLUTIONS > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >