Re: [PERFORM] Air-traffic benchmark

Nikolas Everett Thu, 07 Jan 2010 10:12:09 -0800

This table is totally unnormalized.  Normalize it and try again.  You'll
probably see a huge speedup.  Maybe even 10x.  My mantra has always been
less data stored means less data to scan means faster scans.


On Thu, Jan 7, 2010 at 12:57 PM, Gurgel, Flavio <fla...@4linux.com.br>wrote:

> ----- "Matthew Wakeling" <matt...@flymine.org> escreveu:
> > On Thu, 7 Jan 2010, Gurgel, Flavio wrote:
> > Postgres does not change a query plan according to the shared_buffers
> >
> > setting. It does not anticipate one step contributing to another step
> > in
> > this way. It does however make use of the effective_cache_size setting
> > to
> > estimate this effect, and that does affect the planner.
>
> That was what I was trying to say :)
>
> > In a situation like this, the opposite will be true. If you were
> > accessing
> > a very small part of a table, say to order by a field with a small
> > limit,
> > then an index can be very useful by providing the results in the
> > correct
> > order. However, in this case, almost the entire table has to be read.
> >
> > Changing the order in which it is read will mean that the disc access
> > is
> > no longer sequential, which will slow things down, not speed them up.
> >
> > The Postgres planner isn't stupid (mostly), there is probably a good
> > reason why it isn't using an index scan.
>
> Sorry but I disagree. This is the typical case where the test has to be
> made.
> The results are partial, let's say 50% of the table. Considerind that the
> disk is fast enough, the cost estimation of sequential and random reads are
> in a proportion of 1 to 4, considering default settings in PostgreSQL.
>
> If, and only if the data is time distributed in the table (which can be
> this case during bulk load) there will be some gain in seqscan.
> If, let's say 50% of the 50% (25% of the data) is time distributed (which
> can be the case in most data warehouses), the cost of random reads * number
> of reads can be cheaper then seqscan.
>
> The volume of data doesn't turn the index generations so deep, let's say
> 2^7 or 2^8. This can lead to very fast data retrieval.
>
> > > Ugh. I don't think so. That's why indexes were invented. PostgreSQL
> > is
> > > smart enough to "jump" over columns using byte offsets.
> > > A better option for this table is to partition it in year (or
> > year/month) chunks.
> >
> > Postgres (mostly) stores the columns for a row together with a row, so
> >
> > what you say is completely wrong. Postgres does not "jump" over
> > columns
> > using byte offsets in this way. The index references a row in a page
> > on
> > disc, and that page is fetched separately in order to retrieve the
> > row.
> > The expensive part is physically moving the disc head to the right
> > part of
> > the disc in order to fetch the correct page from the disc - jumping
> > over
> > columns will not help with that at all.
>
> If the index point to the right pages, I keep the circumstance of 1 to 4
> cost.
> Agreed about seqscans. When I talked about byte offsets I was talking of
> data in the same disk page, and this does not help I/O reduction at all.
>
> > Reducing the width of the table will greatly improve the performance
> > of a
> > sequential scan, as it will reduce the size of the table on disc, and
> >
> > therefore the time taken to read the entire table sequentially.
>
> I just don't understand if you're seeing this situation as OLTP or DW,
> sorry.
> DW tables are usually wider then OLTP.
>
> > Moreover, your suggestion of partitioning the table may not help much
> > with
> > this query. It will turn a single sequential scan into a UNION of many
> >
> > tables, which may be harder for the planner to plan. Also, for queries
>
> Partitioned plans are a collection of an independent plan for each table in
> the inheritance.
> If the data to be retrieved is confined in selected partitions, you won't
> seqscan the partitions you don't need.
> The cost of the "union" of the aggregations in memory is a lot cheaper then
> the avoided seqscans.
>
> I have at least 3 cases of partitioning in queries exactly like this that
> droped from 3min to 5s execution times.
> All of that DW tables, with aggregation and huge seqscans.
>
> I keep my word that the right use of indexes here has to be tested.
>
> Flavio Henrique A. Gurgel
> tel. 55-11-2125.4786
> cel. 55-11-8389.7635
> www.4linux.com.br
> FREE SOFTWARE SOLUTIONS
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: [PERFORM] Air-traffic benchmark

Reply via email to