>If we extended relations by more than 8k at a time, we would know a lot
>more about disk layout, at least on filesystems with a decent amount of
>free space.

I doubt it makes that much difference. If there was a significant amount
of fragmentation, we'd hear more complaints about seq scan performance.

The issue here is that we don't know which relations are on which drives
and controllers, how they're striped, mirrored etc.

Actually, isn't pre-allocation one of the tricks that Greenplum uses to
get it's seqscan performance?

My tests here show that, at least on reiserfs, after a few hours of benchmark torture (this represents several million write queries), table files become significantly fragmented. I believe the table and index files get extended more or less simultaneously and end up somehow a bit mixed up on disk. Seq scan perf suffers. reiserfs doesn't have an excellent fragmentation behaviour... NTFS is worse than hell in this respect. So, pre-alloc could be a good idea. Brutal Defrag (cp /var/lib/postgresql to somewhere and back) gets seq scan perf back to disk throughput.

Also, by the way, InnoDB uses a BTree organized table. The advantage is that data is always clustered on the primary key (which means you have to use something as your primary key that isn't necessary "natural", you have to choose it to get good clustering, and you can't always do it right, so it somehow, in the end, sucks rather badly). Anyway, seq-scan on InnoDB is very slow because, as the btree grows (just like postgres indexes) pages are split and scanning the pages in btree order becomes a mess of seeks. So, seq scan in InnoDB is very very slow unless periodic OPTIMIZE TABLE is applied. (caveat to the postgres TODO item "implement automatic table clustering"...)

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Reply via email to