Re: [PERFORM] Block at a time ...

Scott Carey Fri, 26 Mar 2010 17:28:29 -0700

On Mar 22, 2010, at 4:46 PM, Craig James wrote:

> On 3/22/10 11:47 AM, Scott Carey wrote:
>> 
>> On Mar 17, 2010, at 9:41 AM, Craig James wrote:
>> 
>>> On 3/17/10 2:52 AM, Greg Stark wrote:
>>>> On Wed, Mar 17, 2010 at 7:32 AM, Pierre C<[email protected]>   wrote:
>>>>>> I was thinking in something like that, except that the factor I'd use
>>>>>> would be something like 50% or 100% of current size, capped at (say) 1 
>>>>>> GB.
>>>> 
>>>> This turns out to be a bad idea. One of the first things Oracle DBAs
>>>> are told to do is change this default setting to allocate some
>>>> reasonably large fixed size rather than scaling upwards.
>>>> 
>>>> This might be mostly due to Oracle's extent-based space management but
>>>> I'm not so sure. Recall that the filesystem is probably doing some
>>>> rounding itself. If you allocate 120kB it's probably allocating 128kB
>>>> itself anyways. Having two layers rounding up will result in odd
>>>> behaviour.
>>>> 
>>>> In any case I was planning on doing this a while back. Then I ran some
>>>> experiments and couldn't actually demonstrate any problem. ext2 seems
>>>> to do a perfectly reasonable job of avoiding this problem. All the
>>>> files were mostly large contiguous blocks after running some tests --
>>>> IIRC running pgbench.
>>> 
>>> This is one of the more-or-less solved problems in Unix/Linux.  Ext* file 
>>> systems have a "reserve" usually of 10% of the disk space that nobody 
>>> except root can use.  It's not for root, it's because with 10% of the disk 
>>> free, you can almost always do a decent job of allocating contiguous blocks 
>>> and get good performance.  Unless Postgres has some weird problem that 
>>> Linux has never seen before (and that wouldn't be unprecedented...), 
>>> there's probably no need to fool with file-allocation strategies.
>>> 
>>> Craig
>>> 
>> 
>> It's fairly easy to break.  Just do a parallel import with, say, 16 concurrent 
>> tables being written to at once.  Result?  Fragmented tables.
> 
> Is this from real-life experience?  With fragmentation, there's a point of 
> diminishing return.  A couple head-seeks now and then hardly matter.  My 
> recollection is that even when there are lots of concurrent processes running 
> that are all making files larger and larger, the Linux file system still can 
> do a pretty good job of allocating mostly-contiguous space.  It doesn't just 
> dumbly allocate from some list, but rather tries to allocate in a way that 
> results in pretty good "contiguousness" (if that's a word).
> 
> On the other hand, this is just from reading discussion groups like this one 
> over the last few decades, I haven't tried it...
>


Well, how fragmented is too fragmented depends on the use case and the hardware 
capability.  In real-world use, which for me means about 20 phases of large 
bulk inserts a day and not a lot of updates or index maintenance, the system 
gets somewhat fragmented but it's not too bad.  I did a dump/restore in 8.4 with 
parallel restore and it was much slower than usual.  I did a single-threaded 
restore and it was much faster.  The dev environments are on ext3 and we see 
this pretty clearly -- but poor OS tuning can mask it (readahead parameter not 
set high enough).  This is CentOS 5.4/5.3; perhaps later kernels are better at 
scheduling file writes to avoid this.  We also use the deadline scheduler, which 
helps a lot on concurrent reads but might be messing up concurrent writes.

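
For anyone who wants to check how fragmented a given file actually is, one 
option is to count its extents with filefrag (from e2fsprogs).  The following 
is only a minimal sketch: it assumes filefrag is installed and prints its usual 
"N extents found" summary line, and the function names are made up for 
illustration.

```python
import re
import subprocess

# Matches the summary line filefrag prints, e.g. "/path/file: 12 extents found".
_EXTENTS_RE = re.compile(r":\s*(\d+) extents? found")


def parse_extent_count(filefrag_output: str) -> int:
    """Extract the extent count from filefrag's summary line (hypothetical helper)."""
    match = _EXTENTS_RE.search(filefrag_output)
    if match is None:
        raise ValueError("unrecognized filefrag output: %r" % filefrag_output)
    return int(match.group(1))


def extent_count(path: str) -> int:
    """Run filefrag on one file and return how many extents it reports."""
    result = subprocess.run(
        ["filefrag", path], capture_output=True, text=True, check=True
    )
    return parse_extent_count(result.stdout)
```

Running extent_count() over the same table's files after a parallel restore and 
after a single-threaded restore should show the difference, if there is one.  
Depending on the filesystem and kernel, filefrag may need root.
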
On production with xfs this was also bad at first -- in fact worse, because 
xfs's default 'allocsize' setting is 64k.  So files were regularly fragmented 
in small multiples of 64k.  Changing the 'allocsize' parameter to 80MB made 
the restore process produce files with fragment sizes of 80MB.  80MB is big for 
most systems, but this array does over 1000MB/sec sequential read at peak, and 
only 200MB/sec with moderate fragmentation.

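
To see why a fragment size that large can make sense on a fast array, here is a 
back-of-the-envelope sketch.  It assumes a deliberately crude model that is not 
from any measurement: each fragment costs one seek plus a transfer at the full 
sequential rate.  The 8 ms seek time is an assumed figure, and the function 
names are hypothetical.

```python
def effective_throughput_mb_s(fragment_mb, seq_mb_s, seek_ms):
    """Throughput when every fragment costs one seek plus its transfer time."""
    transfer_s = fragment_mb / seq_mb_s
    return fragment_mb / (transfer_s + seek_ms / 1000.0)


def fragment_mb_for_fraction(fraction, seq_mb_s, seek_ms):
    """Fragment size needed to reach `fraction` of the sequential rate.

    Solves F / (F / S + t) = fraction * S for F, giving
    F = fraction / (1 - fraction) * S * t.
    """
    return fraction / (1.0 - fraction) * seq_mb_s * (seek_ms / 1000.0)


# Assumed inputs: 1000 MB/sec sequential rate, 8 ms per seek.
big = effective_throughput_mb_s(80, 1000, 8)             # 80MB fragments
small = effective_throughput_mb_s(64 / 1024.0, 1000, 8)  # 64k fragments
needed = fragment_mb_for_fraction(0.9, 1000, 8)          # size for 90% of peak
```

Under those assumptions, 80MB fragments come out around 909 MB/sec, 64k 
fragments land below 10 MB/sec, and roughly 72MB per fragment is needed to stay 
at 90% of peak.  A real array with readahead and many spindles does far better 
on small fragments than this model predicts, so treat it as an illustration of 
the trend rather than a prediction.
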
A large 'allocsize' won't make xfs fail to allocate disk space due to any 
'reservations' of the delayed allocation; it just means that xfs won't choose 
to create a new file or extent within 80MB of another file that is open unless 
it has to.  This can cause performance problems if you have lots of small 
files, which is why the default is 64k.



> Craig



-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
