Hello Bob,

Wednesday, March 19, 2008, 11:23:58 PM, you wrote:

BF> On Wed, 19 Mar 2008, Bill Moloney wrote:

>> When application IO sizes get small, the overhead in ZFS goes
>> up dramatically.

BF> Thanks for the feedback.  However, from what I have observed, it is 
BF> not a full story at all.  On my own system, when a new file is 
BF> written, the write block size does not make a significant difference 
BF> to the write speed.  Similarly, read block size does not make a 
BF> significant difference to the sequential read speed.  I do see a 
BF> large difference in rates when an existing file is updated 
BF> sequentially.  There is a many orders of magnitude difference for 
BF> random I/O type updates.

BF> I think that there are some rather obvious reasons for the difference 
BF> between writing a new file and updating an existing file.  When 
BF> writing a new file, the system can buffer up to a disk block's worth 
BF> of data prior to issuing a disk I/O, or it can immediately write what 
BF> it has and since the write is sequential, it does not need to re-read 
BF> prior to write (but there may be more metadata I/Os).  For the case of
BF> updating part of a disk block, there needs to be a read prior to write
BF> if the block is not cached in RAM.


Possibly when you created the file, ZFS used 128KB blocks (the default
recordsize). If you then update that file randomly, the question is:
what is the average update size? If it's below 128KB (or not aligned
to a 128KB boundary), ZFS basically has to read the old 128KB block
first and then write the whole modified block to a new location.
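
To put rough numbers on that, here is a minimal sketch in Python. It is
only a model, not ZFS code: the function name rmw_io is made up for
illustration, and it assumes nothing is cached, every touched record is
rewritten whole, and metadata and compression are ignored.

```python
def rmw_io(offset, length, recordsize=128 * 1024):
    """Return (bytes_read, bytes_written) for one update.

    Hypothetical model: no caching, each touched record is rewritten
    in full, and a record is read first only if the update does not
    cover it completely. Metadata I/O is ignored.
    """
    first = offset // recordsize
    last = (offset + length - 1) // recordsize
    read = written = 0
    for rec in range(first, last + 1):
        rec_start = rec * recordsize
        rec_end = rec_start + recordsize
        fully_covered = offset <= rec_start and offset + length >= rec_end
        if not fully_covered:
            read += recordsize      # old record must be read before the write
        written += recordsize       # whole record goes to a new location
    return read, written


# 8KB update inside a 128KB record: read 128KB, write 128KB
print(rmw_io(3 * 8192, 8192))
# same update with an 8KB recordsize: no read, write 8KB
print(rmw_io(3 * 8192, 8192, recordsize=8192))
```

Under these assumptions an aligned 8KB update moves 256KB with the
default recordsize and only 8KB when the recordsize matches the update.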

In such a scenario (like Oracle databases on ZFS), set the recordsize
property to something smaller BEFORE you create the files, ideally
matching your average update size. For Oracle, matching db_block_size
should give you the best results most of the time.
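
As a rough illustration of picking the value, the Python sketch below
rounds an average update size up to a power of two and prints the
matching "zfs set" command. The helper names and the dataset name
tank/oradata are hypothetical, rounding up is just one possible choice,
and the 512-byte to 128KB range is what I believe recordsize accepts.

```python
def pick_recordsize(avg_update_bytes, lo=512, hi=128 * 1024):
    """Smallest power of two >= avg_update_bytes, clamped to [lo, hi].

    Assumption: recordsize must be a power of two in the lo..hi range.
    """
    size = lo
    while size < avg_update_bytes and size < hi:
        size *= 2
    return size


def zfs_set_command(dataset, avg_update_bytes):
    """Build the command string only; nothing is executed here."""
    size = pick_recordsize(avg_update_bytes)
    value = f"{size // 1024}K" if size >= 1024 else str(size)
    return f"zfs set recordsize={value} {dataset}"


# Oracle with db_block_size=8192 on a hypothetical dataset
print(zfs_set_command("tank/oradata", 8192))
```

Run the resulting command on the dataset before the data files are
created.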


-- 
Best regards,
 Robert Milkowski                            mailto:[EMAIL PROTECTED]
                                       http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
