Re: calling all fs experts
Hi,

Reference:
> From: Maksim Yevmenkin maksim.yevmen...@gmail.com

Maksim Yevmenkin wrote:
> Hello,
>
> I have a question for fs wizards.

There is a list for them: freebsd...@freebsd.org

Cheers,
Julian
-- 
Julian Stacey, BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: calling all fs experts
On Sat, Dec 10, 2011 at 05:42:01PM -0800, Maksim Yevmenkin wrote:
> Hello,
>
> I have a question for fs wizards.
>
> Suppose I can persuade a modern spinning disk to do large reads (say
> 512K to 1M) at a time. Also, suppose the file system on such a drive
> is used to store large files (tens to hundreds of megabytes). Is there
> any way I can tweak the file system parameters (block size, layout,
> etc.) to help it get as close to the disk's sequential read rate as
> possible?
>
> I understand that I will not be able to get a 100MB/sec single-client
> sequential read rate, but can I get a sustained 40-50MB/sec? Also, can
> I reduce the performance impact caused by small reads such as
> directory accesses?

If you wanted to get responses from experts only, sorry in advance.

The fs (AKA UFS) uses clustering provided by the block cache. The
clustering code, mainly located in kern/vfs_cluster.c, coalesces a
sequence of reads or writes that target consecutive blocks into a
single physical read or write of at most MAXPHYS bytes. The current
definition of MAXPHYS is 128KB.

Clustering also allows the filesystem to improve the layout of files by
calling VOP_REALLOCBLKS() to redo the allocation, so that the blocks
being written become sequential if they are not already.

Even if a file is not laid out ideally, or the i/o pattern is random,
most scheduled writes are asynchronous, and for reads the system tries
to schedule read-aheads for some limited number of blocks. This allows
the lower layers, i.e. geom and the disk drivers, to optimize the i/o
queue by coalescing requests that are consecutive on disk but not
adjacent in the queue.

BTW, some time ago I was interested in how fragmentation on UFS would
be affected by a semi-abandoned patch that could make the fragmentation
worse. I wrote a tool that calculated the percentage of non-consecutive
spots in the whole filesystem. Apparently, even under a heavy load
consisting of writing a lot of files under megabytes in size, UFS
managed to keep the number of such spots under 2-3% on a sufficiently
free volume.
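
The coalescing idea described in the message above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the logic of kern/vfs_cluster.c: the function name `coalesce`, its parameters, and the 16K and 32K block sizes in the usage note are hypothetical, and the only figure taken from the thread is the 128KB MAXPHYS limit. It groups a stream of requested block numbers into runs of consecutive blocks, capping each run at MAXPHYS bytes.

```python
MAXPHYS = 128 * 1024  # bytes; the value quoted in the thread


def coalesce(blocks, block_size, max_io=MAXPHYS):
    """Group requested block numbers into (start_block, count) runs.

    A run is extended only while the next requested block immediately
    follows the previous one and the run stays within max_io bytes.
    Hypothetical helper for illustration; not a kernel interface.
    """
    max_blocks = max(1, max_io // block_size)
    runs = []
    for blk in blocks:
        if runs:
            start, count = runs[-1]
            # Extend the current run if blk is the next consecutive block
            # and the run has not yet reached the size cap.
            if blk == start + count and count < max_blocks:
                runs[-1] = (start, count + 1)
                continue
        # Otherwise start a new run with this block.
        runs.append((blk, 1))
    return runs
```

With an assumed 32K block size, ten sequential blocks become three physical requests of 4, 4 and 2 blocks, because each request is capped at 128KB. A request stream that jumps around, such as blocks 0-3, then 10-11, then 4, yields three separate runs even with a smaller block size.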
Re: calling all fs experts
--- On Sun 11/12/11, Kostik Belousov kostik...@gmail.com wrote:

> If you wanted to get responses from experts only, sorry in advance.

I am no fs expert but just thought I'd mention some things based on my
playing with the BSD ext2fs ...

> The fs (AKA UFS) uses clustering provided by the block cache. The
> clustering code, mainly located in kern/vfs_cluster.c, coalesces a
> sequence of reads or writes that target consecutive blocks into a
> single physical read or write of at most MAXPHYS bytes. The current
> definition of MAXPHYS is 128KB.

The clustering code is really cool, and the idea is that it gives UFS
the advantages of an extent-based fs. I haven't seen benchmarks on
UFS2, but on ext2 it didn't seem to work as it should.

One issue is that ext2 doesn't support fragments, and as a consequence
ext2 will not use big block sizes. This is a limitation in the ext2
design that UFS doesn't have, but still Linux's ext2fs outperforms UFS
in async mode (we do shine in sync mode). It was never clear exactly
why this happens, but it would appear there is a bottleneck in geom
that makes it perform poorly when writing many contiguous blocks.

> Clustering also allows the filesystem to improve the layout of files
> by calling VOP_REALLOCBLKS() to redo the allocation, so that the
> blocks being written become sequential if they are not already.
>
> Even if a file is not laid out ideally, or the i/o pattern is random,
> most scheduled writes are asynchronous, and for reads the system
> tries to schedule read-aheads for some limited number of blocks. This
> allows the lower layers, i.e. geom and the disk drivers, to optimize
> the i/o queue by coalescing requests that are consecutive on disk but
> not adjacent in the queue.
>
> BTW, some time ago I was interested in how fragmentation on UFS would
> be affected by a semi-abandoned patch that could make the
> fragmentation worse. I wrote a tool that calculated the percentage of
> non-consecutive spots in the whole filesystem. Apparently, even under
> a heavy load consisting of writing a lot of files under megabytes in
> size, UFS managed to keep the number of such spots under 2-3% on a
> sufficiently free volume.

Yes, the realloc_blk code is very efficient at that. In fact it is so
good that it actually hides some inefficient operations in UFS. Bruce
had a patch for this that I cc'd to Kirk, but the difference was not
big because the realloc_blk code does its job in memory.

Zheng Liu did the reallocation thing for ext2fs and it gave better
results than preallocation, but the results are not as spectacular as
in UFS (the UFS code takes advantage of fragments there too). I do
expect to commit it (kern/159233) once my mentor reviews and approves
it.

cheers,

Pedro.
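
The "percentage of non-consecutive spots" figure quoted above can be made concrete with a rough sketch. The exact definition used by the original tool is not given in the thread, so the definition here is an assumption: a "spot" is counted whenever two logically adjacent blocks of a file are not physically adjacent on disk, and the percentage is taken over all such adjacent pairs. The function name `noncontiguous_percent` and the input layout are hypothetical.

```python
def noncontiguous_percent(files):
    """Return the percentage of block-to-block transitions that are
    not physically consecutive.

    `files` is an iterable of lists; each list holds one file's physical
    block numbers in logical (file offset) order. This is an assumed
    definition, not the one used by the tool mentioned in the thread.
    """
    breaks = 0
    transitions = 0
    for blocks in files:
        # Compare each block with its logical successor.
        for prev, cur in zip(blocks, blocks[1:]):
            transitions += 1
            if cur != prev + 1:
                breaks += 1
    if transitions == 0:
        return 0.0
    return 100.0 * breaks / transitions
```

For example, a fully contiguous four-block file plus a four-block file with a single jump in the middle gives one break out of six transitions, or about 16.7%. Under this assumed definition, a result of 2-3% would mean that roughly one transition in forty is a discontinuity.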
calling all fs experts
Hello,

I have a question for fs wizards.

Suppose I can persuade a modern spinning disk to do large reads (say
512K to 1M) at a time. Also, suppose the file system on such a drive is
used to store large files (tens to hundreds of megabytes). Is there any
way I can tweak the file system parameters (block size, layout, etc.)
to help it get as close to the disk's sequential read rate as possible?

I understand that I will not be able to get a 100MB/sec single-client
sequential read rate, but can I get a sustained 40-50MB/sec? Also, can
I reduce the performance impact caused by small reads such as directory
accesses?

thanks,
max