Re: I/O clustering, Re: patches for test / review
> I agree that it is obvious for NFS, but I don't see it as being
> obvious at all for (modern) disks, so for that case I would like
> to see numbers.
>
> If running without clustering is just as fast for modern disks,
> I think the clustering needs to be rethought.

I think it should be pretty obvious, actually. Command overhead is
large (and not getting much smaller), and clustering primarily serves
to reduce the number of commands and thus the ratio of command time
to data time. So unless the clustering implementation is extremely
poor, it's worthwhile.

--
\\ Give a man a fish, and you feed him for a day. \\ Mike Smith
\\ Tell him he should learn how to fish himself,  \\ [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime.             \\ [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
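The ratio of command time to data time described above can be put into a toy model. This is only a rough sketch: it assumes a fixed cost per command and a fixed media transfer rate, and the 0.5 ms overhead and 20 MB/s rate are hypothetical figures chosen for illustration, not measurements from this thread.

```python
def transfer_time(total_bytes, io_size, cmd_overhead_s, bytes_per_s):
    """Seconds needed to move total_bytes using commands of io_size bytes.

    Each command pays cmd_overhead_s once; the data itself moves at
    bytes_per_s regardless of how it is split into commands.
    """
    commands = -(-total_bytes // io_size)  # ceiling division
    return commands * cmd_overhead_s + total_bytes / bytes_per_s


# Hypothetical figures, for illustration only.
OVERHEAD = 0.0005        # 0.5 ms per command (assumed)
RATE = 20 * 1024 * 1024  # 20 MB/s media rate (assumed)
TOTAL = 1024 * 1024      # move 1 MB in total

t8 = transfer_time(TOTAL, 8 * 1024, OVERHEAD, RATE)
t64 = transfer_time(TOTAL, 64 * 1024, OVERHEAD, RATE)
print(f"8K commands:  {t8 * 1000:.1f} ms")
print(f"64K commands: {t64 * 1000:.1f} ms")
```

Under these assumed numbers the data time is identical in both cases and only the command count changes, which is the point being argued; real drives will differ.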
Re: I/O clustering, Re: patches for test / review
>>Committing a 64k block would require 8 times the overhead of bundling
>>up the RPC as well as transmission and reply, it may be possible
>>to pipeline these commits because you don't really need to wait
>>for one to complete before issuing another request, but it's still
>>8x the amount of traffic.
>
>I agree that it is obvious for NFS, but I don't see it as being
>obvious at all for (modern) disks, so for that case I would like
>to see numbers.
>
>If running without clustering is just as fast for modern disks,
>I think the clustering needs to be rethought.

   Depends on the type of disk drive and how it is configured. Some drives
perform badly (skip a revolution) with back-to-back writes. In all cases,
without aggregation of blocks, you pay the extra cost of additional interrupts
and I/O rundowns, which can be a significant factor. Also, unless the blocks
were originally written by the application in a chunk, they will likely be
mixed with blocks to varying locations, in which case for drives without
write caching enabled, you'll have additional seeks to write the blocks out.
Things like this don't show up when doing simplistic sequential write tests.

-DG

David Greenman
Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org
Creator of high-performance Internet servers - http://www.terasolutions.com
Pave the road of life with opportunities.
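The aggregation of blocks mentioned above can be sketched in a few lines. This is a toy illustration and not the FreeBSD cluster code: the function name `coalesce`, and the default cap of 8 blocks (64K clusters of 8K blocks, the sizes mentioned in this thread), are assumptions made for the example.

```python
def coalesce(block_numbers, max_blocks=8):
    """Group block numbers into runs of consecutive blocks.

    Returns a list of (start_block, count) tuples, one per I/O command.
    max_blocks=8 models 64K clusters of 8K blocks (an assumption).
    """
    runs = []
    for blk in sorted(set(block_numbers)):
        if runs:
            start, count = runs[-1]
            # Extend the current run if this block is adjacent and
            # the run has not yet reached the cluster size cap.
            if blk == start + count and count < max_blocks:
                runs[-1] = (start, count + 1)
                continue
        runs.append((blk, 1))
    return runs


# Dirty blocks in the order an application might have touched them.
dirty = [12, 10, 11, 40, 13, 41, 100]
print(coalesce(dirty))
```

Here seven dirty blocks turn into three commands rather than seven, so there are fewer interrupts and completions to process, and sorting first keeps writes to nearby locations together.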
Re: I/O clustering, Re: patches for test / review
:>
:>I agree that it is obvious for NFS, but I don't see it as being
:>obvious at all for (modern) disks, so for that case I would like
:>to see numbers.
:>
:>If running without clustering is just as fast for modern disks,
:>I think the clustering needs to be rethought.
:
:   Depends on the type of disk drive and how it is configured. Some drives
:perform badly (skip a revolution) with back-to-back writes. In all cases,
:without aggregation of blocks, you pay the extra cost of additional interrupts
:and I/O rundowns, which can be a significant factor. Also, unless the blocks
:were originally written by the application in a chunk, they will likely be
:mixed with blocks to varying locations, in which case for drives without
:write caching enabled, you'll have additional seeks to write the blocks out.
:Things like this don't show up when doing simplistic sequential write tests.
:
:-DG
:
:David Greenman
:Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org

I have an excellent example of this related to NFS. It's still
applicable even though the NFS point has already been conceded.

As part of the performance enhancements package I extended the
sequential detection heuristic to the NFS server side code and turned
on clustering. On the server, mind you, not the client.

Read performance went up drastically. My 100BaseTX network instantly
maxed out and, more importantly, the server side cpu use went down
drastically. Here is the relevant email from my archives describing
the performance gains:

:From: dillon
:To: Alfred Perlstein <[EMAIL PROTECTED]>
:Cc: Alan Cox <[EMAIL PROTECTED]>, Julian Elischer <[EMAIL PROTECTED]>
:Date: Sun Dec 12 10:11:06 1999
:
:...
:This proposed patch allows us to maintain a sequential read heuristic
:on the server side. I noticed that the NFS server side reads only 8K
:blocks from the physical media even when the NFS client is reading a
:file sequentially.
:
:With this heuristic in place I can now get 9.5 to 10 MBytes/sec reading
:over NFS on a 100BaseTX network, and the server winds up being 80%
:idle. Under -stable the same test runs 72% idle and 8.4 MBytes/sec.

This is in spite of the fact that in this sequential test the hard
drives were caching the read data ahead anyway. The reduction in
command/response/interrupt overhead on the server by going from 8K
read I/O's to 64K read I/O's in the sequential case made an obvious
beneficial impact on the cpu. I almost halved the cpu overhead on
the server!

So while on-disk caching makes a lot of sense, it is in no way able
to replace software clustering. Having both working together is a
killer combination.

-Matt
Matthew Dillon
<[EMAIL PROTECTED]>
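One way to read the figures quoted above is to normalize CPU use by throughput. The following is a back-of-the-envelope sketch that uses only the numbers in the message; taking 9.75 MBytes/sec as the midpoint of the "9.5 to 10" range is an assumption, and the helper name is made up for the example.

```python
def cpu_busy_per_mbyte(idle_pct, mbytes_per_s):
    """Percent of CPU busy per MByte/sec of NFS read throughput."""
    return (100.0 - idle_pct) / mbytes_per_s


# -stable: 72% idle at 8.4 MBytes/sec (figures from the message).
stable = cpu_busy_per_mbyte(72.0, 8.4)
# With the heuristic: 80% idle; 9.75 is the assumed midpoint of 9.5 to 10.
patched = cpu_busy_per_mbyte(80.0, 9.75)

print(f"-stable: {stable:.2f}% busy per MByte/sec")
print(f"patched: {patched:.2f}% busy per MByte/sec")
print(f"reduction per MByte/sec: {100 * (1 - patched / stable):.0f}%")
```

The exact percentage depends on which end of the quoted throughput range is used, so treat the result as a rough indication of the per-byte saving rather than a measurement.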
Re: I/O clustering, Re: patches for test / review
:
:* Poul-Henning Kamp <[EMAIL PROTECTED]> [000320 12:03] wrote:
:> In message <[EMAIL PROTECTED]>, Alfred Perlstein writes:
:> >* Poul-Henning Kamp <[EMAIL PROTECTED]> [000320 11:45] wrote:
:> >> In message <[EMAIL PROTECTED]>, Alfred Perlstein writes:
:> >>
:> >> >Keeping the current cluster code is a bad idea, if the drivers were
:> >> >taught how to traverse the linked list in the buf struct rather
:> >> >than just notice "a big buffer" we could avoid a lot of page
:> >> >twiddling and also allow for massive IO clustering ( > 64k )
:> >>
:> >> Before we redesign the clustering, I would like to know if we
:> >> actually have any recent benchmarks which prove that clustering
:> >> is overall beneficial ?
:> >
:> >Yes it is really beneficial.
:>
:> I would like to see some numbers if you have them.
:
:No I don't have numbers.
:
:Committing a 64k block would require 8 times the overhead of bundling
:up the RPC as well as transmission and reply, it may be possible
:to pipeline these commits because you don't really need to wait

Clustering is extremely beneficial. DG and I, and I think even BDE
and Tor, have done a lot of random tests in that area. I did a huge
amount of clustering related work while optimizing NFSv3 and fixing
up the random/sequential I/O heuristics for 4.0 (for both NFS and UFS).

The current clustering code does a pretty good job and I would hesitate
to change it at this time. The only real overhead comes from the KVA
pte mappings for b_data in the pbuf that the clustering (and other)
code uses. I do not think that redoing the clustering will have a
beneficial result until *after* we optimize the I/O path as per my
previous posting.

Once we optimize the I/O path to make it more VM Object centric, it
will make it a whole lot easier to remove *ALL* the artificial I/O
size limitations.

-Matt
Re: I/O clustering, Re: patches for test / review
Just as a perhaps interesting aside on this topic; it'd be quite
neat for controllers that understand scatter/gather to be able to
simply suck N regions of buffer cache which were due for committing
directly into an S/G list...

(wishlist item, I guess 8)

--
\\ Give a man a fish, and you feed him for a day. \\ Mike Smith
\\ Tell him he should learn how to fish himself,  \\ [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime.             \\ [EMAIL PROTECTED]
Re: I/O clustering, Re: patches for test / review
In message <[EMAIL PROTECTED]>, Alfred Perlstein writes:

>> >> Before we redesign the clustering, I would like to know if we
>> >> actually have any recent benchmarks which prove that clustering
>> >> is overall beneficial ?
>> >
>> >Yes it is really beneficial.
>>
>> I would like to see some numbers if you have them.
>
>No I don't have numbers.
>
>Committing a 64k block would require 8 times the overhead of bundling
>up the RPC as well as transmission and reply, it may be possible
>to pipeline these commits because you don't really need to wait
>for one to complete before issuing another request, but it's still
>8x the amount of traffic.

I agree that it is obvious for NFS, but I don't see it as being
obvious at all for (modern) disks, so for that case I would like
to see numbers.

If running without clustering is just as fast for modern disks,
I think the clustering needs to be rethought.

--
Poul-Henning Kamp             FreeBSD coreteam member
[EMAIL PROTECTED]             "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!