I/O clustering, Re: patches for test / review

2000-03-20 Thread Alfred Perlstein

* Poul-Henning Kamp [EMAIL PROTECTED] [000320 12:03] wrote:
 In message [EMAIL PROTECTED], Alfred Perlstein writes:
 * Poul-Henning Kamp [EMAIL PROTECTED] [000320 11:45] wrote:
  In message [EMAIL PROTECTED], Alfred Perlstein writes:
  
  Keeping the current cluster code is a bad idea, if the drivers were
  taught how to traverse the linked list in the buf struct rather
  than just notice "a big buffer" we could avoid a lot of page
  twiddling and also allow for massive IO clustering (> 64k)
  
  Before we redesign the clustering, I would like to know if we
  actually have any recent benchmarks which prove that clustering
  is overall beneficial ?
 
 Yes it is really beneficial.
 
 I would like to see some numbers if you have them.

No I don't have numbers.

Committing a 64k block would require 8 times the overhead of bundling
up the RPC as well as transmission and reply, it may be possible
to pipeline these commits because you don't really need to wait
for one to complete before issuing another request, but it's still
8x the amount of traffic.
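
As a rough sketch of where the 8x figure comes from (not part of the original message): it assumes an 8k transfer size per RPC, which the 8x-for-64k claim implies. The helper name rpcs_needed is hypothetical.

```python
def rpcs_needed(total_bytes: int, transfer_bytes: int) -> int:
    """Number of RPCs needed to move total_bytes at transfer_bytes per RPC."""
    return -(-total_bytes // transfer_bytes)  # ceiling division

KB = 1024
unclustered = rpcs_needed(64 * KB, 8 * KB)   # one RPC per 8k block (assumed size)
clustered = rpcs_needed(64 * KB, 64 * KB)    # one RPC for the whole 64k cluster

# Each RPC pays its own bundling, transmission and reply cost,
# so the per-request overhead scales with this ratio.
print(unclustered, clustered, unclustered // clustered)
```

The payload is the same 64k either way; what grows by the factor of 8 is the number of request/reply exchanges.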

You also complicate and penalize drivers because not all drivers
can add an IO request to an already started transaction, those
devices will need to start new transactions for each buffer instead
of bundling up the list and passing it all along.

Maybe I'm missing something.

Is there something to provide a clean way to cluster IO? Can you
suggest something that won't have this sort of impact on NFS (and
elsewhere) if the clustering code was removed?

Bruce, what part of the clustering code makes you think of it as
hurting us? I thought it was mapping code.

 --
 Poul-Henning Kamp FreeBSD coreteam member
 [EMAIL PROTECTED]   "Real hackers run -current on their laptop."
 FreeBSD -- It will take a long time before progress goes too far!

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: I/O clustering, Re: patches for test / review

2000-03-20 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Alfred Perlstein writes:

  Before we redesign the clustering, I would like to know if we
  actually have any recent benchmarks which prove that clustering
  is overall beneficial ?
 
 Yes it is really beneficial.
 
 I would like to see some numbers if you have them.

No I don't have numbers.

Committing a 64k block would require 8 times the overhead of bundling
up the RPC as well as transmission and reply, it may be possible
to pipeline these commits because you don't really need to wait
for one to complete before issuing another request, but it's still
8x the amount of traffic.

I agree that it is obvious for NFS, but I don't see it as being
obvious at all for (modern) disks, so for that case I would like
to see numbers.

If running without clustering is just as fast for modern disks,
I think the clustering needs rethought.

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: I/O clustering, Re: patches for test / review

2000-03-20 Thread Matthew Dillon


:
:* Poul-Henning Kamp [EMAIL PROTECTED] [000320 12:03] wrote:
: In message [EMAIL PROTECTED], Alfred Perlstein writes:
: * Poul-Henning Kamp [EMAIL PROTECTED] [000320 11:45] wrote:
:  In message [EMAIL PROTECTED], Alfred Perlstein writes:
:  
:  Keeping the current cluster code is a bad idea, if the drivers were
:  taught how to traverse the linked list in the buf struct rather
:  than just notice "a big buffer" we could avoid a lot of page
:  twiddling and also allow for massive IO clustering (> 64k)
:  
:  Before we redesign the clustering, I would like to know if we
:  actually have any recent benchmarks which prove that clustering
:  is overall beneficial ?
: 
: Yes it is really beneficial.
: 
: I would like to see some numbers if you have them.
:
:No I don't have numbers.
:
:Committing a 64k block would require 8 times the overhead of bundling
:up the RPC as well as transmission and reply, it may be possible
:to pipeline these commits because you don't really need to wait

Clustering is extremely beneficial.  DG and I and I think even BDE and
Tor have done a lot of random tests in that area.  I did a huge amount
of clustering related work while optimizing NFSv3 and fixing up the
random/sequential I/O heuristics for 4.0 (for both NFS and UFS).

The current clustering code does a pretty good job and I would hesitate
to change it at this time.  The only real overhead comes from the KVA
pte mappings for b_data in the pbuf that the clustering (and other)
code uses.  I do not think that redoing the clustering will have 
a beneficial result until *after* we optimize the I/O path as per
my previous posting.

Once we optimize the I/O path to make it more VM Object centric, it
will make it a whole lot easier to remove *ALL* the artificial I/O size
limitations.

-Matt






Re: I/O clustering, Re: patches for test / review

2000-03-20 Thread Matthew Dillon

:
:I agree that it is obvious for NFS, but I don't see it as being
:obvious at all for (modern) disks, so for that case I would like
:to see numbers.
:
:If running without clustering is just as fast for modern disks,
:I think the clustering needs rethought.
:
:   Depends on the type of disk drive and how it is configured. Some drives
:perform badly (skip a revolution) with back-to-back writes. In all cases,
:without aggregation of blocks, you pay the extra cost of additional interrupts
:and I/O rundowns, which can be a significant factor. Also, unless the blocks
:were originally written by the application in a chunk, they will likely be
:mixed with blocks to varying locations, in which case for drives without
:write caching enabled, you'll have additional seeks to write the blocks out.
:Things like this don't show up when doing simplistic sequential write tests.
:
:-DG
:
:David Greenman
:Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org

   I have an excellent example of this related to NFS.  It's still applicable
   even though the NFS point has already been conceded.

   As part of the performance enhancements package I extended the sequential
   detection heuristic to the NFS server side code and turned on clustering.
   On the server, mind you, not the client.

   Read performance went up drastically.  My 100BaseTX network instantly
   maxed out and, more importantly, the server side cpu use went down
   drastically.  Here is the relevant email from my archives describing the
   performance gains:

:From:   dillon
:To:   Alfred Perlstein [EMAIL PROTECTED]
:Cc:   Alan Cox [EMAIL PROTECTED], Julian Elischer [EMAIL PROTECTED]
:Date: Sun Dec 12 10:11:06 1999
:
:...
:This proposed patch allows us to maintain a sequential read heuristic
:on the server side.  I noticed that the NFS server side reads only 8K
:blocks from the physical media even when the NFS client is reading a
:file sequentially.
:
:With this heuristic in place I can now get 9.5 to 10 MBytes/sec reading
:over NFS on a 100BaseTX network, and the server winds up being 80% 
:idle.  Under -stable the same test runs 72% idle and 8.4 MBytes/sec.

This is in spite of the fact that in this sequential test the hard
drives were caching the read data ahead anyway.  The reduction in
command/response/interrupt overhead on the server by going from 8K read
I/Os to 64K read I/Os in the sequential case made an obvious beneficial
impact on the cpu.  I almost halved the cpu overhead on the server!
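
For reference, the figures quoted above can be normalized to CPU cost per unit of throughput. This is only a sketch of the arithmetic, not part of the original message: it assumes that "busy" is simply 100 minus the idle percentage and that all of the busy time is attributable to the NFS test. The name busy_per_mb is hypothetical.

```python
def busy_per_mb(idle_percent: float, mbytes_per_sec: float) -> float:
    """CPU busy percentage spent per MByte/sec of NFS read throughput."""
    return (100.0 - idle_percent) / mbytes_per_sec

stable = busy_per_mb(72.0, 8.4)          # -stable: 72% idle at 8.4 MBytes/sec
patched_low = busy_per_mb(80.0, 9.5)     # with heuristic, low end of the range
patched_high = busy_per_mb(80.0, 10.0)   # with heuristic, high end of the range

for label, cost in (("9.5 MB/s", patched_low), ("10 MB/s", patched_high)):
    saving = 1.0 - cost / stable
    print(f"{label}: {cost:.2f} vs {stable:.2f} busy% per MB/s, {saving:.0%} less")
```

Under those assumptions the cost per MByte/sec drops from about 3.3 to about 2.0-2.1 busy percent, roughly 37 to 40 percent less CPU for the same amount of data.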

So while on-disk caching makes a lot of sense, it is in no way able
to replace software clustering.  Having both working together is a
killer combination.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]






Re: I/O clustering, Re: patches for test / review

2000-03-20 Thread Mike Smith


Just as a perhaps interesting aside on this topic: it'd be quite 
neat for controllers that understand scatter/gather to be able to 
simply suck N regions of buffer cache which were due for committing 
directly into an S/G list...

(wishlist item, I guess 8)
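
A minimal sketch of that idea, not taken from any FreeBSD code: it assumes the dirty regions are already in disk order, that each is described by a (memory address, length) pair, and that the controller accepts a limited number of segments per command. The names build_sg_lists and max_segments are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (memory address, length in bytes)

def build_sg_lists(regions: List[Segment], max_segments: int) -> List[List[Segment]]:
    """Pack buffer regions into scatter/gather lists, one list per command.

    Regions that happen to be adjacent in memory are merged into a single
    segment; a new command is started when the segment limit is reached.
    """
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    commands: List[List[Segment]] = []
    current: List[Segment] = []
    for addr, length in regions:
        if current and current[-1][0] + current[-1][1] == addr:
            # This region directly follows the previous one in memory: extend it.
            prev_addr, prev_len = current[-1]
            current[-1] = (prev_addr, prev_len + length)
            continue
        if len(current) == max_segments:
            # Segment limit reached: close this command and start another.
            commands.append(current)
            current = []
        current.append((addr, length))
    if current:
        commands.append(current)
    return commands

regions = [(0x1000, 0x2000), (0x3000, 0x1000), (0x8000, 0x2000), (0xC000, 0x2000)]
sg_lists = build_sg_lists(regions, max_segments=2)
print(sg_lists)
```

Here no data is copied or remapped into one big buffer; the controller is simply handed the list of regions.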

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime. \\  [EMAIL PROTECTED]







Re: I/O clustering, Re: patches for test / review

2000-03-20 Thread David Greenman

Committing a 64k block would require 8 times the overhead of bundling
up the RPC as well as transmission and reply, it may be possible
to pipeline these commits because you don't really need to wait
for one to complete before issuing another request, but it's still
8x the amount of traffic.

I agree that it is obvious for NFS, but I don't see it as being
obvious at all for (modern) disks, so for that case I would like
to see numbers.

If running without clustering is just as fast for modern disks,
I think the clustering needs rethought.

   Depends on the type of disk drive and how it is configured. Some drives
perform badly (skip a revolution) with back-to-back writes. In all cases,
without aggregation of blocks, you pay the extra cost of additional interrupts
and I/O rundowns, which can be a significant factor. Also, unless the blocks
were originally written by the application in a chunk, they will likely be
mixed with blocks to varying locations, in which case for drives without
write caching enabled, you'll have additional seeks to write the blocks out.
Things like this don't show up when doing simplistic sequential write tests.
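
To illustrate the aggregation point (a sketch only, not part of the original message): assume each dirty block is identified by its disk block number and that one request, and so one interrupt and rundown, is issued per contiguous run. The names coalesce and max_run are hypothetical.

```python
from typing import Iterable, List, Tuple

def coalesce(blocks: Iterable[int], max_run: int) -> List[Tuple[int, int]]:
    """Sort dirty block numbers and merge neighbours into (start, count) runs."""
    runs: List[Tuple[int, int]] = []
    for blk in sorted(set(blocks)):
        if runs and runs[-1][0] + runs[-1][1] == blk and runs[-1][1] < max_run:
            # Block continues the previous run and the run is not full yet.
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            runs.append((blk, 1))
    return runs

dirty = [12, 3, 13, 4, 5, 40, 14]     # blocks in the order they were dirtied
runs = coalesce(dirty, max_run=8)
print(len(dirty), "requests unclustered,", len(runs), "requests clustered:", runs)
```

With this input, seven separate writes to interleaved locations become three requests, each covering one contiguous stretch of the disk.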

-DG

David Greenman
Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org
Creator of high-performance Internet servers - http://www.terasolutions.com
Pave the road of life with opportunities.





Re: I/O clustering, Re: patches for test / review

2000-03-20 Thread Mike Smith

 I agree that it is obvious for NFS, but I don't see it as being
 obvious at all for (modern) disks, so for that case I would like
 to see numbers.
 
 If running without clustering is just as fast for modern disks,
 I think the clustering needs rethought.

I think it should be pretty obvious, actually.  Command overhead is large 
(and not getting much smaller), and clustering primarily serves to reduce 
the number of commands and thus the ratio of command time vs. data time.

So unless the clustering implementation is extremely poor, it's 
worthwhile.
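
A toy model of that ratio (a sketch, not part of the original message): it assumes a fixed per-command overhead plus a transfer time proportional to the request size. The 0.5 ms overhead and 20 MBytes/sec transfer rate are illustrative assumptions, not measurements from this thread, and the name data_fraction is hypothetical.

```python
def data_fraction(request_bytes: int, overhead_s: float, bytes_per_s: float) -> float:
    """Fraction of a request's total time that is spent moving data."""
    data_time = request_bytes / bytes_per_s
    return data_time / (overhead_s + data_time)

OVERHEAD_S = 0.0005               # assumed fixed cost per command: 0.5 ms
BYTES_PER_S = 20 * 1024 * 1024    # assumed transfer rate: 20 MBytes/sec

for size_kb in (8, 64):
    frac = data_fraction(size_kb * 1024, OVERHEAD_S, BYTES_PER_S)
    print(f"{size_kb}K requests: {frac:.0%} of the time spent moving data")
```

With these assumed numbers, 8K requests spend about 44% of their time moving data and 64K requests about 86%. The exact values depend entirely on the assumed overhead and rate; the point is only that larger requests amortize the fixed command cost.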
-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime. \\  [EMAIL PROTECTED]



