On Sun, Sep 09, 2018 at 11:49:20AM -0000, Michael van Elst wrote:
> [email protected] (Manuel Bouyer) writes:
>
> >I strongly disagree. I have 512-byte-sector domUs running for years on a
> >dom0 with 64k blocks/8k fragments. This works! I'm probably not the
> >only one, as it has been this way since we have had dom0 support.
> >It's very unlikely that others have requested a 512-byte-fragment FFS
> >for their domUs' backing store.
>
> Still, it fails when the dom0 has disks with large sectors. So we need some
> kind of check.
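One possible shape for the check Michael asks for is sketched below. It is a hypothetical helper, not code from vnd.c or the Xen backend: it assumes guest requests arrive in 512-byte units (`GUEST_BSIZE`) and that the sector size of the dom0 disk holding the image is known (`backing_secsize`). A request would only take the direct VOP_BMAP/VOP_STRATEGY path when both its start and its length fall on backing-sector boundaries; anything else would go through the slower read/write path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the domU addresses its disk in 512-byte sectors. */
#define GUEST_BSIZE 512u

/*
 * Hypothetical helper: return true when a guest request starting at
 * guest_blkno (in GUEST_BSIZE units) and nbytes long can be issued
 * directly to a backing disk whose sector size is backing_secsize.
 */
static bool
direct_io_ok(uint64_t guest_blkno, uint32_t nbytes, uint32_t backing_secsize)
{
	/* The backing sector size must be a power of two >= guest sector. */
	assert(backing_secsize >= GUEST_BSIZE);
	assert((backing_secsize & (backing_secsize - 1)) == 0);

	uint64_t offset = guest_blkno * GUEST_BSIZE;

	/* Both start and length must fall on backing-sector boundaries. */
	return (offset % backing_secsize) == 0 &&
	    (nbytes % backing_secsize) == 0;
}
```

With 512-byte backing sectors every guest request passes; with a 4096-byte-sector disk, a 512-byte write at guest sector 1 would be rejected and would have to fall back.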
I agree with this.

>
> The (traditional) buffercache definitely cannot handle varying block sizes.
> Fortunately this is not used by FFS; the UVM pager obviously handles only
> complete pages. Maybe that makes it work accidentally.
>
> The filesystem code, however, makes sure that filesystem I/O is done in
> multiples of fragment sizes, and VOP_BMAP/VOP_STRATEGY was originally
> defined to work on filesystem fragments (and multiples). vnd shouldn't
> do anything else.

Actually, I don't think so. VOP_BMAP returns the disk sector number where
the fragment starts. VOP_STRATEGY does a contiguous I/O of sectors, with the
start and size expressed in disk sectors. Obviously VOP_BMAP will return the
start of a fragment, but we can then adjust the buffer's b_blkno and b_bcount
to the part of the fragment we want to read (or write). This is what we do
in vnd.c. This is direct I/O, so the buffer cache is not an issue here.

>
> >The performance penalty of VOP_READ/VOP_WRITE is just unacceptable
> >for a Xen setup.
>
> I'm doing some benchmarks now (just for vnd, not xen yet). I can see
> a penalty of about a factor of 2 for linear I/O, much less for random I/O.

On my setup it's between 5 and 10, enough to make the disks 100% busy with
less than 1MB/s of real I/O (the disk itself does about 3MB/s reads and
5MB/s writes, but it seems to really dislike doing a read followed by a
write to the same sector(s)). One problem is that handle_with_rdwr() does a
64k read-modify-write whatever the size of the original I/O is (i.e. it uses
the filesystem block size, not the filesystem fragment size).

But anyway, even a factor of 2 is bad; at some point we were a better dom0
than Linux, performance-wise. I don't think that's true any more, but we
should not make it worse.

-- 
Manuel Bouyer <[email protected]>
     NetBSD: 26 years of experience will always make the difference
--
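The b_blkno/b_bcount adjustment described above can be sketched as follows. This is a simplified model loosely based on that description, not code copied from vnd.c: `struct xfer` stands in for the two struct buf fields involved, `bmap_fn` is a hypothetical stand-in for VOP_BMAP (ignoring its run-length output), and `example_bmap` assumes an image file whose 64k blocks are laid out contiguously on disk starting at sector 1000.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSHIFT 9                    /* 512-byte disk addressing units */
#define btodb(x)   ((x) >> DEV_BSHIFT)  /* bytes -> DEV_BSIZE blocks */

/* Simplified stand-in for the struct buf fields being adjusted. */
struct xfer {
	int64_t  b_blkno;   /* start on the backing disk, DEV_BSIZE units */
	uint32_t b_bcount;  /* length in bytes */
};

/* Hypothetical VOP_BMAP stand-in: logical block -> starting disk block. */
typedef int64_t (*bmap_fn)(int64_t lbn);

/* Example layout (assumption): contiguous 64k blocks from sector 1000. */
static int64_t
example_bmap(int64_t lbn)
{
	return 1000 + lbn * (65536 >> DEV_BSHIFT);
}

/*
 * Map the first piece of a request at byte offset off (resid bytes left)
 * in an image on a filesystem with block size bsize. The piece never
 * crosses a filesystem block boundary.
 */
static struct xfer
map_piece(bmap_fn bmap, uint64_t off, uint32_t resid, uint32_t bsize)
{
	assert(bsize != 0 && (off % (1u << DEV_BSHIFT)) == 0);

	uint64_t lbn = off / bsize;     /* logical block containing off */
	uint32_t inblk = off % bsize;   /* byte offset inside that block */
	uint32_t sz = bsize - inblk;    /* bytes left in that block */
	if (sz > resid)
		sz = resid;

	struct xfer x;
	/* Start of the block on disk, plus the offset into the block. */
	x.b_blkno = bmap((int64_t)lbn) + (int64_t)btodb(inblk);
	x.b_bcount = sz;
	return x;
}
```

For example, an 8k read at byte offset 69632 of such an image maps to disk block 1136 with a count of 8192. A request that crosses a block boundary would be handled by calling map_piece again with off advanced by b_bcount and resid reduced by it.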
