On Sun, Jul 24, 2016 at 1:06 PM, Gleb Natapov <g...@scylladb.com> wrote:
> > So it appears that unless you're using mmap(), OSv does *not* have any
> > writeback buffer, not even for ZFS filesystems: If you do a write() system
> > call to write 10 bytes, you will create a 10-byte disk operation; If I
> > understand you correctly, there is no attempt to somehow coalesce many
> > small writes to one operation, nor is there any attempt to reorder the I/O
> > operations to better fit some assumptions on disk performance.
> >
> No, unless you're using mmap() _our_ write page cache will not be used,

Yes, that we already understood after your previous clarifications on the
bug tracker. None of Benoit's tests are using mmap(), so our pagecache.cc is
irrelevant to this discussion.

> instead write will go directly into zfs layer which should not generate
> disk write for each write operation unless it is extremely stupid (which
> I doubt) or mounted/configured incorrectly or we use some kind of wrong
> write.

Ok, so I think that what Benoit was seeing is that ZFS *is* being extremely
stupid and generating many small writes.

But, Benoit, is it possible that ZFS *is* trying to coalesce writes, but is
somehow misconfigured to aim at the wrong size? What I mean is, if you do
write() operations of 10 bytes, will you see 10-byte operations, or are they
still coalesced into 512-byte operations? If there is coalescing, but only up
to 512 bytes (not 4096 as you would have hoped), then clearly ZFS can
coalesce the writes but is somehow misconfigured to make them so small.

> > So basically in OSv, any write() to ZFS is using O_DIRECT even if you
> > didn't ask for that.
> No. Not intentional anyway.

Oh. Good to know.
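
To make the 10-byte question above concrete, here is a minimal sketch of the kind of test I have in mind, assuming a plain POSIX environment. The helper name write_small_chunks() and the file path are made up for illustration; they are not taken from OSv or from Benoit's tests. It issues many small write() calls followed by one fsync(), so that the size of the resulting block-level requests can be checked with whatever tracing the host offers.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of OSv): issue `count` write() calls of
 * `chunk` bytes each to `path`, then fsync() so the data must reach the
 * filesystem layer. Returns the total number of bytes written, or -1 on
 * any error or short write. */
long write_small_chunks(const char *path, size_t chunk, size_t count)
{
    char buf[4096];

    if (chunk == 0 || chunk > sizeof(buf))
        return -1;
    memset(buf, 'x', chunk);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    long total = 0;
    for (size_t i = 0; i < count; i++) {
        /* One system call per small chunk; no user-space buffering. */
        ssize_t n = write(fd, buf, chunk);
        if (n != (ssize_t)chunk) {
            close(fd);
            return -1;
        }
        total += n;
    }

    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    return total;
}
```

Calling, say, write_small_chunks("/tmp/small_writes.bin", 10, 100000) and then repeating with a chunk of 512 and of 4096 should show whether the request size seen by the disk follows the write() size or stays fixed. This only generates the workload; it does not by itself measure the I/O sizes.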