On Sun, Jul 24, 2016 at 12:48:57PM +0300, Nadav Har'El wrote:
> On Sun, Jul 24, 2016 at 12:33 PM, Gleb Natapov <g...@scylladb.com> wrote:
> 
> > On Sun, Jul 24, 2016 at 11:27:02AM +0200, Benoît Canet wrote:
> > > ZFS is also used on ubuntu.
> > >
> > So what? It does not change the fact that writing using the write() system
> > call does not go through our write page cache on OSv. It does go through
> > Linux's page cache on Linux.
> >
> 
> I think this was precisely Benoit's question...
> 
> So it appears that unless you're using mmap(), OSv does *not* have any
> writeback buffer, not even for ZFS filesystems: If you do a write() system
> call to write 10 bytes, you will create a 10-byte disk operation; if I
> understand you correctly, there is no attempt to somehow coalesce many
> small writes to one operation, nor is there any attempt to reorder the I/O
> operations to better fit some assumptions on disk performance.
> 
No, unless you're using mmap() _our_ write page cache will not be used;
instead the write goes directly into the ZFS layer, which should not generate
a disk write for each write operation unless it is extremely stupid (which
I doubt), or is mounted/configured incorrectly, or we use some kind of wrong
write.
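
To make "coalescing" concrete, here is a toy sketch in Python. It is not
ZFS or OSv code: the class name, the 4096-byte threshold and the flush
policy are all assumptions made up for illustration. It only shows how a
layer that buffers writes can turn many 10-byte write() calls into a few
larger disk operations.

```python
class CoalescingWriter:
    """Toy write-back buffer: merges small sequential writes into larger ones.

    Hypothetical model for illustration only; real ZFS batches writes in
    its own way and this class does not describe it.
    """

    def __init__(self, flush_threshold=4096):
        self.flush_threshold = flush_threshold  # assumed value, not a ZFS setting
        self.pending = bytearray()  # data accepted but not yet "on disk"
        self.disk_ops = []          # sizes of the operations actually issued

    def write(self, data):
        # Accept the data immediately; only issue I/O once enough has piled up.
        self.pending += data
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Issue one operation covering everything buffered so far.
        if self.pending:
            self.disk_ops.append(len(self.pending))
            self.pending.clear()


w = CoalescingWriter()
for _ in range(1000):
    w.write(b"x" * 10)  # 1000 application writes of 10 bytes each
w.flush()
print(len(w.disk_ops), "disk operations for 1000 write() calls")
```

Without the buffer, the same loop would issue 1000 separate 10-byte
operations, which is the behaviour described in the question above.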

> So basically in OSv, any write() to ZFS is using O_DIRECT even if you
> didn't ask for that.
No. Not intentionally anyway.

> 
> How do things work on Solaris with ZFS? Wouldn't Solaris expect ZFS to
No idea. From the look of it, ZFS was designed by some storage neckbeards
who moved to work on it directly from designing a state-of-the-art tape
storage system (sometime in the '60s), and modern concepts like mmapped files
or page caches were completely alien to them.

My guess is that after the Solaris developers recovered from the initial shock
they went into when their management ordered them to integrate this
thing, which was meant to sit in some closed network storage appliance,
into a modern operating system, they did what the BSD devs (and, my guess,
Linux devs too) did: use the OS page cache and do not integrate with the ARC
in any way.

> contain a write cache just like it holds a read cache (the ARC)? Moreover,
> these two should obviously be the same cache: if you write() to a file and
> immediately read() the same bytes, you can get the answer from the cache.
They do not have to be the same cache if writes and reads go to the OS's
page cache only.
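
A toy sketch of that idea in Python, under loud assumptions: this is not
the Linux, BSD or OSv page cache, the names are hypothetical, and a cache
miss is simply treated as zeros instead of going to disk. It shows a
read() being answered from the same pages a write() just dirtied, and
writeback happening in whole 4K pages, which would be consistent with the
4K minimum write mentioned in this thread.

```python
PAGE_SIZE = 4096  # assumed page size


class ToyPageCache:
    """Hypothetical OS-level page cache sitting in front of a filesystem."""

    def __init__(self):
        self.pages = {}     # page index -> bytearray(PAGE_SIZE)
        self.dirty = set()  # indices of pages modified since last writeback

    def write(self, offset, data):
        # Copy bytes into cached pages and mark those pages dirty.
        for i, byte in enumerate(data):
            idx, off = divmod(offset + i, PAGE_SIZE)
            page = self.pages.setdefault(idx, bytearray(PAGE_SIZE))
            page[off] = byte
            self.dirty.add(idx)

    def read(self, offset, length):
        # Serve from cached pages; a miss is modelled as zeros here,
        # where a real cache would read the page from the filesystem.
        out = bytearray()
        for pos in range(offset, offset + length):
            idx, off = divmod(pos, PAGE_SIZE)
            page = self.pages.get(idx)
            out.append(page[off] if page is not None else 0)
        return bytes(out)

    def writeback(self):
        # Return the (offset, size) operations that would be sent down:
        # one whole page per dirty page, regardless of how few bytes changed.
        ops = [(idx * PAGE_SIZE, PAGE_SIZE) for idx in sorted(self.dirty)]
        self.dirty.clear()
        return ops


cache = ToyPageCache()
cache.write(100, b"0123456789")  # a 10-byte write()
data = cache.read(100, 10)       # answered from the cache, no disk read
ops = cache.writeback()          # one page-sized write goes to storage
print(data, ops)
```

In this model the filesystem below never sees the 10-byte write at all,
so its own read cache does not need to be involved in read-after-write.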

> 
> 
> >
> > > What makes me think about the page cache is the fact that Ubuntu issues a
> > > minimal write of 4K.
> > On Linux every write (if not O_DIRECT) goes through the page cache.
> >
> > >
> > > They store their data on an NFS server.
> > You mean the VM's disk is on an NFS server and they enabled cache=none?
> > Otherwise tcpdump is meaningless.
> >
> 
> Yes, I believe that's what he meant. It's a kind of non-optimal setup, but
> it's still not very fun when OSv is 7 times slower than Linux on such a
> setup which wasn't particularly speedy to begin with.

--
                        Gleb.
