Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-11 Thread Liu Yuan
On 01/11/2013 03:52 PM, MORITA Kazutaka wrote:
 At Thu, 10 Jan 2013 13:38:16 +0800,
 Liu Yuan wrote:

 On 01/09/2013 11:10 PM, Paolo Bonzini wrote:
 Il 09/01/2013 14:04, Liu Yuan ha scritto:
   2 The upper layer software which relies on the 'cache=xxx' to choose
 cache mode will fail its assumption against new QEMU.

 Which assumptions do you mean? As far as I can say the behaviour hasn't
 changed, except possibly for the performance.

 When users set 'cache=writethrough' to export only a writethrough cache
 to Guest, but with new QEMU, it will actually get a writeback cache as
 default.

 They get a writeback cache implementation-wise, but they get a
 writethrough cache safety-wise.  How the cache is implemented doesn't
 matter, as long as it looks like a writethrough cache.


 In fact, consider a local disk that doesn't support FUA.  In old QEMU,
 images used to be opened with O_DSYNC and that splits each write into
 WRITE+FLUSH, just like new QEMU.  All that changes is _where_ the
 flushes are created.  Old QEMU changes it in the kernel, new QEMU
 changes it in userspace.

 We don't need to communicate to the guest. I think 'cache=xxx' means
 what kind of cache the users *expect* to export to Guest OS. So if
 cache=writethrough set, Guest OS couldn't turn it to writeback cache
 magically. This is like I bought a disk with 'writethrough' cache
 built-in, I didn't expect that it turned to be a disk with writeback
 cache under the hood which could possible lose data when power outage
 happened.

 It's not by magic.  It's by explicitly requesting the disk to do this.

 Perhaps it's a bug that the cache mode is not reset when the machine is
 reset.  I haven't checked that, but it would be a valid complaint.


 Ah I didn't get the current implementation right. I tried the 3.7 kernel
 and it works as expected (cache=writethrough results in a 'writethrough'
 cache in the guest).

 It looks fine to me to emulate writethrough as writeback + flush, since
 the performance drop isn't big, though sheepdog itself supports a true
 writethrough cache (no flush).
 
 Can we drop the SD_FLAG_CMD_CACHE flag from sheepdog write requests
 when bdrv_enable_write_cache() is false?  Then the requests behave
 like FUA writes and we can safely omit succeeding flush requests.
 

Let's first implement an emulated writethrough cache and then look for a
method to play tricks to implement a true writethrough cache. This would
bring complexity, for example before we drop SD_FLAG_CMD_CACHE we need to
flush beforehand. What's more, bdrv_enable_write_cache() is always true
for now; I guess this needs a generic change in the block layer to get the
writethrough/writeback hints.

Thanks,
Yuan




Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-11 Thread Paolo Bonzini
 bdrv_enable_write_cache() is always
 true for now, I guess this need generic change block layer to get the
 writethrough/writeback hints.

That's not correct.  Try hdparm with an IDE disk, or sysfs with a SCSI
disk (recent kernels will also support sysfs with an IDE or virtio-blk
disk) and you'll see that it changes.

Paolo



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-11 Thread Liu Yuan
On 01/11/2013 05:00 PM, Paolo Bonzini wrote:
 That's not correct.  Try hdparm with an IDE disk, or sysfs with a SCSI
 disk (recent kernels will also support sysfs with an IDE or virtio-blk
 disk) and you'll see that it changes.

Okay, at least at startup, even with cache=writethrough set, I saw
bdrv_enable_write_cache() return true from sd_open(). So you mean this
will be reset later (after sd_open())?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-11 Thread Kevin Wolf
Am 11.01.2013 08:52, schrieb MORITA Kazutaka:
 At Thu, 10 Jan 2013 13:38:16 +0800,
 Liu Yuan wrote:

 On 01/09/2013 11:10 PM, Paolo Bonzini wrote:
 Il 09/01/2013 14:04, Liu Yuan ha scritto:
   2 The upper layer software which relies on the 'cache=xxx' to choose
 cache mode will fail its assumption against new QEMU.

 Which assumptions do you mean? As far as I can say the behaviour hasn't
 changed, except possibly for the performance.

 When users set 'cache=writethrough' to export only a writethrough cache
 to Guest, but with new QEMU, it will actually get a writeback cache as
 default.

 They get a writeback cache implementation-wise, but they get a
 writethrough cache safety-wise.  How the cache is implemented doesn't
 matter, as long as it looks like a writethrough cache.


 In fact, consider a local disk that doesn't support FUA.  In old QEMU,
 images used to be opened with O_DSYNC and that splits each write into
 WRITE+FLUSH, just like new QEMU.  All that changes is _where_ the
 flushes are created.  Old QEMU changes it in the kernel, new QEMU
 changes it in userspace.

 We don't need to communicate to the guest. I think 'cache=xxx' means
 what kind of cache the users *expect* to export to Guest OS. So if
 cache=writethrough set, Guest OS couldn't turn it to writeback cache
 magically. This is like I bought a disk with 'writethrough' cache
 built-in, I didn't expect that it turned to be a disk with writeback
 cache under the hood which could possible lose data when power outage
 happened.

 It's not by magic.  It's by explicitly requesting the disk to do this.

 Perhaps it's a bug that the cache mode is not reset when the machine is
 reset.  I haven't checked that, but it would be a valid complaint.


 Ah I didn't get the current implementation right. I tried the 3.7 kernel
 and it works as expected (cache=writethrough results in a 'writethrough'
 cache in the guest).

 It looks fine to me to emulate writethrough as writeback + flush, since
 the performance drop isn't big, though sheepdog itself supports a true
 writethrough cache (no flush).
 
 Can we drop the SD_FLAG_CMD_CACHE flag from sheepdog write requests
 when bdrv_enable_write_cache() is false?  Then the requests behave
 like FUA writes and we can safely omit succeeding flush requests.

First we would need to make sure that on a writeback -> writethrough
switch a flush happens. But once this is implemented, I think your
suggestion would work well.
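
A rough sketch of the idea (the helper name is an assumption; the real
request path in block/sheepdog.c is more involved):

    /* Sketch: compute the per-request sheepdog flags.  When the block layer
     * reports a disabled write cache, drop SD_FLAG_CMD_CACHE so the write
     * behaves like a FUA write and no separate flush is needed. */
    static uint32_t sd_write_flags(BlockDriverState *bs, BDRVSheepdogState *s)
    {
        uint32_t flags = 0;

        if (s->cache_enabled && bdrv_enable_write_cache(bs)) {
            flags |= SD_FLAG_CMD_CACHE;   /* normal writeback write */
        }
        /* else: no cache flag -> the server persists before replying */
        return flags;
    }

On a writeback -> writethrough switch, one explicit flush (e.g. via the
existing sheepdog flush callback) would have to be issued before the first
such FUA-like write, as noted above.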

Kevin



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-11 Thread Paolo Bonzini


- Original message -
 From: Liu Yuan namei.u...@gmail.com
 To: Paolo Bonzini pbonz...@redhat.com
 Cc: Kevin Wolf kw...@redhat.com, Stefan Hajnoczi stefa...@gmail.com, 
 qemu-devel@nongnu.org, MORITA Kazutaka
 morita.kazut...@lab.ntt.co.jp
 Sent: Friday, 11 January 2013 10:04:23
 Subject: Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics
 
 On 01/11/2013 05:00 PM, Paolo Bonzini wrote:
  That's not correct.  Try hdparm with an IDE disk, or sysfs with a
  SCSI disk (recent kernels will also support sysfs with an IDE or
  virtio-blk disk) and you'll see that it changes.
 
 Okay, at least at startup, even with cache=writethrough set, I saw
 bdrv_enable_write_cache() return true from sd_open(). So you mean
 this will be reset later (after sd_open())?

It is always true for the protocol.  It is not always true for the format.

This is correct.  qcow2 can do some writes in writeback mode, but they
will always be followed by a flush before the write is reported to the
guest.  It is a bit faster, and does not lose any correctness.

Paolo



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-11 Thread Liu Yuan
On 01/11/2013 05:34 PM, Paolo Bonzini wrote:
 It is always true for the protocol.  It is not always true for the format.
 

So you mean bdrv_enable_write_cache() always returns true for Sheepdog at
any time?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-11 Thread Paolo Bonzini
 On 01/11/2013 05:34 PM, Paolo Bonzini wrote:
  It is always true for the protocol.  It is not always true for the
  format.
 
 So you mean bdrv_enable_write_cache() always return true for Sheepdog
 at any time?

Yes, at least now.  You could in principle have a format that calls
bdrv_set_enable_write_cache(bs->file, ...), but nobody does it yet.
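
For illustration, such a format driver could do something like the
following (a hypothetical sketch; no in-tree format driver does this, and
the function name is made up):

    /* Mirror the format's guest-visible write-cache setting down to the
     * protocol layer (bs->file). */
    static void fmt_sync_write_cache(BlockDriverState *bs)
    {
        bool wce = bdrv_enable_write_cache(bs);

        if (bs->file) {
            bdrv_set_enable_write_cache(bs->file, wce);
        }
    }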

Paolo



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-10 Thread Jamie Lokier
Kevin Wolf wrote:
 Am 08.01.2013 11:39, schrieb Liu Yuan:
  On 01/08/2013 06:00 PM, Kevin Wolf wrote:
  Am 08.01.2013 10:45, schrieb Liu Yuan:
  On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote:
  Otherwise use sheepdog writeback and let QEMU block.c decide when to
  flush.  Never use sheepdog writethrough because it's redundant here.
 
  I don't get it. What do you mean by 'redundant'? If we use virtio 
  sheepdog block driver, how can we specify writethrough mode for Sheepdog
  cache? Here 'writethrough' means use a pure read cache, which doesn't
  need flush at all.
 
  A writethrough cache is equivalent to a write-back cache where each
  write is followed by a flush. qemu makes sure to send these flushes, so
  there is no need use Sheepdog's writethrough mode.
  
  Implement writethrough as writeback + flush will cause considerable
  overhead for network block device like Sheepdog: a single write request
  will be executed as two requests: write + flush
 
 Yeah, maybe we should have some kind of a FUA flag with write requests
 instead of sending a separate flush.

Note that write+FUA has different semantics than write+flush, at least
with regular disks.

write+FUA commits just what was written, while write+flush commits
everything that was written before.

-- Jamie



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-10 Thread Kevin Wolf
Am 10.01.2013 16:12, schrieb Jamie Lokier:
 Kevin Wolf wrote:
 Am 08.01.2013 11:39, schrieb Liu Yuan:
 On 01/08/2013 06:00 PM, Kevin Wolf wrote:
 Am 08.01.2013 10:45, schrieb Liu Yuan:
 On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote:
 Otherwise use sheepdog writeback and let QEMU block.c decide when to
 flush.  Never use sheepdog writethrough because it's redundant here.

 I don't get it. What do you mean by 'redundant'? If we use virtio 
 sheepdog block driver, how can we specify writethrough mode for Sheepdog
 cache? Here 'writethrough' means use a pure read cache, which doesn't
 need flush at all.

 A writethrough cache is equivalent to a write-back cache where each
 write is followed by a flush. qemu makes sure to send these flushes, so
 there is no need use Sheepdog's writethrough mode.

 Implement writethrough as writeback + flush will cause considerable
 overhead for network block device like Sheepdog: a single write request
 will be executed as two requests: write + flush

 Yeah, maybe we should have some kind of a FUA flag with write requests
 instead of sending a separate flush.
 
 Note that write+FUA has different semantics than write+flush, at least
 with regular disks.
 
 write+FUA commits just what was written, while write+flush commits
 everything that was written before.

True. However, when you use it for implementing a writethrough mode,
i.e. every single write has the FUA flag set, then it ought to be the
same. One thing to take care of might be doing one explicit flush when
switching from writeback to writethrough.

Kevin



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-10 Thread Jamie Lokier
Paolo Bonzini wrote:
 Il 09/01/2013 14:04, Liu Yuan ha scritto:
  2 The upper layer software which relies on the 'cache=xxx' to choose
cache mode will fail its assumption against new QEMU.
   
   Which assumptions do you mean? As far as I can say the behaviour hasn't
   changed, except possibly for the performance.
 
  When users set 'cache=writethrough' to export only a writethrough cache
  to Guest, but with new QEMU, it will actually get a writeback cache as
  default.
 
 They get a writeback cache implementation-wise, but they get a
 writethrough cache safety-wise.  How the cache is implemented doesn't
 matter, as long as it looks like a writethrough cache.
 
 In fact, consider a local disk that doesn't support FUA.  In old QEMU,
 images used to be opened with O_DSYNC and that splits each write into
 WRITE+FLUSH, just like new QEMU.  All that changes is _where_ the
 flushes are created.  Old QEMU changes it in the kernel, new QEMU
 changes it in userspace.
 
  We don't need to communicate to the guest. I think 'cache=xxx' means
  what kind of cache the users *expect* to export to Guest OS. So if
  cache=writethrough set, Guest OS couldn't turn it to writeback cache
  magically. This is like I bought a disk with 'writethrough' cache
  built-in, I didn't expect that it turned to be a disk with writeback
  cache under the hood which could possible lose data when power outage
  happened.
 
 It's not by magic.  It's by explicitly requesting the disk to do this.
 
 Perhaps it's a bug that the cache mode is not reset when the machine is
 reset.  I haven't checked that, but it would be a valid complaint.

The question is, is cache=writeback/cache=writethrough an initial
setting of guest-visible WCE that the guest is allowed to change, or
is cache=writethrough a way of saying don't have a write cache
(which may or may not be reflected in the guest-visible disk id).

I couldn't tell from QEMU documentation which is intended.  It would
be a bit silly if it means different things for different backend
storage.

I have seen (obscure) guest code which toggled WCE to simulate FUA,
and there is plenty of advice out there saying to set WCE=0 for
certain kinds of databases because of its presumed crash safety.  Even
very ancient guests on Linux and Windows can change WCE=0 with IDE and
SCSI.

So from a guest point of view, I think guest setting WCE=0 should mean
exactly the same as FUA every write, or flush after every write, until
guest setting WCE=1.

-- Jamie



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-10 Thread Paolo Bonzini
Il 10/01/2013 16:25, Jamie Lokier ha scritto:
  Perhaps it's a bug that the cache mode is not reset when the machine is
  reset.  I haven't checked that, but it would be a valid complaint.
 The question is, is cache=writeback/cache=writethrough an initial
 setting of guest-visible WCE that the guest is allowed to change, or
 is cache=writethrough a way of saying don't have a write cache
 (which may or may not be reflected in the guest-visible disk id).

It used to be the latter (with reflection in the disk data), but now it
is the former.

 I couldn't tell from QEMU documentation which is intended.  It would
 be a bit silly if it means different things for different backend
 storage.

It means the same thing for IDE, SCSI and virtio-blk.  Other backends,
such as SD, do not even have flush, and are really slow with
cache=writethrough because they write one sector at a time.  For this
reason they cannot really be used in a safe manner.

 I have seen (obscure) guest code which toggled WCE to simulate FUA,

That's quite useless, since WCE=1 -> WCE=0 is documented to cause a flush
(and it does).  Might as well send a real flush.

 and there is plenty of advice out there saying to set WCE=0 for
 certain kinds of databases because of its presumed crash safety.  Even
 very ancient guests on Linux and Windows can change WCE=0 with IDE and
 SCSI.

This is supported in QEMU.

 So from a guest point of view, I think guest setting WCE=0 should mean
 exactly the same as FUA every write, or flush after every write, until
 guest setting WCE=1.

Yes, that's indeed how it is implemented.
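
Roughly, the pattern the device models follow looks like this (a simplified
sketch, not the actual IDE/SCSI/virtio-blk code; the helper name is made up):

    /* React to a guest WCE toggle: flush first when the cache is being
     * turned off, then update the block layer setting so block.c starts
     * (or stops) flushing after every write. */
    static void apply_guest_wce(BlockDriverState *bs, bool wce)
    {
        if (!wce) {
            bdrv_flush(bs);   /* WCE 1 -> 0 must not lose buffered writes */
        }
        bdrv_set_enable_write_cache(bs, wce);
    }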

Paolo



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-10 Thread Jamie Lokier
Paolo Bonzini wrote:
 Il 10/01/2013 16:25, Jamie Lokier ha scritto:
   Perhaps it's a bug that the cache mode is not reset when the machine is
   reset.  I haven't checked that, but it would be a valid complaint.
  The question is, is cache=writeback/cache=writethrough an initial
  setting of guest-visible WCE that the guest is allowed to change, or
  is cache=writethrough a way of saying don't have a write cache
  (which may or may not be reflected in the guest-visible disk id).
 
 It used to be the latter (with reflection in the disk data), but now it
 is the former.

Interesting.  It could be worth a note in the manual.

  I couldn't tell from QEMU documentation which is intended.  It would
  be a bit silly if it means different things for different backend
  storage.
 
 It means the same thing for IDE, SCSI and virtio-blk.  Other backends,
 such as SD, do not even have flush, and are really slow with
 cache=writethrough because they write one sector at a time.  For this
 reason they cannot really be used in a safe manner.
 
  I have seen (obscure) guest code which toggled WCE to simulate FUA,
 
 That's quite useless, since WCE=1 -> WCE=0 is documented to cause a flush
 (and it does).  Might as well send a real flush.

It was because the ATA spec seemed to permit the combination of WCE
with no flush command supported.  So WCE=1 -> WCE=0 was used to flush,
and kept at WCE=0 for the subsequent logging write-FUA(s), until a
non-FUA write was wanted.

-- Jamie



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-10 Thread MORITA Kazutaka
At Thu, 10 Jan 2013 13:38:16 +0800,
Liu Yuan wrote:
 
 On 01/09/2013 11:10 PM, Paolo Bonzini wrote:
  Il 09/01/2013 14:04, Liu Yuan ha scritto:
2 The upper layer software which relies on the 'cache=xxx' to choose
  cache mode will fail its assumption against new QEMU.
 
  Which assumptions do you mean? As far as I can say the behaviour hasn't
  changed, except possibly for the performance.
 
  When users set 'cache=writethrough' to export only a writethrough cache
  to Guest, but with new QEMU, it will actually get a writeback cache as
  default.
  
  They get a writeback cache implementation-wise, but they get a
  writethrough cache safety-wise.  How the cache is implemented doesn't
  matter, as long as it looks like a writethrough cache.
  
 
  In fact, consider a local disk that doesn't support FUA.  In old QEMU,
  images used to be opened with O_DSYNC and that splits each write into
  WRITE+FLUSH, just like new QEMU.  All that changes is _where_ the
  flushes are created.  Old QEMU changes it in the kernel, new QEMU
  changes it in userspace.
  
  We don't need to communicate to the guest. I think 'cache=xxx' means
  what kind of cache the users *expect* to export to Guest OS. So if
  cache=writethrough set, Guest OS couldn't turn it to writeback cache
  magically. This is like I bought a disk with 'writethrough' cache
  built-in, I didn't expect that it turned to be a disk with writeback
  cache under the hood which could possible lose data when power outage
  happened.
  
  It's not by magic.  It's by explicitly requesting the disk to do this.
  
  Perhaps it's a bug that the cache mode is not reset when the machine is
  reset.  I haven't checked that, but it would be a valid complaint.
  
 
 Ah I didn't get the current implementation right. I tried the 3.7 kernel
 and it works as expected (cache=writethrough results in a 'writethrough'
 cache in the guest).
 
 It looks fine to me to emulate writethrough as writeback + flush, since
 the performance drop isn't big, though sheepdog itself supports a true
 writethrough cache (no flush).

Can we drop the SD_FLAG_CMD_CACHE flag from sheepdog write requests
when bdrv_enable_write_cache() is false?  Then the requests behave
like FUA writes and we can safely omit succeeding flush requests.

Thanks,

Kazutaka



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Paolo Bonzini
Il 08/01/2013 13:12, Kevin Wolf ha scritto:
  Okay, I see the code indeed support WCE but it requires Guest kernel to
  support it. For the kernel doesn't support it, there is no way to
  disable write cache, no?
 With Linux guests, it's possible for SCSI. I'm not sure about
 virtio-blk, but you might be right that you can't manually change it there.

New enough kernels support it.  Also, if you reboot from a system that
supports writeback to one that doesn't (including old-enough Linux), the
new QEMU will be able to use writethrough while running on the older system.

Paolo




Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Paolo Bonzini
Il 08/01/2013 14:18, Liu Yuan ha scritto:
 Maybe not for a second thought. See following combination:
 
    cache flags      WCE toggled and resulting behavior
    writethrough     writethrough
    writeback        writethrough (writeback + flush as expected)
 
 cache flags means the 'cache=xxx' specified at startup, with WCE toggled on
 the fly in the guest (suppose the guest kernel supports WCE control)
 
 So the result is *not* broken. If we set cache=writethrough for
 sheepdog, then WCE won't take any effect because 'flush' request will be
 ignored by Sheepdog driver. And with cache=writeback, WCE does disable
 the writecache and actually turns it to a writethrough cache by sending
 flush req every time for write.
 
 To conclude, letting Sheepdog interpret the cache flags won't cause trouble
 even with the current guest WCE feature; the difference is that if we set
 cache=writethrough, the guest can't change it via WCE toggling. Is this
 behavior acceptable?

But why is it useful to force-disable writeback caching?  Do you have
any performance numbers?

Paolo



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Liu Yuan
On 01/09/2013 06:25 PM, Paolo Bonzini wrote:
 But why is it useful to force-disable writeback caching?  Do you have
 any performance numbers?

This is not related to performance. What I want is to allow users to
choose Sheepdog cache mode (writethrough/writeback/directio) from cache=xxx.

Obviously writeback mode shows much better write performance than the
writethrough cache in Sheepdog. NOTE: the writethrough cache in Sheepdog is
a read-only cache which is always consistent with cluster data and thus
needs no flush at all. It exists to boost read performance for
read-intensive guests on Sheepdog images.

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Paolo Bonzini
Il 09/01/2013 11:36, Liu Yuan ha scritto:
 But why is it useful to force-disable writeback caching?  Do you have
  any performance numbers?
 This is not related to performance. What I want is to allow users to
 choose Sheepdog cache mode (writethrough/writeback/directio) from cache=xxx.
 
 Obviously writeback mode shows much better write performance than the
 writethrough cache in Sheepdog. NOTE: the writethrough cache in Sheepdog is
 a read-only cache which is always consistent with cluster data and thus
 needs no flush at all. It exists to boost read performance for
 read-intensive guests on Sheepdog images.

Ok, so the questions are (compared to older QEMU's writethrough mode):

1) how much slower is QEMU's emulated-writethrough mode for writes, due to
the extra requests?

2) how much slower is QEMU's writeback mode for reads, due to the different
structure of the cache?

Paolo



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Liu Yuan
On 01/09/2013 06:40 PM, Paolo Bonzini wrote:
 1) how slower is QEMU's emulated-writethrough mode for writes, due to
 the extra requests?
 

I'll collect some numbers on it.

 2) how slower is QEMU's writeback mode for reads, due to the different
 structure of the cache?

Sorry, I don't get your question.

I didn't say QEMU's writeback mode is slow for reads, did I?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Liu Yuan
On 01/09/2013 06:46 PM, Liu Yuan wrote:
 I'll collect some numbers on it.

Well, when I use hdparm -W 0 /dev/vdx to choose the writethrough cache
in the Guest (virtio disk), it says:

test@vm1:~$ sudo hdparm -W 0 /dev/vdb

/dev/vdb:
 setting drive write-caching to 0 (off)
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(setcache) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device

Do I need extra flags to enable WCE?

Both Guest and Host kernel is 3.7 and I start QEMU by following command:

$ qemu-system-x86_64 --enable-kvm -drive
file=~/images/test1,if=virtio,cache=writeback -smp 2 -cpu host -m 1024
-drive file=sheepdog:test,if=virtio,cache=writeback

Thanks,
Yuan





Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Kevin Wolf
Am 09.01.2013 11:36, schrieb Liu Yuan:
 On 01/09/2013 06:25 PM, Paolo Bonzini wrote:
 But why is it useful to force-disable writeback caching?  Do you have
 any performance numbers?
 
 This is not related to performance. What I want is to allow users to
 choose Sheepdog cache mode (writethrough/writeback/directio) from cache=xxx.

The question was what effects these modes have, and especially what the
user-visible differences between writeback+flush and a writethrough cache are.

 Obviously writeback mode shows much better write performance than the
 writethrough cache in Sheepdog. NOTE: the writethrough cache in Sheepdog is
 a read-only cache which is always consistent with cluster data and thus
 needs no flush at all. It exists to boost read performance for
 read-intensive guests on Sheepdog images.

This is the definition of a writethrough cache.

Kevin



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Paolo Bonzini
Il 09/01/2013 11:58, Liu Yuan ha scritto:
 On 01/09/2013 06:46 PM, Liu Yuan wrote:
 I'll collect some numbers on it.
 
 Well, when I use hdparm -W 0 /dev/vdx to choose the writethrough cache
 in the Guest (virtio disk), it says:
 
 test@vm1:~$ sudo hdparm -W 0 /dev/vdb
 
 /dev/vdb:
  setting drive write-caching to 0 (off)
  HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
  HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
  HDIO_DRIVE_CMD(setcache) failed: Inappropriate ioctl for device
  HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
  HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
  HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 
 Do I need extra flags to enable WCE?

hdparm is specific for IDE devices.  sysfs is supported by SCSI
(including IDE with libata, and virtio-scsi) and virtio-blk.

You can use this for SCSI:

echo 'write back' > /sys/block/sda/device/scsi_disk/*/cache_type

and a similar command for virtio-blk.

Paolo

 Both Guest and Host kernel is 3.7 and I start QEMU by following command:
 
 $ qemu-system-x86_64 --enable-kvm -drive
 file=~/images/test1,if=virtio,cache=writeback -smp 2 -cpu host -m 1024
 -drive file=sheepdog:test,if=virtio,cache=writeback
 
 Thanks,
 Yuan
 
 




Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Liu Yuan
On 01/09/2013 06:46 PM, Liu Yuan wrote:
 1) how slower is QEMU's emulated-writethrough mode for writes, due to
  the extra requests?
  
 I'll collect some numbers on it.
 

Okay, I got some numbers. I ran three sheep daemons on the same host to
emulate a 3-node cluster; the Sheepdog image has 3 copies and I put the
Sheepdog client cache and Sheepdog backend storage on a tmpfs.

Guest and host are both Linux 3.7. I start QEMU with the following command:

$ qemu-system-x86_64 --enable-kvm -drive
file=~/images/test1,if=virtio,cache=writeback -smp 2 -cpu host -m 1024
-drive file=sheepdog:test,if=virtio,cache=writeback

I ran 'dd if=/dev/urandom of=/dev/vdb bs=1M count=100
oflag=direct' 5 times and took the average:

emulated (write + flush)  old impl (single write)
13.3 M/s                  13.7 M/s

boost percentage: (13.7 - 13.3)/13.3 = 3%.

The boost isn't big, but if we run QEMU and the sheep daemons on
separate boxes, we'd expect a bigger boost because the overhead of the
extra 'flush' request increases.

Besides performance, I think backward compatibility is more important:
  1 if we run an old kernel host (quite possible for a long-running
server) which doesn't support WCE, then we will never have a chance to
choose a writethrough cache for the guest OS against new QEMU (most users
tend to update user-space tools to fix bugs)
  2 The upper layer software which relies on the 'cache=xxx' to choose
cache mode will fail its assumption against new QEMU.

and my proposal (adding another field to BlockDriverState so that drivers
can interpret the cache flag themselves) can work well with the current
block layer:

Sheepdog driver behavior:
   cache flags      WCE toggled and resulting behavior
   writethrough     writethrough
   writeback        writethrough (writeback + flush)

We can see that the writeback behavior for Guest-WCE toggling is the
same as expected. The difference is that if we set cache=writethrough,
guest can't change it via WCE toggling.
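
As an illustration only, the proposal could look roughly like this (the
field name and helper are invented for the example):

    /* Hypothetical: bdrv_open() saves the original cache= flags in a new
     * BlockDriverState field, say "open_cache_flags", and the sheepdog
     * driver consults it instead of the runtime WCE state. */
    static bool sd_honour_flush(BlockDriverState *bs)
    {
        if (!(bs->open_cache_flags & BDRV_O_CACHE_WB)) {
            /* started with cache=writethrough: the client cache is a pure
             * read cache, so flush requests can simply be ignored */
            return false;
        }
        /* started with cache=writeback: honour guest/block-layer flushes */
        return true;
    }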

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Liu Yuan
On 01/09/2013 08:07 PM, Liu Yuan wrote:
 $ qemu-system-x86_64 --enable-kvm -drive
 file=~/images/test1,if=virtio,cache=writeback -smp 2 -cpu host -m 1024
 -drive file=sheepdog:test,if=virtio,cache=writeback

To emulate the old implementation, I modified the sheepdog block driver as follows:

diff --git a/block/sheepdog.c b/block/sheepdog.c
index e821746..321a426 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1118,6 +1118,7 @@ static int sd_open(BlockDriverState *bs, const char *filename, int flags)
         goto out;
     }
 
+    if (0) {
     s->cache_enabled = true;
     s->flush_fd = connect_to_sdog(s->addr, s->port);
     if (s->flush_fd < 0) {
@@ -1125,6 +1126,7 @@ static int sd_open(BlockDriverState *bs, const char *filename, int flags)
         ret = s->flush_fd;
         goto out;
     }
+    }
 
     if (snapid || tag[0] != '\0') {
         dprintf("%" PRIx32 " snapshot inode was open.\n", vid);




Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Liu Yuan
On 01/09/2013 08:07 PM, Liu Yuan wrote:
 and my proposal (adding another field to BlockDriverState so that drivers
 can interpret the cache flag themselves) can work well with the current
 block layer:
 
 Sheepdog driver behavior:
    cache flags      WCE toggled and resulting behavior
    writethrough     writethrough
    writeback        writethrough (writeback + flush)
 
 We can see that the writeback behavior for Guest-WCE toggling is the
 same as expected. The difference is that if we set cache=writethrough,
 guest can't change it via WCE toggling.

I am wondering if the above scheme can be applied to the block layer; then
'cache=xxx' could act as a stronger control over WCE toggling, and we
wouldn't need the extra field.

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Kevin Wolf
Am 09.01.2013 13:07, schrieb Liu Yuan:
 Besides performance, I think backward compatibility is more important:
   1 if we run a old kernel host (quite possible for a long running
 server) which doesn't support WCE, then we will never have a chance to
 choose writethrough cache for guest OS against new QEMU (most users tend
 to update user space tools to exclude bugs)

How does the host kernel even play a role in this context?

   2 The upper layer software which relies on the 'cache=xxx' to choose
 cache mode will fail its assumption against new QEMU.

Which assumptions do you mean? As far as I can say the behaviour hasn't
changed, except possibly for the performance.

 We can see that the writeback behavior for Guest-WCE toggling is the
 same as expected. The difference is that if we set cache=writethrough,
 guest can't change it via WCE toggling.

Then you need to make sure to communicate this to the guest. I'm not
convinced that adding extra logic to each device for this is a good idea.

Kevin



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Liu Yuan
On 01/09/2013 08:42 PM, Kevin Wolf wrote:
 Am 09.01.2013 13:07, schrieb Liu Yuan:
 Besides performance, I think backward compatibility is more important:
   1 if we run a old kernel host (quite possible for a long running
 server) which doesn't support WCE, then we will never have a chance to
 choose writethrough cache for guest OS against new QEMU (most users tend
 to update user space tools to exclude bugs)
 
 How does the host kernel even play a role in this context?
 

Oops, I got confused there. Okay, a guest OS which doesn't support WCE can't
change the cache type, as we discussed.

   2 The upper layer software which relies on the 'cache=xxx' to choose
 cache mode will fail its assumption against new QEMU.
 
 Which assumptions do you mean? As far as I can say the behaviour hasn't
 changed, except possibly for the performance.
 

When users set 'cache=writethrough' to export only a writethrough cache
to Guest, but with new QEMU, it will actually get a writeback cache as
default.

 We can see that the writeback behavior for Guest-WCE toggling is the
 same as expected. The difference is that if we set cache=writethrough,
 guest can't change it via WCE toggling.
 
 Then you need to make sure to communicate this to the guest. I'm not
 convinced that adding extra logic to each device for this is a good idea.
 

We don't need to communicate to the guest. I think 'cache=xxx' means
what kind of cache the users *expect* to export to the guest OS. So if
cache=writethrough is set, the guest OS couldn't turn it into a writeback
cache magically. This is like buying a disk with a 'writethrough' cache
built in: I wouldn't expect it to turn out to be a disk with a writeback
cache under the hood which could possibly lose data when a power outage
happens.

So guest WCE only works when users provide us a writeback-capable disk.

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Paolo Bonzini
Il 09/01/2013 14:04, Liu Yuan ha scritto:
 2 The upper layer software which relies on the 'cache=xxx' to choose
   cache mode will fail its assumption against new QEMU.
  
  Which assumptions do you mean? As far as I can say the behaviour hasn't
  changed, except possibly for the performance.

 When users set 'cache=writethrough' to export only a writethrough cache
 to Guest, but with new QEMU, it will actually get a writeback cache as
 default.

They get a writeback cache implementation-wise, but they get a
writethrough cache safety-wise.  How the cache is implemented doesn't
matter, as long as it looks like a writethrough cache.

In fact, consider a local disk that doesn't support FUA.  In old QEMU,
images used to be opened with O_DSYNC and that splits each write into
WRITE+FLUSH, just like new QEMU.  All that changes is _where_ the
flushes are created.  Old QEMU changes it in the kernel, new QEMU
changes it in userspace.

 We don't need to communicate to the guest. I think 'cache=xxx' means
 what kind of cache the users *expect* to export to Guest OS. So if
 cache=writethrough set, Guest OS couldn't turn it to writeback cache
 magically. This is like I bought a disk with 'writethrough' cache
 built-in, I didn't expect that it turned to be a disk with writeback
 cache under the hood which could possible lose data when power outage
 happened.

It's not by magic.  It's by explicitly requesting the disk to do this.

Perhaps it's a bug that the cache mode is not reset when the machine is
reset.  I haven't checked that, but it would be a valid complaint.

Paolo

 
 So Guest-WCE only works when users provide us a writeback capable disk.




Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-09 Thread Liu Yuan
On 01/09/2013 11:10 PM, Paolo Bonzini wrote:
 Il 09/01/2013 14:04, Liu Yuan ha scritto:
   2 The upper layer software which relies on the 'cache=xxx' to choose
 cache mode will fail its assumption against new QEMU.

 Which assumptions do you mean? As far as I can say the behaviour hasn't
 changed, except possibly for the performance.

 When users set 'cache=writethrough' to export only a writethrough cache
 to Guest, but with new QEMU, it will actually get a writeback cache as
 default.
 
 They get a writeback cache implementation-wise, but they get a
 writethrough cache safety-wise.  How the cache is implemented doesn't
 matter, as long as it looks like a writethrough cache.
 

 In fact, consider a local disk that doesn't support FUA.  In old QEMU,
 images used to be opened with O_DSYNC and that splits each write into
 WRITE+FLUSH, just like new QEMU.  All that changes is _where_ the
 flushes are created.  Old QEMU changes it in the kernel, new QEMU
 changes it in userspace.
 
 We don't need to communicate to the guest. I think 'cache=xxx' means
 what kind of cache the users *expect* to export to Guest OS. So if
 cache=writethrough set, Guest OS couldn't turn it to writeback cache
 magically. This is like I bought a disk with 'writethrough' cache
 built-in, I didn't expect that it turned to be a disk with writeback
 cache under the hood which could possible lose data when power outage
 happened.
 
 It's not by magic.  It's by explicitly requesting the disk to do this.
 
 Perhaps it's a bug that the cache mode is not reset when the machine is
 reset.  I haven't checked that, but it would be a valid complaint.
 

Ah I didn't get the current implementation right. I tried the 3.7 kernel
and it works as expected (cache=writethrough results in a 'writethrough'
cache in the guest).

It looks fine to me to emulate writethrough as writeback + flush, since
the performance drop isn't big, though sheepdog itself supports a true
writethrough cache (no flush).

I am going to send v2 of directio patch for sheepdog driver.

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Stefan Hajnoczi
On Tue, Jan 08, 2013 at 01:42:35PM +0800, Liu Yuan wrote:
 On 01/07/2013 09:23 PM, Kevin Wolf wrote:
  No, and in theory they shouldn't have to care.
  
  Why do you want to handle writethrough semantics in the block driver
  rather than letting qemu care for sending the right flushes?
 
 Because Sheepdog itself provides the client-side cache implementation and
 we need a means to get the cache hint from users when starting up the VM.
 This cache supports:
  writethrough: provides a read cache for this VM, which is always
 consistent with cluster data
  writeback: provides a writeback cache for this VM, which only flushes the
 dirty bits when the VM sends a 'flush' request.
  directio: disables the cache completely for this VM.
 
 Old QEMU (before v1.2) allowed block drivers to get the cache flags, so
 Sheepdog could interpret these flags on its own to choose the right cache
 mode for each VM. Doesn't QEMU need to keep backward compatibility?

This sounds fine - it fits together with QEMU like this:

BDRV_NO_CACHE means sheepdog directio.  We're disabling the client-side
sheepdog cache.

Otherwise use sheepdog writeback and let QEMU block.c decide when to
flush.  Never use sheepdog writethrough because it's redundant here.
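
A sketch of that mapping in sd_open(), assuming the boolean cache_enabled
field that appears in the diff elsewhere in this thread:

    /* Derive the sheepdog client cache mode from the open flags:
     * BDRV_O_NOCACHE -> bypass the client-side cache (directio),
     * otherwise use the writeback cache and rely on block.c for flushes. */
    static void sd_setup_cache(BDRVSheepdogState *s, int bdrv_flags)
    {
        s->cache_enabled = !(bdrv_flags & BDRV_O_NOCACHE);
    }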

Stefan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Liu Yuan
On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote:
 Otherwise use sheepdog writeback and let QEMU block.c decide when to
 flush.  Never use sheepdog writethrough because it's redundant here.

I don't get it. What do you mean by 'redundant'? If we use the virtio
sheepdog block driver, how can we specify writethrough mode for the Sheepdog
cache? Here 'writethrough' means using a pure read cache, which doesn't
need flushes at all.

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Kevin Wolf
Am 08.01.2013 10:45, schrieb Liu Yuan:
 On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote:
 Otherwise use sheepdog writeback and let QEMU block.c decide when to
 flush.  Never use sheepdog writethrough because it's redundant here.
 
 I don't get it. What do you mean by 'redundant'? If we use virtio 
 sheepdog block driver, how can we specify writethrough mode for Sheepdog
 cache? Here 'writethrough' means use a pure read cache, which doesn't
 need flush at all.

A writethrough cache is equivalent to a write-back cache where each
write is followed by a flush. qemu makes sure to send these flushes, so
there is no need to use Sheepdog's writethrough mode.
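
In block.c this amounts to roughly the following (a simplified sketch of
bdrv_co_do_writev(), not the verbatim code):

    /* When the guest-visible cache is writethrough (enable_write_cache is
     * false), the block layer follows every successful write with a flush,
     * so drivers only ever see writeback-style I/O plus flushes. */
    static int coroutine_fn bdrv_co_do_writev_sketch(BlockDriverState *bs,
        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
    {
        int ret = bs->drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);

        if (ret == 0 && !bs->enable_write_cache) {
            ret = bdrv_co_flush(bs);   /* emulate writethrough semantics */
        }
        return ret;
    }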

Kevin



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Liu Yuan
On 01/08/2013 06:00 PM, Kevin Wolf wrote:
 Am 08.01.2013 10:45, schrieb Liu Yuan:
 On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote:
 Otherwise use sheepdog writeback and let QEMU block.c decide when to
 flush.  Never use sheepdog writethrough because it's redundant here.

 I don't get it. What do you mean by 'redundant'? If we use virtio 
 sheepdog block driver, how can we specify writethrough mode for Sheepdog
 cache? Here 'writethrough' means use a pure read cache, which doesn't
 need flush at all.
 
 A writethrough cache is equivalent to a write-back cache where each
 write is followed by a flush. qemu makes sure to send these flushes, so
 there is no need use Sheepdog's writethrough mode.
 

Implementing writethrough as writeback + flush will cause considerable
overhead for a network block device like Sheepdog: a single write request
will be executed as two requests: write + flush. This also explains why
I saw a regression in write performance: old QEMU could issue multiple
write requests in one go, but now the requests are sent one by one (even
with cache=writeback set), which makes Sheepdog write performance drop a
lot. Is it possible to issue multiple requests in one go as old QEMU does?

It seems hard to restore the old semantics of the cache flags due to the
new design of the QEMU block layer. So would you accept adding a 'flags'
field to BlockDriverState which carries the user's 'cache flags', to keep
backward compatibility?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Kevin Wolf
Am 08.01.2013 11:39, schrieb Liu Yuan:
 On 01/08/2013 06:00 PM, Kevin Wolf wrote:
 Am 08.01.2013 10:45, schrieb Liu Yuan:
 On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote:
 Otherwise use sheepdog writeback and let QEMU block.c decide when to
 flush.  Never use sheepdog writethrough because it's redundant here.

 I don't get it. What do you mean by 'redundant'? If we use virtio 
 sheepdog block driver, how can we specify writethrough mode for Sheepdog
 cache? Here 'writethrough' means use a pure read cache, which doesn't
 need flush at all.

 A writethrough cache is equivalent to a write-back cache where each
 write is followed by a flush. qemu makes sure to send these flushes, so
 there is no need use Sheepdog's writethrough mode.
 
 Implement writethrough as writeback + flush will cause considerable
 overhead for network block device like Sheepdog: a single write request
 will be executed as two requests: write + flush

Yeah, maybe we should have some kind of a FUA flag with write requests
instead of sending a separate flush.

 This also explains why
 I saw a regression about write performance: Old QEMU can issue multiple
 write requests in one go, but now the requests are sent one by one (even
 with cache=writeback set), which makes Sheepdog write performance drop a
 lot. Is it possible to issue multiple requests in one go as old QEMU does?

Huh? We didn't change anything to that respect, or at least not that I'm
aware of. qemu always only had single-request bdrv_co_writev, so if
anything that batching must have happened inside Sheepdog code? Do you
know what makes it not batch requests any more?

 It seems it is hard to restore into old semantics of cache flags due to
 new design of QEMU block layer. So will you accept that adding a 'flags'
 into BlockDriverState which carry the 'cache flags' from user to keep
 backward compatibility?

No, going back to the old behaviour would break guest-toggled WCE.

Kevin



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Liu Yuan
On 01/08/2013 06:51 PM, Kevin Wolf wrote:
 Am 08.01.2013 11:39, schrieb Liu Yuan:
 On 01/08/2013 06:00 PM, Kevin Wolf wrote:
 Am 08.01.2013 10:45, schrieb Liu Yuan:
 On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote:
 Otherwise use sheepdog writeback and let QEMU block.c decide when to
 flush.  Never use sheepdog writethrough because it's redundant here.

 I don't get it. What do you mean by 'redundant'? If we use virtio 
 sheepdog block driver, how can we specify writethrough mode for Sheepdog
 cache? Here 'writethrough' means use a pure read cache, which doesn't
 need flush at all.

 A writethrough cache is equivalent to a write-back cache where each
 write is followed by a flush. qemu makes sure to send these flushes, so
 there is no need use Sheepdog's writethrough mode.

 Implement writethrough as writeback + flush will cause considerable
 overhead for network block device like Sheepdog: a single write request
 will be executed as two requests: write + flush
 
 Yeah, maybe we should have some kind of a FUA flag with write requests
 instead of sending a separate flush.
 
 This also explains why
 I saw a regression about write performance: Old QEMU can issue multiple
 write requests in one go, but now the requests are sent one by one (even
 with cache=writeback set), which makes Sheepdog write performance drop a
 lot. Is it possible to issue multiple requests in one go as old QEMU does?
 
 Huh? We didn't change anything to that respect, or at least not that I'm
 aware of. qemu always only had single-request bdrv_co_writev, so if
 anything that batching must have happened inside Sheepdog code? Do you
 know what makes it not batch requests any more?
 

QEMU v1.1.x works well with batched write requests. The Sheepdog block
driver doesn't do any batching trick as far as I know; it just sends
requests as they are fed to it. There are no noticeable changes between
v1.1.x and current master with regard to sheepdog.c.

To detail the different behavior, from the Sheepdog daemon's point of view
(it receives the requests from QEMU):
 old: can receive many requests at virtually the same time
and handle them concurrently
 now: only receives one request, handles it, replies and gets another.

So I think the problem is that current QEMU will wait for the write response
before sending another request.

 It seems it is hard to restore into old semantics of cache flags due to
 new design of QEMU block layer. So will you accept that adding a 'flags'
 into BlockDriverState which carry the 'cache flags' from user to keep
 backward compatibility?
 
 No, going back to the old behaviour would break guest-toggled WCE.
 

Guest-toggled WCE only works with IDE, and it seems that virtio-blk doesn't
support it, no? And I think there are a huge number of virtio-blk users.

I didn't mean to break WCE. What I meant is to allow backward
compatibility. For example, the Sheepdog driver can make use of this
dedicated cache flag to implement its own cache control without affecting
other drivers at all.

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Kevin Wolf
Am 08.01.2013 12:08, schrieb Liu Yuan:
 On 01/08/2013 06:51 PM, Kevin Wolf wrote:
 Am 08.01.2013 11:39, schrieb Liu Yuan:
 This also explains why
 I saw a regression about write performance: Old QEMU can issue multiple
 write requests in one go, but now the requests are sent one by one (even
 with cache=writeback set), which makes Sheepdog write performance drop a
 lot. Is it possible to issue multiple requests in one go as old QEMU does?

 Huh? We didn't change anything to that respect, or at least not that I'm
 aware of. qemu always only had single-request bdrv_co_writev, so if
 anything that batching must have happened inside Sheepdog code? Do you
 know what makes it not batch requests any more?

 
 QEMU v1.1.x works well with batched write requests. Sheepdog block
 driver doesn't do batching trick as far as I know, just send request as
 it is feed. There isn't noticeable changes between v1.1.x and current
 master regard to Sheepdog.c.
 
 To detail the different behavior, from Sheepdog daemon which receives
 the requests from QEMU:
  old: can receive multiple many requests at the virtually the same time
 and handle them concurrently
  now: only receive one request, handle it, reply and get another.
 
 So I think the problem is, current QEMU will wait for write response
 before sending another request.

I can't see a reason why it would do that. Can you bisect this?

 It seems it is hard to restore into old semantics of cache flags due to
 new design of QEMU block layer. So will you accept that adding a 'flags'
 into BlockDriverState which carry the 'cache flags' from user to keep
 backward compatibility?

 No, going back to the old behaviour would break guest-toggled WCE.

 
 Guest-toggled WCE only works with IDE and seems that virtio-blk doesn't
 support it, no? And I think there are huge virtio-blk users.

It works with virtio-blk and SCSI as well.

 I didn't meant to break WCE. What I meant is to allow backward
 compatibility. For e.g, Sheepdog driver can make use of this dedicated
 cache flags to implement its own cache control and doesn't affect other
 drivers at all.

How would you do it? With a WCE that changes during runtime the idea of
a flag that is passed to bdrv_open() and stays valid as long as the
BlockDriverState exists doesn't match reality any more.

Kevin



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Liu Yuan
On 01/08/2013 07:19 PM, Kevin Wolf wrote:
 I can't see a reason why it would do that. Can you bisect this?
 

Sure, bisect it is on my schedule, but I can't promise a deadline.

  It seems it is hard to restore into old semantics of cache flags due to
  new design of QEMU block layer. So will you accept that adding a 
  'flags'
  into BlockDriverState which carry the 'cache flags' from user to keep
  backward compatibility?
 
  No, going back to the old behaviour would break guest-toggled WCE.
 
  
  Guest-toggled WCE only works with IDE and seems that virtio-blk doesn't
  support it, no? And I think there are huge virtio-blk users.
 It works with virtio-blk and SCSI as well.
 

Okay, I see the code indeed supports WCE, but it requires the guest kernel to
support it. If the kernel doesn't support it, there is no way to
disable the write cache, no?

  I didn't meant to break WCE. What I meant is to allow backward
  compatibility. For e.g, Sheepdog driver can make use of this dedicated
  cache flags to implement its own cache control and doesn't affect other
  drivers at all.
 How would you do it? With a WCE that changes during runtime the idea of
 a flag that is passed to bdrv_open() and stays valid as long as the
 BlockDriverState exists doesn't match reality any more.

I am not sure if I get it right. What I meant is to allow Sheepdog to
control the cache based on the 'cache flags' at startup and ignore WCE at
run time.

So you mean, if I choose a writeback cache at startup, and then the guest
disables it via WCE, then the block layer will never send a flush request
down to the Sheepdog driver?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Kevin Wolf
Am 08.01.2013 12:35, schrieb Liu Yuan:
 On 01/08/2013 07:19 PM, Kevin Wolf wrote:
 I can't see a reason why it would do that. Can you bisect this?

 
 Sure, bisect it is on my schedule, but I can't promise a deadline.

Ok, thanks. It would be good if it was before the hard freeze for 1.4.

 It seems it is hard to restore into old semantics of cache flags due to
 new design of QEMU block layer. So will you accept that adding a 
 'flags'
 into BlockDriverState which carry the 'cache flags' from user to keep
 backward compatibility?

 No, going back to the old behaviour would break guest-toggled WCE.


 Guest-toggled WCE only works with IDE and seems that virtio-blk doesn't
 support it, no? And I think there are huge virtio-blk users.
 It works with virtio-blk and SCSI as well.

 
 Okay, I see the code indeed support WCE but it requires Guest kernel to
 support it. For the kernel doesn't support it, there is no way to
 disable write cache, no?

With Linux guests, it's possible for SCSI. I'm not sure about
virtio-blk, but you might be right that you can't manually change it there.

 I didn't meant to break WCE. What I meant is to allow backward
 compatibility. For e.g, Sheepdog driver can make use of this dedicated
 cache flags to implement its own cache control and doesn't affect other
 drivers at all.
 How would you do it? With a WCE that changes during runtime the idea of
 a flag that is passed to bdrv_open() and stays valid as long as the
 BlockDriverState exists doesn't match reality any more.
 
 I am not sure if I get it right. What I meant is allow Sheepdog to
 control cache on the 'cache flags' at startup and ignore WCE on the run
 time.

If you start with cache=writeback and then the guest switches WCE off
and you ignore that, then you're causing data corruption in the case of
a crash. This is not an option.

 So you mean, if I choose a writeback cache at startup, and then Guest
 disable it via WCE, then block layer will never send flush request down
 to Sheepdog driver?

Yes, this is the problematic case. Currently the qemu block layer makes
sure to send flushes, but if you disable that logic for Sheepdog, you
would get broken behaviour in this case.

Kevin



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Liu Yuan
On 01/08/2013 08:12 PM, Kevin Wolf wrote:
 Ok, thanks. It would be good if it was before the hard freeze for 1.4.
 

Oops, sorry. It is a false alarm. Last time I was running the latest QEMU on
tmpfs, which serviced the requests too fast.

  It seems it is hard to restore into old semantics of cache 
  flags due to
  new design of QEMU block layer. So will you accept that 
  adding a 'flags'
  into BlockDriverState which carry the 'cache flags' from 
  user to keep
  backward compatibility?
 
  No, going back to the old behaviour would break guest-toggled 
  WCE.
 
 
  Guest-toggled WCE only works with IDE and seems that virtio-blk 
  doesn't
  support it, no? And I think there are huge virtio-blk users.
  It works with virtio-blk and SCSI as well.
 
  
  Okay, I see the code indeed support WCE but it requires Guest kernel to
  support it. For the kernel doesn't support it, there is no way to
  disable write cache, no?
 With Linux guests, it's possible for SCSI. I'm not sure about
 virtio-blk, but you might be right that you can't manually change it there.
 
  I didn't meant to break WCE. What I meant is to allow backward
  compatibility. For e.g, Sheepdog driver can make use of this 
  dedicated
  cache flags to implement its own cache control and doesn't affect 
  other
  drivers at all.
  How would you do it? With a WCE that changes during runtime the idea of
  a flag that is passed to bdrv_open() and stays valid as long as the
  BlockDriverState exists doesn't match reality any more.
  
  I am not sure if I get it right. What I meant is allow Sheepdog to
  control cache on the 'cache flags' at startup and ignore WCE on the run
  time.
 If you start with cache=writeback and then the guests switches WCE off
 and you ignore that, then you're causing data corruption in the case of
 a crash. This is not an option.
 
  So you mean, if I choose a writeback cache at startup, and then Guest
  disable it via WCE, then block layer will never send flush request down
  to Sheepdog driver?
 Yes, this is the problematic case. Currently the qemu block layer makes
 sure to send flushes, but if you disable that logic for Sheepdog, you
 would get broken behaviour in this case.

Maybe not, on second thought. See the following combination:

   cache flags      WCE toggled and resulting behavior
   writethrough     writethrough
   writeback        writethrough (writeback + flush as expected)

cache flags means the 'cache=xxx' specified at startup, with WCE toggled on
the fly in the guest (suppose the guest kernel supports WCE control)

So the result is *not* broken. If we set cache=writethrough for
sheepdog, then WCE won't take any effect because 'flush' requests will be
ignored by the Sheepdog driver. And with cache=writeback, WCE does disable
the write cache and actually turns it into a writethrough cache by sending
a flush request for every write.

To conclude, letting Sheepdog interpret the cache flags won't cause trouble
even with the current guest WCE feature; the difference is that if we set
cache=writethrough, the guest can't change it via WCE toggling. Is this
behavior acceptable?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-08 Thread Liu Yuan
On 01/08/2013 09:18 PM, Liu Yuan wrote:
 So the result is *not* broken. If we set cache=writethrough for
 sheepdog, then WCE won't take any effect because 'flush' request will be
 ignored by Sheepdog driver. 

The merit of this 'writethrough' is that a write request will never be
turned into two requests (write + flush) in the Sheepdog driver.

Thanks,
Yuan




Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-07 Thread Stefan Hajnoczi
On Sat, Jan 05, 2013 at 03:56:11PM +0800, Liu Yuan wrote:
 On 01/05/2013 01:29 PM, Liu Yuan wrote:
  On 01/05/2013 12:40 PM, Liu Yuan wrote:
  On 01/05/2013 12:38 AM, Stefan Hajnoczi wrote:
  Hi Yuan,
  BDRV_O_NOCACHE means bypass host page cache (O_DIRECT).
 
  BDRV_O_CACHE_WB specifies the cache semantics that the guest sees - that
  means whether the disk cache is writethrough or writeback.
 
  In other words, BDRV_O_NOCACHE is a host performance tweak while
  BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState.  A
  protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
  because it is implemented in block.c (see bdrv_co_do_writev() where QEMU
  will flush after each write when !bs->enable_write_cache).
 
  Hi Stefan,
 
Thanks for your explanation. But after more investigation, I find
  myself more confused:
 
   flags passed from block layer
   {writeback, writethrough}   0x2042
   {directsync, off, none}     0x2062
   {unsafe}                    0x2242
 
  So underlying driver like Sheepdog can't depend on 'flags' passed from
  .bdrv_file_open() to choose the right semantics (This was possible for
  old QEMU IIRC).
 
  If we can't rely on the 'flags' to get the cache indications of users,
  would you point me how to implement tristate cache control for network
  block driver like Sheepdog? For e.g, I want to implement following
  semantics:
  cache=writeback|none|off # enable writeback semantics for write
  cache=writethrough # enable writethrough semantics for write
  cache=directsync # disable cache completely
 
  Thanks,
  Yuan
 
  
  I tried the old QEMU and v1.1.0 and v1.1.2, they still worked as I
  expected. So I guess generic block layer got changed a bit and the
  'flags' meaning turned different than old code, which did indeed allow
  block drivers to interpret the 'flags' passed from bdrv_file_open().
  
  With the current upstream code, it seems that BDRV_O_CACHE_WB is always
  enabled and makes 'flags' completely unusable for block drivers to get
  the indications of user by specifying 'cache=' field.
  
  So is there other means to allow block drivers to rely on, in order to
  interpret the 'cache semantics'?
  
 
 I found the commit:e1e9b0ac 'always open drivers in writeback mode'.
 This is really undesired for network block drivers such as Sheepdog,
 which implement its own cache mechanism that support
 writeback/writethrough/directio behavior and then want to interpret
 these flags on its own.
 
 Is there any means for block drivers to get the semantics of 'cache=xxx'
 from users?

Hi Yuan,
Please explain what cache semantics the sheepdog server supports so I
can understand better what you are trying to achieve.

Stefan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-07 Thread Kevin Wolf
Am 05.01.2013 08:56, schrieb Liu Yuan:
 On 01/05/2013 01:29 PM, Liu Yuan wrote:
 On 01/05/2013 12:40 PM, Liu Yuan wrote:
 On 01/05/2013 12:38 AM, Stefan Hajnoczi wrote:
 Hi Yuan,
 BDRV_O_NOCACHE means bypass host page cache (O_DIRECT).

 BDRV_O_CACHE_WB specifies the cache semantics that the guest sees - that
 means whether the disk cache is writethrough or writeback.

 In other words, BDRV_O_NOCACHE is a host performance tweak while
 BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState.  A
 protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
 because it is implemented in block.c (see bdrv_co_do_writev() where QEMU
 will flush after each write when !bs->enable_write_cache).

 Hi Stefan,

   Thanks for your explanation. But after more investigation, I find
 myself more confused:

   flags passed from block layer
   {writeback, writethrough}   0x2042
   {directsync, off, none}     0x2062
   {unsafe}                    0x2242

 So underlying driver like Sheepdog can't depend on 'flags' passed from
 .bdrv_file_open() to choose the right semantics (This was possible for
 old QEMU IIRC).

 If we can't rely on the 'flags' to get the cache indications of users,
 would you point me how to implement tristate cache control for network
 block driver like Sheepdog? For e.g, I want to implement following
 semantics:
 cache=writeback|none|off # enable writeback semantics for write
 cache=writethrough # enable writethrough semantics for write
 cache=directsync # disable cache completely

 Thanks,
 Yuan


 I tried the old QEMU and v1.1.0 and v1.1.2, they still worked as I
 expected. So I guess generic block layer got changed a bit and the
 'flags' meaning turned different than old code, which did indeed allow
 block drivers to interpret the 'flags' passed from bdrv_file_open().

 With the current upstream code, it seems that BDRV_O_CACHE_WB is always
 enabled and makes 'flags' completely unusable for block drivers to get
 the indications of user by specifying 'cache=' field.

 So is there other means to allow block drivers to rely on, in order to
 interpret the 'cache semantics'?

 
 I found the commit:e1e9b0ac 'always open drivers in writeback mode'.
 This is really undesired for network block drivers such as Sheepdog,
 which implement its own cache mechanism that support
 writeback/writethrough/directio behavior and then want to interpret
 these flags on its own.
 
 Is there any means for block drivers to get the semantics of 'cache=xxx'
 from users?

No, and in theory they shouldn't have to care.

Why do you want to handle writethrough semantics in the block driver
rather than letting qemu take care of sending the right flushes?

Kevin



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-07 Thread Liu Yuan
On 01/07/2013 08:31 PM, Stefan Hajnoczi wrote:
 On Sat, Jan 05, 2013 at 03:56:11PM +0800, Liu Yuan wrote:
 On 01/05/2013 01:29 PM, Liu Yuan wrote:
 On 01/05/2013 12:40 PM, Liu Yuan wrote:
 On 01/05/2013 12:38 AM, Stefan Hajnoczi wrote:
 Hi Yuan,
 BDRV_O_NOCACHE means bypass host page cache (O_DIRECT).

 BDRV_O_CACHE_WB specifies the cache semantics that the guest sees - that
 means whether the disk cache is writethrough or writeback.

 In other words, BDRV_O_NOCACHE is a host performance tweak while
 BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState.  A
 protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
 because it is implemented in block.c (see bdrv_co_do_writev() where QEMU
 will flush after each write when !bs->enable_write_cache).

 Hi Stefan,

   Thanks for your explanation. But after more investigation, I find
 myself more confused:

   flags passed from block layer
   {writeback, writethrough}   0x2042
   {directsync, off, none}     0x2062
   {unsafe}                    0x2242

 So underlying driver like Sheepdog can't depend on 'flags' passed from
 .bdrv_file_open() to choose the right semantics (This was possible for
 old QEMU IIRC).

 If we can't rely on the 'flags' to get the cache indications of users,
 would you point me how to implement tristate cache control for network
 block driver like Sheepdog? For e.g, I want to implement following
 semantics:
 cache=writeback|none|off # enable writeback semantics for write
 cache=writethrough # enable writethrough semantics for write
 cache=directsync # disable cache completely

 Thanks,
 Yuan


 I tried the old QEMU and v1.1.0 and v1.1.2, they still worked as I
 expected. So I guess generic block layer got changed a bit and the
 'flags' meaning turned different than old code, which did indeed allow
 block drivers to interpret the 'flags' passed from bdrv_file_open().

 With the current upstream code, it seems that BDRV_O_CACHE_WB is always
 enabled and makes 'flags' completely unusable for block drivers to get
 the indications of user by specifying 'cache=' field.

 So is there other means to allow block drivers to rely on, in order to
 interpret the 'cache semantics'?


 I found the commit:e1e9b0ac 'always open drivers in writeback mode'.
 This is really undesired for network block drivers such as Sheepdog,
 which implement its own cache mechanism that support
 writeback/writethrough/directio behavior and then want to interpret
 these flags on its own.

 Is there any means for block drivers to get the semantics of 'cache=xxx'
 from users?
 
 Hi Yuan,
 Please explain what cache semantics the sheepdog server supports so I
 can understand better what you are trying to achieve.
 

Hi Stefan,

Sheepdog supports writethrough/writeback/directio (bypassing the cache), and
I want to implement the following semantics:
cache=writeback|none|off # enable writeback semantics for write
cache=writethrough # enable writethrough semantics for write
cache=directsync # disable cache completely

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-07 Thread Liu Yuan
On 01/07/2013 09:23 PM, Kevin Wolf wrote:
 No, and in theory they shouldn't have to care.
 
 Why do you want to handle writethrough semantics in the block driver
 rather than letting qemu care for sending the right flushes?

Because Sheepdog itself provides the client-side cache implementation, and
we need a means to get the cache hint from users when starting up the VM.
This cache supports:
 writethrough: provides a read cache for this VM, which is always
consistent with the cluster data.
 writeback: provides a writeback cache for this VM, which only flushes the
dirty bits when the VM sends a 'flush' request.
 directio: disables the cache completely for this VM.
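For illustration only, the three behaviours above could be summarised as the
per-request decisions below; the enum and helpers are stand-ins made up for
this sketch, not actual sheep code:

#include <stdbool.h>

enum sd_cache_mode {
    SD_CACHE_WRITETHROUGH,  /* read cache only; writes go straight to the cluster */
    SD_CACHE_WRITEBACK,     /* writes are cached; dirty data pushed on 'flush' */
    SD_CACHE_DIRECTIO,      /* object cache bypassed entirely */
};

/* May a write be acknowledged before it reaches the cluster nodes? */
static bool sd_write_completes_in_cache(enum sd_cache_mode mode)
{
    return mode == SD_CACHE_WRITEBACK;
}

/* Does a guest 'flush' request have any work to do? */
static bool sd_flush_has_effect(enum sd_cache_mode mode)
{
    return mode == SD_CACHE_WRITEBACK;  /* only writeback keeps dirty data */
}

/* Are reads served from the local object cache? */
static bool sd_read_uses_cache(enum sd_cache_mode mode)
{
    return mode != SD_CACHE_DIRECTIO;
}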

Old QEMU (before v1.2) allowed block drivers to get the cache flags, so
Sheepdog could interpret these flags on its own to choose the right cache
mode for each VM. Doesn't QEMU need to keep backward compatibility?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-04 Thread Stefan Hajnoczi
On Thu, Jan 03, 2013 at 09:43:53PM +0800, Liu Yuan wrote:
 On 12/25/2012 04:45 PM, Liu Yuan wrote:
  Well, I found setting cache=directsync will contain 'BDRV_O_CACHE_WB'.
  Is this a bug for current master? If no, my current scheme will be the
  only way to bypass cache of sheepdog.
 
 Ping. Can anyone confirm it is a bug that 'cache=directsync' will pass
 'BDRV_O_CACHE_WB' in the flags?

Hi Yuan,
BDRV_O_NOCACHE means bypass host page cache (O_DIRECT).

BDRV_O_CACHE_WB specifies the cache semantics that the guest sees - that
means whether the disk cache is writethrough or writeback.

In other words, BDRV_O_NOCACHE is a host performance tweak while
BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState.  A
protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
because it is implemented in block.c (see bdrv_co_do_writev() where QEMU
will flush after each write when !bs->enable_write_cache).
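A simplified, self-contained model of that behaviour (the type and callbacks
are stand-ins, not the real block.c code):

#include <stdbool.h>

typedef struct BlockDriverStateModel {
    bool enable_write_cache;    /* WCE as seen by the guest */
    int (*drv_write)(struct BlockDriverStateModel *bs, const void *buf, int len);
    int (*drv_flush)(struct BlockDriverStateModel *bs);
} BlockDriverStateModel;

/* Rough shape of the generic write path: when the device is writethrough
 * (enable_write_cache == false) a flush is issued after every successful
 * write, so protocol drivers never need to look at BDRV_O_CACHE_WB. */
static int bdrv_do_writev_model(BlockDriverStateModel *bs,
                                const void *buf, int len)
{
    int ret = bs->drv_write(bs, buf, len);

    if (ret == 0 && !bs->enable_write_cache) {
        ret = bs->drv_flush(bs);
    }
    return ret;
}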

Stefan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-04 Thread Liu Yuan
On 01/05/2013 12:38 AM, Stefan Hajnoczi wrote:
 Hi Yuan,
 BDRV_O_NOCACHE means bypass host page cache (O_DIRECT).
 
 BDRV_O_CACHE_WB specifies the cache semantics that the guest sees - that
 means whether the disk cache is writethrough or writeback.
 
 In other words, BDRV_O_NOCACHE is a host performance tweak while
 BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState.  A
 protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
 because it is implemented in block.c (see bdrv_co_do_writev() where QEMU
 will flush after each write when !bs->enable_write_cache).

Hi Stefan,

  Thanks for your explanation. But after more investigation, I find
myself more confused:

  flags passed from block layer
  {writeback, writethrough}   0x2042
  {directsync, off, none}     0x2062
  {unsafe}                    0x2242
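For reference, those hex values appear to decode against the BDRV_O_*
constants of that era as follows (the bit values below are assumptions quoted
from memory of block.h and should be double-checked):

#define BDRV_O_RDWR        0x0002
#define BDRV_O_NOCACHE     0x0020   /* bypass host page cache (O_DIRECT) */
#define BDRV_O_CACHE_WB    0x0040   /* guest-visible writeback */
#define BDRV_O_NO_FLUSH    0x0200   /* ignore flush requests */
#define BDRV_O_ALLOW_RDWR  0x2000

/* 0x2042 = ALLOW_RDWR | CACHE_WB | RDWR             (writeback, writethrough)
 * 0x2062 = ALLOW_RDWR | CACHE_WB | NOCACHE | RDWR   (directsync, off, none)
 * 0x2242 = ALLOW_RDWR | NO_FLUSH | CACHE_WB | RDWR  (unsafe)
 * i.e. BDRV_O_CACHE_WB is set in every mode, which is why the driver can no
 * longer tell writethrough from writeback from these flags alone. */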

So an underlying driver like Sheepdog can't depend on the 'flags' passed to
.bdrv_file_open() to choose the right semantics (this was possible with
old QEMU, IIRC).

If we can't rely on the 'flags' to get the users' cache indications,
would you point me to how to implement tristate cache control for a network
block driver like Sheepdog? E.g., I want to implement the following
semantics:
cache=writeback|none|off # enable writeback semantics for write
cache=writethrough # enable writethrough semantics for write
cache=directsync # disable cache completely

Thanks,
Yuan






Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-04 Thread Liu Yuan
On 01/05/2013 12:40 PM, Liu Yuan wrote:
 On 01/05/2013 12:38 AM, Stefan Hajnoczi wrote:
 Hi Yuan,
 BDRV_O_NOCACHE means bypass host page cache (O_DIRECT).

 BDRV_O_CACHE_WB specifies the cache semantics that the guest sees - that
 means whether the disk cache is writethrough or writeback.

 In other words, BDRV_O_NOCACHE is a host performance tweak while
 BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState.  A
 protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
 because it is implemented in block.c (see bdrv_co_do_writev() where QEMU
 will flush after each write when !bs->enable_write_cache).
 
 Hi Stefan,
 
   Thanks for your explanation. But after more investigation, I find
 myself more confused:
 
   flags passed from block layer
   {writeback, writethrough}   0x2042
   {directsync, off, none}     0x2062
   {unsafe}                    0x2242
 
 So underlying driver like Sheepdog can't depend on 'flags' passed from
 .bdrv_file_open() to choose the right semantics (This was possible for
 old QEMU IIRC).
 
 If we can't rely on the 'flags' to get the cache indications of users,
 would you point me how to implement tristate cache control for network
 block driver like Sheepdog? For e.g, I want to implement following
 semantics:
 cache=writeback|none|off # enable writeback semantics for write
 cache=writethrough # enable writethrough semantics for write
 cache=directsync # disable cache completely
 
 Thanks,
 Yuan
 

I tried the old QEMU, v1.1.0 and v1.1.2; they still worked as I
expected. So I guess the generic block layer changed a bit and the meaning
of 'flags' is now different from the old code, which did indeed allow
block drivers to interpret the 'flags' passed from bdrv_file_open().

With the current upstream code, it seems that BDRV_O_CACHE_WB is always
enabled, which makes 'flags' completely unusable for block drivers to get
the user's indication from the 'cache=' field.

So are there other means that block drivers can rely on in order to
interpret the 'cache semantics'?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-04 Thread Liu Yuan
On 01/05/2013 01:29 PM, Liu Yuan wrote:
 On 01/05/2013 12:40 PM, Liu Yuan wrote:
 On 01/05/2013 12:38 AM, Stefan Hajnoczi wrote:
 Hi Yuan,
 BDRV_O_NOCACHE means bypass host page cache (O_DIRECT).

 BDRV_O_CACHE_WB specifies the cache semantics that the guest sees - that
 means whether the disk cache is writethrough or writeback.

 In other words, BDRV_O_NOCACHE is a host performance tweak while
 BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState.  A
 protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
 because it is implemented in block.c (see bdrv_co_do_writev() where QEMU
 will flush after each write when !bs->enable_write_cache).

 Hi Stefan,

   Thanks for your explanation. But after more investigation, I find
 myself more confused:

   flags passed from block layer
   {writeback, writethrough}   0x2042
   {directsync, off, none}     0x2062
   {unsafe}                    0x2242

 So underlying driver like Sheepdog can't depend on 'flags' passed from
 .bdrv_file_open() to choose the right semantics (This was possible for
 old QEMU IIRC).

 If we can't rely on the 'flags' to get the cache indications of users,
 would you point me how to implement tristate cache control for network
 block driver like Sheepdog? For e.g, I want to implement following
 semantics:
 cache=writeback|none|off # enable writeback semantics for write
 cache=writethrough # enable writethrough semantics for write
 cache=directsync # disable cache completely

 Thanks,
 Yuan

 
 I tried the old QEMU and v1.1.0 and v1.1.2, they still worked as I
 expected. So I guess generic block layer got changed a bit and the
 'flags' meaning turned different than old code, which did indeed allow
 block drivers to interpret the 'flags' passed from bdrv_file_open().
 
 With the current upstream code, it seems that BDRV_O_CACHE_WB is always
 enabled and makes 'flags' completely unusable for block drivers to get
 the indications of user by specifying 'cache=' field.
 
 So is there other means to allow block drivers to rely on, in order to
 interpret the 'cache semantics'?
 

I found commit e1e9b0ac, 'always open drivers in writeback mode'.
This is really undesirable for network block drivers such as Sheepdog,
which implement their own cache mechanism supporting
writeback/writethrough/directio behavior and want to interpret
these flags on their own.

Is there any means for block drivers to get the semantics of 'cache=xxx'
from users?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2013-01-03 Thread Liu Yuan
On 12/25/2012 04:45 PM, Liu Yuan wrote:
 Well, I found setting cache=directsync will contain 'BDRV_O_CACHE_WB'.
 Is this a bug for current master? If no, my current scheme will be the
 only way to bypass cache of sheepdog.

Ping. Can anyone confirm it is a bug that 'cache=directsync' will pass
'BDRV_O_CACHE_WB' in the flags?

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2012-12-25 Thread Liu Yuan
On 12/25/2012 03:47 PM, MORITA Kazutaka wrote:
 I wonder if we should disable cache when cache=none.  Many management
 frontend uses cache=none by default but, I think, users still expect
 that data is cached (e.g. by disk write cache when a raw format is
 used).  cache=none only means that the host page cache is not used for
 VM disk IO.
 
 In that sense,
 

I am fine with adapting the option to these semantics.

  @@ -1118,12 +1118,19 @@ static int sd_open(BlockDriverState *bs, const char *filename, int flags)
           goto out;
       }
   
  -    s->cache_enabled = true;
  -    s->flush_fd = connect_to_sdog(s->addr, s->port);
  -    if (s->flush_fd < 0) {
  -        error_report("failed to connect");
  -        ret = s->flush_fd;
  -        goto out;
  +    if (flags & BDRV_O_NOCACHE) {
  +        s->cache_flags = SD_FLAG_CMD_DIRECT;
  +    } else if (flags & BDRV_O_CACHE_WB) {
 'else' should be removed, and
 

It seems it should not. We need this 'else if' to allow the writethrough flag.

  +        s->cache_flags = SD_FLAG_CMD_CACHE;
  +    }
  +
  +    if (s->cache_flags != SD_FLAG_CMD_DIRECT) {
 should be 's->cache_flags == SD_FLAG_CMD_CACHE'?  Do we need to send a
 flush request when cache=writethrough?

Looks good to me. I'll change it at V2.

Thanks,
Yuan




Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2012-12-25 Thread Liu Yuan
On 12/25/2012 03:47 PM, MORITA Kazutaka wrote:
 I wonder if we should disable cache when cache=none.  Many management
 frontend uses cache=none by default but, I think, users still expect
 that data is cached (e.g. by disk write cache when a raw format is
 used).  cache=none only means that the host page cache is not used for
 VM disk IO.
 
 In that sense,

Well, I found that setting cache=directsync still results in flags containing
'BDRV_O_CACHE_WB'. Is this a bug in current master? If not, my current scheme
will be the only way to bypass the cache of sheepdog.

Thanks,
Yuan



Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics

2012-12-24 Thread MORITA Kazutaka
At Thu, 20 Dec 2012 02:29:31 +0800,
Liu Yuan wrote:
 
 From: Liu Yuan tailai...@taobao.com
 
 Sheepdog supports both writeback and writethrough writes but has not yet
 supported DIRECTIO semantics, which bypass the cache completely even if the
 Sheepdog daemon is set up with cache enabled.
 
 Supposing cache is enabled on the Sheepdog daemon side, the new cache control is
 
 cache=writeback # enable the writeback semantics for write
 cache=writethrough # enable the writethrough semantics for write
 cache='directsync | none | off' # disable cache competely

I wonder if we should disable the cache when cache=none.  Many management
frontends use cache=none by default but, I think, users still expect
that data is cached (e.g. by the disk write cache when a raw format is
used).  cache=none only means that the host page cache is not used for
VM disk IO.

In that sense,

 @@ -1118,12 +1118,19 @@ static int sd_open(BlockDriverState *bs, const char *filename, int flags)
          goto out;
      }
  
 -    s->cache_enabled = true;
 -    s->flush_fd = connect_to_sdog(s->addr, s->port);
 -    if (s->flush_fd < 0) {
 -        error_report("failed to connect");
 -        ret = s->flush_fd;
 -        goto out;
 +    if (flags & BDRV_O_NOCACHE) {
 +        s->cache_flags = SD_FLAG_CMD_DIRECT;
 +    } else if (flags & BDRV_O_CACHE_WB) {

'else' should be removed, and

 +        s->cache_flags = SD_FLAG_CMD_CACHE;
 +    }
 +
 +    if (s->cache_flags != SD_FLAG_CMD_DIRECT) {

should be 's->cache_flags == SD_FLAG_CMD_CACHE'?  Do we need to send a
flush request when cache=writethrough?
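
In other words, the suggestion seems to be something along these lines (a
sketch of the review comment in the style of the patch, not the actual v2):

    if (flags & BDRV_O_NOCACHE) {
        s->cache_flags = SD_FLAG_CMD_DIRECT;
    } else if (flags & BDRV_O_CACHE_WB) {
        s->cache_flags = SD_FLAG_CMD_CACHE;
    }

    /* Only the writeback case keeps dirty data in the sheep cache, so only
     * then is the flush connection (and later flush requests) needed. */
    if (s->cache_flags == SD_FLAG_CMD_CACHE) {
        s->flush_fd = connect_to_sdog(s->addr, s->port);
        if (s->flush_fd < 0) {
            error_report("failed to connect");
            ret = s->flush_fd;
            goto out;
        }
    }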

Thanks,

Kazutaka