On 10/03/2012 19:02, Richard Laager wrote:
> I propose adding the following behaviors in any event:
>  * If a QEMU block device reports a discard_granularity > 0, it
>    must be equal to 2^n (n >= 0), or QEMU's block core will change
>    it to 0. (Non-power-of-two granularities are not likely to exist
>    in the real world, and this assumption greatly simplifies
>    ensuring correctness.)

Yeah, I was considering this to be simply a bug in the block device.

>  * For SCSI, report an unmap_granularity to the guest as follows:
>    max(logical_block_size, discard_granularity) / logical_block_size

This is more or less already in place later in the series.

> As a design concept, instead of guaranteeing that 512B zero'ing discards
> are supported, I think the QEMU block layer should instead guarantee
> aligned discards to QEMU block devices, emulating any misaligned
> discards (or portions thereof) by writing zeroes if (and only if)
> discard_zeros_data is set.

Yes, this can be done of course. This series does not include it yet.

> This leaves one remaining issue: In raw-posix.c, for files (i.e. not
> devices), I assume you're going to advertise discard_granularity=1 and
> discard_zeros_data=1 when compiled with support for
> fallocate(FALLOC_FL_PUNCH_HOLE). Note, I'm assuming fallocate() actually
> guarantees that it zeros the data when punching holes.

It does, that's pretty much the definition of a hole.

> If the guest does a big discard (think mkfs) and fallocate() returns
> EOPNOTSUPP, you'll have to zero essentially the whole virtual disk,
> which, as you noted, will also allocate it (unless you explicitly check
> for holes). This is bad. It can be avoided by not advertising
> discard_zeros_data, but as you noted, that's unfortunate.

If you have a new kernel that supports SEEK_HOLE/SEEK_DATA, it can also
be done by skipping the zero write on known holes. This could even be
done at the block layer level using bdrv_is_allocated.

> If we could probe for FALLOC_FL_PUNCH_HOLE support, then we could avoid
> advertising discard support based on FALLOC_FL_PUNCH_HOLE when it is not
> going to work. This would side step these problems.

... and introduce others when migrating if your datacenter doesn't have
homogeneous kernel versions and/or filesystems. :(

> You said it wasn't
> possible to probe for FALLOC_FL_PUNCH_HOLE.
> Have you considered probing
> by extending the file by one byte and then punching that:
>     char buf = 0;
>     fstat(s->fd, &st);
>     pwrite(s->fd, &buf, 1, st.st_size + 1);
>     has_discard = !fallocate(s->fd, FALLOC_FL_PUNCH_HOLE |
>                                     FALLOC_FL_KEEP_SIZE,
>                              st.st_size + 1, 1);
>     ftruncate(s->fd, st.st_size);

Nice trick. :) Yes, that could work.

Do you know if non-Linux operating systems have something similar to
BLKDISCARDZEROES?

Paolo
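
For reference, the granularity rules discussed in this message (power-of-two check, the SCSI unmap_granularity formula, and passing only the aligned part of a discard to the device) could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from the series: the names sanitize_discard_granularity, unmap_granularity, aligned_part and struct aligned_range are hypothetical, all sizes are assumed to be in bytes, and overflow near UINT64_MAX is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep a granularity only if it is 2^n (n >= 0);
 * anything else becomes 0, i.e. "no discard granularity reported". */
uint64_t sanitize_discard_granularity(uint64_t granularity)
{
    if (granularity != 0 && (granularity & (granularity - 1)) == 0) {
        return granularity;
    }
    return 0;
}

/* Hypothetical helper: unmap granularity in logical blocks, computed as
 * max(logical_block_size, discard_granularity) / logical_block_size. */
uint64_t unmap_granularity(uint64_t logical_block_size,
                           uint64_t discard_granularity)
{
    assert(logical_block_size > 0);
    uint64_t g = discard_granularity > logical_block_size
                 ? discard_granularity : logical_block_size;
    return g / logical_block_size;
}

/* Hypothetical type: the aligned sub-range of a discard request. */
struct aligned_range {
    uint64_t start;
    uint64_t len;   /* 0 means no aligned portion exists */
};

/* Hypothetical helper: return the part of [offset, offset + len) that is
 * aligned to a power-of-two granularity. The head and tail left over are
 * what would have to be emulated by writing zeroes. */
struct aligned_range aligned_part(uint64_t offset, uint64_t len,
                                  uint64_t granularity)
{
    struct aligned_range r = { offset, 0 };

    if (granularity == 0 || (granularity & (granularity - 1)) != 0) {
        return r;   /* no usable granularity: nothing is passed down */
    }

    uint64_t mask = granularity - 1;
    uint64_t start = (offset + mask) & ~mask;   /* round start up */
    uint64_t end = (offset + len) & ~mask;      /* round end down */

    if (start < end) {
        r.start = start;
        r.len = end - start;
    }
    return r;
}
```

With a 4096-byte granularity, a discard of bytes [1000, 11000) would leave only [4096, 8192) for the device; the remaining head and tail are the portions that would be written as zeroes if (and only if) discard_zeros_data is set.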