On 04/25/2016 06:19 AM, Alex Bligh wrote: > > On 23 Apr 2016, at 00:40, Eric Blake <[email protected]> wrote: > >> The upstream NBD Protocol has defined a new extension to allow >> the server to advertise block sizes to the client, as well as >> a way for the client to inform the server that it intends to >> obey block sizes. >> >> Pass any received sizes on to the block layer. >> >> Use the minimum block size as the sector size we pass to the >> kernel - which also has the nice effect of cooperating with >> (non-qemu) servers that don't do read-modify-write when exposing >> a block device with 4k sectors; it can also allow us to visit a >> file larger than 2T on a 32-bit kernel. >>
>> + be32_to_cpus(&info->max_block);
>> + TRACE("Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%"
>> PRIx32,
>> + info->min_block, info->opt_block, info->max_block);
>> + break;
>> +
>
> You should probably check min_block <= opt_block <= max_block here
opt_block > max_block is possible if max_block is clamped to export size
(in the degenerate case where you have a small export that is too small
for the granularity of a hole or efficient I/O). But yes, some sanity
checks that the server isn't horribly broken might be worthwhile.
>
> Also should there be a check that BDRV_SECTOR_SIZE >= min_block?
No, because we take the server's min_block and feed it into
bs->request_align (which forces the block layer to comply with a minimum
alignment larger than 512, using code already tested on physical block
drives with 4k sectors), see the hunk in nbd-client.c.
>> int nbd_init(int fd, QIOChannelSocket *sioc, NbdExportInfo *info)
>> {
>> - unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
>> - if (info->size / BDRV_SECTOR_SIZE != sectors) {
>> + unsigned long sector_size = MAX(BDRV_SECTOR_SIZE, info->min_block);
>> + unsigned long sectors = info->size / sector_size;
>> + if (info->size / sector_size != sectors) {
>> LOG("Export size %" PRId64 " too large for 32-bit kernel",
>> info->size);
>> return -E2BIG;
>> }
>> @@ -724,18 +784,18 @@ int nbd_init(int fd, QIOChannelSocket *sioc,
>> NbdExportInfo *info)
>> return -serrno;
>> }
>>
>> - TRACE("Setting block size to %lu", (unsigned long)BDRV_SECTOR_SIZE);
>> + TRACE("Setting block size to %lu", sector_size);
>>
>> - if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
>> + if (ioctl(fd, NBD_SET_BLKSIZE, sector_size) < 0) {
We also feed the maximum of 512 or the advertised minimum block size
into the kernel when using ioctl() for the kernel to take over
transmission phase; although I'm not certain whether the kernel obeys
NBD_SET_BLKSIZE as a hint rather than a hard rule - but if that needs
patching, it needs patching in the kernel implementation, not in qemu.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature
