Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
On Friday, 25 January 2008 at 20:52 +0100, Andre Przywara wrote:
> Laurent Vivier wrote:
>> On Friday, 25 January 2008 at 09:18 -0600, Anthony Liguori wrote:
>>> Laurent Vivier wrote:
>>>> Hi,
>>>> this patch allows mounting qemu disk images on the host.
>> Sorry, I didn't see that you did similar work 19 months ago.
>>> Note, the general problem with this approach is that mounting an NBD
>>> device locally with write access can lead to deadlocks. If you look
>>> through the mailing list archives, you'll find a number of
>>> conversations on the topic.
> Some time ago I was also working on an nbd implementation for qcow
> images, but I came to the same deadlock conclusion. (At least
> theoretically; I didn't finish this, as I first ran into debugging
> problems and then ran out of time.) But IMHO this only applies to
> localhost mounts; real network mounting should work (this is actually

As it seems to be a problem related to the page cache, perhaps we could open the QCOW file with O_DIRECT to avoid the problem?

Laurent
--
- [EMAIL PROTECTED] --
Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away. Saint Exupéry
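For readers unfamiliar with O_DIRECT, here is a minimal sketch of what opening an image that way involves on Linux. It is an illustration only, not code from the patch: the helper names and the 4096-byte alignment are assumptions, and whether bypassing the page cache actually avoids the local-mount deadlock is the open question raised above, not something this sketch demonstrates. The practical point is that O_DIRECT I/O generally needs buffers, offsets and lengths aligned to the device's block size.

```c
#define _GNU_SOURCE            /* glibc exposes O_DIRECT only with this */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* not available here: falls back to a buffered open */
#endif

#define DIRECT_ALIGN 4096      /* assumed alignment; the real requirement depends on the device */

/* Allocate a buffer whose address is a multiple of DIRECT_ALIGN.
 * Returns NULL on failure; release it with free(). */
static void *alloc_direct_buffer(size_t len)
{
    void *buf = NULL;
    if (posix_memalign(&buf, DIRECT_ALIGN, len) != 0)
        return NULL;
    return buf;
}

/* Round a byte count up to the next multiple of DIRECT_ALIGN. */
static size_t round_up_direct(size_t len)
{
    return (len + DIRECT_ALIGN - 1) / DIRECT_ALIGN * DIRECT_ALIGN;
}

/* Open an image file for read/write without going through the page cache.
 * Returns a file descriptor, or -1 with errno set. */
static int open_image_direct(const char *path)
{
    return open(path, O_RDWR | O_DIRECT);
}
```

A caller would read into a buffer from alloc_direct_buffer() using a length from round_up_direct(); some filesystems reject O_DIRECT altogether, in which case open() fails with EINVAL.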
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
On Fri, Jan 25, 2008 at 02:27:34PM -0600, Anthony Liguori wrote:
> Andre Przywara wrote:
>> Laurent Vivier wrote:
>>> What I'm wondering is how loop and device mapper can work?
>> I briefly evaluated the loop device idea, but came to the conclusion
>> that this is not so easy to implement (and would require qcow code in
>> the kernel). I see only little chance of this going upstream in Linux,
>> and maintaining it out-of-tree is actually a bad idea.
> I recently was poking around at the loop device and discovered that it
> had pluggable xfer ops to allow for encrypted loop devices. My initial
> analysis was that by simply adding a couple of operations to that
> structure (such as map_sector and get_size), you could very easily
> write a kernel module that registered a set of xfer ops that
> implemented QCOW support.

The loop device encryption stuff has long been deprecated in favour of the device mapper crypt layer (dm-crypt and the cryptsetup command). The loop device is really not at all nice for write access because it will cache data in memory arbitrarily, leading to potentially huge data loss upon crashes. This is why Xen stopped using the loop device and wrote the blktap daemon; although that has its own set of problems, at least it has data integrity.

Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
Andre Przywara wrote:
> Laurent Vivier wrote:
>> What I'm wondering is how loop and device mapper can work?
> I briefly evaluated the loop device idea, but came to the conclusion
> that this is not so easy to implement (and would require qcow code in
> the kernel). I see only little chance of this going upstream in Linux,
> and maintaining it out-of-tree is actually a bad idea.

I recently was poking around at the loop device and discovered that it had pluggable xfer ops to allow for encrypted loop devices. My initial analysis was that by simply adding a couple of operations to that structure (such as map_sector and get_size), you could very easily write a kernel module that registered a set of xfer ops that implemented QCOW support.

Of course, this would all be kernel code. The best solution would be a proper userspace block device. I think it's a pretty reasonable stop-gap though (that wouldn't be very difficult to get merged upstream).

> If you think about delegating the qcow code to userland, you will
> sooner or later run into the same deadlock problems as the current
> solution (after all this is what nbd does...)
> I have implemented a clean device-mapper solution; the big drawback is
> that it is read-only. It's a simple tool which converts the qcow map
> into a format suitable for dmsetup, to which the output can be piped
> directly. I will clean up the code and send it to the list ASAP.

You could only do something read-only with device mapper. dm-userspace was an effort to try and work around that with a userspace daemon, but it didn't move upstream as quickly as we would have liked.

Regards,
Anthony Liguori

> Read/write support is not that easy, but maybe someone can comment on
> this idea:
> Create a sparse file on the host which is as large as all the still
> unallocated blocks together. Assign these blocks via device mapper in
> addition to the already allocated ones. When unmounting the dm device,
> look for blocks which have been changed, then allocate and write them
> into the qcow file. One could also use the bmap ioctl to scan for
> non-sparse blocks.
> This is a bit complicated, but should work cleanly (especially for the
> quick fsck or file editing case). If you think it is worthwhile, I
> could try to implement it.
>
> Regards,
> Andre.
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
Laurent Vivier wrote:
> On Friday, 25 January 2008 at 09:18 -0600, Anthony Liguori wrote:
>> Laurent Vivier wrote:
>>> Hi,
>>> this patch allows mounting qemu disk images on the host.
> Sorry, I didn't see that you did similar work 19 months ago.
>> Note, the general problem with this approach is that mounting an NBD
>> device locally with write access can lead to deadlocks. If you look
>> through the mailing list archives, you'll find a number of
>> conversations on the topic.

Some time ago I was also working on an nbd implementation for qcow images, but I came to the same deadlock conclusion. (At least theoretically; I didn't finish this, as I first ran into debugging problems and then ran out of time.) But IMHO this only applies to localhost mounts; real network mounting should work (this is actually not different from native nbd). Perhaps one could use a qemu instance for the server part ;-)

BTW: nbd-server should be quite portable; I once had it running on an ancient PA-RISC machine under HP-UX 10.20.

> What I'm wondering is how loop and device mapper can work?

I briefly evaluated the loop device idea, but came to the conclusion that this is not so easy to implement (and would require qcow code in the kernel). I see only little chance of this going upstream in Linux, and maintaining it out-of-tree is actually a bad idea.

If you think about delegating the qcow code to userland, you will sooner or later run into the same deadlock problems as the current solution (after all this is what nbd does...)

I have implemented a clean device-mapper solution; the big drawback is that it is read-only. It's a simple tool which converts the qcow map into a format suitable for dmsetup, to which the output can be piped directly. I will clean up the code and send it to the list ASAP.

Read/write support is not that easy, but maybe someone can comment on this idea:
Create a sparse file on the host which is as large as all the still unallocated blocks together. Assign these blocks via device mapper in addition to the already allocated ones. When unmounting the dm device, look for blocks which have been changed, then allocate and write them into the qcow file. One could also use the bmap ioctl to scan for non-sparse blocks.
This is a bit complicated, but should work cleanly (especially for the quick fsck or file editing case). If you think it is worthwhile, I could try to implement it.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
to satisfy European Law for business letters:
AMD Saxony Limited Liability Company Co. KG, Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
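To make the "convert the qcow map into something dmsetup can read" idea concrete, here is a rough sketch of the output side only. It is not Andre's tool, which was not posted in this part of the thread. The struct extent type and both function names are hypothetical; the sketch assumes the image file is reachable as a block device (for example through a loop device) and that every mapped run is uncompressed and unencrypted. The table syntax itself is the standard device-mapper one: "start length linear device offset" for mapped runs, and "start length zero" for ranges that should read back as zeros. All values are in 512-byte sectors.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one contiguous run of allocated image data. */
struct extent {
    uint64_t virt_sector;   /* first sector as seen through the mapped device */
    uint64_t nr_sectors;    /* length of the run in sectors */
    uint64_t file_sector;   /* where the run starts inside the backing device */
};

/* Format one "linear" table line for an allocated run.
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_linear(char *buf, size_t size, const struct extent *e,
                         const char *backing_dev)
{
    int n = snprintf(buf, size, "%" PRIu64 " %" PRIu64 " linear %s %" PRIu64 "\n",
                     e->virt_sector, e->nr_sectors, backing_dev, e->file_sector);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Format one "zero" table line for an unallocated range (reads return zeros).
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_zero(char *buf, size_t size, uint64_t start, uint64_t nr_sectors)
{
    int n = snprintf(buf, size, "%" PRIu64 " %" PRIu64 " zero\n", start, nr_sectors);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Lines produced this way, emitted in ascending order and covering the whole virtual disk without gaps, are what "dmsetup create" expects on standard input. Mapping the unallocated ranges to a sparse scratch file instead of the zero target would be the starting point for the read/write idea described above.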
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
On Friday, 25 January 2008 at 09:18 -0600, Anthony Liguori wrote:
> Laurent Vivier wrote:
>> Hi,
>> this patch allows mounting qemu disk images on the host.

Sorry, I didn't see that you did similar work 19 months ago.

> Note, the general problem with this approach is that mounting an NBD
> device locally with write access can lead to deadlocks. If you look
> through the mailing list archives, you'll find a number of
> conversations on the topic.

Yes, I experienced some problems under heavy I/O load (2 * dbench 64 on a 4-CPU SMP machine). But perhaps it is acceptable for editing config files or running fsck on a partition of a virtual machine.

What I'm wondering is how loop and device mapper can work?

> Regards,
> Anthony Liguori

Thank you,
Laurent

>> It is based on the Network Block Device protocol and allows qemu-img to
>> become an NBD server (Yes, Anthony, userspace block device is the right
>> way to do that... :-P ).
>>
>> Once you've applied the attached patch to Qemu and built the binaries,
>> you can use it like this:
>>
>> # ./qemu-img server -d 1234 etch.qcow2
>>
>> This starts an NBD server on port 1234. This server will expose the disk
>> image etch.qcow2. -d means it will daemonize and run in the background.
>>
>> Then you need to connect the block device to the server:
>>
>> # nbd-client localhost 1234 /dev/nbd0
>> Negotiation: ..size = 4194304KB
>> bs=1024, sz=4194304
>>
>> This will link etch.qcow2 to /dev/nbd0.
>>
>> Then, to see the partitions, you can use kpartx, as Daniel explained, or
>> my patched loop module (I can send an updated and bug-free version).
>>
>> ...
>> # kpartx -a /dev/nbd0
>> ...
>> or
>> ...
>> # rmmod loop
>> # insmod drivers/block/loop.ko max_part=64
>> # losetup -f /dev/nbd0
>> ...
>> # mount /dev/loop0p1 /mnt
>> # ls /mnt
>> bench  cdrom    etc     initrd.img  media  proc  selinux  tmp  vmlinuz
>> bin    clients  home    lib         mnt    root  srv      usr
>> boot   dev      initrd  lost+found  opt    sbin  sys      var
>> # cd
>> # umount /mnt
>> # losetup -d /dev/loop0
>> # nbd-client -d /dev/nbd0
>>
>> TODO: security/host client checking, device lock...
>>
>> As usual all comments are welcome,
>> have fun,
>> Laurent
--
- [EMAIL PROTECTED] --
Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away. Saint Exupéry
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
Laurent Vivier wrote:
> Hi,
> this patch allows mounting qemu disk images on the host.

Note, the general problem with this approach is that mounting an NBD device locally with write access can lead to deadlocks. If you look through the mailing list archives, you'll find a number of conversations on the topic.

Regards,
Anthony Liguori

> It is based on the Network Block Device protocol and allows qemu-img to
> become an NBD server (Yes, Anthony, userspace block device is the right
> way to do that... :-P ).
>
> Once you've applied the attached patch to Qemu and built the binaries,
> you can use it like this:
>
> # ./qemu-img server -d 1234 etch.qcow2
>
> This starts an NBD server on port 1234. This server will expose the disk
> image etch.qcow2. -d means it will daemonize and run in the background.
>
> Then you need to connect the block device to the server:
>
> # nbd-client localhost 1234 /dev/nbd0
> Negotiation: ..size = 4194304KB
> bs=1024, sz=4194304
>
> This will link etch.qcow2 to /dev/nbd0.
>
> Then, to see the partitions, you can use kpartx, as Daniel explained, or
> my patched loop module (I can send an updated and bug-free version).
>
> ...
> # kpartx -a /dev/nbd0
> ...
> or
> ...
> # rmmod loop
> # insmod drivers/block/loop.ko max_part=64
> # losetup -f /dev/nbd0
> ...
> # mount /dev/loop0p1 /mnt
> # ls /mnt
> bench  cdrom    etc     initrd.img  media  proc  selinux  tmp  vmlinuz
> bin    clients  home    lib         mnt    root  srv      usr
> boot   dev      initrd  lost+found  opt    sbin  sys      var
> # cd
> # umount /mnt
> # losetup -d /dev/loop0
> # nbd-client -d /dev/nbd0
>
> TODO: security/host client checking, device lock...
>
> As usual all comments are welcome,
> have fun,
> Laurent
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
On Jan 25, 2008, at 1:58 PM, Laurent Vivier wrote:
> On Friday, 25 January 2008 at 12:48 +, Johannes Schindelin wrote:
>> Hi,
>>
>> On Fri, 25 Jan 2008, Laurent Vivier wrote:
>>> this patch allows mounting qemu disk images on the host.
>>
>> This patch has an awful lot of #ifdef __linux__ in it. But I imagine
>> that you could use it on a non-linux host, too, for example with yet
>> another qemu instance running Linux... Or coLinux, if it supports
>> network block devices somehow. I certainly saw nothing Linux-specific
>> in the _code_...
>
> Yes, but as I can't test this on anything other than Linux, I prefer to
> disable this part so as not to break what exists, and to let competent
> people (like you) do the work (IMHO, I've introduced enough bugs into
> Qemu...).

Please make it a separate define and ifdef then. Something like

#ifdef __linux__
#define NBD_SERVER
#endif

#ifdef NBD_SERVER
...

This way it's a lot easier to distinguish between platform-specific and feature-specific code, and if someone finds out it works on Windows, it's only one line to change.

Regards,
Alex
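A small self-contained illustration of the pattern suggested above, offered as a sketch rather than as the code that ended up in the patch: the platform check appears exactly once and only defines a feature macro, and everything else tests the feature macro. The function names have_nbd_server and server_usage are made up for this example; __linux__ is the macro GCC predefines on Linux.

```c
#include <assert.h>
#include <string.h>

/* Platform test in one place: a port only has to touch these three lines. */
#ifdef __linux__
#define NBD_SERVER
#endif

/* From here on, code tests the feature macro, never the platform macro. */
static int have_nbd_server(void)
{
#ifdef NBD_SERVER
    return 1;
#else
    return 0;
#endif
}

/* Usage line for the optional sub-command; empty when the feature is compiled out. */
static const char *server_usage(void)
{
#ifdef NBD_SERVER
    return "  server [-d] [-f fmt] port filename\n";
#else
    return "";
#endif
}
```

Enabling the feature on another host then means extending the first #ifdef (or passing -DNBD_SERVER to the compiler) without touching any of the guarded code.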
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
On Friday, 25 January 2008 at 12:48 +, Johannes Schindelin wrote:
> Hi,
>
> On Fri, 25 Jan 2008, Laurent Vivier wrote:
>> this patch allows mounting qemu disk images on the host.
>
> This patch has an awful lot of #ifdef __linux__ in it. But I imagine
> that you could use it on a non-linux host, too, for example with yet
> another qemu instance running Linux... Or coLinux, if it supports
> network block devices somehow. I certainly saw nothing Linux-specific
> in the _code_...

Yes, but as I can't test this on anything other than Linux, I prefer to disable this part so as not to break what exists, and to let competent people (like you) do the work (IMHO, I've introduced enough bugs into Qemu...).

I think it should work on Windows, too (as nbd-server and qemu-img are able to)

Regards,
Laurent
--
- [EMAIL PROTECTED] --
Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away. Saint Exupéry
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
Hi,

On Fri, 25 Jan 2008, Laurent Vivier wrote:
> this patch allows mounting qemu disk images on the host.

This patch has an awful lot of #ifdef __linux__ in it. But I imagine that you could use it on a non-linux host, too, for example with yet another qemu instance running Linux... Or coLinux, if it supports network block devices somehow. I certainly saw nothing Linux-specific in the _code_...

Ciao,
Dscho
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
On Friday, 25 January 2008 at 14:37 +0100, Alexander Graf wrote:
> On Jan 25, 2008, at 1:58 PM, Laurent Vivier wrote:
>> On Friday, 25 January 2008 at 12:48 +, Johannes Schindelin wrote:
>>> Hi,
>>>
>>> On Fri, 25 Jan 2008, Laurent Vivier wrote:
>>>> this patch allows mounting qemu disk images on the host.
>>>
>>> This patch has an awful lot of #ifdef __linux__ in it. But I imagine
>>> that you could use it on a non-linux host, too, for example with yet
>>> another qemu instance running Linux... Or coLinux, if it supports
>>> network block devices somehow. I certainly saw nothing Linux-specific
>>> in the _code_...
>>
>> Yes, but as I can't test this on anything other than Linux, I prefer to
>> disable this part so as not to break what exists, and to let competent
>> people (like you) do the work (IMHO, I've introduced enough bugs into
>> Qemu...).
>
> Please make it a separate define and ifdef then. Something like
>
> #ifdef __linux__
> #define NBD_SERVER
> #endif
>
> #ifdef NBD_SERVER
> ...
>
> This way it's a lot easier to distinguish between platform-specific and
> feature-specific code, and if someone finds out it works on Windows,
> it's only one line to change.

I agree, new patch attached.

Laurent
--
- [EMAIL PROTECTED] --
Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away. Saint Exupéry

---
 qemu-img.c |  352 +
 1 file changed, 352 insertions(+)

Index: qemu/qemu-img.c
===
--- qemu.orig/qemu-img.c 2008-01-25 13:09:10.0 +0100
+++ qemu/qemu-img.c 2008-01-25 14:52:01.0 +0100
@@ -25,6 +25,16 @@
 #include "block_int.h"
 #include <assert.h>
 
+#ifdef __linux__
+#define NBD_SERVER
+#endif
+
+#ifdef NBD_SERVER
+#include <arpa/inet.h>
+#include <netinet/tcp.h>
+#include <sys/wait.h>
+#endif /* NBD_SERVER */
+
 #ifdef _WIN32
 #define WIN32_LEAN_AND_MEAN
 #include <windows.h>
@@ -92,6 +102,9 @@ static void help(void)
            "  commit [-f fmt] filename\n"
            "  convert [-c] [-e] [-6] [-f fmt] filename [filename2 [...]] [-O output_fmt] output_filename\n"
            "  info [-f fmt] filename\n"
+#ifdef NBD_SERVER
+           "  server [-d] [-f fmt] port filename\n"
+#endif
            "\n"
            "Command parameters:\n"
            "  'filename' is a disk image filename\n"
@@ -105,6 +118,9 @@ static void help(void)
            "  '-c' indicates that target image must be compressed (qcow format only)\n"
            "  '-e' indicates that the target image must be encrypted (qcow format only)\n"
            "  '-6' indicates that the target image must use compatibility level 6 (vmdk format only)\n"
+#ifdef NBD_SERVER
+           "  '-d' daemonize (server only)\n"
+#endif
            );
     printf("\nSupported format:");
     bdrv_iterate_format(format_print, NULL);
@@ -602,6 +618,338 @@ static int img_convert(int argc, char **
     return 0;
 }
 
+#ifdef NBD_SERVER
+
+//#define DEBUG_SERVER
+
+#ifdef DEBUG_SERVER
+#define DPRINTF(fmt, args...) \
+do { printf("img-server: " fmt , ##args); } while (0)
+#else
+#define DPRINTF(fmt, args...) do {} while(0)
+#endif
+
+# if __BYTE_ORDER == __BIG_ENDIAN
+# define htonll(x) (x)
+# define ntohll(x) (x)
+#else
+# define htonll(x) __bswap_64(x)
+# define ntohll(x) __bswap_64(x)
+#endif
+
+#define BUFSIZE (1024*1024)
+
+#define INIT_PASSWD "NBDMAGIC"
+
+#define NBD_REQUEST_MAGIC 0x25609513
+#define NBD_REPLY_MAGIC 0x67446698
+
+enum {
+    NBD_CMD_READ = 0,
+    NBD_CMD_WRITE = 1,
+    NBD_CMD_DISC = 2
+};
+
+struct nbd_request {
+    uint32_t magic;
+    uint32_t type;
+    char handle[8];
+    uint64_t from;
+    uint32_t len;
+} __attribute__ ((packed));
+
+struct nbd_reply {
+    uint32_t magic;
+    uint32_t error;
+    char handle[8];
+} __attribute__ ((packed));
+
+static void sigchld_handler(int s)
+{
+    int status;
+    pid_t pid;
+
+    pid = waitpid(-1, &status, WNOHANG);
+    if (WIFEXITED(status)) {
+        DPRINTF("child %d exited\n", pid);
+    }
+}
+
+static int nbd_receive(int fd, char *buf, size_t len)
+{
+    ssize_t rd;
+
+    while (len > 0) {
+        rd = read(fd, buf, len);
+        if (rd < 0)
+            return -errno;
+        len -= rd;
+        buf += rd;
+    }
+    return 0;
+}
+
+static int nbd_send(int fd, char *buf, size_t len)
+{
+    ssize_t written;
+
+    while (len > 0) {
+        written = write(fd, buf, len);
+        if (written < 0)
+            return -errno;
+        len -= written;
+        buf += written;
+    }
+    return 0;
+}
+
+static int negotiate(int sock, uint64_t size)
+{
+    char zeros[128];
+    uint64_t magic = htonll(0x00420281861253ULL);
+    int ret;
+
+    DPRINTF("negotiate(%d,%ld)\n", sock, size);
+
+    memset(zeros, 0, sizeof(zeros));
+
+    ret = nbd_send(sock, INIT_PASSWD, 8);
+    if (ret < 0)
+        return -1;
+    ret = nbd_send(sock, (char*)&magic, sizeof(magic));
+    if (ret < 0)
+        return -1;
+    size =
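The patch text is cut off in the archive in the middle of negotiate(), so the following is only a sketch of the wire format that the visible parts point to; it is not a reconstruction of the missing code. It assumes the classic NBD greeting (the 8 bytes "NBDMAGIC", the 64-bit magic 0x00420281861253, the 64-bit export size, then 128 reserved zero bytes) and the 28-byte request header matching struct nbd_request, with all integers in network byte order. The helper names and the unpacked struct request are invented for the example. Working on byte buffers instead of packed structs avoids the need for htonll() and for __attribute__((packed)).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBD_REQUEST_MAGIC 0x25609513u
#define HANDSHAKE_LEN (8 + 8 + 8 + 128)   /* password, magic, size, reserved zeros */
#define REQUEST_LEN 28                    /* magic, type, handle, from, len */

/* Store a 64-bit value in big-endian (network) byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Load big-endian 32-bit and 64-bit values. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Fill `out` with the greeting a server sends right after accept(). */
static void build_handshake(uint8_t out[HANDSHAKE_LEN], uint64_t size_bytes)
{
    memcpy(out, "NBDMAGIC", 8);
    put_be64(out + 8, 0x00420281861253ULL);
    put_be64(out + 16, size_bytes);
    memset(out + 24, 0, 128);
}

/* Host-order view of one client request (hypothetical helper type). */
struct request {
    uint32_t type;       /* 0 = read, 1 = write, 2 = disconnect */
    uint8_t handle[8];   /* opaque cookie, echoed back in the reply */
    uint64_t from;       /* byte offset into the export */
    uint32_t len;        /* byte count */
};

/* Decode a request header; returns 0 on success, -1 on a bad magic. */
static int parse_request(const uint8_t in[REQUEST_LEN], struct request *req)
{
    if (get_be32(in) != NBD_REQUEST_MAGIC)
        return -1;
    req->type = get_be32(in + 4);
    memcpy(req->handle, in + 8, 8);
    req->from = get_be64(in + 16);
    req->len = get_be32(in + 24);
    return 0;
}
```

With the 4194304 KB image from the nbd-client output quoted earlier in the thread, build_handshake() would be called with 4194304 * 1024 bytes.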
Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host
Laurent Vivier wrote:
> Hi,
> this patch allows mounting qemu disk images on the host.
>
> It is based on the Network Block Device protocol and allows qemu-img to
> become an NBD server (Yes, Anthony, userspace block device is the right
> way to do that... :-P ).

FYI, I've been maintaining qemu-nbd out of tree for a while now.

http://hg.codemonkey.ws/qemu-nbd

It also includes some nice features like read-only mount and exposing an individual partition.

Regards,
Anthony Liguori

> Once you've applied the attached patch to Qemu and built the binaries,
> you can use it like this:
>
> # ./qemu-img server -d 1234 etch.qcow2
>
> This starts an NBD server on port 1234. This server will expose the disk
> image etch.qcow2. -d means it will daemonize and run in the background.
>
> Then you need to connect the block device to the server:
>
> # nbd-client localhost 1234 /dev/nbd0
> Negotiation: ..size = 4194304KB
> bs=1024, sz=4194304
>
> This will link etch.qcow2 to /dev/nbd0.
>
> Then, to see the partitions, you can use kpartx, as Daniel explained, or
> my patched loop module (I can send an updated and bug-free version).
>
> ...
> # kpartx -a /dev/nbd0
> ...
> or
> ...
> # rmmod loop
> # insmod drivers/block/loop.ko max_part=64
> # losetup -f /dev/nbd0
> ...
> # mount /dev/loop0p1 /mnt
> # ls /mnt
> bench  cdrom    etc     initrd.img  media  proc  selinux  tmp  vmlinuz
> bin    clients  home    lib         mnt    root  srv      usr
> boot   dev      initrd  lost+found  opt    sbin  sys      var
> # cd
> # umount /mnt
> # losetup -d /dev/loop0
> # nbd-client -d /dev/nbd0
>
> TODO: security/host client checking, device lock...
>
> As usual all comments are welcome,
> have fun,
> Laurent
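"Exposing an individual partition" removes the need for kpartx or a patched loop module on the client side: the server only has to add a fixed byte offset to every request and report the partition's size instead of the disk's. The sketch below shows one way such an offset could be found for a classic MBR-partitioned image. It is not taken from qemu-nbd; the function name is hypothetical, only the four primary slots are handled, and 512-byte sectors are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define MBR_TABLE_OFFSET 446   /* four 16-byte entries start here */
#define MBR_ENTRY_SIZE 16

/* Load a little-endian 32-bit value (MBR fields are little-endian). */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Look up primary partition `index` (1..4) in the first sector of a disk.
 * On success, stores the partition's byte offset and byte length and
 * returns 0. Returns -1 for a missing boot signature, a bad index or an
 * unused slot. */
static int mbr_find_partition(const uint8_t mbr[SECTOR_SIZE], int index,
                              uint64_t *offset, uint64_t *length)
{
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;
    if (index < 1 || index > 4)
        return -1;

    const uint8_t *e = mbr + MBR_TABLE_OFFSET + (index - 1) * MBR_ENTRY_SIZE;
    if (e[4] == 0)                       /* partition type 0 marks an unused slot */
        return -1;

    *offset = (uint64_t)get_le32(e + 8) * SECTOR_SIZE;    /* starting LBA */
    *length = (uint64_t)get_le32(e + 12) * SECTOR_SIZE;   /* sector count */
    return 0;
}
```

A server using this would read the first 512 bytes of the image through the block layer, call mbr_find_partition(), advertise `length` as the export size and add `offset` to the `from` field of each request.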