Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Laurent Vivier
On Friday 25 January 2008 at 20:52 +0100, Andre Przywara wrote:
 Laurent Vivier wrote:
  On Friday 25 January 2008 at 09:18 -0600, Anthony Liguori wrote:
  Laurent Vivier wrote:
  Hi,
 
  this patch allows mounting qemu disk images on the host.

  
  Sorry, I didn't see that you did similar work 19 months ago.
  Note, the general problem with this approach is that mounting an NBD 
  device locally with write access can lead to deadlocks.  If you look 
  through the mailing list archives, you'll find a number of conversations 
  on the topic.
 Some time ago I was also working on an NBD implementation for 
 qcow images, but I came to the same deadlock conclusion. (At least 
 theoretically; I didn't finish this, as I ran first into debugging 
 problems and then out of time.) But IMHO this only applies to 
 localhost mounts; real network mounting should work (this is actually 
 not different from native nbd).

As it seems to be a problem related to the page cache, perhaps we could
open the QCOW file with O_DIRECT to avoid the problem?
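
For readers unfamiliar with O_DIRECT, here is a minimal sketch of what opening and reading an image file that way involves on Linux. The helper names and the 4096-byte alignment are assumptions made for illustration and do not come from the patch. Whether this actually avoids the deadlock is not established in this thread.

```c
#define _GNU_SOURCE             /* asks glibc to expose O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* flag unavailable: the sketch degrades to cached I/O */
#endif

#define IMG_ALIGN 4096          /* assumed alignment; the real requirement depends on the device */

/* Hypothetical helper: open an image so that reads and writes bypass the page cache. */
static int image_open_direct(const char *path)
{
    return open(path, O_RDWR | O_DIRECT);
}

/* Hypothetical helper: allocate a buffer aligned for O_DIRECT. Returns NULL on failure. */
static void *image_alloc_buffer(size_t len)
{
    void *buf = NULL;

    if (posix_memalign(&buf, IMG_ALIGN, len) != 0)
        return NULL;
    return buf;
}

/* Read len bytes at offset. Buffer address, len and offset must all be IMG_ALIGN multiples. */
static ssize_t image_read_direct(int fd, void *buf, size_t len, off_t offset)
{
    if (len % IMG_ALIGN || offset % IMG_ALIGN || (uintptr_t)buf % IMG_ALIGN) {
        errno = EINVAL;
        return -1;
    }
    return pread(fd, buf, len, offset);
}
```

The alignment checks are the main practical cost: qcow metadata and cluster I/O would all have to go through aligned buffers, which the existing block driver code is not guaranteed to do.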

Laurent
-- 
- [EMAIL PROTECTED]  --
  Perfection is achieved not when there is nothing left to
add, but when there is nothing left to take away. Saint Exupéry





Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Daniel P. Berrange
On Fri, Jan 25, 2008 at 02:27:34PM -0600, Anthony Liguori wrote:
 Andre Przywara wrote:
 Laurent Vivier wrote:
 
 What I'm wondering is how loop and device mapper can work?
 I briefly evaluated the loop device idea, but came to the conclusion 
 that this is not so easy to implement (and would require qcow code in the 
 kernel). I see only little chance of this going upstream in Linux, and 
 maintaining it out-of-tree is actually a bad idea.
 
 I recently was poking around at the loop device and discovered that it 
 had pluggable xfer ops to allow for encrypted loop devices.  My initial 
 analysis was that by simply adding a couple of operations to that 
 structure (such as map_sector and get_size), you could very easily write 
 a kernel module that registered a set of xfer ops that implemented QCOW 
 support.

The loop device encryption stuff has long been deprecated in
favour of the device mapper crypt layer (dm-crypt and the cryptsetup
command). The loop device is really not at all nice for write access
because it will cache data in memory arbitrarily, leading to potentially
huge data loss upon crashes. This is why Xen stopped using the loop device
and wrote the blktap daemon; although that has its own set of problems,
at least it has data integrity.

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 




Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Anthony Liguori

Andre Przywara wrote:

Laurent Vivier wrote:


What I'm wondering is how loop and device mapper can work?
I briefly evaluated the loop device idea, but came to the conclusion 
that this is not so easy to implement (and would require qcow code in the 
kernel). I see only little chance of this going upstream in Linux, and 
maintaining it out-of-tree is actually a bad idea.


I recently was poking around at the loop device and discovered that it 
had pluggable xfer ops to allow for encrypted loop devices.  My initial 
analysis was that by simply adding a couple of operations to that 
structure (such as map_sector and get_size), you could very easily write 
a kernel module that registered a set of xfer ops that implemented QCOW 
support.


Of course, this would all be kernel code.  The best solution would be a 
proper userspace block device.  I think it's a pretty reasonable 
stop-gap though (that wouldn't be very difficult to get merged upstream).


If you think about deferring the qcow code into userland, you will 
sooner or later run into the same deadlock problems as the current 
solution (after all this is what nbd does...)


I have implemented a clean device-mapper solution; the big drawback is 
that it is read-only. It's a simple tool which converts the qcow map 
into a format suitable for dmsetup, to which the output can be piped 
directly. I will clean up the code and send it to the list ASAP.


You could only do something read-only with device mapper.  dm-userspace 
was an effort to try and work around that with a userspace daemon but it 
didn't move upstream as quickly as we would have liked.


Regards,

Anthony Liguori

Read/write support is not that easy, but maybe someone can comment on 
this idea:
Create a sparse file on the host which is as large as the number of 
all still unallocated blocks. Assign these blocks via device mapper in 
addition to the already allocated ones. When unmounting the dm device, 
look for blocks which have been changed and allocate and write them 
into the qcow file. One could also use the bmap-ioctl to scan for 
non-sparse blocks.
This is a bit complicated, but should work cleanly (especially for the 
quick fsck or file editing case). If you find it worthwhile, I could try to 
implement it.


Regards,
Andre.







Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Andre Przywara

Laurent Vivier wrote:

On Friday 25 January 2008 at 09:18 -0600, Anthony Liguori wrote:

Laurent Vivier wrote:

Hi,

this patch allows mounting qemu disk images on the host.
  


Sorry, I didn't see that you did similar work 19 months ago.
Note, the general problem with this approach is that mounting an NBD 
device locally with write access can lead to deadlocks.  If you look 
through the mailing list archives, you'll find a number of conversations 
on the topic.
Some time ago I was also working on an NBD implementation for 
qcow images, but I came to the same deadlock conclusion. (At least 
theoretically; I didn't finish this, as I ran first into debugging 
problems and then out of time.) But IMHO this only applies to 
localhost mounts; real network mounting should work (this is actually 
not different from native nbd). Perhaps one could use a qemu instance 
for the server part ;-)
BTW: nbd-server should be quite portable; I once had it running on an 
ancient PA-RISC machine under HP-UX 10.20.



What I'm wondering is how loop and device mapper can work?
I briefly evaluated the loop device idea, but came to the conclusion 
that this is not so easy to implement (and would require qcow code in the 
kernel). I see only little chance of this going upstream in Linux, and 
maintaining it out-of-tree is actually a bad idea.
If you think about moving the qcow code into userland, you will 
sooner or later run into the same deadlock problems as the current 
solution (after all, this is what nbd does...)


I have implemented a clean device-mapper solution; the big drawback is 
that it is read-only. It's a simple tool which converts the qcow map 
into a format suitable for dmsetup, to which the output can be piped 
directly. I will clean up the code and send it to the list ASAP.
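
To make the read-only mapping idea concrete, here is a rough sketch of how a cluster map could be turned into device-mapper table lines. It is not the tool mentioned above, which has not been posted in this thread. The map layout, the UNALLOCATED marker and the function name are hypothetical. The output uses the standard table syntax "start length linear device offset" (all in 512-byte sectors) plus the "zero" target for clusters with no backing data.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SECTOR_SIZE 512
#define UNALLOCATED UINT64_MAX   /* hypothetical marker: cluster has no data in the image */

/*
 * Write one table line per run of clusters and return the number of lines.
 * map[i] is the byte offset of virtual cluster i inside the image file,
 * or UNALLOCATED. backing names a block device exposing the raw image file.
 */
static int emit_dm_table(FILE *out, const uint64_t *map, size_t nclusters,
                         uint32_t cluster_size, const char *backing)
{
    uint64_t spc = cluster_size / SECTOR_SIZE;   /* sectors per cluster */
    int lines = 0;
    size_t i = 0;

    while (i < nclusters) {
        size_t run = 1;

        /* Merge neighbours that are both unallocated or physically contiguous. */
        while (i + run < nclusters) {
            int both_unalloc = map[i] == UNALLOCATED && map[i + run] == UNALLOCATED;
            int contiguous = map[i] != UNALLOCATED && map[i + run] != UNALLOCATED &&
                             map[i + run] == map[i] + (uint64_t)run * cluster_size;
            if (!both_unalloc && !contiguous)
                break;
            run++;
        }
        if (map[i] == UNALLOCATED)
            fprintf(out, "%" PRIu64 " %" PRIu64 " zero\n",
                    (uint64_t)i * spc, (uint64_t)run * spc);
        else
            fprintf(out, "%" PRIu64 " %" PRIu64 " linear %s %" PRIu64 "\n",
                    (uint64_t)i * spc, (uint64_t)run * spc, backing,
                    map[i] / SECTOR_SIZE);
        i += run;
        lines++;
    }
    return lines;
}
```

With 64 KiB clusters and the map {65536, 131072, UNALLOCATED, UNALLOCATED, 327680} on a hypothetical /dev/loop0, this prints "0 256 linear /dev/loop0 128", "256 256 zero" and "512 128 linear /dev/loop0 640". Compressed or encrypted clusters cannot be expressed this way, so a real tool would have to refuse such images.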
Read/write support is not that easy, but maybe someone can comment on 
this idea:
Create a sparse file on the host which is as large as the number of all 
still unallocated blocks. Assign these blocks via device mapper in 
addition to the already allocated ones. When unmounting the dm device, 
look for blocks which have been changed and allocate and write them into 
the qcow file. One could also use the bmap-ioctl to scan for non-sparse 
blocks.
This is a bit complicated, but should work cleanly (especially for the 
quick fsck or file editing case). If you find it worthwhile, I could try to 
implement it.
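
The "scan for non-sparse blocks" step could look roughly like the sketch below. Note that it swaps in a different technique from the one named above: instead of the bmap ioctl (FIBMAP, which needs elevated privileges), it uses lseek() with SEEK_DATA and SEEK_HOLE, which newer Linux kernels provide. The function and callback names are made up for illustration.

```c
#define _GNU_SOURCE             /* asks glibc to expose SEEK_DATA / SEEK_HOLE */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3             /* Linux values, used if the headers hide them */
#define SEEK_HOLE 4
#endif

/* Called once per populated extent [start, end) of the overlay file. */
typedef void (*extent_cb)(off_t start, off_t end, void *opaque);

/* Walk fd and report every data extent. Returns 0 on success, -errno on failure. */
static int scan_data_extents(int fd, extent_cb cb, void *opaque)
{
    off_t pos = 0;

    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? 0 : -errno;   /* ENXIO: no data past pos */

        off_t hole = lseek(fd, data, SEEK_HOLE);  /* end of file counts as a hole */
        if (hole < 0)
            return -errno;

        cb(data, hole, opaque);
        pos = hole;
    }
}

/* Example callback: print the extent and add its length to a running total. */
static void report_extent(off_t start, off_t end, void *opaque)
{
    uint64_t *total = opaque;

    printf("data: [%lld, %lld)\n", (long long)start, (long long)end);
    *total += (uint64_t)(end - start);
}
```

Each reported extent would then be split on cluster boundaries and written back into the qcow file. On filesystems without hole tracking the whole file is reported as one data extent, so the write-back step would still need to compare contents or tolerate redundant writes.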


Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized
to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy






Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Laurent Vivier
On Friday 25 January 2008 at 09:18 -0600, Anthony Liguori wrote:
 Laurent Vivier wrote:
  Hi,
 
  this patch allows mounting qemu disk images on the host.


Sorry, I didn't see that you did similar work 19 months ago.

 Note, the general problem with this approach is that mounting an NBD 
 device locally with write access can lead to deadlocks.  If you look 
 through the mailing list archives, you'll find a number of conversations 
 on the topic.

Yes, I experienced some problems under heavy I/O load (2 * dbench 64
on a 4-CPU SMP machine).
But perhaps it is acceptable for editing config files or running fsck on
a virtual machine's partition.

What I'm wondering is how loop and device mapper can work?

 Regards,
 
 Anthony Liguori

Thank you,
Laurent

  It is based on the Network Block Device protocol and allows qemu-img to
  become an NBD server (Yes, Anthony, userspace block device is the right
  way to do that... :-P ).
 
  Once you've applied the attached patch to Qemu and built the binaries,
  you can use it like this:
 
  # ./qemu-img server -d 1234 etch.qcow2
 
  This starts an NBD server on port 1234. This server will expose
  the disk image etch.qcow2. -d means it will be daemonized and will run
  in the background.
 
  Then you need to connect the block device to the server:
 
  # nbd-client localhost 1234 /dev/nbd0
  Negotiation: ..size = 4194304KB
  bs=1024, sz=4194304
 
  This will link etch.qcow2 to /dev/nbd0.
 
  Then, to see partitions, you can use kpartx, as Daniel explained, or my
  patched loop modules (I can send an updated and bug-free version).
  ...
  # kpartx -a /dev/nbd0
  ...
  or
  ...
  # rmmod loop
  # insmod drivers/block/loop.ko max_part=64
  # losetup -f /dev/nbd0
  ...
  # mount /dev/loop0p1 /mnt
  # ls /mnt
  bench  cdrom    etc     initrd.img  media  proc  selinux  tmp  vmlinuz
  bin    clients  home    lib         mnt    root  srv      usr
  boot   dev      initrd  lost+found  opt    sbin  sys      var
  # cd
  # umount /mnt
  # losetup -d  /dev/loop0
  # nbd-client -d /dev/nbd0
 
  TODO: security/host client checking, device lock...
 
  As usual all comments are welcome,
  have fun,
  Laurent

 
 
 
 
-- 
- [EMAIL PROTECTED]  --
  Perfection is achieved not when there is nothing left to
add, but when there is nothing left to take away. Saint Exupéry





Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Anthony Liguori

Laurent Vivier wrote:

Hi,

this patch allows mounting qemu disk images on the host.
  


Note, the general problem with this approach is that mounting an NBD 
device locally with write access can lead to deadlocks.  If you look 
through the mailing list archives, you'll find a number of conversations 
on the topic.


Regards,

Anthony Liguori


It is based on the Network Block Device protocol and allows qemu-img to
become an NBD server (Yes, Anthony, userspace block device is the right
way to do that... :-P ).

Once you've applied the attached patch to Qemu and built the binaries,
you can use it like this:

# ./qemu-img server -d 1234 etch.qcow2

This starts an NBD server on port 1234. This server will expose
the disk image etch.qcow2. -d means it will be daemonized and will run
in the background.

Then you need to connect the block device to the server:

# nbd-client localhost 1234 /dev/nbd0
Negotiation: ..size = 4194304KB
bs=1024, sz=4194304

This will link etch.qcow2 to /dev/nbd0.

Then, to see partitions, you can use kpartx, as Daniel explained, or my
patched loop modules (I can send an updated and bug-free version).
...
# kpartx -a /dev/nbd0
...
or
...
# rmmod loop
# insmod drivers/block/loop.ko max_part=64
# losetup -f /dev/nbd0
...
# mount /dev/loop0p1 /mnt
# ls /mnt
bench  cdrom    etc     initrd.img  media  proc  selinux  tmp  vmlinuz
bin    clients  home    lib         mnt    root  srv      usr
boot   dev      initrd  lost+found  opt    sbin  sys      var
# cd
# umount /mnt
# losetup -d  /dev/loop0
# nbd-client -d /dev/nbd0

TODO: security/host client checking, device lock...

As usual all comments are welcome,
have fun,
Laurent
  






Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Alexander Graf


On Jan 25, 2008, at 1:58 PM, Laurent Vivier wrote:

On Friday 25 January 2008 at 12:48 +, Johannes Schindelin wrote:

Hi,

On Fri, 25 Jan 2008, Laurent Vivier wrote:


this patch allows mounting qemu disk images on the host.


This patch has an awful lot of #ifdef __linux__ in it.  But I imagine
that you could use it on a non-linux host, too, for example with yet
another qemu instance running Linux...  Or coLinux, if it supports network
block devices somehow.

I certainly saw nothing Linux-specific in the _code_...


Yes, but as I can't test this on anything other than Linux, I prefer to
disable this part so as not to break the existing code, and to let
competent people (like you) do the work (IMHO, I've introduced enough
bugs into Qemu...).


Please make it a separate define and ifdef then. Something like

#ifdef __linux__
#define NBD_SERVER
#endif

#ifdef NBD_SERVER
...

This way it's a lot easier to distinguish between platform-specific and
feature-specific code, and if someone finds out it works on Windows, it's
only one line to change.


Regards,

Alex





Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Laurent Vivier
On Friday 25 January 2008 at 12:48 +, Johannes Schindelin wrote:
 Hi,
 
 On Fri, 25 Jan 2008, Laurent Vivier wrote:
 
  this patch allows mounting qemu disk images on the host.
 
 This patch has an awful lot of #ifdef __linux__ in it.  But I imagine 
 that you could use it on a non-linux host, too, for example with yet 
 another qemu instance running Linux...  Or coLinux, if it supports network 
 block devices somehow.
 
 I certainly saw nothing Linux-specific in the _code_...

Yes, but as I can't test this on anything other than Linux, I prefer to
disable this part so as not to break the existing code, and to let
competent people (like you) do the work (IMHO, I've introduced enough
bugs into Qemu...).

I think it should work on Windows, too (as nbd-server and qemu-img are
able to).

Regards,
Laurent
-- 
- [EMAIL PROTECTED]  --
  Perfection is achieved not when there is nothing left to
add, but when there is nothing left to take away. Saint Exupéry





Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Johannes Schindelin
Hi,

On Fri, 25 Jan 2008, Laurent Vivier wrote:

 this patch allows mounting qemu disk images on the host.

This patch has an awful lot of #ifdef __linux__ in it.  But I imagine 
that you could use it on a non-linux host, too, for example with yet 
another qemu instance running Linux...  Or coLinux, if it supports network 
block devices somehow.

I certainly saw nothing Linux-specific in the _code_...

Ciao,
Dscho





Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Laurent Vivier
On Friday 25 January 2008 at 14:37 +0100, Alexander Graf wrote:
 On Jan 25, 2008, at 1:58 PM, Laurent Vivier wrote:
 
  On Friday 25 January 2008 at 12:48 +, Johannes Schindelin wrote:
  Hi,
 
  On Fri, 25 Jan 2008, Laurent Vivier wrote:
 
  this patch allows mounting qemu disk images on the host.
 
  This patch has an awful lot of #ifdef __linux__ in it.  But I imagine
  that you could use it on a non-linux host, too, for example with yet
  another qemu instance running Linux...  Or coLinux, if it supports network
  block devices somehow.
 
  I certainly saw nothing Linux-specific in the _code_...
 
  Yes, but as I can't test this on anything other than Linux, I prefer to
  disable this part so as not to break the existing code, and to let
  competent people (like you) do the work (IMHO, I've introduced enough
  bugs into Qemu...).
 
 Please make it a separate define and ifdef then. Something like
 
 #ifdef __linux__
 #define NBD_SERVER
 #endif
 
 #ifdef NBD_SERVER
 ...
 
 This way it's a lot easier to distinguish between platform-specific and
 feature-specific code, and if someone finds out it works on Windows, it's
 only one line to change.

I agree, new patch attached.

Laurent
-- 
- [EMAIL PROTECTED]  --
  Perfection is achieved not when there is nothing left to
add, but when there is nothing left to take away. Saint Exupéry
---
 qemu-img.c |  352 +
 1 file changed, 352 insertions(+)

Index: qemu/qemu-img.c
===
--- qemu.orig/qemu-img.c	2008-01-25 13:09:10.0 +0100
+++ qemu/qemu-img.c	2008-01-25 14:52:01.0 +0100
@@ -25,6 +25,16 @@
 #include "block_int.h"
 #include <assert.h>
 
+#ifdef __linux__
+#define NBD_SERVER
+#endif
+
+#ifdef NBD_SERVER
+#include <arpa/inet.h>
+#include <netinet/tcp.h>
+#include <sys/wait.h>
+#endif /* NBD_SERVER */
+
 #ifdef _WIN32
 #define WIN32_LEAN_AND_MEAN
 #include <windows.h>
@@ -92,6 +102,9 @@ static void help(void)
            "  commit [-f fmt] filename\n"
            "  convert [-c] [-e] [-6] [-f fmt] filename [filename2 [...]] [-O output_fmt] output_filename\n"
            "  info [-f fmt] filename\n"
+#ifdef NBD_SERVER
+           "  server [-d] [-f fmt] port filename\n"
+#endif
            "\n"
            "Command parameters:\n"
            "  'filename' is a disk image filename\n"
@@ -105,6 +118,9 @@ static void help(void)
            "  '-c' indicates that target image must be compressed (qcow format only)\n"
            "  '-e' indicates that the target image must be encrypted (qcow format only)\n"
            "  '-6' indicates that the target image must use compatibility level 6 (vmdk format only)\n"
+#ifdef NBD_SERVER
+           "  '-d' daemonize (server only)\n"
+#endif
            );
     printf("\nSupported format:");
     bdrv_iterate_format(format_print, NULL);
@@ -602,6 +618,338 @@ static int img_convert(int argc, char **
     return 0;
 }
 
+#ifdef NBD_SERVER
+
+//#define DEBUG_SERVER
+
+#ifdef DEBUG_SERVER
+#define DPRINTF(fmt, args...) \
+do { printf("img-server: " fmt , ##args); } while (0)
+#else
+#define DPRINTF(fmt, args...) do {} while(0)
+#endif
+
+# if __BYTE_ORDER == __BIG_ENDIAN
+# define htonll(x) (x)
+# define ntohll(x) (x)
+#else
+# define htonll(x) __bswap_64(x)
+# define ntohll(x) __bswap_64(x)
+#endif
+
+#define BUFSIZE (1024*1024)
+
+#define INIT_PASSWD "NBDMAGIC"
+
+#define NBD_REQUEST_MAGIC 0x25609513
+#define NBD_REPLY_MAGIC 0x67446698
+
+enum {
+    NBD_CMD_READ = 0,
+    NBD_CMD_WRITE = 1,
+    NBD_CMD_DISC = 2
+};
+
+struct nbd_request {
+    uint32_t magic;
+    uint32_t type;
+    char handle[8];
+    uint64_t from;
+    uint32_t len;
+} __attribute__ ((packed));
+
+struct nbd_reply {
+    uint32_t magic;
+    uint32_t error;
+    char handle[8];
+} __attribute__ ((packed));
+
+static void sigchld_handler(int s)
+{
+    int status;
+    pid_t pid;
+
+    pid = waitpid(-1, &status, WNOHANG);
+    if (WIFEXITED(status)) {
+        DPRINTF("child %d exited\n", pid);
+    }
+}
+
+static int nbd_receive(int fd, char *buf, size_t len)
+{
+    ssize_t rd;
+
+    while (len > 0) {
+        rd = read(fd, buf, len);
+        if (rd < 0)
+            return -errno;
+        len -= rd;
+        buf += rd;
+    }
+    return 0;
+}
+
+static int nbd_send(int fd, char *buf, size_t len)
+{
+    ssize_t written;
+
+    while (len > 0) {
+        written = write(fd, buf, len);
+        if (written < 0)
+            return -errno;
+        len -= written;
+        buf += written;
+    }
+    return 0;
+}
+
+static int negotiate(int sock, uint64_t size)
+{
+    char zeros[128];
+    uint64_t magic = htonll(0x00420281861253ULL);
+    int ret;
+
+    DPRINTF("negotiate(%d,%ld)\n", sock, size);
+
+    memset(zeros, 0, sizeof(zeros));
+
+    ret = nbd_send(sock, INIT_PASSWD, 8);
+    if (ret < 0)
+        return -1;
+    ret = nbd_send(sock, (char*)&magic, sizeof(magic));
+    if (ret < 0)
+        return -1;
+    size = 
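
The patch is cut off inside negotiate() in this archive, so the rest of the function is not shown. As a rough illustration only, the sketch below builds the whole server greeting of the old-style NBD handshake in one buffer: the 8-byte "NBDMAGIC" string, the 64-bit magic from the patch, the export size, and 128 zero bytes, with the integers in network byte order. The function names are hypothetical, and the layout after the magic is an assumption based on the old-style protocol and the zeros[128] buffer visible above, not on the missing part of the patch.

```c
#include <stdint.h>
#include <string.h>

#define NBD_HANDSHAKE_LEN 152   /* 8 + 8 + 8 + 128 bytes */

/* Store v at p in big-endian (network) order, whatever the host byte order is. */
static void put_be64(unsigned char *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Fill out[] with the greeting for an export of `size` bytes (assumed layout). */
static void nbd_build_handshake(unsigned char out[NBD_HANDSHAKE_LEN], uint64_t size)
{
    memcpy(out, "NBDMAGIC", 8);               /* INIT_PASSWD in the patch */
    put_be64(out + 8, 0x00420281861253ULL);   /* magic constant from the patch */
    put_be64(out + 16, size);                 /* export size in bytes */
    memset(out + 24, 0, 128);                 /* matches zeros[128] in the patch */
}
```

Writing the bytes explicitly avoids the htonll()/__bswap_64 macros and their dependence on __BYTE_ORDER, which is one of the Linux-specific spots in the patch.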

Re: [Qemu-devel] [PATCH][RFC] To mount qemu disk image on the host

2008-01-25 Thread Anthony Liguori

Laurent Vivier wrote:

Hi,

this patch allows mounting qemu disk images on the host.

It is based on the Network Block Device protocol and allows qemu-img to
become an NBD server (Yes, Anthony, userspace block device is the right
way to do that... :-P ).
  


FYI, I've been maintaining qemu-nbd out of tree for a while now.  
http://hg.codemonkey.ws/qemu-nbd


It also includes some nice features like read-only mount and exposing an 
individual partition.


Regards,

Anthony Liguori


Once you've applied the attached patch to Qemu and built the binaries,
you can use it like this:

# ./qemu-img server -d 1234 etch.qcow2

This starts an NBD server on port 1234. This server will expose
the disk image etch.qcow2. -d means it will be daemonized and will run
in the background.

Then you need to connect the block device to the server:

# nbd-client localhost 1234 /dev/nbd0
Negotiation: ..size = 4194304KB
bs=1024, sz=4194304

This will link etch.qcow2 to /dev/nbd0.

Then, to see partitions, you can use kpartx, as Daniel explained, or my
patched loop modules (I can send an updated and bug-free version).
...
# kpartx -a /dev/nbd0
...
or
...
# rmmod loop
# insmod drivers/block/loop.ko max_part=64
# losetup -f /dev/nbd0
...
# mount /dev/loop0p1 /mnt
# ls /mnt
bench  cdrom    etc     initrd.img  media  proc  selinux  tmp  vmlinuz
bin    clients  home    lib         mnt    root  srv      usr
boot   dev      initrd  lost+found  opt    sbin  sys      var
# cd
# umount /mnt
# losetup -d  /dev/loop0
# nbd-client -d /dev/nbd0

TODO: security/host client checking, device lock...

As usual all comments are welcome,
have fun,
Laurent