Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell wrote: I don't think it'll be that bad; reset clears the device to unknown, bar0 moves it from unknown to legacy mode, bar1/2/3 changes it from unknown to modern mode, and anything else is bad (I prefer being strict so we catch bad implementations from the beginning).

Will that work, if a guest whose kernel uses modern mode kexecs to an older (but presumed reliable) kernel that only knows about legacy mode? I.e. will the replacement kernel, or (ideally) a replacement driver on the rare occasion that is needed on a running kernel, be able to reset the device hard enough?

-- Jamie

___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH
Rusty Russell wrote: On Wed, 5 May 2010 05:47:05 am Jamie Lokier wrote: Jens Axboe wrote: On Tue, May 04 2010, Rusty Russell wrote: ISTR someone mentioning a desire for such an API years ago, so CC'ing the usual I/O suspects... It would be nice to have a fuller API for this, but the reality is that only the flush approach is really workable. Even just strict ordering of requests could only be supported on SCSI, and even there the kernel still lacks proper guarantees on error handling to prevent reordering.

There are a few I/O scheduling differences that might be useful:

1. The I/O scheduler could freely move WRITEs before a FLUSH but not before a BARRIER. That might be useful for time-critical WRITEs, and those issued at high I/O priority.

This is only because no one actually wants flushes or barriers, though I/O people seem to only offer that. We really want "these writes must occur before this write". That offers maximum choice to the I/O subsystem and potentially to smart (virtual?) disks.

We do want flushes for the D in ACID - such things as, after receiving a mail, or a blog update into a database file (could be TDB), and confirming that to the sender, having high confidence that the update won't disappear on system crash or power failure. Less obviously, it's also needed for the C in ACID when more than one file is involved. C is about differently updated things staying consistent with each other.

For example, imagine you have a TDB file mapping Samba usernames to passwords, and another mapping Samba usernames to local usernames. (I don't know if you do this; it's just an illustration). To rename a Samba user involves updating both. Let's ignore transient transactional issues :-) and just think about what happens with per-file barriers and no sync, when a crash happens long after the updates, and before the system has written out all data and issued low-level cache flushes.
After restarting, due to lack of sync, the Samba username could be present in one file and not the other.

2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is only for data belonging to a particular file (e.g. fdatasync with no file size change, even on btrfs if O_DIRECT was used for the writes being committed). That would entail tagging FLUSHes and WRITEs with a fs-specific identifier (such as inode number), opaque to the scheduler, which only checks equality.

This is closer. In userspace I'd be happy with "all prior writes to this struct file before all future writes". Even if the original guarantees were stronger (i.e. inode basis). We currently implement transactions using 4 fsync/msync pairs:

write_recovery_data(fd); fsync(fd); msync(mmap);
write_recovery_header(fd); fsync(fd); msync(mmap);
overwrite_with_new_data(fd); fsync(fd); msync(mmap);
remove_recovery_header(fd); fsync(fd); msync(mmap);

Yet we really only need ordering, not guarantees about it actually hitting disk before returning. In other words, FLUSH can be more relaxed than BARRIER inside the kernel. It's ironic that we think of fsync as stronger than fbarrier outside the kernel :-)

It's an implementation detail; barrier has less flexibility because it has less information about what is required. I'm saying I want to give you as much information as I can, even if you don't use it yet. I agree, and I've started a few threads about it over the last couple of years.

An fsync_range() system call would be very easy to use and (most importantly) easy to understand. With optional flags to weaken it (into fdatasync, barrier without sync, sync without barrier, one-sided barrier, no low-level cache-flush, don't rush, etc.), it would be very versatile, and still easy to understand. With an AIO version, and another flag meaning "don't rush, just return when satisfied", I suspect it would be useful for the most demanding I/O apps.
-- Jamie
Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH
Michael S. Tsirkin wrote: I took a stab at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me; this might be a first step to cleaning them up. One issue I struggled with especially is how the type field mixes bits and non-bit values. I ended up simply defining all legal values, so that we have CMD = 2, CMD_OUT = 3 and so on. I also avoided introducing the inhdr/outhdr structures that the virtio blk driver in Linux uses; I was concerned that nesting tables will confuse the reader. Comments welcome. Signed-off-by: Michael S. Tsirkin m...@redhat.com

--
diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index d16104a..ed35893 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -67,7 +67,11 @@ IBM Corporation
\end_layout \begin_layout Standard + +\change_deleted 0 1266531118 FIXME: virtio block scsi passthrough section +\change_unchanged + \end_layout \begin_layout Standard
@@ -4376,7 +4380,7 @@ struct virtio_net_ctrl_mac {
The device can filter incoming packets by any number of destination MAC addresses. \begin_inset Foot -status open +status collapsed \begin_layout Plain Layout Since there are no guarentees, it can use a hash filter orsilently switch
@@ -4549,6 +4553,22 @@ blk_size
\end_inset . +\change_inserted 0 1266444580 + +\end_layout + +\begin_layout Description + +\change_inserted 0 1266471229 +VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands. +\end_layout + +\begin_layout Description + +\change_inserted 0 1266444605 +VIRTIO_BLK_F_FLUSH (9) Cache flush command support. 
+\change_unchanged + \end_layout \begin_layout Description @@ -4700,17 +4720,25 @@ struct virtio_blk_req { \begin_layout Plain Layout +\change_deleted 0 1266472188 + #define VIRTIO_BLK_T_IN 0 \end_layout \begin_layout Plain Layout +\change_deleted 0 1266472188 + #define VIRTIO_BLK_T_OUT 1 \end_layout \begin_layout Plain Layout +\change_deleted 0 1266472188 + #define VIRTIO_BLK_T_BARRIER 0x8000 +\change_unchanged + \end_layout \begin_layout Plain Layout @@ -4735,11 +4763,15 @@ struct virtio_blk_req { \begin_layout Plain Layout +\change_deleted 0 1266472204 + #define VIRTIO_BLK_S_OK0 \end_layout \begin_layout Plain Layout +\change_deleted 0 1266472204 + #define VIRTIO_BLK_S_IOERR 1 \end_layout @@ -4759,32 +4791,481 @@ struct virtio_blk_req { \end_layout \begin_layout Standard -The type of the request is either a read (VIRTIO_BLK_T_IN) or a write (VIRTIO_BL -K_T_OUT); the high bit indicates that this request acts as a barrier and - that all preceeding requests must be complete before this one, and all - following requests must not be started until this is complete. 
+ +\change_inserted 0 1266472490 +If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet + command requests, each of these requests is of form: +\begin_inset listings +inline false +status open + +\begin_layout Plain Layout + +\change_inserted 0 1266472395 + +struct virtio_scsi_pc_req { +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266472375 + + u32 type; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266472375 + + u32 ioprio; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266474298 + + u64 sector; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266474308 + +char cmd[]; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266505809 + + char data[][512]; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266505825 + +#define SCSI_SENSE_BUFFERSIZE 96 +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266505848 + +u8 sense[SCSI_SENSE_BUFFERSIZE]; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266472969 + +u32 errors; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266472979 + +u32 data_len; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266472984 + +u32 sense_len; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266472987 + +u32 residual; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266472375 + + u8 status; +\end_layout + +\begin_layout Plain Layout + +\change_inserted 0 1266472375 + +}; +\end_layout + +\end_inset + + +\change_unchanged + \end_layout \begin_layout Standard -The ioprio field is a hint about the relative priorities of requests to - the device: higher numbers indicate more important requests. +The +\emph on +type +\emph default + of the request is either a read (VIRTIO_BLK_T_IN) +\change_inserted 0 1266495815 +, +\change_unchanged +
Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
Avi Kivity wrote: On 01/05/2010 02:56 PM, Rusty Russell wrote: Those should be the same for any sane interface. They are for classical disk devices with larger block sizes (MO, s390 dasd) and also for the now appearing 4k sector scsi disks. But in the ide world people are concerned about dos/windows legacy compatibility, so they came up with a nasty hack:

- there is a physical block size as used by the disk internally (4k initially)
- all the interfaces to the operating system still happen in the traditional 512 byte blocks to not break any existing assumptions
- to make sure modern operating systems can optimize for the larger physical sectors, the disks expose this size, too.
- even worse, disks can also have alignment hacks for the traditional DOS partition tables, so that the 512 byte block zero might even have an offset into the first larger physical block. This is also exposed in the ATA identify information.

All in all I don't think this mess is a good idea to replicate in virtio. Virtio by definition requires virtualization aware guests, so we should just follow the SCSI way of larger real block sizes here. Yes. The current VIRTIO_BLK_F_BLK_SIZE says "please use this block size". We haven't actually specified what happens if the guest doesn't, but the spec says "must", and the Linux implementation does so AFAICT. If we want a soft size, we could add that as a separate feature.

No - I agree with Christoph, there's no reason to use a 512/4096 monstrosity with virtio. It would be good if virtio relayed the backing device's basic topology hints, so:

- If the backing dev is a real disk with 512-byte sectors, virtio should indicate 512-byte blocks to the guest.
- If the backing dev is a real disk with 4096-byte sectors, virtio should indicate 4096-byte blocks to the guest.
With databases and filesystems, if you care about data integrity:

- If the backing dev is a real disk with 4096-byte sectors, or a file whose access is through a 4096-byte-per-page cache, virtio must indicate 4096-byte blocks, otherwise guest journalling is not host-powerfail safe.

You get the idea. If there is only one parameter, it really should be at least as large as the smallest unit which may be corrupted by writes when errors occur.

-- Jamie
Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
Avi Kivity wrote: Physical block size is what the logical block size would have been if software didn't suck. In theory they should be the same, but since compatibility reasons clamp the logical block size to 512, they have to differ. A disk may have a physical block size of 4096 and emulate a logical block size of 512 on top of that using read-modify-write. Or so I understand it.

I think that's right, but a side effect is that if you get a power failure during the read-modify-write, bytes anywhere in the 4096-byte sector may be incorrect, so journalling (etc.) needs to use 4096 byte blocks for data integrity, even though the drive emulates smaller writes.

-- Jamie
Re: [Qemu-devel] Re: virtio: Add memory statistics reporting to the balloon driver (V2)
Anthony Liguori wrote: Rusty Russell wrote: The little-endian conversion of the balloon driver is a historical mistake (no other driver does this). Let's not extend it to the stats. I think the mistake is that the other drivers don't do that. We cheat in qemu and assume that the guest is always in a fixed endianness but this is not always the case for all architectures.

If guests can have different endianness (reasonable on some CPUs where it's switchable - some even have more than 2 options), then I guess the *host* on those systems has different endianness too. Is the host's endianness signalled to the guest anywhere, so that guest drivers can do cpu_to_qemuhost32(), when someone eventually finds that necessary?

-- Jamie
Re: [Qemu-devel] Re: virtio: Add memory statistics reporting to the balloon driver
Anthony Liguori wrote: Avi Kivity wrote: On 11/10/2009 04:36 PM, Anthony Liguori wrote: A stats vq might solve this more cleanly? actual and target are both really just stats. Had we implemented those with a vq, I'd be inclined to agree with you but since they're implemented in the config space, it seems natural to extend the config space with other stats. There is in fact a difference; actual and target are very rarely updated, while the stats are updated very often. Using a vq means a constant number of exits per batch instead of one exit per statistic. If the vq is host-driven, it also allows the host to control the update frequency dynamically (i.e. stop polling when there is no memory pressure). I'm not terribly opposed to using a vq for this. I would expect the stat update interval to be rather long (10s probably) but a vq works just as well.

If there's no memory pressure and no guest activity, you probably want the stat update to be as rare as possible to avoid wakeups. Save power on laptops, that sort of thing. If there's a host user interested in the state (qemutop?), you may want updates more often than 10s.

-- Jamie
Re: [Qemu-devel] Re: Extending virtio_console to support multiple ports
Alan Cox wrote: - Then, are we certain that there's no case where the tty layer will call us with some lock held or in an atomic context? To be honest, I've totally lost track of the locking rules in tty land lately so it might well be ok, but something to verify. Some of the less well behaved line disciplines do this and always have done.

I had a backtrace in my kernel log recently which looked like that, while doing PPP over Bluetooth RFCOMM. Resulted in AppArmor complaining that its hook was being called in irq context.

-- Jamie
Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication
Amit Shah wrote: I think strings are better than numbers for identifying protocols, as you can work without a central registry for the numbers then. I like the way assigned numbers work: it's simpler to code, needs a bitmap for all the ports that fits in nicely in the config space, and udev rules / scripts can point /dev/vmch02 to /dev/console.

How would a third party go about assigning themselves a number? For the sake of example, imagine they develop a simple service like guesttop which lets the host get a listing of guest processes. They'll have to distribute app-specific udev rule patches for every guest distro, which sounds like a lot of work. The app itself is probably a very simple C program; the hardest part of making it portable across distros would be the udev rules, which is silly.

Anyway, every other device has a name or uuid these days. You can still use /dev/sda1 to refer to your boot partition, but LABEL=boot is also available if you prefer. Isn't that the ethos these days? Why not both? /dev/vmch05 if you prefer, plus symlink /dev/vmch-guesttop -> /dev/vmch05 if name=guesttop was given to QEMU.

If you do stay with numbers only, note that it's not like TCP/UDP port numbers because the number space is far smaller. Picking a random number that you hope nobody else uses is harder.

... Back to technical bits. If config space is tight, use a channel! Dedicate channel 0 to control, used to fetch the name (if there is one) for each number.

-- Jamie
Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication
Amit Shah wrote: On (Thu) Aug 06 2009 [08:58:01], Anthony Liguori wrote: Amit Shah wrote: On (Thu) Aug 06 2009 [08:29:40], Anthony Liguori wrote: Amit Shah wrote: Sure; but there's been no resistance from anyone from including the virtio-serial device driver, so maybe we don't need to discuss that. There certainly is from me. The userspace interface is not reasonable for guest applications to use. One example that would readily come to mind is dbus. A daemon running on the guest that reads data off the port and interacts with the desktop by appropriate dbus commands. All that's needed is a stream of bytes and virtio-serial provides just that. dbus runs as an unprivileged user, how does dbus know which virtio-serial port to open and who sets the permissions on that port? The permission part can be handled by package maintainers and sysadmins via udev policies. So all data destined for dbus consumption gets to a daemon and that daemon then sends it over to dbus. virtio-serial is nice, easy, simple and versatile. We like that; it should stay that way.

dbus isn't a good match for this. dbus is not intended for communication between hosts, by design. It depends on per-app configuration files in /etc/dbus/{session,system}.d/, which are expected to match the installed services. For this, the guest's files in /etc/dbus/ would have to match the QEMU host's services in detail. dbus doesn't have a good mechanism for coping with version skew between the two of them, because normally everything resides on the same machine and the config and service are updated at the same time. This is hard to guarantee with a VM.

Apart from dbus, hard-coded meanings of small N in /dev/vmchN are asking for trouble. It is bound to break when widely deployed and guest/host configs don't match. It also doesn't fit comfortably when you have, say, bob and alice both logged in with desktops on separate VTs.
Clashes are inevitable, as third-party apps pick N values for themselves then get distributed - unless N values can be large (/dev/vmch44324 == kernelstats...). Sysadmins shouldn't have to hand-configure each app, and shouldn't have to repair clashes in defaults. Just Work is better.

virtio-serial is nice. The only ugly part is _finding_ the right /dev/vmchN. Fortunately, _any_ out-of-band id string or id number makes it perfect. An option to specify PCI vendor/product ids in the QEMU host configuration is good enough. An option to specify one or more id strings is nicer.

Finally, Anthony hit on an interesting idea with USB. Emulating USB sucks. But USB's _descriptors_ are quite effective, and the USB basic protocol is quite reasonable too. Descriptors are just a binary blob in a particular format, which describe a device and also say what it supports, and what standard interfaces can be used with it too. Bluetooth is similar; they might even use the same byte format, I'm not sure. All the code for parsing USB descriptors is already present in guest kernels, and the code for making appropriate device nodes and launching apps is already in udev. libusb also allows devices to be used without a kernel driver, and is cross-platform. There are plenty of examples of creating USB descriptors in QEMU, and maybe the code can be reused. The only down side of USB is that emulating it sucks :-) That's mainly due to the host controllers, and the way interrupts use polling.

So here's a couple of ideas:

- virtio-usb, using virtio instead of a hardware USB host controller. That would provide all the features of USB naturally, like hotplug, device binding, access from userspace, but with high performance, low overhead, and no interrupt polling. You'd even have the option of cross-platform guest apps, as well as working on all Linux versions, by emulating a host controller when the guest doesn't have virtio-usb. As a bonus, existing USB support would be accelerated.
- virtio-serial providing a binary id blob, whose format is the same as USB descriptors. Reuse the guest's USB parsing and binding to find and identify, but the actual device functionality would just be a byte pipe. That might be simple, as all it involves is a blob passed to the guest from QEMU. QEMU would build the id blob, maybe reusing existing USB code, and the guest would parse the blob as it already does for USB devices, with udev creating devices as it already does.

-- Jamie
Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication
Anthony Liguori wrote: Richard W.M. Jones wrote: Have you considered using a usb serial device? Something attractive about it is that a productid/vendorid can be specified, which means that you can use that as a method of enumerating devices. Hot add/remove is supported automagically.

The same applies to PCI: productid/vendorid (and subids); PCI hotplug is possible though not as native as USB.

Here's another idea: Many devices these days have a serial number or id string. E.g. USB storage, ATA drives, media cards, etc. Linux these days creates alias device nodes which include the id string in the device name. E.g. /dev/disk/by-id/ata-FUJITSU_MHV2100BH_NWAQT662615H

So in addition to (or instead of) /dev/vmch0, /dev/vmch1 etc., Linux guests could easily generate:

/dev/vmchannel/by-role/clipboard-0
/dev/vmchannel/by-role/gueststats-0
/dev/vmchannel/by-role/vmmanager-0

It's not necessary to do this at the beginning. All that is needed is to provide enough id information that will appear in /sys/..., so that a udev policy for naming devices can be created at some later date.

-- Jamie
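The role-based names above would follow from a one-line udev policy once the driver exports an id attribute in /sys. A hypothetical sketch - the `vmchannel` subsystem name and the `role` attribute are assumptions for illustration, not anything the driver exposes today:

```
# Hypothetical udev rule: if each port exported a "role" attribute in
# /sys, symlinks like /dev/vmchannel/by-role/clipboard-0 fall out of a
# generic policy, with no per-app configuration on the guest.
SUBSYSTEM=="vmchannel", ATTRS{role}=="?*", \
    SYMLINK+="vmchannel/by-role/$attr{role}-%n"
```

This is the same pattern udev already uses for /dev/disk/by-id and friends: the kernel provides the id, the naming policy lives entirely in one distro-shipped rules file.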
Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication
Gleb Natapov wrote: On Wed, Jul 29, 2009 at 01:14:18PM +0530, Amit Shah wrote: But why do we want to limit the device to only one port? It's not too complex supporting additional ones. As I see it qemu and the kernel should provide the basic abstraction for the userspace to go do its job. Why create unnecessary barriers? I agree. If userspace wants, it may use only one channel and demultiplex messages by itself, but we shouldn't force it to. Also one of the requirements for virtio-serial is to have connect/disconnect notifications. That is not possible with demultiplexing in userspace.

I agree too, for all those reasons. However it would be useful if the devices provided a simpler way to be found by guest applications than /dev/vmch0, vmch1, vmch2... On Linux, udev provides a sane way to find devices according to roles, subtypes, serial numbers, whatever you want, if the appropriate id codes are available from the devices and put into /sys/* by the kernel driver. That would make the devices much more useful to independent applications, imho.

-- Jamie
Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication
Daniel P. Berrange wrote: I expect the first problem you'll run into is that the copy/paste daemon has to run as an unprivileged user but /dev/vmch3 is going to be owned by root. You could set udev rules for /dev/vmch3 but that's pretty terrible IMHO. I don't think that's too bad; for example, with fast-user-switching between multiple X servers and/or text consoles, there's already support code that deals with chown'ing things like /dev/snd/* devices to match the active console session. Doing the same with the /dev/vmch3 device, so that it is only ever accessible to the current logged in user, actually fits into that scheme quite well.

With multiple X servers, there can be more than one currently logged in user. Same with multiple text consoles - that's more familiar. Which one owns /dev/vmch3?

-- Jamie
Re: [Qemu-devel] virtio-serial: A guest - host interface for simple communication
Amit Shah wrote: A few sample uses for a vmchannel are to share the host and guest clipboards (to allow copy/paste between a host and a guest), to lock the screen of the guest session when the vnc viewer is closed, to find out which applications are installed on a guest OS even when the guest is powered down (using virt-inspector) and so on.

Those all look like useful features. Can you run an application to provide those features on a guest which _doesn't_ have vmchannel/virtio-serial support in the kernel? Or will it be restricted only to guests which have QEMU-specific support in their kernel?

-- Jamie
Re: [Qemu-devel] virtio-serial: A guest - host interface for simple communication
Amit Shah wrote: On (Wed) Jun 24 2009 [17:40:49], Jamie Lokier wrote: Amit Shah wrote: A few sample uses for a vmchannel are to share the host and guest clipboards (to allow copy/paste between a host and a guest), to lock the screen of the guest session when the vnc viewer is closed, to find out which applications are installed on a guest OS even when the guest is powered down (using virt-inspector) and so on. Those all look like useful features. Can you run an application to provide those features on a guest which _doesn't_ have vmchannel/virtio-serial support in the kernel? Or will it be restricted only to guests which have QEMU-specific support in their kernel? libguestfs currently uses the -net user based vmchannel interface that exists in current qemu. That works even on a kernel that doesn't have support for virtio-serial.

That's great! If that works fine, and guest apps/libraries are using that as a fallback anyway, what benefit do they get from switching to virtio-serial when they detect that instead, given they still have code for the -net method? Is the plan to remove -net user based support from libguestfs? Is virtio-serial significantly simpler to use?

-- Jamie
Re: [Qemu-devel] virtio-serial: A guest - host interface for simple communication
Amit Shah wrote: On (Wed) Jun 24 2009 [18:50:02], Jamie Lokier wrote: Amit Shah wrote: On (Wed) Jun 24 2009 [17:40:49], Jamie Lokier wrote: Amit Shah wrote: A few sample uses for a vmchannel are to share the host and guest clipboards (to allow copy/paste between a host and a guest), to lock the screen of the guest session when the vnc viewer is closed, to find out which applications are installed on a guest OS even when the guest is powered down (using virt-inspector) and so on. Those all look like useful features. Can you run an application to provide those features on a guest which _doesn't_ have vmchannel/virtio-serial support in the kernel? Or will it be restricted only to guests which have QEMU-specific support in their kernel? libguestfs currently uses the -net user based vmchannel interface that exists in current qemu. That works even on a kernel that doesn't have support for virtio-serial. That's great! If that works fine, and guest apps/libraries are using that as a fallback anyway, what benefit do they get from switching to virtio-serial when they detect that instead, given they still have code for the -net method? Speed is the biggest benefit.

Fair enough, sounds good, and I can see how it's more usable than a network interface in many respects.

Is the plan to remove -net user based support from libguestfs? I don't know what Richard's plan is, but if the kernel that libguestfs deploys in the appliance gains support for virtio-serial, there's no reason it shouldn't switch.

Is virtio-serial significantly simpler to use? I think the interface from the guest POV stays the same: reads / writes to char devices. With virtio-serial, though, we can add a few other interesting things like names for ports, the ability to hot-add ports on demand, request notifications when either end goes down, etc.

Good features, useful for a lot of handy things.
I think it would be handy if the same features were available to the guest application generally, not just on guest kernels with a specific driver, though. As we talked about before, for things like boot loaders and kernel debuggers, and installing the support applications on old guests. Is it possible to support access to the same capabilities through a well-known IO/MMIO address, in the same way that VGA and IDE are both ordinary PCI devices, but also can be reached easily through well-known IO/MMIO addresses in simple code?

-- Jamie
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: On 06/16/2009 09:32 PM, Jamie Lokier wrote: Avi Kivity wrote: Another issue is enumeration. Guests will present their devices in the order they find them on the pci bus (of course enumeration is guest specific). So if I have 2 virtio controllers the only way I can distinguish between them is using their pci slots. virtio controllers really should have a user-suppliable string or UUID to identify them to the guest. Don't they? virtio controllers don't exist. When they do, they may have a UUID or not, but in either case guest infrastructure is in place for reporting the PCI slot, not the UUID. virtio disks do have a UUID. I don't think older versions of Windows will use it though, so if you reorder your slots you'll see your drive letters change. Same with Linux if you don't use udev by-uuid rules.

I guess I meant virtio disks, so that's ok.

-- Jamie
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: If management apps need to hard-code which slots are available on different targets and different qemu versions, or restrictions on which devices can use which slots, or knowledge that some devices can be multi-function, or ... anything like that is just lame. You can't abstract these things away. If you can't put a NIC in slot 4, and you have 7 slots, then you cannot have 7 NICs. Having qemu allocate the slot numbers does not absolve management from knowing this limitation and preventing the user from creating a machine with 7 NICs. Likewise, management will have to know which devices are multi-function, since that affects their hotpluggability. Ditto if some slot is faster than others; if you want to make use of this information you have to let the upper layers know. It could be done using an elaborate machine description that qemu exposes to management coupled with a constraint solver that optimizes the machine layout according to user specifications and hardware limitations. Or we could take the view that real life is not perfect (especially where computers are involved), add some machine specific knowledge, and spend the rest of the summer at the beach. To be honest, an elaborate machine description is probably fine... A fancy constraint solver is not required. A simple one strikes me as about as simple as what you'd hard-code anyway, but with fewer special cases. Note that the result can fail due to things like insufficient address space for all the device BARs even when they _are_ in the right slots. Especially if there are lots of slots, or bridges which can provide unlimited slots. That is arcane: device-dependent, CPU-dependent, machine-dependent, RAM-size dependent (in a non-linear way), device-option-dependent and probably QEMU-version-dependent too. 
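For what it's worth, the "simple constraint solver" really needn't be fancy. A toy sketch, with hypothetical device names and slot constraints (none of this is real qemu machine data): greedy assignment, placing the most constrained device first.

```python
# Sketch: assign devices to slots subject to per-slot restrictions,
# instead of hard-coding the layout in the management app.

def allocate_slots(devices, slots):
    """Greedy assignment: give each device the first free slot it may use.
    devices: {name: set of allowed slot numbers}; slots: list of slots.
    Returns {name: slot}, or raises if this greedy pass finds no layout."""
    free = list(slots)
    result = {}
    # Place the most constrained devices first to reduce dead ends.
    for name, allowed in sorted(devices.items(), key=lambda d: len(d[1])):
        for slot in free:
            if slot in allowed:
                result[name] = slot
                free.remove(slot)
                break
        else:
            raise ValueError("no free slot for %s" % name)
    return result

# Example: a NIC that cannot go in slot 4 (as in the discussion above).
layout = allocate_slots(
    {"nic0": {3, 5, 6}, "disk0": {3, 4, 5, 6}},
    slots=[3, 4, 5, 6],
)
assert layout["nic0"] != 4
```

Greedy most-constrained-first isn't a complete solver, of course, but it's about as much code as the hard-coded tables it would replace, which is the point.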
It would be nice if libvirt (et al) would prevent the user from creating a VM with insufficient BAR space for that machine, but I'm not sure how to do it sanely, without arcane knowledge getting about. Maybe that idea of a .so shared by qemu and libvirt, to manipulate device configurations, is a sane one after all. -- Jamie
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Avi Kivity wrote: Another issue is enumeration. Guests will present their devices in the order they find them on the pci bus (of course enumeration is guest specific). So if I have 2 virtio controllers the only way I can distinguish between them is using their pci slots. virtio controllers really should have a user-suppliable string or UUID to identify them to the guest. Don't they? -- Jamie
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Mark McLoughlin wrote: After libvirt has done -drive file=foo... it should dump the machine config and use that from then on. Right - libvirt then wouldn't be able to avoid the complexity of merging any future changes into the dumped machine config. As long as qemu can accept a machine config _and_ -drive file=foo (and monitor commands to add/remove devices), libvirt could merge by simply calling qemu with whatever additional command line options or monitor commands modify the config, then dump the new config. That way, libvirt would not have to deal with that complexity. It would be written in one place: qemu. Or better, a utility: qemu-machine-config. -- Jamie
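To illustrate the flow I mean (all structures hypothetical: plain dicts standing in for a real machine config format, nothing resembling actual qemu internals): qemu dumps a config, the caller supplies deltas from command-line options or monitor commands, and qemu merges and re-dumps.

```python
# Sketch: the merge lives in one place (qemu or a qemu-machine-config
# utility); the management app only supplies deltas and re-dumps.

def apply_delta(machine_config, delta):
    """Return a new config with command-line/monitor-style changes applied.
    delta maps device name -> new properties, or -> None for removal."""
    merged = dict(machine_config)
    for key, value in delta.items():
        if value is None:
            merged.pop(key, None)   # device removed via monitor command
        else:
            merged[key] = value     # device added or modified
    return merged

dumped = {"drive0": {"file": "foo.img"}, "nic0": {"model": "virtio"}}
delta = {"drive1": {"file": "bar.img"}, "nic0": None}
new_config = apply_delta(dumped, delta)
assert "drive1" in new_config and "nic0" not in new_config
```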
Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]
Mark McLoughlin wrote: Worst case we hardcode those numbers (gasp, faint). Maybe we can just add the open slots to the -help output. That'd be nice and clean. Make them part of the machine configuration. After all, they are part of the machine configuration, and ACPI, BIOS etc. need to know about all the machine slots anyway. Having said that, I prefer the idea that slot allocation is handled either in Qemu, or in a separate utility called qemu-machine-config (for working with machine configs), or in a library libqemu-machine-config.so. I particularly don't like the idea of arcane machine-dependent slot allocation knowledge living in libvirt, because it needs to be in Qemu anyway for non-libvirt users. No point in having two implementations of something tricky and likely to have machine quirks, if one will do. -- Jamie
Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities
Paul Brook wrote: caps can be anywhere, but we don't expect it to change during machine execution lifetime. Or I am just confused by the name pci_device_load ? Right. So I want to load an image and it has capability X at offset Y. wmask has to match. I don't want to assume that we never change Y for the device without breaking old images, so I clear wmask here and set it up again after looking up capabilities that I loaded. We should not be loading state into a different device (or a similar device with a different set of capabilities). If you want to provide backwards compatibility then you should do that by creating a device that is the same as the original. As I mentioned in my earlier mail, loading a snapshot should never do anything that can not be achieved through normal operation. If you can create a machine by restoring a snapshot which you can't create by normally starting QEMU, then you'll soon have guests which work fine from their snapshots, but which cannot be booted without a snapshot because there's no way to boot the right machine for the guest. Someone might even have guests like that for years without noticing, because they always save and restore guest state using snapshots; then one day they simply want to boot the guest from its disk image and find there's no way to do it with any QEMU which runs on their host platform. I think the right long term answer to all this is a way to get QEMU to dump its current machine configuration in glorious detail as a file which can be reloaded as a machine configuration. -- Jamie
Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities
Michael S. Tsirkin wrote: I think the right long term answer to all this is a way to get QEMU to dump its current machine configuration in glorious detail as a file which can be reloaded as a machine configuration. And then we'll have the same set of problems there. We will, and the solution will be the same: options to create devices as they were in older versions of QEMU. It only needs to cover device features which matter to guests, not every bug fix. However with a machine configuration which is generated by QEMU, there's less worry about proliferation of obscure options, compared with the command line. You don't necessarily have to document every backward-compatibility option in any detail, you just have to make sure it's written and read properly, which is much the same thing as the snapshot code does. -- Jamie