Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-09 Thread Jamie Lokier
Rusty Russell wrote:
 I don't think it'll be that bad; reset clears the device to unknown,
 bar0 moves it from unknown to legacy mode, bar1/2/3 changes it from
 unknown to modern mode, and anything else is bad (I prefer being strict so
 we catch bad implementations from the beginning).

Will that work if a guest whose kernel uses modern mode kexecs
to an older (but presumed reliable) kernel that only knows about legacy mode?

I.e. will the replacement kernel, or (ideally) a replacement driver on
the rare occasion one is needed on a running kernel, be able to reset
the device hard enough?

-- Jamie


Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-06 Thread Jamie Lokier
Rusty Russell wrote:
 On Wed, 5 May 2010 05:47:05 am Jamie Lokier wrote:
  Jens Axboe wrote:
   On Tue, May 04 2010, Rusty Russell wrote:
ISTR someone mentioning a desire for such an API years ago, so CC'ing
the usual I/O suspects...
   
   It would be nice to have a fuller API for this, but the reality is
   that only the flush approach is really workable.  Even just strict
   ordering of requests could only be supported on SCSI, and even there the
   kernel still lacks proper guarantees on error handling to prevent
   reordering.
  
  There's a few I/O scheduling differences that might be useful:
  
  1. The I/O scheduler could freely move WRITEs before a FLUSH but not
 before a BARRIER.  That might be useful for time-critical WRITEs,
 and those issued at high I/O priority.
 
 This is only because no one actually wants flushes or barriers, though
 I/O people seem to offer only that.  We really want "these writes must
 occur before this write".  That offers maximum choice to the I/O subsystem
 and potentially to smart (virtual?) disks.

We do want flushes for the "D" in ACID - for example, after receiving
a mail or a blog update into a database file (could be TDB), and
confirming that to the sender, we want high confidence that the
update won't disappear on a system crash or power failure.

Less obviously, it's also needed for the "C" in ACID when more than
one file is involved.  "C" is about differently updated things staying
consistent with each other.

For example, imagine you have a TDB file mapping Samba usernames to
passwords, and another mapping Samba usernames to local usernames.  (I
don't know if you do this; it's just an illustration).

To rename a Samba user involves updating both.  Let's ignore transient
transactional issues :-) and just think about what happens with
per-file barriers and no sync, when a crash happens long after the
updates, and before the system has written out all data and issued low
level cache flushes.

After restarting, due to lack of sync, the Samba username could be
present in one file and not the other.
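
A minimal sketch of the ordering the rename needs, assuming plain
write()/fsync() calls on two hypothetical TDB files (the file names,
record format and helper functions are purely illustrative):

    /* Renaming a Samba user touches two files; the rename is only
     * crash-safe if both updates reach stable storage before we
     * report success to anyone. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    static int update_and_sync(const char *path, const char *record)
    {
        int fd = open(path, O_WRONLY | O_APPEND);
        if (fd < 0)
            return -1;
        if (write(fd, record, strlen(record)) < 0 || fsync(fd) < 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }

    int rename_samba_user(void)
    {
        /* Without the fsync()s (or an equivalent barrier covering both
         * files), a crash can leave one file updated and the other not. */
        if (update_and_sync("passwords.tdb", "newname:<hash>\n") < 0)
            return -1;
        return update_and_sync("usermap.tdb", "newname:localuser\n");
    }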

  2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is
 only for data belonging to a particular file (e.g. fdatasync with
 no file size change, even on btrfs if O_DIRECT was used for the
 writes being committed).  That would entail tagging FLUSHes and
 WRITEs with a fs-specific identifier (such as inode number), opaque
 to the scheduler which only checks equality.
 
 This is closer.  In userspace I'd be happy with "all prior writes to this
 struct file before all future writes".  Even if the original guarantees were
 stronger (i.e. inode basis).  We currently implement transactions using 4
 fsync/msync pairs.
 
   write_recovery_data(fd);
   fsync(fd);
   msync(mmap);
   write_recovery_header(fd);
   fsync(fd);
   msync(mmap);
   overwrite_with_new_data(fd);
   fsync(fd);
   msync(mmap);
   remove_recovery_header(fd);
   fsync(fd);
   msync(mmap);
 
 Yet we really only need ordering, not guarantees about it actually hitting
 disk before returning.
 
  In other words, FLUSH can be more relaxed than BARRIER inside the
  kernel.  It's ironic that we think of fsync as stronger than
  fbarrier outside the kernel :-)
 
 It's an implementation detail; barrier has less flexibility because it has
 less information about what is required. I'm saying I want to give you as
 much information as I can, even if you don't use it yet.

I agree, and I've started a few threads about it over the last couple of years.

An fsync_range() system call would be very easy to use and
(most importantly) easy to understand.

With optional flags to weaken it (into fdatasync, barrier without sync,
sync without barrier, one-sided barrier, no low-level cache flush, "don't
rush", etc.), it would be very versatile, and still easy to understand.

With an AIO version, and another flag meaning "don't rush, just return
when satisfied", I suspect it would be useful for the most demanding
I/O apps.
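
To make the proposal concrete, here is a sketch of what such a call
might look like to an application.  The name, flags and semantics are
all illustrative (Linux has no fsync_range(); the stand-in below just
falls back to fsync/fdatasync and ignores the range):

    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical flags - not a real Linux API. */
    #define FSYNC_RANGE_DATA_ONLY 0x01 /* like fdatasync: data only              */
    #define FSYNC_RANGE_BARRIER   0x02 /* ordering only, no forced durability    */
    #define FSYNC_RANGE_NO_FLUSH  0x04 /* skip the low-level disk cache flush    */
    #define FSYNC_RANGE_LAZY      0x08 /* "don't rush", complete when convenient */

    /* Stand-in for the proposed syscall: today the closest portable
     * behaviour is to sync the whole file and ignore most flags. */
    static int fsync_range(int fd, off_t offset, off_t length, unsigned flags)
    {
        (void)offset; (void)length;
        return (flags & FSYNC_RANGE_DATA_ONLY) ? fdatasync(fd) : fsync(fd);
    }

An application committing one database record durably would then call
fsync_range(fd, rec_off, rec_len, FSYNC_RANGE_DATA_ONLY) without
forcing unrelated dirty pages of the same file to disk.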

-- Jamie


Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH

2010-04-19 Thread Jamie Lokier
Michael S. Tsirkin wrote:
 I took a stab at documenting CMD and FLUSH request types in virtio
 block.  Christoph, could you look over this please?

 I note that the interface seems full of warts to me;
 this might be a first step to cleaning them up.

 One issue I struggled with especially is how the type
 field mixes bit and non-bit values. I ended up
 simply defining all legal values, so that we have
 CMD = 2, CMD_OUT = 3 and so on.

 I also avoided introducing the inhdr/outhdr structures
 that the virtio blk driver in Linux uses; I was concerned
 that nested tables would confuse the reader.

 Comments welcome.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 --
 
 diff --git a/virtio-spec.lyx b/virtio-spec.lyx
 index d16104a..ed35893 100644
 --- a/virtio-spec.lyx
 +++ b/virtio-spec.lyx
 @@ -67,7 +67,11 @@ IBM Corporation
  \end_layout
  
  \begin_layout Standard
 +
 +\change_deleted 0 1266531118
  FIXME: virtio block scsi passthrough section
 +\change_unchanged
 +
  \end_layout
  
  \begin_layout Standard
 @@ -4376,7 +4380,7 @@ struct virtio_net_ctrl_mac {
  The device can filter incoming packets by any number of destination MAC
   addresses.
  \begin_inset Foot
 -status open
 +status collapsed
  
  \begin_layout Plain Layout
  Since there are no guarentees, it can use a hash filter orsilently switch
 @@ -4549,6 +4553,22 @@ blk_size
  \end_inset
  
  .
 +\change_inserted 0 1266444580
 +
 +\end_layout
 +
 +\begin_layout Description
 +
 +\change_inserted 0 1266471229
 +VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
 +\end_layout
 +
 +\begin_layout Description
 +
 +\change_inserted 0 1266444605
 +VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
 +\change_unchanged
 +
  \end_layout
  
  \begin_layout Description
 @@ -4700,17 +4720,25 @@ struct virtio_blk_req {
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472188
 +
  #define VIRTIO_BLK_T_IN  0
  \end_layout
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472188
 +
  #define VIRTIO_BLK_T_OUT 1
  \end_layout
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472188
 +
  #define VIRTIO_BLK_T_BARRIER  0x8000
 +\change_unchanged
 +
  \end_layout
  
  \begin_layout Plain Layout
 @@ -4735,11 +4763,15 @@ struct virtio_blk_req {
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472204
 +
  #define VIRTIO_BLK_S_OK0
  \end_layout
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472204
 +
  #define VIRTIO_BLK_S_IOERR 1
  \end_layout
  
 @@ -4759,32 +4791,481 @@ struct virtio_blk_req {
  \end_layout
  
  \begin_layout Standard
 -The type of the request is either a read (VIRTIO_BLK_T_IN) or a write 
 (VIRTIO_BL
 -K_T_OUT); the high bit indicates that this request acts as a barrier and
 - that all preceeding requests must be complete before this one, and all
 - following requests must not be started until this is complete.
 +
 +\change_inserted 0 1266472490
 +If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet
 + command requests, each of these requests is of form:
 +\begin_inset listings
 +inline false
 +status open
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472395
 +
 +struct virtio_scsi_pc_req {
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472375
 +
 + u32 type;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472375
 +
 + u32 ioprio;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266474298
 +
 + u64 sector;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266474308
 +
 +char cmd[];
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266505809
 +
 + char data[][512];
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266505825
 +
 +#define SCSI_SENSE_BUFFERSIZE   96
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266505848
 +
 +u8 sense[SCSI_SENSE_BUFFERSIZE];
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472969
 +
 +u32 errors;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472979
 +
 +u32 data_len;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472984
 +
 +u32 sense_len;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472987
 +
 +u32 residual;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472375
 +
 + u8 status;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472375
 +
 +};
 +\end_layout
 +
 +\end_inset
 +
 +
 +\change_unchanged
 +
  \end_layout
  
  \begin_layout Standard
 -The ioprio field is a hint about the relative priorities of requests to
 - the device: higher numbers indicate more important requests.
 +The 
 +\emph on
 +type
 +\emph default
 + of the request is either a read (VIRTIO_BLK_T_IN)
 +\change_inserted 0 1266495815
 +,
 +\change_unchanged
 + 
 

Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size

2010-01-05 Thread Jamie Lokier
Avi Kivity wrote:
 On 01/05/2010 02:56 PM, Rusty Russell wrote:
 
 Those should be the same for any sane interface.  They are for classical
 disk devices with larger block sizes (MO, s390 dasd) and also for the
 now appearing 4k sector scsi disks.  But in the ide world people are
 concerned about dos/windows legacy compatibility so they came up with a
 nasty hack:
 
   - there is a physical block size as used by the disk internally
 (4k initially)
   - all the interfaces to the operating system still happen in the
 traditional 512 byte blocks to not break any existing assumptions
   - to make sure modern operating systems can optimize for the larger
 physical sectors the disks expose this size, too.
   - even worse, disks can also have alignment hacks for the traditional
 DOS partition tables, so that the 512 byte block zero might even
 have an offset into the first larger physical block.  This is also
 exposed in the ATA identify information.
 
 All in all I don't think this mess is a good idea to replicate in
 virtio.  Virtio by definition requires virtualization-aware guests, so we
 should just follow the SCSI way of larger real block sizes here.
  
 Yes.  The current VIRTIO_BLK_F_BLK_SIZE says "please use this block size".
 We haven't actually specified what happens if the guest doesn't, but the
 spec says "must", and the Linux implementation does so AFAICT.

 If we want a soft size, we could add that as a separate feature.

 
 No - I agree with Christoph, there's no reason to use a 512/4096 
 monstrosity with virtio.

It would be good if virtio relayed the backing device's basic topology
hints, so:

- If the backing dev is a real disk with 512-byte sectors,
  virtio should indicate 512-byte blocks to the guest.

- If the backing dev is a real disk with 4096-byte sectors,
  virtio should indicate 4096-byte blocks to the guest.

With databases and filesystems, if you care about data integrity:

- If the backing dev is a real disk with 4096-byte sectors,
  or a file whose access is through a 4096-byte-per-page cache,
  virtio must indicate 4096-byte blocks, otherwise guest
  journalling is not safe against host power failure.

You get the idea.  If there is only one parameter, it really should be
at least as large as the smallest unit which may be corrupted by
writes when errors occur.
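
For illustration, a guest picking its journal block size from such
hints might do something like the following.  The first five fields
mirror the existing virtio-blk config layout; the last two are an
assumption about how physical-sector hints could be exposed, not the
current spec:

    #include <stdint.h>

    struct virtio_blk_config_sketch {
        uint64_t capacity;           /* in 512-byte sectors */
        uint32_t size_max;
        uint32_t seg_max;
        struct { uint16_t cylinders; uint8_t heads, sectors; } geometry;
        uint32_t blk_size;           /* logical block size (VIRTIO_BLK_F_BLK_SIZE) */
        uint8_t  physical_block_exp; /* assumed: physical block = blk_size << exp  */
        uint8_t  alignment_offset;   /* assumed: offset of LBA 0 in a phys. block  */
    };

    /* A journal must use at least the unit a failed write can corrupt. */
    static uint32_t safe_journal_block_size(const struct virtio_blk_config_sketch *c)
    {
        return c->blk_size << c->physical_block_exp;
    }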

-- Jamie


Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size

2010-01-05 Thread Jamie Lokier
Avi Kivity wrote:
 Physical block size is what the logical block size would have been if
 software didn't suck.  In theory they should be the same, but since
 compatibility reasons clamp the logical block size to 512, they have to
 differ.  A disk may have a physical block size of 4096 and emulate a
 logical block size of 512 on top of that using read-modify-write.

 Or so I understand it.

I think that's right, but a side effect is that if you get a power
failure during the read-modify-write, bytes anywhere in the 4096-byte
sector may be incorrect, so journalling (etc.) needs to use 4096-byte
blocks for data integrity, even though the drive emulates smaller writes.
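
A quick illustration of the hazard: with 512-byte logical and
4096-byte physical sectors, any write that is not physical-sector
aligned forces the drive into a read-modify-write of the whole 4096
bytes, and a power failure in the middle can corrupt neighbours the
guest never wrote.  (The check below is a sketch; the sector sizes are
assumptions for the example.)

    #include <stdbool.h>
    #include <stdint.h>

    /* True if a write of 'len' bytes at byte offset 'off' covers whole
     * physical sectors, so the drive needs no read-modify-write cycle. */
    static bool write_avoids_rmw(uint64_t off, uint64_t len, uint32_t phys_sector)
    {
        return (off % phys_sector) == 0 && (len % phys_sector) == 0;
    }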

-- Jamie


Re: [Qemu-devel] Re: virtio: Add memory statistics reporting to the balloon driver (V2)

2009-11-18 Thread Jamie Lokier
Anthony Liguori wrote:
 Rusty Russell wrote:
 The little-endian conversion of the balloon driver is a historical mistake
 (no other driver does this).  Let's not extend it to the stats.
 
 I think the mistake is that the other drivers don't do that.
 
 We cheat in qemu and assume that the guest is always in a fixed 
 endianness but this is not always the case for all architectures.

If guests can have different endianness (reasonable on some CPUs where
it's switchable - some even have more than 2 options), then I guess
the *host* on those systems can have different endianness too.

Is the host's endianness signalled to the guest anywhere, so that
guest drivers can do cpu_to_qemuhost32(), when someone eventually
finds that necessary?
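
A sketch of what such a helper could look like, assuming the host
advertised its byte order in a single (hypothetical) config field -
nothing like this exists in virtio today:

    #include <stdint.h>

    /* Hypothetical: host advertises its byte order somewhere in config space. */
    enum qemuhost_endian { QEMUHOST_LITTLE = 0, QEMUHOST_BIG = 1 };

    static inline uint32_t bswap32(uint32_t v)
    {
        return (v >> 24) | ((v >> 8) & 0xff00) |
               ((v << 8) & 0xff0000) | (v << 24);
    }

    /* Convert a guest-CPU-order value to the host's advertised order. */
    static inline uint32_t cpu_to_qemuhost32(uint32_t v, enum qemuhost_endian host)
    {
        /* Detect the guest CPU's own byte order at runtime. */
        const union { uint32_t u; uint8_t b[4]; } probe = { .u = 1 };
        enum qemuhost_endian guest = probe.b[0] ? QEMUHOST_LITTLE : QEMUHOST_BIG;

        return guest == host ? v : bswap32(v);
    }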

-- Jamie


Re: [Qemu-devel] Re: virtio: Add memory statistics reporting to the balloon driver

2009-11-11 Thread Jamie Lokier
Anthony Liguori wrote:
 Avi Kivity wrote:
 On 11/10/2009 04:36 PM, Anthony Liguori wrote:
 
 A stats vq might solve this more cleanly?
 
 actual and target are both really just stats.  Had we implemented 
 those with a vq, I'd be inclined to agree with you but since they're 
 implemented in the config space, it seems natural to extend the 
 config space with other stats.
 
 
 There is in fact a difference; actual and target are very rarely 
 updated, while the stats are updated very often.  Using a vq means a 
 constant number of exits per batch instead of one exit per statistic.  
 If the vq is host-driven, it also allows the host to control the 
 update frequency dynamically (i.e. stop polling when there is no 
 memory pressure).
 
 I'm not terribly opposed to using a vq for this.  I would expect the 
 stat update interval to be rather long (10s probably) but a vq works 
 just as well.

If there's no memory pressure and no guest activity, you probably want
the stat updates to be as rare as possible to avoid wakeups - saving
power on laptops, that sort of thing.

If there's a host user interested in the state (qemutop?), you may
want updates more often than every 10s.

-- Jamie


Re: [Qemu-devel] Re: Extending virtio_console to support multiple ports

2009-08-28 Thread Jamie Lokier
Alan Cox wrote:
   - Then, are we certain that there's no case where the tty layer will
  call us with some lock held or in an atomic context ? To be honest,
  I've totally lost track of the locking rules in tty land lately so it
  might well be ok, but something to verify.
 
 Some of the less well behaved line disciplines do this and always have
 done.

I had a backtrace in my kernel log recently which looked like that,
while doing PPP over Bluetooth RFCOMM.  It resulted in AppArmor
complaining that its hook was being called in irq context.

-- Jamie


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-20 Thread Jamie Lokier
Amit Shah wrote:
  I think strings are better than numbers for identifying protocols, as you
  can work without a central registry for the numbers then.
 
 I like the way assigned numbers work: it's simpler to code, needs a
 bitmap for all the ports that fits nicely into the config space, and
 udev rules / scripts can point /dev/vmch02 to /dev/console.

How would a third party go about assigning themselves a number?

For the sake of example, imagine they develop a simple service like
guesttop which lets the host get a listing of guest processes.

They'll have to distribute app-specific udev rule patches for every
guest distro, which sounds like a lot of work.  The app itself is
probably a very simple C program; the hardest part of making it
portable across distros would be the udev rules, which is silly.

Anyway, every other device has a name or uuid these days.  You can
still use /dev/sda1 to refer to your boot partition, but LABEL=boot is
also available if you prefer.  Isn't that the ethos these days?

Why not both?  /dev/vmch05 if you prefer, plus a symlink
/dev/vmch-guesttop -> /dev/vmch05 if name=guesttop was given to QEMU.

If you do stay with numbers only, note that it's not like TCP/UDP port
numbers because the number space is far smaller.  Picking a random
number that you hope nobody else uses is harder.

... Back to technical bits.  If config space is tight, use a channel!
Dedicate channel 0 to control, used to fetch the name (if there is
one) for each number.
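
A sketch of what a request/reply pair on that control channel might
look like; the message layout and names here are invented purely for
illustration:

    #include <stdint.h>

    #define VMCH_CTRL_GET_NAME 1   /* guest asks for the name of port N */

    struct vmch_ctrl_request {
        uint32_t type;      /* VMCH_CTRL_GET_NAME */
        uint32_t port;      /* the /dev/vmchN number being queried */
    };

    struct vmch_ctrl_reply {
        uint32_t type;
        uint32_t port;
        uint32_t name_len;  /* 0 if the port has no name */
        char     name[];    /* e.g. "guesttop" */
    };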

-- Jamie


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-06 Thread Jamie Lokier
Amit Shah wrote:
 On (Thu) Aug 06 2009 [08:58:01], Anthony Liguori wrote:
  Amit Shah wrote:
  On (Thu) Aug 06 2009 [08:29:40], Anthony Liguori wrote:

  Amit Shah wrote:
  
  Sure; but there's been no resistance from anyone from including the
  virtio-serial device driver so maybe we don't need to discuss that.
  
  There certainly is from me.  The userspace interface is not 
  reasonable  for guest applications to use.
  
 
  One example that would readily come to mind is dbus. A daemon running on
  the guest that reads data off the port and interacts with the desktop by
  appropriate dbus commands. All that's needed is a stream of bytes and
  virtio-serial provides just that.

 
  dbus runs as an unprivileged user, how does dbus know which  
  virtio-serial port to open and who sets the permissions on that port?
 
 The permission part can be handled by package maintainers and sysadmins
 via udev policies.
 
 So all data destined for dbus consumption gets to a daemon and that
 daemon then sends it over to dbus.

virtio-serial is nice, easy, simple and versatile.  We like that; it
should stay that way.

dbus isn't a good match for this.

dbus is not intended for communication between hosts, by design.

It depends on per-app configuration files in
/etc/dbus/{session,system}.d/, which are expected to match the
installed services.

For this, the guest's files in /etc/dbus/ would have to match the QEMU
host's services in detail.  dbus doesn't have a good mechanism for
coping with version skew between the two, because normally
everything resides on the same machine and the config and service are
updated at the same time.  This is hard to guarantee with a VM.

Apart from dbus, hard-coded meanings of small N in /dev/vmchN are
asking for trouble.  It is bound to break when widely deployed and
guest/host configs don't match.  It also doesn't fit comfortably when
you have, say, bob and alice both logged in with desktops on separate
VTs.  Clashes are inevitable, as third-party apps pick N values for
themselves then get distributed - unless N values can be large
(/dev/vmch44324 == kernelstats...).

Sysadmins shouldn't have to hand-configure each app, and shouldn't
have to repair clashes in defaults.  Just Work is better.

virtio-serial is nice.  The only ugly part is _finding_ the right
/dev/vmchN.

Fortunately, _any_ out-of-band id string or id number makes it perfect.

An option to specify PCI vendor/product ids in the QEMU host
configuration is good enough.

An option to specify one or more id strings is nicer.

Finally, Anthony hit on an interesting idea with USB.  Emulating USB
sucks.  But USB's _descriptors_ are quite effective, and the USB basic
protocol is quite reasonable too.

Descriptors are just a binary blob in a particular format, which
describe a device and also say what it supports, and what standard
interfaces can be used with it too.  Bluetooth is similar; they might
even use the same byte format, I'm not sure.

All the code for parsing USB descriptors is already present in guest
kernels, and the code for making appropriate device nodes and
launching apps is already in udev.  libusb also allows devices to be
used without a kernel driver, and is cross-platform.  There are plenty
of examples of creating USB descriptors in QEMU, and maybe the code
can be reused.

The only down side of USB is that emulating it sucks :-)  That's mainly
due to the host controllers, and the way interrupts use polling.

So here's a couple of ideas:

   - virtio-usb, using virtio instead of a hardware USB host
 controller.  That would provide all the features of USB
 naturally, like hotplug, device binding, access from userspace,
 but with high performance, low overhead, and no interrupt polling.

 You'd even have the option of cross-platform guest apps, as well
 as working on all Linux versions, by emulating a host controller
 when the guest doesn't have virtio-usb.

 As a bonus, existing USB support would be accelerated.

   - virtio-serial providing a binary id blob, whose format is the
 same as USB descriptors.  Reuse the guest's USB parsing and
 binding to find and identify, but the actual device functionality
 would just be a byte pipe.

 That might be simple, as all it involves is a blob passed to the
 guest from QEMU.  QEMU would build the id blob, maybe reusing
 existing USB code, and the guest would parse the blob as it
 already does for USB devices, with udev creating devices as it
 already does.
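
A minimal sketch of such an id blob, borrowing only the shape of USB
descriptors (length/type-prefixed records) rather than the exact USB
format; every name below is illustrative:

    #include <stdint.h>

    #define VMCH_DESC_DEVICE 0x01   /* vendor/product ids                */
    #define VMCH_DESC_STRING 0x02   /* UTF-8 role name, e.g. "clipboard" */

    struct vmch_desc_header {
        uint8_t length;             /* total record length, incl. header */
        uint8_t type;               /* VMCH_DESC_*                       */
    };

    struct vmch_desc_device {
        struct vmch_desc_header hdr;
        uint16_t vendor_id;
        uint16_t product_id;
    } __attribute__((packed));

    /* Walk a blob of concatenated records, skipping unknown types safely. */
    static const struct vmch_desc_header *
    vmch_desc_next(const uint8_t *blob, uint32_t blob_len, uint32_t *offset)
    {
        const struct vmch_desc_header *h;

        if (*offset + sizeof(*h) > blob_len)
            return NULL;
        h = (const struct vmch_desc_header *)(blob + *offset);
        if (h->length < sizeof(*h) || *offset + h->length > blob_len)
            return NULL;
        *offset += h->length;
        return h;
    }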

-- Jamie


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-05 Thread Jamie Lokier
Anthony Liguori wrote:
 Richard W.M. Jones wrote:
 Have you considered using a usb serial device?  Something attractive 
 about it is that a productid/vendorid can be specified which means that 
 you can use that as a method of enumerating devices.
 
 Hot add/remove is supported automagically.

The same applies to PCI: productid/vendorid (and subids);
PCI hotplug is possible though not as native as USB.

Here's another idea: Many devices these days have a serial number or
id string.  E.g. USB storage, ATA drives, media cards, etc.  Linux
these days creates alias device nodes which include the id string in
the device name.  E.g. /dev/disks/by-id/ata-FUJITSU_MHV2100BH_NWAQT662615H

So in addition to (or instead of) /dev/vmch0, /dev/vmch1 etc.,
Linux guests could easily generate:

/dev/vmchannel/by-role/clipboard-0
/dev/vmchannel/by-role/gueststats-0
/dev/vmchannel/by-role/vmmanager-0

It's not necessary to do this at the beginning.  All that is needed is
to provide enough id information that will appear in /sys/..., so that
a udev policy for naming devices can be created at some later date.
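
For example, a kernel driver only has to export the id as a sysfs
attribute for a later udev rule to build /dev/vmchannel/by-role/ links
from it.  A rough sketch using the standard device-attribute API (the
vmch_port struct and the "role" attribute name are assumptions):

    #include <linux/device.h>
    #include <linux/kernel.h>

    struct vmch_port {
        struct device *dev;
        const char *role;   /* e.g. "clipboard", supplied by the host */
    };

    static ssize_t role_show(struct device *dev,
                             struct device_attribute *attr, char *buf)
    {
        struct vmch_port *port = dev_get_drvdata(dev);

        return sprintf(buf, "%s\n", port->role ? port->role : "");
    }
    static DEVICE_ATTR(role, 0444, role_show, NULL);

    /* Attached to the port's struct device at probe time. */
    static struct attribute *vmch_port_attrs[] = {
        &dev_attr_role.attr,
        NULL,
    };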

-- Jamie


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-05 Thread Jamie Lokier
Gleb Natapov wrote:
 On Wed, Jul 29, 2009 at 01:14:18PM +0530, Amit Shah wrote:
  But why do we want to limit the device to only one port? It's not too
  complex supporting additional ones.
  
  As I see it qemu and the kernel should provide the basic abstraction for
  the userspace to go do its job. Why create unnecessary barriers?
  
 I agree. If userspace wants it may use only one channel and demultiplex
 messages by itself, but we shouldn't force it to. Also one of the
 requirements for virtio-serial is to have connect/disconnect
 notifications. That is not possible with demultiplexing in userspace.

I agree too, for all those reasons.

However it would be useful if the devices provided a simpler way to be
found by guest applications than /dev/vmch0, vmch1, vmch2...

On Linux udev provides a sane way to find devices according to roles,
subtypes, serial numbers, whatever you want, if the appropriate id
codes are available from the devices and put into /sys/* by the kernel
driver.  That would make the devices much more useful to independent
applications, imho.

-- Jamie


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-07-27 Thread Jamie Lokier
Daniel P. Berrange wrote:
  I expect the first problem you'll run into is that copy/paste daemon has 
  to run as an unprivileged user but /dev/vmch3 is going to be owned by 
  root.  You could set udev rules for /dev/vmch3 but that's pretty 
  terrible IMHO.
 
 I don't think that's too bad; for example, with fast-user-switching
 between multiple X servers and/or text consoles, there's already support
 code that deals with chown'ing things like /dev/snd/* devices to match 
 the active console session. Doing the same with the /dev/vmch3 device so
 that it is only ever accessible to the current logged in user actually
 fits in to that scheme quite well.

With multiple X servers, there can be more than one currently logged-in user.

Same with multiple text consoles - that's more familiar.

Which one owns /dev/vmch3?

-- Jamie


Re: [Qemu-devel] virtio-serial: A guest - host interface for simple communication

2009-06-24 Thread Jamie Lokier
Amit Shah wrote:
 A few sample uses for a vmchannel are to share the host and guest
 clipboards (to allow copy/paste between a host and a guest), to
 lock the screen of the guest session when the vnc viewer is closed,
 to find out which applications are installed on a guest OS even when
 the guest is powered down (using virt-inspector) and so on.

Those all look like useful features.

Can you run an application to provide those features on a guest which
_doesn't_ have vmchannel/virtio-serial support in the kernel?

Or will it be restricted only to guests which have QEMU-specific
support in their kernel?

-- Jamie


Re: [Qemu-devel] virtio-serial: A guest - host interface for simple communication

2009-06-24 Thread Jamie Lokier
Amit Shah wrote:
 On (Wed) Jun 24 2009 [17:40:49], Jamie Lokier wrote:
  Amit Shah wrote:
   A few sample uses for a vmchannel are to share the host and guest
   clipboards (to allow copy/paste between a host and a guest), to
   lock the screen of the guest session when the vnc viewer is closed,
   to find out which applications are installed on a guest OS even when
   the guest is powered down (using virt-inspector) and so on.
  
  Those all look like useful features.
  
  Can you run an application to provide those features on a guest which
  _doesn't_ have a vmchannel/virtio-serial support in the kernel?
  
  Or will it be restricted only to guests which have QEMU-specific
  support in their kernel?
 
 libguestfs currently uses the -net user based vmchannel interface that
 exists in current qemu. That doesn't need a kernel with
 support for virtio-serial.

That's great!

If that works fine, and guest apps/libraries are using that as a
fallback anyway, what benefit do they get from switching to
virtio-serial when they detect that instead, given they still have
code for the -net method?

Is the plan to remove -net user based support from libguestfs?

Is virtio-serial significantly simpler to use?

-- Jamie


Re: [Qemu-devel] virtio-serial: A guest - host interface for simple communication

2009-06-24 Thread Jamie Lokier
Amit Shah wrote:
 On (Wed) Jun 24 2009 [18:50:02], Jamie Lokier wrote:
  Amit Shah wrote:
   On (Wed) Jun 24 2009 [17:40:49], Jamie Lokier wrote:
Amit Shah wrote:
 A few sample uses for a vmchannel are to share the host and guest
 clipboards (to allow copy/paste between a host and a guest), to
 lock the screen of the guest session when the vnc viewer is closed,
 to find out which applications are installed on a guest OS even when
 the guest is powered down (using virt-inspector) and so on.

Those all look like useful features.

Can you run an application to provide those features on a guest which
_doesn't_ have vmchannel/virtio-serial support in the kernel?

Or will it be restricted only to guests which have QEMU-specific
support in their kernel?
   
   libguestfs currently uses the -net user based vmchannel interface that
   exists in current qemu. That doesn't need a kernel with
   support for virtio-serial.
  
  That's great!
  
  If that works fine, and guest apps/libraries are using that as a
  fallback anyway, what benefit do they get from switching to
  virtio-serial when they detect that instead, given they still have
  code for the -net method?
 
 Speed is the biggest benefit.

Fair enough, sounds good, and I can see how it's more usable than a
network interface in many respects.

  Is the plan to remove -net user based support from libguestfs?
 
 I don't know what Richard's plan is, but if the kernel that libguestfs
 deploys in the appliance gains support for virtio-serial, there's no
 reason it shouldn't switch.
 
  Is virtio-serial significantly simpler to use?
 
 I think the interface from the guest POV stays the same: reads / writes
 to char devices. With virtio-serial, though, we can add a few other
 interesting things like names to ports, ability to hot-add ports on
 demand, request notifications when either end goes down, etc.

Good features, useful for a lot of handy things.  I think it would be
better if the same features were available to guest applications
generally, though, not just on guest kernels with a specific driver.

As we discussed before, that matters for things like boot loaders and
kernel debuggers, and for installing the support applications on old
guests.

Is it possible to support access to the same capabilities through a
well-known I/O or MMIO address, in the same way that VGA and IDE are
both ordinary PCI devices, but can also be reached easily through
well-known I/O or MMIO addresses in simple code?

-- Jamie



Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-17 Thread Jamie Lokier
Avi Kivity wrote:
 On 06/16/2009 09:32 PM, Jamie Lokier wrote:
 Avi Kivity wrote:

 Another issue is enumeration.  Guests will present their devices in the
 order they find them on the pci bus (of course enumeration is guest
 specific).  So if I have 2 virtio controllers the only way I can
 distinguish between them is using their pci slots.
 
 virtio controllers really should have a user-suppliable string or UUID
 to identify them to the guest.  Don't they?
 
 virtio controllers don't exist.  When they do, they may have a UUID or 
 not, but in either case guest infrastructure is in place for reporting 
 the PCI slot, not the UUID.
 
 virtio disks do have a UUID.  I don't think older versions of Windows 
 will use it though, so if you reorder your slots you'll see your drive 
 letters change.  Same with Linux if you don't use udev by-uuid rules.

I guess I meant virtio disks, so that's ok.

-- Jamie


Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-17 Thread Jamie Lokier
Avi Kivity wrote:
 If management apps need to hard-code which slots are available on
 different targets and different qemu versions, or restrictions on which
 devices can use which slots, or knowledge that some devices can be
 multi-function, or ... anything like that is just lame.

 
 You can't abstract these things away.  If you can't put a NIC in slot 4, 
 and you have 7 slots, then you cannot have 7 NICs.  Having qemu allocate 
 the slot numbers does not absolve management from knowing this 
 limitation and preventing the user from creating a machine with 7 slots.
 
 Likewise, management will have to know which devices are multi-function, 
 since that affects their hotpluggability.  Ditto if some slots are faster
 than others; if you want to make use of this information you have to let
 the upper layers know.
 
 It could be done using an elaborate machine description that qemu 
 exposes to management coupled with a constraint solver that optimizes 
 the machine layout according to user specifications and hardware 
 limitations.  Or we could take the view that real life is not perfect 
 (especially where computers are involved), add some machine specific 
 knowledge, and spend the rest of the summer at the beach.

To be honest, an elaborate machine description is probably fine...

A fancy constraint solver is not required.  A simple one strikes me as
about as simple as what you'd hard-code anyway, but with fewer special
cases.

Note that the result can fail due to things like insufficient address
space for all the device BARs even when they _are_ in the right slots.
Especially if there are lots of slots, or bridges which can provide
unlimited slots.

That is arcane: device-dependent, CPU-dependent, machine-dependent,
RAM-size dependent (in a non-linear way), device-option-dependent and
probably QEMU-version-dependent too.

It would be nice if libvirt (et al) would prevent the user from
creating a VM with insufficient BAR space for that machine, but I'm
not sure how to do it sanely, without arcane knowledge getting about.

Maybe that idea of a .so shared by qemu and libvirt, to manipulate
device configurations, is a sane one after all.

-- Jamie


Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-16 Thread Jamie Lokier
Avi Kivity wrote:
 Another issue is enumeration.  Guests will present their devices in the 
 order they find them on the pci bus (of course enumeration is guest 
 specific).  So if I have 2 virtio controllers the only way I can 
 distinguish between them is using their pci slots.

virtio controllers really should have a user-suppliable string or UUID
to identify them to the guest.  Don't they?

-- Jamie


Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-16 Thread Jamie Lokier
Mark McLoughlin wrote:
  After libvirt has done -drive file=foo... it should dump the machine 
  config and use that from then on.
 
 Right - libvirt then wouldn't be able to avoid the complexity of merging
 any future changes into the dumped machine config.

As long as qemu can accept a machine config _and_ -drive file=foo (and
monitor commands to add/remove devices), libvirt could merge by simply
calling qemu with whatever additional command line options or monitor
commands modify the config, then dump the new config.

That way, libvirt would not have to deal with that complexity.  It
would be written in one place: qemu.

Or better, a utility: qemu-machine-config.

-- Jamie


Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-16 Thread Jamie Lokier
Mark McLoughlin wrote:
  Worst case we hardcode those numbers (gasp, faint).
 
 Maybe we can just add the open slots to the -help output. That'd be nice
 and clean.

Make them part of the machine configuration.

After all, they are part of the machine configuration, and ACPI, BIOS
etc. need to know about all the machine slots anyway.

Having said that, I prefer the idea that slot allocation is handled
either in Qemu, or in a separate utility called qemu-machine-config
(for working with machine configs), or in a library
libqemu-machine-config.so.

I particularly don't like the idea of arcane machine-dependent slot
allocation knowledge living in libvirt, because it needs to be in Qemu
anyway for non-libvirt users.  No point in having two implementations
of something tricky and likely to have machine quirks, if one will do.

-- Jamie


Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Jamie Lokier
Paul Brook wrote:
   caps can be anywhere, but we don't expect it to change during machine
   execution lifetime.
  
   Or I am just confused by the name pci_device_load ?
 
  Right. So I want to load an image and it has capability X at offset Y.
  wmask has to match. I don't want to assume that we never change Y
  for the device without breaking old images, so I clear wmask here
  and set it up again after looking up capabilities that I loaded.
 
 We should not be loading state into a different device (or a similar device 
 with a different set of capabilities).
 
 If you want to provide backwards compatibility then you should do that by 
 creating a device that is the same as the original.  As I mentioned in my 
 earlier mail, loading a snapshot should never do anything that can not be 
 achieved through normal operation.

If you can create a machine by restoring a snapshot which you can't
create by normally starting QEMU, then you'll soon have guests which
work fine from their snapshots, but which cannot be booted without a
snapshot because there's no way to boot the right machine for the guest.

Someone might even have guests like that for years without noticing,
because they always save and restore guest state using snapshots, then
one day they simply want to boot the guest from its disk image and
find there's no way to do it with any QEMU which runs on their host
platform.

I think the right long-term answer to all this is a way to get QEMU to
dump its current machine configuration in glorious detail as a file
which can be reloaded as a machine configuration.

-- Jamie


Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Jamie Lokier
Michael S. Tsirkin wrote:
  I think the right long-term answer to all this is a way to get QEMU to
  dump its current machine configuration in glorious detail as a file
  which can be reloaded as a machine configuration.
 
 And then we'll have the same set of problems there.

We will, and the solution will be the same: options to create devices
as they were in older versions of QEMU.  It only needs to cover device
features which matter to guests, not every bug fix.

However, with a machine configuration which is generated by QEMU,
there's less worry about proliferation of obscure options, compared
with the command line.  You don't necessarily have to document every
backward-compatibility option in any detail, you just have to make
sure it's written and read properly, which is much the same thing as
the snapshot code does.

-- Jamie