Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-09 Thread Jamie Lokier
Rusty Russell wrote:
 I don't think it'll be that bad; reset clears the device to unknown,
 bar0 moves it from unknown to legacy mode, bar1/2/3 changes it from
 unknown to modern mode, and anything else is bad (I prefer being strict so
 we catch bad implementations from the beginning).

Will that work if a guest with a kernel that uses modern mode kexecs
to an older (but presumed reliable) kernel that only knows about legacy mode?

I.e. will the replacement kernel, or (ideally) a replacement driver on
the rare occasion one is needed on a running kernel, be able to reset
the device hard enough?

-- Jamie


Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-09 Thread Jamie Lokier
Anthony Liguori wrote:
 The new API will do away with the IOAPIC/PIC/PIT emulation and defer
 them to userspace.
 
 I'm a big fan of this.

I agree with getting rid of unnecessary emulations.
(Why were those things emulated in the first place?)

But it would be good to retain some way to plug in device emulations
in the kernel, separate from the KVM core with a well-defined API boundary.

Then it wouldn't matter to the KVM core whether there's PIT emulation
or whatever; that would just be a separate module.  Perhaps even with
its own /dev device, and maybe not tightly bound to KVM.

 Note: this may cause a regression for older guests that don't
 support MSI or kvmclock.  Device assignment will be done using
 VFIO, that is, without direct kvm involvement.

I don't like the sound of regressions.

I tend to think of a VM as something that needs to have consistent
behaviour over a long time, for keeping working systems running for
years despite changing hardware, or reviving old systems to test
software and make patches for things in long-term maintenance etc.

But I haven't noticed problems from upgrading kernelspace-KVM yet,
only upgrading the userspace parts.  If a kernel upgrade is risky,
that makes upgrading host kernels difficult, and all-or-nothing for
all the guests within.

However, it looks like you mean that only the performance characteristics
will change, because of moving things back to userspace?

 Local APICs will be mandatory, but it will be possible to hide them from
 the guest.  This means that it will no longer be possible to emulate an
 APIC in userspace, but it will be possible to virtualize an APIC-less
 core - userspace will play with the LINT0/LINT1 inputs (configured as
 EXITINT and NMI) to queue interrupts and NMIs.
 
 I think this makes sense.  An interesting consequence of this is
 that it's no longer necessary to associate the VCPU context with an
 MMIO/PIO operation.  I'm not sure if there's an obvious benefit to
 that but it's interesting nonetheless.

Would that be useful for using VCPUs to run sandboxed userspace code
with the ability to trap and control the whole environment (as opposed
to guest OSes, or ptrace, which is rather incomplete and unsuitable for
sandboxing code meant for other OSes)?

Thanks,
-- Jamie


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Jamie Lokier
Jan Kiszka wrote:
 Usability. Users should not have to care about individual tick-based
 clocks. They care about "my OS requires lost ticks compensation", yes or no.

Conceivably an OS may require lost ticks compensation depending on
boot options given to the OS telling it which clock sources to use.

However, I like the idea of a global default which you can set, with all
the devices inheriting it unless overridden in each device.

-- Jamie


Re: [PATCH 0/4] Avoid soft lockup message when KVM is stopped by host

2011-09-20 Thread Jamie Lokier
Marcelo Tosatti wrote:
 In case the VM stops for whatever reason, the host system is not
 supposed to adjust time related hardware state to compensate, in an
 attempt to present apparent continuous time.
 
 If you save a VM and then restore it later, it is the guest's
 responsibility to adjust its time representation.

If the guest doesn't know it's been stopped, then its time
representation will be wrong until it finds out, e.g. after a few
minutes with NTP; even a few seconds can be too long.

That is sad when it happens because it breaks the coherence of any
timed-lease caching the guest is involved in.  I.e. where the guest
acquires a lock on some data object (like a file in NFS) that it can
efficiently access without network round trips (similar to MESI), with
all nodes having agreed that it's coherent for, say, 5 seconds before
renewing or breaking.  (It's just a way to reduce latency.)

But we can't trust CLOCK_MONOTONIC when a VM is involved, it's just
one of those facts of life.  So instead the effort is to try and
detect when a VM is involved and then distrust the clock.

(Non-VM) suspend/resume is similar, but there's usually a way to
be notified about that as it happens.
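
For illustration, here's one way a program could notice such a
discontinuity without being notified - purely a sketch of the idea, not
taken from any VM tooling; the 1-second poll and 2-second threshold are
arbitrary:

#include <stdio.h>
#include <time.h>

/* Sketch: infer a time discontinuity (suspend/resume, or a stopped VM
 * whose CLOCK_REALTIME is stepped on resume) by watching the realtime
 * and monotonic clocks diverge. */
static double now(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    double mono0 = now(CLOCK_MONOTONIC), real0 = now(CLOCK_REALTIME);
    for (;;) {
        struct timespec second = { 1, 0 };
        nanosleep(&second, NULL);
        double drift = (now(CLOCK_REALTIME) - real0)
                     - (now(CLOCK_MONOTONIC) - mono0);
        if (drift > 2.0) {  /* realtime jumped past us */
            printf("discontinuity of ~%.1fs detected\n", drift);
            mono0 = now(CLOCK_MONOTONIC);
            real0 = now(CLOCK_REALTIME);
        }
    }
}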

-- Jamie


Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command

2011-05-03 Thread Jamie Lokier
Blue Swirl wrote:
 On Fri, Apr 8, 2011 at 9:04 AM, Gleb Natapov g...@redhat.com wrote:
  On Thu, Apr 07, 2011 at 04:41:03PM -0500, Anthony Liguori wrote:
  On 04/07/2011 02:17 PM, Gleb Natapov wrote:
  On Thu, Apr 07, 2011 at 10:04:00PM +0300, Blue Swirl wrote:
  On Thu, Apr 7, 2011 at 9:51 PM, Gleb Natapovg...@redhat.com  wrote:
  
  I'd prefer something more generic like these:
  raise /apic@fee0:l1int
  lower /i44FX-pcihost/e1000@03.0/pinD
  
  The clumsier syntax shouldn't be a problem, since this would be a
  system developer tool.
  
  Some kind of IRQ registration would be needed for this to work without
  lots of changes.
  True. The ability to trigger any interrupt line is very useful for
  debugging. I often re-implement it during debug.
 
  And it's a good thing to have, but exposing this as the only API to
  do something as simple as generating a guest crash dump is not the
  friendliest thing in the world to do to users.
 
  Well, this is not intended to be used by regular users directly, and
  management can provide a nicer interface for issuing NMI. But really,
  my point is that NMI actually generates a guest core dump in such rare
  cases (only preconfigured Windows guests) that it doesn't warrant naming
  the command as such. Management is in a much better position to implement
  functionality with such a name since it knows what type of guest it runs
  and can tell the agent to configure the guest accordingly.
 
 Does the management need to know about each and every debugging
 oriented interface? For example, info regs,  info mem, info irq
 and tracepoints?

Linux uses NMI for performance tracing, profiling, watchdog etc., so in
practice NMI is very similar to the other IRQs: highly guest-specific,
and depending on what's wired up to it.  Injecting NMI to all
CPUs at once does not make any sense for those Linux guests.

For Windows crash dumps, I think it makes sense to have a button
wired to an NMI device, rather than inject-nmi directly, but I can see
that inject-nmi solves the intended problem quite neatly.

For Linux crash dumps, for example, there are other key combinations,
as well as watchdog devices, that can be used to trigger them.  A
virtual button wired to GPIO/PCI-IRQ/etc. device might be quite
handy for debugging Linux guests, and would fit comfortably in a
management interface.

-- Jamie


Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command

2011-05-03 Thread Jamie Lokier
Gleb Natapov wrote:
 On Thu, Apr 07, 2011 at 04:39:58PM -0500, Anthony Liguori wrote:
  On 04/07/2011 01:51 PM, Gleb Natapov wrote:
  NMI does not have to generate crash dump on every guest we support.
  Actually even for windows guest it does not generate one without
  tweaking registry. For all I know there is a guest that checks mail when
  NMI arrives.
  
  And for all we know, a guest can respond to an ACPI poweroff event
  by tweeting the star spangled banner but we still call the
  corresponding QMP command system_poweroff.
  
 Correct :) But at least system_poweroff implements ACPI poweroff as
 defined by the ACPI spec. NMI is not designed as a core-dump event and is
 not used as such by the majority of guests.

Imho acpi_poweroff or poweroff_button would have been a clearer name.
Or even 'sendkey poweroff' - it's just a button somewhere on the
keyboard on a lot of systems anyway.  Next to the email button and what
looks, on my laptop, like the play-a-tune button :-)

I put system_poweroff into some QEMU-controlling scripts once, and was
disappointed when several guests ignored it.

But it's done now.

-- Jamie



Re: [Qemu-devel] Anyone seeing huge slowdown launching qemu with Linux 2.6.35?

2010-08-03 Thread Jamie Lokier
Richard W.M. Jones wrote:
 We could demand that OSes write device drivers for more qemu devices
 -- already OS vendors write thousands of device drivers for all sorts
 of obscure devices, so this isn't really much of a demand for them.
 In fact, they're already doing it.

Result: Most OSes not working with qemu?

Actually we seem to be going that way.  Recent qemus don't work with
older versions of Windows any more, so we have to use different
versions of qemu for different guests.

-- Jamie


Re: [Qemu-devel] [RFC] Bug Day - June 1st, 2010

2010-05-19 Thread Jamie Lokier
Michael Tokarev wrote:
 Anthony Liguori wrote:
 []
  For the Bug Day, anything is interesting IMHO.  My main interest is to
  get as many people involved in testing and bug fixing as possible.  If
  folks are interested in testing specific things like unusual or older
  OSes, I'm happy to see it!
 
 Well, interesting or not, but I for one don't know what to do with the
 results.  There was a thread on kvm@ about a sigsegv in cirrus code when
 running winNT. The issue has been identified and appears to be fixed,
 as in, the kvm process does not SIGSEGV anymore, but it does not work anyway,
 now printing:
 
  BUG: kvm_dirty_pages_log_enable_slot: invalid parameters
 
 with garbled guest display.  Thanks go to Stefano Stabellini for
 finding the SIGSEGV case, but unfortunately his hard work isn't quite
 useful since the behaviour isn't very much different from the previous
 version... ;)

A "BUG:" message is good to see in a bug report: it gives you something
specific to analyse.  Good luck ;-)

Imho, it'd be quite handy to keep a timeline of working/non-working
guests in a table somewhere, and which qemu versions and options they
were observed to work or break with.

 Also, thanks to Andre Przywara, the whole winNT thing works, but it requires
 -cpu qemu64,level=1 (or level=2 or =3) -- _not_ the default CPU.  This
 is also testing, but it's not obvious what to do with the result...

Doesn't WinNT work with qemu32 or kvm32?
It's a 32-bit OS after all.

- Jamie


Re: [Qemu-devel] [RFC] Bug Day - June 1st, 2010

2010-05-18 Thread Jamie Lokier
Natalia Portillo wrote:
 Hi,
 
  - We'll try to migrate as many confirmable bugs from the Source Forge 
  tracker to Launchpad.
 I think that part of the bug day should also include retesting OSes that
 appear in the OS Support List as having bugs, and confirming whether each
 bug is still present and whether it's in Launchpad or not.

There have been reports of several legacy OSes being unable to install
or boot in the newer qemu while working in the older one.  They're
probably not in the OS Support List though.  Are they effectively
uninteresting for the purpose of the 0.13 release?

Unfortunately I doubt I will have time to participate in the Bug Day.

Thanks,
-- Jamie



Re: [Qemu-devel] Re: Another SIGFPE in display code, now in cirrus

2010-05-13 Thread Jamie Lokier
Stefano Stabellini wrote:
  I think we need to consider only dstpitch for a full invalidate.  We 
  might be copying an offscreen bitmap into the screen, and srcpitch is 
  likely to be the bitmap width instead of the screen pitch.
 
 Agreed.

Even when copying on-screen (or partially on-screen), the srcpitch
does not affect the invalidated area.  The source area might be
strange (parallelogram, single line repeated), but srcpitch should
only affect whether qemu_console_copy can be used, not the
invalidation.

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-05-12 Thread Jamie Lokier
Gerhard Wiesinger wrote:
 On Wed, 21 Apr 2010, Jamie Lokier wrote:
 
 Gerhard Wiesinger wrote:
 Hmmm. I'm very new to QEMU and KVM but at least accessing the virtual HW
 of QEMU even from KVM must be possible (e.g. memory and port accesses are
 done on nearly every virtual device) and therefore I'm ending in C code in
 the QEMU hw/*.c directory. Therefore the VGA memory area should also be
 accessible from KVM, but with the specialized and fast memory
 access of QEMU.  Am I missing something?
 
 What you're missing is that when KVM calls out to QEMU to handle
 hw/*.c traps, that call is very slow.  It's because the hardware-VM
 support is a bit slow when the trap happens, and then the call
 from KVM in the kernel up to QEMU is a bit slow again.  Then all the
 way back.  It adds up to a lot, for every I/O operation.
 
 Isn't that then a general problem of KVM virtualization (or hardware
 virtualization in general)? Is this CPU-dependent (AMD vs. Intel)?

Yes it is a general problem, but KVM emulates some time-critical
things in the kernel (like APIC and CPU instructions), so it's not too bad.

KVM is about 5x faster than TCG for most things, and slower for a few
things, so on balance it is usually faster.

The slow 256-colour mode writes sound like just a simple bug, though.
No need for complicated changes.

 In 256-colour mode, KVM should be writing to the VGA memory at high
 speed a lot like normal RAM, not trapping at the hardware-VM level,
 and not calling up to the code in hw/*.c for every byte.
 
 Yes, same picture for me: 256 color mode should be only a memory write (16
 color mode is more difficult as the pixel/byte mapping is not the same).
 But it looks like this isn't the case in this test scenario.
 
 You might double-check if your guest is using VGA Mode X.  (See 
 Wikipedia.)
 
 That was a way to accelerate VGA on real PCs, but it will be slow in
 KVM for the same reasons as 16-colour mode.
 
 Which way do you mean?

Look up Mode X on Wikipedia if you're interested, but it isn't
relevant to the problem you've reported.  Mode X cannot be enabled
with a BIOS call; it's a VGA hardware programming trick.  It would not
be useful in a VM environment.

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-05-12 Thread Jamie Lokier
Gerhard Wiesinger wrote:
 Can one switch to the old software vmm in VMWare?

Perhaps you can install a very old version of VMWare.
Maybe run it under KVM ;-)

 That was one of the reasons why I was looking for alternatives for 
 graphical DOS programs. Overall summary so far:
 1.) QEMU without KVM: Problem with 286 DOS Extender instruction set, but 
 fast VGA
 2.) QEMU with KVM: 286 DOS Extender apps ok, but slow VGA memory 
 performance
 3.) VMWare Server 2.0 under Linux, application ok, but slow VGA memory 
 performance
 4.) Virtual PC: Problems with 286 DOS Extender
 5.) Bochs: Works well, but very slow.

I would be interested in the 286 DOS Extender issue, as I'd like to
use some 286 programs in QEMU at some point.

There were some changes to KVM in the kernel recently.  Were those
needed to get the 286 apps working?

 Looks like VMWare Server and QEMU with KVM may have the same
 architectural problem: going through the whole slow chain from guest OS to
 virtualization layer for VGA writes.

They do have a similar architecture.

The VGA write speed is a bit surprising, as it should be fast in
256-colour non-Mode-X modes for both.  But maybe there's something
we've missed that makes it architecturally slow.  It will be
interesting to see what you find :-)

Thanks,
-- Jamie


Re: [Qemu-devel] Re: Another SIGFPE in display code, now in cirrus

2010-05-12 Thread Jamie Lokier
Stefano Stabellini wrote:
 On Wed, 12 May 2010, Avi Kivity wrote:
  It's useful if you have a one-line horizontal pattern you want to 
  propagate all over.
  
 It might be useful all right, but it is not entirely clear what the
 hardware should do in this situation from the documentation we have, and
 certainly the current state of the cirrus emulation code doesn't help.

It's quite a reasonable thing for hardware to do, even if not documented.
It would be surprising if the hardware didn't copy the one-line pattern.

-- Jamie


Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-06 Thread Jamie Lokier
Rusty Russell wrote:
  Seems over-zealous.
  If the recovery_header held a strong checksum of the recovery_data you would
  not need the first fsync, and as long as you have two places to write
  recovery data, you don't need the 3rd and 4th syncs.
  Just:

  write_internally_checksummed_recovery_data_and_header_to_unused_log_space()
fsync / msync
overwrite_with_new_data()
  
  To recover you choose the most recent log_space and replay the content.
  That may be a redundant operation, but that is no loss.
 
 I think you missed a checksum for the new data?  Otherwise we can't tell if
 the new data is completely written.

The data checksum can go in the recovery-data block.  If there's
enough slack in the log, by the time that recovery-data block is
overwritten, you can be sure that an fsync has been done for that
data (by a later commit).
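
As a concrete sketch of that scheme (my illustration, not TDB code - the
slot layout, commit() helper and FNV-1a checksum are all invented for
the example):

#include <stdint.h>
#include <unistd.h>

/* Hypothetical log-slot layout: the header checksums itself (with the
 * sum field zeroed) plus the payload, so a torn write is detected at
 * replay time and the slot is ignored - no separate header-commit
 * fsync is needed. */
struct log_slot_hdr {
    uint64_t seq;       /* monotonically increasing commit number */
    uint32_t data_len;  /* payload bytes following the header */
    uint32_t sum;       /* checksum; zero while being computed */
};

static uint32_t fnv1a(uint32_t h, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len--)
        h = (h ^ *p++) * 16777619u;
    return h;
}

int commit(int log_fd, off_t slot_off, uint64_t seq,
           const void *data, uint32_t len)
{
    struct log_slot_hdr hdr = { seq, len, 0 };
    hdr.sum = fnv1a(fnv1a(2166136261u, &hdr, sizeof hdr), data, len);

    /* Two writes, then the single fsync, then overwrite in place. */
    if (pwrite(log_fd, &hdr, sizeof hdr, slot_off) != sizeof hdr ||
        pwrite(log_fd, data, len, slot_off + (off_t)sizeof hdr)
            != (ssize_t)len)
        return -1;
    return fsync(log_fd);  /* next: overwrite_with_new_data() */
}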

 But yes, I will steal this scheme for TDB2, thanks!

Take a look at the filesystems.  I think ext4 did some optimisations
in this area, and that checksums had to be added anyway due to a
subtle replay-corruption problem that happens when the log is
partially corrupted, and followed by non-corrupt blocks.

Also, you can remove even more fsyncs by adding a bit of slack to the
data space and writing into unused/fresh areas some of the time -
i.e. a bit like btrfs/zfs or anything log-structured, but you don't
have to go all the way with that.

 In practice, it's the first sync which is glacial, the rest are pretty cheap.

The 3rd and 4th fsyncs imply a disk seek each, just because the
preceding writes are to different areas of the disk.  Seeks are quite
slow - but not as slow as ext3 fsyncs :-) What do you mean by cheap?
That it's only a couple of seeks, or that you don't see even that?

 
  Also cannot see the point of msync if you have already performed an fsync,
  and if there is a point, I would expect you to call msync before
  fsync... Maybe there is some subtlety there that I am not aware of.
 
 I assume it's this from the msync man page:
 
msync()  flushes  changes  made  to the in-core copy of a file that was
mapped into memory using mmap(2) back to disk.   Without  use  of  this
    call  there  is  no guarantee that changes are written back before
    munmap(2) is called.

Historically, that means msync() ensures dirty mapping data is written
to the file as if with write(), and that mapping pages are removed or
refreshed to get the effect of read() (possibly a lazy one).  It's
more obvious in the early mmap implementations where mappings don't
share pages with the filesystem cache, so msync() has explicit
behaviour.

Like with write(), after calling msync() you would then call fsync()
to ensure the data is flushed to disk.
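
A minimal sketch of that ordering (assumes the file is at least one
page long and msg fits in it; nothing qemu-specific here):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int update_mapped(const char *path, const char *msg)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, msg, strlen(msg));  /* dirty the mapping          */
    msync(map, 4096, MS_SYNC);      /* mapping -> file (write())  */
    fsync(fd);                      /* file -> storage device     */

    munmap(map, 4096);
    close(fd);
    return 0;
}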

If you've been calling fsync then msync, I guess that's another fine
example of how these functions are so hard to test that they aren't.

Historically on Linux, msync has been iffy on some architectures, and
I'm still not sure it has the same semantics as on other unixes.  fsync
as we know has also been iffy, and even now that fsync is tidier it
does not always issue a hardware-level cache commit.

But then historically writable mmap has been iffy on a boatload of
unixes.

   It's an implementation detail; barrier has less flexibility because it has
   less information about what is required. I'm saying I want to give you as
   much information as I can, even if you don't use it yet.
  
  Only we know that approach doesn't work.
  People will learn that they don't need to give the extra information to
  still achieve the same result - just like they did with ext3 and fsync.
  Then when we improve the implementation to only provide the guarantees that
  you asked for, people will complain that they are getting empty files that
  they didn't expect.
 
 I think that's an oversimplification: IIUC that occurred to people *not*
 using fsync().  They weren't using it because it was too slow.  Providing
 a primitive which is as fast or faster and more specific doesn't have the
 same magnitude of social issues.

I agree with Rusty.  Let's make it perform well so there is no reason
to deliberately avoid using it, and let's make it say what apps actually
want to request, without being way too strong.

And please, if anyone has ideas on how we could make correct use of
these functions *testable* by app authors, I'm all ears.  Right now it
is quite difficult - pulling power on hard disks mid-transaction is
not a convenient method :)

  The abstraction I would like to see is a simple 'barrier' that contains no
  data and has a filesystem-wide effect.
 
 I think you lack ambition ;)
 
 Thinking about the single-file use case (eg. kvm guest or tdb), isn't that
 suboptimal for md?  Since you have to hand your barrier to every device
 whereas a file-wide primitive may theoretically only go to some.

Yes.

Note that database-like programs still need fsync-like behaviour
*sometimes*: The D in ACID depends on it, and the C 

Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-06 Thread Jamie Lokier
Rusty Russell wrote:
 On Wed, 5 May 2010 05:47:05 am Jamie Lokier wrote:
  Jens Axboe wrote:
   On Tue, May 04 2010, Rusty Russell wrote:
ISTR someone mentioning a desire for such an API years ago, so CC'ing
the usual I/O suspects...
   
   It would be nice to have a more fuller API for this, but the reality is
   that only the flush approach is really workable. Even just strict
   ordering of requests could only be supported on SCSI, and even there the
   kernel still lacks proper guarantees on error handling to prevent
   reordering there.
  
  There's a few I/O scheduling differences that might be useful:
  
  1. The I/O scheduler could freely move WRITEs before a FLUSH but not
 before a BARRIER.  That might be useful for time-critical WRITEs,
 and those issued by high I/O priority.
 
 This is only because no one actually wants flushes or barriers, though
 I/O people seem to only offer that.  We really want "these writes must
 occur before this write".  That offers maximum choice to the I/O subsystem
 and potentially to smart (virtual?) disks.

We do want flushes for the D in ACID - for such things as receiving
a mail or blog update into a database file (could be TDB), and
confirming that to the sender, to have high confidence that the
update won't disappear on system crash or power failure.

Less obviously, it's also needed for the C in ACID when more than
one file is involved.  C is about differently updated things staying
consistent with each other.

For example, imagine you have a TDB file mapping Samba usernames to
passwords, and another mapping Samba usernames to local usernames.  (I
don't know if you do this; it's just an illustration).

To rename a Samba user involves updating both.  Let's ignore transient
transactional issues :-) and just think about what happens with
per-file barriers and no sync, when a crash happens long after the
updates, and before the system has written out all data and issued low
level cache flushes.

After restarting, due to lack of sync, the Samba username could be
present in one file and not the other.

  2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is
 only for data belonging to a particular file (e.g. fdatasync with
 no file size change, even on btrfs if O_DIRECT was used for the
 writes being committed).  That would entail tagging FLUSHes and
 WRITEs with a fs-specific identifier (such as inode number), opaque
 to the scheduler which only checks equality.
 
 This is closer.  In userspace I'd be happy with an "all prior writes to this
 struct file before all future writes".  Even if the original guarantees were
 stronger (ie. inode basis).  We currently implement transactions using 4 fsync
 /msync pairs.
 
   write_recovery_data(fd);
   fsync(fd);
   msync(mmap);
   write_recovery_header(fd);
   fsync(fd);
   msync(mmap);
   overwrite_with_new_data(fd);
   fsync(fd);
   msync(mmap);
   remove_recovery_header(fd);
   fsync(fd);
   msync(mmap);
 
 Yet we really only need ordering, not guarantees about it actually hitting
 disk before returning.
 
  In other words, FLUSH can be more relaxed than BARRIER inside the
  kernel.  It's ironic that we think of fsync as stronger than
  fbarrier outside the kernel :-)
 
 It's an implementation detail; barrier has less flexibility because it has
 less information about what is required. I'm saying I want to give you as
 much information as I can, even if you don't use it yet.

I agree, and I've started a few threads about it over the last couple of years.

An fsync_range() system call would be very easy to use and
(most importantly) easy to understand.

With optional flags to weaken it (into fdatasync, barrier without sync,
sync without barrier, one-sided barrier, no lowlevel cache-flush, don't rush,
etc.), it would be very versatile, and still easy to understand.
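
As a sketch of what such an interface might look like - entirely
hypothetical flag names, and not the NetBSD fsync_range() nor any
existing Linux syscall:

#include <sys/types.h>

/* Hypothetical interface along the lines described above. */
#define FSYNC_R_DATAONLY (1 << 0) /* like fdatasync: skip metadata          */
#define FSYNC_R_BARRIER  (1 << 1) /* ordering only, don't wait for media    */
#define FSYNC_R_NOFLUSH  (1 << 2) /* skip the low-level device cache flush  */
#define FSYNC_R_LAZY     (1 << 3) /* "don't rush": complete when convenient */

int fsync_range(int fd, off_t offset, off_t length, unsigned flags);

/* Example: make one dirty record durable; file metadata isn't needed. */
/*     fsync_range(db_fd, rec_off, rec_len, FSYNC_R_DATAONLY);         */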

With an AIO version, and another flag meaning "don't rush, just return
when satisfied", I suspect it would be useful for the most
demanding I/O apps.

-- Jamie


Re: [Qemu-devel] question on virtio

2010-05-06 Thread Jamie Lokier
Michael S. Tsirkin wrote:
 Hi!
 I see this in virtio_ring.c:
 
 /* Put entry in available array (but don't update avail->idx
  * until they do sync). */
 
 Why is it done this way?
 It seems that updating the index straight away would be simpler, while
 this might allow the host to speculatively look up the buffer and handle
 it, without waiting for the kick.

Even better, if the host updates a location containing which index it
has seen recently, you can avoid the kick entirely during sustained
flows - just like your recent patch to avoid sending irqs to the
guest.
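
A sketch of that idea (the struct and field names are illustrative
only, not the virtio ring ABI):

#include <stdint.h>

struct ring_shared {
    volatile uint16_t guest_avail_idx; /* written by guest */
    volatile uint16_t host_seen_idx;   /* written by host  */
};

static void guest_add_buffer(struct ring_shared *r, void (*kick)(void))
{
    uint16_t old = r->guest_avail_idx;
    r->guest_avail_idx = (uint16_t)(old + 1); /* publish new buffer */
    __sync_synchronize();          /* order publish before the check */
    if (r->host_seen_idx == old)   /* host had caught up, so it may  */
        kick();                    /* be idle: notify it             */
}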

-- Jamie


Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Jamie Lokier
Jens Axboe wrote:
 On Tue, May 04 2010, Rusty Russell wrote:
  ISTR someone mentioning a desire for such an API years ago, so CC'ing the
  usual I/O suspects...
 
 It would be nice to have a more fuller API for this, but the reality is
 that only the flush approach is really workable. Even just strict
 ordering of requests could only be supported on SCSI, and even there the
 kernel still lacks proper guarantees on error handling to prevent
 reordering there.

There's a few I/O scheduling differences that might be useful:

1. The I/O scheduler could freely move WRITEs before a FLUSH but not
   before a BARRIER.  That might be useful for time-critical WRITEs,
   and those issued by high I/O priority.

2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is
   only for data belonging to a particular file (e.g. fdatasync with
   no file size change, even on btrfs if O_DIRECT was used for the
   writes being committed).  That would entail tagging FLUSHes and
   WRITEs with a fs-specific identifier (such as inode number), opaque
   to the scheduler which only checks equality.

3. By delaying FLUSHes through reordering as above, the I/O scheduler
   could merge multiple FLUSHes into a single command.

4. On MD/RAID, BARRIER requires every backing device to quiesce before
   sending the low-level cache-flush, and all of those to finish
   before resuming each backing device.  FLUSH doesn't require as much
   synchronising.  (With per-file FLUSH; see 2; it could even avoid
   FLUSH altogether to some backing devices for small files).

In other words, FLUSH can be more relaxed than BARRIER inside the
kernel.  It's ironic that we think of fsync as stronger than
fbarrier outside the kernel :-)
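
To make point 2 concrete, here is a sketch of how an I/O scheduler
could use such tags - names invented for illustration, not from any
kernel:

#include <stdint.h>

enum io_op { IO_WRITE, IO_FLUSH, IO_BARRIER };

struct io_req {
    enum io_op op;
    uint64_t   tag;  /* per-file id, e.g. inode; 0 = ordered vs. everything */
};

/* May a new WRITE be moved ahead of an earlier queued request? */
static int write_may_pass(const struct io_req *wr,
                          const struct io_req *earlier)
{
    switch (earlier->op) {
    case IO_BARRIER:
        return 0;                          /* never pass a barrier      */
    case IO_FLUSH:
        return earlier->tag != 0 &&        /* pass a flush only when it */
               earlier->tag != wr->tag;    /* belongs to another file   */
    default:
        return 1;                          /* plain writes may reorder  */
    }
}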

-- Jamie


Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Jamie Lokier
Rusty Russell wrote:
 On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:
  I took a stub at documenting CMD and FLUSH request types in virtio
  block.  Christoph, could you look over this please?
  
  I note that the interface seems full of warts to me,
  this might be a first step to cleaning them.
 
 ISTR Christoph had withdrawn some patches in this area, and was waiting
 for him to resubmit?
 
 I've given up on figuring out the block device.  What seem to me to be sane
 semantics along the lines of memory barriers are foreign to disk people: they
 want (and depend on) flushing everywhere.
 
 For example, tdb transactions do not require a flush, they only require what
 I would call a barrier: that prior data be written out before any future data.
 Surely that would be more efficient in general than a flush!  In fact, TDB
 wants only writes to *that file* (and metadata) written out first; it has no
 ordering issues with other I/O on the same device.

I've just posted elsewhere on this thread that an I/O level flush can
be more efficient than an I/O level barrier (implemented using a
cache-flush really), because the barrier has stricter ordering
requirements at the I/O scheduling level.

By the time you work up to tdb, another way to think of it is
distinguishing "eager fsync" from "fsync, but I'm not in a hurry -
delay as long as is convenient".  The latter makes much more sense
with AIO.

 A generic I/O interface would allow you to specify "this request
 depends on these outstanding requests" and leave it at that.  It
 might have some "sync flush" command for dumb applications and OSes.

For filesystems, it would probably be easy to label in-place
overwrites and fdatasync data flushes, when there's no file extension,
with an opaque per-file identifier for certain operations.  Typically
over-writing in place and fdatasync would match up and wouldn't need
ordering against anything else.  Other operations would tend to get
labelled as ordered against everything, including these.

-- Jamie


Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1

2010-04-23 Thread Jamie Lokier
Yoshiaki Tamura wrote:
 Jamie Lokier wrote:
 Yoshiaki Tamura wrote:
 Dor Laor wrote:
 On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
 Event tapping is the core component of Kemari, and it decides on which
 event the
 primary should synchronize with the secondary. The basic assumption
 here is
 that outgoing I/O operations are idempotent, which is usually true for
 disk I/O
 and reliable network protocols such as TCP.
 
  IMO any type of network event should be stalled too. What if the VM runs
  a non-TCP protocol, and the packet that the master node sent reached some
  remote client, and before the sync to the slave the master failed?
 
 In current implementation, it is actually stalling any type of network
 that goes through virtio-net.
 
 However, if the application was using unreliable protocols, it should have
  its own recovery mechanism, or it should be completely stateless.
 
 Even with unreliable protocols, if slave takeover causes the receiver
 to have received a packet that the sender _does not think it has ever
 sent_, expect some protocols to break.
 
 If the slave replaying master's behaviour since the last sync means it
 will definitely get into the same state of having sent the packet,
 that works out.
 
 That's something we're expecting now.
 
 But you still have to be careful that the other end's responses to
 that packet are not seen by the slave too early during that replay.
 Otherwise, for example, the slave may observe a TCP ACK to a packet
 that it hasn't yet sent, which is an error.
 
 Even though the current implementation syncs just before network output,
 what you pointed out could happen.  In this case, would the connection be
 lost, or would client/server recover from it?  If the latter, it would be
 fine; otherwise I wonder how people doing similar things are handling this
 situation.

In the case of TCP in a synchronised state, I think it will recover
according to the rules in RFC 793.  In an unsynchronised state
(during connection), I'm not sure if it recovers or if it looks like a
"Connection reset" error.  I suspect it does recover but I'm not certain.

But that's TCP.  Other protocols, such as over UDP, may behave
differently, because this is not an anticipated behaviour of a
network.

 However there is one respect in which they're not idempotent:
 
 The TTL field should be decreased if packets are delayed.  Packets
 should not appear to live in the network for longer than TTL seconds.
 If they do, some protocols (like TCP) can react to the delayed ones
 differently, such as sending a RST packet and breaking a connection.
 
 It is acceptable to reduce TTL faster than the minimum.  After all, it
 is reduced by 1 on every forwarding hop, in addition to time delays.
 
 So the problem is, when the slave takes over, it sends a packet with the
 same TTL, which the client may already have received.

Yes.  I guess this is a general problem with time-based protocols and
virtual machines getting stopped for 1 minute (say), without knowing
that real time has moved on for the other nodes.

Some application transaction, caching and locking protocols will give
wrong results when their time assumptions are discontinuous to such a
large degree.  It's a bit nasty to impose that on them after they
worked so hard on their reliability :-)

However, I think such implementations _could_ be made safe if those
programs can arrange to definitely be interrupted with a signal when
the discontinuity happens.  Of course, only if they're aware they may
be running on a Kemari system...

I have an intuitive idea that there is a solution to that, but each
time I try to write the next paragraph explaining it, some little
complication crops up and it needs more thought.  Something about
concurrent, asynchronous transactions to keep the master running while
recording the minimum states that replay needs to be safe, while
slewing the replaying slave's virtual clock back to real time quickly
during recovery mode.

-- Jamie


Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1

2010-04-22 Thread Jamie Lokier
Yoshiaki Tamura wrote:
 Dor Laor wrote:
 On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
 Event tapping is the core component of Kemari, and it decides on which
 event the
 primary should synchronize with the secondary. The basic assumption
 here is
 that outgoing I/O operations are idempotent, which is usually true for
 disk I/O
 and reliable network protocols such as TCP.
 
 IMO any type of network event should be stalled too. What if the VM runs
 a non-TCP protocol, and the packet that the master node sent reached some
 remote client, and before the sync to the slave the master failed?
 
 In current implementation, it is actually stalling any type of network 
 that goes through virtio-net.
 
 However, if the application was using unreliable protocols, it should have 
 its own recovery mechanism, or it should be completely stateless.

Even with unreliable protocols, if slave takeover causes the receiver
to have received a packet that the sender _does not think it has ever
sent_, expect some protocols to break.

If the slave replaying master's behaviour since the last sync means it
will definitely get into the same state of having sent the packet,
that works out.

But you still have to be careful that the other end's responses to
that packet are not seen by the slave too early during that replay.
Otherwise, for example, the slave may observe a TCP ACK to a packet
that it hasn't yet sent, which is an error.

About IP idempotency:

In general, IP packets are allowed to be lost or duplicated in the
network.  All IP protocols should be prepared for that; it is a basic
property.

However there is one respect in which they're not idempotent:

The TTL field should be decreased if packets are delayed.  Packets
should not appear to live in the network for longer than TTL seconds.
If they do, some protocols (like TCP) can react to the delayed ones
differently, such as sending a RST packet and breaking a connection.

It is acceptable to reduce TTL faster than the minimum.  After all, it
is reduced by 1 on every forwarding hop, in addition to time delays.
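
For illustration, aging the TTL of a delayed or replayed packet could
be as simple as this sketch (assuming the classic one-second TTL unit):

#include <stdint.h>

/* Hypothetical helper: before (re)sending a packet that was held for
 * delay_secs, age its TTL by at least 1 so it can never appear to
 * outlive its original time budget. */
static uint8_t age_ttl(uint8_t ttl, unsigned delay_secs)
{
    unsigned dec = delay_secs > 0 ? delay_secs : 1;
    return ttl > dec ? (uint8_t)(ttl - dec) : 0;  /* 0 means: drop it */
}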

 I currently don't have good numbers that I can share right now.
 Snapshots/sec depends on what kind of workload is running, and if the 
 guest was almost idle, there will be no snapshots in 5sec.  On the other 
 hand, if the guest was running I/O intensive workloads (netperf, iozone 
 for example), there will be about 50 snapshots/sec.

That is a really satisfying number, thank you :-)

Without this work I wouldn't have imagined that synchronised machines
could work with such a low transaction rate.

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Avi Kivity wrote:
 On 04/19/2010 10:14 PM, Gerhard Wiesinger wrote:
 Hello,
 
 Finally I got QEMU-KVM to work but video performance under DOS is very 
 low (QEMU 0.12.3 stable and QEMU GIT master branch is fast, QEMU KVM 
 is slow)
 
 I'm measuring 2 performance critical video performance parameters:
 1.) INT 10h, function AX=4F05h (set same window/set window/get window)
 2.) Memory performance to segment page A000h
 
 So BIOS performance (which might be port performance to the VGA
 index/value port) is about a factor of 5 slower; memory performance is
 about a factor of 100 slower.
 
 QEMU 0.12.3 and QEMU GIT performance is the same (in the measurement 
 tolerance) and listed only once, QEMU KVM is much more slower (details 
 see below).
 
 Test programs can be provided, source code will be release soon.
 
 Any ideas why KVM is so slow? 
 
 16-color vga is slow because kvm cannot map the framebuffer to the guest 
 (writes are not interpreted as RAM writes).  256+-color vga should be 
 fast, except when switching the vga window.  Note it's only fast on 
 average, the first write into a page will be slow as kvm maps it in.

I don't understand: why is 256+-colour mappable and 16-colour not mappable?

Is this a case where TCG would run significantly faster for code blocks
that have been detected to access the VGA memory?

 Which mode are you using?
 
 Any ideas for improvement?
 
 Currently when the physical memory map changes (which is what happens 
 when the vga window is updated), kvm drops the entire shadow cache.  
 It's possible to do this only for vga memory, but not easy.

If it's a page fault handled in the kernel, I would expect it to be
about as fast as those old VGA DOS-extender drivers which provide the
illusion of a single flat mapping, and bank switch on page faults -
multiplied by the speed of modern CPUs compared with then.  For many
graphics things those DOS-extender drivers worked perfectly well.

If it's a trap out to qemu on every vga window change, perhaps not
quite so well.

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Avi Kivity wrote:
 Writes to vga in 16-color mode don't set a memory location to a
 value; instead they change multiple memory locations.

While code is just writing to the VGA memory, not reading(*) and not
touching the VGA I/O registers that control the write latches, is it
possible in principle to swizzle the format around in memory to make
regular writes work?

(*) Reading should be ok for some settings of the write latches, I
think.

I wonder if guests of interest behave like that.

 Is this a case where TCG would run significantly faster for code blocks
 that have been detected to access the VGA memory?
 
 Yes.

$ date
Wed Apr 21 19:37:38 2015
$ modprobe ktcg
;-)

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Gerhard Wiesinger wrote:
 Would it be possible to handle these writes through QEMU directly (without
 KVM), because performance there is very good (looking at the code there
 is some pointer arithmetic and some memory writes done)?
 
 I've noticed extremely slow VGA performance too, when installing OSes.
 It makes the difference between installing in a few minutes, and
 installing taking hours - just because of the slow VGA.
 
 So generally I use qemu for installing old versions of Windows, then
 change to KVM to run them after installing.
 
 Switching between KVM and qemu automatically based on guest code
 behaviour, and making both memory models and device models compatible
 at run time, is a difficult thing.  I guess it's not worth the
 difficulty just to speed up VGA.
 
 I think this is very easy to distinguish:
 1.) VGA segment A000 is legacy and should be handled through QEMU
 and not through KVM (because it is much faster). Also 16 color modes
 should be fast enough there.
 2.) All other flat PCI memory accesses should be handled through KVM 
 (there is a specialized driver loaded for that PCI device in the non 
 legacy OS).
 
 Is that easily possible?

No it isn't.  Distinguishing addresses is trivial.  You've ignored the
hard part, which is switching between different virtualisation
architectures...

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Avi Kivity wrote:
 On 04/21/2010 09:39 PM, Jamie Lokier wrote:
 Avi Kivity wrote:

 Writes to vga in 16-color mode don't set a memory location to a
 value; instead they change multiple memory locations.
  
 While code is just writing to the VGA memory, not reading(*) and not
 touching the VGA I/O register that control the write latches, is it
 possible in principle to swizzle the format around in memory to make
 regular writes work?

 
 Not in software.  We can map pages, not cross address lines.

Hence swizzle.  You rearrange the data inside the page for the
crossed address lines, and undo the swizzle later on demand.  That
doesn't work for other VGA magic though.
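
For the record, the planar half of such a swizzle amounts to something
like this - a sketch of unpacking one byte-column of the four
16-colour bitplanes into byte-per-pixel form (my illustration, not
qemu code):

#include <stdint.h>

/* Each pixel's 4 bits live at the same bit position across 4 planes;
 * this expands 8 pixels (one byte from each plane) to a byte each.
 * The swizzle would apply this per page and re-pack on demand. */
static void unpack_planar_byte(const uint8_t plane[4], uint8_t out[8])
{
    for (int bit = 7; bit >= 0; bit--) {
        uint8_t pix = 0;
        for (int p = 0; p < 4; p++)
            pix |= (uint8_t)(((plane[p] >> bit) & 1) << p);
        out[7 - bit] = pix;  /* leftmost pixel is the high bit */
    }
}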

 Guests that use 16 color vga are usually of little interest.

Fair enough.  We can move on :-)

It's been said that the super-slow VGA writes triggering this thread
are in 256-colour mode, so there's a different problem.  That should
be fast, shouldn't it?

I vaguely recall extremely slow OS installs I've seen in KVM, which
were fast in QEMU (and fast in KVM after installing), were using text
mode.  Possibly it was Windows 2000, or Windows Server 2003.  Text
mode should be fast too, shouldn't it?  I suppose it's possible that
it just looked like text mode and was really 16-colour mode.

-- Jamie


Re: [Qemu-devel] Re: QEMU-KVM and video performance

2010-04-21 Thread Jamie Lokier
Gerhard Wiesinger wrote:
 Hmmm. I'm very new to QEMU and KVM but at least accessing the virtual HW 
 of QEMU even from KVM must be possible (e.g. memory and port accesses are 
 done on nearly every virtual device) and therefore I'm ending in C code in
 the QEMU hw/*.c directory. Therefore the VGA memory area should also be
 accessible from KVM, but with the specialized and fast memory
 access of QEMU.  Am I missing something?

What you're missing is that when KVM calls out to QEMU to handle
hw/*.c traps, that call is very slow.  It's because the hardware-VM
support is a bit slow when the trap happens, and then the the call
from KVM in the kernel up to QEMU is a bit slow again.  Then all the
way back.  It adds up to a lot, for every I/O operation.

When QEMU does the same thing, it's fast because it's inside the same
process; it's just a function call.

That's why the most often called devices are emulated separately in
KVM's kernel code, things like the interrupt controller, timer chip
etc.  It's also why individual instructions that need help are
emulated in KVM's kernel code, instead of passing control up to QEMU
just for one instruction.

 BTW: It's still not clear why performance is low with KVM, since there are
 no window changes involved in the testcase which could cause a (slow) page
 fault.

It sounds like a bug.  Avi gave suggestions about what to look for.
If it fixes my OS install speeds too, I'll be very happy :-)

In 256-colour mode, KVM should be writing to the VGA memory at high
speed a lot like normal RAM, not trapping at the hardware-VM level,
and not calling up to the code in hw/*.c for every byte.

You might double-check if your guest is using VGA Mode X.  (See Wikipedia.)

That was a way to accelerate VGA on real PCs, but it will be slow in
KVM for the same reasons as 16-colour mode.

-- Jamie


Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH

2010-04-19 Thread Jamie Lokier
Michael S. Tsirkin wrote:
 I took a stub at documenting CMD and FLUSH request types in virtio
 block.  Christoph, could you look over this please?
 
 I note that the interface seems full of warts to me,
 this might be a first step to cleaning them.
 
 One issue I struggled with especially is how type
 field mixes bits and non-bit values. I ended up
 simply defining all legal values, so that we have
 CMD = 2, CMD_OUT = 3 and so on.
 
 I also avoided introducing the inhdr/outhdr structures
 that the virtio blk driver in linux uses, as I was concerned
 that nesting tables would confuse the reader.
 
 Comments welcome.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 --
 
 diff --git a/virtio-spec.lyx b/virtio-spec.lyx
 index d16104a..ed35893 100644
 --- a/virtio-spec.lyx
 +++ b/virtio-spec.lyx
 @@ -67,7 +67,11 @@ IBM Corporation
  \end_layout
  
  \begin_layout Standard
 +
 +\change_deleted 0 1266531118
  FIXME: virtio block scsi passthrough section
 +\change_unchanged
 +
  \end_layout
  
  \begin_layout Standard
 @@ -4376,7 +4380,7 @@ struct virtio_net_ctrl_mac {
  The device can filter incoming packets by any number of destination MAC
   addresses.
  \begin_inset Foot
 -status open
 +status collapsed
  
  \begin_layout Plain Layout
  Since there are no guarentees, it can use a hash filter orsilently switch
 @@ -4549,6 +4553,22 @@ blk_size
  \end_inset
  
  .
 +\change_inserted 0 1266444580
 +
 +\end_layout
 +
 +\begin_layout Description
 +
 +\change_inserted 0 1266471229
 +VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
 +\end_layout
 +
 +\begin_layout Description
 +
 +\change_inserted 0 1266444605
 +VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
 +\change_unchanged
 +
  \end_layout
  
  \begin_layout Description
 @@ -4700,17 +4720,25 @@ struct virtio_blk_req {
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472188
 +
  #define VIRTIO_BLK_T_IN  0
  \end_layout
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472188
 +
  #define VIRTIO_BLK_T_OUT 1
  \end_layout
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472188
 +
  #define VIRTIO_BLK_T_BARRIER  0x8000
 +\change_unchanged
 +
  \end_layout
  
  \begin_layout Plain Layout
 @@ -4735,11 +4763,15 @@ struct virtio_blk_req {
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472204
 +
  #define VIRTIO_BLK_S_OK0
  \end_layout
  
  \begin_layout Plain Layout
  
 +\change_deleted 0 1266472204
 +
  #define VIRTIO_BLK_S_IOERR 1
  \end_layout
  
 @@ -4759,32 +4791,481 @@ struct virtio_blk_req {
  \end_layout
  
  \begin_layout Standard
 -The type of the request is either a read (VIRTIO_BLK_T_IN) or a write 
 (VIRTIO_BL
 -K_T_OUT); the high bit indicates that this request acts as a barrier and
 - that all preceeding requests must be complete before this one, and all
 - following requests must not be started until this is complete.
 +
 +\change_inserted 0 1266472490
 +If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet
 + command requests, each of these requests is of form:
 +\begin_inset listings
 +inline false
 +status open
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472395
 +
 +struct virtio_scsi_pc_req {
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472375
 +
 + u32 type;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472375
 +
 + u32 ioprio;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266474298
 +
 + u64 sector;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266474308
 +
 +char cmd[];
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266505809
 +
 + char data[][512];
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266505825
 +
 +#define SCSI_SENSE_BUFFERSIZE   96
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266505848
 +
 +u8 sense[SCSI_SENSE_BUFFERSIZE];
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472969
 +
 +u32 errors;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472979
 +
 +u32 data_len;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472984
 +
 +u32 sense_len;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472987
 +
 +u32 residual;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472375
 +
 + u8 status;
 +\end_layout
 +
 +\begin_layout Plain Layout
 +
 +\change_inserted 0 1266472375
 +
 +};
 +\end_layout
 +
 +\end_inset
 +
 +
 +\change_unchanged
 +
  \end_layout
  
  \begin_layout Standard
 -The ioprio field is a hint about the relative priorities of requests to
 - the device: higher numbers indicate more important requests.
 +The 
 +\emph on
 +type
 +\emph default
 + of the request is either a read (VIRTIO_BLK_T_IN)
 +\change_inserted 0 1266495815
 +,
 +\change_unchanged
 + 
 

Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-12 Thread Jamie Lokier
Mohammed Gamal wrote:
 On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier ja...@shareable.org wrote:
  Javier Guerra Giraldez wrote:
  On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal m.gamal...@gmail.com 
  wrote:
   On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier ja...@shareable.org 
   wrote:
   To throw a spanner in, the most widely supported filesystem across
   operating systems is probably NFS, version 2 :-)
  
   Remember that Windows usage on a VM is not some rare use case, and
   it'd be a little bit of a pain from a user's perspective to have to
   install a third party NFS client for every VM they use. Having
   something supported on the VM out of the box is a better option IMO.
 
  i don't think virtio-CIFS has any more support out of the box (on any
  system) than virtio-9P.
 
  It doesn't, but at least network-CIFS tends to work ok and is the
  method of choice for Windows VMs - when you can set up Samba on the
  host (which as previously noted you cannot always do non-disruptively
  with current Sambas).
 
  -- Jamie
 
 
 I think having support for both 9p and CIFS would be the best option.
 In that case the user will have the option to use either one,
 depending on how their guests support these filesystems. In that case
 I'd prefer to work on CIFS support while the 9p effort can still go
 on. I don't think both efforts are mutually exclusive.
 
 What do the rest of you guys think?

I only noted NFS because most old OSes do not support CIFS or 9P -
especially all the old unixes.

I don't think old versions of MS-DOS and Windows (95, 98, ME, NT4?)
even support current CIFS.  They need extra server settings to work,
such as setting passwords on the server to non-encrypted, and other quirks.

Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to
properly see symlinks and hard links.

So there is no really nice out of the box file service which works
easily with all guest OSes.

I'm guessing that out of all the filesystems, CIFS is the most widely
supported in recent OSes (released in the last 10 years).  But I'm not
really sure what the state of CIFS is for non-Windows, non-Linux,
non-BSD guests.

I'm not sure why 9P is being pursued.  Does anything much support it,
or do all OSes except quite recent Linux need a custom driver for 9P?
Even Linux only got the first commit in the kernel 5 years ago, so
probably it was only about 3 years ago that it will have begun
appearing in stable distros, if at all.  Filesystem passthrough to
Linux guests installed in the last couple of years is a useful
feature, and I know that for many people that is their only use of
KVM, but compared with CIFS' broad support it seems like quite a
narrow goal.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-12 Thread Jamie Lokier
Alexander Graf wrote:
 Also since -net user does support samba exporting already,

This I'm interested in.  Last time I tried to use it, the smb=
option didn't work because Samba refused to run when launched with
qemu's mini config file and launched as a regular user.  It needed
access to various hard-coded root-owned directories, and spewed lots
of errors about it into its logs.
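
(For reference, the invocation in question looks something like this;
the share path is an example, and the guest is then supposed to see the
share on the built-in SMB server as \\10.0.2.4\qemu:)

    qemu -net nic -net user,smb=/home/me/shared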

On closer inspection, those directories were hard-coded and could not
be changed by the config file, nor could the features they were for be
disabled.

Even if I gave it permission to write to those, by running kvm/qemu as
root, there was plenty of reason to worry that each instance of
qemu-spawned Samba may interfere with the others and with the host's
own, starting with errors spewed into log files by both Sambas.

So I had to give up on -net user,smb= completely :-(

Is this something that you at SuSE have fixed or simply never encountered?

My problems were with Debian and Ubuntu installations.  I suspect they
might have fixed some Samba problems by patching in different problems.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-11 Thread Jamie Lokier
Javier Guerra Giraldez wrote:
 On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal m.gamal...@gmail.com wrote:
  On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier ja...@shareable.org wrote:
  To throw a spanner in, the most widely supported filesystem across
  operating systems is probably NFS, version 2 :-)
 
  Remember that Windows usage on a VM is not some rare use case, and
  it'd be a little bit of a pain from a user's perspective to have to
  install a third party NFS client for every VM they use. Having
  something supported on the VM out of the box is a better option IMO.
 
 i don't think virtio-CIFS has any more support out of the box (on any
 system) than virtio-9P.

It doesn't, but at least network-CIFS tends to work ok and is the
method of choice for Windows VMs - when you can setup Samba on the
host (which as previously noted you cannot always do non-disruptively
with current Sambas).

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-10 Thread Jamie Lokier
Mohammed Gamal wrote:
 Hi Javier,
 Thanks for the link. However, I'm still concerned with
 interoperability with other operating systems, including non-Windows
 ones. I am not sure of how many operating systems actually support 9p,
 but I'm almost certain that CIFS would be more widely-supported.
 I am still a newbie as far as all this is concerned, so if anyone has
 any arguments as to whether which approach should be taken, I'd be
 enlightened to hear them.

To throw a spanner in, the most widely supported filesystem across
operating systems is probably NFS, version 2 :-)

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-09 Thread Jamie Lokier
Mohammed Gamal wrote:
 2- With respect to CIFS. I wonder how the shares are supposed to be
 exposed to the guest. Should the Samba server be modified to be able
 to use unix domain sockets instead of TCP ports and then QEMU
 communicating on these sockets. With that approach, how should the
 guest be able to see the exposed share? And what is the problem of
 using Samba with TCP ports?

One problem with TCP ports is it only works when the guest's network
is up :) You can't boot from that.  It also makes things fragile or
difficult if the guest work you are doing involves fiddling with the
network settings.

Doing it over virtio-serial would have many benefits.

On the other hand, Samba+TCP+CIFS does have the advantage of working
with virtually all guest OSes, including Linux / BSDs / Windows /
MacOSX / Solaris etc.  9P only works with Linux as far as I know.

A big problem with Samba at the moment is that it's not possible to
instantiate multiple instances of Samba any more, and not as a
non-root user.  That's because it contains some hard-coded paths to
directories of run-time state, at least on Debian/Ubuntu hosts where I
have tried and failed to use qemu's smb option, and there is no config
file option to disable that or even change all the paths.

Patching Samba to make per-user instantiations possible again would go
a long way to making it useful for filesystem passthrough.  Patching
it so you can turn off all the fancy features and have it _just_ serve
a filesystem with the most basic necessary authentication would be
even better.
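
(To illustrate, a per-user instance would want a config along these
lines.  These are real smb.conf options, but as noted above some of the
state directories were hard-coded and ignored any such override at the
time:)

    [global]
        private dir    = /tmp/qemu-smb.1234/private
        pid directory  = /tmp/qemu-smb.1234/pid
        lock directory = /tmp/qemu-smb.1234/lock
        log file       = /tmp/qemu-smb.1234/log.smbd

    [qemu]
        path = /home/me/shared
        read only = no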

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH v3 1/1] Shared memory uio_pci driver

2010-03-28 Thread Jamie Lokier
Avi Kivity wrote:
 ioctls encode the buffer size into the ioctl number, so in theory strace 
 doesn't need to be taught about an ioctl to show its buffer.

Unfortunately ioctl numbers don't always follow that rule :-(
But maybe that's just awful proprietary drivers that I've seen.
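
(For reference, the encoding convention in question, with a made-up FOO
device as the example:)

    #include <linux/ioctl.h>

    struct foo_args { int x; int y; };

    /* _IOR/_IOW pack direction, magic byte, command number and argument
     * size into the ioctl number, so sizeof(struct foo_args) is
     * recoverable from the number alone: */
    #define FOO_GET _IOR('f', 1, struct foo_args)
    #define FOO_SET _IOW('f', 2, struct foo_args)

    /* A tracer can then dump _IOC_SIZE(cmd) bytes at the argument
     * pointer, with _IOC_DIR(cmd) saying which way the data flows. */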

Anyway, strace should be taught how to read kernel headers to get
ioctl types ;-)

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-25 Thread Jamie Lokier
Cam Macdonell wrote:
 An irqfd can only trigger a single vector in a guest.  Right now I
 only have one eventfd per guest.  So ioeventfd/irqfd restricts the
 current implementation to a single vector that a guest can trigger.
 Without irqfd, eventfds can be used like registers: a writer writes the
 number of the vector they want to trigger, but as you point out it is racy.

It's not racy if you use a pipe instead of eventfd. :-)

Actually, why not?  A byte pipe between guests would be more versatile.

Could it even integrate with virtio-serial, somehow?
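
(A sketch of the difference; the evfd/pipefd setup is omitted:)

    /* eventfd is a counter: concurrent writers' values are summed, so
     * using the value as a vector number is racy. */
    uint64_t v = 3;
    write(evfd, &v, sizeof(v));     /* adds 3; a racing write of 5 makes 8 */

    /* a pipe is a byte stream: each posted vector number arrives intact
     * and in order. */
    uint8_t vec = 3;
    write(pipefd[1], &vec, 1);      /* the reader sees exactly 3 */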

-- Jamie

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: Completing big real mode emulation

2010-03-23 Thread Jamie Lokier
Avi Kivity wrote:
 Either way - then we should make the goal of the project to support those 
 old boot loaders. IMHO it should contain visibility. Doing theoretical 
 stuff is just less fun for all parties. Or does that stuff work already?
 
 Mostly those old guests aged beyond usefulness.  They are still broken, 
 but nobody installs new images.  Old images installed via workarounds work.

Hey :) I still install old OSes occasionally, so that I can build and
test code that will run on other people's still-running old machines.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Jamie Lokier
Avi Kivity wrote:
 On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
 
 2. Do we have kvm-specific projects? Can they be part of the QEMU project
 or do we need a different mentoring organization for it?

 
 Something really interesting is kvm-assisted tcg.  I'm afraid it's a bit 
 too complicated for GSoC.

Is this simpler: kvm-assisted user-mode emulation (no TCG involved)?

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Jamie Lokier
Paul Brook wrote:
   In a cross environment that becomes extremely hairy.  For example the x86
   architecture effectively has an implicit write barrier before every
   store, and an implicit read barrier before every load.
  
  Btw, x86 doesn't have any implicit barriers due to ordinary loads.
  Only stores and atomics have implicit barriers, afaik.
 
 As of March 2009[1] Intel guarantees that memory reads occur in
 order (they may only be reordered relative to writes). It appears
 AMD do not provide this guarantee, which could be an interesting
 problem for heterogeneous migration..

(Summary: At least on AMD64, it does too, for normal accesses to
naturally aligned addresses in write-back cacheable memory.)

Oh, that's interesting.  Way back when I guess we knew writes were in
order and it wasn't explicit that reads were, hence smp_rmb() using a
locked atomic.

Here is a post by Nick Piggin from 2007 with links to Intel _and_ AMD
documents asserting that reads to cacheable memory are in program order:

http://lkml.org/lkml/2007/9/28/212
Subject: [patch] x86: improved memory barrier implementation

Links to documents:

http://developer.intel.com/products/processor/manuals/318147.pdf

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

The Intel link doesn't work any more, but the AMD one does.

Nick asserts both manufacturers are committed to in-order loads from
cacheable memory for the x86 architecture.
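
(Concretely, that guarantee is what lets an x86 smp_rmb() drop from a
locked atomic to a plain compiler barrier; a sketch of the before and
after, not the kernel's code verbatim:)

    /* before: pessimistic serialising read barrier */
    #define smp_rmb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")

    /* after, given in-order cacheable loads: only the compiler needs
     * restraining; the hardware already keeps loads in program order */
    #define smp_rmb() barrier()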

I have just read the AMD document, and it is in there (but not
completely obviously), in section 7.2.  The implicit load-load and
store-store barriers are only guaranteed for normal cacheable
accesses on naturally aligned boundaries to WB [write-back cacheable]
memory.  There are also implicit load-store barriers but not
store-load.

Note that the document covers AMD64; it does not say anything about
their (now old) 32-bit processors.

 [*] The most recent docs I have handy. Up to and including Core-2 Duo.

Are you sure the read ordering applies to 32-bit Intel and AMD CPUs too?

Many years ago, before 64-bit x86 existed, I recall discussions on
LKML where it was made clear that stores were performed in program
order.  If it were known at the time that loads were performed in
program order on 32-bit x86s, I would have expected that to have been
mentioned by someone.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Jamie Lokier
Paul Brook wrote:
  However, coherence could be made host-type-independent by the host
  mapping and unmapping pages, so that each page is only mapped into one
  guest (or guest CPU) at a time.  Just like some clustering filesystems
  do to maintain coherence.
 
 You're assuming that a TLB flush implies a write barrier, and a TLB miss 
 implies a read barrier.  I'd be surprised if this were true in general.

The host driver itself can issue full barriers at the same time as it
maps pages on TLB miss, and would probably have to interrupt the
guest's SMP KVM threads to insert a full barrier when broadcasting a
TLB flush on unmap.

-- Jamie

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Jamie Lokier
Avi Kivity wrote:
 On 03/08/2010 03:03 PM, Paul Brook wrote:
 On 03/08/2010 12:53 AM, Paul Brook wrote:
  
 Support an inter-vm shared memory device that maps a shared-memory
 object as a PCI device in the guest.  This patch also supports
 interrupts between guest by communicating over a unix domain socket.
 This patch applies to the qemu-kvm repository.
  
 No. All new devices should be fully qdev based.
 
 I suspect you've also ignored a load of coherency issues, especially when
 not using KVM. As soon as you have shared memory in more than one host
 thread/process you have to worry about memory barriers.

 Shouldn't it be sufficient to require the guest to issue barriers (and
 to ensure tcg honours the barriers, if someone wants this with tcg)?.
  
 In a cross environment that becomes extremely hairy.  For example the x86
 architecture effectively has an implicit write barrier before every store, 
 and
 an implicit read barrier before every load.

 
 Ah yes.  For cross tcg environments you can map the memory using mmio 
 callbacks instead of directly, and issue the appropriate barriers there.

That makes sense.  It will force an mmio callback for every access to
the shared memory, which is ok for correctness but vastly slower when
running in TCG compared with KVM.

But it's hard to see what else could be done - those implicit write
barriers on x86 have to be emulated somehow.  For TCG without inter-vm
shared memory, those barriers aren't a problem.
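
(A sketch of such a callback pair, with a hypothetical state struct and
kernel-style barrier macros standing in for whatever qemu would use:)

    static uint32_t ivshmem_readl(void *opaque, target_phys_addr_t addr)
    {
        IVShmemState *s = opaque;   /* hypothetical */
        smp_rmb();                  /* emulate x86's implicit read barrier */
        return *(uint32_t *)(s->shm + addr);
    }

    static void ivshmem_writel(void *opaque, target_phys_addr_t addr,
                               uint32_t val)
    {
        IVShmemState *s = opaque;
        smp_wmb();                  /* emulate the implicit write barrier */
        *(uint32_t *)(s->shm + addr) = val;
    }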

Non-random-corruption guest behaviour is paramount, so I hope the
inter-vm device will add those mmio callbacks for the cross-arch case
before it sees much action.  (Strictly, it isn't cross-arch, but
host-has-more-relaxed-implicit-memory-model-than-guest.  I'm assuming
TCG doesn't reorder memory instructions).

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Jamie Lokier
Paul Brook wrote:
  On 03/08/2010 12:53 AM, Paul Brook wrote:
   Support an inter-vm shared memory device that maps a shared-memory
   object as a PCI device in the guest.  This patch also supports
   interrupts between guest by communicating over a unix domain socket. 
   This patch applies to the qemu-kvm repository.
  
   No. All new devices should be fully qdev based.
  
   I suspect you've also ignored a load of coherency issues, especially when
   not using KVM. As soon as you have shared memory in more than one host
   thread/process you have to worry about memory barriers.
  
  Shouldn't it be sufficient to require the guest to issue barriers (and
  to ensure tcg honours the barriers, if someone wants this with tcg)?.
 
 In a cross environment that becomes extremely hairy.  For example the x86 
 architecture effectively has an implicit write barrier before every store, 
 and 
 an implicit read barrier before every load.

Btw, x86 doesn't have any implicit barriers due to ordinary loads.
Only stores and atomics have implicit barriers, afaik.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Jamie Lokier
Alexander Graf wrote:
 Or we could put in some code that tells the guest the host shm  
 architecture and only accept x86 on x86 for now. If anyone cares for  
 other combinations, they're free to implement them.
 
 Seriously, we're looking at an interface designed for kvm here. Let's  
 please keep it as simple and fast as possible for the actual use case,  
 not some theoretically possible ones.

The concern is that a perfectly working guest image running on kvm,
the guest being some OS or app that uses this facility (_not_ a
kvm-only guest driver), is later run on qemu on a different host, and
then mostly works except for some silent data corruption.

That is not a theoretical scenario.

Well, the bit with this driver is theoretical, obviously :-)
But not the bit about moving to a different host.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-07 Thread Jamie Lokier
Paul Brook wrote:
  Support an inter-vm shared memory device that maps a shared-memory object
  as a PCI device in the guest.  This patch also supports interrupts between
  guest by communicating over a unix domain socket.  This patch applies to
   the qemu-kvm repository.
 
 No. All new devices should be fully qdev based.
 
 I suspect you've also ignored a load of coherency issues, especially when not 
 using KVM. As soon as you have shared memory in more than one host 
 thread/process you have to worry about memory barriers.

Yes. Guest-observable behaviour is likely to be quite different on
different hosts, especially between x86 and non-x86 hosts, which is not
good at all for emulation.

Memory barriers performed by the guest would help, but would not
remove the fact that behaviour would vary between different host types
if a guest doesn't call them.  I.e. you could accidentally have some
guests working fine for years on x86 hosts, which gain subtle
memory corruption as soon as you run them on a different host.

This is acceptable when recompiling code for different architectures,
but it's asking for trouble with binary guest images which aren't
supposed to depend on host architecture.

However, coherence could be made host-type-independent by the host
mapping and unmapping pages, so that each page is only mapped into one
guest (or guest CPU) at a time.  Just like some clustering filesystems
do to maintain coherence.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: Slowdowns comparing qemu-kvm.git to qemu.git: vcpu/thread scheduling differences

2010-02-09 Thread Jamie Lokier
Anthony Liguori wrote:
 No, basically, the problem will boil down to, the IO thread is 
 select()'d waiting for an event to occur.  However, you've done 
 something in the VCPU thread that requires the IO thread to run its 
 main loop.  You need to use qemu_notify_event() to force the IO thread 
 to break out of select().
 
 Debugging these problems are very difficult and the complexity here is 
 the main reason the IO thread still hasn't been enabled by default in 
 qemu.git.

It is difficult.  One approach to debugging them, in general, is to
have a special debugging mode where the select() loop wakes up
repeatedly at high speed and checks all the conditions it's supposed
to to be waiting for that _should_ have triggered a wakeup, and if any
of them are asserted for two timed wakeups (or some sufficient duration,
checked by gettimeofday), emit a bug message because it probably is one.
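
(Roughly, with work_pending() standing in for whatever per-condition
checks apply:)

    struct timeval tv = { 0, 10 * 1000 };   /* 10ms debug tick */
    int n = select(maxfd + 1, &rfds, &wfds, NULL, &tv);
    if (n == 0 && work_pending()) {         /* timed out with work queued */
        if (++stuck_ticks >= 2)
            fprintf(stderr, "BUG: pending work but no wakeup\n");
    } else {
        stuck_ticks = 0;
    }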

I don't know if that could be applied in qemu's event loop.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-25 Thread Jamie Lokier
Dor Laor wrote:
 x86   qemu64
 x86   phenom
 x86 core2duo
 x86kvm64
 x86   qemu32
 x86  coreduo
 x86  486
 x86  pentium
 x86 pentium2
 x86 pentium3
 x86   athlon
 x86 n270

I wonder if kvm32 would be good, for symmetry if nothing else.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Jamie Lokier
john cooper wrote:
 kvm itself can modify flags exported from qemu to a guest.

I would hope for an option to request that qemu doesn't run if the
guest won't get the cpuid flags requested on the command line.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Jamie Lokier
john cooper wrote:
  I foresee wanting to iterate over the models and pick the latest one
  which a host supports - on the grounds that you have done the hard
  work of ensuring it is a reasonably good performer, while probably
  working on another host of similar capability when a new host is made
  available.
 
 That's a fairly close use case to that of safe migration
 which was one of the primary motivations to identify
 the models being discussed.  Although presentation and
 administration of such was considered the domain of management
 tools.

My hypothetical script which iterates over models in that way is a
management tool, and would use qemu to help do its job.

Do you mean that more powerful management tools to support safe
migration will maintain _their own_ processor model tables, and
perform their calculations using their own tables instead of querying
qemu, and therefore not have any need of qemu's built in table?

If so, I favour more strongly Anthony's suggestion that the processor
model table lives in a config file (eventually), as that file could be
shared between management tools and qemu itself without duplication.
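
(Such a file might look something like this; the format is invented here
for illustration, in the style of qemu's ini-like configs:)

    [cpudef]
        name     = "Nehalem"
        level    = "2"
        vendor   = "GenuineIntel"
        family   = "6"
        model    = "2"
        stepping = "3"
        features = "sse3 ssse3 sse4.1 sse4.2 popcnt lm syscall nx"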

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Jamie Lokier
john cooper wrote:
 I can appreciate the argument above, however the goal was
 choosing names with some basis in reality.  These were
 recommended by our contacts within Intel, are used by VmWare
 to describe their similar cpu models, and arguably have fallen
 to defacto usage as evidenced by such sources as:
 
 http://en.wikipedia.org/wiki/Conroe_(microprocessor)
 http://en.wikipedia.org/wiki/Penryn_(microprocessor)
 http://en.wikipedia.org/wiki/Nehalem_(microarchitecture)

(Aside: I can confirm they haven't fallen into de facto usage anywhere
in my vicinity :-) I wonder if the contacts within Intel are living in
a bit of a bubble where these names are more familiar than the outside
world.)

I think we can all agree that there is no point looking for a familiar
-cpu naming scheme because there aren't any familiar and meaningful names
these days.

 used by VmWare to describe their similar cpu models

If the same names are being used, I see some merit in qemu's list
matching VMware's cpu models *exactly* (in capabilities, not id
strings), to aid migration from VMware.  Is that feasible?  Do they
match already?

 I suspect whatever we choose of reasonable length as a model
 tag for -cpu some further detail is going to be required.
 That was the motivation to augment the table as above with
 an instance of a LCD for that associated class.
  
  I'm not a typical user: I know quite a lot about x86 architecture;
  I just haven't kept up to date enough to know the code/model names.
  Typical users will know less about them.
 
 Understood.


 One thought I had to further clarify what is going on under the hood
 was to dump the cpuid flags for each model as part of (or in
 addition to) the above table.  But this seems a bit extreme and kvm
 itself can modify flags exported from qemu to a guest.

Here's another idea.

It would be nice if qemu could tell the user which of the built-in
-cpu choices is the most featureful subset of their own host.  With
-cpu host implemented, finding that is probably quite easy.

Users with multiple hosts will get a better feel for what the -cpu
names mean that way, probably better than any documentation would give
them, because they probably have not much idea what CPU families they
have anyway.  (cat /proc/cpuinfo doesn't clarify, as I found).

And it would give a simple, effective, quick indication of what they
must choose if they want an VM image that runs on more than one of
their hosts without a management tool.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-19 Thread Jamie Lokier
Anthony Liguori wrote:
 On 01/18/2010 10:45 AM, john cooper wrote:
  x86   Conroe  Intel Celeron_4x0 (Conroe/Merom Class Core 2)
  x86   Penryn  Intel Core 2 Duo P9xxx (Penryn Class Core 2)
  x86  Nehalem  Intel Core i7 9xx (Nehalem Class Core i7)
  x86   Opteron_G1  AMD Opteron 240 (Gen 1 Class Opteron)
  x86   Opteron_G2  AMD Opteron 22xx (Gen 2 Class Opteron)
  x86   Opteron_G3  AMD Opteron 23xx (Gen 3 Class Opteron)
 
 I'm very much against having -cpu Nehalem.  The whole point of this is 
 to make things easier for a user and for most of the users I've 
 encountered, -cpu Nehalem is just as obscure as -cpu qemu64,-sse3,+vmx,...

When I saw that table just now, I had no idea whether Nehalem is newer
and more advanced than Penryn, or the other way around.  I also have
no idea if Core i7 is newer than Core 2 Duo or not.

I'm not a typical user: I know quite a lot about x86 architecture;
I just haven't kept up to date enough to know the code/model names.
Typical users will know less about them.

It's only from seeing the G1/G2/G3 order that I guess they are listed
in ascending order of functionality.

Naturally, if I were choosing one, I'd want to choose the one with the
most capabilities that works on whatever my host hardware provides.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-19 Thread Jamie Lokier
Chris Wright wrote:
 * Anthony Liguori (anth...@codemonkey.ws) wrote:
  I'm very much against having -cpu Nehalem.  The whole point of this is  
  to make things easier for a user and for most of the users I've  
  encountered, -cpu Nehalem is just as obscure as -cpu 
  qemu64,-sse3,+vmx,...
 
 What name will these users know?  FWIW, it makes sense to me as it is.

2001, 2005, 2008, 2010 :-)

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-19 Thread Jamie Lokier
john cooper wrote:
 As before a cpu feature 'check' option is added which warns when
 feature flags (either implicit in a cpu model or explicit on the
 command line) would have otherwise been quietly unavailable to a
 guest:
 
 # qemu-system-x86_64 ... -cpu Nehalem,check
 warning: host cpuid 0000_0001 lacks requested flag 'sse4.2' [0x00100000]
 warning: host cpuid 0000_0001 lacks requested flag 'popcnt' [0x00800000]

That's a nice feature.  Can we have a 'checkfail' option which refuses
to run if a requested capability isn't available?  Thanks.

I foresee wanting to iterate over the models and pick the latest one
which a host supports - on the grounds that you have done the hard
work of ensuring it is a reasonably good performer, while probably
working on another host of similar capability when a new host is made
available.
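
(A crude sketch of that iteration, relying on the warnings above going
to stderr; -S plus timeout just keeps each probe from really running a
guest, and a real tool would probe more cleanly:)

    for cpu in Nehalem Penryn Conroe; do
        if ! timeout 1 qemu-system-x86_64 -S -nographic \
                -cpu "$cpu,check" 2>&1 | grep -q '^warning:'; then
            echo "newest usable model: $cpu"
            break
        fi
    done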

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-19 Thread Jamie Lokier
Anthony Liguori wrote:
 On 01/19/2010 02:03 PM, Chris Wright wrote:
 * Anthony Liguori (anth...@codemonkey.ws) wrote:

 I'm very much against having -cpu Nehalem.  The whole point of this is
 to make things easier for a user and for most of the users I've
 encountered, -cpu Nehalem is just as obscure as -cpu
 qemu64,-sse3,+vmx,...
  
 What name will these users know?  FWIW, it makes sense to me as it is.

 
 Whatever is in /proc/cpuinfo.
 
 There is no mention of Nehalem in /proc/cpuinfo.

My 5 /proc/cpuinfos say:

Genuine Intel(R) CPU T2500  @ 2.00GHz
Intel(R) Xeon(TM) CPU 3.00GHz
Intel(R) Xeon(R) CPU E5335  @ 2.00GHz
Intel(R) Xeon(TM) CPU 2.80GHz
Intel(R) Xeon(R) CPU X5482  @ 3.20GHz

I'm not sure if that's any more helpful :-)

Especially the first one.  I don't think of my laptop as having a
T2500.  I think of it as having a 32-bit Core Duo.  And I have no idea
what the different types of Xeon are.  But then, I couldn't tell you
whether they are Nehalems or Penryns either, and I'm quite sure the
owners couldn't either.

$ grep name /proc/cpuinfo
model name : QEMU Virtual CPU version 0.9.1

If only they were all so clear :-)

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size

2010-01-05 Thread Jamie Lokier
Avi Kivity wrote:
 On 01/05/2010 02:56 PM, Rusty Russell wrote:
 
 Those should be the same for any sane interface.  They are for classical
 disk devices with larger block sizes (MO, s390 dasd) and also for the
 now appearing 4k sector scsi disks.  But in the ide world people are
 concerned about dos/window legacy compatiblity so they came up with a
 nasty hack:
 
   - there is a physical block size as used by the disk internally
 (4k initially)
   - all the interfaces to the operating system still happen in the
 traditional 512 byte blocks to not break any existing assumptions
   - to make sure modern operating systems can optimize for the larger
 physical sectors the disks expose this size, too.
   - even worse disks can also have alignment hacks for the traditional
 DOS partitions tables, so that the 512 byte block zero might even
 have an offset into the first larger physical block.  This is also
 exposed in the ATA identify information.
 
 All in all I don't think this mess is a good idea to replicate in
 virtio.  Virtio by defintion requires virtualization aware guests, so we
 should just follow the SCSI way of larger real block sizes here.
  
 Yes.  The current VIRTIO_BLK_F_BLK_SIZE says please use this block size.
 We haven't actually specified what happens if the guest doesn't, but the
 spec says must, and the Linux implementation does so AFAICT.
 
 If we want a soft size, we could add that as a separate feature.

 
 No - I agree with Christoph, there's no reason to use a 512/4096 
 monstrosity with virtio.

It would be good if virtio relayed the backing device's basic topology
hints, so:

- If the backing dev is a real disk with 512-byte sectors,
  virtio should indicate 512-byte blocks to the guest.

- If the backing dev is a real disk with 4096-byte sectors,
  virtio should indicate 4096-byte blocks to the guest.

With databases and filesystems, if you care about data integrity:

- If the backing dev is a real disk with 4096-byte sectors,
  or a file whose access is through a 4096-byte-per-page cache,
  virtio must indicate 4096-byte blocks otherwise guest
  journalling is not host-powerfail safe.

You get the idea.  If there is only one parameter, it really should be
at least as large as the smallest unit which may be corrupted by
writes when errors occur.
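
(For what it's worth, a sketch of config fields that would carry both
sizes; this is close to what virtio-blk's later topology extension ended
up with, though the exact field names here are from memory:)

    struct virtio_blk_topology {
        u8  physical_block_exp;  /* log2(physical blocks per logical block) */
        u8  alignment_offset;    /* offset of the first aligned logical block */
        u16 min_io_size;         /* minimum I/O size, in logical blocks */
        u32 opt_io_size;         /* optimal I/O size, in logical blocks */
    };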

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size

2010-01-05 Thread Jamie Lokier
Avi Kivity wrote:
 Physical block size is what the logical block size would have been 
 if software didn't suck.  In theory they should be the same, but for 
 compatibility reasons the logical block size is clamped to 512, so they have to 
 differ.  A disk may have a physical block size of 4096 and emulate 
 logical block size of 512 on top of that using read-modify-write.
 
 Or so I understand it.

I think that's right, but a side effect is that if you get a power
failure during the read-modify-write, bytes anywhere in 4096 sector
may be incorrect, so journalling (etc.) needs to use 4096 byte blocks
for data integrity, even though the drive emulates smaller writes.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2009-12-30 Thread Jamie Lokier
john cooper wrote:
  {
 +.name = Merom,
 +.level = 2,
 +.vendor1 = CPUID_VENDOR_INTEL_1,
 +.vendor2 = CPUID_VENDOR_INTEL_2,
 +.vendor3 = CPUID_VENDOR_INTEL_3,
 +.family = 6, /* P6 */
 +.model = 2,
 +.stepping = 3,
 +.features = PPRO_FEATURES | 
 +/* these features are needed for Win64 and aren't fully implemented */
 +CPUID_MTRR | CPUID_CLFLUSH | CPUID_MCA |
 +/* this feature is needed for Solaris and isn't fully implemented */
 +CPUID_PSE36,
 +.ext_features = CPUID_EXT_SSE3,  /* from qemu64 */

Isn't SSE3 a generic feature on these Intel CPUs, so this comment is 
unnecessary?
Or is SSE3 not present on a real Merom?  If so, wouldn't it be better to omit 
it?

 +.ext2_features = (PPRO_FEATURES & 0x0183F3FF) | 

Could we have a meaningful name for the magic number, please?
Maybe even a:

   #define PPRO_EXT2_FEATURES (PPRO_FEATURES & PPRO_EXT2_MASK)
   #define PPRO_EXT2_MASK (CPUID_... | CPUID_... | ...) /* Fill in. */

 +CPUID_EXT2_LM | CPUID_EXT2_SYSCALL | CPUID_EXT2_NX,
 +.ext3_features = CPUID_EXT3_SVM, /* from qemu64 */
 +.xlevel = 0x800A,
 +.model_id = Intel Merom Core 2,
 +},

Does this mean requesting an Intel Merom will give the guest AMD's SVM
capability?  That's handy for virtualisation, but not an accurate CPU
model.  It seems inappropriate to name it Merom, with model_id
Intel Merom Core 2, if it's adding extra qemu-specific capabilities.

I would think few guests are likely to need the nested-SVM capability,
so it could be omitted when Merom is requested, and added as an
additional feature on request from the command line, just like other
cpuid features can be added.
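
Something like this, reusing the patch's own fields:

    .ext3_features = 0,  /* real Merom has no SVM; opt in with -cpu Merom,+svm */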

 +{
 +.name = Penryn,
 +.level = 2,
 +.vendor1 = CPUID_VENDOR_INTEL_1,
 +.vendor2 = CPUID_VENDOR_INTEL_2,
 +.vendor3 = CPUID_VENDOR_INTEL_3,
 +.family = 6, /* P6 */
 +.model = 2,
 +.stepping = 3,
 +.features = PPRO_FEATURES | 
 +/* these features are needed for Win64 and aren't fully implemented */
 +CPUID_MTRR | CPUID_CLFLUSH | CPUID_MCA |
 +/* this feature is needed for Solaris and isn't fully implemented */
 +CPUID_PSE36,
 +.ext_features = CPUID_EXT_SSE3 |
 +CPUID_EXT_CX16 | CPUID_EXT_SSSE3 | CPUID_EXT_SSE41,
 +.ext2_features = (PPRO_FEATURES & 0x0183F3FF) | 
 +CPUID_EXT2_LM | CPUID_EXT2_SYSCALL | CPUID_EXT2_NX,
 +.ext3_features = CPUID_EXT3_SVM,
 +.xlevel = 0x800A,
 +.model_id = Intel Penryn Core 2,
 +},

Same comments as above for Merom about SVM and the PPRO_FEATURES mask.

You don't include the "from qemu64" comments this time.  Is there a reason?

 +{
 +.name = Nehalem,
 +.level = 2,
 +.vendor1 = CPUID_VENDOR_INTEL_1,
 +.vendor2 = CPUID_VENDOR_INTEL_2,
 +.vendor3 = CPUID_VENDOR_INTEL_3,
 +.family = 6, /* P6 */
 +.model = 2,
 +.stepping = 3,
 +.features = PPRO_FEATURES | 
 +/* these features are needed for Win64 and aren't fully implemented */
 +CPUID_MTRR | CPUID_CLFLUSH | CPUID_MCA |
 +/* this feature is needed for Solaris and isn't fully implemented */
 +CPUID_PSE36,
 +.ext_features = CPUID_EXT_SSE3 |
 +CPUID_EXT_CX16 | CPUID_EXT_SSSE3 | CPUID_EXT_SSE41 |
 +CPUID_EXT_SSE42 | CPUID_EXT_POPCNT,
 +.ext2_features = (PPRO_FEATURES & 0x0183F3FF) | 
 +CPUID_EXT2_LM | CPUID_EXT2_SYSCALL | CPUID_EXT2_NX,
 +.ext3_features = CPUID_EXT3_SVM,
 +.xlevel = 0x800A,
 +.model_id = Intel Nehalem Core i7,
 +},

Same as previous.

 +{
 +.name = Opteron_G1,
 +.level = 5,
 +.vendor1 = CPUID_VENDOR_INTEL_1,
 +.vendor2 = CPUID_VENDOR_INTEL_2,
 +.vendor3 = CPUID_VENDOR_INTEL_3,

Someone else has already enquired - why Intel vendor id?

 +.family = 15,
 +.model = 6,
 +.stepping = 1,
 +.features = PPRO_FEATURES | 
 +/* these features are needed for Win64 and aren't fully implemented */
 +CPUID_MTRR | CPUID_CLFLUSH | CPUID_MCA |
 +/* this feature is needed for Solaris and isn't fully implemented */
 +CPUID_PSE36,
 +.ext_features = CPUID_EXT_SSE3 | CPUID_EXT_MONITOR,
 +.ext2_features = (PPRO_FEATURES & 0x0183F3FF) | 
 +CPUID_EXT2_LM | CPUID_EXT2_SYSCALL | CPUID_EXT2_NX,
 +.ext3_features = CPUID_EXT3_SVM,
 +.xlevel = 0x8008,
 +.model_id = AMD Opteron G1,
 +},

Why do the AMD models have CPUID_EXT_MONITOR but the Intel ones don't?
Is it correct for the CPU models?  Even a lowly 32-bit Intel Core has MONITOR.
Is it omitted for performance?  In that case shouldn't it be omitted for AMD 
too?

 +{
 + 

Re: [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-05 Thread Jamie Lokier
Avi Kivity wrote:
 On 11/03/2009 12:09 AM, Alexander Graf wrote:
 When we want to create a full VirtIO based machine, we're still missing
 graphics output. Fortunately, Linux provides us with most of the frameworks
 to render text and everything, we only need to implement a transport.
 
 So this is a frame buffer backend written for VirtIO. Using this and my
 patch to qemu, you can use paravirtualized graphics.
 
 What does this do that cirrus and/or vmware-vga don't?

*This* virtio-fb doesn't, but one feature I think a lot of users
(including me) would like is:

   Option to resize the guest desktop when the host desktop / host
   window / VNC client resizes.

   Tell the guest to provide multiple desktops when the host has
   multiple desktops, so things like twin monitors work nicely with
   guests.

   Relay EDID/Xrandr information and updates from host to guest, and
   generally handle hotplugging host monitors nicely.

Are there any real hardware standards worth emulating which do that?

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH] whitelist host virtio networking features [was Re: qemu-kvm-0.11 regression, crashes on older ...]

2009-11-02 Thread Jamie Lokier
Michael Tokarev wrote:
 If you want kvm to behave like this, wrap it into a trivial
 shell script that restarts the guest.

True, kvm has enough crash-bugs elsewhere that I already have to deal
with that.  It'd be nice to distinguish kvm/qemu bugs from guest
bugs, though :-)

kvm/qemu also has lock-up bugs, where it's spinning at 100% and the
guest seems to be stuck (even though the VNC server continues to
work).  That's a bit harder to fix with a wrapper script.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: Release plan for 0.12.0

2009-10-14 Thread Jamie Lokier
Michael S. Tsirkin wrote:
 On Wed, Oct 14, 2009 at 09:17:15AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin wrote:
  Looks like Or has abandoned it.  I have an updated version which works
  with new APIs, etc.  Let me post it and we'll go from there.
 

  I'm generally inclined to oppose the functionality as I don't think 
  it  offers any advantages over the existing backends.
  
 
  I patch it in and use it all the time.  It's much easier to setup
  on a random machine than a bridged config.

 
  Having two things that do the same thing is just going to lead to user  
  confusion.
 
 They do not do the same thing. With raw socket you can use windows
 update without a bridge in the host, with tap you can't.

On the other hand, with raw socket, guest Windows can't access files
on the host's Samba share, can it?  So it's not that useful even for
Windows guests.

  If the problem is tap is too hard to setup, we should try to  
  simplify tap configuration.
 
 The problem is bridge is too hard to setup.
 Simplifying that is a good idea, but outside the scope
 of the qemu project.

I venture it's important enough for qemu that it's worth working on
that.  Something that looks like the raw socket but behaves like an
automatically instantiated bridge attached to the bound interface
would be a useful interface.
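
(That is, automate what currently takes something like the following by
hand; the tool names vary by distro, and the host's IP address also has
to move from eth0 to br0:)

    brctl addbr br0
    brctl addif br0 eth0
    tunctl -u $USER -t tap0
    brctl addif br0 tap0
    ifconfig br0 up; ifconfig tap0 up
    qemu -net nic -net tap,ifname=tap0,script=no ...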

I don't have much time, but I'll help anybody who wants to do that.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/9] provide in-kernel ioapic

2009-10-12 Thread Jamie Lokier
Anthony Liguori wrote:
 We already have the single device model implementation and the 
 limitations are well known.  The best way to move forward is for someone 
 to send out patches implementing separate device models.
 
 At that point, it becomes a discussion of two concrete pieces of code 
 verses hand waving.

Out of curiosity now, what _are_ the behavioural differences between
the in-kernel irqchip and the qemu one?

Are the differences significant to guests, such that it might be
necessary to disable the in-kernel irqchip for some guests, or
conversely, necessary to use KVM for some guests?

Thanks,
-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 2/3] qemu: make cirrus init value pci spec compliant

2009-10-12 Thread Jamie Lokier
Gleb Natapov wrote:
 But KVM doesn't support it (memory is always writable).

That seems like something which could be fixed.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH v2 3/9] provide in-kernel ioapic

2009-10-09 Thread Jamie Lokier
Gleb Natapov wrote:
 On Thu, Oct 08, 2009 at 06:42:07PM +0200, Avi Kivity wrote:
  On 10/08/2009 06:34 PM, Gleb Natapov wrote:
  So suppose I have a simple watchdog device that needs to be poked every
  second, otherwise it resets the computer. On migration we have to migrate
  the time elapsed since the last poke, but if the device doesn't expose it to
  software in any way, are you saying we can recreate it some other way?
  
  The time is exposed (you can measure it by poking the device and
 The time yes, not its internal representation. What if one implementation 
 stores how much time passed and another how much time is left?
 One counts in wall clock, another only when the guest runs, etc., and
 this is a device with only one write-only register.

In that case you can decide between calling it two different devices
(which have the same guest-visible behaviour but are not
interchangable), or calling them different implementations of one
device - by adding a little more code to save state in a common format.

(Although they may have to be different devices for qemu
configuration, it's ok for them to have the same PCI IDs and for the
guest not to know the difference)

For your watchdog example, it's not hard to pick a saved state which
works for both.
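
(For instance, saving a single absolute deadline works whether the
implementation internally tracks time elapsed or time remaining:)

    struct watchdog_state {
        uint64_t bite_deadline_ns;  /* when the reset fires, absolute time */
    };
    /* impl A (tracks elapsed):   deadline = now + (period - elapsed) */
    /* impl B (tracks remaining): deadline = now + remaining */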

ioapic will be harder to find a useful common saved state, and there
might need to be an *optional hint* section (e.g. for selecting the
next CPU to get an interrupt), but I think it would be worth it in
this case.  Being able to load a KVM image into TCG and vice versa is
quite useful sometimes.  E.g. I've had to do some OS installs using
TCG at first, then switch to KVM later for performance.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH v2 4/9] provide in-kernel apic

2009-10-09 Thread Jamie Lokier
Glauber Costa wrote:
  It ensures the two models are compatible.  Since they're the same device  
  from the point of view of the guest, there's no reason for them to have  
  different representations or to be incompatible.
 
 live migration between something that has in-kernel irqchip and
 something that has not is likely to be a completely untested
 thing. And this is the only reason we might think of it as the same
 device. I don't see any value in supporting this combination

Not just live migration.  ACPI sleep + savevm + loadvm + ACPI resume,
for example.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 2/3] qemu: make cirrus init value pci spec compliant

2009-10-09 Thread Jamie Lokier
Michael S. Tsirkin wrote:
 More long-term, we have duplication between reset and init
 routines. Maybe devices really should have init and cleanup,
 and on reset we'd cleanup all devices and then init them again?

It sounds like a good idea to me.  That is, after all, what hardware
reset often does :-)
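
(In pseudo-qdev terms, with hypothetical init/cleanup hooks:)

    static void device_reset(DeviceState *dev)
    {
        dev->info->cleanup(dev);    /* tear everything down */
        dev->info->init(dev);       /* come back up exactly as at power-on */
    }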

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH v2 4/9] provide in-kernel apic

2009-10-09 Thread Jamie Lokier
Glauber Costa wrote:
 On Fri, Oct 09, 2009 at 11:06:41AM +0100, Jamie Lokier wrote:
  Glauber Costa wrote:
It ensures the two models are compatible.  Since they're the same 
device  
from the point of view of the guest, there's no reason for them to have 
 
different representations or to be incompatible.
   
   live migration between something that has in-kernel irqchip and
   something that has not is likely to be a completely untested
   thing. And this is the only reason we might think of it as the same
   device. I don't see any value in supporting this combination
  
  Not just live migration.  ACPI sleep + savevm + loadvm + ACPI resume,
  for example.
 Yes, but in this case too, I'd expect the irqchipness of qemu not to change.

If I've just been sent an image produced by someone running KVM, and
my machine is not KVM-capable, or I cannot upgrade the KVM kernel
module because it's in use by other VMs (had this problem a few
times), there's no choice but to change the irqchipness.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH v2 3/9] provide in-kernel ioapic

2009-10-09 Thread Jamie Lokier
Glauber Costa wrote:
 On Thu, Oct 08, 2009 at 06:22:48PM +0200, Gleb Natapov wrote:
  On Thu, Oct 08, 2009 at 06:17:57PM +0200, Avi Kivity wrote:
   On 10/08/2009 06:07 PM, Jamie Lokier wrote:
   Haven't we already confirmed that it *isn't* just an ioapic accelerator
   because you can't migrate between in-kernel iopic and qemu's ioapic?
   
   We haven't confirmed it.  Both implement the same spec, and if you
   can't migrate between them, one of them has a bug (for example, qemu
   ioapic doesn't implement polarity - but it's still just a bug).
   
  Are you saying that HW spec (that only describes software visible behavior)
  completely defines implementation? No other internal state is needed
  that may be done differently by different implementations?
 Most specifications leave a lot as implementation specific.
 
 It's not hard to imagine a case in which both devices will follow
 the spec correctly, (no bugs involved), and yet differ in the
 implementation.

Avi's not saying the implementations won't differ.  I believe he's
saying that implementation-specific states don't need to be saved if
they have no effect on guest visible behaviour.

-- Jamie

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH v2 3/9] provide in-kernel ioapic

2009-10-08 Thread Jamie Lokier
Avi Kivity wrote:
 On 10/08/2009 03:49 PM, Anthony Liguori wrote:
 Glauber Costa wrote:
 This patch provides kvm with an in-kernel ioapic. We are currently 
 not enabling it.
 The code is heavily based on what's in qemu-kvm.git.
 
 It really ought to be it's own file and own device model.  Having the 
 code mixed in with ioapic.c is confusing because it's unclear what 
 code is in use when the in-kernel model is used.
 
 I disagree.  It's the same device with the same guest-visible interface 
 and the same host-visible interface (save/restore, 'info ioapic' if we 
 write one).  Splitting it into two files will only result in code 
 duplication.
 
 Think of it as an ioapic accelerator.

Haven't we already confirmed that it *isn't* just an ioapic accelerator
because you can't migrate between in-kernel iopic and qemu's ioapic?

Imho, if they cannot be swapped transparently, they are different
device emulations.

Of course there's nothing wrong with sharing lots of code.

Maybe ioapic.c and ioapic-kvm.c, with shared code in ioapic-common.c?

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 2/3] qemu: make cirrus init value pci spec compliant

2009-10-08 Thread Jamie Lokier
Avi Kivity wrote:
 On 10/08/2009 06:06 PM, Michael S. Tsirkin wrote:
 On Thu, Oct 08, 2009 at 05:29:29PM +0200, Avi Kivity wrote:

 On 10/08/2009 04:52 PM, Michael S. Tsirkin wrote:
  
 PCI memory should be disabled at reset, otherwise
 we might claim transactions at address 0.
 I/O should also be disabled, although for cirrus
 it is harmless to enable it as we do not
 have I/O bar.
 
 Note: need bios fix for this patch to work:
 currently pc-bios incorrectly assumes that it does not
 need to enable i/o unless device has i/o bar.
 
 Signed-off-by: Michael S. Tsirkinm...@redhat.com
 
 This needs to be conditional on the machine type.  Old machines must
 retain this code for live migration to work (we need to live migrate the
 bios, so we can't assume the bios fix is in during live migration from
 older qemus).
  
 No, if you migrate from older qemu you will be fine as command
 is enabled there on init.

 
 Right.

No, I think Avi was right the first time.

Migrating from an older qemu will be fine at first, but at the next
reset _following_ migration, it'll be running the old BIOS on a new
qemu and fail.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: Extending virtio_console to support multiple ports

2009-08-28 Thread Jamie Lokier
Alan Cox wrote:
   - Then, are we certain that there's no case where the tty layer will
  call us with some lock held or in an atomic context ? To be honest,
  I've totally lost track of the locking rules in tty land lately so it
  might well be ok, but something to verify.
 
 Some of the less well behaved line disciplines do this and always have
 done.

I had a backtrace in my kernel log recently which looked like that,
while doing PPP over Bluetooth RFCOMM.  Resulted in AppArmor
complaining that its hook was being called in irq context.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: Notes on block I/O data integrity

2009-08-27 Thread Jamie Lokier
Christoph Hellwig wrote:
 On Wed, Aug 26, 2009 at 07:57:55PM +0100, Jamie Lokier wrote:
  Christoph Hellwig wrote:
    what about LVM? I've read somewhere that it used to just eat barriers
used by XFS, making it less safe than simple partitions.
   
   Oh, any additional layers open another by cans of worms.  On Linux until
   very recently using LVM or software raid means only disabled
   write caches are safe.
  
  I believe that's still true except if there's more than one backing
  drive, so software RAID still isn't safe.  Did that change?
 
 Yes, it did change. 

 I will recommend to keep doing what people caring for their data
 have been doing since these volatile write caches came up: turn them
 off.

Unfortunately I tried that on a batch of 1000 or so embedded thingies
with ext3, and the write performance plummeted.  They are the same
thingies where I observed lack of barriers resulting in filesystem
corruption after power failure.  We really need barriers with ATA
disks to get decent write performance.

It's a good recommendation though.
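
(On ATA that is typically:)

    hdparm -W0 /dev/sda    # disable the drive's volatile write cache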

 That being said with the amount of bugs in filesystems related to
 write barriers my expectation for the RAID and device mapper code is
 not too high.

Turning off volatile write cache does not provide commit integrity
with RAID.

RAID needs barriers to plug, drain and unplug the queues across all
backing devices in a coordinated manner quite apart from the volatile
write cache.  And then there's still that pesky problem of writes
which reach one disk and not its parity disk.

Unfortunately turning off the volatile write caches could actually
make the timing window for failure worse, in the case of system crash
without power failure.

-- Jamie
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Notes on block I/O data integrity

2009-08-27 Thread Jamie Lokier
Christoph Hellwig wrote:
 As various people wanted to know how the various data integrity patches
 I've send out recently play together here's a small writeup on what
 issues we have in QEMU and how to fix it:

Thanks for taking this on.  Both this email and the one on
linux-fsdevel about Linux behaviour are wonderfully clear summaries of
the issues.

 Action plan for QEMU:

  - IDE needs to set the write cache enabled bit
  - virtio needs to implement a cache flush command and advertise it
(also needs a small change to the host driver)

With IDE and SCSI, and perhaps virtio-blk, guests should also be able
to disable the write cache enabled bit, and that should be
equivalent to the guest issuing a cache flush command after every
write.

At the host it could be implemented as if every write were followed by
flush, or by switching to O_DSYNC (cache=writethrough) in response.
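
(A sketch of the O_DSYNC variant; error handling omitted:)

    #include <fcntl.h>
    #include <stdbool.h>

    int open_image(const char *path, bool guest_cache_enabled)
    {
        int flags = O_RDWR;
        if (!guest_cache_enabled)
            flags |= O_DSYNC;   /* each write reaches stable storage */
        return open(path, flags);
    }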

The other way around: for guests where integrity isn't required
(e.g. disposable guests for testing - or speed during guest OS
installs), you might want an option to ignore cache flush commands -
just let the guest *think* it's committing to disk, but don't waste
time doing that on the host.
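
In QEMU terms these behaviours map onto the -drive cache= option; a
sketch (note that cache=unsafe, the flush-ignoring mode, only appeared
in later QEMU releases):

    # writes reach stable storage before completing, as if the guest
    # disabled its write cache
    qemu -drive file=disk.img,if=virtio,cache=writethrough

    # volatile host page cache; relies on guest flush commands
    qemu -drive file=disk.img,if=virtio,cache=writeback

    # disposable guest: flushes are silently dropped (later QEMU only)
    qemu -drive file=disk.img,if=virtio,cache=unsafe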

 For disks using volatile write caches, the cache flush is implemented by
 a protocol specific request, and the barrier requests are implemented
 by performing cache flushes before and after the barrier request, in
 addition to the draining mentioned above.  The second cache flush can be
 replaced by setting the Force Unit Access bit on the barrier request 
 on modern disks.

For fdatasync (etc), you've probably noticed that it only needs one
cache flush by itself, no second request or FUA write.

Less obviously, there are opportunities to merge and reorder around
non-barrier flush requests in the elevator, and to eliminate redundant
flush requests.

Also you don't need flushes to reach every backing drive on RAID, but
knowing which ones to leave out is tricky and needs more hints from
the filesystem.

I agree with the whole of your general plan, both in QEMU and in Linux
as a host.  Spot on!

-- Jamie


Re: [Qemu-devel] Re: Notes on block I/O data integrity

2009-08-26 Thread Jamie Lokier
Nikola Ciprich wrote:
 clustered LVM SHOULD not have problems with it, as we're using just
 striped volumes,

Note that LVM does not implement barriers at all, except for simple
cases of a single backing device (I'm not sure if that includes
dm-crypt).

So your striped volumes may not offer this level of integrity.

-- Jamie


Re: [Qemu-devel] Re: Notes on block I/O data integrity

2009-08-26 Thread Jamie Lokier
Christoph Hellwig wrote:
  what about LVM? I've read somewhere that it used to just eat barriers
  used by XFS, making it less safe than simple partitions.
 
 Oh, any additional layers open another by cans of worms.  On Linux until
 very recently using LVM or software raid means only disabled
 write caches are safe.

I believe that's still true except if there's more than one backing
drive, so software RAID still isn't safe.  Did that change?

But even with barriers, software RAID may have a consistency problem
if one stripe is updated and the system fails before the matching
parity stripe is updated.

I've been told that some hardware RAID implementations implement a
kind of journalling to deal with this, but Linux software RAID does not.

-- Jamie


Re: [PATCH] introduce kvm64 CPU

2009-08-23 Thread Jamie Lokier
Avi Kivity wrote:
 On 08/22/2009 12:59 AM, Andre Przywara wrote:
 Typically users will want more specialized greatest common denominator 
 cpu types; if a site has standardized on recent hardware they will 
 want the features of that hardware exposed.
 Sure, but this was not the purpose of this patch. Currently KVM guests 
 see a CPU type which is TCG dependent, so I just wanted to get rid of 
 this. Features of TCG and features of the host processor are totally 
 uncorrelated. This new type should be KVM's default, leaving -cpu host 
 as the alternative for the non-migration case.
 
 That does make sense.  Note we can call it '-cpu kvm' since qemu will 
 strip away long mode if it is not supported by the cpu or by the kernel.

I thought the point was to provide a lowest common denominator for
migration, while acknowledging that 64-bit is too useful to exclude?

So if you start running on a 64-bit host, but know you have 32-bit
hosts in your pool, you'll need '-cpu kvm32'.

And if you start on a 32-bit host, but want to migrate to a 64-bit
host, will that work if the destination has different cpuid than the
source?
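
Concretely, something like this (a sketch, assuming a kvm32 type gets
added alongside kvm64):

    # pool contains 32-bit KVM hosts: use the 32-bit common denominator
    qemu-system-x86_64 -cpu kvm32 disk.img

    # homogeneous 64-bit pool
    qemu-system-x86_64 -cpu kvm64 disk.img

    # list the CPU models this build knows about
    qemu-system-x86_64 -cpu ?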

-- Jamie


Re: [PATCH] introduce kvm64 CPU

2009-08-21 Thread Jamie Lokier
Andre Przywara wrote:
 In addition to the TCG based qemu64 type let's introduce a kvm64 CPU type,
 which is the least common denominator of all KVM-capable x86-CPUs
 (based on Intel Pentium 4 Prescott). It can be used as a base type
 for migration.

The idea is nice and the name is right, but the description is wrong.

It obviously isn't the least common denominator of all KVM-capable
x86-CPUs, as my KVM-capable Core Duo (32-bit) cannot run it.

A kvm32 would be nice for symmetry.

-- Jamie


Re: [PATCH] introduce kvm64 CPU

2009-08-21 Thread Jamie Lokier
Avi Kivity wrote:
 On 08/21/2009 12:34 AM, Andre Przywara wrote:
 In addition to the TCG based qemu64 type let's introduce a kvm64 CPU type,
 which is the least common denominator of all KVM-capable x86-CPUs
 (based on Intel Pentium 4 Prescott). It can be used as a base type
 for migration.

 
 Typically users will want more specialized greatest common denominator 
 cpu types; if a site has standardized on recent hardware they will want 
 the features of that hardware exposed.

My experience is of sites which don't standardise on hardware in that
way.  They standardise on suppliers, and buy whatever is available
when more hardware is needed, or reuse existing hardware which is made
redundant from some other purpose.

kvm64 is a good compromise for sites like that, because it should work
with everything that's 64-bit and capable of running KVM.  I expect
all server machines which can run KVM are 64-bit, and it's only
laptops which have 32-bit KVM-capable chips (but I'm not sure).

-- Jamie


Re: [PATCH] introduce kvm64 CPU

2009-08-21 Thread Jamie Lokier
Andre Przywara wrote:
 If you happen to be stuck with 32bit 
 (pity you!) then I agree that a kvm32 would be nice to have.
 Will think about it...

I know that 32-bit is a bit slower for some things due to register
pressure (but it's a bit faster for some things due to less memory
needed for pointers), and its RAM is limited to about 3GB in
practice, which affects some things but is plenty for others.

I know it's a pain for KVM developers to support 32-bit hosts.

And yes, it would be nice to run a 64-bit guest from time to time.

But apart from being a bit slower, is there anything wrong with 32-bit
x86s compared with 64-bit that justifies pity?

The 32-bitness doesn't seem to be a handicap, only perhaps the
expected amount of slowness for a laptop that's 2-3 years old, or a
current netbook, compared with current desktops and servers.

So I'm having a hard time understanding why 32-bitness is considered
bad for KVM - why pity?  Does it have any other real problems than
not being able to emulate 64-bit guests that I should know about, or
is it just a matter of distaste?

-- Jamie


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-20 Thread Jamie Lokier
Amit Shah wrote:
  I think strings are better than numbers for identifying protocols as you  
  can work without a central registry for the numbers then.
 
 I like the way assigned numbers work: it's simpler to code, needs a
 bitmap for all the ports that fits in nicely in the config space and
 udev rules / scripts can point /dev/vmch02 to /dev/console.

How would a third party go about assigning themselves a number?

For the sake of example, imagine they develop a simple service like
guesttop which lets the host get a listing of guest processes.

They'll have to distribute app-specific udev rule patches for every
guest distro, which sounds like a lot of work.  The app itself is
probably a very simple C program; the hardest part of making it
portable across distros would be the udev rules, which is silly.

Anyway, every other device has a name or uuid these days.  You can
still use /dev/sda1 to refer to your boot partition, but LABEL=boot is
also available if you prefer.  Isn't that the ethos these days?

Why not both?  /dev/vmch05 if you prefer, plus symlink
/dev/vmch-guesttop -> /dev/vmch05 if name=guesttop was given to QEMU.

If you do stay with numbers only, note that it's not like TCP/UDP port
numbers because the number space is far smaller.  Picking a random
number that you hope nobody else uses is harder.

... Back to technical bits.  If config space is tight, use a channel!
Dedicate channel 0 to control, used to fetch the name (if there is
one) for each number.
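
To make that concrete, a hypothetical invocation (the option names here
are purely illustrative, nothing like this exists yet):

    # hypothetical syntax, for illustration only
    qemu -chardev socket,path=/tmp/guesttop.sock,server,nowait,id=ch0 \
         -device virtio-serial \
         -device virtserialport,chardev=ch0,name=org.example.guesttop

The guest would still see a numbered port, but udev could use the name
to create the friendly symlink.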

-- Jamie


qcow2 corruption - seems to be fixed in kvm-85 and later :-)

2009-08-09 Thread Jamie Lokier
Hi,

Sometimes it's nice to find a mail with good news.

A while back I reported corruption with qcow2, with the subject
"qcow2 corruption observed, fixed by reverting old change".

I'd noticed that kvm-83 was corrupting a Windows 2k disk image, which
was failing to boot, blue screening quite early.

I found there was a qcow2 bug introduced from kvm-72 to kvm-73, and
another from kvm-76 to kvm-77.  Reverting both fixed this symptom.


In order to check the bug later, I kept a copy of the disk image which
blue screened.

It still crashes with kvm-84.  The release notes indicate there were
some qcow2 fixes in that version; they were not enough to fix this
problem.

There were more qcow2 fixes in kvm-85.

Happily, I can now report that kvm-85 and kvm-88 both boot this image
with no apparent problems, and I will be deleting this junk disk image
now that I'm confident no further testing is required :-)


Thanks!
-- Jamie


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-06 Thread Jamie Lokier
Amit Shah wrote:
 On (Thu) Aug 06 2009 [08:58:01], Anthony Liguori wrote:
  Amit Shah wrote:
  On (Thu) Aug 06 2009 [08:29:40], Anthony Liguori wrote:

  Amit Shah wrote:
  
  Sure; but there's been no resistance from anyone to including the
  virtio-serial device driver so maybe we don't need to discuss that.
  
  There certainly is from me.  The userspace interface is not 
  reasonable  for guest applications to use.
  
 
  One example that would readily come to mind is dbus. A daemon running on
  the guest that reads data off the port and interacts with the desktop by
  appropriate dbus commands. All that's needed is a stream of bytes and
  virtio-serial provides just that.

 
  dbus runs as an unprivileged user, how does dbus know which  
  virtio-serial port to open and who sets the permissions on that port?
 
 The permission part can be handled by package maintainers and sysadmins
 via udev policies.
 
 So all data destined for dbus consumption gets to a daemon and that
 daemon then sends it over to dbus.

virtio-serial is nice, easy, simple and versatile.  We like that; it
should stay that way.

dbus isn't a good match for this.

dbus is not intended for communication between hosts, by design.

It depends on per-app configuration files in
/etc/dbus/{session,system}.d/, which are expected to match the
installed services.

For this, the guest's files in /etc/dbus/ would have to match the QEMU
host services in detail.  dbus doesn't have a good mechanism for
coping with version skew between both of them, because normally
everything resides on the same machine and the config and service are
updated at the same time.  This is hard to guarantee with a VM.

Apart from dbus, hard-coded meanings of small N in /dev/vmchN are
asking for trouble.  It is bound to break when widely deployed and
guest/host configs don't match.  It also doesn't fit comfortably when
you have, say, bob and alice both logged in with desktops on separate
VTs.  Clashes are inevitable, as third-party apps pick N values for
themselves then get distributed - unless N values can be large
(/dev/vmch44324 == kernelstats...).

Sysadmins shouldn't have to hand-configure each app, and shouldn't
have to repair clashes in defaults.  Just Work is better.

virtio-serial is nice.  The only ugly part is _finding_ the right
/dev/vmchN.

Fortunately, _any_ out-of-band id string or id number makes it perfect.

An option to specify PCI vendor/product ids in the QEMU host
configuration is good enough.

An option to specify one or more id strings is nicer.

Finally, Anthony hit on an interesting idea with USB.  Emulating USB
sucks.  But USB's _descriptors_ are quite effective, and the USB basic
protocol is quite reasonable too.

Descriptors are just a binary blob in a particular format, which
describe a device and also say what it supports, and what standard
interfaces can be used with it too.  Bluetooth is similar; they might
even use the same byte format, I'm not sure.

All the code for parsing USB descriptors is already present in guest
kernels, and the code for making appropriate device nodes and
launching apps is already in udev.  libusb also allows devices to be
used without a kernel driver, and is cross-platform.  There are plenty
of examples of creating USB descriptors in QEMU, and may be the code
can be reused.

The only down side of USB is that emulating it sucks :-)  That's mainly
due to the host controllers, and the way interrupts use polling.

So here's a couple of ideas:

   - virtio-usb, using virtio instead of a hardware USB host
 controller.  That would provide all the features of USB
 naturally, like hotplug, device binding, access from userspace,
 but with high performance, low overhead, and no interrupt polling.

 You'd even have the option of cross-platform guest apps, as well
 as working on all Linux versions, by emulating a host controller
 when the guest doesn't have virtio-usb.

 As a bonus, existing USB support would be accelerated.

   - virtio-serial providing a binary id blob, whose format is the
 same as USB descriptors.  Reuse the guest's USB parsing and
 binding to find and identify, but the actual device functionality
 would just be a byte pipe.

 That might be simple, as all it involves is a blob passed to the
 guest from QEMU.  QEMU would build the id blob, maybe reusing
 existing USB code, and the guest would parse the blob as it
 already does for USB devices, with udev creating devices as it
 already does.

-- Jamie


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-05 Thread Jamie Lokier
Anthony Liguori wrote:
 Richard W.M. Jones wrote:
 Have you considered using a usb serial device?  Something attractive 
 about it is that a productid/vendorid can be specified which means that 
 you can use that as a method of enumerating devices.
 
 Hot add/remove is supported automagically.

The same applies to PCI: productid/vendorid (and subids);
PCI hotplug is possible though not as native as USB.

Here's another idea: Many devices these days have a serial number or
id string.  E.g. USB storage, ATA drives, media cards, etc.  Linux
these days creates alias device nodes which include the id string in
the device name.  E.g. /dev/disk/by-id/ata-FUJITSU_MHV2100BH_NWAQT662615H

So in addition to (or instead of) /dev/vmch0, /dev/vmch1 etc.,
Linux guests could easily generate:

/dev/vmchannel/by-role/clipboard-0
/dev/vmchannel/by-role/gueststats-0
/dev/vmchannel/by-role/vmmanager-0

It's not necessary to do this at the beginning.  All that is needed is
to provide enough id information that will appear in /sys/..., so that
that a udev policy for naming devices can be created at some later date.
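
For instance, a later udev policy could be a one-line rule; a sketch,
assuming the guest driver exports the id string as a sysfs name
attribute (the subsystem and attribute names are assumptions):

    # /etc/udev/rules.d/99-vmchannel.rules (assumed names)
    SUBSYSTEM=="virtio-ports", ATTRS{name}=="clipboard-0", \
        SYMLINK+="vmchannel/by-role/clipboard-0"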

-- Jamie


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-07-27 Thread Jamie Lokier
Daniel P. Berrange wrote:
  I expect the first problem you'll run into is that copy/paste daemon has 
  to run as an unprivileged user but /dev/vmch3 is going to be owned by 
  root.  You could set udev rules for /dev/vmch3 but that's pretty 
  terrible IMHO.
 
 I don't think that's too bad; for example, with fast-user-switching 
 between multiple X servers and/or text consoles, there's already support
 code that deals with chown'ing things like /dev/snd/* devices to match 
 the active console session. Doing the same with the /dev/vmch3 device so
 that it is only ever accessible to the current logged in user actually
 fits in to that scheme quite well.

With multiple X servers, there can be more than one currently logged in user.

Same with multiple text consoles - that's more familiar.

Which one owns /dev/vmch3?

-- Jamie


Re: [Qemu-devel] [PATCH] rev3: support colon in filenames

2009-07-15 Thread Jamie Lokier
Kevin Wolf wrote:
 Can we at least allow "\," instead of ",," in parameter parsing, so that the
 backslash has the practical benefit of being a single universal escape
 character?

Is there a good reason why we cannot simply use \char to escape
_any_ character, in every context where a user-supplied
string/name/path/file is used?

I'm thinking of consistency here.  Instead of special cases for
filenames, why not a standard scheme for all the places in command
lines _and_ the monitor where a name/path/file is needed?

There are many examples where it would be useful if unusual characters
didn't break things, but simply worked.

Examples: -vnc unix: path, -net port: device path, -net script path,
-net sock= path, -net group= groupname, tap and bt device names.

\char is an obvious scheme to standardise on given QEMU's unix shell
heritage.  It would work equally well for command line options (which
are often comma-separated) and for monitor commands (which are often
space-separated).

It would have the nice property of being easy for management
programs/scripts to quote, without them having a special list of
characters to quote, without needing to update them if QEMU needs to
quote more characters in future for some reason.

Now, I see one significant hurdle with that: it's quite inconvenient
for Windows users, typing paths like c:\path\to\dir\file, if those
backslashes are stripped.

So I propose this as a universal quoting scheme:

\char where char is not ASCII alphanumeric.

Shell quoting is easy:

   qfile=$(printf '%s' "$file" | sed 's/[^0-9a-zA-Z]/\\&/g')

   qemu -drive file="$qfile",if=scsi,media=disk
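
For example, every non-alphanumeric character gains a backslash, which
the option parser would then strip:

    $ file='disk image,v2.img'
    $ printf '%s' "$file" | sed 's/[^0-9a-zA-Z]/\\&/g'
    disk\ image\,v2\.img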

The same quoting applies when sending the monitor a command to change
a CD-ROM file or add a USB disk, for example.

-- Jamie


Re: [Qemu-devel] Re: [PATCH] rev5: support colon in filenames

2009-07-15 Thread Jamie Lokier
Ram Pai wrote:
 I have verified with relative paths and it works.
 
 After analyzing the code, I came to the conclusion that the call to
 realpath() adds no real value. 
 
 The logic in bdrv_open2() is something like this
 
 bdrv_open2()
 {
     if (snapshot) {
         backup = realpath(filename);
         filename = generate_a_temp_file();
     }
     drv = parse_and_get_bdrv(filename);
     drv->bdrv_open(filename);
     if (backup) {
         bdrv_open2(backup);
     }
 }
   
 in the above function, the call to realpath() would have been useful had
 it changed the current working directory before calling
 bdrv_open2(backup). It does not. If any function within
 drv->bdrv_open changes the cwd, then I expect it to restore the cwd
 before returning.
 
 Also, drv->bdrv_open() can handle relative paths anyway. 
 
 Hence I conclude that the call to realpath() adds no value.
 
 Do you see a flaw in this logic? 

I don't know about snapshot, but when a qcow2 file contains a relative
path to its backing file, QEMU cannot simply open using that relative
path, because it's relative to the directory containing the qcow2 file,
not QEMU's current directory.

(That said, I find it quite annoying when renaming qcow2 files that
there's no easy way to rename their backing files, and it's even worse
when moving qcow2 files which refer to backing files in another
directory, and _especially_ when the qcow2 file contains an absolute
path to the backing file and you're asked to move it to another system
which doesn't have those directories.)
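
A rebase tool would solve the renaming part.  A sketch using the
qemu-img rebase command that appeared in later releases, where -u
rewrites only the header and copies no data:

    qemu-img rebase -u -b ../new_subdir/backing.img main.img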

-- Jamie


qcow2 relative paths (was: [PATCH] rev5: support colon in filenames)

2009-07-15 Thread Jamie Lokier
Ram Pai wrote:
 I have successfully verified qcow2 files. But then I may not be trying
 out the exact thing that you are talking about. Can you give me a test 
 case that I can verify?

Commands tried with qemu-0.10.0-1ubuntu1:

$ mkdir unlikely_subdir
$ cd unlikely_subdir
$ qemu-img create -f qcow2 backing.img 10
Formatting 'backing.img', fmt=qcow2, size=10 kB
$ qemu-img create -f qcow2 -b ../unlikely_subdir/backing.img main.img 10
Formatting 'main.img', fmt=qcow2, backing_file=../unlikely_subdir/backing.img, 
size=10 kB
$ cd ..
$ qemu-img info unlikely_subdir/main.img 
image: unlikely_subdir/main.img
file format: qcow2
virtual size: 10K (10240 bytes)
disk size: 16K
cluster_size: 4096
highest_alloc: 16384
backing file: ../unlikely_subdir/backing.img (actual path: 
unlikely_subdir/../unlikely_subdir/backing.img)

See especially the "actual path" line.

$ mv unlikely_subdir other_subdir
$ ls -l other_subdir
total 32
-rw-r--r-- 1 jamie jamie 16384 2009-07-15 21:59 backing.img
-rw-r--r-- 1 jamie jamie 16384 2009-07-15 21:59 main.img
$ qemu-img info other_subdir/main.img 
qemu-img: Could not open 'other_subdir/main.img'

What an unhelpful error message...  There isn't even a way to find out
the backing file path which the tool is looking for.

 And one other thing. Let me know if there is a test-suite that I can try
 for regressions.

Sorry, I don't know anything about any QEMU test suites.

-- Jamie


Re: [Qemu-devel] [PATCH] rev3: support colon in filenames

2009-07-15 Thread Jamie Lokier
Jan Kiszka wrote:
  Now, I see one significant hurdle with that: it's quite inconvenient
  for Windows users, typing paths like c:\path\to\dir\file, if those
  backslashes are stipped.
 
 We could exclude Windows from this (I seem to remember that filenames
 are more restricted there anyway) or define a different, Windows-only
 escape character.

I think both of those are bad ideas, because the same management
scripts can run on Windows, and for consistency it's not just file
names.  Even Windows has block devices and network devices :-)

Fortunately "\char where char is not ASCII alphanumeric" solves the
practical cases where the user types an ordinary pathname.

Or the user can type forward slashes just like they do in unix.

  So I propose this as a universal quoting scheme:
  
  \char where char is not ASCII alphanumeric.
  
  Shell quoting is easy:
  
 qfile=$(printf '%s' "$file" | sed 's/[^0-9a-zA-Z]/\\&/g')
  
 qemu -drive file="$qfile",if=scsi,media=disk

I forgot a very obscure corner case, where the last character of the
filename is a newline character.  To do the right thing (with Bash at
least), it should say '%s\n' instead of %s. Sue me :-)

  Same quoting applied when sending the monitor a command to change a
  CD-ROM file or add a USB disk, for example.
 
 To me this direction looks more promising than any other proposal so far.

I wondered if it was just me...

-- Jamie


Re: [Qemu-devel] [PATCH] rev3: support colon in filenames

2009-07-15 Thread Jamie Lokier
Jan Kiszka wrote:
 Jamie Lokier wrote:
  Jan Kiszka wrote:
  Now, I see one significant hurdle with that: it's quite inconvenient
  for Windows users, typing paths like c:\path\to\dir\file, if those
  backslashes are stripped.
  We could exclude Windows from this (I seem to remember that filenames
  are more restricted there anyway) or define a different, Windows-only
  escape character.
  
  I think both of those are bad ideas, because the same management
  scripts can run on Windows, and for consistency it's not just file
  names.  Even Windows has block devices and network devices :-)
 
 I'm not sure if there is actually so much portability/reusability
 between Windows and the rest of the universe, but I'm surely not an
 expert in this.

In my experience, shell scripts and Perl scripts tend to work either
with no changes, or very small changes.

  Fortunately "\char where char is not ASCII alphanumeric" solves the
  practical cases where the user types an ordinary pathname.
  
  Or the user can type forward slashes just like they do in unix.
 
 We would still have to deal with the fact that so far '\' had no special
 meaning on Windows - except that it was the well-known path separator.
 So redefining its meaning would break a bit...

The point is that paths tend to have alphanumeric characters at the
start of each component, so it doesn't matter in most cases that it's
redefined.  People won't notice because c:\path\to\file will continue
to work, whether it's by itself or part of a multi-option option.

Exceptions are \\host\path and \\.\device, where the error will be so
obvious they'll learn quickly.  We could find a more complex scheme
where \\ is unaffected, but complex is not good and will be wrongly
implemented by other programs.

Whereas \char is very common, well known and easy to get right, even
when people guess how it's done, like they do when working out how to
quote paths for rsync and ssh.

Oh, I'm suddenly thinking that "." should be included in alphanumeric :-)

-- Jamie


Re: [Qemu-devel] [PATCH] rev3: support colon in filenames

2009-07-15 Thread Jamie Lokier
Anthony Liguori wrote:
 Jan Kiszka wrote:
 We would still have to deal with the fact that so far '\' had no special
  meaning on Windows - except that it was the well-known path separator.
 So redefining its meaning would break a bit...
   
 
 That's the problem.  You will break existing Windows users.
 
 I know this goes against the current momentum in qemu, but overloading 
 one option with a bunch of parameters seems absolutely silly to me.
 
 IMHO, -drive file=foo.img,if=virtio,cache=off should have always been at 
 least three parameters.

That's fine for command lines.  I don't necessarily disagree with you.

But how do you propose to handle paths in monitor commands, when the
path contains a space/quote/whatever as it often does on Windows (My
Documents, Program Files)?

-- Jamie


Re: [Qemu-devel] [PATCH] rev3: support colon in filenames

2009-07-15 Thread Jamie Lokier
Anthony Liguori wrote:
 Jamie Lokier wrote:
 Anthony Liguori wrote:
   
 Jan Kiszka wrote:
 
 We would still have to deal with the fact that so far '\' had no special
  meaning on Windows - except that it was the well-known path separator.
 So redefining its meaning would break a bit...
  
   
 That's the problem.  You will break existing Windows users.
 
 I know this goes against the current momentum in qemu, but overloading 
 one option with a bunch of parameters seems absolutely silly to me.
 
 IMHO, -drive file=foo.img,if=virtio,cache=off should have always been at 
 least three parameters.
 
 
 That's fine for command lines.  I don't necessarily disagree with you.
 
 But how do you propose to handle paths in monitor commands, when the
 path contains a space/quote/whatever as it often does on Windows (My
 Documents, Program Files)?
   
 
 Same basic rules apply.  The monitor should use shell-style quoting.

So instead of consistency, you like the idea of using different
quoting rules for the monitor than for command line arguments?

-- Jamie


Re: [Qemu-devel] Re: [RFC] allow multi-core guests: introduce cores= option to -cpu

2009-07-03 Thread Jamie Lokier
Andre Przywara wrote:
 So what about: -smp 4,cores=2,threads=2[,sockets=1] to inject 4 vCPUs 
 in one package (automatically determined if omitted) with two cores and 
 two threads/core? All parameters except the number of vCPUs would be 
 optional,

Why is the number of vCPUs required at all?

   -smp cores=2,threads=2

The 4 is redundant.
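
That is, a sketch of the syntax with the count made optional:

    # fully specified: 4 vCPUs
    qemu -smp 4,sockets=1,cores=2,threads=2 disk.img

    # equivalent: count derived as sockets*cores*threads
    qemu -smp cores=2,threads=2 disk.img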

-- Jamie


Re: [Qemu-devel] virtio-serial: A guest - host interface for simple communication

2009-06-24 Thread Jamie Lokier
Amit Shah wrote:
 On (Wed) Jun 24 2009 [17:40:49], Jamie Lokier wrote:
  Amit Shah wrote:
   A few sample uses for a vmchannel are to share the host and guest
   clipboards (to allow copy/paste between a host and a guest), to
   lock the screen of the guest session when the vnc viewer is closed,
   to find out which applications are installed on a guest OS even when
   the guest is powered down (using virt-inspector) and so on.
  
  Those all look like useful features.
  
  Can you run an application to provide those features on a guest which
  _doesn't_ have vmchannel/virtio-serial support in the kernel?
  
  Or will it be restricted only to guests which have QEMU-specific
  support in their kernel?
 
 libguestfs currently uses the -net user based vmchannel interface that
 exists in current qemu. That doesn't need kernel support for
 virtio-serial.

That's great!

If that works fine, and guest apps/libraries are using that as a
fallback anyway, what benefit do they get from switching to
virtio-serial when they detect that instead, given they still have
code for the -net method?

Is the plan to remove -net user based support from libguestfs?

Is virtio-serial significantly simpler to use?

-- Jamie


Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden

2009-06-24 Thread Jamie Lokier
Avi Kivity wrote:
 On 06/23/2009 02:31 PM, Paul Brook wrote:
 On Tuesday 23 June 2009, Avi Kivity wrote:

 On 06/23/2009 12:47 AM, Andre Przywara wrote:
  
 KVM defaults to the hypervisor CPUID bit being set, whereas pure QEMU
 clears it. On some occasions one wants to set or clear it the other way
 round (for instance to get HyperV running inside a guest).
 Allow the default to be overridden on the command line and fix some
 whitespace damage on the way.

 It makes sense for qemu to set the hypervisor bit unconditionally.  A
 guest running under qemu is not bare metal.
  
 
 I see no reason why a guest has to be told that it's running inside a VM.
 In principle an appropriately configured qemu should be indistinguishable 
 from
 real hardware. In practice it's technically infeasible to cover absolutely
 everything, but if we set this bit we're not even trying.
 
 I have no objection to the bit being set by default for the QEMU CPU types.

 
 I agree it's pointless, but it is a Microsoft requirement for passing 
 their SVVP tests.  Enabling it by default makes life a little easier for 
 users who wish to validate their hypervisor and has no drawbacks.

Hold on.

Do the SVVP tests fail on a real (non-virtual) machine then?

Or is QEMU's machine emulation insufficiently accurate?

I see a drawback in setting the bit by default.

Something I expect from an emulator is that it behaves like a real
machine to the extent possible.  In particular, guest code which
attempts to check if it's running on a real machine should get the
answer "yes".  Unfriendly guest code which pops up a message like
"Sorry, I refuse to work for you after 100 hours, because you are
attempting to run me in a virtual machine, and don't even think of
trying to hide this from me now you know I look for it" should never
do so.
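
For what it's worth, a guest can check the bit trivially; a sketch for
a Linux guest (recent kernels report it as the hypervisor flag in
/proc/cpuinfo):

    grep -q '\bhypervisor\b' /proc/cpuinfo && echo "inside a VM"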

-- Jamie


Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-17 Thread Jamie Lokier
Avi Kivity wrote:
 On 06/16/2009 09:32 PM, Jamie Lokier wrote:
 Avi Kivity wrote:

 Another issue is enumeration.  Guests will present their devices in the
 order they find them on the pci bus (of course enumeration is guest
 specific).  So if I have 2 virtio controllers the only way I can
 distinguish between them is using their pci slots.
 
 virtio controllers really should have a user-suppliable string or UUID
 to identify them to the guest.  Don't they?
 
 virtio controllers don't exist.  When they do, they may have a UUID or 
 not, but in either case guest infrastructure is in place for reporting 
 the PCI slot, not the UUID.
 
 virtio disks do have a UUID.  I don't think older versions of Windows 
 will use it though, so if you reorder your slots you'll see your drive 
 letters change.  Same with Linux if you don't use udev by-uuid rules.

I guess I meant virtio disks, so that's ok.
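
For the record, a sketch of how the by-id scheme looks with a virtio
disk, assuming a serial is set on the drive:

    qemu -drive file=disk.img,if=virtio,serial=disk0001

    # inside the guest
    ls -l /dev/disk/by-id/virtio-disk0001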

-- Jamie


Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-17 Thread Jamie Lokier
Avi Kivity wrote:
 If management apps need to hard-code which slots are available on
 different targets and different qemu versions, or restrictions on which
 devices can use which slots, or knowledge that some devices can be
 multi-function, or ... anything like that is just lame.

 
 You can't abstract these things away.  If you can't put a NIC in slot 4, 
 and you have 7 slots, then you cannot have 7 NICs.  Having qemu allocate 
 the slot numbers does not absolve management from knowing this 
 limitation and preventing the user from creating a machine with 7 slots.
 
 Likewise, management will have to know which devices are multi-function, 
 since that affects their hotpluggability.  Ditto if some slot is faster 
 than others, if you want to make use of this information you have to let 
 the upper layers know.
 
 It could be done using an elaborate machine description that qemu 
 exposes to management coupled with a constraint solver that optimizes 
 the machine layout according to user specifications and hardware 
 limitations.  Or we could take the view that real life is not perfect 
 (especially where computers are involved), add some machine specific 
 knowledge, and spend the rest of the summer at the beach.

To be honest, an elaborate machine description is probably fine...

A fancy constraint solver is not required.  A simple one strikes me as
about as simple as what you'd hard-code anyway, but with fewer special
cases.

Note that the result can fail due to things like insufficient address
space for all the device BARs even when they _are_ in the right slots.
Especially if there are lots of slots, or bridges which can provide
unlimited slots.

That is arcane: device-dependent, CPU-dependent, machine-dependent,
RAM-size dependent (in a non-linear way), device-option-dependent and
probably QEMU-version-dependent too.

It would be nice if libvirt (et al) would prevent the user from
creating a VM with insufficient BAR space for that machine, but I'm
not sure how to do it sanely, without arcane knowledge getting about.

Maybe that idea of a .so shared by qemu and libvirt, to manipulate
device configurations, is a sane one after all.

-- Jamie


Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-16 Thread Jamie Lokier
Avi Kivity wrote:
 Another issue is enumeration.  Guests will present their devices in the 
 order they find them on the pci bus (of course enumeration is guest 
 specific).  So if I have 2 virtio controllers the only way I can 
 distinguish between them is using their pci slots.

virtio controllers really should have a user-suppliable string or UUID
to identify them to the guest.  Don't they?

-- Jamie


Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-16 Thread Jamie Lokier
Mark McLoughlin wrote:
  After libvirt has done -drive file=foo... it should dump the machine 
  config and use that from then on.
 
 Right - libvirt then wouldn't be able to avoid the complexity of merging
 any future changes into the dumped machine config.

As long as qemu can accept a machine config _and_ -drive file=foo (and
monitor commands to add/remove devices), libvirt could merge by simply
calling qemu with whatever additional command line options or monitor
commands modify the config, then dump the new config.

That way, virtio would not have to deal with that complexity.  It
would be written in one place: qemu.

Or better, a utility: qemu-machine-config.
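
A sketch of that merge flow, using the -readconfig/-writeconfig options
that appeared in later QEMU releases:

    # start from the dumped config, add a drive, dump the merged result
    qemu -readconfig vm.cfg -drive file=extra.img,if=virtio \
         -writeconfig vm-merged.cfg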

-- Jamie


Re: Configuration vs. compat hints [was Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities]

2009-06-16 Thread Jamie Lokier
Mark McLoughlin wrote:
  Worst case we hardcode those numbers (gasp, faint).
 
 Maybe we can just add the open slots to the -help output. That'd be nice
 and clean.

Make them part of the machine configuration.

After all, they are part of the machine configuration, and ACPI, BIOS
etc. need to know about all the machine slots anyway.

Having said that, I prefer the idea that slot allocation is handled
either in Qemu, or in a separate utility called qemu-machine-config
(for working with machine configs), or in a library
libqemu-machine-config.so.

I particularly don't like the idea of arcane machine-dependent slot
allocation knowledge living in libvirt, because it needs to be in Qemu
anyway for non-libvirt users.  No point in having two implementations
of something tricky and likely to have machine quirks, if one will do.

-- Jamie


Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Jamie Lokier
Paul Brook wrote:
   caps can be anywhere, but we don't expect it to change during machine
   execution lifetime.
  
   Or I am just confused by the name pci_device_load ?
 
  Right. So I want to load an image and it has capability X at offset Y.
  wmask has to match. I don't want to assume that we never change Y
  for the device without breaking old images, so I clear wmask here
  and set it up again after looking up capabilities that I loaded.
 
 We should not be loading state into a different device (or a similar device 
 with a different set of capabilities).
 
 If you want to provide backwards compatibility then you should do that by 
 creating a device that is the same as the original.  As I mentioned in my 
 earlier mail, loading a snapshot should never do anything that can not be 
 achieved through normal operation.

If you can create a machine by restoring a snapshot which you can't
create by normally starting QEMU, then you'll soon have guests which
work fine from their snapshots, but which cannot be booted without a
snapshot because there's no way to boot the right machine for the guest.

Someone might even have guests like that for years without noticing,
because they always save and restore guest state using snapshots, then
one day they simply want to boot the guest from its disk image and
find there's no way to do it with any QEMU which runs on their host
platform.

I think the right long term answer to all this is a way to get QEMU to
dump its current machine configuration in glorious detail as a file
which can be reloaded as a machine configuration.

-- Jamie


Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Jamie Lokier
Michael S. Tsirkin wrote:
  I think the right long term answer to all this is a way to get QEMU to
  dump its current machine configuration in glorious detail as a file
  which can be reloaded as a machine configuration.
 
 And then we'll have the same set of problems there.

We will, and the solution will be the same: options to create devices
as they were in older versions of QEMU.  It only needs to cover device
features which matter to guests, not every bug fix.

However with a machine configuration which is generated by QEMU,
there's less worry about proliferation of obscure options, compared
with the command line.  You don't necessarily have to document every
backward-compatibility option in any detail, you just have to make
sure it's written and read properly, which is much the same thing as
the snapshot code does.

-- Jamie


Re: [Qemu-devel] Re: [PATCH] virtio-blk: add SGI_IO passthru support

2009-05-01 Thread Jamie Lokier
Christoph Hellwig wrote:
 On Thu, Apr 30, 2009 at 10:49:19PM +0100, Paul Brook wrote:
  Only if you emulate a crufty old parallel scsi bus, and that's just silly.
  One of the nice things about scsi is it separates the command set from the 
  transport layer. cf. USB mass-storage, SAS, SBP2(firewire), and probably 
  several others I've forgotten.
 
 It has nothing to do with an SPI bus.  Everything that resembles a SAM
 architecture can have multiple LUs per target, and multiple targets per
 initiator port, so we need all the complex queuing code, and we need
 error handling and and and.

If you're using virtio-block to connect to lots of LUNs on lots of
targets (i.e. lots of block devices), don't you need similar queuing
code and error handling for all that too?

-- Jamie


Re: [libvirt] Re: [Qemu-devel] Changing the QEMU svn VERSION string

2009-04-08 Thread Jamie Lokier
Paul Brook wrote:
 I'm extremely sceptical of anything that claims to need a fine
 grained version number. In practice version numbers for open source
 projects are fairly arbitrary and meaningless because almost
 everyone has their own set of patches and backported fixes anyway.

I find it's needed only when you need to interact with a program and
work around bugs or temporarily broken features, and also when the
program gives no other way to determine its features.  For some
reason, I find kernels are the main thing this matters for...

If the help text, some other output, or an API gives enough
information for interacting programs to know what to do, that's much
better and works with arbitrary patches etc.

-- Jamie


Re: [libvirt] Re: [Qemu-devel] Changing the QEMU svn VERSION string

2009-04-07 Thread Jamie Lokier
Anthony Liguori wrote:
 I still think libvirt should work with versions of QEMU/KVM built from 
 svn/git though.  I think the only way to do that is for libvirt to relax 
 their version checks to accommodate suffixes in the form 
 major.minor.stable-foo.

Ok, but try to stick to a well-defined rule about what suffix means
later or earlier.  In package managers, 1.2.3-rc1 is typically
seen as a later version than 1.2.3 purely due to syntax.  If you're
consistently meaning 0.11.0-rc1 is earlier than 0.11.0 (final),
that might need to be encoded in libvirt and other wrappers, if they
have any fine-grained version sensistivity such as command line
changes or bug workarounds.

The Linux kernel was guilty of mixing up later and earlier version
suffixes like this.  With Linux this is a bit more important because
it changes a lot between versions, so some apps do need fine-grained
version checks to work around bugs or avoid buggy features.  Maybe that
won't even happen with QEMU and libvirt working together.

-- Jamie

