Re: Who signed qemu-1.7.1.tar.bz2?

2014-04-23 Thread Anthony Liguori

On 04/22/14 07:35, Michael Roth wrote:
 Quoting Stefan Hajnoczi (2014-04-22 08:31:08)
 On Wed, Apr 02, 2014 at 05:40:23PM -0700, Alex Davis wrote:
 and where is their gpg key?
 
 Michael Roth mdr...@linux.vnet.ibm.com is doing releases:
 
 http://pgp.mit.edu/pks/lookup?op=vindex&search=0x3353C9CEF108B584


 
 $ gpg --verify qemu-2.0.0.tar.bz2.sig
 gpg: Signature made Thu 17 Apr 2014 03:49:55 PM CEST using RSA key ID F108B584
 gpg: Good signature from "Michael Roth fluks...@gmail.com"
 gpg:                 aka "Michael Roth mdr...@utexas.edu"
 gpg:                 aka "Michael Roth mdr...@linux.vnet.ibm.com"
 
 Missed the context, but if this is specifically about 1.7.1:
 
 1.7.1 was prior to me handling the release tarballs, Anthony
 actually did the signing and uploading for that one. I'm a bit
 confused though, as the key ID on that tarball is:
 
 mdroth@loki:~/Downloads$ gpg --verify qemu-1.7.1.tar.bz2.sig
 gpg: Signature made Tue 25 Mar 2014 09:03:24 AM CDT using RSA key ID ADF0D2D9
 gpg: Can't check signature: public key not found
 
 I can't seem to locate ADF0D2D9 though:
 
 http://pgp.mit.edu/pks/lookup?search=0xADF0D2D9&op=vindex
 
 Anthony's normal key (for 1.6.0 and 1.7.0 at least) was 7C18C076:
 
 http://pgp.mit.edu/pks/lookup?search=0x7C18C076&op=vindex
 
 I think maybe Anthony might've signed it with a separate local
 key?

Yeah, I accidentally signed it with the wrong key.  Replacing the
signature doesn't seem like the right thing to do since release
artifacts should never change.

Regards,

Anthony Liguori

 
 Stefan
 



Re: [Qemu-devel] KVM call agenda for 2014-04-01

2014-03-31 Thread Anthony Liguori
On Mon, Mar 31, 2014 at 6:25 AM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 31 March 2014 14:21, Christian Borntraeger borntrae...@de.ibm.com wrote:
 Another thing might be the release process in general. Currently it seems
 that everybody tries to push everything just before the hard freeze.  I had
 to debug some problems introduced _after_ soft freeze. Is there some
 interest in having a Linux-like process (merge window + stabilization)? This
 would require shorter release cycles of course.

 merge window has been suggested before. I think it would be
 a terrible idea for QEMU, personally. We're not the kernel in
 many ways, notably dev community size and a greater tendency
 to changes that have effects across the whole tree.

 Soft + hard freeze is our stabilization period currently.

Peter, are you willing to do the tagging and announcement for the 2.0
rcs?  I sent instructions privately, and between stefanha and me we can
get your permissions sorted out.

Regards,

Anthony Liguori

 thanks
 -- PMM


Re: [Qemu-devel] KVM call agenda for 2014-04-01

2014-03-31 Thread Anthony Liguori
On Mon, Mar 31, 2014 at 7:46 AM, Andreas Färber afaer...@suse.de wrote:
 On 31.03.2014 16:32, Peter Maydell wrote:
 On 31 March 2014 15:28, Paolo Bonzini pbonz...@redhat.com wrote:
 I think it would be a good idea to separate the committer and release
 manager roles.  Peter is providing the community with a wonderful service,
 just like you were; putting too much work on his shoulders risks getting us
 in the same situation if anything were to affect his ability to provide it.

 Yes, I strongly agree with this. I think we'll do much better
 if we can manage to share out responsibilities among a wider
 group of people.

 May I propose Michael Roth, who is already experienced from the N-1
 stable releases?

 If we can enable him to upload the tarballs created from his tags, that
 would also streamline the stable workflow while we're at it.

If mdroth is willing to take this on, I am very supportive.

Regards,

Anthony Liguori


 Regards,
 Andreas

 --
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: KVM call agenda for 2013-12-10

2013-12-10 Thread Anthony Liguori
On Tue, Dec 10, 2013 at 4:37 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 10/12/2013 12:42, Juan Quintela wrote:

 Hi

 Please, send any topic that you are interested in covering.

 May not need a phone call, but I'll drop it here:

Could we move the time on this phone call?  7am conflicts with my
daily commute.  I could do 6am or 9am.  I think it would be very
useful to be able to attend this call.

 what happened to
 acknowledgement emails from the patches script?

It's buggy and I haven't had a chance to rewrite it yet.

 Also, Anthony, it looks like you're still adjusting to the new job.  If
 you need help with anything, I guess today's call could be a good place
 to discuss it.

 And someone needs to send out the email saying that 1.7.0 is out and
 that the next version will be 2.0!

Mail is out now, sorry for the delay.

Pull requests should be getting processed in a reasonable time.  I am
not yet spending enough time doing patch review but that should
improve in the very near future.

It's not so much the new job as it is relocating and moving all at the
same time.  I'm hoping the holiday break is a good time to catch up on
things.  Of course, we should revisit again soon.

Regards,

Anthony Liguori


 Paolo


Re: [Qemu-devel] KVM call agenda for 2013-12-10

2013-12-10 Thread Anthony Liguori
On Tue, Dec 10, 2013 at 4:54 AM, Markus Armbruster arm...@redhat.com wrote:
 Paolo Bonzini pbonz...@redhat.com writes:

 On 10/12/2013 12:42, Juan Quintela wrote:

 Hi

 Please, send any topic that you are interested in covering.

 May not need a phone call, but I'll drop it here: what happened to
 acknowledgement emails from the patches script?

 Also, Anthony, it looks like you're still adjusting to the new job.  If
 you need help with anything, I guess today's call could be a good place
 to discuss it.

 And someone needs to send out the email saying that 1.7.0 is out and
 that the next version will be 2.0!

 Speaking of sending out e-mail: did I miss the promised followup to the
 key signing party?

I need to find the papers from KVM Forum which are somewhere among the
stacks of boxes here :-/

Regards,

Anthony Liguori



Re: Elvis upstreaming plan

2013-11-27 Thread Anthony Liguori
Abel Gordon ab...@il.ibm.com writes:

 Michael S. Tsirkin m...@redhat.com wrote on 27/11/2013 12:27:19 PM:

  On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
   Hi,

   Razya is out for a few days, so I will try to answer the questions
   as well as I can:

   Michael S. Tsirkin m...@redhat.com wrote on 26/11/2013 11:11:57 PM:

    On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:

     Anthony Liguori anth...@codemonkey.ws wrote on 26/11/2013 08:05:00 PM:

      Razya Ladelsky ra...@il.ibm.com writes:

     That's why we are proposing to implement a mechanism that will
     enable the management stack to configure 1 thread per I/O device
     (as it is today) or 1 thread for many I/O devices (belonging to
     the same VM).

    Once you are scheduling multiple guests in a single vhost device,
    you now create a whole new class of DoS attacks in the best case
    scenario.

     Again, we are NOT proposing to schedule multiple guests in a single
     vhost thread. We are proposing to schedule multiple devices
     belonging to the same guest in a single (or multiple) vhost
     thread/s.
   
  
    I guess a question then becomes why have multiple devices?

   If you mean why serve multiple devices from a single thread, the
   answer is that we cannot rely on the Linux scheduler, which has no
   knowledge of I/O queues, to do a decent job of scheduling I/O.  The
   idea is to take over the I/O scheduling responsibilities from the
   kernel's thread scheduler with a more efficient I/O scheduler inside
   each vhost thread.  So by combining all of the I/O devices from the
   same guest (disks, network cards, etc) in a single I/O thread, it
   allows us to provide better scheduling by giving us more knowledge
   of the nature of the work.  So now, instead of relying on the Linux
   scheduler to perform context switches between multiple vhost
   threads, we have a single thread context in which we can do the I/O
   scheduling more efficiently.  We can closely monitor the performance
   needs of each queue of each device inside the vhost thread, which
   gives us much more information than relying on the kernel's thread
   scheduler.

   This does not expose any additional opportunities for attacks (DoS
   or other) than are already available, since all of the I/O traffic
   belongs to a single guest.

   You can make the argument that with low I/O loads this mechanism may
   not make much difference.  However, when you try to maximize the
   utilization of your hardware (such as in a commercial scenario),
   this technique can gain you a large benefit.

   Regards,

   Joel Nider
   Virtualization Research
   IBM Research and Development
   Haifa Research Lab

  So all this would sound more convincing if we had sharing between VMs.
  When it's only a single VM it's somehow less convincing, isn't it?
  Of course, if we bypass the scheduler like this it becomes harder to
  enforce cgroup limits.

  True, but here the issue becomes isolation/cgroups. We can start to
  show the value for VMs that have multiple devices / queues, and then
  we could re-consider extending the mechanism to multiple VMs (at
  least as an experimental feature).

  But it might be easier to give the scheduler the info it needs to do
  what we need.  Would an API that basically says "run this kthread
  right now" do the trick?

  ...do you really believe it would be possible to push this kind of
  change into the Linux scheduler?  In addition, we need more than "run
  this kthread right now", because you need to monitor the virtio ring
  activity to specify when you would like to run a specific kthread and
  for how long.

Paul Turner has a proposal for exactly this:

http://www.linuxplumbersconf.org/2013/ocw/sessions/1653

The video is up on YouTube, I think.  It definitely is a general problem
that is not at all virtual I/O specific.

Regards,

Anthony Liguori



 

 

 

Re: Elvis upstreaming plan

2013-11-26 Thread Anthony Liguori
Razya Ladelsky ra...@il.ibm.com writes:

 Hi all,

 I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
 developed Elvis, presented by Abel Gordon at the last KVM forum: 
 ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
 ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 


 According to the discussions that took place at the forum, upstreaming 
 some of the Elvis approaches seems to be a good idea, which we would like 
 to pursue.

 Our plan for the first patches is the following: 

 1. Shared vhost thread between multiple devices 
 This patch creates a worker thread and worker queue shared across multiple 
 virtio devices. 
 We would like to modify the patch posted in
 https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
  
 to limit a vhost thread to serve multiple devices only if they belong to 
 the same VM as Paolo suggested to avoid isolation or cgroups concerns.

 Another modification is related to the creation and removal of vhost 
 threads, which will be discussed next.

I think this is an exceptionally bad idea.

We shouldn't throw away isolation without exhausting every other
possibility.

We've seen very positive results from adding threads.  We should also
look at scheduling.

Once you are scheduling multiple guests in a single vhost device, you
now create a whole new class of DoS attacks in the best case scenario.

 2. Sysfs mechanism to add and remove vhost threads 
 This patch allows us to add and remove vhost threads dynamically.

 A simpler way to control the creation of vhost threads is statically 
 determining the maximum number of virtio devices per worker via a kernel 
 module parameter (which is the way the previously mentioned patch is 
 currently implemented)

 I'd like to ask for advice here about the more preferable way to go:
 Although having the sysfs mechanism provides more flexibility, it may be a 
 good idea to start with a simple static parameter, and have the first 
 patches as simple as possible. What do you think?

 3. Add virtqueue polling mode to vhost 
 Have the vhost thread poll the virtqueues with a high I/O rate for new 
 buffers, and avoid asking the guest to kick us [a rough sketch of this 
 idea appears at the end of this message].
 https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Ack on this.

Regards,

Anthony Liguori

 4. vhost statistics
 This patch introduces a set of statistics to monitor different performance 
 metrics of vhost and our polling and I/O scheduling mechanisms. The 
 statistics are exposed using debugfs and can be easily displayed with a 
 Python script (vhost_stat, based on the old kvm_stats)
 https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0


 5. Add heuristics to improve I/O scheduling 
 This patch enhances the round-robin mechanism with a set of heuristics to 
 decide when to leave a virtqueue and proceed to the next.
 https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d

 This patch improves the handling of the requests by the vhost thread,
 but could perhaps be delayed to a later time, and not submitted as one
 of the first Elvis patches.
 I'd love to hear some comments about whether this patch needs to be part 
 of the first submission.

 Any other feedback on this plan will be appreciated,
 Thank you,
 Razya
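
For illustration, the virtqueue polling idea from item 3 might look
roughly like the following minimal sketch (hypothetical types and field
names; the real implementation lives in the vhost kernel patches linked
above):

    #include <stdbool.h>
    #include <stdint.h>

    struct vq_state {
        volatile uint16_t *avail_idx;  /* guest-visible avail->idx */
        uint16_t last_avail_idx;       /* last index this worker consumed */
    };

    /* New buffers are pending when the guest has advanced avail->idx
     * past the point this worker last processed. */
    static bool vq_has_work(const struct vq_state *vq)
    {
        return *vq->avail_idx != vq->last_avail_idx;
    }

    /* One round-robin polling pass over every queue served by this
     * worker, instead of sleeping on each queue's kick eventfd. */
    static int poll_once(struct vq_state *vqs, int nvqs)
    {
        for (int i = 0; i < nvqs; i++) {
            if (vq_has_work(&vqs[i]))
                return i;   /* serve this queue next */
        }
        return -1;          /* nothing pending; caller may yield */
    }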


Re: [PATCH for-1.7] target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid()

2013-11-12 Thread Anthony Liguori
On Tue, Nov 12, 2013 at 8:08 AM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 12 November 2013 15:58, Paolo Bonzini pbonz...@redhat.com wrote:
 I don't really see a reason why QEMU should give clang more weight than
 Windows or Mac OS X.

 I'm not asking for more weight (and actually my main
 reason for caring about clang is exactly MacOSX). I'm
 just asking that when a bug is reported whose underlying
 cause is we don't work on clang because we're relying on
 undocumented behaviour of gcc with an attached patch that
 fixes this by not relying on the undocumented behaviour,
 that we apply the patch rather than saying why do we
 care about clang...

QEMU has always been intimately tied to GCC.  Heck, it all started as
a giant GCC hack relying on entirely undocumented behavior (dyngen's
disassembly of functions).

There's nothing intrinsically bad about being tied to GCC.  If you
were making the argument that we could do it a different way and the
result would be as nice or nicer, then it wouldn't be a discussion.

But if supporting clang means we have to remove useful things, then
it's always going to be an uphill battle.

In this case, the whole discussion is a bit silly.  Have you actually
tried -O1 under a debugger with clang?  Is it noticeably worse than
-O0?

I find QEMU extremely difficult to use an interactive debugger on
anyway.  I doubt the difference between -O0 and -O1 is anywhere close to
the breaking point for usability under a debugger...

Regards,

Anthony Liguori

 This seems to me to be a win-win situation:
  * we improve our code by not relying on undocumented
implementation specifics
  * we work on a platform that, while not a primary
platform, is at least supported in the codebase and
has people who fix it when it breaks

 -- PMM


Re: [PATCH for-1.7] target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid()

2013-11-11 Thread Anthony Liguori
On Mon, Nov 11, 2013 at 3:11 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 11/11/2013 23:38, Peter Maydell wrote:
 If we have other places where we're relying on dead code elimination
 to not provide a function definition, please point them out, because
 they're bugs we need to fix, ideally before they cause compilation
 failures.

 I'm not sure, there are probably a few others.  Linux also relies on the
 idiom (at least KVM does on x86).

And they are there because it's a useful tool.

 Huh? The point of stub functions is to provide versions of functions
 which either need to return an always fails code, or which will never
 be called, but in either case this is so we can avoid peppering the
 code with #ifdefs. The latter category is why we have stubs which
 do nothing but call abort().

 There are very few stubs that call abort():

 int kvm_cpu_exec(CPUState *cpu)
 {
 abort();
 }

 int kvm_set_signal_mask(CPUState *cpu, const sigset_t *sigset)
 {
 abort();
 }

 Calling abort() would be marginally better than returning 0, but why
 defer checks to runtime when you can let the linker do them?

Exactly.

 I wouldn't be surprised if this also affected debug gcc
 builds with KVM disabled, but I haven't checked.

 No, it doesn't affect GCC.  See Andreas's bug report.  Is it a bug or a
 feature?  Having some kind of -O0 dead-code elimination is definitely a
 feature (http://gcc.gnu.org/ml/gcc-patches/2003-03/msg02443.html).

 That patch says it is to speed up these RTL optimizers and by allocating
 less memory, reduce the compiler footprint and possible memory
 fragmentation. So they might investigate it as a performance
 regression, but it's only a make compilation faster feature, not
 correctness. Code which relies on dead-code-elimination is broken.

 There's plenty of tests in the GCC testsuite that rely on DCE to test
 that an optimization happened; some of them at -O0 too.  So it's become
 a GCC feature in the end.

 Code which relies on dead-code-elimination is not broken, it's relying
 on the full power of the toolchain to ensure bugs are detected as soon
 as possible, i.e. at build time.

 I am okay with Andreas's patch of course, but it would also be fine with
 me to split the if in two, each with its own separate break statement.

 I think Andreas's patch is a bad idea and am against it being
 applied. It's very obviously a random tweak aimed at a specific
 compiler's implementation of dead-code elimination, and it's the
 wrong way to fix the problem.

 It's very obviously a random tweak aimed at a specific compiler's bug in
 dead-code elimination, I'm not denying that.  But the same compiler
 feature is being exploited elsewhere.

We're not talking about something obscure here.  It's eliminating an
if(0) block.  There's no reason to leave an if (0) block around.  The
code is never reachable.
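
For reference, the idiom under discussion boils down to something like
this standalone example (the undefined callee stands in for
kvm_arch_get_supported_cpuid(); kvm_enabled() really does expand to (0)
in QEMU when KVM is compiled out):

    /* The callee is declared but defined nowhere, mirroring a
     * CONFIG_KVM-off build.  gcc folds the if (0) away even at -O0,
     * so the link succeeds; a compiler that keeps the call at -O0
     * fails to link with an undefined reference. */
    extern int kvm_cpu_exec_undefined(void);  /* no definition anywhere */

    #define kvm_enabled() (0)  /* as when KVM is compiled out */

    int main(void)
    {
        if (kvm_enabled())
            return kvm_cpu_exec_undefined();  /* must be eliminated */
        return 0;
    }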

 Since it only affects debug builds, there is no hurry to fix this in 1.7
 if the approach cannot be agreed on.

 ??  Debug builds should absolutely work out of the box -- if a
 debug build fails that is IMHO a release-critical bug.

 Debug builds for qemu-system-{i386,x86_64} with clang on systems other
 than x86/Linux.

Honestly, it's hard to treat clang as a first-class target.  We don't
have much infrastructure around it, so it's not getting that much testing.

We really need to figure out how we're going to do CI.

FWIW, I'd rather just add -O1 for debug builds than add more stub functions.

Regards,

Anthony Liguori


 Paolo


Re: [PULL 0/6] VFIO updates for QEMU

2013-10-09 Thread Anthony Liguori
Alex Williamson alex.william...@redhat.com writes:

 The following changes since commit a684f3cf9b9b9c3cb82be87aafc463de8974610c:

   Merge remote-tracking branch 'kraxel/seabios-1.7.3.2' into staging 
 (2013-09-30 17:15:27 -0500)

 are available in the git repository at:


   git://github.com/awilliam/qemu-vfio.git tags/vfio-pci-for-qemu-20131003.0

 for you to fetch changes up to 1d5bf692e55ae22b59083741d521e27db704846d:

   vfio: Fix debug output for int128 values (2013-10-03 09:10:09 -0600)

 

Judging from the review comments, I think this needs a v2.

Regards,

Anthony Liguori

 vfio-pci updates include:
  - Forgotten MSI affinity patch posted several months ago
  - Lazy option ROM loading to delay load until after device/bus resets
  - Error reporting cleanups
  - PCI hot reset support introduced with Linux v3.12 development kernels
  - Debug build fix for int128

 The lazy ROM loading and hot reset should help VGA assignment as we can
 now do a bus reset when there are multiple devices on the bus, ex.
 multi-function graphics and audio cards.  The known remaining part for
 VGA is the KVM-VFIO device and matching QEMU support to properly handle
 devices that make use of No-Snoop transactions, particularly on Intel
 host systems.

 
 Alex Williamson (5):
   vfio-pci: Add support for MSI affinity
   vfio-pci: Test device reset capabilities
   vfio-pci: Lazy PCI option ROM loading
   vfio-pci: Cleanup error_reports
   vfio-pci: Implement PCI hot reset

 Alexey Kardashevskiy (1):
   vfio: Fix debug output for int128 values

  hw/misc/vfio.c | 621 ++++++++++++++++++++++++++++++++++++++++--------
  1 file changed, 512 insertions(+), 109 deletions(-)


Re: Is fallback vhost_net to qemu for live migrate available?

2013-08-29 Thread Anthony Liguori
Hi Qin,

On Mon, Aug 26, 2013 at 10:32 PM, Qin Chuanyu qinchua...@huawei.com wrote:
 Hi all

 I am participating in a project which is trying to port vhost_net to Xen.

Neat!

 By changing the memory copy and notify mechanisms, virtio-net with
 vhost_net can currently run on Xen with good performance.

I think the key in doing this would be to implement a proper
ioeventfd and irqfd interface in the driver domain kernel.  Just
hacking vhost_net with Xen-specific knowledge would be pretty nasty
IMHO.

Did you modify the front end driver to do grant table mapping or is
this all being done by mapping the domain's memory?

 TCP receive throughput of a single vnic went from 2.77 Gbps up to
 6 Gbps. On the VM receive side, I replaced grant_copy with grant_map +
 memcpy, which efficiently reduces the cost of dom0's grant_table
 spin_lock, so whole-server TCP performance went from 5.33 Gbps up to
 9.5 Gbps.

 Now I am considering live migration of vhost_net on Xen. vhost_net uses
 vhost_log for live migration on KVM, but qemu on Xen doesn't manage the
 whole memory of the VM. So I am trying to fall back from the vhost_net
 datapath to qemu when doing live migration, and switch the datapath
 back from qemu to vhost_net after the VM has migrated to the new server.

KVM and Xen represent memory in a very different way.  KVM can only
track when guest mode code dirties memory.  It relies on QEMU to track
when guest memory is dirtied by QEMU.  Since vhost is running outside
of QEMU, vhost also needs to tell QEMU when it has dirtied memory.
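
As a rough illustration of what that logging amounts to (a simplified
standalone sketch, not the actual drivers/vhost code):

    #include <stdint.h>

    #define PAGE_SHIFT 12

    /* When a dirty log is enabled, vhost marks every guest page it
     * writes in a bitmap shared with QEMU, which merges it into the
     * migration dirty log. */
    static void log_dirty(uint8_t *log_bitmap, uint64_t gpa, uint64_t len)
    {
        uint64_t first = gpa >> PAGE_SHIFT;
        uint64_t last  = (gpa + len - 1) >> PAGE_SHIFT;

        for (uint64_t pfn = first; pfn <= last; pfn++)
            log_bitmap[pfn / 8] |= 1u << (pfn % 8);
    }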

I don't think this is a problem with Xen though.  I believe (although I
could be wrong) that Xen is able to track when either the domain or
dom0 dirties memory.

So I think you can simply ignore the dirty logging with vhost and it
should Just Work.


  My questions are:
  Why doesn't vhost_net do the same fallback operation for live
  migration on KVM, instead of using vhost_log to mark the dirty pages?
  Is there any flaw in the mechanism of falling back the datapath from
  vhost_net to qemu for live migration?

No, we don't have a mechanism to fall back to QEMU for the datapath.
It would be possible, but I think it's a bad idea to mix and match the
two.

Regards,

Anthony Liguori

 Any questions about the details of vhost_net on Xen are welcome.

 Thanks




Re: Are there plans to achieve ram live Snapshot feature?

2013-08-09 Thread Anthony Liguori
Chijianchun chijianc...@huawei.com writes:

  Now in KVM, when taking a RAM snapshot, the vcpus need to be stopped,
  which is an unfriendly restriction for users.

  Are there plans to achieve a live RAM snapshot feature?

I think you mean a live version of the savevm command.

You can approximate it by live migrating to a file, creating an external
disk snapshot, then resuming the guest.

Regards,

Anthony Liguori


  In my mind, snapshots should not occupy too much additional memory, so
  when a memory page needs to be changed, the old page must first be
  flushed to the file.  But flushing to a file is much slower than
  memory, and while flushing, the vcpu or VM needs to be paused until
  the flush finishes, so it goes pause...resume...pause...resume,
  getting slower and slower.

  Is this idea feasible?  Are there any other thoughts?



Re: VU#976534 - How to submit security bugs?

2013-07-24 Thread Anthony Liguori
CERT(R) Coordination Center c...@cert.org writes:

 Greetings,
   My name is Adam Rauf and I work for the CERT Coordination Center.  We
 have a report that may affect KVM/QEMU.  How can we securely send it over to
 you?  Thanks so much!

For QEMU bugs, please file a bug in Launchpad and mark it as a security
bug.  That will appropriately limit visibility.

http://launchpad.net/qemu

If you want to contact me directly, my public key is:

http://www.codemonkey.ws/files/aliguori.pub

You can verify that this key is what is used to sign QEMU releases at:

http://wiki.qemu.org/Download

Regards,

Anthony Liguori


 Adam Rauf
 Software Engineering Institute
 CERT Vulnerability Analysis Team



[ANNOUNCE] Key Signing Party at KVM Forum 2013

2013-07-24 Thread Anthony Liguori

I will be hosting a key signing party at this year's KVM Forum.

http://wiki.qemu.org/KeySigningParty2013

Starting with the 1.7 release (which begins in December), I will only
accept signed pull requests, so please try to attend this event or make
alternative arrangements to have your key signed by someone who will
attend the event.

I will also be attending LinuxCon/CloudOpen/Plumbers North America if
anyone wants to have another key signing party at that event and cannot
attend KVM Forum.

Regards,

Anthony Liguori


KVM Forum 2013 Call for Participation - Extended to August 4th

2013-07-23 Thread Anthony Liguori

We have received numerous requests to extend the CFP deadline and so
we are happy to announce that the CFP deadline has been moved by two
weeks to August 4th.

=
KVM Forum 2013: Call For Participation
October 21-23, 2013 - Edinburgh International Conference Centre - Edinburgh, UK

(All submissions must be received before midnight July 21, 2013)
=

KVM is an industry leading open source hypervisor that provides an ideal
platform for datacenter virtualization, virtual desktop infrastructure,
and cloud computing.  Once again, it's time to bring together the
community of developers and users that define the KVM ecosystem for
our annual technical conference.  We will discuss the current state of
affairs and plan for the future of KVM, its surrounding infrastructure,
and management tools.  The oVirt Workshop will run in parallel with the
KVM Forum again, bringing in a community focused on enterprise datacenter
virtualization management built on KVM.  For topics which overlap we will
have shared sessions.  So mark your calendar and join us in advancing KVM.

http://events.linuxfoundation.org/events/kvm-forum/

Once again we are colocated with The Linux Foundation's LinuxCon Europe.
KVM Forum attendees will be able to attend oVirt Workshop sessions and
are eligible to attend LinuxCon Europe for a discounted rate.

http://events.linuxfoundation.org/events/kvm-forum/register

We invite you to lead part of the discussion by submitting a speaking
proposal for KVM Forum 2013.

http://events.linuxfoundation.org/cfp

Suggested topics:

 KVM/Kernel
 - Scaling and performance
 - Nested virtualization
 - I/O improvements
 - VFIO, device assignment, SR-IOV
 - Driver domains
 - Time keeping
 - Resource management (cpu, memory, i/o)
 - Memory management (page sharing, swapping, huge pages, etc)
 - Network virtualization
 - Security
 - Architecture ports

 QEMU
 - Device model improvements
 - New devices and chipsets
 - Scaling and performance
 - Desktop virtualization
 - Spice
 - Increasing robustness and hardening
 - Security model
 - Management interfaces
 - QMP protocol and implementation
 - Image formats
 - Firmware (SeaBIOS, OVMF, UEFI, etc)
 - Live migration
 - Live snapshots and merging
 - Fault tolerance, high availability, continuous backup
 - Real-time guest support

 Virtio
 - Speeding up existing devices
 - Alternatives
 - Virtio on non-Linux or non-virtualized

 Management infrastructure
 - oVirt (shared track w/ oVirt Workshop)
 - Libvirt
 - KVM autotest
 - OpenStack
 - Network virtualization management
 - Enterprise storage management

 Cloud computing
 - Scalable storage
 - Virtual networking
 - Security
 - Provisioning

SUBMISSION REQUIREMENTS

Abstracts due: July 21, 2013
Notification: August 1, 2013

Please submit a short abstract (~150 words) describing your presentation
proposal.  In your submission please note how long your talk will take.
Slots vary in length up to 45 minutes.  Also include in your proposal
the proposal type -- one of:

- technical talk
- end-user talk
- birds of a feather (BOF) session

Submit your proposal here:

http://events.linuxfoundation.org/cfp

You will receive a notification whether or not your presentation proposal
was accepted by Aug 1st.

END-USER COLLABORATION

One of the big challenges as developers is to know what, where and how
people actually use our software.  We will reserve a few slots for end
users talking about their deployment challenges and achievements.

If you are using KVM in production you are encouraged to submit a
speaking proposal.  Simply mark it as an end-user collaboration
proposal.  As an
end user, this is a unique opportunity to get your input to developers.

BOF SESSION

We will reserve some slots in the evening after the main conference
tracks, for birds of a feather (BOF) sessions. These sessions will be
less formal than the presentation tracks and targeted at people who would
like to discuss specific issues with other developers and/or users.
If you are interested in getting developers and/or users together to
discuss a specific problem, please submit a BOF proposal.

HOTEL / TRAVEL

The KVM Forum 2013 will be held in Edinburgh, UK at the Edinburgh
International Conference Centre.

http://events.linuxfoundation.org/events/kvm-forum/hotel

Thank you for your interest in KVM.  We're looking forward to your
submissions and seeing you at the KVM Forum 2013 in October!

Thanks,
-your KVM Forum 2013 Program Committee

Please contact us with any questions or comments.
kvm-forum-2013...@redhat.com



Re: KVM call agenda for 2013-06-11

2013-06-11 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, Jun 04, 2013 at 04:24:31PM +0300, Michael S. Tsirkin wrote:
 Juan is not available now, and Anthony asked for
 agenda to be sent early.
 So here comes:
 
 Agenda for the meeting Tue, June 11:
  
 - Generating acpi tables, redux

 Not so much notes as a quick summary of the call:

 There are the following reasons to generate ACPI tables in QEMU:

 - sharing code with e.g. ovmf
   Anthony thinks this is not a valid argument

 - so we can make tables more dynamic and move away from iasl
   Anthony thinks this is not a valid reason either,
   since qemu and seabios have access to the same info.
   MST noted several pieces of info are not accessible to the bios.
   Anthony said they can be added, e.g. by exposing
   QOM to the bios.

 - even though most tables are static, hardcoded
   they are likely to change over time
   Anthony sees this as justified

 To summarize, there's a consensus now that generating ACPI
 tables in QEMU is a good idea.

I would say it's the best worst idea ;-)

I am deeply concerned about the complexity it introduces but I don't see
many other options.


 Two issues that need to be addressed:
 - original patches break cross-version migration. Need to fix that.

 - Anthony requested that the patchset be merged together with
   some new feature. I'm not sure the reasoning is clear:
   the current version intentionally generates tables
   that are bug-for-bug compatible with seabios,
   to simplify testing.

I expect that there will be additional issues that need to be worked out
and want to see a feature that actually uses the infrastructure before
we add it.

   It seems clear we have users for this such as
   hotplug of devices behind pci bridges, so
   why keep the infrastructure out of tree?

It's hard to evaluate the infrastructure without a user.

   Looking for something additional and smaller, as the hotplug patch
   is a bit big and so might delay merging.


 Going forward - would we want to move
 smbios as well? Everyone seems to think it's a
 good idea.

Yes, independent of ACPI, I think QEMU should be generating the SMBIOS
tables.

Regards,

Anthony Liguori

 -- 
 MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Anthony Liguori

Hi Rusty,

Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.

 Now you have a different compatibility problem; how do you know the
 guest supports the new virtio-pcie net?

We don't care.

We would still use virtio-pci for existing devices.  Only new devices
would use virtio-pcie.

 If you put a virtio-pci card behind a PCI-e bridge today, it's not
 compliant, but AFAICT it will Just Work.  (Modulo the 16-dev limit).

I believe you can put it in legacy mode and then there isn't the 16-dev
limit.  I believe the only advantage of putting it in native mode is
that then you can do native hotplug (as opposed to ACPI hotplug).

So sticking with virtio-pci seems reasonable to me.

 I've been assuming we'd avoid a flag day change; that devices would
 look like existing virtio-pci with capabilities indicating the new
 config layout.

I don't think that's feasible.  Maybe 5 or 10 years from now, we switch
the default adapter to virtio-pcie.

 I think 4 is the best path forward.  It's better for users (guests
 continue to work as they always have).  There's less confusion about
 enabling PCI-e support--you must ask for the virtio-pcie variant and you
 must have a virtio-pcie driver.  It's easy to explain.

 Removing both forward and backward compatibility is easy to explain, but
 I think it'll be harder to deploy.  This is your area though, so perhaps
 I'm wrong.

My concern is that it's not real backwards compatibility.

 It also maps to what regular hardware does.  I highly doubt that there
 are any real PCI cards that made the shift from PCI to PCI-e without
 bumping at least a revision ID.

 Noone expected the new cards to Just Work with old OSes: a new machine
 meant a new OS and new drivers.  Hardware vendors like that.

Yup.

 Since virtualization often involves legacy, our priorities might be
 different.

So realistically, I think if we introduce virtio-pcie with a different
vendor ID, it will be adopted fairly quickly.  The drivers will show up
in distros quickly and get backported.

New devices can be limited to supporting virtio-pcie and we'll certainly
provide a way to use old devices with virtio-pcie too.  But for
practical reasons, I think we have to continue using virtio-pci by
default.

Regards,

Anthony Liguori


 Cheers,
 Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Anthony Liguori
Gleb Natapov g...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
  On 06/05/2013 03:08 PM, Anthony Liguori wrote:
 
  Definitely an option.  However, we want to be able to boot from native
  devices, too, so having an I/O BAR (which would not be used by the OS
  driver) should still at the very least be an option.
  
  What makes it so difficult to work with an MMIO bar for PCI-e?
  
  With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
  forward.  Is there something special about PCI-e here?
  
 
  It's not tracking allocation.  It is that accessing memory above 1 MiB
  is incredibly painful in the BIOS environment, which basically means
  MMIO is inaccessible.
 
 Oh, you mean in real mode.
 
 SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
 There are loads of ASSERT32FLAT()s in the code to make sure of this.
 
  Well, not exactly. Initialization is done in 32-bit mode, but disk
  reads/writes are done in 16-bit mode since they must work from the
  int13 interrupt handler. The only way I know to access MMIO bars from
  16-bit code is to use SMM, which we do not have in KVM.

Ah, if it's just the dataplane operations then there's another solution.

We can introduce a virtqueue flag that asks the backend to poll for new
requests.  Then SeaBIOS can add the request to the queue and not worry
about kicking or reading the ISR.

SeaBIOS is polling for completion anyway.
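
A rough sketch of what such a flag could look like (the flag value and
helper below are purely hypothetical; no such flag exists in the virtio
spec):

    #include <stdint.h>

    #define VRING_AVAIL_F_HOST_POLL  (1 << 1)  /* invented for illustration */

    struct vring_avail {
        uint16_t flags;
        uint16_t idx;
        uint16_t ring[];
    };

    /* Queue a buffer without kicking: the (hypothetical) poll flag asks
     * the host to scan avail->idx itself, so no notify write and no ISR
     * read are needed from 16-bit code. */
    static void submit_no_kick(struct vring_avail *avail,
                               uint16_t qsize, uint16_t head)
    {
        avail->ring[avail->idx % qsize] = head;
        avail->flags |= VRING_AVAIL_F_HOST_POLL;
        avail->idx++;  /* host polls this index for new requests */
    }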

Regards,

Anthony Liguori


 --
   Gleb.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
 You mean make BAR0 an MMIO BAR?
 Yes, it would break current windows guests.
 Further, as long as we use same address to notify all queues,
 we would also need to decode the instruction on x86 and that's
 measureably slower than PIO.
 We could go back to discussing hypercall use for notifications,
 but that has its own set of issues...

So... does violating the PCI-e spec really matter?  Is it preventing
any guest from working properly?

I don't think we should rush an ABI breakage if the only benefit is
claiming spec compliance.

Regards,

Anthony Liguori


 -- 
 MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
  You mean make BAR0 an MMIO BAR?
  Yes, it would break current windows guests.
  Further, as long as we use same address to notify all queues,
  we would also need to decode the instruction on x86 and that's
  measureably slower than PIO.
  We could go back to discussing hypercall use for notifications,
  but that has its own set of issues...
 
 So... does violating the PCI-e spec really matter?  Is it preventing
 any guest from working properly?

 Yes, absolutely, this wording in the spec is not there without reason.

 Existing guests allocate IO space for PCI express ports in
 multiples of 4K.

 Since each express device is behind such a port, this means
 at most 15 such devices can use IO ports in a system.

 That's why to make a pci express virtio device,
 we must allow MMIO and/or some other communication
 mechanism as the spec requires.

This is precisely why this is an ABI breaker.

If you disable IO bars in the BIOS, than the interface that the OS sees
will *not have an IO bar*.

This *breaks existing guests*.

Any time the programming interfaces changes on a PCI device, the
revision ID and/or device ID must change.  The spec is very clear about
this.

We cannot disable the IO BAR without changing revision ID/device ID.

 That's on x86.

 Besides x86, there are architectures where IO is unavailable or very slow.

 I don't think we should rush an ABI breakage if the only benefit is
 claiming spec compliance.
 
 Regards,
 
 Anthony Liguori

 Why do you bring this up? No one advocates any ABI breakage,
 I only suggest extensions.

It's an ABI breakage.  You're claiming that the guests you tested
handle the breakage reasonably but it is unquestionably an ABI breakage.

Regards,

Anthony Liguori



 
  -- 
  MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
   You mean make BAR0 an MMIO BAR?
   Yes, it would break current windows guests.
   Further, as long as we use same address to notify all queues,
   we would also need to decode the instruction on x86 and that's
   measureably slower than PIO.
   We could go back to discussing hypercall use for notifications,
   but that has its own set of issues...
  
  So... does violating the PCI-e spec really matter?  Is it preventing
  any guest from working properly?
 
  Yes, absolutely, this wording in the spec is not there without reason.
 
  Existing guests allocate IO space for PCI express ports in
  multiples of 4K.
 
  Since each express device is behind such a port, this means
  at most 15 such devices can use IO ports in a system.
 
  That's why to make a pci express virtio device,
  we must allow MMIO and/or some other communication
  mechanism as the spec requires.
 
 This is precisely why this is an ABI breaker.
 
 If you disable IO bars in the BIOS, than the interface that the OS sees
 will *not have an IO bar*.
 
 This *breaks existing guests*.
 Any time the programming interfaces changes on a PCI device, the
 revision ID and/or device ID must change.  The spec is very clear about
 this.
 
 We cannot disable the IO BAR without changing revision ID/device ID.
 

 But it's a bios/PC issue. It's not a device issue.

 Anyway, let's put express aside.

 It's easy to create non-working setups with pci, today:

 - create 16 pci bridges
 - put one virtio device behind each

 boom

 Try it.

 I want to fix that.


  That's on x86.
 
   Besides x86, there are architectures where IO is unavailable or very slow.
 
  I don't think we should rush an ABI breakage if the only benefit is
  claiming spec compliance.
  
  Regards,
  
  Anthony Liguori
 
  Why do you bring this up? No one advocates any ABI breakage,
  I only suggest extensions.
 
 It's an ABI breakage.  You're claiming that the guests you tested
 handle the breakage reasonably but it is unquestionably an ABI breakage.
 
 Regards,
 
 Anthony Liguori

 Adding BAR is not an ABI breakage, do we agree on that?

 Disabling IO would be but I am not proposing disabling IO.

 Guests might disable IO.

Look, it's very simple.

If the failure in the guest is that BAR0 mapping fails because the
device is enabled but the BAR is disabled, then you've broken the ABI.

And what's worse is that this isn't for an obscure scenario (like having
15 PCI bridges) but for something that would become the standard
scenario (using a PCI-e bus).

We need to either bump the revision ID or the device ID if we do this.

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
 Look, it's very simple.
 We only need to do it if we do a change that breaks guests.

 Please find a guest that is broken by the patches. You won't find any.

I think the problem in this whole discussion is that we're talking past
each other.

Here is my understanding:

1) PCI-e says that you must be able to disable IO bars and still have a
functioning device.

2) It says (1) because you must size IO bars to 4096 which means that
practically speaking, once you enable a dozen or so PIO bars, you run
out of PIO space (16 * 4k == 64k and not all that space can be used).

virtio-pci uses IO bars exclusively today.  Existing guest drivers
assume that there is an IO bar that contains the virtio-pci registers.

So let's consider the following scenarios:

QEMU of today:

1) qemu -drive file=ubuntu-13.04.img,if=virtio

This works today.  Does adding an MMIO bar at BAR1 break this?
Certainly not if the device is behind a PCI bus...

But are we going to put devices behind a PCI-e bus by default?  Are we
going to ask the user to choose whether devices are put behind a legacy
bus or the express bus?

What happens if we put the device behind a PCI-e bus by default?  Well,
it can still work.  That is, until we do something like this:

2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
-device virtio-balloon..

Such that we have more than a dozen or so devices.  This works
perfectly fine today.  It works fine because we've designed virtio to
make sure it works fine.  Quoting the spec:

Configuration space is generally used for rarely-changing or
 initialization-time parameters. But it is a limited resource, so it
 might be better to use a virtqueue to update configuration information
 (the network device does this for filtering, otherwise the table in the
 config space could potentially be very large).

In fact, we can have 100s of PCI devices today without running out of IO
space because we're so careful about this.

So if we switch to using PCI-e by default *and* we keep virtio-pci
without modifying the device IDs, then very frequently we are going to
break existing guests because the drivers they already have no longer
work.

A few virtio-serial channels, a few block devices, a couple of network
adapters, the balloon and RNG driver, and we hit the IO space limit
pretty damn quickly so this is not a contrived scenario at all.  I would
expect that we frequently run into this if we don't address this problem.

So we have a few options:

1) Punt all of this complexity to libvirt et al and watch people make
   the wrong decisions about when to use PCI-e.  This will become yet
   another example of KVM being too hard to configure.

2) Enable PCI-e by default and just force people to upgrade their
   drivers.

3) Don't use PCI-e by default but still add BAR1 to virtio-pci

4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
   it a new device/vendor ID.   Continue to use virtio-pci for existing
   devices potentially adding virtio-{net,blk,...}-pcie variants for
   people that care to use them.

I think 1 == 2 == 3 and I view 2 as an ABI breaker.  libvirt does like
policy so they're going to make a simple decision and always use the
same bus by default.  I suspect if we made PCI the default, they might
just always set the PCI-e flag just because.

There are hundreds of thousands if not millions of guests with existing
virtio-pci drivers.  Forcing them to upgrade better have an extremely
good justification.

I think 4 is the best path forward.  It's better for users (guests
continue to work as they always have).  There's less confusion about
enabling PCI-e support--you must ask for the virtio-pcie variant and you
must have a virtio-pcie driver.  It's easy to explain.

It also maps to what regular hardware does.  I highly doubt that there
are any real PCI cards that made the shift from PCI to PCI-e without
bumping at least a revision ID.

It also means we don't need to play games about sometimes enabling IO
bars and sometimes not.
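
To make option 4 concrete, the ID split might look like this (the
legacy vendor/device IDs are from the virtio spec; the macro names and
the virtio-pcie ID are invented for illustration):

    /* Legacy virtio-pci: vendor 0x1af4, device IDs 0x1000-0x103f.
     * Existing guest drivers bind to these and expect an IO BAR. */
    #define VIRTIO_PCI_VENDOR_ID   0x1af4
    #define VIRTIO_PCI_ID_NET      0x1000  /* existing virtio-net */

    /* Hypothetical virtio-pcie variant: a fresh device-ID range means
     * an old driver never binds to a device whose IO BAR is absent. */
    #define VIRTIO_PCIE_ID_NET     0x1041  /* invented for illustration */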

Regards,

Anthony Liguori



 -- 
 MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
  Look, it's very simple.
  We only need to do it if we do a change that breaks guests.
 
  Please find a guest that is broken by the patches. You won't find any.
 
 I think the problem in this whole discussion is that we're talking past
 each other.
 
 Here is my understanding:
 
 1) PCI-e says that you must be able to disable IO bars and still have a
 functioning device.
 
 2) It says (1) because you must size IO bars to 4096 which means that
 practically speaking, once you enable a dozen or so PIO bars, you run
 out of PIO space (16 * 4k == 64k and not all that space can be used).


 Let me add 3 other issues which I mentioned and you seem to miss:

  3) architectures which don't have fast access to IO ports exist;
     virtio does not work there ATM

Which architectures have PCI but no IO ports?

  4) setups with many PCI bridges exist and have the same issue
     as PCI express; virtio does not work there ATM

This is not virtio specific.  This is true for all devices that use IO.

  5) On x86, even with nested page tables, the hardware only decodes
     the page address on an invalid PTE, not the data. You need to
     emulate the guest instruction to get at the data. Without nested
     page tables, we have to do a page table walk and emulate to get
     both the address and the data. Since this is how MMIO is
     implemented in kvm on x86, MMIO is much slower than PIO (with
     nested page tables by a factor of 2; did not test without).

Am well aware of this, this is why we use PIO.

I fully agree with you that when we do MMIO, we should switch the
notification mechanism to avoid encoding anything meaningful as data.

  virtio-pci uses IO bars exclusively today.  Existing guest drivers
  assume that there is an IO bar that contains the virtio-pci registers.
 So let's consider the following scenarios:
 
 QEMU of today:
 
 1) qemu -drive file=ubuntu-13.04.img,if=virtio
 
 This works today.  Does adding an MMIO bar at BAR1 break this?
 Certainly not if the device is behind a PCI bus...
 
 But are we going to put devices behind a PCI-e bus by default?  Are we
 going to ask the user to choose whether devices are put behind a legacy
 bus or the express bus?
 
 What happens if we put the device behind a PCI-e bus by default?  Well,
 it can still work.  That is, until we do something like this:
 
 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
 -device virtio-balloon..
 
 Such that we have more than a dozen or so devices.  This works
 perfectly fine today.  It works fine because we've designed virtio to
 make sure it works fine.  Quoting the spec:
 
 Configuration space is generally used for rarely-changing or
  initialization-time parameters. But it is a limited resource, so it
  might be better to use a virtqueue to update configuration information
  (the network device does this for filtering, otherwise the table in the
  config space could potentially be very large).
 
 In fact, we can have 100s of PCI devices today without running out of IO
 space because we're so careful about this.
 
 So if we switch to using PCI-e by default *and* we keep virtio-pci
 without modifying the device IDs, then very frequently we are going to
 break existing guests because the drivers they already have no longer
 work.
 
 A few virtio-serial channels, a few block devices, a couple of network
 adapters, the balloon and RNG driver, and we hit the IO space limit
 pretty damn quickly so this is not a contrived scenario at all.  I would
 expect that we frequently run into this if we don't address this problem.
 
 So we have a few options:
 1) Punt all of this complexity to libvirt et al and watch people make
the wrong decisions about when to use PCI-e.  This will become yet
another example of KVM being too hard to configure.
 
 2) Enable PCI-e by default and just force people to upgrade their
drivers.
 
 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
 
 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),

 We can't do this - it will hurt performance.

Can you explain?  I thought the whole trick with separating out the
virtqueue notification register was to regain the performance?

give
it a new device/vendor ID.   Continue to use virtio-pci for existing
devices potentially adding virtio-{net,blk,...}-pcie variants for
people that care to use them.
 
 I think 1 == 2 == 3 and I view 2 as an ABI breaker.

 Why do you think 2 == 3? 2 changes default behaviour. 3 does not.

It doesn't change the default behavior but then we're pushing the
decision of when to use pci-e to the user.  They have to understand that
there can be subtle breakages because the virtio-pci driver may not work
if they are using an old guest.

 libvirt does like
 policy so they're

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
   Look, it's very simple.
   We only need to do it if we do a change that breaks guests.
  
   Please find a guest that is broken by the patches. You won't find any.
  
  I think the problem in this whole discussion is that we're talking past
  each other.
  
  Here is my understanding:
  
  1) PCI-e says that you must be able to disable IO bars and still have a
  functioning device.
  
  2) It says (1) because you must size IO bars to 4096 which means that
  practically speaking, once you enable a dozen or so PIO bars, you run
  out of PIO space (16 * 4k == 64k and not all that space can be used).
 
 
 Let me add 3 other issues which I mentioned and you seem to miss:
 
 3) architectures which don't have fast access to IO ports, exist
virtio does not work there ATM
 
 4) setups with many PCI bridges exist and have the same issue
as PCI express. virtio does not work there ATM
 
 5) On x86, even with nested page tables, the hardware only decodes
the page address on an invalid PTE, not the data. You need to
emulate the guest to get at the data. Without
nested page tables, we have to do page table walk and emulate
to get both address and data. Since this is how MMIO
is implemented in kvm on x86, MMIO is much slower than PIO
(with nested page tables by a factor of 2, did not test without).

 Oh I forgot:

 6) access to MMIO BARs is painful in the BIOS environment
so BIOS would typically need to enable IO for the boot device.

But if you want to boot from the 16th device, the BIOS needs to solve
this problem anyway.

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
 
 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.
 

 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.

Do legacy endpoints also use 4k for BARs?

If not, can't we use a new device id for native endpoints and call it a
day?  Legacy endpoints would continue using the existing BAR layout.

Regards,

Anthony Liguori


   -hpa



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
 Can you explain?  I thought the whole trick with separating out the
 virtqueue notification register was to regain the performance?

 Yes but this trick only works well with NPT (it's still a bit
 slower than PIO but not so drastically).
 Without NPT you still need a page walk so it will be slow.

Do you mean NPT/EPT?

If your concern is shadow paging, then I think you're concerned about
hardware that is so slow to start with that it's not worth considering.

  It also maps to what regular hardware does.  I highly doubt that there
  are any real PCI cards that made the shift from PCI to PCI-e without
  bumping at least a revision ID.
 
  Only because the chance it's 100% compatible on the software level is 0.
  It always has some hardware specific quirks.
  No such excuse here.
 
  It also means we don't need to play games about sometimes enabling IO
  bars and sometimes not.
 
  This last paragraph is wrong, it ignores the issues 3) to 5) 
  I added above.
 
  If you do take them into account:
 - there are reasons to add MMIO BAR to PCI,
   even without PCI express
 
 So far, the only reason you've provided is it doesn't work on some
 architectures.  Which architectures?

 PowerPC wants this.

Existing PowerPC remaps PIO to MMIO so it works fine today.

Future platforms may not do this but future platforms can use a
different device.  They certainly won't be able to use the existing
drivers anyway.

Ben, am I wrong here?

 - we won't be able to drop IO BAR from virtio
 
 An IO BAR is useless if it means we can't have more than 12 devices.


 It's not useless. A smart BIOS can enable devices one by one as
 it tries to boot from them.

A smart BIOS can also use MMIO to program virtio.

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 02:50 PM, Anthony Liguori wrote:
 H. Peter Anvin h...@zytor.com writes:
 
 On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:

 Spec says IO and memory can be enabled/disabled, separately.
 PCI Express spec says devices should work without IO.


 For native endpoints.  Currently virtio would be a legacy endpoint
 which is quite correct -- it is compatible with a legacy interface.
 
 Do legacy endpoints also use 4k for BARs?

 There are no 4K BARs.  In fact, I/O BARs are restricted by spec (there
 is no technical enforcement, however) to 256 bytes.

 The 4K come from the upstream bridge windows, which are only 4K granular
 (historic stuff from when bridges were assumed rare.)  However, there
 can be multiple devices, functions, and BARs inside that window.

Got it.


 The issue with PCIe is that each PCIe port is a bridge, so in reality
 there is only one real device per bus number.

 If not, can't we use a new device id for native endpoints and call it a
 day?  Legacy endpoints would continue using the existing BAR layout.

 Definitely an option.  However, we want to be able to boot from native
 devices, too, so having an I/O BAR (which would not be used by the OS
 driver) should still at the very least be an option.

What makes it so difficult to work with an MMIO bar for PCI-e?

With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
forward.  Is there something special about PCI-e here?

Regards,

Anthony Liguori


   -hpa

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Benjamin Herrenschmidt b...@kernel.crashing.org writes:

 On Wed, 2013-06-05 at 16:53 -0500, Anthony Liguori wrote:

 A smart BIOS can also use MMIO to program virtio.

 Indeed :-)

 I see no reason why not providing both access path though. Have the PIO
 BAR there for compatibility/legacy/BIOS/x86 purposes and *also* have the
 MMIO window which I'd be happy to favor on power.

 We could even put somewhere in there a feature bit set by qemu to
 indicate whether it thinks PIO or MMIO is faster on a given platform if
 you really think that's worth it (I don't).

That's okay, but what I'm most concerned about is compatibility.

A virtio PCI device that's a native endpoint needs to have a different
device ID than one that is a legacy endpoint.  The current drivers
have no hope of working (well) with virtio PCI devices exposed as native
endpoints.

I don't care if the native PCI endpoint also has a PIO bar.  But it
seems silly (and confusing) to me to make that layout be the legacy
layout versus a straight mirror of the new layout if we're already
changing the device ID.

In addition, it doesn't seem at all necessary to have an MMIO bar to the
legacy device.  If the reason you want MMIO is to avoid using PIO, then
you break existing drivers because they assume PIO.  If you are breaking
existing drivers then you should change the device ID.

If strictly speaking it's just that MMIO is a bit faster, I'm not sure
that complexity is worth it without seeing performance numbers first.

Regards,

Anthony Liguori


 Cheers,
 Ben.




Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
H. Peter Anvin h...@zytor.com writes:

 On 06/05/2013 03:08 PM, Anthony Liguori wrote:

 Definitely an option.  However, we want to be able to boot from native
 devices, too, so having an I/O BAR (which would not be used by the OS
 driver) should still at the very least be an option.
 
 What makes it so difficult to work with an MMIO bar for PCI-e?
 
 With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight
 forward.  Is there something special about PCI-e here?
 

 It's not tracking allocation.  It is that accessing memory above 1 MiB
 is incredibly painful in the BIOS environment, which basically means
 MMIO is inaccessible.

Oh, you mean in real mode.

SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
There are loads of ASSERT32FLAT()s in the code to make sure of this.
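
So an MMIO accessor is fine in SeaBIOS proper; a hedged sketch
(ASSERT32FLAT and readl are the existing SeaBIOS helpers, the
function itself is made up):

    /* Fails the build if this ever gets pulled into 16-bit code. */
    static u32 virtio_mmio_read32(void *addr)
    {
        ASSERT32FLAT();
        return readl(addr);
    }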

Regards,

Anthony Liguori


   -hpa




Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Kevin O'Connor ke...@koconnor.net writes:

 On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote:
 There were discussions on potentially introducing a middle component
 to generate the tables.  Coreboot was raised as a possibility, and
 David thought it would be okay to use coreboot for both OVMF and
 SeaBIOS.  The possibility was also raised of a rom that lives in the
 qemu repo, is run in the guest, and generates the tables (which is
 similar to the hvmloader approach that Xen uses).

 Given the objections to implementing ACPI directly in QEMU, one
 possible way forward would be to split the current SeaBIOS rom into
 two roms: qvmloader and seabios.  The qvmloader would do the
 qemu specific platform init (pci init, smm init, mtrr init, bios
 tables) and then load and run the regular seabios rom.  With this
 split, qvmloader could be committed into the QEMU repo and maintained
 there.  This would be analogous to Xen's hvmloader with the seabios
 code used as a starting point to implement it.

What about a small change to the SeaBIOS build system to allow ACPI
table generation to be done via a plugin.

This could be as simple as moving acpi.c and *.dsl into the QEMU build
tree and then having a way to point the SeaBIOS makefiles to our copy of
it.

Then the logic stays maintained in firmware, but the churn happens in
the QEMU tree instead of the SeaBIOS tree.

Regards,

Anthony Liguori


 With both the hardware implementation and acpi descriptions for that
 hardware in the same source code repository, it would be possible to
 implement changes to both in a single patch series.  The fwcfg entries
 used to pass data between qemu and qvmloader could also be changed in
 a single patch and thus those fwcfg entries would not need to be
 considered a stable interface.  The qvmloader code also wouldn't need
 the 16bit handlers that seabios requires and thus wouldn't need the
 full complexity of the seabios build.  Finally, it's possible that
 both ovmf and seabios could use a single qvmloader implementation.

 On the down side, reboots can be a bit goofy today in kvm, and that
 would need to be settled before something like qvmloader could be
 implemented.  Also, it may be problematic to support passing of bios
 tables from qvmloader to seabios for guests with only 1 meg of ram.

 Thoughts?
 -Kevin


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Laszlo Ersek ler...@redhat.com writes:

 On 05/31/13 09:09, Jordan Justen wrote:

 Due to licensing differences I can't just port code from SeaBIOS to
 OVMF

soapbox

Fork OVMF, drop the fat module, and just add GPL code.  It's an easily
solvable problem.

Rewriting BSD implementations of everything is silly.  Every other
vendor that uses TianoCore has a proprietary fork.  Maintaining a GPL
fork seems just as reasonable.

/soapbox

Regards,

Anthony Liguori

 (and I never have without explicit permission), so it's been a lot of
 back and forth with acpidump / iasl -d in guests (massage OVMF, boot
 guest, check guest dmesg / lspci, dump tables, compare, repeat), brain
 picking colleagues, the ACPI and PIIX specs and so on. I have a page on
 the RH intranet dedicated to this. When something around these parts is
 being changed (or looks like it could be changed) in SeaBIOS, or between
 qemu and SeaBIOS, I always must be alert and consider reimplementing it
 in, or porting it with permission to, OVMF. (Most recent example:
 pvpanic device -- currently only in SeaBIOS.)

 It worries me that if I slack off, or am busy with something else, or
 simply don't notice, then the gap will widen again. I appreciate
 learning a bunch about ACPI, and don't mind the days of work that went
 into some of my simple-looking ACPI patches for OVMF, but had the tables
 come from a common (programmatic) source, none of this would have been
 an issue, and I wouldn't have felt even occasionally that ACPI patches
 for OVMF were both duplicate work *and* futile (considering how much
 ahead SeaBIOS was).

 I don't mind reimplementing stuff, or porting it with permission, going
 forward, but the sophisticated parts in SeaBIOS are a hard nut. For
 example I'll never be able to auto-extract offsets from generated AML
 and patch the AML using those offsets; the edk2 build tools (a project
 separate from edk2) don't support this, and it takes several months to
 get a thing as simple as gcc-47 build flags into edk2-buildtools.

 Instead I have to write template ASL, compile it to AML, hexdump the
 result, verify it against the AML grammar in the ACPI spec (offsets
 aren't obvious, BytePrefix and friends are a joy), define  initialize a
 packed struct or array in OVMF, and patch the template AML using fixed
 field names or array subscripts. Workable, but dog slow. If the ACPI
 payload came from up above, we might be as well provided with a list of
 (canonical name, offset, size) triplets, and could perhaps blindly patch
 the contents. (Not unlike Michael's linker code for connecting tables
 into a hierarchy.)

 AFAIK most recently iasl got built-in support for offset extraction (and
 in the process the current SeaBIOS build method was broken...), so that
 part might get easier in the future.

 Oh well it's Friday, sorry about this rant! :) I'll happily do what I
 can in the current status quo, but frequently, it won't amount to much.

 Thanks,
 Laszlo


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Laszlo Ersek ler...@redhat.com writes:

 On 05/31/13 15:04, Anthony Liguori wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 On 05/31/13 09:09, Jordan Justen wrote:

 Due to licensing differences I can't just port code from SeaBIOS to
 OVMF
 
 soapbox

 :)

 Fork OVMF, drop the fat module, and just add GPL code.  It's an easily
 solvable problem.

 It's not optimal for the upstream first principle;

still on soapbox

OVMF is not Open Source so upstream first doesn't apply.  At least,
the FAT module is not Open Source.

Bullet 8 from the Open Source Definition[1]

8. License Must Not Be Specific to a Product

The rights attached to the program must not depend on the program's
being part of a particular software distribution. If the program is
extracted from that distribution and used or distributed within the
terms of the program's license, all parties to whom the program is
redistributed should have the same rights as those that are granted in
conjunction with the original software distribution.

License from OVMF FAT module[2]:

Additional terms: In addition to the forgoing, redistribution and use
of the code is conditioned upon the FAT 32 File System Driver and all
derivative works thereof being used for and designed only to read and/or
write to a file system that is directly managed by: Intel’s Extensible
Firmware Initiative (EFI) Specification v. 1.0 and later and/or the
Unified Extensible Firmware Interface (UEFI) Forum’s UEFI Specifications
v.2.0 and later (together the “UEFI Specifications”); only as necessary
to emulate an implementation of the UEFI Specifications; and to create
firmware, applications, utilities and/or drivers.

[1] http://opensource.org/osd-annotated
[2] 
http://sourceforge.net/apps/mediawiki/tianocore/index.php?title=Edk2-fat-driver

AFAIK, for the systems that we'd actually want to use OVMF for, a FAT
module is a hard requirement.

 we'd have to
 backport upstream edk2 patches forever (there's a whole lot of edk2
 modules outside of direct OvmfPkg that get built into OVMF.fd -- OvmfPkg
 only customizes / cherry-picks the full edk2 tree for virtual
 machines), or to periodically rebase an ever-increasing set of patches.

 Independently, we need *some* FAT driver (otherwise you can't even boot
 most installer media), which is where the already discussed worries lie.
 Whatever solves this aspect is independent of forking all of edk2.

It's either Open Source or it's not.  It's currently not.  I have a hard
time sympathizing with trying to work with a proprietary upstream.

 Rewriting BSD implementations of everything is silly.  Every other
 vendor that uses TianoCore has a proprietary fork.

 Correct, but they (presumably) keep rebasing their ever accumulating
 stuff at least on the periodically refreshed stable edk2 subset
 (UDK2010, which BTW doesn't include OvmfPkg). This must be horrible for
 them, but in exchange they get to remain proprietary (which may benefit
 them commercially).

 Maintaining a GPL
 fork seems just as reasonable.

 Perhaps; diverging from upstream first would hurt for certain.

Well I'm suggesting creating a real upstream (that is actually Open
Source).  Then I'm all for upstream first.

In terms of creating a FAT module, the most likely source would seem to
be the kernel code and since that's GPL, I don't think it's terribly
avoidable to end up with a GPL'd uefi implementation.

If that's inevitable, then we're wasting effort by rewriting stuff under
a BSD license.

Regards,

Anthony Liguori


 /soapbox

 Thanks for the suggestion :)
 Laszlo


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
David Woodhouse dw...@infradead.org writes:

 On Fri, 2013-05-31 at 08:04 -0500, Anthony Liguori wrote:
 
 soapbox
 
 Fork OVMF, drop the fat module, and just add GPL code.  It's an easily
 solvable problem.

 Heh. Actually it doesn't need to be a fork. It's modular, and the FAT
 driver is just a single module. Which is actually included in *binary*
 form in the EDK2 repository, I believe, and its source code is
 elsewhere.

 We could happily make a GPL¹ or LGPL implementation of a FAT module and
 build our OVMF with that instead, and we wouldn't need to fork OVMF at
 all.

So can't we have GPL virtio modules too?  I don't think there's any
problem there except for the FAT module.

I would propose more of a virtual fork.  It could consist of a git repo with
the GPL modules + a submodule for edk2.  Ideally, there would be no need
to actually fork edk2.

My assumption is that edk2 won't take GPL code.  But does ovmf really
need to live in the edk2 tree?

If we're going to get serious about supporting OVMF, it we need
something that isn't proprietary.

 -- 
 dwmw2

 ¹ If it's GPL, of course, then we mustn't include any *other* binary
 blobs in our OVMF build. But the whole point in this conversation is
 that we don't *want* to do that. So that's fine.

It's even more fundamental.  OVMF as a whole (at least in its usable
form) is not Open Source.  Without even tackling the issue of GPL code
sharing, that is a fundamental problem that needs to be solved if we're
going to serious about making changes to QEMU to support it.

I think solving the general problem will also enable GPL code sharing
though.

Regards,

Anthony Liguori


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
David Woodhouse dw...@infradead.org writes:

 On Fri, 2013-05-31 at 10:43 -0500, Anthony Liguori wrote:
 It's even more fundamental.  OVMF as a whole (at least in its usable
 form) is not Open Source. 

 The FAT module is required to make EDK2 usable, and yes, that's not Open
 Source. So in a sense you're right.

 But we're talking here about *replacing* the FAT module with something
 that *is* open source. And the FAT module isn't a fundamental part of
 EDK2; it's just an optional module that happens to be bundled with the
 repository.

So *if* we replace the FAT module *and* that replacement was GPL, would
there be any objections to having more GPL modules for things like virtio,
ACPI, etc?

And would that be doable in the context of OVMF or would another project
need to exist for this purpose?

 So I think you're massively overstating the issue. OVMF/EDK2 *is* Open
 Source, and replacing the FAT module really isn't that hard.

 We can only bury our heads in the sand and ship qemu with
 non-EFI-capable firmware for so long...

Which is why I think we need to solve the real problem here.

 I *know* there's more work to be done. We have SeaBIOS-as-CSM, Jordan
 has mostly sorted out the NV variable storage, and now the FAT issue is
 coming up to the top of the pile. But we aren't far from the point where
 we can realistically say that we want the Open Source OVMF to be the
 default firmware shipped with qemu.

Yes, that's why I'm raising this now.  We all knew that we'd have to
talk about this eventually.

Regards,

Anthony Liguori


 -- 
 dwmw2


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Laszlo Ersek ler...@redhat.com writes:

 On 05/31/13 16:38, Anthony Liguori wrote:

 It's either Open Source or it's not.  It's currently not.

 I disagree with this binary representation of Open Source or Not. If it
 weren't (mostly) Open Source, how could we fork (most of) it as you're
 suggesting (from the soapbox :))?

 I have a hard
  time sympathizing with trying to work with a proprietary upstream.

 My experience has been positive.

 First of all, whether UEFI is a good thing or not is controversial. I
 won't try to address that.

 However UEFI is here to stay, machines are being shipped with it, Linux
 and other OSen try to support it. Developing (or running) an OS in
 combination with a specific firmware is sometimes easier / more economic
 in a virtual environment, hence there should be support for qemu + UEFI.
 It is this mindset that I operate in. (Oh, I also forgot to mention that
 this task has been assigned to me by my superiors as well :))

 Jordan, the OvmfPkg maintainer is responsive and progressive in the true
 FLOSS manner (*), which was a nice surprise for a project whose coding
 standards for example are made 100% after Windows source code, and whose
 mailing list is mostly subscribed to by proprietary vendors. Really when
 it comes to OvmfPkg patches the process follows the normal FLOSS
 development model.

 (*) Jordan, I hope this will prompt you to merge VirtioNetDxe v4 real
 soon now :)

(Removing seabios from the CC as we've moved far away from seabios as a topic)

Just so no one gets the wrong idea, the OVMF team is now a victim of
their own success.  I had hoped that no one would do the work necessary
to get us to the point where we had to seriously think about UEFI
support but that's where we are now :-)

 Thus far we've been talking copyright rather than patents, but there's
 also this:

 http://en.wikipedia.org/wiki/FAT_filesystem#Challenge
 http://en.wikipedia.org/wiki/FAT_filesystem#Patent_infringement_lawsuits

 It almost doesn't matter who prevails in such a lawsuit; the
 *possibility* of such a lawsuit gives people cold feet. Blame the
 USPTO.

Just to say it once so I don't have to ever say it again.

I'm not going to discuss anything relating to patents and FAT publicly.
Everyone should consult with their respective lawyers on such issues.

Copyright is straight forward.  Patents are not.

Regards,

Anthony Liguori


 Laszlo


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Paolo Bonzini pbonz...@redhat.com writes:

 On 31/05/2013 19:06, Anthony Liguori wrote:
 David Woodhouse dw...@infradead.org writes:
 
 On Fri, 2013-05-31 at 10:43 -0500, Anthony Liguori wrote:
  It's even more fundamental.  OVMF as a whole (at least in its usable
 form) is not Open Source. 

 The FAT module is required to make EDK2 usable, and yes, that's not Open
 Source. So in a sense you're right.

 But we're talking here about *replacing* the FAT module with something
 that *is* open source. And the FAT module isn't a fundamental part of
 EDK2; it's just an optional module that happens to be bundled with the
 repository.
 
 So *if* we replace the FAT module *and* that replacement was GPL, would
  there be any objections to having more GPL modules for things like virtio,
 ACPI, etc?
 
 And would that be doable in the context of OVMF or would another project
 need to exist for this purpose?

 I don't think it would be doable in TianoCore.  I think it would end up
 either in distros, or in QEMU.

As I think more about it, I think forking edk2 is inevitable.  We need a
clean repo that doesn't include the proprietary binaries.  I doubt
upstream edk2 is willing to remove the binaries.

But this can be quite simple using a combination of git-svn and a
rewriting script.  We did exactly this to pull out the VGABios from
Bochs and remove the binaries associated with it.  It's 100% automated
and can be kept in sync via a script on qemu.org.

 A separate question is whether OVMF makes more sense as part of
 TianoCore or rather as part of QEMU.

I'm not sure if qemu.git is the right location, but we can certainly
host an ovmf.git on qemu.git that embeds the scrubbed version of
edk2.git.

Of course, this would enable us to add GPL code (including a FAT module)
to ovmf.git without any impact on upstream edk2.

 With 75% of the free hypervisors
 now reunited under the same source repository, the balance is
 tilting...

insert evil laugh :-)

Regards,

Anthony Liguori


 Paolo


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Jordan Justen jljus...@gmail.com writes:

 On Fri, May 31, 2013 at 7:38 AM, Anthony Liguori anth...@codemonkey.ws 
 wrote:
 In terms of creating a FAT module, the most likely source would seem to
 be the kernel code and since that's GPL, I don't think it's terribly
 avoidable to end up with a GPL'd uefi implementation.

 Why would OpenBSD not be a potential source?

 http://www.openbsd.org/cgi-bin/cvsweb/src/sys/msdosfs/

If someone is going to do it, that's fine.

But if me, it's going to be a GPL base.  Actually, enabling GPL
contributions to OVMF is a major motivating factor for me in this whole
discussion.

Regards,

Anthony Liguori


 We have a half-done ext2 fs from GSoC2011 that started with OpenBSD.

 https://github.com/the-ridikulus-rat/Tianocore_Ext2Pkg

 If that's inevitable, then we're wasting effort by rewriting stuff under
 a BSD license.

 Regards,

 Anthony Liguori


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Jordan Justen jljus...@gmail.com writes:

 On Fri, May 31, 2013 at 11:35 AM, Anthony Liguori anth...@codemonkey.ws 
 wrote:
 As I think more about it, I think forking edk2 is inevitable.  We need a
 clean repo that doesn't include the proprietary binaries.  I doubt
 upstream edk2 is willing to remove the binaries.

 No, probably not unless a BSD licensed alternative was available. :)

 But, in thinking about what might make sense for EDK II with git, one
 option that should be considered is breaking the top-level 'packages'
 into separate sub-modules. I had gone so far as to start pushing repos
 as sub-modules.

 But, as the effort to convert EDK II to git has stalled (actually
 never even thought about leaving the ground), I abandoned that
 approach and went back to just mirroring one EDK II.

 I could fairly easily re-enable mirroring the sub-set of packages needed
 for OVMF. So, in that case, the FatBinPkg sub-module could easily be
 dropped from a tree.

 But this can be quite simple using a combination of git-svn and a
 rewriting script.  We did exactly this to pull out the VGABios from
 Bochs and remove the binaries associated with it.  It's 100% automated
 and can be kept in sync via a script on qemu.org.

 I would love to mirror the BaseTools as a sub-package without all the
 silly windows binaries... What script did you guys use?

We did this in git pre-history; now git has a fancy git-filter-branch
command that makes it a breeze:

http://git-scm.com/book/ch6-4.html

Regards,

Anthony Liguori


 -Jordan


Re: updated: kvm networking todo wiki

2013-05-30 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori anth...@codemonkey.ws writes:
 Rusty Russell ru...@rustcorp.com.au writes:
 On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
 FWIW, I think what's more interesting is using vhost-net as a networking
 backend with virtio-net in QEMU being what's guest facing.
 
 In theory, this gives you the best of both worlds: QEMU acts as a first
 line of defense against a malicious guest while still getting the
 performance advantages of vhost-net (zero-copy).

 It would be an interesting idea if we didn't already have the vhost
 model where we don't need the userspace bounce.

 The model is very interesting for QEMU because then we can use vhost as
 a backend for other types of network adapters (like vmxnet3 or even
 e1000).

 It also helps for things like fault tolerance where we need to be able
 to control packet flow within QEMU.

 (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts).

 Then I'm really confused as to what this would look like.  A zero copy
 sendmsg?  We should be able to implement that today.

The only trouble with sendmsg would be doing batch submission and
asynchronous completion.

A thread pool could certainly be used for this I guess.
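
For the batching half, sendmmsg(2) already exists as the transmit twin
of recvmmsg; a minimal sketch, assuming one iovec per packet and a
tap-like fd (completion is still synchronous, which is where the
thread pool would come in):

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Transmit up to 64 packets with one syscall. */
    static int send_batch(int fd, struct iovec *pkts, unsigned int n)
    {
        struct mmsghdr msgs[64];

        if (n > 64)
            n = 64;
        memset(msgs, 0, n * sizeof(msgs[0]));
        for (unsigned int i = 0; i < n; i++) {
            msgs[i].msg_hdr.msg_iov    = &pkts[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        return sendmmsg(fd, msgs, n, 0);  /* number of packets sent */
    }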

Regards,

Anthony Liguori

 On the receive side, what can we do better than readv?  If we need to
 return to userspace to tell the guest that we've got a new packet, we
 don't win on latency.  We might reduce syscall overhead with a
 multi-dimensional readv to read multiple packets at once?

 Confused,
 Rusty.


Re: updated: kvm networking todo wiki

2013-05-30 Thread Anthony Liguori
Stefan Hajnoczi stefa...@gmail.com writes:

 On Thu, May 30, 2013 at 7:23 AM, Rusty Russell ru...@rustcorp.com.au wrote:
 Anthony Liguori anth...@codemonkey.ws writes:
 Rusty Russell ru...@rustcorp.com.au writes:
 On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
 FWIW, I think what's more interesting is using vhost-net as a networking
 backend with virtio-net in QEMU being what's guest facing.

 In theory, this gives you the best of both worlds: QEMU acts as a first
 line of defense against a malicious guest while still getting the
 performance advantages of vhost-net (zero-copy).

 It would be an interesting idea if we didn't already have the vhost
 model where we don't need the userspace bounce.

 The model is very interesting for QEMU because then we can use vhost as
 a backend for other types of network adapters (like vmxnet3 or even
 e1000).

 It also helps for things like fault tolerance where we need to be able
 to control packet flow within QEMU.

 (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts).

 Then I'm really confused as to what this would look like.  A zero copy
 sendmsg?  We should be able to implement that today.

 On the receive side, what can we do better than readv?  If we need to
 return to userspace to tell the guest that we've got a new packet, we
 don't win on latency.  We might reduce syscall overhead with a
 multi-dimensional readv to read multiple packets at once?

 Sounds like recvmmsg(2).

Could we map this to mergeable rx buffers though?

Regards,

Anthony Liguori


 Stefan


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 Forcing a guest driver change is a really big
 deal and I see no reason to do that unless there's a compelling reason
 to.

 So we're stuck with the 1.0 config layout for a very long time.

 We definitely must not force a guest change.  The explicit aim of the
 standard is that legacy and 1.0 be backward compatible.  One
 deliverable is a document detailing how this is done (effectively a
 summary of changes between what we have and 1.0).

If 2.0 is fully backwards compatible, great.  It seems like such a big
change that full compatibility would be impossible, but I need to
investigate further.

Regards,

Anthony Liguori


 It's a delicate balancing act.  My plan is to accompany any changes in
 the standard with a qemu implementation, so we can see how painful those
 changes are.  And if there are performance implications, measure them.

 Cheers,
 Rusty.


Re: updated: kvm networking todo wiki

2013-05-30 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Thu, May 30, 2013 at 08:40:47AM -0500, Anthony Liguori wrote:
 Stefan Hajnoczi stefa...@gmail.com writes:
 
  On Thu, May 30, 2013 at 7:23 AM, Rusty Russell ru...@rustcorp.com.au 
  wrote:
  Anthony Liguori anth...@codemonkey.ws writes:
  Rusty Russell ru...@rustcorp.com.au writes:
  On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
  FWIW, I think what's more interesting is using vhost-net as a 
  networking
  backend with virtio-net in QEMU being what's guest facing.
 
  In theory, this gives you the best of both worlds: QEMU acts as a first
  line of defense against a malicious guest while still getting the
  performance advantages of vhost-net (zero-copy).
 
  It would be an interesting idea if we didn't already have the vhost
  model where we don't need the userspace bounce.
 
  The model is very interesting for QEMU because then we can use vhost as
  a backend for other types of network adapters (like vmxnet3 or even
  e1000).
 
  It also helps for things like fault tolerance where we need to be able
  to control packet flow within QEMU.
 
  (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts).
 
  Then I'm really confused as to what this would look like.  A zero copy
  sendmsg?  We should be able to implement that today.
 
  On the receive side, what can we do better than readv?  If we need to
  return to userspace to tell the guest that we've got a new packet, we
  don't win on latency.  We might reduce syscall overhead with a
  multi-dimensional readv to read multiple packets at once?
 
  Sounds like recvmmsg(2).
 
  Could we map this to mergeable rx buffers though?
 
 Regards,
 
 Anthony Liguori

 Yes because we don't have to complete buffers in order.

What I meant though was for GRO, we don't know how large the received
packet is going to be.  Mergeable rx buffers let us allocate a pool of
data for all incoming packets instead of allocating max packet size *
max packets.

recvmmsg expects an array of msghdrs and I presume each needs to be
given a fixed size.  So this seems incompatible with mergeable rx
buffers.
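
To spell that out, a hedged sketch of what recvmmsg imposes (sizes
illustrative): every slot must be provisioned for the worst case up
front, which is exactly the allocation pattern mergeable rx buffers
were designed to avoid:

    #define _GNU_SOURCE
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define SLOTS   32
    #define MAX_PKT 65536  /* worst case with GRO */

    /* SLOTS * MAX_PKT = 2MB committed before a single byte arrives. */
    static char bufs[SLOTS][MAX_PKT];
    static struct iovec iov[SLOTS];
    static struct mmsghdr msgs[SLOTS];

    static int recv_batch(int fd)
    {
        for (int i = 0; i < SLOTS; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len  = MAX_PKT;
            msgs[i].msg_hdr.msg_iov    = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        return recvmmsg(fd, msgs, SLOTS, 0, NULL);
    }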

Regards,

Anthony Liguori


 
  Stefan


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 Michael S. Tsirkin m...@redhat.com writes:
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy->device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.

 It is pretty ugly.

I think beauty is in the eye of the beholder here...

Pretty much every device we have has a switch statement like this.
Consistency wins when it comes to qualitative arguments like this.

 Yet the structure definitions are descriptive, capturing layout, size
 and endianness in natural a format readable by any C programmer.

From an API design point of view, here are the problems I see:

1) C makes no guarantees about structure layout beyond the first
   member.  Yes, if it's naturally aligned or has a packed attribute,
   GCC does the right thing.  But this isn't kernel land anymore,
   portability matters and there are more compilers than GCC.

2) If we ever introduce anything like latching, this doesn't work out
   so well anymore because it's hard to express in a single C structure
   the register layout at that point.  Perhaps a union could be used but
   padding may make it a bit challenging.

3) I suspect it's harder to review because a subtle change could more
   easily have broad impact.  If someone changed the type of a field
   from u32 to u16, it changes the offset of every other field.  That's
   not terribly obvious in the patch itself unless you understand how
   the structure is used elsewhere.

   This may not be a problem for virtio because we all understand that
   the structures are part of an ABI, but if we used this pattern more
   in QEMU, it would be a lot less obvious.
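
For what it's worth, (1) and (3) can at least be caught at build time
by pinning the intended offsets; a sketch using C11 _Static_assert
(field names mirror the proposal, the struct name is mine):

    #include <stddef.h>
    #include <stdint.h>

    struct common_cfg_sketch {
        uint32_t device_feature_select;
        uint32_t device_feature;
        uint32_t guest_feature_select;
        uint32_t guest_feature;
    };

    /* If someone changes a field type, the build breaks here instead
     * of silently shifting every later offset. */
    _Static_assert(offsetof(struct common_cfg_sketch, guest_feature) == 12,
                   "layout drift");
    _Static_assert(sizeof(struct common_cfg_sketch) == 16,
                   "unexpected padding");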

 So AFAICT the question is, do we put the required

 #define VIRTIO_PCI_CFG_FEATURE_SEL \
  (offsetof(struct virtio_pci_common_cfg, device_feature_select))

 etc. in the kernel headers or qemu?

I'm pretty sure we would end up just having our own integer defines.  We
carry our own virtio headers today because we can't easily import the
kernel headers.
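
That is, something along these lines (a sketch: the define names and
values are illustrative; VirtIOPCIProxy and the field come from the
patch under discussion):

    /* Plain integer offsets, no struct layout assumptions needed. */
    #define VIRTIO_PCI_COMMON_DFSELECT  0   /* device_feature_select */
    #define VIRTIO_PCI_COMMON_DF        4   /* device_feature */
    #define VIRTIO_PCI_COMMON_GFSELECT  8   /* guest_feature_select */
    #define VIRTIO_PCI_COMMON_GF       12   /* guest_feature */

    static uint32_t common_cfg_read(VirtIOPCIProxy *proxy, unsigned addr)
    {
        switch (addr) {
        case VIRTIO_PCI_COMMON_DFSELECT:
            return proxy->device_feature_select;
        /* ... remaining registers ... */
        }
        return 0;
    }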

 Haven't looked at the proposed new ring layout yet.

 No change, but there's an open question on whether we should nail it to
 little endian (or define the endian by the transport).

 Of course, I can't rule out that the 1.0 standard *may* decide to frob
 the ring layout somehow,

Well, given that virtio is widely deployed today, I would think the 1.0
standard should strictly reflect what's deployed today, no?

Any new config layout would be 2.0 material, right?

Re: the new config layout, I don't think we would want to use it for
anything but new devices.  Forcing a guest driver change is a really big
deal and I see no reason to do that unless there's a compelling reason
to.

So we're stuck with the 1.0 config layout for a very long time.

Regards,

Anthony Liguori

 but I'd think it would require a compelling
 reason.  I suggest that's 2.0 material...

 Cheers,
 Rusty.



Re: updated: kvm networking todo wiki

2013-05-29 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 Michael S. Tsirkin m...@redhat.com writes:
 On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote:
  On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote:
   Hey guys,
   I've updated the kvm networking todo wiki with current projects.
   Will try to keep it up to date more often.
   Original announcement below.
  
  Thanks a lot. I've added the tasks I'm currently working on to the wiki.
  
  btw. I notice the virtio-net data plane were missed in the wiki. Is the
  project still being considered?
 
  It might have been interesting several years ago, but now that linux has
  vhost-net in kernel, the only point seems to be to
  speed up networking on non-linux hosts.
 
 Data plane just means having a dedicated thread for virtqueue processing
 that doesn't hold qemu_mutex.
 
 Of course we're going to do this in QEMU.  It's a no brainer.  But not
 as a separate device, just as an improvement to the existing userspace
 virtio-net.
 
  Since non-linux does not have kvm, I doubt virtio is a bottleneck.
 
 FWIW, I think what's more interesting is using vhost-net as a networking
 backend with virtio-net in QEMU being what's guest facing.
 
 In theory, this gives you the best of both worlds: QEMU acts as a first
 line of defense against a malicious guest while still getting the
 performance advantages of vhost-net (zero-copy).

 Great idea, that sounds very intresting.

 I'll add it to the wiki.

 In fact a bit of complexity in vhost was put there in the vague hope to
 support something like this: virtio rings are not translated through
 regular memory tables, instead, vhost gets a pointer to ring address.

 This allows qemu acting as a man in the middle,
 verifying the descriptors but not touching the

 Anyone interested in working on such a project?

 It would be an interesting idea if we didn't already have the vhost
 model where we don't need the userspace bounce.

The model is very interesting for QEMU because then we can use vhost as
a backend for other types of network adapters (like vmxnet3 or even
e1000).

It also helps for things like fault tolerance where we need to be able
to control packet flow within QEMU.

Regards,

Anthony Liguori

 We already have two
 sets of host side ring code in the kernel (vhost and vringh, though
 they're being unified).

 All an accelerator can offer on the tx side is zero copy and direct
 update of the used ring.  On rx userspace could register the buffers and
 the accelerator could fill them and update the used ring.  It still
 needs to deal with merged buffers, for example.

 You avoid the address translation in the kernel, but I'm not convinced
 that's a key problem.

 Cheers,
 Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
 1) C makes no guarantees about structure layout beyond the first
member.  Yes, if it's naturally aligned or has a packed attribute,
GCC does the right thing.  But this isn't kernel land anymore,
portability matters and there are more compilers than GCC.

 You expect a compiler to pad this structure:

 struct foo {
   uint8_t a;
   uint8_t b;
   uint16_t c;
   uint32_t d;
 };

 I'm guessing any compiler that decides to waste memory in this way
 will quickly get dropped by users and then we won't worry
 about building QEMU with it.

There are other responses in the thread here and I don't really care to
bikeshed on this issue.

 Well, given that virtio is widely deployed today, I would think the 1.0
 standard should strictly reflect what's deployed today, no?
 Any new config layout would be 2.0 material, right?

 Not as it's currently planned. Devices can choose
 to support a legacy layout in addition to the new one,
 and if you look at the patch you will see that that
 is exactly what it does.

Adding a new BAR most certainly requires bumping the revision ID or
changing the device ID, no?

Didn't we run into this problem with the virtio-win drivers with just
the BAR size changing? 

 Re: the new config layout, I don't think we would want to use it for
 anything but new devices.  Forcing a guest driver change

 There's no forcing.
 If you look at the patches closely, you will see that
 we still support the old layout on BAR0.


 is a really big
 deal and I see no reason to do that unless there's a compelling reason
 to.

 There are many a compelling reasons, and they are well known
 limitations of virtio PCI:

 - PCI spec compliance (mandates device operation with IO memory
 disabled).

PCI express spec.  We are fully compliant with the PCI spec.  And what's
the user visible advantage of pointing an emulated virtio device behind
a PCI-e bus verses a legacy PCI bus?

This is a very good example because if we have to disable BAR0, then
it's an ABI breaker plain and simple.

 - support 64 bit addressing

We currently support 44-bit addressing for the ring.  While I agree we
need to bump it, there's no immediate problem with 44-bit addressing.
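
(The 44 bits fall straight out of the legacy layout: the queue address
register holds a 32-bit page frame number and pages are 4K, so the
reachable range is 2^32 pages of 2^12 bytes each.  In code:)

    #include <stdint.h>

    /* VIRTIO_PCI_QUEUE_PFN is 32 bits wide and holds ring_addr >> 12,
     * so the highest addressable ring byte is just below 2^44. */
    uint64_t max_ring_addr = ((uint64_t)UINT32_MAX << 12) | 0xfff;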

 - add more than 32 feature bits.
 - individually disable queues.
 - sanely support cross-endian systems.
 - support very small (1 PAGE) virtio rings.
 - support a separate page for each vq kick.
 - make each device place config at flexible offset.

None of these things are holding us back today.

I'm not saying we shouldn't introduce a new device.  But adoption of
that device will be slow and realistically will be limited to new
devices only.

We'll be supporting both devices for a very, very long time.

Compatibility is the fundamental value that we provide.  We need to go
out of our way to make sure that existing guests work and work as well
as possible.

Sticking virtio devices behind a PCI-e bus just for the hell of it isn't
a compelling reason to break existing guests.

Regards,

Anthony Liguori


 Addressing any one of these would cause us to add a substantially new
 way to operate virtio devices.

 And since it's a guest change anyway, it seemed like a
 good time to do the new layout and fix everything in one go.

 And they are needed like yesterday.


 So we're stuck with the 1.0 config layout for a very long time.
 
 Regards,
 
 Anthony Liguori

 Absolutely. This patch let us support both which will allow for
 a gradual transition over the next 10 years or so.

  reason.  I suggest that's 2.0 material...
 
  Cheers,
  Rusty.
 


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Paolo Bonzini pbonz...@redhat.com writes:

 On 29/05/2013 15:24, Michael S. Tsirkin wrote:
 You expect a compiler to pad this structure:
 
 struct foo {
  uint8_t a;
  uint8_t b;
  uint16_t c;
  uint32_t d;
 };
 
 I'm guessing any compiler that decides to waste memory in this way
 will quickly get dropped by users and then we won't worry
 about building QEMU with it.

 You know the virtio-pci config structures are padded, but not all of
 them are.  For example, virtio_balloon_stat is not padded and indeed has
 an __attribute__((__packed__)) in the spec.

 Not that these structures are actually used for something.

We store the config in these structures so they are actually used for
something.

The proposed structures only serve as a way to express offsets.  You
would never actually have a variable of this type.

Regards,

Anthony Liguori


 For this reason I prefer to have the attribute everywhere.  So people
 don't have to wonder why it's here and not there.

 Paolo


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  I'm guessing any compiler that decides to waste memory in this way
  will quickly get dropped by users and then we won't worry
  about building QEMU with it.
 
 There are other responses in the thread here and I don't really care to
 bikeshed on this issue.

 Great. Let's make the bikeshed blue then?

It's fun to argue about stuff like this and I certainly have an opinion,
but I honestly don't care all that much about the offsetof thing.
However...



  Well, given that virtio is widely deployed today, I would think the 1.0
  standard should strictly reflect what's deployed today, no?
  Any new config layout would be 2.0 material, right?
 
  Not as it's currently planned. Devices can choose
  to support a legacy layout in addition to the new one,
  and if you look at the patch you will see that that
  is exactly what it does.
 
 Adding a new BAR most certainly requires bumping the revision ID or
 changing the device ID, no?

 No, why would it?

If we change the programming interface for a device in a way that is
incompatible, we are required to change the revision ID and/or device
ID.

 If a device dropped BAR0, that would be a good reason
 to bump revision ID.
 We don't do this yet.

But we have to drop BAR0 to put it behind a PCI express bus, right?

If that's the case, then device that's exposed on the PCI express bus
must use a different device ID and/or revision ID.

That means a new driver is needed in the guest.

 Didn't we run into this problem with the virtio-win drivers with just
 the BAR size changing? 

 Because they had a bug: they validated BAR0 size. AFAIK they don't care
 what happens with other bars.

I think there's a grey area with respect to the assumptions a device can
make about the programming interface.

But very concretely, we cannot expose virtio-pci-net via PCI express
with BAR0 disabled because that will result in existing virtio-pci Linux
drivers breaking.

 Not we. The BIOS can disable IO BAR: it can do this already
 but the device won't be functional.

But the only way to expose the device over PCI express is to disable the
IO BAR, right?

Regards,

Anthony Liguori



Re: KVM call agenda for 2013-05-28

2013-05-29 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote:
 On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote:
  Juan is not available now, and Anthony asked for
  agenda to be sent early.
  So here comes:
  
  Agenda for the meeting Tue, May 28:
  
  - Generating acpi tables
 
 I didn't see any meeting notes, but I thought it would be worthwhile
 to summarize the call.  This is from memory so correct me if I got
 anything wrong.
 
 Anthony believes that the generation of ACPI tables is the task of the
 firmware.  Reasons cited include security implications of running more
 code in qemu vs the guest context, complexities in running iasl on
 big-endian machines, possible complexity of having to regenerate
 tables on a vm reboot, overall sloppiness of doing it in QEMU.  Raised
 that QOM interface should be sufficient.
 
 Kevin believes that the bios table code should be moved up into QEMU.
 Reasons cited include the churn rate in SeaBIOS for this QEMU feature
 (15-20% of all SeaBIOS commits since integrating with QEMU have been
 for bios tables; 20% of SeaBIOS commits in last year), complexity of
 trying to pass all the content needed to generate the tables (eg,
 device details, power tree, irq routing), complexity of scheduling
 changes across different repos and synchronizing their rollout,
 complexity of implementing the code in both OVMF and SeaBIOS.  Kevin
 wasn't aware of a requirement to regenerate acpi tables on a vm
 reboot.

 I think this last one is based on a misunderstanding: it's based
 on the assumption that when we change hardware by hotplug
 we should regenerate the tables to match.
 But there's no management tool that can take advantage of
 this.
 Two possible reasonable things we can tell management:
 - hotplug for device XXX is not supported: restart qemu
   to make guest use the device
 - hotplug for device XXX is supported

This introduces an assumption: that the device model never radically
changes across resets.

Why should this be true?  Shouldn't we be allowed to increase the amount
of memory the guest has across reboots?  That's equivalent to adding
another DIMM after power off.

Not generating tables on reset does limit what we can do in a pretty
fundamental way.  Even if you can argue it in the short term, I don't
think it's viable in the long term.

Regards,

Anthony Liguori


Re: [SeaBIOS] KVM call agenda for 2013-05-28

2013-05-29 Thread Anthony Liguori
Gerd Hoffmann kra...@redhat.com writes:

 On 05/29/13 01:53, Kevin O'Connor wrote:
 On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote:
 Juan is not available now, and Anthony asked for
 agenda to be sent early.
 So here comes:

 Agenda for the meeting Tue, May 28:

 - Generating acpi tables
 
 I didn't see any meeting notes, but I thought it would be worthwhile
 to summarize the call.  This is from memory so correct me if I got
 anything wrong.
 
 Anthony believes that the generation of ACPI tables is the task of the
 firmware.  Reasons cited include security implications of running more
 code in qemu vs the guest context,

 I fail to see the security issues here.  It's not like the acpi table
 generation code operates on untrusted input from the guest ...

But possibly untrusted input from a malicious user.  You can imagine
something like an IaaS provider that lets a user input arbitrary values
for memory, number of nics, etc.

It's a stretch of an example, I agree, but the general principle I think
is sound:  we should push as much work as possible to the least
privileged part of the stack.  In this case, firmware has much less
privileges than QEMU.

 complexities in running iasl on
 big-endian machines,

 We already have a bunch of prebuilt blobs in the qemu repo for simliar
 reasons, we can do that with iasl output too.

 possible complexity of having to regenerate
 tables on a vm reboot,

 Why tables should be regenerated at reboot?  I remember hotplug being
 mentioned in the call.  Hmm?  Which hotplugged component needs acpi
 table updates to work properly?  And what is the point of hotplugging if
 you must reboot the guest anyway to get the acpi updates needed?
 Details please.

See my response to Michael.

 Also mentioned in the call: architectural reasons, which I understand
 as "real hardware works that way".  Correct.  But qemu's virtual
 hardware is configurable in more ways than real hardware, so we have
 different needs.  For example: pci slots can or can't be hotpluggable.
 On real hardware this is fixed.  IIRC this is one of the reasons why we
 have to patch acpi tables.

It's not really fixed.  Hardware supports PCI expansion chassis.
Multi-node NUMA systems also affect the ACPI tables.

 overall sloppiness of doing it in QEMU.

 /me gets the feeling that this is the *main* reason, given that the
 other ones don't look very convincing to me.

 Raised
 that QOM interface should be sufficient.

 Agree on this one.  Ideally the acpi table generation code should be
 able to gather all information it needs from the qom tree, so it can be
 a standalone C file instead of being scattered over all qemu.

Ack.  So my basic argument is why not expose the QOM interfaces to
firmware and move the generation code there?  Seems like it would be
more or less a copy/paste once we had a proper implementation in QEMU.

 There were discussions on potentially introducing a middle component
 to generate the tables.  Coreboot was raised as a possibility, and
 David thought it would be okay to use coreboot for both OVMF and
 SeaBIOS.

 Certainly an option, but that is a long-term project.

Out of curiosity, are there other benefits to using coreboot as a core
firmware in QEMU?

Is there a payload we would ever plausibly use besides OVMF and SeaBIOS?

Regards,

Anthony Liguori


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 This adds support for new config, and is designed to work with
 the new layout code in Rusty's new layout branch.

 At the moment all fields are in the same memory BAR (bar 2).
 This will be used to test performance and compare
 memory, io and hypercall latency.

 Compiles but does not work yet.
 Migration isn't handled yet.

 It's not clear what do queue_enable/queue_disable
 fields do, not yet implemented.

 Gateway for config access with config cycles
 not yet implemented.

 Sending out for early review/flames.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/virtio/virtio-pci.c | 393 +++--
  hw/virtio/virtio-pci.h |  55 +++
  hw/virtio/virtio.c |  20 +++
  include/hw/virtio/virtio.h |   4 +
  4 files changed, 458 insertions(+), 14 deletions(-)

 diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
 index 752991a..f4db224 100644
 --- a/hw/virtio/virtio-pci.c
 +++ b/hw/virtio/virtio-pci.c
 @@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy *proxy)
      proxy->ioeventfd_started = false;
  }
  
 +static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val)
 +{
 +    VirtIODevice *vdev = proxy->vdev;
 +
 +    if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
 +        virtio_pci_stop_ioeventfd(proxy);
 +    }
 +
 +    virtio_set_status(vdev, val & 0xFF);
 +
 +    if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
 +        virtio_pci_start_ioeventfd(proxy);
 +    }
 +
 +    if (vdev->status == 0) {
 +        virtio_reset(proxy->vdev);
 +        msix_unuse_all_vectors(&proxy->pci_dev);
 +    }
 +}
 +
  static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
  {
  VirtIOPCIProxy *proxy = opaque;
 @@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
          }
          break;
      case VIRTIO_PCI_STATUS:
 -        if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
 -            virtio_pci_stop_ioeventfd(proxy);
 -        }
 -
 -        virtio_set_status(vdev, val & 0xFF);
 -
 -        if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
 -            virtio_pci_start_ioeventfd(proxy);
 -        }
 -
 -        if (vdev->status == 0) {
 -            virtio_reset(proxy->vdev);
 -            msix_unuse_all_vectors(&proxy->pci_dev);
 -        }
 +        virtio_pci_set_status(proxy, val);
  
          /* Linux before 2.6.34 sets the device as OK without enabling
             the PCI device bus master bit. In this case we need to disable
 @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, hwaddr addr,
      }
  }
  
 +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
 +                                              unsigned size)
 +{
 +    VirtIOPCIProxy *proxy = opaque;
 +    VirtIODevice *vdev = proxy->vdev;
 +
 +    uint64_t low = 0xffffffffull;
 +
 +    switch (addr) {
 +    case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +        return proxy->device_feature_select;

Oh dear no...  Please use defines like the rest of QEMU.

From a QEMU pov, take a look at:

https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659

And:

https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb

Which lets virtio-pci be subclassable and then remaps the config space to
BAR2.

Haven't looked at the proposed new ring layout yet.

Regards,

Anthony Liguori

 +    case offsetof(struct virtio_pci_common_cfg, device_feature):
 +        /* TODO: 64-bit features */
 +        return proxy->device_feature_select ? 0 : proxy->host_features;
 +    case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
 +        return proxy->guest_feature_select;
 +    case offsetof(struct virtio_pci_common_cfg, guest_feature):
 +        /* TODO: 64-bit features */
 +        return proxy->guest_feature_select ? 0 : vdev->guest_features;
 +    case offsetof(struct virtio_pci_common_cfg, msix_config):
 +        return vdev->config_vector;
 +    case offsetof(struct virtio_pci_common_cfg, num_queues):
 +        /* TODO: more exact limit? */
 +        return VIRTIO_PCI_QUEUE_MAX;
 +    case offsetof(struct virtio_pci_common_cfg, device_status):
 +        return vdev->status;
 +
 +    /* About a specific virtqueue. */
 +    case offsetof(struct virtio_pci_common_cfg, queue_select):
 +        return vdev->queue_sel;
 +    case offsetof(struct virtio_pci_common_cfg, queue_size):
 +        return virtio_queue_get_num(vdev, vdev->queue_sel);
 +    case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
 +        return virtio_queue_vector(vdev, vdev->queue_sel);
 +    case offsetof(struct virtio_pci_common_cfg, queue_enable):
 +        /* TODO */
 +        return 0;
 +    case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
 +        return vdev->queue_sel;
 +    case offsetof(struct virtio_pci_common_cfg, queue_desc):
 +        return virtio_queue_get_desc_addr(vdev, vdev->queue_sel

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote:
  @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, hwaddr addr,
       }
   }
   
  +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
  +                                              unsigned size)
  +{
  +    VirtIOPCIProxy *proxy = opaque;
  +    VirtIODevice *vdev = proxy->vdev;
  +
  +    uint64_t low = 0xffffffffull;
  +
  +    switch (addr) {
  +    case offsetof(struct virtio_pci_common_cfg, device_feature_select):
  +        return proxy->device_feature_select;
 
 Oh dear no...  Please use defines like the rest of QEMU.

 Any good reason not to use offsetof?
 I see about 138 examples in qemu.

There are exactly zero:

$ find . -name '*.c' -exec grep -l 'case offset' {} \;
$

 Anyway, that's the way Rusty wrote it in the kernel header -
 I started with defines.
 If you convince Rusty to switch I can switch too,

We have 300+ devices in QEMU that use #defines.  We're not using this
kind of thing just because you want to copy code from the kernel.
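
For contrast, a small sketch of the #define style in question
(hypothetical names, not actual QEMU macros):

    #include <stdint.h>

    /* Offsets spelled out explicitly instead of derived via offsetof(). */
    #define COMMON_CFG_DFSELECT  0x00
    #define COMMON_CFG_DF        0x04

    static uint64_t common_cfg_read(uint64_t addr,
                                    uint32_t feature_select, uint32_t features)
    {
        switch (addr) {
        case COMMON_CFG_DFSELECT:
            return feature_select;
        case COMMON_CFG_DF:
            return feature_select ? 0 : features;
        default:
            return 0;
        }
    }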

 https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659
 
 And:
 
 https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb
 
 Which lets virtio-pci be subclassable and then remaps the config space to
 BAR2.


 Interesting. Have the spec anywhere?

Not yet, but working on that.

 You are saying this is going to conflict because
 of BAR2 usage, yes?

No, this whole thing is flexible.  I had to use BAR2 because BAR0 has to
be the vram mapping.  It also had to be an MMIO bar.

The new layout might make it easier to implement a device like this.  I
shared it mainly because I wanted to show the subclassing idea vs. just
tacking an option onto the existing virtio-pci code in QEMU.

Regards,

Anthony Liguori

 So let's do this virtio-fb only for the new layout, so we don't need
 to maintain compatibility. In particular, we are working
 on making memory BAR access fast for virtio devices
 in a generic way. At the moment they are measurably slower
 than PIO on x86.


 Haven't looked at the proposed new ring layout yet.
 
 Regards,

 No new ring layout. It's new config layout.


 -- 
 MST


Re: updated: kvm networking todo wiki

2013-05-24 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote:
 On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote:
  Hey guys,
  I've updated the kvm networking todo wiki with current projects.
  Will try to keep it up to date more often.
  Original announcement below.
 
 Thanks a lot. I've added the tasks I'm currently working on to the wiki.
 
 btw. I notice the virtio-net data plane was missed in the wiki. Is the
 project still being considered?

 It might have been interesting several years ago, but now that linux has
 vhost-net in kernel, the only point seems to be to
 speed up networking on non-linux hosts.

Data plane just means having a dedicated thread for virtqueue processing
that doesn't hold qemu_mutex.

Of course we're going to do this in QEMU.  It's a no brainer.  But not
as a separate device, just as an improvement to the existing userspace
virtio-net.
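
As a minimal illustration of the idea (plain pthreads, not QEMU code):
the virtqueue work runs on its own thread, and the global lock is never
taken on that path.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <unistd.h>

    static atomic_bool stop;

    /* Dedicated thread: polls and processes the virtqueue without ever
     * touching qemu_mutex; the main loop only flips a flag to stop it. */
    static void *dataplane_thread(void *opaque)
    {
        (void)opaque;
        while (!atomic_load(&stop)) {
            /* poll the virtqueue and process buffers here */
            usleep(1000);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, dataplane_thread, NULL);
        sleep(1);
        atomic_store(&stop, true);
        pthread_join(tid, NULL);
        puts("dataplane thread stopped");
        return 0;
    }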

 Since non-linux does not have kvm, I doubt virtio is a bottleneck.

FWIW, I think what's more interesting is using vhost-net as a networking
backend with virtio-net in QEMU being what's guest facing.

In theory, this gives you the best of both worlds: QEMU acts as a first
line of defense against a malicious guest while still getting the
performance advantages of vhost-net (zero-copy).

 IMO yet another networking backend is a distraction,
 and confusing to users.
 In any case, I'd like to see virtio-blk dataplane replace
 non dataplane first. We don't want two copies of
 virtio-net in qemu.

100% agreed.

Regards,

Anthony Liguori


  
 
  I've put up a wiki page with a kvm networking todo list,
  mainly to avoid effort duplication, but also in the hope
  to draw attention to what I think we should try addressing
  in KVM:
 
  http://www.linux-kvm.org/page/NetworkingTodo
 
  This page could cover all networking related activity in KVM,
  currently most info is related to virtio-net.
 
  Note: if there's no developer listed for an item,
  this just means I don't know of anyone actively working
  on an issue at the moment, not that no one intends to.
 
  I would appreciate it if others working on one of the items on this list
  would add their names so we can communicate better.  If others like this
  wiki page, please go ahead and add stuff you are working on if any.
 
  It would be especially nice to add autotest projects:
  there is just a short test matrix and a catch-all
  'Cover test matrix with autotest', currently.
 
  Currently there are some links to Red Hat bugzilla entries,
  feel free to add links to other bugzillas.
 
  Thanks!
 


Re: KVM call agenda for 2013-05-21

2013-05-21 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Mon, May 20, 2013 at 12:57:47PM +0200, Juan Quintela wrote:
 
 Hi
 
 Please, send any topic that you are interested in covering.
 
 Thanks, Juan.

 Generating acpi tables.

 Cc'd a bunch of people who might be interested in this topic.

Unfortunately I have a conflict this morning so I won't be able to
join.  I just saw Kevin's response here from last week and I'll respond
to it later this morning.

Can we post the call for agenda for this call on Fridays in the future?
I need more than 24 hours to make sure to keep my calendar clear...

Regards,

Anthony Liguori


 Kevin - could you join on Tuesday? There appears a disconnect
 between the seabios and qemu that a conf call
 might help resolve.

 -- 
 MST


Re: KVM call agenda for 2013-05-21

2013-05-21 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, May 21, 2013 at 07:18:58AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Mon, May 20, 2013 at 12:57:47PM +0200, Juan Quintela wrote:
  
  Hi
  
  Please, send any topic that you are interested in covering.
  
  Thanks, Juan.
 
  Generating acpi tables.
 
  Cc'd a bunch of people who might be interested in this topic.
 
 Unfortunately I have a conflict this morning so I won't be able to
 join.  I just saw Kevin's response here from last week and I'll respond
 to it later this morning.

 Unfortunate.
 Let's talk about this on the next slot: next Tuesday, June 4 then.
 Could you keep your agenda clear on that day please?

Ack.

Perhaps we could move this call to bimonthly and cancel it less
frequently?  That will make it easier to reserve calendar time for it.


 Can we post the call for agenda for this call on Fridays in the future?
 I need more than 24 hours to make sure to keep my calendar clear...
 
 Regards,
 
 Anthony Liguori

 We don't work on Fridays in Israel so that means we'll only be able to
 respond Sunday, and you'll only see it Monday anyway.
 Setting agenda Thursday is probably too aggressive?

Maybe we could use a wiki page to set up a rolling agenda?

Regards,

Anthony Liguori


 
  Kevin - could you join on Tuesday? There appears a disconnect
  between the seabios and qemu that a conf call
  might help resolve.
 
  -- 
  MST


Re: KVM call agenda for 2013-05-21

2013-05-21 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, May 21, 2013 at 09:29:07AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Tue, May 21, 2013 at 07:18:58AM -0500, Anthony Liguori wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   On Mon, May 20, 2013 at 12:57:47PM +0200, Juan Quintela wrote:
   
   Hi
   
   Please, send any topic that you are interested in covering.
   
   Thanks, Juan.
  
   Generating acpi tables.
  
   Cc'd a bunch of people who might be interested in this topic.
  
  Unfortunately I have a conflict this morning so I won't be able to
  join.  I just saw Kevin's response here from last week and I'll respond
  to it later this morning.
 
  Unfortunate.
  Let's talk about this on the next slot: next Tuesday, June 4 then.
  Could you keep your agenda clear on that day please?
 
 Ack.
 
 Perhaps we could move this call to bimonthly and cancel it less
 frequently?  That will make it easier to reserve calendar time for it.

 I think you mean bi-weekly? If yes, ack.

I meant twice a month (or every other week).

Regards,

Anthony Liguori


 
  Can we post the call for agenda for this call on Fridays in the future?
  I need more than 24 hours to make sure to keep my calendar clear...
  
  Regards,
  
  Anthony Liguori
 
  We don't work on Fridays in Israel so that means we'll only be able to
  respond Sunday, and you'll only see it Monday anyway.
  Setting agenda Thursday is probably too aggressive?
 
  Maybe we could use a wiki page to set up a rolling agenda?
 
 Regards,
 
 Anthony Liguori
 
 
  
   Kevin - could you join on Tuesday? There appears a disconnect
   between the seabios and qemu that a conf call
   might help resolve.
  
   -- 
   MST


Re: KVM call minutes for 2013-04-23

2013-04-24 Thread Anthony Liguori
Eric Blake ebl...@redhat.com writes:

 On 04/23/2013 08:45 AM, Juan Quintela wrote:
 
 * 1.5 pending patches (paolo)
   anthony thinks nothing big is outstanding
   rdma: probably not for this release, too big a change to migration
   cpu-hotplug: andreas expects to get it for 1.5
 
 
 * What can libvirt expect in 1.5 for introspection of command-line support?
   command extensions?  libvirt wants them
 * What are the rules for adding optional parameters to existing QMP
   commands?  Would it help if we had introspection of QMP commands?
   what are the options that each command supports.
 
   command line could work for 1.5
 if we get patches in the next 2 days we can get it.

 Goal is to provide a QMP command that provides JSON representation of
 command line options; I will help review whatever is posted to make sure
 we like the interface.  Anthony agreed the implementation should be
 relatively straightforward and okay to add after soft freeze (but must
 be before hard freeze).  Libvirt has some code that would like to make
 use of the new command-line introspection; Osier will probably be the
 first libvirt developer taking advantage of it - if we can swing it,
 we'd like libvirt 1.0.5 to use the new command (libvirt freezes this
 weekend for a May 2 release).

   rest of introspection needs 1.6
 it is challenging
 we are interested in feature introspection
 and command extensions
 one command to return the schema?

 Anthony was okay with the idea of a full JSON introspection of all QMP
 commands, but it is probably too big to squeeze into 1.5 timeframe.
 Furthermore, while the command will be useful, we should always be
 thinking about API - having to parse through JSON to see if a feature is
 present is not always the nicest interface; when adding a new feature,
 consider improving an existing query-* or adding a counterpart new
 query-* command that makes it much easier to tell if a feature is
 available, without having to resort to a QMP introspection.

Ack.

One of the problems with using schema introspection for feature
detection is that there isn't always a 1-1 mapping.  You can imagine
that we have an optional parameter that gets added to a structure and is
initially tied to a specific feature but later gets used by another
feature.

If a distro backports the later and not the former, but a management
tool uses this field to probe for the former feature, it will result in
a false positive.

That's why a more direct feature negotiation mechanism is better IMHO.
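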

Regards,

Anthony Liguori



   if we change a command, how do we change the interface without
   changing the C API?

 c-api is not yet a strong consideration (but see [1] below).  Also,
 there may be ways to design a C api that is robust to extensions (but
 that means designing it into the QMP up front when adding a new
 command); there has been some list traffic on this thought.

 More importantly, adding an optional parameter to an existing command is
 not okay unless something else is also available to tell whether the
 feature is usable - QMP introspection would solve this, but is not
 necessarily the most elegant way.  For now, while adding QMP
 introspection is a good idea, we still want case-by-case review of any
 command extensions.

 
   we can change drive_mirror to use a new command to see if there
   are the new features.

 drive-mirror changed in 1.4 to add optional buf-size parameter; right
 now, libvirt is forced to limit itself to 1.3 interface (no buf-size or
 granularity) because there is no introspection and no query-* command
 that witnesses that the feature is present.  Idea was that we need to
 add a new query-drive-mirror-capabilities (name subject to bikeshedding)
 command into 1.5 that would let libvirt know that buf-size/granularity
 is usable (done right, it would also prevent the situation of buf-size
 being a write-only interface where it is set when starting the mirror
 but can not be queried later to see what size is in use).

 Unclear whether anyone was signing up to tackle the addition of a query
 command counterpart for drive-mirror in time for 1.5.

 
   if we have a stable C API we can do test cases that work.

 Having such a testsuite would make a stable C API more important.

 
 Eric will complete this with his understanding from the libvirt point of
 view.

 Also under discussion: the existing QMP 'screendump' command is not
 ideal (not extensible, doesn't allow fd passing, hard-coded output
 format).  This was used as an example command that should not be
 extended until we have appropriate feature detection in place; probably
 easier to add a new QMP command than to add parameters to the existing
 one.  At any rate, we're late enough that 'screendump' probably won't be
 improved in 1.5, so we have the full 1.6 cycle to get it right.

 Not on the phone call, but a recent mail that is related to the topic -
 feature detection of whether dump-guest-memory supports paging:
 https

Re: [PATCH-v2 1/2] virtio-scsi: create VirtIOSCSICommon

2013-04-08 Thread Anthony Liguori
Nicholas A. Bellinger n...@linux-iscsi.org writes:

 From: Paolo Bonzini pbonz...@redhat.com

 This patch refactors existing virtio-scsi code into VirtIOSCSICommon
 in order to allow virtio_scsi_init_common() to be used by both internal
 virtio_scsi_init() and external vhost-scsi-pci code.

 Changes in Patch-v2:
- Move ->get_features() assignment to virtio_scsi_init() instead of
  virtio_scsi_init_common()


Any reason we're not doing this as a QOM base class?

Similar to how the in-kernel PIT/PIC work using a common base class...

Regards,

Anthony Liguori


 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Asias He as...@redhat.com
 Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
 ---
  hw/virtio-scsi.c     |  192 +-
  hw/virtio-scsi.h     |  130 --
  include/qemu/osdep.h |    4 +
  3 files changed, 178 insertions(+), 148 deletions(-)

 diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
 index 8620712..c59e9c6 100644
 --- a/hw/virtio-scsi.c
 +++ b/hw/virtio-scsi.c
 @@ -18,118 +18,6 @@
  #include hw/scsi.h
  #include hw/scsi-defs.h
  
 -#define VIRTIO_SCSI_VQ_SIZE 128
 -#define VIRTIO_SCSI_CDB_SIZE32
 -#define VIRTIO_SCSI_SENSE_SIZE  96
 -#define VIRTIO_SCSI_MAX_CHANNEL 0
 -#define VIRTIO_SCSI_MAX_TARGET  255
 -#define VIRTIO_SCSI_MAX_LUN 16383
 -
 -/* Response codes */
 -#define VIRTIO_SCSI_S_OK   0
 -#define VIRTIO_SCSI_S_OVERRUN  1
 -#define VIRTIO_SCSI_S_ABORTED  2
 -#define VIRTIO_SCSI_S_BAD_TARGET   3
 -#define VIRTIO_SCSI_S_RESET4
 -#define VIRTIO_SCSI_S_BUSY 5
 -#define VIRTIO_SCSI_S_TRANSPORT_FAILURE6
 -#define VIRTIO_SCSI_S_TARGET_FAILURE   7
 -#define VIRTIO_SCSI_S_NEXUS_FAILURE8
 -#define VIRTIO_SCSI_S_FAILURE  9
 -#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED   10
 -#define VIRTIO_SCSI_S_FUNCTION_REJECTED11
 -#define VIRTIO_SCSI_S_INCORRECT_LUN12
 -
 -/* Controlq type codes.  */
 -#define VIRTIO_SCSI_T_TMF  0
 -#define VIRTIO_SCSI_T_AN_QUERY 1
 -#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
 -
 -/* Valid TMF subtypes.  */
 -#define VIRTIO_SCSI_T_TMF_ABORT_TASK   0
 -#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET   1
 -#define VIRTIO_SCSI_T_TMF_CLEAR_ACA2
 -#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET   3
 -#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET  4
 -#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
 -#define VIRTIO_SCSI_T_TMF_QUERY_TASK   6
 -#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET   7
 -
 -/* Events.  */
 -#define VIRTIO_SCSI_T_EVENTS_MISSED    0x80000000
 -#define VIRTIO_SCSI_T_NO_EVENT 0
 -#define VIRTIO_SCSI_T_TRANSPORT_RESET  1
 -#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
 -#define VIRTIO_SCSI_T_PARAM_CHANGE 3
 -
 -/* Reasons for transport reset event */
 -#define VIRTIO_SCSI_EVT_RESET_HARD 0
 -#define VIRTIO_SCSI_EVT_RESET_RESCAN   1
 -#define VIRTIO_SCSI_EVT_RESET_REMOVED  2
 -
 -/* SCSI command request, followed by data-out */
 -typedef struct {
 -uint8_t lun[8];  /* Logical Unit Number */
 -uint64_t tag;/* Command identifier */
 -uint8_t task_attr;   /* Task attribute */
 -uint8_t prio;
 -uint8_t crn;
 -uint8_t cdb[];
 -} QEMU_PACKED VirtIOSCSICmdReq;
 -
 -/* Response, followed by sense data and data-in */
 -typedef struct {
 -uint32_t sense_len;  /* Sense data length */
 -uint32_t resid;  /* Residual bytes in data buffer */
 -uint16_t status_qualifier;   /* Status qualifier */
 -uint8_t status;  /* Command completion status */
 -uint8_t response;/* Response values */
 -uint8_t sense[];
 -} QEMU_PACKED VirtIOSCSICmdResp;
 -
 -/* Task Management Request */
 -typedef struct {
 -uint32_t type;
 -uint32_t subtype;
 -uint8_t lun[8];
 -uint64_t tag;
 -} QEMU_PACKED VirtIOSCSICtrlTMFReq;
 -
 -typedef struct {
 -uint8_t response;
 -} QEMU_PACKED VirtIOSCSICtrlTMFResp;
 -
 -/* Asynchronous notification query/subscription */
 -typedef struct {
 -uint32_t type;
 -uint8_t lun[8];
 -uint32_t event_requested;
 -} QEMU_PACKED VirtIOSCSICtrlANReq;
 -
 -typedef struct {
 -uint32_t event_actual;
 -uint8_t response;
 -} QEMU_PACKED VirtIOSCSICtrlANResp;
 -
 -typedef struct {
 -uint32_t event;
 -uint8_t lun[8];
 -uint32_t reason;
 -} QEMU_PACKED VirtIOSCSIEvent;
 -
 -typedef struct {
 -uint32_t num_queues;
 -uint32_t seg_max;
 -uint32_t max_sectors;
 -uint32_t cmd_per_lun;
 -uint32_t event_info_size;
 -uint32_t sense_size;
 -uint32_t cdb_size;
 -uint16_t max_channel;
 -uint16_t max_target;
 -uint32_t max_lun

Re: KVM call agenda for 2013-02-05

2013-02-05 Thread Anthony Liguori
Juan Quintela quint...@redhat.com writes:

 Hi

 Please send in any agenda topics you are interested in.

FYI, I have a conflict for today so I won't be able to attend.

Regards,

Anthony Liguori


 Later, Juan.


Re: [PATCH V4 RESEND 00/22] Multiqueue virtio-net

2013-02-04 Thread Anthony Liguori
Applied.  Thanks.

Regards,

Anthony Liguori



Re: [PATCH V4 00/22] Multiqueue virtio-net

2013-02-04 Thread Anthony Liguori
Applied.  Thanks.

Regards,

Anthony Liguori



Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jan 30, 2013 at 11:48:14AM +, Peter Maydell wrote:
 On 30 January 2013 11:39, Andreas Färber afaer...@suse.de wrote:
  Proposal by hpoussin was to move _list_add() code to ISADevice:
  http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
 
  Concerns:
  * PCI devices (VGA, QXL) register I/O ports as well
   => above patches add dependency on ISABus to machines
   - benh no mac ever had one
   => PCIDevice shouldn't use ISA API with NULL ISADevice
  * Lack of avi: Who decides about memory API these days?
 
  armbru and agraf concluded that moving this into ISA is wrong.
 
  => I will drop the remaining ioport patches from above series.
 
  Suggestions on how to proceed with tackling the issue are welcome.
 
 How does this stuff work on real hardware? I would have
 expected that a PCI device registering the fact it has
 IO ports would have to do so via the PCI controller it
 is plugged into...

 All programming is done by the OS, devices do not register
 with controller.

 Each bridge has two ways to claim an IO transaction:
 - transaction is within the window programmed in the bridge
 - subtractive decoding enabled and no one else claims the transaction

And there can only be one endpoint that accepts subtractive decoding and
this is usually the ISA bridge.

Also note that there are some really special cases with PCI.  The legacy
VGA ports are always routed to the first device with a DISPLAY class
type.

Likewise, legacy IDE ports are routed to the first device with an
IDE class.  That's the only reason you can have these legacy devices not
behind the ISA bridge.

Regards,

Anthony Liguori


 At the bus level, transaction happens on a bus and an appropriate device
 will claim it.

 My naive don't-know-much-about-portio suggestion is that this
 should work the same way as memory regions: each device
 provides portio regions, and the controller for the bus
 (ISA or PCI) exposes those to the next layer up, and
 something at board level maps it all into the right places.
 
 -- PMM


Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Andreas Färber afaer...@suse.de writes:

 Am 29.01.2013 16:41, schrieb Juan Quintela:
 * Portio port to new memory regions?
   Andreas, could you fill?

 MemoryRegion's .old_portio mechanism requires workarounds for VGA on
 ppc, affecting among others the sPAPR PCI host bridge:
 http://git.qemu.org/?p=qemu.git;a=commit;h=a3cfa18eb075c7ef78358ca1956fe7b01caa1724

 Patches were posted and merged removing all .old_portio users but one:
 hw/ioport.c:portio_list_add_1(), used by portio_list_add()

 hw/isa-bus.c:portio_list_add(piolist, isabus->address_space_io, start);
 hw/qxl.c:portio_list_add(qxl_vga_port_list,
 pci_address_space_io(dev), 0x3b0);
 hw/vga.c:portio_list_add(vga_port_list, address_space_io, 0x3b0);
 hw/vga.c:portio_list_add(vbe_port_list, address_space_io, 0x1ce);

 Proposal by hpoussin was to move _list_add() code to ISADevice:
 http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html

Okay, a couple things here:

There is no such thing as PIO as a general concept.  What leaves the
CPU and what a bus interprets are totally different things.

An x86 CPU has an MMIO capability that's essentially 65 bits.  Whether
the top bit is set determines whether it's a PIO transaction or an
MMIO transaction.  A large chunk of that address space is invalid of
course.

PCI has a 65 bit address space too.  The 65th bit determines whether
it's an IO transaction or an MMIO transaction.

For architectures that only have a 64-bit address space, what the PCI
controller typically does is pick a 16-bit window within that address
space to map to a PCI address with the 65th bit set.

Within the PCI bus, transactions are usually routed to devices via
positive decoding.  The device lists what address regions it wants to
handle (via BARs) and the PCI bus uses those to determine who to send
transactions to.
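
A toy model of that positive decoding, for illustration only:

    #include <stdbool.h>
    #include <stdint.h>

    struct bar { uint64_t base, size; };

    /* A device claims a transaction iff the address falls inside one of
     * its programmed BARs; anything nobody claims is left to the
     * subtractive decoder (typically the ISA bridge). */
    static bool claims(const struct bar *bars, int nbars, uint64_t addr)
    {
        for (int i = 0; i < nbars; i++) {
            if (addr >= bars[i].base && addr - bars[i].base < bars[i].size) {
                return true;   /* device asserts DEVSEL# */
            }
        }
        return false;
    }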

There are some exceptions though.  Specifically:

1) A chipset will route any non-positively decoded IO transaction (65th
   bit set) to a single end point (usually the ISA-bridge).  Which one it
   chooses is up to the chipset.  This is called subtractive decoding
   because the PCI bus will wait multiple cycles for that device to
   claim the transaction before bouncing it.

2) There are special hacks in most PCI chipsets to route very specific
   addresses ranges to certain devices.  Namely, legacy VGA IO transactions
   go to the first VGA device.  Legacy IDE IO transactions go to the first
   IDE device.  This doesn't need to be programmed in the BARs.  It will
   just happen.

3) As it turns out, all legacy PIIX3 devices are positively decoded and
   sent to the ISA-bridge (because it's faster this way).

Notice the lack of the word ISA in all of this other than describing
the PCI class of an end point.

So how should this be modeled?

On x86, the CPU has a pio address space.  That can propagate down
through the PCI bus which is what we do today.

On !x86, the PCI controller ought to set up a MemoryRegion for downstream
PIO that devices can use to register on.

We probably need to do something like change the PCI VGA devices to
export a MemoryRegion and allow the PCI controller to decide how to
register that as a subregion.
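
A rough sketch of what that could look like with QEMU's memory API
(names hypothetical, signatures approximate since they have shifted
between versions):

    /* The host bridge owns a 64 KiB PIO container and maps it at some
     * CPU-visible window; downstream devices add subregions to it. */
    static MemoryRegion pci_pio_space;

    static void host_bridge_init_pio(Object *owner, MemoryRegion *sysmem,
                                     hwaddr pio_window_base)
    {
        memory_region_init(&pci_pio_space, owner, "pci-pio", 0x10000);
        memory_region_add_subregion(sysmem, pio_window_base, &pci_pio_space);
    }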

Regards,

Anthony Liguori


 Concerns:
 * PCI devices (VGA, QXL) register I/O ports as well
    => above patches add dependency on ISABus to machines
  - benh no mac ever had one
    => PCIDevice shouldn't use ISA API with NULL ISADevice
 * Lack of avi: Who decides about memory API these days?

 armbru and agraf concluded that moving this into ISA is wrong.

  => I will drop the remaining ioport patches from above series.

 Suggestions on how to proceed with tackling the issue are welcome.

 Regards,
 Andreas

 -- 
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] What to do about non-qdevified devices?

2013-01-30 Thread Anthony Liguori
Markus Armbruster arm...@redhat.com writes:

 Peter Maydell peter.mayd...@linaro.org writes:

 On 30 January 2013 07:02, Markus Armbruster arm...@redhat.com wrote:
 Anthony Liguori aligu...@us.ibm.com writes:

 [...]
 The problems I ran into were (1) this is a lot of work (2) it basically
 requires that all bus children have been qdev/QOM-ified.  Even with
 something like the ISA bus which is where I started, quite a few devices
 were not qdevified still.

 So what's the plan to complete the qdevification job?  Lay really low
 and quietly hope the problem goes away?  We've tried that for about
 three years, doesn't seem to work.

 Do we have a list of not-yet-qdevified devices? Maybe we need to
 start saying "fix X, Y and Z or platform P is dropped from the next
 release". (This would of course be easier if we had a way to let users
 know that platform P was in danger...)

 I think that's a good idea.  Only problem is identifying pre-qdev
 devices in the code requires code inspection (grep won't do, I'm
 afraid).

 If we agree on a "qdevify or else" plan, I'd be prepared to help with
 the digging up of devices.

That's a nice thought, but we're not going to rip out dma.c and break
every PC target.

But I will help put together a list of devices that need converting.  I
have patches actually for most of the PC devices.

Regards,

Anthony Liguori



Re: [Qemu-devel] QEMU buildbot maintenance state

2013-01-30 Thread Anthony Liguori
Gerd Hoffmann kra...@redhat.com writes:

   Hi,

 Gerd: Are you willing to co-maintain the QEMU buildmaster with Daniel
 and Christian?  It would be awesome if you could do this given your
 experience running and customizing buildbot.

 I'll try to set aside some time for that.  Christians idea to host the
 config at github is good, that certainly makes it easier to balance
 things to more people.

 Another thing which would be helpful:  Any chance we can set up a
 maintainer tree mirror @ git.qemu.org?  A single repository where each
 maintainer tree shows up as a branch?

I will set up a tree based on the 'T:' fields in MAINTAINERS.  So if you
want your tree to be part of buildbot, please make sure that you have a
correct entry in MAINTAINERS.

Regards,

Anthony Liguori


 This would make the buildbot setup *a lot* easier.  We can go for an
 AnyBranchScheduler then with BuildFactory and BuildConfig shared,
 instead of needing one BuildFactory and BuildConfig per branch.  Also
 makes the buildbot web interface less cluttered as we don't have an
 insane amount of BuildConfigs any more.  And saves some resources
 (bandwidth + diskspace) for the buildslaves.

 I think it would be a nice service too for people who want to see what
 is coming, or who want to test stuff that is cooking, if they had a
 one-stop shop where they can get everything.

 cheers,
   Gerd


Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Markus Armbruster arm...@redhat.com writes:

 Peter Maydell peter.mayd...@linaro.org writes:

 On 30 January 2013 11:39, Andreas Färber afaer...@suse.de wrote:
 Proposal by hpoussin was to move _list_add() code to ISADevice:
 http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html

 Concerns:
 * PCI devices (VGA, QXL) register I/O ports as well
    => above patches add dependency on ISABus to machines
  - benh no mac ever had one
    => PCIDevice shouldn't use ISA API with NULL ISADevice
 * Lack of avi: Who decides about memory API these days?

 armbru and agraf concluded that moving this into ISA is wrong.

  => I will drop the remaining ioport patches from above series.

 Suggestions on how to proceed with tackling the issue are welcome.

 How does this stuff work on real hardware? I would have
 expected that a PCI device registering the fact it has
 IO ports would have to do so via the PCI controller it
 is plugged into...

 My naive don't-know-much-about-portio suggestion is that this
 should work the same way as memory regions: each device
 provides portio regions, and the controller for the bus
 (ISA or PCI) exposes those to the next layer up, and
 something at board level maps it all into the right places.

 Makes sense me, but I'm naive, too :)

 For me, I/O ports are just an alternate address space some devices
 have.  For instance, x86 CPUs have an extra pin for selecting I/O
 vs. memory address space.  The ISA bus has separate read/write pins for
 memory and I/O.

 This isn't terribly special.  Mapping address spaces around is what
 devices bridging buses do.

 I'd expect a system bus for an x86 CPU to have both a memory and an I/O
 address space.

There is no such thing as a system bus.

There is a bus that links the CPUs to each other and to the North
Bridge.  This is QPI on modern systems.

Sometimes there's a bus to link the North Bridge to the South Bridge.
On modern systems, this is QPI.  On the i440fx, the i440fx is both the
South Bridge and North Bridge and the link between the two is internal
to the chip.  The South Bridge may then export one or more downstream
interfaces.  In the i440fx, it only exports PCI.

Behind the PCI bus, there may be bridges.  On the i440fx, there is a ISA
Bridge which also acts as a Super I/O chip.  It exposes a downstream ISA
bus.

sysbus is a relic of poor modeling.  A major milestone in QEMU's
evolution will be when sysbus is completely removed.

Regards,

Anthony Liguori


 I'd expect an ISA PC's sysbus - ISA bridge to map both directly.

 I'd expect an ISA bridge for a sysbus without a separate I/O address
 space to map the ISA I/O address space into the sysbus's normal address
 space somehow.

 PCI ISA bridges have their own rules, but I've gotten away with ignoring
 the details so far :)


Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Gerd Hoffmann kra...@redhat.com writes:

   Hi,

 hw/qxl.c:portio_list_add(qxl_vga_port_list,
 pci_address_space_io(dev), 0x3b0);
 hw/vga.c:portio_list_add(vga_port_list, address_space_io, 0x3b0);

 That reminds me I should solve this in a more elegant way.

 qxl takes over the vga io ports.  The reason it does this is because qxl
 switches into vga mode in case the vga ports are accessed while not in
 vga mode.  After doing the check (and possibly switching mode) the vga
 handler is called to actually handle it.

The best way to handle this would be to remodel how we do VGA.

Make VGACommonState a proper QOM object and use it as the base class for
QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.

The VGA accessors should be exposed as a memory region but the subclass
ought to be responsible for actually adding it to a subregion.
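
Roughly, with hypothetical declarations:

    /* Common VGA state as a QOM object; the I/O handlers live in a
     * MemoryRegion that each subclass maps wherever its bus dictates. */
    typedef struct VGACommonState {
        DeviceState parent_obj;
        MemoryRegion ioports;      /* the 0x3b0..0x3df handlers */
    } VGACommonState;

    typedef struct QXLState {
        VGACommonState parent_obj; /* QXL is-a VGA device */
        /* qxl-specific state, e.g. the switch-to-vga-mode hook */
    } QXLState;

    /* each subclass decides where the base region lands, e.g.:
     *   memory_region_add_subregion(pci_io, 0x3b0, &vga->ioports);
     */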


 That twist makes it a bit hard to convert vga ...

 Does anyone know how one would do that with the memory api instead? I think
 taking over the ports is easy as the memory regions have priorities so I
 can simply register a region with higher priority. I have no clue how to
 forward the access to the vga code though.


That should be possible with priorities, but I think it's wrong.  There
aren't two VGA devices.  QXL is-a VGA device and the best way to
override the behavior of the base VGA device is through polymorphism.

This isn't really a memory API issue, it's a modeling issue.

Regards,

Anthony Liguori

 Anyone has clues / suggestions?

 thanks,
   Gerd


Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Andreas Färber afaer...@suse.de writes:

 Am 30.01.2013 17:33, schrieb Anthony Liguori:
 Gerd Hoffmann kra...@redhat.com writes:
 
 hw/qxl.c:portio_list_add(qxl_vga_port_list,
 pci_address_space_io(dev), 0x3b0);
 hw/vga.c:portio_list_add(vga_port_list, address_space_io, 0x3b0);

 That reminds me I should solve this in a more elegant way.

 qxl takes over the vga io ports.  The reason it does this is because qxl
 switches into vga mode in case the vga ports are accessed while not in
 vga mode.  After doing the check (and possibly switching mode) the vga
 handler is called to actually handle it.
 
 The best way to handle this would be to remodel how we do VGA.
 
 Make VGACommonState a proper QOM object and use it as the base class for
 QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.

 That would require polymorphism since we already need to derive from
 PCIDevice or ISADevice respectively for interfacing with the bus...

Nope.  You can use composition:

QXLDevice is-a VGACommonState

QXLPCI is-a PCIDevice
   has-a QXLDevice
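
In code, the shape would be something like this (illustrative
declarations only, not the actual QEMU types):

    /* functional core: knows nothing about PCI */
    typedef struct QXLDevice {
        VGACommonState vga;     /* is-a VGACommonState */
        /* ... qxl state ... */
    } QXLDevice;

    /* bus front end: embeds the core instead of inheriting from it */
    typedef struct QXLPCI {
        PCIDevice parent_obj;   /* is-a PCIDevice */
        QXLDevice qxl;          /* has-a QXLDevice */
    } QXLPCI;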

 Modern object-oriented languages have tried to avoid multiple inheritance
 due to arising complications, I thought. Wouldn't object if someone
 wanted to do the dirty implementation work though. ;)

There is no need for MI.

 Another such example is EHCI, with PCIDevice and SysBusDevice frontends,
 sharing an EHCIState struct and having helper functions operating on
 that core state only. Quite a few device share such a pattern today
 actually (serial, m48t59, ...).

Yes, this is all about chipset modelling.  Chipsets should derive from
device and then be embedded in the appropriate bus device.

For instance.

SerialState is-a DeviceState

ISASerialState is-a ISADevice, has-a SerialState
MMIOSerialState is-a SysbusDevice, has-a SerialState

This is what we're doing in practice, we just aren't modeling the
chipsets and we're open coding the relationships (often in subtley
different ways).

Regards,

Anthony Liguori

  The VGA accessors should be exposed as a memory region but the subclass
 ought to be responsible for actually adding it to a subregion.
 

 That twist makes it a bit hard to convert vga ...

 Anyone knows how one would do that with the memory api instead? I think
 taking over the ports is easy as the memory regions have priorities so I
 can simply register a region with higher priority. I have no clue how to
 forward the access to the vga code though.

 
 That should be possible with priorities, but I think it's wrong.  There
 aren't two VGA devices.  QXL is-a VGA device and the best way to
 override behavior of base VGA device is through polymorphism.

 In this particular case QXL is-a PCI VGA device though, so we can
 decouple it from core VGA modeling. Placing the MemoryRegionOps inside
 the Class (rather than static const) might be a short-term solution for
 overriding read/write handlers of a particular VGA MemoryRegion. :)

 Cheers,
 Andreas

 This isn't really a memory API issue, it's a modeling issue.
 
 Regards,
 
 Anthony Liguori
 
 Anyone has clues / suggestions?

 thanks,
   Gerd

 -- 
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Benjamin Herrenschmidt b...@kernel.crashing.org writes:

 On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
 An x86 CPU has a MMIO capability that's essentially 65 bits.  Whether
 the top bit is set determines whether it's a PIO transaction or an
 MMIO transaction.  A large chunk of that address space is invalid of
 course.
 
 PCI has a 65 bit address space too.  The 65th bit determines whether
 it's an IO transaction or an MMIO transaction.

 This is somewhat of an oversimplification since IO and MMIO differ in
 other ways, such as ordering rules :-) But for the sake of memory
 region decoding I suppose it will do.

 For architectures that only have a 64-bit address space, what the PCI
 controller typically does is pick a 16-bit window within that address
 space to map to a PCI address with the 65th bit set.

 Sort-of yes. The window doesn't have to be 16-bit (we commonly have
 larger IO space windows on powerpc) and there's a window per host
 bridge, so there's effectively more than one IO space (as there is more
 than one PCI MMIO space, with only a window off the CPU space routed to
 each brigde).

Ack.

 Making a hard wired assumption that the PCI (MMIO and IO) space relates
 directly to the CPU bus space is wrong on pretty much all !x86
 architectures.

Ack.


  .../...

 You make it sound like subtractive decode is a chipset hack. It's not,
 it's specified in the PCI spec.

It's a hack :-)  It's a well specified hack, but it's still a hack.

 1) A chipset will route any non-positively decoded IO transaction (65th
bit set) to a single end point (usually the ISA-bridge).  Which one it
chooses is up to the chipset.  This is called subtractive decoding
because the PCI bus will wait multiple cycles for that device to
claim the transaction before bouncing it.

 This is not a chipset matter. It's the ISA bridge itself that does
 subtractive decoding.

The PCI bus can have one end point that can be the target for
subtractive decoding (not hard decoding, subtractive decoding).  IOW,
you can only have a single ISA Bridge within a single PCI domain.

You are right--chipset is the wrong word.  I'm used to thinking in terms
of only a single domain :-)

 There also exist P2P bridges doing such subtractive
 decoding; this used to be fairly common with transparent bridges used for
 laptop docking.

I'm not sure I understand how this would work.  How can two devices on
the same PCI domain both do subtractive decoding?  Indeed, the PCI spec
even says:

Subtractive decoding can be implemented by only one device on the bus
 since it accepts all accesses not positively decoded by some other
 agent.

 2) There are special hacks in most PCI chipsets to route very specific
addresses ranges to certain devices.  Namely, legacy VGA IO transactions
go to the first VGA device.  Legacy IDE IO transactions go to the first
IDE device.  This doesn't need to be programmed in the BARs.  It will
just happen.

 This is also mostly not a hack in the chipset. It's a well defined behaviour
 for legacy devices, sometimes called hard decoding. Of course often those
 devices are built into the chipset, but they don't have to be. Plug-in VGA
 devices will hard decode legacy VGA regions for both IO and MMIO by default
 (this can be disabled on most of them nowadays) for example. This has
 nothing to do with the chipset.

So I understand what you're saying re: PCI because the devices actually
assert DEVSEL to indicate that they handle the transaction.

But for PCI-E, doesn't the controller have to expressly identify what
the target is?  Is this done with the device class?

 There's a specific bit in P2P bridge to control the forwarding of legacy
 transaction downstream (and VGA palette snoops), this is also fully specified
 in the PCI spec.

Ack.


 3) As it turns out, all legacy PIIX3 devices are positively decoded and
sent to the ISA-bridge (because it's faster this way).

 Chipsets don't send to a bridge. It's the bridge itself that
 decodes.

With PCI...

 Notice the lack of the word ISA in all of this other than describing
 the PCI class of an end point.

 ISA is only relevant to the extent that the legacy regions of IO space
 originate from the original ISA addresses of devices (VGA, IDE, etc...)
 and to the extent that an ISA bus might still be present which will get
 the transactions that nothing else have decoded in that space.

Ack.

  
 So how should this be modeled?
 
 On x86, the CPU has a pio address space.  That can propagate down
 through the PCI bus which is what we do today.
 
 On !x86, the PCI controller ought to set up a MemoryRegion for
 downstream PIO that devices can use to register on.
 
 We probably need to do something like change the PCI VGA devices to
 export a MemoryRegion and allow the PCI controller to decide how to
 register that as a subregion.

 The VGA device should just register fixed address port IOs the same way
 it would register an IO BAR. Essentially

Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Benjamin Herrenschmidt b...@kernel.crashing.org writes:

 On Wed, 2013-01-30 at 17:54 +0100, Andreas Färber wrote:
 
 That would require polymorphism since we already need to derive from
 PCIDevice or ISADevice respectively for interfacing with the bus...
  Modern object-oriented languages have tried to avoid multiple inheritance
 due to arising complications, I thought. Wouldn't object if someone
 wanted to do the dirty implementation work though. ;)
 
 Another such example is EHCI, with PCIDevice and SysBusDevice
 frontends,
 sharing an EHCIState struct and having helper functions operating on
 that core state only. Quite a few device share such a pattern today
 actually (serial, m48t59, ...).

 This is a design bug of your model :-) You shouldn't derive from your
 bus interface IMHO but from your functional interface, and have an
 ownership relation to the PCIDevice (a bit like IOKit does if my memory
 serves me well).

Ack.  Hence:

SerialPCIDevice is-a PCIDevice
   has-a SerialChipset

The board that exports a bus interface is one object.  The chipset that
implements the functionality is another object.

The former's job in life is to map the bus interface to whatever
interface the functional object expects.  In most cases, this is just a
straightforward proxy of a MemoryRegion.  Sometimes this involves
address shifting, etc.
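
A rough sketch of that shape (names made up for illustration, not an
existing QEMU device):

typedef struct SerialChipset {
    MemoryRegion io;            /* the chipset's register window */
    /* ... 16550 state ... */
} SerialChipset;

typedef struct SerialPCIDevice {
    PCIDevice parent_obj;       /* is-a PCIDevice */
    SerialChipset chip;         /* has-a SerialChipset */
} SerialPCIDevice;

static int serial_pci_init(PCIDevice *dev)
{
    SerialPCIDevice *s = DO_UPCAST(SerialPCIDevice, parent_obj, dev);

    serial_chipset_init(&s->chip);      /* assumed helper */

    /* The PCI frontend's whole job: proxy the chipset's MemoryRegion
     * out as BAR 0. */
    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->chip.io);
    return 0;
}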

Regards,

Anthony Liguori


 Cheers,
 Ben.


Re: KVM call minutes 2013-01-29

2013-01-29 Thread Anthony Liguori
Paolo Bonzini pbonz...@redhat.com writes:

 On 29/01/2013 16:41, Juan Quintela wrote:
 * Replacing select(2) so that we will not hit the 1024 fd_set limit in the
   future. (stefan)
 
   Add checks for fds bigger than 1024? Multifunction devices use lots
   of fds per device.
 
   Portability?
   Use glib? And let it use poll underneath.
   slirp is a problem.
   In the end: moving to a glib event loop; how we arrive there is the
 discussion.

 We can use g_poll while keeping the main-loop.c wrappers around the glib
 event loop.  Both slirp and iohandler.c access the fd_sets randomly, so
 we need to remember some state between the fill and poll functions.  We
 can use two main-loop.c functions:

 int qemu_add_poll_fd(int fd, int events);

   select: writes the events into three fd_sets, returns the file
   descriptor itself

   poll: writes a GPollFD into a dynamically-sized array (of GPollFDs)
   and returns the index in the array.

 int qemu_get_poll_fd_revents(int index);

   select: takes the file descriptor (returned by qemu_add_poll_fd),
   makes up revents based on the three fd_sets

   poll: takes the index into the array and returns the corresponding
   revents

 iohandler.c can simply store the index into struct IOHandlerRecord, and
 use it later.  slirp can do the same for struct socket.

 The select code can be kept for Windows after POSIX switches to poll.
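
For illustration, the poll-path halves of those two helpers could look
roughly like this (assuming a GArray of GPollFDs owned by main-loop.c and
reset at the top of each iteration; a sketch, not the eventual
implementation):

#include <glib.h>

static GArray *gpollfds;   /* g_array_new(FALSE, FALSE, sizeof(GPollFD)) */

int qemu_add_poll_fd(int fd, int events)
{
    GPollFD pfd = { .fd = fd, .events = events };

    g_array_append_val(gpollfds, pfd);
    return gpollfds->len - 1;    /* index handed back to the caller */
}

int qemu_get_poll_fd_revents(int index)
{
    return g_array_index(gpollfds, GPollFD, index).revents;
}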

Doesn't g_poll already do this under the covers for Windows?

Regards,

Anthony Liguori


 Paolo


Re: KVM call minutes 2013-01-29

2013-01-29 Thread Anthony Liguori
Alexander Graf ag...@suse.de writes:

 On 01/29/2013 04:41 PM, Juan Quintela wrote:
Alex will fill this

 When using -device, we cannot specify an IRQ line to attach to the
 device. This works for some special buses like PCI, but not in the
 generic case. We need it generically for virtio-mmio and for potential
 platform-assigned vfio devices though.

 The conclusion we came up with was that in order to model IRQ lines 
 between arbitrary devices, we should use QOM and the QOM name space. 
 Details are up for Anthony to fill in :).

Oh good :-)  This is how far I got since I last touched this problem.

https://github.com/aliguori/qemu/commits/qom-pin.4

qemu_irq is basically foreign to QOM/qdev.  There are two things I did
to solve this.  The first is to have a stateful Pin object.  Stateful is
important because qemu_irq is totally broken wrt reset and live
migration as it stands today.

It's pretty easy to have a Pin object that can connect to a qemu_irq
source or sink which means we can incrementally refactor by first
converting each device under a bus to using Pins (using the qemu_irq
connect interface to maintain compat) until the bus controller can be
converted to export Pins allowing a full switch to using Pins only for
that bus.
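
For reference, a rough sketch of the Pin idea with a qemu_irq compat shim
(names are illustrative, reconstructed from the description above rather
than taken from the actual branch):

typedef struct Pin {
    Object parent_obj;
    bool level;        /* state survives reset/migration, unlike qemu_irq */
    qemu_irq out;      /* optional legacy sink, for incremental conversion */
} Pin;

static void pin_set_level(Pin *pin, bool level)
{
    pin->level = level;
    if (pin->out) {
        qemu_set_irq(pin->out, level);   /* proxy into legacy qemu_irq */
    }
}

/* Compat shim: lets unconverted code drive a Pin through a qemu_irq. */
static void pin_irq_handler(void *opaque, int n, int level)
{
    pin_set_level((Pin *)opaque, level != 0);
}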

The problems I ran into were (1) this is a lot of work, and (2) it basically
requires that all bus children have been qdev/QOM-ified.  Even with
something like the ISA bus, which is where I started, quite a few devices
were still not qdevified.

I'm not going to be able to work on this in the foreseeable future, but
if someone wants to take it over, I'd be happy to provide advice.

I'm also open to other approaches that require less refactoring but I
honestly don't know that there is a way to avoid the heavy lifting.

Regards,

Anthony Liguori



 Alex



Re: [Qemu-devel] [PATCH V2 11/20] tap: support enabling or disabling a queue

2013-01-29 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Tue, Jan 29, 2013 at 08:10:26PM +, Blue Swirl wrote:
 On Tue, Jan 29, 2013 at 1:50 PM, Jason Wang jasow...@redhat.com wrote:
  On 01/26/2013 03:13 AM, Blue Swirl wrote:
  On Fri, Jan 25, 2013 at 10:35 AM, Jason Wang jasow...@redhat.com wrote:
   This patch introduces a new bit - enabled - in TAPState which tracks
   whether a specific queue/fd is enabled. The tap/fd is enabled during
   initialization and can be enabled/disabled by tap_enable() and
   tap_disable(), which call platform-specific helpers to do the real
   work. Polling of a tap fd can only be done when the tap is enabled.
 
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
    include/net/tap.h |    2 ++
    net/tap-win32.c   |   10 ++++++++++
    net/tap.c         |   43 ++++++++++++++++++++++++++++++++++++++++++++---
    3 files changed, 52 insertions(+), 3 deletions(-)
 
  diff --git a/include/net/tap.h b/include/net/tap.h
  index bb7efb5..0caf8c4 100644
  --- a/include/net/tap.h
  +++ b/include/net/tap.h
  @@ -35,6 +35,8 @@ int tap_has_vnet_hdr_len(NetClientState *nc, int len);
   void tap_using_vnet_hdr(NetClientState *nc, int using_vnet_hdr);
   void tap_set_offload(NetClientState *nc, int csum, int tso4, int tso6, 
  int ecn, int ufo);
   void tap_set_vnet_hdr_len(NetClientState *nc, int len);
  +int tap_enable(NetClientState *nc);
  +int tap_disable(NetClientState *nc);
 
   int tap_get_fd(NetClientState *nc);
 
  diff --git a/net/tap-win32.c b/net/tap-win32.c
  index 265369c..a2cd94b 100644
  --- a/net/tap-win32.c
  +++ b/net/tap-win32.c
  @@ -764,3 +764,13 @@ void tap_set_vnet_hdr_len(NetClientState *nc, int 
  len)
   {
   assert(0);
   }
  +
  +int tap_enable(NetClientState *nc)
  +{
  +assert(0);
  abort()
 
   This is just to be consistent with the rest of the helpers in this file.
 
  +}
  +
  +int tap_disable(NetClientState *nc)
  +{
  +assert(0);
  +}
  diff --git a/net/tap.c b/net/tap.c
  index 67080f1..95e557b 100644
  --- a/net/tap.c
  +++ b/net/tap.c
  @@ -59,6 +59,7 @@ typedef struct TAPState {
   unsigned int write_poll : 1;
   unsigned int using_vnet_hdr : 1;
   unsigned int has_ufo: 1;
  +unsigned int enabled : 1;
  bool without bit field?
 
   Also to be consistent with the other fields. If you wish I can send patches
   to convert all those bit fields to bool on top of this series.
 
 That would be nice, likewise for the assert(0).

  OK so let's go ahead with this patchset as is,
  and a cleanup patch will be sent after 1.4 then.

Why?  I'd prefer that we didn't rush things into 1.4 just because.
There's still ample time to respin a corrected series.

Regards,

Anthony Liguori



 
  Thanks
   VHostNetState *vhost_net;
   unsigned host_vnet_hdr_len;
   } TAPState;
   @@ -72,9 +73,9 @@ static void tap_writable(void *opaque);
    static void tap_update_fd_handler(TAPState *s)
    {
        qemu_set_fd_handler2(s->fd,
   -                         s->read_poll ? tap_can_send : NULL,
   -                         s->read_poll ? tap_send : NULL,
   -                         s->write_poll ? tap_writable : NULL,
   +                         s->read_poll && s->enabled ? tap_can_send : NULL,
   +                         s->read_poll && s->enabled ? tap_send : NULL,
   +                         s->write_poll && s->enabled ? tap_writable : NULL,
                             s);
    }
 
   @@ -339,6 +340,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
        s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
        s->using_vnet_hdr = 0;
        s->has_ufo = tap_probe_has_ufo(s->fd);
   +    s->enabled = 1;
        tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
        /*
         * Make sure host header length is set correctly in tap:
   @@ -737,3 +739,38 @@ VHostNetState *tap_get_vhost_net(NetClientState *nc)
        assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP);
        return s->vhost_net;
    }
   +
   +int tap_enable(NetClientState *nc)
   +{
   +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
   +    int ret;
   +
   +    if (s->enabled) {
   +        return 0;
   +    } else {
   +        ret = tap_fd_enable(s->fd);
   +        if (ret == 0) {
   +            s->enabled = 1;
   +            tap_update_fd_handler(s);
   +        }
   +        return ret;
   +    }
   +}
   +
   +int tap_disable(NetClientState *nc)
   +{
   +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
   +    int ret;
   +
   +    if (s->enabled == 0) {
   +        return 0;
   +    } else {
   +        ret = tap_fd_disable(s->fd);
   +        if (ret == 0) {
   +            qemu_purge_queued_packets(nc);
   +            s->enabled = 0;
   +            tap_update_fd_handler(s);
   +        }
   +        return ret;
   +    }
   +}
  --
  1.7.1
 
 
 

Re: KVM call agenda for 2013-01-29

2013-01-28 Thread Anthony Liguori
Juan Quintela quint...@redhat.com writes:

 Hi

 Please send in any agenda topics you are interested in.

 - Outstanding virtio work for 1.4
   - Multiqueue virtio-net (Amos/Michael)
   - Refactorings (Fred/Peter)
   - virtio-ccw (Cornelia/Alex)

We need to work out the ordering here and what's reasonable to merge
over the next week.

Regards,

Anthony Liguori


 Later, Juan.


Re: [RFC 16/19] target-ppc: Refactor debug output macros

2013-01-27 Thread Anthony Liguori
Andreas Färber afaer...@suse.de writes:

 Make debug output compile-testable even if disabled.

 Inline DEBUG_OP check in excp_helper.c.
 Inline LOG_MMU_STATE() in mmu_helper.c.
 Inline PPC_DEBUG_SPR check in translate_init.c.

 Signed-off-by: Andreas Färber afaer...@suse.de
 ---
  target-ppc/excp_helper.c|   22 +++
  target-ppc/kvm.c|9 ++-
  target-ppc/mem_helper.c |2 --
  target-ppc/mmu_helper.c |   63 
 +--
  target-ppc/translate.c  |   12 -
  target-ppc/translate_init.c |   10 +++
 6 files changed, 55 insertions(+), 63 deletions(-)

 diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
 index 0a1ac86..54722c4 100644
 --- a/target-ppc/excp_helper.c
 +++ b/target-ppc/excp_helper.c
 @@ -21,14 +21,14 @@
  
   #include "helper_regs.h"
  
 -//#define DEBUG_OP
 -//#define DEBUG_EXCEPTIONS
 +#define DEBUG_OP 0
 +#define DEBUG_EXCEPTIONS 0
  
 -#ifdef DEBUG_EXCEPTIONS
 -#  define LOG_EXCP(...) qemu_log(__VA_ARGS__)
 -#else
 -#  define LOG_EXCP(...) do { } while (0)
 -#endif
 +#define LOG_EXCP(...) G_STMT_START \
 +if (DEBUG_EXCEPTIONS) { \
 +qemu_log(__VA_ARGS__); \
 +} \
 +G_STMT_END

Just thinking out loud a bit...  This form becomes pretty common and it's
a shame to use a macro here if we don't have to.

I think:

static inline void LOG_EXCP(const char *fmt, ...)
{
    if (debug_exceptions) {
        va_list ap;

        va_start(ap, fmt);
        qemu_logv(fmt, ap);
        va_end(ap);
    }
}

Probably would have equivalent performance.  debug_exceptions would be
read-mostly and ought to be very predictable as a result.  I strongly
expect that the compiler would actually inline LOG_EXCP too.

I see LOG_EXCP and LOG_DIS in this series.  Perhaps we could just
introduce these functions and then make these flags run-time
controllable?

BTW, one advantage of this over your original proposal, coming back to your
point, is that you still won't catch linker errors with your proposal.
Dead code elimination will kill off those branches before the linker ever
sees them.
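
A small illustration of both properties at once: the disabled branch below
is still parsed and type-checked, but dead code elimination drops it before
the linker runs, so an undefined (merely declared) symbol inside it would go
unnoticed:

#define DEBUG_FOO 0

static void foo(int x)
{
    if (DEBUG_FOO) {
        qemu_log("x = %d\n", x);   /* compile-checked even when disabled */
    }
}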

Regards,

Anthony Liguori

  
  
 /*/
  /* PowerPC Hypercall emulation */
 @@ -777,7 +777,7 @@ void ppc_hw_interrupt(CPUPPCState *env)
  }
  #endif /* !CONFIG_USER_ONLY */
  
 -#if defined(DEBUG_OP)
 +#ifndef CONFIG_USER_ONLY
  static void cpu_dump_rfi(target_ulong RA, target_ulong msr)
  {
  qemu_log(Return from exception at  TARGET_FMT_lx  with flags 
  @@ -835,9 +835,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong nip, target_ulong msr,
      /* XXX: beware: this is false if VLE is supported */
      env->nip = nip & ~((target_ulong)0x00000003);
      hreg_store_msr(env, msr, 1);
  -#if defined(DEBUG_OP)
  -    cpu_dump_rfi(env->nip, env->msr);
  -#endif
  +    if (DEBUG_OP) {
  +        cpu_dump_rfi(env->nip, env->msr);
  +    }
  /* No need to raise an exception here,
   * as rfi is always the last insn of a TB
   */
 diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
 index 2f4f068..0dc6657 100644
 --- a/target-ppc/kvm.c
 +++ b/target-ppc/kvm.c
 @@ -37,15 +37,10 @@
   #include "hw/spapr.h"
   #include "hw/spapr_vio.h"
  
 -//#define DEBUG_KVM
 +#define DEBUG_KVM 0
  
 -#ifdef DEBUG_KVM
  #define dprintf(fmt, ...) \
 -do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
 -#else
 -#define dprintf(fmt, ...) \
 -do { } while (0)
 -#endif
 +do { if (DEBUG_KVM) { fprintf(stderr, fmt, ## __VA_ARGS__); } } while (0)
  
  #define PROC_DEVTREE_CPU  /proc/device-tree/cpus/
  
 diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
 index 902b1cd..5c7a5ce 100644
 --- a/target-ppc/mem_helper.c
 +++ b/target-ppc/mem_helper.c
 @@ -26,8 +26,6 @@
   #include "exec/softmmu_exec.h"
  #endif /* !defined(CONFIG_USER_ONLY) */
  
 -//#define DEBUG_OP
 -
  
 /*/
  /* Memory load and stores */
  
 diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
 index ee168f1..9340fbb 100644
 --- a/target-ppc/mmu_helper.c
 +++ b/target-ppc/mmu_helper.c
 @@ -21,39 +21,36 @@
   #include "sysemu/kvm.h"
   #include "kvm_ppc.h"
  
 -//#define DEBUG_MMU
 -//#define DEBUG_BATS
 -//#define DEBUG_SLB
 -//#define DEBUG_SOFTWARE_TLB
 +#define DEBUG_MMU 0
 +#define DEBUG_BATS 0
 +#define DEBUG_SLB 0
 +#define DEBUG_SOFTWARE_TLB 0
  //#define DUMP_PAGE_TABLES
 -//#define DEBUG_SOFTWARE_TLB
  //#define FLUSH_ALL_TLBS
  
 -#ifdef DEBUG_MMU
 -#  define LOG_MMU(...) qemu_log(__VA_ARGS__)
 -#  define LOG_MMU_STATE(env) log_cpu_state((env), 0)
 -#else
 -#  define LOG_MMU(...) do { } while (0)
 -#  define LOG_MMU_STATE(...) do { } while (0)
 -#endif
 -
 -#ifdef DEBUG_SOFTWARE_TLB
 -#  define LOG_SWTLB(...) qemu_log(__VA_ARGS__)
 -#else
 -#  define LOG_SWTLB(...) do { } while (0)
 -#endif
 -
 -#ifdef DEBUG_BATS
 -#  define LOG_BATS(...) qemu_log(__VA_ARGS__)
 -#else
 -#  define

Re: [PATCH v6 00/11] s390: channel I/O support in qemu.

2013-01-25 Thread Anthony Liguori
Hi,

Thank you for submitting your patch series.  checkpatch.pl has
detected that one or more of the patches in this series violate
the QEMU coding style.

If you believe this message was sent in error, please ignore it
or respond here with an explanation.

Otherwise, please correct the coding style issues and resubmit a
new version of the patch.

For more information about QEMU coding style, see:

http://git.qemu.org/?p=qemu.git;a=blob_plain;f=CODING_STYLE;hb=HEAD

Here is the output from checkpatch.pl:

Subject: s390: Add s390-ccw-virtio machine.
Subject: s390: Add default support for SCLP console
ERROR: do not initialise statics to 0 or NULL
#72: FILE: vl.c:2468:
+static int index = 0;

WARNING: braces {} are necessary for all arms of this statement
#126: FILE: vl.c:3923:
+if (default_sclp)
[...]

WARNING: braces {} are necessary for all arms of this statement
#135: FILE: vl.c:3937:
+if (default_sclp)
[...]

WARNING: braces {} are necessary for all arms of this statement
#144: FILE: vl.c:4109:
+if (foreach_device_config(DEV_SCLP, sclp_parse)  0)
[...]

total: 1 errors, 3 warnings, 114 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Subject: s390-virtio: Factor out some initialization code.
Subject: s390: Add new channel I/O based virtio transport.
Subject: s390: Wire up channel I/O in kvm.
Subject: s390: Virtual channel subsystem support.
ERROR: need consistent spacing around '*' (ctx:WxV)
#56: FILE: hw/s390x/css.c:31:
+SubchDev *sch[MAX_SCHID + 1];
  ^

ERROR: need consistent spacing around '*' (ctx:WxV)
#62: FILE: hw/s390x/css.c:37:
+SubchSet *sch_set[MAX_SSID + 1];
  ^

ERROR: need consistent spacing around '*' (ctx:WxV)
#74: FILE: hw/s390x/css.c:49:
+CssImage *css[MAX_CSSID + 1];
  ^

total: 3 errors, 0 warnings, 1469 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Subject: s390: Add channel I/O instructions.
Subject: s390: I/O interrupt and machine check injection.
Subject: s390: Channel I/O basic definitions.
Subject: s390: Add mapping helper functions.
Subject: s390: Lowcore mapping helper.
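
For reference, the kind of fixes the report above asks for, sketched
(illustrative, not the actual patches):

static int dev_index;            /* was: static int index = 0; */

static void handle_default_sclp(bool default_sclp)
{
    if (default_sclp) {          /* QEMU style: braces on all arms */
        /* ... foreach_device_config(DEV_SCLP, sclp_parse) ... */
    }
}

The "consistent spacing around '*'" errors point at declarations like
"SubchDev *sch[MAX_SCHID + 1];", which are fine as written; those are
likely checkpatch false positives to be reported per the last paragraph.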


Regards,

Anthony Liguori



Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-21 Thread Anthony Liguori
Nicholas A. Bellinger n...@linux-iscsi.org writes:

 Hi MST  Co,

 On Thu, 2013-01-17 at 18:43 +0200, Michael S. Tsirkin wrote:
 On Fri, Sep 07, 2012 at 06:48:14AM +, Nicholas A. Bellinger wrote:
  From: Nicholas Bellinger n...@linux-iscsi.org
  
  Hello Anthony  Co,
  
  This is the fourth installment to add host virtualized target support for
  the mainline tcm_vhost fabric driver using Linux v3.6-rc into QEMU 
  1.3.0-rc.
  
  The series is available directly from the following git branch:
  
 git://git.kernel.org/pub/scm/virt/kvm/nab/qemu-kvm.git 
  vhost-scsi-for-1.3
  
   Note the code is cut against yesterday's QEMU head, and despite the name
   of the tree it is based upon mainline qemu.org git code. It has thus far
   been running overnight with >100K IOPS small block 4k workloads using
   v3.6-rc2+ based target code with RAMDISK_DR backstores.
  
   Other than some minor fuzz from jumping between QEMU 1.2.0 -> 1.2.50, this
   series is functionally identical to what's been posted for vhost-scsi
   RFC-v3 to qemu-devel.
  
   Please consider applying these patches for an initial vhost-scsi merge into
   QEMU 1.3.0-rc code, or let us know what else you'd like to see addressed
   for this series in order to merge.
  
  Thank you!
  
  --nab
 
 OK what's the status here?
 We missed 1.3 but let's try not to miss 1.4?
 

 Unfortunately, I've not been able to get back to the conversion
 requested by Paolo for a standalone vhost-scsi PCI device.

Is your git repo above up to date?  Perhaps I can find someone to help
out..

 At this point my hands are still full with iSER-target for-3.9 kernel
 code over the next weeks.  

 What's the v1.4 feature cut-off looking like at this point..?

Hard freeze is on February 1st but 1.5 opens up again on the 15th.  So
the release windows shouldn't have a major impact on merging...

Regards,

Anthony Liguori


 --nab



Re: [PATCH 00/12] Multiqueue virtio-net

2013-01-16 Thread Anthony Liguori
Jason Wang jasow...@redhat.com writes:

 On 01/15/2013 03:44 AM, Anthony Liguori wrote:
 Jason Wang jasow...@redhat.com writes:

 Hello all:

 This series is an update of the last version of multiqueue virtio-net support.

 Recently, Linux tap got multiqueue support. This series implements basic
 support for multiqueue tap, nic and vhost, then uses it as an infrastructure
 to enable multiqueue support for virtio-net.

 Both vhost and userspace multiqueue were implemented for virtio-net, but
 userspace could not get much benefit since a dataplane-like parallelized
 mechanism was not implemented.

 A user can start a multiqueue virtio-net card by adding a queues
 parameter to tap.

 ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
 virtio-net-pci,netdev=hn0

 Management tools such as libvirt can pass multiple pre-created fds through

 ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
 virtio-net-pci,netdev=hn0
 I'm confused/frightened that this syntax works.  You shouldn't be
 allowed to have two values for the same property.  Better to have a
 syntax like fd[0]=X,fd[1]=Y or something along those lines.

 Yes, but this is how the current StringList type works on the command line.
 Some other parameters such as dnssearch, hostfwd and guestfwd already
 work this way. Looks like your suggestion needs some extension of the
 QemuOpts visitor; maybe we can do this on top.

It's a silly syntax and breaks compatibility.  This is valid syntax:

-net tap,fd=3,fd=4

In this case, it means 'fd=4' because the last fd overwrites the first
one.

Now you've changed it to mean something else.  Having one thing mean
something in one context, but something else in another context is
terrible interface design.

Regards,

Anthony Liguori


 Thanks

 Regards,

 Anthony Liguori

 You can fetch and try the code from:
 git://github.com/jasowang/qemu.git

 Patch 1 adds a generic method of creating multiqueue taps and implements
 the Linux part.
 Patch 2 - 4 introduce some helpers which could be used to refactor the nic
 emulation codes to support multiqueue.
 Patch 5 introduces multiqueue support for qemu networking code: each peer
 of a NetClientState is abstracted as a queue. Through this, most of the
 code can be reused without change.
 Patch 6 adds basic multiqueue support for vhost which could let vhost just
 handle a subset of all virtqueues.
 Patch 7-8 introduce new helpers of virtio which is needed by multiqueue
 virtio-net.
 Patch 9-12 implement the multiqueue support of virtio-net

 Changes from RFC v2:
 - rebase the codes to latest qemu
 - align the multiqueue virtio-net implementation to virtio spec
 - split the patches into more smaller patches
 - set_link and hotplug support

 Changes from RFC V1:
 - rebase to the latest
 - fix memory leak in parse_netdev
 - fix guest notifiers assignment/de-assignment
 - changes the command lines to:
qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2

 Reference:
 v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
 v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481

 Perf Numbers:

 Two Intel Xeon 5620 with directly connected Intel 82599EB
 Host/Guest kernel: David's net tree
 vhost enabled

 - lots of improvements in both latency and cpu utilization in
   request-response tests
 - a regression for guests sending small packets, because TCP tends to
   batch less when latency is improved

 1q/2q/4q
 TCP_RR
  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
 1   1    9393.26   595.64  9408.18   597.34  9375.19   584.12
 1   20   72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
 1   50   107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
 1   100  126734.63 2676.54 145553.5  2406.63 265252.68 2943
 64  1    9453.42   632.33  9371.37   616.13  9338.19   615.97
 64  20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
 64  50   106966    2448.29 146518.67 2514.47 242134.07 2720.91
 64  100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
 256 1    8733.29   736.36  8701.07   680.83  8608.92   530.1
 256 20   69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
 256 50   97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
 256 100  150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
 TCP_CRR
  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
 1   1    2848.37  163.41 2230.39  130.89 2013.09  120.47
 1   20   23434.5  562.11 31057.43 531.07 49488.28 564.41
 1   50   28514.88 582.17 40494.23 605.92 60113.35 654.97
 1   100  28827.22 584.73 48813.25 661.6  61783.62 676.56
 64  1    2780.08  159.4  2201.07  127.96 2006.8   117.63
 64  20   23318.51 564.47 30982.44 530.24 49734.95 566.13
 64  50   28585.72 582.54 40576.7  610.08 60167.89 656.56
 64  100  28747.37 584.17 49081.87 667.87 60612.94 662
 256 1    2772.08  160.51 2231.84  131.05 2003.62  113.45
 256 20   23086.35 559.8  30929.09 528.16 48454.9  555.22
 256 50   28354.7  579.85 40578.31 607    60261.71 657.87
 256 100  28844.55 585.67

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-16 Thread Anthony Liguori
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
 Jason Wang jasow...@redhat.com writes:
 
  On 01/15/2013 03:44 AM, Anthony Liguori wrote:
  Jason Wang jasow...@redhat.com writes:
 
  Hello all:
 
  This series is an update of the last version of multiqueue virtio-net
  support.
 
  Recently, Linux tap got multiqueue support. This series implements basic
  support for multiqueue tap, nic and vhost, then uses it as an
  infrastructure to enable multiqueue support for virtio-net.
 
  Both vhost and userspace multiqueue were implemented for virtio-net, but
  userspace could not get much benefit since a dataplane-like parallelized
  mechanism was not implemented.
 
  A user can start a multiqueue virtio-net card by adding a queues
  parameter to tap.
 
  ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
  virtio-net-pci,netdev=hn0
 
  Management tools such as libvirt can pass multiple pre-created fds 
  through
 
  ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
  virtio-net-pci,netdev=hn0
  I'm confused/frightened that this syntax works.  You shouldn't be
  allowed to have two values for the same property.  Better to have a
  syntax like fd[0]=X,fd[1]=Y or something along those lines.
 
   Yes, but this is how the current StringList type works on the command
   line. Some other parameters such as dnssearch, hostfwd and guestfwd
   already work this way. Looks like your suggestion needs some extension
   of the QemuOpts visitor; maybe we can do this on top.
 
 It's a silly syntax and breaks compatibility.  This is valid syntax:
 
 -net tap,fd=3,fd=4
 
 In this case, it means 'fd=4' because the last fd overwrites the first
 one.
 
 Now you've changed it to mean something else.  Having one thing mean
 something in one context, but something else in another context is
 terrible interface design.
 
 Regards,
 
 Anthony Liguori

 Aha, so just renaming the field to 'fds' would address this issue?

No, you still have the problem of different meanings.

-netdev tap,fd=X,fd=Y

-netdev tap,fds=X,fds=Y

Would have wildly different behavior.

Just do:

-netdev tap,fds=X:Y

And then we're staying consistent wrt the interpretation of multiple
properties of the same name.
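
A sketch of what parsing such a colon-separated list could look like with
glib (illustrative, not the code that was merged):

#include <glib.h>
#include <stdlib.h>

static int parse_fds(const char *fds, int *out, int max)
{
    gchar **parts = g_strsplit(fds, ":", -1);
    int i;

    for (i = 0; parts[i] && i < max; i++) {
        out[i] = atoi(parts[i]);    /* each element is one fd number */
    }
    g_strfreev(parts);
    return i;                       /* number of fds parsed */
}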

Regards,

Anthony Liguori



Re: [PATCH 00/12] Multiqueue virtio-net

2013-01-14 Thread Anthony Liguori
Jason Wang jasow...@redhat.com writes:

 Hello all:

 This series is an update of the last version of multiqueue virtio-net support.

 Recently, Linux tap got multiqueue support. This series implements basic
 support for multiqueue tap, nic and vhost, then uses it as an infrastructure
 to enable multiqueue support for virtio-net.

 Both vhost and userspace multiqueue were implemented for virtio-net, but
 userspace could not get much benefit since a dataplane-like parallelized
 mechanism was not implemented.

 A user can start a multiqueue virtio-net card by adding a queues
 parameter to tap.

 ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device virtio-net-pci,netdev=hn0

 Management tools such as libvirt can pass multiple pre-created fds through

 ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
 virtio-net-pci,netdev=hn0

I'm confused/frightened that this syntax works.  You shouldn't be
allowed to have two values for the same property.  Better to have a
syntax like fd[0]=X,fd[1]=Y or something along those lines.

Regards,

Anthony Liguori


 You can fetch and try the code from:
 git://github.com/jasowang/qemu.git

 Patch 1 adds a generic method of creating multiqueue taps and implements
 the Linux part.
 Patch 2 - 4 introduce some helpers which could be used to refactor the nic
 emulation codes to support multiqueue.
 Patch 5 introduces multiqueue support for qemu networking code: each peer
 of a NetClientState is abstracted as a queue. Through this, most of the
 code can be reused without change.
 Patch 6 adds basic multiqueue support for vhost which could let vhost just
 handle a subset of all virtqueues.
 Patch 7-8 introduce new helpers of virtio which is needed by multiqueue
 virtio-net.
 Patch 9-12 implement the multiqueue support of virtio-net

 Changes from RFC v2:
 - rebase the codes to latest qemu
 - align the multiqueue virtio-net implementation to virtio spec
 - split the patches into more smaller patches
 - set_link and hotplug support

 Changes from RFC V1:
 - rebase to the latest
 - fix memory leak in parse_netdev
 - fix guest notifiers assignment/de-assignment
 - changes the command lines to:
qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2

 Reference:
 v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
 v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481

 Perf Numbers:

 Two Intel Xeon 5620 with directly connected Intel 82599EB
 Host/Guest kernel: David's net tree
 vhost enabled

 - lots of improvements in both latency and cpu utilization in
   request-response tests
 - a regression for guests sending small packets, because TCP tends to
   batch less when latency is improved

 1q/2q/4q
 TCP_RR
  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
 1   1    9393.26   595.64  9408.18   597.34  9375.19   584.12
 1   20   72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
 1   50   107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
 1   100  126734.63 2676.54 145553.5  2406.63 265252.68 2943
 64  1    9453.42   632.33  9371.37   616.13  9338.19   615.97
 64  20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
 64  50   106966    2448.29 146518.67 2514.47 242134.07 2720.91
 64  100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
 256 1    8733.29   736.36  8701.07   680.83  8608.92   530.1
 256 20   69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
 256 50   97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
 256 100  150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
 TCP_CRR
  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
 1   1    2848.37  163.41 2230.39  130.89 2013.09  120.47
 1   20   23434.5  562.11 31057.43 531.07 49488.28 564.41
 1   50   28514.88 582.17 40494.23 605.92 60113.35 654.97
 1   100  28827.22 584.73 48813.25 661.6  61783.62 676.56
 64  1    2780.08  159.4  2201.07  127.96 2006.8   117.63
 64  20   23318.51 564.47 30982.44 530.24 49734.95 566.13
 64  50   28585.72 582.54 40576.7  610.08 60167.89 656.56
 64  100  28747.37 584.17 49081.87 667.87 60612.94 662
 256 1    2772.08  160.51 2231.84  131.05 2003.62  113.45
 256 20   23086.35 559.8  30929.09 528.16 48454.9  555.22
 256 50   28354.7  579.85 40578.31 607    60261.71 657.87
 256 100  28844.55 585.67 48541.86 659.08 61941.07 676.72
 TCP_STREAM guest receiving
  size #sessions throughput  norm throughput  norm throughput  norm
 1   1    16.27   1.33   16.1    1.12   16.13   0.99
 1   2    33.04   2.08   32.96   2.19   32.75   1.98
 1   4    66.62   6.83   68.3    5.56   66.14   2.65
 64  1    896.55  56.67  914.02  58.14  898.9   61.56
 64  2    1830.46 91.02  1812.02 64.59  1835.57 66.26
 64  4    3626.61 142.55 3636.25 100.64 3607.46 75.03
 256 1    2619.49 131.23 2543.19 129.03 2618.69 132.39
 256 2    5136.58 203.02 5163.31 141.11 5236.51 149.4
 256 4    7063.99 242.83 9365.4  208.49 9421.03 159.94
 512 1    3592.43 165.24 3603.12 167.19 3552.5  169.57
 512 2    7042.62 246.59 7068.46 180.87 7258.52 186.3
 512 4    6996.08 241.49

Re: [PULL 0/2] vfio-pci: Fixes for 1.4 stable

2013-01-10 Thread Anthony Liguori
Pulled, thanks.

Regards,

Anthony Liguori



Re: [PATCH 0/9] target-i386: make enforce flag work as it should

2013-01-04 Thread Anthony Liguori
Hi,

This is an automated message generated from the QEMU Patches.
Thank you for submitting this patch.  This patch no longer applies to qemu.git.

This may have occurred due to:
 
  1) Changes in mainline requiring your patch to be rebased and re-tested.

  2) Sending the mail using a tool other than git-send-email.  Please use
 git-send-email to send patches to QEMU.

  3) Basing this patch off of a branch that isn't tracking the QEMU
 master branch.  If that was done purposefully, please include the name
 of the tree in the subject line in the future to prevent this message.

 For instance: [PATCH block-next 1/10] qcow3: add fancy new feature

  4) You no longer wish for this patch to be applied to QEMU.  No additional
 action is required on your part.

Nacked-by: QEMU Patches aligu...@us.ibm.com

Below is the output from git-am:

Applying: target-i386: kvm: -cpu host: Use GET_SUPPORTED_CPUID for SVM 
features
Applying: target-i386: kvm: Enable all supported KVM features for -cpu host
Applying: target-i386: check/enforce: Fix CPUID leaf numbers on error 
messages
fatal: sha1 information is lacking or useless (target-i386/cpu.c).
Repository lacks necessary blobs to fall back on 3-way merge.
Cannot fall back to three-way merge.
Patch failed at 0003 target-i386: check/enforce: Fix CPUID leaf numbers on 
error messages
The copy of the patch that failed is found in:
   /home/aliguori/.patches/git-working/.git/rebase-apply/patch
When you have resolved this problem run git am --resolved.
If you would prefer to skip this patch, instead run git am --skip.
To restore the original branch and stop patching run git am --abort.



Re: [PATCH 0/2] [PULL] qemu-kvm.git uq/master queue

2013-01-02 Thread Anthony Liguori
Gleb Natapov g...@redhat.com writes:

 The following changes since commit e376a788ae130454ad5e797f60cb70d0308babb6:

   Merge remote-tracking branch 'kwolf/for-anthony' into staging (2012-12-13 
 14:32:28 -0600)

 are available in the git repository at:


   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

 for you to fetch changes up to 0a2a59d35cbabf63c91340a1c62038e3e60538c1:

   qemu-kvm/pci-assign: 64 bits bar emulation (2012-12-25 14:37:52 +0200)


Pulled. Thanks.

Regards,

Anthony Liguori

 
 Will Auld (1):
   target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs

 Xudong Hao (1):
   qemu-kvm/pci-assign: 64 bits bar emulation

  hw/kvm/pci-assign.c   |   14 ++
  target-i386/cpu.h |2 ++
  target-i386/kvm.c |   14 ++
  target-i386/machine.c |   21 +
  4 files changed, 47 insertions(+), 4 deletions(-)


Re: KVM call agenda for 2012-12-18

2012-12-18 Thread Anthony Liguori
Juan Quintela quint...@redhat.com writes:

 Hi

 Please send in any agenda topics that you have.

I have a conflicting call today so I can't attend.

Regards,

Anthony Liguori


 Thanks, Juan.


Re: [Qemu-devel] KVM call agenda for 2012-12-11

2012-12-11 Thread Anthony Liguori
Kevin Wolf kw...@redhat.com writes:

 Am 10.12.2012 14:59, schrieb Juan Quintela:
 
 Hi
 
 Please send in any agenda topics you are interested in.

 Can probably be answered on the list, but what is the status of
 libqos?

Still on my TODO list.

Regards,

Anthony Liguori


 Kevin


Re: [PATCH 0/5] Alter steal time reporting in KVM

2012-11-28 Thread Anthony Liguori
Glauber Costa glom...@parallels.com writes:

 Hi,

 On 11/27/2012 12:36 AM, Michael Wolf wrote:
  In the case where you have a system running in a capped or overcommitted
  environment, the user may see steal time being reported in accounting
  tools such as top or vmstat.  This can cause confusion for the end user.
  To ease the confusion this patch set
 adds the idea of consigned (expected steal) time.  The host will separate
 the consigned time from the steal time.  The consignment limit passed to the
 host will be the amount of steal time expected within a fixed period of
 time.  Any other steal time accruing during that period will show as the
 traditional steal time.

 If you submit this again, please include a version number in your series.

 It would also be helpful to include a small changelog about what changed
 between last version and this version, so we could focus on that.

 As for the rest, I answered your previous two submissions saying I don't
 agree with the concept. If you haven't changed anything, resending it
 won't change my mind.

 I could of course, be mistaken or misguided. But I had also not seen any
 wave of support in favor of this previously, so basically I have no new
 data to make me believe I should see it any differently.

 Let's try this again:

 * Rik asked you in your last submission how ppc handles this. You
 said, and I quote: In the case of lpar on POWER systems they simply
 report steal time and do not alter it in any way.
 They do however report how much processor is assigned to the partition
 and that information is in /proc/ppc64/lparcfg.

This is only helpful for static entitlements.

But if we allow dynamic entitlements--which is a very useful feature,
think buying an online upgrade in a cloud environment--then you need
to account for entitlement loss at the same place where you do the rest
of the accounting: in /proc/stat.

 Now, that is a *way* more sensible thing to do. Much more. Confusing
 users is something extremely subjective. This is especially true for
 concepts that have been known for quite some time, like steal time. If you
 change the meaning of this all of a sudden, it is sure to confuse a lot
 more users than it would clarify.

I'll bring you a nice bottle of scotch at the next KVM Forum if you can
find me one user that can accurately describe what steal time is.

The semantics are so incredibly subtle that I have a hard time believing
anyone actually understands what it means today.

Regards,

Anthony Liguori





 
 ---
 
 Michael Wolf (5):
   Alter the amount of steal time reported by the guest.
   Expand the steal time msr to also contain the consigned time.
   Add the code to send the consigned time from the host to the guest
   Add a timer to allow the separation of consigned from steal time.
   Add an ioctl to communicate the consign limit to the host.
 
 
  arch/x86/include/asm/kvm_host.h   |   11 +++
  arch/x86/include/asm/kvm_para.h   |3 +-
  arch/x86/include/asm/paravirt.h   |4 +--
  arch/x86/include/asm/paravirt_types.h |2 +
  arch/x86/kernel/kvm.c |8 ++---
  arch/x86/kernel/paravirt.c|4 +--
  arch/x86/kvm/x86.c|   50 
 -
  fs/proc/stat.c|9 +-
  include/linux/kernel_stat.h   |2 +
  include/linux/kvm_host.h  |2 +
  include/uapi/linux/kvm.h  |2 +
  kernel/sched/core.c   |   10 ++-
  kernel/sched/cputime.c|   21 +-
  kernel/sched/sched.h  |2 +
  virt/kvm/kvm_main.c   |7 +
  15 files changed, 120 insertions(+), 17 deletions(-)
 



Re: [PATCH 0/1] [PULL] qemu-kvm.git uq/master queue

2012-11-26 Thread Anthony Liguori
Marcelo Tosatti mtosa...@redhat.com writes:

 The following changes since commit 1ccbc2851282564308f790753d7158487b6af8e2:

   qemu-sockets: Fix parsing of the inet option 'to'. (2012-11-21 12:07:59 
 +0400)

 are available in the git repository at:
   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

 Bruce Rogers (1):
   Legacy qemu-kvm options have no argument

Pulled. Thanks.

Regards,

Anthony Liguori


  qemu-options.hx |8 
  1 files changed, 4 insertions(+), 4 deletions(-)


Re: [PULL 0/3] vfio-pci for 1.3-rc0

2012-11-14 Thread Anthony Liguori
Alex Williamson alex.william...@redhat.com writes:

 Hi Anthony,

 Please pull the tag below.  I posted the linux-headers update
 separately on Oct-15; since it hasn't been applied and should be
 non-controversial, I include it again here.  Thanks,

 Alex


Pulled. Thanks.

Regards,

Anthony Liguori

 The following changes since commit f5022a135e4309a54d433c69b2a056756b2d0d6b:

   aio: fix aio_ctx_prepare with idle bottom halves (2012-11-12 20:02:09 +0400)

 are available in the git repository at:

   git://github.com/awilliam/qemu-vfio.git tags/vfio-pci-for-qemu-1.3.0-rc0

 for you to fetch changes up to a771c51703cf9f91023c6570426258bdf5ec775b:

   vfio-pci: Use common msi_get_message (2012-11-13 12:27:40 -0700)

 
 vfio-pci: KVM INTx accel  common msi_get_message

 
 Alex Williamson (3):
   linux-headers: Update to 3.7-rc5
   vfio-pci: Add KVM INTx acceleration
   vfio-pci: Use common msi_get_message

  hw/vfio_pci.c| 210 
 +++
  linux-headers/asm-powerpc/kvm_para.h |   6 +-
  linux-headers/asm-s390/kvm_para.h|   8 +-
  linux-headers/asm-x86/kvm.h  |  17 +++
  linux-headers/linux/kvm.h|  25 -
  linux-headers/linux/kvm_para.h   |   6 +-
  linux-headers/linux/vfio.h   |   6 +-
  linux-headers/linux/virtio_config.h  |   6 +-
  linux-headers/linux/virtio_ring.h|   6 +-
  9 files changed, 241 insertions(+), 49 deletions(-)


Re: KVM call agenda for 2012-11-12

2012-11-13 Thread Anthony Liguori
Marcelo Tosatti mtosa...@redhat.com writes:

 On Mon, Nov 12, 2012 at 01:58:38PM +0100, Juan Quintela wrote:
 
 Hi
 
 Please send in any agenda topics you are interested in.
 
 Later, Juan.

 It would be good to have a status report on qemu-kvm compatibility
 (the remaining TODO items are with Anthony). They are:

 - qemu-kvm 1.2 machine type.
 - default accelerator being KVM.

 Note migration will remain broken due to 

 https://patchwork.kernel.org/patch/1674521/

 BTW, this can be via email, if preferred (i cannot attend the call).

Let's cancel the call and I'll spend the hour writing up the patches and
sending them out.

Regards,

Anthony Liguori





Re: 1.1.1 -> 1.1.2 migrate/managedsave issue

2012-11-04 Thread Anthony Liguori
Avi Kivity a...@redhat.com writes:

 On 10/22/2012 09:04 AM, Philipp Hahn wrote:
 Hello Doug,
 
 On Saturday 20 October 2012 00:46:43 Doug Goldstein wrote:
 I'm using libvirt 0.10.2 and I had qemu-kvm 1.1.1 running all my VMs.
 ...
 I had upgraded to qemu-kvm 1.1.2
 ... 
 qemu: warning: error while loading state for instance 0x0 of device 'ram'
 load of migration failed
 
  That error can be from many things. For me it was that the PXE ROM images
  for the network cards were updated as well. Their size crossed the next
  power-of-two boundary, so kvm needed to allocate less/more memory and
  changed some PCI configuration registers where the size of the ROM region
  is stored. On loading the saved state those sizes were compared and failed
  to validate. KVM then aborts loading the saved state with that
  not-very-helpful message.
 
 So you might want to check, if your case is similar to mine.
 
 I diagnosed that using gdb to single step kvm until I found 
 hw/pci.c#get_pci_config_device() returning -EINVAL.
 

 Seems reasonable.  Doug, please verify to see if it's the same issue or
 another one.

 Juan, how can we fix this?  It's clear that the option ROM size has to
 be fixed and not change whenever the blob is updated.  This will fix it
 for future releases.  But what to do about the ones in the field?

This is not a problem upstream because we don't alter the ROMs.  If we
did, we would keep the old ROMs around and set the romfile property in
the compatible machine.

This is what distros that are shipping ROMs outside of QEMU ought to
do.  It's a bug to unconditionally change the ROMs (in a guest visible
way) without adding compatibility support.
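
For illustration, such a compat pin looks roughly like this (the ROM file
name here is made up):

static GlobalProperty pc_1_1_compat_props[] = {
    {
        .driver   = "e1000",
        .property = "romfile",
        .value    = "pxe-e1000-1.1.1.rom",  /* the ROM guests saved with */
    },
    { /* end of list */ }
};

The command-line equivalent is -global e1000.romfile=pxe-e1000-1.1.1.rom.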

Regards,

Anthony Liguori


 -- 
 error compiling committee.c: too many arguments to function


Re: [PATCH 00/28] [PULL] qemu-kvm.git uq/master queue

2012-11-01 Thread Anthony Liguori
Marcelo Tosatti mtosa...@redhat.com writes:

 The following changes since commit aee0bf7d8d7564f8f2c40e4501695c492b7dd8d1:

   tap-win32: stubs to fix win32 build (2012-10-30 19:18:53 +)

 are available in the git repository at:
   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

 Don Slutz (1):
   target-i386: Add missing kvm cpuid feature name

 Eduardo Habkost (19):
   i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of for 
 loop
   i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features check
   i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable
   i386: kvm: extract register switch to cpuid_entry_get_reg() function
   i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function
   i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid() 
 function
   i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with single 
 'if'
   i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid()
   i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on 
 kvm_arch_get_supported_cpuid()
   i386: kvm: x2apic is not supported without in-kernel irqchip
   i386: kvm: mask cpuid_kvm_features earlier
   i386: kvm: mask cpuid_ext4_features bits earlier
   i386: kvm: filter CPUID feature words earlier, on cpu.c
   i386: kvm: reformat filter_features_for_kvm() code
   i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too
   i386: cpu: add missing CPUID[EAX=7,ECX=0] flag names
   target-i386: make cpu_x86_fill_host() void
   target-i386: cpu: make -cpu host/check/enforce code KVM-specific
   target-i386: kvm_cpu_fill_host: use GET_SUPPORTED_CPUID

 Jan Kiszka (6):
   Use machine options to emulate -no-kvm-irqchip
   Issue warning when deprecated -no-kvm-pit is used
   Use global properties to emulate -no-kvm-pit-reinjection
   Issue warning when deprecated drive parameter boot=on|off is used
   Issue warning when deprecated -tdf option is used
   Emulate qemu-kvms -no-kvm option

 Marcelo Tosatti (1):
   cirrus_vga: allow configurable vram size

 Peter Maydell (1):
   update-linux-headers.sh: Handle new kernel uapi/ directories


Pulled. Thanks.

Regards,

Anthony Liguori

  blockdev.c  |6 ++
  hw/cirrus_vga.c |   21 --
  kvm.h   |1 +
  qemu-config.c   |4 +
  qemu-options.hx |   16 
  scripts/update-linux-headers.sh |3 +-
  target-i386/cpu.c   |   98 +++---
  target-i386/kvm.c   |  153 
 +++
  vl.c|   33 +
  9 files changed, 242 insertions(+), 93 deletions(-)


Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-16 Thread Anthony Liguori
 of the change rate?
 
 Typically the HPT would have about a million entries, i.e. it would be
 16MiB in size.  The usual guideline is to make it about 1/64 of the
 maximum amount of RAM the guest could ever have, rounded up to a power
 of two, although we often run with less, say 1/128 or even 1/256.

 16MiB is transferred in ~0.15 sec on GbE, much faster with 10GbE.  Does
 it warrant a live migration protocol?

0.15 sec == 150ms.  The typical downtime window is 30ms.  So yeah, I
think it does.

 Because it is a hash table, updates tend to be scattered throughout
 the whole table, which is another reason why per-page dirty tracking
 and updates would be pretty inefficient.

 This suggests a stream format that includes the index in every entry.
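
A sketch of such a self-describing record (illustrative, not the format
that was eventually merged):

#include <stdint.h>

struct hpt_stream_entry {
    uint32_t index;     /* slot number within the hash table */
    uint64_t hpte[2];   /* the two doublewords of the HPTE itself */
} __attribute__((packed));

At 16 bytes of HPTE per entry, a million-entry table is where the 16MiB
figure above comes from.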

 
 As for the change rate, it depends on the application of course, but
 basically every time the guest changes a PTE in its Linux page tables
 we do the corresponding change to the corresponding HPT entry, so the
 rate can be quite high.  Workloads that do a lot of fork, exit, mmap,
 exec, etc. have a high rate of HPT updates.

 If the rate is high enough, then there's no point in a live update.

Do we have practical data here?

Regards,

Anthony Liguori


 
 Suppose new hardware arrives that supports nesting HPTs, so that kvm is
 no longer synchronously aware of the guest HPT (similar to how NPT/EPT
 made kvm unaware of guest virtual-physical translations on x86).  How
 will we deal with that?  But I guess this will be a
 non-guest-transparent and non-userspace-transparent change, unlike
 NPT/EPT, so a userspace ABI addition will be needed anyway).
 
 Nested HPTs or other changes to the MMU architecture would certainly
 need new guest kernels and new support in KVM.  With a nested
 approach, the guest-side MMU data structures (HPT or whatever) would
 presumably be in guest memory and thus be handled along with all the
 other guest memory, while the host-side MMU data structures would not
 need to be saved, so from the migration point of view that would make
 it all a lot simpler.

 Yeah.


 -- 
 error compiling committee.c: too many arguments to function


Re: [PATCH] qemu: Update Linux headers

2012-10-15 Thread Anthony Liguori
Alex Williamson alex.william...@redhat.com writes:

 Based on v3.7-rc1-3-g29bb4cc

Normally this would go through qemu-kvm/uq/master but since this is from
Linus' tree, it's less of a concern.

Nonetheless, I'd prefer we did it from v3.7-rc1 instead of a random git
snapshot.

Regards,

Anthony Liguori


 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---

  Trying to get KVM_IRQFD_FLAG_RESAMPLE and friends for vfio-pci

  linux-headers/asm-x86/kvm.h |   17 +
  linux-headers/linux/kvm.h   |   25 +
  linux-headers/linux/kvm_para.h  |6 +++---
  linux-headers/linux/vfio.h  |6 +++---
  linux-headers/linux/virtio_config.h |6 +++---
  linux-headers/linux/virtio_ring.h   |6 +++---
  6 files changed, 50 insertions(+), 16 deletions(-)

 diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
 index 246617e..a65ec29 100644
 --- a/linux-headers/asm-x86/kvm.h
 +++ b/linux-headers/asm-x86/kvm.h
 @@ -9,6 +9,22 @@
   #include <linux/types.h>
   #include <linux/ioctl.h>
  
 +#define DE_VECTOR 0
 +#define DB_VECTOR 1
 +#define BP_VECTOR 3
 +#define OF_VECTOR 4
 +#define BR_VECTOR 5
 +#define UD_VECTOR 6
 +#define NM_VECTOR 7
 +#define DF_VECTOR 8
 +#define TS_VECTOR 10
 +#define NP_VECTOR 11
 +#define SS_VECTOR 12
 +#define GP_VECTOR 13
 +#define PF_VECTOR 14
 +#define MF_VECTOR 16
 +#define MC_VECTOR 18
 +
  /* Select x86 specific features in linux/kvm.h */
  #define __KVM_HAVE_PIT
  #define __KVM_HAVE_IOAPIC
 @@ -25,6 +41,7 @@
  #define __KVM_HAVE_DEBUGREGS
  #define __KVM_HAVE_XSAVE
  #define __KVM_HAVE_XCRS
 +#define __KVM_HAVE_READONLY_MEM
  
  /* Architectural interrupt line count. */
  #define KVM_NR_INTERRUPTS 256
 diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
 index 4b9e575..81d2feb 100644
 --- a/linux-headers/linux/kvm.h
 +++ b/linux-headers/linux/kvm.h
 @@ -101,9 +101,13 @@ struct kvm_userspace_memory_region {
   __u64 userspace_addr; /* start of the userspace allocated memory */
  };
  
 -/* for kvm_memory_region::flags */
 -#define KVM_MEM_LOG_DIRTY_PAGES  1UL
 -#define KVM_MEMSLOT_INVALID  (1UL << 1)
 +/*
 + * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
 + * other bits are reserved for kvm internal use which are defined in
 + * include/linux/kvm_host.h.
 + */
 +#define KVM_MEM_LOG_DIRTY_PAGES  (1UL << 0)
 +#define KVM_MEM_READONLY (1UL << 1)
  
  /* for KVM_IRQ_LINE */
  struct kvm_irq_level {
 @@ -618,6 +622,10 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_PPC_GET_SMMU_INFO 78
  #define KVM_CAP_S390_COW 79
  #define KVM_CAP_PPC_ALLOC_HTAB 80
 +#ifdef __KVM_HAVE_READONLY_MEM
 +#define KVM_CAP_READONLY_MEM 81
 +#endif
 +#define KVM_CAP_IRQFD_RESAMPLE 82
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -683,12 +691,21 @@ struct kvm_xen_hvm_config {
  #endif
  
  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
 +/*
 + * Available with KVM_CAP_IRQFD_RESAMPLE
 + *
 + * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
 + * the irqfd to operate in resampling mode for level triggered interrupt
 + * emulation.  See Documentation/virtual/kvm/api.txt.
 + */
 +#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
  
  struct kvm_irqfd {
   __u32 fd;
   __u32 gsi;
   __u32 flags;
 - __u8  pad[20];
 + __u32 resamplefd;
 + __u8  pad[16];
  };
  
  struct kvm_clock_data {
 diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
 index 7bdcf93..cea2c5c 100644
 --- a/linux-headers/linux/kvm_para.h
 +++ b/linux-headers/linux/kvm_para.h
 @@ -1,5 +1,5 @@
 -#ifndef __LINUX_KVM_PARA_H
 -#define __LINUX_KVM_PARA_H
 +#ifndef _UAPI__LINUX_KVM_PARA_H
 +#define _UAPI__LINUX_KVM_PARA_H
  
  /*
   * This header file provides a method for making a hypercall to the host
 @@ -25,4 +25,4 @@
   */
  #include <asm/kvm_para.h>
  
 -#endif /* __LINUX_KVM_PARA_H */
 +#endif /* _UAPI__LINUX_KVM_PARA_H */
 diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
 index f787b72..4758d1b 100644
 --- a/linux-headers/linux/vfio.h
 +++ b/linux-headers/linux/vfio.h
 @@ -8,8 +8,8 @@
   * it under the terms of the GNU General Public License version 2 as
   * published by the Free Software Foundation.
   */
 -#ifndef VFIO_H
 -#define VFIO_H
 +#ifndef _UAPIVFIO_H
 +#define _UAPIVFIO_H
  
  #include <linux/types.h>
  #include <linux/ioctl.h>
 @@ -365,4 +365,4 @@ struct vfio_iommu_type1_dma_unmap {
  
  #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
  
 -#endif /* VFIO_H */
 +#endif /* _UAPIVFIO_H */
 diff --git a/linux-headers/linux/virtio_config.h 
 b/linux-headers/linux/virtio_config.h
 index 4f51d8f..b7cda39 100644
 --- a/linux-headers/linux/virtio_config.h
 +++ b/linux-headers/linux/virtio_config.h
 @@ -1,5 +1,5 @@
 -#ifndef _LINUX_VIRTIO_CONFIG_H
 -#define _LINUX_VIRTIO_CONFIG_H
 +#ifndef _UAPI_LINUX_VIRTIO_CONFIG_H
 +#define _UAPI_LINUX_VIRTIO_CONFIG_H
  /* This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
   * anyone can

Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 Gerd Hoffmann kra...@redhat.com writes:
 So how about this:

 (1) Add a vendor specific pci capability for new-style virtio.
 Specifies the pci bar used for new-style virtio registers.
 Guests can use it to figure whenever new-style virtio is
 supported and to map the correct bar (which will probably
 be bar 1 in most cases).

 This was closer to the original proposal[1], which I really liked (you
 can layout bars however you want).  Anthony thought that vendor
 capabilities were a PCI-e feature, but it seems they're blessed in PCI
 2.3.

2.3 was standardized in 2002.  Are we confident that vendor extensions
play nice with pre-2.3 OSes like Win2k, WinXP, etc?

I still think it's a bad idea to rely on something so new in something
as fundamental as virtio-pci unless we have to.

Regards,

Anthony Liguori


 So let's return to that proposal, giving something like this:

 /* IDs for different capabilities.  Must all exist. */
 /* FIXME: Do we win from separating ISR, NOTIFY and COMMON? */
 /* Common configuration */
 #define VIRTIO_PCI_CAP_COMMON_CFG 1
 /* Notifications */
 #define VIRTIO_PCI_CAP_NOTIFY_CFG 2
 /* ISR access */
 #define VIRTIO_PCI_CAP_ISR_CFG 3
 /* Device specific configuration */
 #define VIRTIO_PCI_CAP_DEVICE_CFG 4

 /* This is the PCI capability header: */
 struct virtio_pci_cap {
   u8 cap_vndr;    /* Generic PCI field: PCI_CAP_ID_VNDR */
   u8 cap_next;    /* Generic PCI field: next ptr. */
   u8 cap_len;     /* Generic PCI field: sizeof(struct virtio_pci_cap). */
   u8 cfg_type;    /* One of the VIRTIO_PCI_CAP_*_CFG. */
   u8 bar; /* Where to find it. */
   u8 unused;
   __le16 offset;  /* Offset within bar. */
   __le32 length;  /* Length. */
 };

 This means qemu can point the isr_cfg into the legacy area if it wants.
 In fact, it can put everything in BAR0 if it wants.
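
For illustration, locating one of these capabilities is an ordinary
walk of the standard capability list; cfg_read8() below is a stand-in
for whatever config-space accessor is available, and the byte offsets
follow the struct above:

  #include <stdint.h>

  #define PCI_CAP_ID_VNDR 0x09   /* vendor-specific capability ID */

  extern uint8_t cfg_read8(uint8_t offset);   /* assumed accessor */

  /* Return the config-space offset of the vendor capability whose
   * cfg_type matches, or -1 if the device doesn't advertise one. */
  static int find_virtio_cap(uint8_t wanted_cfg_type)
  {
      uint8_t pos = cfg_read8(0x34);               /* capabilities pointer */

      while (pos != 0) {
          if (cfg_read8(pos) == PCI_CAP_ID_VNDR &&     /* cap_vndr */
              cfg_read8(pos + 3) == wanted_cfg_type)   /* cfg_type */
              return pos;
          pos = cfg_read8(pos + 1);                    /* cap_next */
      }
      return -1;
  }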

 Thoughts?
 Rusty.
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-09 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 We'll never remove legacy so we shouldn't plan on it.  There are
 literally hundreds of thousands of VMs out there with the current virtio
 drivers installed in them.  We'll be supporting them for a very, very
 long time :-)

 You will be supporting this for qemu on x86, sure.

And PPC.

 As I think we're
 still in the growth phase for virtio, I prioritize future spec
 cleanliness pretty high.

 But I think you'll be surprised how fast this is deprecated:
 1) Bigger queues for block devices (guest-specified ringsize)
 2) Smaller rings for openbios (guest-specified alignment)
 3) All-mmio mode (powerpc)
  4) Whatever network features get numbers > 31.

We can do all of these things with incremental change to the existing
layout.  That's the only way what I'm suggesting is different.

You want to reorder all of the fields and have a driver flag day.  But I
strongly suspect we'll decide we need to do the same exercise again in 4
years when we now need to figure out how to take advantage of
transactional memory or some other whiz-bang hardware feature.

There are a finite number of BARs but each BAR has an almost infinite
size.  So extending BARs instead of introducing new ones seems like the
conservative approach moving forward.

 I don't think we gain a lot by moving the ISR into a separate BAR.
 Splitting up registers like that seems weird to me too.

 Confused.  I proposed the same split as you have, just ISR by itself.

I disagree with moving the ISR into a separate BAR.  That's what seems
weird to me.

 It's very normal to have a mirrored set of registers that are PIO in one
 bar and MMIO in a different BAR.

 If we added an additional constraints that BAR1 was mirrored except for
 the config space and the MSI section was always there, I think the end
 result would be nice.  IOW:

 But it won't be the same, because we want all that extra stuff, like
 more feature bits and queue size alignment.  (Admittedly queues past
 16TB aren't a killer feature).

 To make it concrete:

 Current:
 struct {
 __le32 host_features;   /* read-only */
 __le32 guest_features;  /* read/write */
 __le32 queue_pfn;   /* read/write */
 __le16 queue_size;  /* read-only */
 __le16 queue_sel;   /* read/write */
 __le16 queue_notify;/* read/write */
 u8 status;  /* read/write */
 u8 isr; /* read-only, clear on read */
 /* Optional */
 __le16 msi_config_vector;   /* read/write */
 __le16 msi_queue_vector;/* read/write */
 /* ... device features */
 };

 Proposed:
 struct virtio_pci_cfg {
   /* About the whole device. */
   __le32 device_feature_select;   /* read-write */
   __le32 device_feature;  /* read-only */
   __le32 guest_feature_select;/* read-write */
   __le32 guest_feature;   /* read-only */
   __le16 msix_config; /* read-write */
   __u8 device_status; /* read-write */
   __u8 unused;

   /* About a specific virtqueue. */
   __le16 queue_select;/* read-write */
   __le16 queue_align; /* read-write, power of 2. */
   __le16 queue_size;  /* read-write, power of 2. */
  __le16 queue_msix_vector;  /* read-write */
   __le64 queue_address;   /* read-write: 0x == DNE. */
 };

 struct virtio_pci_isr {
 __u8 isr; /* read-only, clear on read */
 };

What I'm suggesting is:

 struct {
 __le32 host_features;   /* read-only */
 __le32 guest_features;  /* read/write */
 __le32 queue_pfn;   /* read/write */
 __le16 queue_size;  /* read-only */
 __le16 queue_sel;   /* read/write */
 __le16 queue_notify;/* read/write */
 u8 status;  /* read/write */
 u8 isr; /* read-only, clear on read */
 __le16 msi_config_vector;   /* read/write */
 __le16 msi_queue_vector;/* read/write */
 __le32 host_feature_select; /* read/write */
 __le32 guest_feature_select;/* read/write */
 __le32 queue_pfn_hi;/* read/write */
 };


With the additional semantic that the virtio-config space is overlaid
on top of the register set in BAR0 unless the
VIRTIO_PCI_F_SEPARATE_CONFIG feature is acknowledged.  This feature
acts as a latch and, when set, removes the config space overlay.

If the config space overlays the registers, the offset in BAR0 of the
overlay depends on whether MSI is enabled or not in the PCI device.

BAR1 is an MMIO mirror of BAR0 except that the config space is never
overlaid in BAR1 regardless of VIRTIO_PCI_F_SEPARATE_CONFIG.

BAR2 contains the config space.

A guest can look at BAR1 and BAR2 to determine whether they exist.
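
Spelling out the driver-visible consequence (the feature name is from
this thread; the bit number used here is invented):

  #include <stdint.h>

  #define VIRTIO_PCI_F_SEPARATE_CONFIG  (1u << 30)   /* hypothetical bit */

  /* Which BAR the driver should read device config from. */
  static int config_bar(uint32_t acked_features)
  {
      if (acked_features & VIRTIO_PCI_F_SEPARATE_CONFIG)
          return 2;   /* latch acked: overlay removed, config is in BAR2 */
      return 0;       /* legacy: config overlays the BAR0 register set */
  }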

 We could also enforce LE in the per-device config space

Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-09 Thread Anthony Liguori
Avi Kivity a...@redhat.com writes:

 On 10/09/2012 05:16 AM, Rusty Russell wrote:
 Anthony Liguori aligu...@us.ibm.com writes:
 We'll never remove legacy so we shouldn't plan on it.  There are
 literally hundreds of thousands of VMs out there with the current virtio
 drivers installed in them.  We'll be supporting them for a very, very
 long time :-)
 
 You will be supporting this for qemu on x86, sure.  As I think we're
 still in the growth phase for virtio, I prioritize future spec
 cleanliness pretty high.

 If a pure ppc hypervisor was on the table, this might have been
 worthwhile.  As it is the codebase is shared, and the Linux drivers are
 shared, so cleaning up the spec doesn't help the code.

Note that distros have been (perhaps unknowingly) shipping virtio-pci
for PPC for some time now.

So even though there wasn't a hypervisor that supported virtio-pci, the
guests already support it and are out there in the wild.

There's a lot of value in maintaining legacy support even for PPC.
 
 But I think you'll be surprised how fast this is deprecated:
 1) Bigger queues for block devices (guest-specified ringsize)
 2) Smaller rings for openbios (guest-specified alignment)
 3) All-mmio mode (powerpc)
  4) Whatever network features get numbers > 31.
 
 I don't think we gain a lot by moving the ISR into a separate BAR.
 Splitting up registers like that seems weird to me too.
 
 Confused.  I proposed the same split as you have, just ISR by itself.

 I believe Anthony objects to having the ISR by itself.  What is the
 motivation for that?

Right, BARs are a precious resource not to be spent lightly.  Having an
entire BAR dedicated to a 1-byte register seems like a waste to me.

Regards,

Anthony Liguori



 -- 
 error compiling committee.c: too many arguments to function
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-09 Thread Anthony Liguori
Gerd Hoffmann kra...@redhat.com writes:

   Hi,

 Well, we also want to clean up the registers, so how about:

 BAR0: legacy, as is.  If you access this, don't use the others.

 Ok.

 BAR1: new format virtio-pci layout.  If you use this, don't use BAR0.
 BAR2: virtio-cfg.  If you use this, don't use BAR0.

 Why use two bars for this?  You can put them into one mmio bar, together
 with the msi-x vector table and PBA.  Of course a pci capability
 describing the location is helpful for that ;)

You don't need a capability.  You can also just add a config offset
field to the register set and then make the semantics that it occurs in
the same region.


 BAR3: ISR. If you use this, don't use BAR0.

 Again, I wouldn't hardcode that but use a capability.

 I prefer the cases exclusive (ie. use one or the other) as a clear path
 to remove the legacy layout; and leaving the ISR in BAR0 leaves us with
 an ugly corner case in future (ISR is BAR0 + 19?  WTF?).

 Ok, so we have four register sets:

   (1) legacy layout
   (2) new virtio-pci
   (3) new virtio-config
   (4) new virtio-isr

 We can have a vendor pci capability, with a dword for each register set:

   bit  31    -- present bit
   bits 26-24 -- bar
   bits 23-0  -- offset

 So current drivers which must support legacy can use this:

   legacy layout     -- present, bar 0, offset 0
   new virtio-pci    -- present, bar 1, offset 0
   new virtio-config -- present, bar 1, offset 256
   new virtio-isr    -- present, bar 0, offset 19

 [ For completeness: msi-x capability could add this: ]

   msi-x vector table    bar 1, offset 512
   msi-x pba             bar 1, offset 768
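
Encoding and decoding such a dword is trivial; a sketch with invented
macro names, following the bit layout quoted above:

  #include <stdint.h>

  #define REGSET_PRESENT    (1u << 31)    /* bit 31: present */
  #define REGSET_BAR_SHIFT  24            /* bits 26-24: bar */
  #define REGSET_BAR_MASK   0x7u
  #define REGSET_OFF_MASK   0x00ffffffu   /* bits 23-0: offset */

  static inline uint32_t regset_encode(unsigned bar, uint32_t off)
  {
      return REGSET_PRESENT |
             ((bar & REGSET_BAR_MASK) << REGSET_BAR_SHIFT) |
             (off & REGSET_OFF_MASK);
  }

  static inline unsigned regset_bar(uint32_t dw)
  {
      return (dw >> REGSET_BAR_SHIFT) & REGSET_BAR_MASK;
  }

  static inline uint32_t regset_offset(uint32_t dw)
  {
      return dw & REGSET_OFF_MASK;
  }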

 We'll never remove legacy so we shouldn't plan on it.  There are
 literally hundreds of thousands of VMs out there with the current virtio
 drivers installed in them.  We'll be supporting them for a very, very
 long time :-)

 But new devices (virtio-qxl being a candidate) don't have old guests and
 don't need to worry.

 They could use this if they care about fast isr:

   legacy layout     -- not present
   new virtio-pci    -- present, bar 1, offset 0
   new virtio-config -- present, bar 1, offset 256
   new virtio-isr    -- present, bar 0, offset 0

 Or this if they don't worry about isr performance:

   legacy layout     -- not present
   new virtio-pci    -- present, bar 0, offset 0
   new virtio-config -- present, bar 0, offset 256
   new virtio-isr    -- not present

 I don't think we gain a lot by moving the ISR into a separate BAR.
 Splitting up registers like that seems weird to me too.

 Main advantage of defining a register set with just isr is that it
 reduces pio address space consumtion for new virtio devices which don't
 have to worry about the legacy layout (8 bytes which is minimum size for
 io bars instead of 64 bytes).

Doing some rough math, we should have at least 16k of PIO space.  That
lets us have well over 500 virtio-pci devices with the current register
layout.

I don't think we're at risk of running out of space...

 If we added an additional constraints that BAR1 was mirrored except for

 Why add constraints?  We want something future-proof, don't we?

 The detection is simple: if BAR1 has non-zero length, it's new-style,
 otherwise legacy.

 Doesn't fly.  BAR1 is in use today for MSI-X support.

But the location is specified via capabilities so we can change the
location to be within BAR1 at a non-conflicting offset.

 I agree that this is the best way to extend, but I think we should still
 use a transport feature bit.  We want to be able to detect within QEMU
 whether a guest is using these new features because we need to adjust
 migration state accordingly.

 Why does migration need adjustments?

Because there is additional state in the new layout.  We need to
understand whether a guest is relying on that state or not.

For instance, extended virtio features.  If a guest is in the process
of reading extended virtio features, it may not have changed any state
but we must ensure that we don't migrate to an older version of QEMU w/o
the extended virtio features.

This cannot be handled by subsections today because there is no guest
written state that's affected.

Regards,

Anthony Liguori


 [ Not that I want veto a feature bit, but I don't see the need yet ]

 cheers,
   Gerd
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 (Topic updated, cc's trimmed).

 Anthony Liguori aligu...@us.ibm.com writes:
 Rusty Russell ru...@rustcorp.com.au writes:
 4) The only significant change to the spec is that we use PCI
capabilities, so we can have infinite feature bits.
(see 
 http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)

 We discussed this on IRC last night.  I don't think PCI capabilites are
 a good mechanism to use...

 PCI capabilities are there to organize how the PCI config space is
 allocated to allow vendor extensions to co-exist with future PCI
 extensions.

 But we've never used the PCI config space within virtio-pci.  We do
 everything in BAR0.  I don't think there's any real advantage of using
 the config space vs. a BAR for virtio-pci.

 Note before anyone gets confused; we were talking about using the PCI
 config space to indicate what BAR(s) the virtio stuff is in.  An
 alternative would be to simply specify a new layout format in BAR1.

 The arguments for a more flexible format that I know of:

 1) virtio-pci has already extended the pci-specific part of the
configuration once (for MSI-X), so I don't want to assume it won't
happen again.

"configuration" is the wrong word here.

The virtio-pci BAR0 layout is:

   0..19   virtio-pci registers
   20+     virtio configuration space

MSI-X needed to add additional virtio-pci registers, so now we have:

   0..19   virtio-pci registers

if MSI-X:
   20..23  virtio-pci MSI-X registers
   24+     virtio configuration space
else:
   20+     virtio configuration space

I agree, this stinks.
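
Concretely, the device config offset under the current layout is
computed like this, which matches, as far as I can tell, the
VIRTIO_PCI_CONFIG() macro in the Linux driver:

  /* 20 bytes of virtio-pci registers, plus 4 bytes of MSI-X vector
   * registers when MSI-X is enabled. */
  static unsigned virtio_config_offset(int msix_enabled)
  {
      return msix_enabled ? 24 : 20;
  }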

But I think we could solve this in a different way.  I think we could
just move the virtio configuration space to BAR1 by using a transport
feature bit.

That then frees up the entire BAR0 for use as virtio-pci registers.  We
can then always include the virtio-pci MSI-X register space and
introduce all new virtio-pci registers as simply being appended.

This new feature bit then becomes essentially a virtio configuration
latch.  When unacked, the virtio configuration space hides the new
registers; when acked, those new registers are exposed.

Another option is to simply put new registers after the virtio
configuration blob.

 2) ISTR an argument about mapping the ISR register separately, for
performance, but I can't find a reference to it.

I think the rationale is that ISR really needs to be PIO but everything
else doesn't.  PIO is much faster on x86 because it doesn't require
walking page tables or instruction emulation to handle the exit.

The argument to move the remaining registers to MMIO is to allow 64-bit
accesses to registers which isn't possible with PIO.
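
As a rough illustration of the cost difference (assuming a legacy
BAR0 in PIO space, where the clear-on-read ISR sits at offset 19):

  #include <sys/io.h>   /* inb(); x86 only, requires ioperm()/iopl() */

  #define VIRTIO_PCI_ISR 19   /* ISR offset in the legacy BAR0 layout */

  static unsigned char read_isr(unsigned short bar0_base)
  {
      /* A single inb() exit is cheap for KVM to decode; the equivalent
       * MMIO read needs a page-table walk plus instruction emulation. */
      return inb(bar0_base + VIRTIO_PCI_ISR);
  }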

 This maps really nicely to non-PCI transports too.

 This isn't right.  No one else can use the PCI layout.  While parts are
 common, other parts are pci-specific (MSI-X and ISR for example), and
 yet other parts are specified by PCI elsewhere (eg interrupt numbers).

 But extending the
 PCI config space (especially dealing with capability allocation) is
 pretty gnarly and there isn't an obvious equivalent outside of PCI.

 That's OK, because general changes should be done with feature bits, and
 the others all have an infinite number.  Being the first, virtio-pci has
 some unique limitations we'd like to fix.

 There are very devices that we emulate today that make use of extended
 PCI device registers outside the platform devices (that have no BARs).

 This sentence confused me?

There is a missing "few"; it should read "There are very few devices..."

Extending the PCI configuration space is unusual for PCI devices.  That
was the point.

Regards,

Anthony Liguori


 Thanks,
 Rusty.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Gerd Hoffmann kra...@redhat.com writes:

   Hi,

 But I think we could solve this in a different way.  I think we could
 just move the virtio configuration space to BAR1 by using a transport
 feature bit.

 Why hard-code stuff?

 I think it makes a lot of sense to have a capability similar to msi-x
 which simply specifies bar and offset of the register sets:

 [root@fedora ~]# lspci -vvs4
 00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
 [ ... ]
   Region 0: I/O ports at c000 [size=64]
   Region 1: Memory at fc029000 (32-bit) [size=4K]
   Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
   Vector table: BAR=1 offset=
   PBA: BAR=1 offset=0800

MSI-X capability is a standard PCI capability which is why lspci can
parse it.


 So we could have for virtio something like this:

 Capabilities: [??] virtio-regs:
 legacy: BAR=0 offset=0
 virtio-pci: BAR=1 offset=1000
 virtio-cfg: BAR=1 offset=1800

This would be a vendor specific PCI capability so lspci wouldn't
automatically know how to parse it.

You could just as well teach lspci to parse BAR0 to figure out what
features are supported.

 That then frees up the entire BAR0 for use as virtio-pci registers.  We
 can then always include the virtio-pci MSI-X register space and
 introduce all new virtio-pci registers as simply being appended.

 BAR0 needs to stay as-is for compatibility reasons.  New devices which
 don't have to care about old guests don't need to provide a 'legacy'
 register region.

A latch feature bit would allow the format to change without impacting
compatibility at all.

 2) ISTR an argument about mapping the ISR register separately, for
performance, but I can't find a reference to it.
 
 I think the rationale is that ISR really needs to be PIO but everything
 else doesn't.  PIO is much faster on x86 because it doesn't require
 walking page tables or instruction emulation to handle the exit.

 Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
 correct?  Which would imply that pretty much only old guests without
 MSI-X support need this, and we don't need to worry that much when
 designing something new ...

It wasn't that long ago that MSI-X wasn't supported..  I think we should
continue to keep ISR as PIO as it is a fast path.

Regards,

Anthony Liguori


 cheers,
   Gerd
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Gerd Hoffmann kra...@redhat.com writes:

   Hi,

 So we could have for virtio something like this:

 Capabilities: [??] virtio-regs:
 legacy: BAR=0 offset=0
 virtio-pci: BAR=1 offset=1000
 virtio-cfg: BAR=1 offset=1800
 
 This would be a vendor specific PCI capability so lspci wouldn't
 automatically know how to parse it.

 Sure, would need a patch to actually parse+print the cap,
 /me was just trying to make my point clear in a simple way.

 2) ISTR an argument about mapping the ISR register separately, for
performance, but I can't find a reference to it.

 I think the rationale is that ISR really needs to be PIO but everything
 else doesn't.  PIO is much faster on x86 because it doesn't require
 walking page tables or instruction emulation to handle the exit.

 Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
 correct?  Which would imply that pretty much only old guests without
 MSI-X support need this, and we don't need to worry that much when
 designing something new ...
 
 It wasn't that long ago that MSI-X wasn't supported..  I think we should
 continue to keep ISR as PIO as it is a fast path.

 No problem if we allow to have both legacy layout and new layout at the
 same time.  Guests can continue to use ISR @ BAR0 in PIO space for
 existing virtio devices, even in case they want use mmio for other
 registers - all fine.

 New virtio devices can support MSI-X from day one and decide to not
 expose a legacy layout PIO bar.

I think having BAR1 be an MMIO mirror of the registers + a BAR2 for
virtio configuration space is probably not that bad of a solution.

Regards,

Anthony Liguori


 cheers,
   Gerd

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

