Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Avi Kivity

On 11/03/2009 09:50 AM, Alexander Graf wrote:


Ok, imagine this was not this unloved S390 odd architecture but X86. 
The only output choices you have are:


1) virtio-console
2) VNC / SSH over network
3) virtio-fb

Now you want to configure a server, probably using yast and all those 
nice graphical utilities, but still enable a firewall so people 
outside don't intrude your machine. Well, you managed to configure the 
firewall by luck to allow VNC, but now you reconfigured it and 
something broke - but VNC was your only chance to access the machine. 
Oops...


x86 has real framebuffers, so software and people expect it.  s390 
doesn't.  How do people manage now?



You also want to see boot messages, have a console login screen,


virtio-console does that, except for the penguins.  Better, since you 
can scroll back.


It doesn't do graphics. Ever used yast in text mode?


Once you're in, start ssh+X or vnc.  Again, what do people do now?



The hardware model isn't exactly new either. It's just the next 
logical step to a full PV machine using virtio. If the virtio-fb 
stuff turns out to be really fast and reliable, I could even imagine 
it being the default target for kvm on ppc as well, as we can't 
switch resolutions on the fly there atm.




We could with vmware-vga.


The vmware-port stuff is pretty much tied onto X86. I don't think 
modifying EAX is that easy on PPC ;-).


Yes, though we can probably make it work on ppc with minimal modifications.

Why?  the guest will typically have networking when it's set up, so 
it should have network access during install.  You can easily use 
slirp redirection and the built-in dhcp server to set this up with 
relatively few hassles.


That's how I use it right now. It's no fun.



The toolstack should hide the unfun parts.


You can't hide guest configuration. We as a distribution control the 
kernel. We don't control the user's configuration as that's by design 
the user's choice. The only thing we can do is give users meaningful 
choices to choose from - and having graphics available is definitely 
one of them.


Well, if the user chooses not to have networking then vnc or ssh+x 
definitely fail.  That would be a strange choice for a server machine.


Seriously, try to ask someone internally to get access to an S390. I 
think you'll understand my motivations a lot better after having used 
it for a bit.


I actually have a s390 vm (RHEL 4 IIRC).  It acts just like any other 
remote machine over ssh except that it's especially slow (probably the 
host is overloaded).  Of course I wouldn't dream of trying to install 
something like that though.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Alexander Graf


On 03.11.2009, at 09:20, Avi Kivity wrote:


On 11/03/2009 09:50 AM, Alexander Graf wrote:


Ok, imagine this was not this unloved S390 odd architecture but  
X86. The only output choices you have are:


1) virtio-console
2) VNC / SSH over network
3) virtio-fb

Now you want to configure a server, probably using yast and all  
those nice graphical utilities, but still enable a firewall so  
people outside don't intrude your machine. Well, you managed to  
configure the firewall by luck to allow VNC, but now you  
reconfigured it and something broke - but VNC was your only chance  
to access the machine. Oops...


x86 has real framebuffers, so software and people expect it.  s390  
doesn't.  How do people manage now?


They cope with what's there. Fortunately we're in a position to change  
things, so we don't have to stick with the worse.



You also want to see boot messages, have a console login screen,


virtio-console does that, except for the penguins.  Better, since  
you can scroll back.


It doesn't do graphics. Ever used yast in text mode?


Once you're in, start ssh+X or vnc.  Again, what do people do now?


Exactly that. Again, it works but is not ideal. If we can improve user  
experience why work against it?


The hardware model isn't exactly new either. It's just the next  
logical step to a full PV machine using virtio. If the virtio-fb  
stuff turns out to be really fast and reliable, I could even  
imagine it being the default target for kvm on ppc as well, as we  
can't switch resolutions on the fly there atm.




We could with vmware-vga.


The vmware-port stuff is pretty much tied onto X86. I don't think  
modifying EAX is that easy on PPC ;-).


Yes, though we can probably make it work on ppc with minimal  
modifications.


Is it worth it? We can also just implement a virtio mouse event dev  
plus fb and be good. That way we control the whole stack without  
risking to break vmware.


Why?  the guest will typically have networking when it's set up,  
so it should have network access during install.  You can easily  
use slirp redirection and the built-in dhcp server to set this  
up with relatively few hassles.


That's how I use it right now. It's no fun.



The toolstack should hide the unfun parts.


You can't hide guest configuration. We as a distribution control  
the kernel. We don't control the user's configuration as that's by  
design the user's choice. The only thing we can do is give users  
meaningful choices to choose from - and having graphics available  
is definitely one of them.


Well, if the user chooses not to have networking then vnc or ssh+x  
definitely fail.  That would be a strange choice for a server machine.


It's actually rather common on S390, though admittedly not that much  
on Linux+S390. There are more ways for inter node communication than  
networking. You can talk to another VM on the same machine without any  
network whatsoever. That way you can set up an isolated job (your bank  
transfer database for example) that is always protected by a proxy to  
the outside world.


Seriously, try to ask someone internally to get access to an S390.  
I think you'll understand my motivations a lot better after having  
used it for a bit.


I actually have a s390 vm (RHEL 4 IIRC).  It acts just like any  
other remote machine over ssh except that it's especially slow  
(probably the host is overloaded).  Of course I wouldn't dream of  
trying to install something like that though.


Exactly. In fact, I'm even scared to reboot mine because I might end  
up in a 3270 terminal. The whole text only crap keeps people from  
using this platform! And that's what I want to change here.


Alex


Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-03 Thread Segher Boessenkool

Nice patchset.  Some comments on the emulation part:


+#define OP_31_XOP_EIOIO	854


You mean EIEIO.


+	case 19:
+		switch (get_xop(inst)) {
+		case OP_19_XOP_RFID:
+		case OP_19_XOP_RFI:
+			vcpu->arch.pc = vcpu->arch.srr0;
+			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
+			*advance = 0;
+			break;


I think you should only emulate the insns that exist on whatever the guest
pretends to be.  RFID exists only on 64-bit implementations.  Same comment
everywhere else.


+		case OP_31_XOP_EIOIO:
+			break;


Have you always executed an eieio or sync when you get here, or
do you just not allow direct access to I/O devices?  Other context
synchronising insns are not enough, they do not broadcast on the
bus.


+		case OP_31_XOP_DCBZ:
+		{
+			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
+			ulong ra = 0;
+			ulong addr;
+			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
+			if (get_ra(inst))
+				ra = vcpu->arch.gpr[get_ra(inst)];
+
+			addr = (ra + rb) & ~31ULL;
+			if (!(vcpu->arch.msr & MSR_SF))
+				addr &= 0xffffffff;
+
+			if (kvmppc_st(vcpu, addr, 32, zeros)) {


DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
that always clears a full cache line (128 bytes).


+	switch (sprn) {
+	case SPRN_IBAT0U ... SPRN_IBAT3L:
+		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
+		break;
+	case SPRN_IBAT4U ... SPRN_IBAT7L:
+		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
+		break;
+	case SPRN_DBAT0U ... SPRN_DBAT3L:
+		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
+		break;
+	case SPRN_DBAT4U ... SPRN_DBAT7L:
+		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
+		break;


Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU-specific
SPRs, after all.  Some CPUs have only six, some only four, some none, btw.



+	case SPRN_HID0:
+		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID1:
+		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID2:
+		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID4:
+		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID5:
+		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];


HIDs are different per CPU; and worse, different CPUs have different
registers (SPR #s) for the same register name!


+		/* guest HID5 set can change is_dcbz32 */
+		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+		    (mfmsr() & MSR_HV))
+			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+		break;


Wait, does this mean you allow other HID writes when MSR[HV] isn't
set?  All HIDs (and many other SPRs) cannot be read or written in
supervisor mode.


Segher



Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Avi Kivity

On 11/03/2009 10:26 AM, Alexander Graf wrote:
Exactly. In fact, I'm even scared to reboot mine because I might end 
up in a 3270 terminal. The whole text only crap keeps people from 
using this platform! And that's what I want to change here.


Ok.  I oppose paravirtualization for its own sake and only support it if 
there's no other way to get performance.  In this case it buys us basic 
functionality which is surprisingly missing on native, that's arguably 
even more important.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/9] S390x KVM support

2009-11-03 Thread Avi Kivity

On 11/02/2009 10:23 PM, Alexander Graf wrote:

Any progress on the patch? This is really important to make KVM work
properly on S390. I'd even go as far as suggesting it for linux-stable.

   


I forgot all about it, sorry.  Marcelo, can you commit it?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-03 Thread Alexander Graf


On 03.11.2009, at 09:47, Segher Boessenkool wrote:


Nice patchset.  Some comments on the emulation part:


Cool, thanks for looking through them!


+#define OP_31_XOP_EIOIO	854


You mean EIEIO.


Probably, yeah.


+	case 19:
+		switch (get_xop(inst)) {
+		case OP_19_XOP_RFID:
+		case OP_19_XOP_RFI:
+			vcpu->arch.pc = vcpu->arch.srr0;
+			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
+			*advance = 0;
+			break;


I think you should only emulate the insns that exist on whatever the guest
pretends to be.  RFID exists only on 64-bit implementations.  Same comment
everywhere else.


True.




+		case OP_31_XOP_EIOIO:
+			break;


Have you always executed an eieio or sync when you get here, or
do you just not allow direct access to I/O devices?  Other context
synchronising insns are not enough, they do not broadcast on the
bus.


There is no device passthrough yet :-). It's theoretically possible,  
but nothing for it is implemented so far.





+		case OP_31_XOP_DCBZ:
+		{
+			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
+			ulong ra = 0;
+			ulong addr;
+			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
+			if (get_ra(inst))
+				ra = vcpu->arch.gpr[get_ra(inst)];
+
+			addr = (ra + rb) & ~31ULL;
+			if (!(vcpu->arch.msr & MSR_SF))
+				addr &= 0xffffffff;
+
+			if (kvmppc_st(vcpu, addr, 32, zeros)) {


DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
that always clears a full cache line (128 bytes).


Yes. We only come here when we patched the dcbz opcodes to invalid  
instructions because cache line size of target == 32.

On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.

Admittedly though, this could be a lot more clever.
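
As a rough illustration of the point about block sizes, the address calculation can be written so that the zeroed block size is a parameter instead of a hard-coded 32. This is only a sketch: the helper name dcbz_block_start and its is_64bit argument are made up for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): return the start address of
 * the block a dcbz would zero. line_size must be a power of two, e.g. 32
 * for the 32-byte mode or 128 for a full 970 cache line.
 */
static uint64_t dcbz_block_start(uint64_t ra, uint64_t rb,
                                 uint64_t line_size, int is_64bit)
{
    /* Round the effective address down to the start of its block. */
    uint64_t addr = (ra + rb) & ~(line_size - 1);

    /* With MSR[SF] clear the effective address is only 32 bits wide. */
    if (!is_64bit)
        addr &= 0xffffffffULL;

    return addr;
}
```

With line_size = 32 this matches the masking in the quoted hunk; passing 128 instead would cover the full-line case mentioned for dcbzl.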


+	switch (sprn) {
+	case SPRN_IBAT0U ... SPRN_IBAT3L:
+		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
+		break;
+	case SPRN_IBAT4U ... SPRN_IBAT7L:
+		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
+		break;
+	case SPRN_DBAT0U ... SPRN_DBAT3L:
+		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
+		break;
+	case SPRN_DBAT4U ... SPRN_DBAT7L:
+		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
+		break;


Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU-specific
SPRs, after all.  Some CPUs have only six, some only four, some none, btw.


For now only Linux runs which only uses the first 3(?) IIRC. But yes,  
it's probably worth looking into at one point or the other.
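
For reference, the upper/lower pairing behind the "/ 2" in the quoted hunk can be sketched as a small lookup. The SPR numbers below are assumptions for illustration only (the thread itself notes they are CPU-specific), and the "4 +" offset for the second group is this sketch's own choice so that the two ranges land in distinct slots; the quoted patch does not do that.

```c
#include <assert.h>

/* Assumed SPR numbers, for illustration only; real CPUs may differ. */
enum {
    EX_SPRN_IBAT0U = 528, EX_SPRN_IBAT3L = 535,
    EX_SPRN_IBAT4U = 560, EX_SPRN_IBAT7L = 567,
};

/*
 * Map an instruction-BAT SPR number to a slot 0..7, or -1 if it is not
 * an IBAT register. Each BAT is an upper/lower pair of consecutive SPRs,
 * hence the division by two.
 */
static int ex_ibat_index(int sprn)
{
    if (sprn >= EX_SPRN_IBAT0U && sprn <= EX_SPRN_IBAT3L)
        return (sprn - EX_SPRN_IBAT0U) / 2;
    if (sprn >= EX_SPRN_IBAT4U && sprn <= EX_SPRN_IBAT7L)
        return 4 + (sprn - EX_SPRN_IBAT4U) / 2;
    return -1;
}

/* Both assumed base numbers are even, so even means "upper" half. */
static int ex_bat_is_upper(int sprn)
{
    return (sprn & 1) == 0;
}
```

A per-CPU table of SPR numbers would be one way to address the concern raised above, but that is beyond what this sketch shows.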





+	case SPRN_HID0:
+		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID1:
+		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID2:
+		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID4:
+		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID5:
+		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];


HIDs are different per CPU; and worse, different CPUs have different
registers (SPR #s) for the same register name!


Sigh :-(


+		/* guest HID5 set can change is_dcbz32 */
+		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+		    (mfmsr() & MSR_HV))
+			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+		break;


Wait, does this mean you allow other HID writes when MSR[HV] isn't
set?  All HIDs (and many other SPRs) cannot be read or written in
supervisor mode.


When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
dcbz HID flag. So all we need to do is tell our entry/exit code to set  
this bit.


If we're on 970 on a hypervisor or on a non-970 though we can't use  
the HID5 bit, so we need to binary patch the opcodes.


So in order to emulate real 970 behavior, we need to be able to  
emulate that HID5 bit too! That's what this chunk of code does - it  
basically sets us in dcbz32 mode when allowed on 970 guests.
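
The three cases described above can be summarised as a small decision function. This is a sketch of the reasoning only; the enum and function names are invented for this example and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, used only to restate the cases from the mail. */
enum dcbz32_strategy {
    DCBZ32_NOT_NEEDED,     /* guest does not ask for 32-byte dcbz */
    DCBZ32_USE_HID5_BIT,   /* host 970 with MSR_HV = 0: use the HID5 flag */
    DCBZ32_PATCH_OPCODES,  /* otherwise: binary patch dcbz and emulate */
};

static enum dcbz32_strategy pick_dcbz32_strategy(bool guest_wants_dcbz32,
                                                 bool host_is_970,
                                                 bool host_msr_hv)
{
    if (!guest_wants_dcbz32)
        return DCBZ32_NOT_NEEDED;
    if (host_is_970 && !host_msr_hv)
        return DCBZ32_USE_HID5_BIT;
    /* 970 running as hypervisor, or any non-970 host. */
    return DCBZ32_PATCH_OPCODES;
}
```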


Alex



Re: [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Avi Kivity

On 11/03/2009 01:25 PM, Vincent Hanquez wrote:

not sure if i'm missing the point here, but couldn't it be hypothetically
extended to stuff 3d (or video & more 2d accel ?) commands too ? I can't
imagine the cirrus or stdvga driver be able to do that ever ;)
   


cirrus has pretty good 2d acceleration.  3D is a mega-project though.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Vincent Hanquez
On Tue, Nov 03, 2009 at 07:39:34AM +0100, Alexander Graf wrote:

 On 03.11.2009, at 07:34, Avi Kivity wrote:

 On 11/03/2009 08:27 AM, Alexander Graf wrote:

 How does it work today?

 You boot into a TERM=dumb line based emulation on 3270 (worst thing  
 haunting people's nightmares ever), trying to get out of that mode  
 as quickly as possible and off into SSH / VNC.

 Despite the coolness factor, IMO a few minutes during install time do 
 not justify a new hardware model and a new driver.

 It's more than just coolness factor. There are use cases out there 
 (www.susestudio.com) that don't want to rely on the guest exporting a VNC 
 server to the outside just to access graphics. You also want to see boot 
 messages, have a console login screen, be able to debug things without 
 switching between virtio-console and vnc, etc. etc.

 The hardware model isn't exactly new either. It's just the next logical 
 step to a full PV machine using virtio. If the virtio-fb stuff turns out 
 to be really fast and reliable, I could even imagine it being the default 
 target for kvm on ppc as well, as we can't switch resolutions on the fly 
 there atm.

not sure if i'm missing the point here, but couldn't it be hypothetically
extended to stuff 3d (or video & more 2d accel ?) commands too ? I can't
imagine the cirrus or stdvga driver be able to do that ever ;)

-- 
Vincent


Re: [PATCHv4 1/6] qemu/virtio: move features to an inline function

2009-11-03 Thread Michael S. Tsirkin
On Mon, Nov 02, 2009 at 04:33:53PM -0600, Anthony Liguori wrote:
 Michael S. Tsirkin wrote:
 devices should have the final say over which virtio features they
 support. E.g. indirect entries may or may not make sense in the context
 of virtio-console. In particular, for vhost, we do not want to report to
 guest bits not supported by kernel backend.  Move the common bits from
 virtio-pci to an inline function and let each device call it.

 No functional changes.
   

 This is a layering violation.  There are transport specific features and  
 device specific features.  The virtio-net device should have no  
 knowledge or nack'ing ability for transport features.

We could pass vhost flag to virtio, and have virtio query the device
for features. Would that be better?

 If you need to change transport features, it suggests you're modeling  
 things incorrectly and should be supplying an alternative transport  
 implementation.
 Regards,

 Anthony Liguori

Yes, you can make vhost an alternative transport in qemu.  This might be
one way to handle this. However, this seems to go contrary to your
previous proposal to make vhost a networking back end. Which will it be?

-- 
MST


Re: error while loading state for instance 0x0 of device 'kvmclock'

2009-11-03 Thread Glauber Costa
On Mon, Nov 02, 2009 at 04:37:15PM +0100, Jan Kiszka wrote:
 Avi Kivity wrote:
  On 11/02/2009 12:28 PM, Jan Kiszka wrote:
  Jan Kiszka wrote:
 
  Hi,
 
  current qemu-kvm.git gives me the message "qemu: warning: error while
  loading state for instance 0x0 of device 'kvmclock'" when I run a simple
  savevm followed by a loadvm 1. What's broken here?
   
  OK, this is due to "KVM: add flags to kvm_clock_data" (958b0c5497): the
  flags field is not cleared on KVM_SET_CLOCK. Will post a fix.
 
  But the above kernel commit is also broken: KVM_GET_CLOCK checks
  uninitialized user_ns.flags (probably instead of the user's value). This
  raises the question if the caller of KVM_GET_CLOCK is also supposed to
  pass kvm_clock_data with flags cleared down to the kernel. Could someone
  clarify this so I could fix it accordingly?
 
  
  I'd make KVM_GET_CLOCK set the flags, not get them.  So if we add new 
  fields, we just set a new bit and userspace can read it.
 
 This makes sense and actually fixes the issue completely. Patch on its
 way...
agreed. This is the best behaviour I can devise, indeed.
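
The "set, not get" convention agreed on above can be sketched in plain C. The structure below is an illustrative stand-in, not the kernel's definition, and ex_get_clock is a made-up name; the point is only that the producer always writes flags, so whatever the caller left in the buffer is never interpreted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in; the real layout lives in the KVM headers. */
struct ex_clock_data {
    uint64_t clock;
    uint32_t flags;   /* a set bit would announce an optional field */
};

/*
 * "Get" side: fill in the whole structure, including flags. No optional
 * fields exist yet, so flags is always cleared here.
 */
static int ex_get_clock(struct ex_clock_data *out,
                        uint64_t now_ns, uint64_t offset)
{
    memset(out, 0, sizeof(*out));
    out->clock = offset + now_ns;
    out->flags = 0;
    return 0;
}
```

Under this scheme a later extension would set a new bit in flags on the get path, and userspace that does not know the bit can ignore it.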


Re: [PATCH] KVM: x86: Clear flags field on return from KVM_GET_CLOCK

2009-11-03 Thread Glauber Costa
On Mon, Nov 02, 2009 at 04:41:54PM +0100, Jan Kiszka wrote:
 This field is supposed to indicate the availability of additional fields
 one day. There are none yet, so clear it - and drop the bogus check,
 too.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Makes sense.

Acked-by: Glauber Costa glom...@redhat.com




Re: [PATCHv4 0/6] qemu-kvm: vhost net support

2009-11-03 Thread Michael S. Tsirkin
On Mon, Nov 02, 2009 at 04:58:39PM -0600, Anthony Liguori wrote:
 Hi Michael,

 I'll reserve individual patch review until they're in a mergeable state,
 but I do have some comments about the overall integration architecture.

 Generally speaking, I think the integration is unnecessarily invasive.  It
 adds things to the virtio infrastructure that shouldn't be there like  
 the irqfd/queuefd bindings.  It also sneaks in things like raw backend  
 support which really isn't needed.

 I think we can do better.  Here's what I suggest:

 The long term goal should be to have a NetDevice interface that looks  
 very much like virtio-net but as an API, not an ABI.  Roughly, it would  
 look something like:

 struct NetDevice {
   int add_xmit(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
   int add_recv(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);

   void *get_xmit(NetDevice *dev);
   void *get_recv(NetDevice *dev);

   void kick(NetDevice *dev);

   ...
 };

 That gives us a better API for use with virtio-net, e1000, etc.

This is not much different from what we have now with VLANClientState,
is it?
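
For readers who want something compilable: C has no member functions, so the interface quoted above would normally be spelled with function pointers. The following is a minimal sketch under that assumption. The opaque field, the OneSlot backend and the one_* functions are invented for this example, and the queue semantics (one pending buffer, completed on kick) are a guess at the intent, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

typedef struct NetDevice NetDevice;

/* The quoted interface, rewritten as a table of function pointers. */
struct NetDevice {
    int (*add_xmit)(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
    int (*add_recv)(NetDevice *dev, struct iovec *iov, int iovcnt, void *token);
    void *(*get_xmit)(NetDevice *dev);
    void *(*get_recv)(NetDevice *dev);
    void (*kick)(NetDevice *dev);
    void *opaque;   /* backend state; not in the original sketch */
};

/* Toy backend with room for exactly one transmit buffer. */
struct OneSlot {
    void *pending;  /* queued but not yet "sent" */
    void *done;     /* sent, waiting to be collected */
};

static int one_add_xmit(NetDevice *dev, struct iovec *iov, int iovcnt, void *token)
{
    struct OneSlot *s = dev->opaque;

    (void)iov;
    (void)iovcnt;
    if (s->pending)
        return -1;          /* queue full */
    s->pending = token;
    return 0;
}

/* Pretend the backend transmits everything queued so far. */
static void one_kick(NetDevice *dev)
{
    struct OneSlot *s = dev->opaque;

    s->done = s->pending;
    s->pending = NULL;
}

/* Hand back the token of a completed buffer, or NULL if none. */
static void *one_get_xmit(NetDevice *dev)
{
    struct OneSlot *s = dev->opaque;
    void *token = s->done;

    s->done = NULL;
    return token;
}
```

A frontend would queue buffers with add_xmit, call kick, and then poll get_xmit for completed tokens, much like a virtio ring but behind a function-call API.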

 Assuming we had this interface, I think a natural extension would be:

 int add_ring(NetDevice *dev, void *address);
 int add_kickfd(NetDevice *dev, int fd);

 For slot management, it really should happen outside of the NetDevice  
 structure.  We'll need a slot notifier mechanism such that we can keep  
 this up to date as things change.

Yes.

 vhost-net becomes a NetDevice.  It can support things like the e1000 by
 doing ring translation behind the scenes.

And the point would be?

 virtio-net can be fast pathed  
 in the case that we're using KVM but otherwise, it would also rely on  
 the ring translation.

Won't it be easier to just keep using existing code?

  N.B. in the case vhost-net is fast pathed, it  requires a different
  device in QEMU that uses a separate virtio  transport.  We should
  reuse as much code as possible obviously.  It  doesn't make sense to
  have all of the virtio-pci code and virtio-net  code in place when we
  aren't using it.

Note that all of virtio-pci and setup parts of virtio-net are reused.
The only things we are *not* re-using are send/receive and callbacks in
virtio-net.

 All this said, I'm *not* suggesting you have to implement all of this to  
 get vhost-net merged.  Rather, I'm suggesting that we should try to  
 structure the current vhost-net implementation to complement this  
 architecture assuming we all agree this is the sane thing to do.  That  
 means I would make the following changes to your series:

 - move vhost-net support to a VLANClientState backend.
 - do not introduce a raw socket backend
 - if for some reason you want to back to tap and raw, those should be  
 options to the vhost-net backend.
 - when fast pathing with vhost-net, we should introduce interfaces to  
 VLANClientState similar to add_ring and add_kickfd.  They'll be very  
 specific to vhost-net for now, but that's okay.
 - sort out the layering of vhost-net within the virtio infrastructure.   
 vhost-net should really be its own qdev device.
  I don't see very much  
 code reuse happening right now so I don't understand why it's not that  
 way currently.
 Regards,

 Anthony Liguori

What you propose short-term is workable.  So basically, vhost would be
an option supported by backends.  virtio net would go ahead and activate
it if available and other frontends will ignore it and just keep
injecting packets through regular interfaces.

-- 
MST


[PATCH] KVM: x86: Fix KVM_GET_CLOCK

2009-11-03 Thread Jan Kiszka
The flags field of kvm_clock_data is supposed to indicate the
availability of additional fields one day. There are none yet, so clear
it. Moreover, drop the bogus check of this field and return 0 on
success.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Note: This replaces "Clear flags field on return from KVM_GET_CLOCK".

 arch/x86/kvm/x86.c |8 +++-
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f68798..7344405 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2577,14 +2577,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		ktime_get_ts(&now);
 		now_ns = timespec_to_ns(&now);
 		user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
+		user_ns.flags = 0;
 
+		r = -EFAULT;
 		if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
-			r =  -EFAULT;
-
-		r = -EINVAL;
-		if (user_ns.flags)
 			goto out;
-
+		r = 0;
 		break;
 	}
 


Re: [PATCHv6 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
On Mon, Nov 02, 2009 at 04:05:58PM -0800, Daniel Walker wrote:
 
 Random style issues below .. Part of this is just stuff checkpatch
 found.

Thanks very much, I'll fix these.


Re: [PATCHv6 1/3] tun: export underlying socket

2009-11-03 Thread Arnd Bergmann
On Monday 02 November 2009, Michael S. Tsirkin wrote:
 Tun device looks similar to a packet socket
 in that both pass complete frames from/to userspace.
 
 This patch fills in enough fields in the socket underlying tun driver
 to support sendmsg/recvmsg operations, and message flags
 MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
 to modules.  Regular read/write behaviour is unchanged.
 
 This way, code using raw sockets to inject packets
 into a physical device, can support injecting
 packets into host network stack almost without modification.
 
 First user of this interface will be vhost virtualization
 accelerator.

You mentioned before that you wanted to export the socket
using some ioctl function returning an open file descriptor,
which seemed to be a cleaner approach than this one.

What was your reason for changing?

 index 3f5fd52..404abe0 100644
 --- a/include/linux/if_tun.h
 +++ b/include/linux/if_tun.h
 @@ -86,4 +86,18 @@ struct tun_filter {
 __u8   addr[0][ETH_ALEN];
  };
  
 +#ifdef __KERNEL__
 +#if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 +struct socket *tun_get_socket(struct file *);
 +#else
 +#include <linux/err.h>
 +#include <linux/errno.h>
 +struct file;
 +struct socket;
 +static inline struct socket *tun_get_socket(struct file *f)
 +{
 +   return ERR_PTR(-EINVAL);
 +}
 +#endif /* CONFIG_TUN */
 +#endif /* __KERNEL__ */
  #endif /* __IF_TUN_H */

Is this a leftover from testing? Exporting the function for !__KERNEL__
seems pointless.
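
For context on the stub in the quoted hunk: returning ERR_PTR(-EINVAL) encodes an errno value in the pointer itself, so callers have to check it with IS_ERR rather than comparing against NULL. The following userspace model illustrates the idiom. The ex_* names are stand-ins written for this example and are not the kernel's own macros or the patch's function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_ERRNO 4095

/* Userspace model of the kernel's error-pointer helpers. */
static inline void *ex_err_ptr(long error)
{
    return (void *)error;
}

static inline long ex_ptr_err(const void *ptr)
{
    return (long)ptr;
}

/* The top EX_MAX_ERRNO addresses are reserved for encoded errors. */
static inline int ex_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-EX_MAX_ERRNO;
}

struct socket;  /* opaque for this example */

/* Stand-in for the stub used when the tun driver is not built. */
static struct socket *ex_tun_get_socket_stub(void *file)
{
    (void)file;
    return ex_err_ptr(-EINVAL);
}
```

A caller would test the result with ex_is_err() and, on failure, recover the errno with ex_ptr_err().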

Arnd 



Re: [KVM-AUTOTEST][PATCH] Fix kvm_config.py -f filename mode

2009-11-03 Thread Lucas Meneghel Rodrigues
Ooops, fixed. Thanks Ryan!

On Mon, Nov 2, 2009 at 9:28 PM, Ryan Harper ry...@us.ibm.com wrote:
 kvm_config.py supports specifying a different filename for
 test config.  This patch fixes the option parsing parameters.
 Currently it uses 'store_true' which stores the value True into
 the filename variable; we want the 'store' mode which will store
 the value of the option (aka, the filename) in the variable.

 Signed-off-by: Ryan Harper ry...@us.ibm.com

 diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py
 index 3114c07..52de4c7 100755
 --- a/client/tests/kvm/kvm_config.py
 +++ b/client/tests/kvm/kvm_config.py
 @@ -501,7 +501,7 @@ class config:

  if __name__ == "__main__":
     parser = optparse.OptionParser()
 -    parser.add_option('-f', '--file', dest="filename", action='store_true',
 +    parser.add_option('-f', '--file', dest="filename", action='store',
                       help='path to a config file that will be parsed. '
                            'If not specified, will parse kvm_tests.cfg '
                            'located inside the kvm test dir.')



 --
 Ryan Harper
 Software Engineer; Linux Technology Center
 IBM Corp., Austin, Tx
 ry...@us.ibm.com




-- 
Lucas


Re: [PATCHv6 1/3] tun: export underlying socket

2009-11-03 Thread Michael S. Tsirkin
On Tue, Nov 03, 2009 at 01:12:33PM +0100, Arnd Bergmann wrote:
 On Monday 02 November 2009, Michael S. Tsirkin wrote:
  Tun device looks similar to a packet socket
  in that both pass complete frames from/to userspace.
  
  This patch fills in enough fields in the socket underlying tun driver
  to support sendmsg/recvmsg operations, and message flags
  MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
  to modules.  Regular read/write behaviour is unchanged.
  
  This way, code using raw sockets to inject packets
  into a physical device, can support injecting
  packets into host network stack almost without modification.
  
  First user of this interface will be vhost virtualization
  accelerator.
 
 You mentioned before that you wanted to export the socket
 using some ioctl function returning an open file descriptor,
 which seemed to be a cleaner approach than this one.

Note that a similar feature can be implemented on top of tun_get_socket,
as seen from patch below.

 What was your reason for changing?

It turns out the socket structure is really bound to a specific file, so we
cannot have 2 files referencing the same socket.  Instead, as I say
above, it's possible to make sendmsg/recvmsg work on tap file directly.

For vhost, the advantage of such a feature over using tun_get_socket
directly would be that vhost module won't depend on tun module then.  I
have implemented this (patch below), but decided to go with the simple
thing first.  Since no userspace-visible changes are involved, let's do
this by small steps: it will be easier to figure out when vhost
is upstream.


---

Note: patch below applies on top of patch "tun: export underlying socket".
It is not intended for merge yet.

net: convert tun device to socket

Add callback to file_ops to retrieve socket from
file structure. Use this to make tun character device
accept sendmsg/recvmsg calls.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b58095a..53e1806 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1405,7 +1405,8 @@ static const struct file_operations tun_fops = {
.unlocked_ioctl = tun_chr_ioctl,
.open   = tun_chr_open,
.release = tun_chr_close,
-   .fasync = tun_chr_fasync
+   .fasync = tun_chr_fasync,
+   .get_socket = tun_get_socket,
 };
 
 static struct miscdevice tun_miscdev = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2620a8c..f2b381f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1506,6 +1506,9 @@ struct file_operations {
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
int (*setlease)(struct file *, long, struct file_lock **);
+#ifdef CONFIG_NET
+   struct socket *(*get_socket)(struct file *file);
+#endif
 };
 
 struct inode_operations {
diff --git a/net/socket.c b/net/socket.c
index 9dff31c..700efcb 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -119,6 +119,11 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
 
+static struct socket *sock_get_socket(struct file *file)
+{
+   return file->private_data;  /* set in sock_map_fd */
+}
+
 /*
  * Socket files have a set of 'special' operations as well as the generic file ones. These don't appear
  * in the operation structures but are done directly via the socketcall() multiplexor.
@@ -141,6 +146,7 @@ static const struct file_operations socket_file_ops = {
.sendpage = sock_sendpage,
.splice_write = generic_splice_sendpage,
.splice_read =  sock_splice_read,
+   .get_socket =   sock_get_socket,
 };
 
 /*
@@ -416,8 +422,8 @@ int sock_map_fd(struct socket *sock, int flags)
 
 static struct socket *sock_from_file(struct file *file, int *err)
 {
-   if (file->f_op == &socket_file_ops)
-   return file->private_data;  /* set in sock_map_fd */
+   if (file->f_op->get_socket)
+   return file->f_op->get_socket(file);
 
*err = -ENOTSOCK;
return NULL;


-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM Live Migration

2009-11-03 Thread Gilberto Nunes Ferreira
hi all

This is my first post...

I successfully migrated a VM with this command:
(on kvm-0 )

r...@kvm-0:~# virsh list --all
Connecting to uri: qemu:///system
Id Name State
--
  1 win2003  running

r...@kvm-0:~:# virsh migrate --live win2003 qemu+ssh://kvm-1/system

But when the VM starts on kvm-1, it comes up in the paused state!

I have no idea why this happens!

So, to unpause the VM, I log in on kvm-1 and run virt-manager to
unpause it...

I use DRBD as shared storage for the VMs...

Any help will be welcome.

Thanks





Gilberto Nunes Ferreira
TI
Selbetti Gestão de Documentos
Telefone: +55 (47) 3441-6004
Celular: +55 (47) 8861-6672 

















Re: [PATCH] KVM: x86: Fix KVM_GET_CLOCK

2009-11-03 Thread Marcelo Tosatti
On Tue, Nov 03, 2009 at 12:49:05PM +0100, Jan Kiszka wrote:
 The flags field of kvm_clock_data is supposed to indicate the
 availability of additional fields one day. There are none yet, so clear
 it. Moreover, drop the bogus check of this field and return 0 on
 success.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 
 Note: This replaces Clear flags field on return from KVM_GET_CLOCK.
 
  arch/x86/kvm/x86.c |8 +++-
  1 files changed, 3 insertions(+), 5 deletions(-)

Applied, thanks.
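
For illustration, a minimal userspace model of the behaviour the patch
description asks for: the handler reports the clock, clears the reserved flags
field, and returns 0 on success. The struct name clock_data_model, its layout,
and get_clock_model are assumptions made for this sketch; this is not the
actual x86.c change.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for kvm_clock_data (layout is an assumption). */
struct clock_data_model {
    uint64_t clock;
    uint32_t flags;   /* reserved: may signal additional fields one day */
    uint32_t pad[9];
};

/* Model of a "get clock" handler. */
static int get_clock_model(uint64_t now_ns, struct clock_data_model *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof(*out)); /* nothing stale is handed back */
    out->clock = now_ns;
    out->flags = 0;               /* no additional fields exist yet */
    return 0;                     /* success is 0, not a byte count */
}
```

A caller that passes in a structure with garbage in flags gets it back zeroed,
so a future flag bit can be introduced without ambiguity.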



Re: [PATCH] tests: The order of the fields are in reverse one isr stack.

2009-11-03 Thread Marcelo Tosatti
Applied both, thanks.

On Thu, Oct 29, 2009 at 11:12:57AM +0200, Gleb Natapov wrote:
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 diff --git a/kvm/user/test/x86/apic.c b/kvm/user/test/x86/apic.c
 index 4e89c77..b6718ec 100644
 --- a/kvm/user/test/x86/apic.c
 +++ b/kvm/user/test/x86/apic.c


Re: [PATCH] KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG

2009-11-03 Thread Marcelo Tosatti
On Fri, Oct 30, 2009 at 12:46:59PM +0100, Jan Kiszka wrote:
 Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from
 KVM_GUESTDBG_ENABLE, they are actually orthogonal. While at it,
 avoid triggering the WARN_ON in kvm_queue_exception if there is already
 an exception pending and reject such invalid requests.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Applied, thanks.



Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.

2009-11-03 Thread Gleb Natapov
On Tue, Nov 03, 2009 at 12:14:23PM -0200, Marcelo Tosatti wrote:
 On Sun, Nov 01, 2009 at 01:56:22PM +0200, Gleb Natapov wrote:
  Asynchronous page fault notifies vcpu that page it is trying to access
  is swapped out by a host. In response guest puts a task that caused the
  fault to sleep until page is swapped in again. When missing page is
  brought back into the memory guest is notified and task resumes execution.
 
 Can't you apply this to non-paravirt guests, and continue to deliver
 interrupts while waiting for the swapin? 
 
 It should allow the guest to schedule a different task.
But how can I make the guest to not run the task that caused the fault?

--
Gleb.


Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.

2009-11-03 Thread Marcelo Tosatti
On Tue, Nov 03, 2009 at 04:25:33PM +0200, Gleb Natapov wrote:
 On Tue, Nov 03, 2009 at 12:14:23PM -0200, Marcelo Tosatti wrote:
  On Sun, Nov 01, 2009 at 01:56:22PM +0200, Gleb Natapov wrote:
   Asynchronous page fault notifies vcpu that page it is trying to access
   is swapped out by a host. In response guest puts a task that caused the
   fault to sleep until page is swapped in again. When missing page is
   brought back into the memory guest is notified and task resumes execution.
  
  Can't you apply this to non-paravirt guests, and continue to deliver
  interrupts while waiting for the swapin? 
  
  It should allow the guest to schedule a different task.
 But how can I make the guest to not run the task that caused the fault?

Any attempt to access the swapped out data will cause a #PF vmexit,
since the translation is marked as not present. If there's swapin in
progress, you wait for that swapin, otherwise start swapin and wait.

It's not as efficient as paravirt because you have to wait for a timer
interrupt and the guest scheduler to decide to taskswitch, but OTOH it's
transparent.
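
The decision described above can be sketched as a small state machine. This is
a userspace model only: the enum names and handle_guest_fault are hypothetical,
and the real host-side handling involves far more than a three-way switch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states of a guest page as seen by the host. */
enum page_state {
    PAGE_PRESENT,
    PAGE_SWAPPED_OUT,
    PAGE_SWAPIN_IN_PROGRESS
};

/* What the host does when the guest faults on the page. */
enum fault_action {
    RESUME_GUEST,           /* translation is valid, nothing to do */
    WAIT_FOR_SWAPIN,        /* join the swapin that is already running */
    START_SWAPIN_AND_WAIT   /* kick off the swapin, then wait for it */
};

static enum fault_action handle_guest_fault(enum page_state *state)
{
    assert(state != NULL);
    switch (*state) {
    case PAGE_SWAPIN_IN_PROGRESS:
        return WAIT_FOR_SWAPIN;
    case PAGE_SWAPPED_OUT:
        *state = PAGE_SWAPIN_IN_PROGRESS;
        return START_SWAPIN_AND_WAIT;
    case PAGE_PRESENT:
    default:
        return RESUME_GUEST;
    }
}
```

A second fault on the same page while the swapin is pending takes the
WAIT_FOR_SWAPIN path, which is what makes the scheme transparent to the guest.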



Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.

2009-11-03 Thread Avi Kivity

On 11/03/2009 04:32 PM, Marcelo Tosatti wrote:

Any attempt to access the swapped out data will cause a #PF vmexit,
since the translation is marked as not present. If there's swapin in
progress, you wait for that swapin, otherwise start swapin and wait.

It's not as efficient as paravirt because you have to wait for a timer
interrupt and the guest scheduler to decide to taskswitch, but OTOH it's
transparent.
   


With a dyntick guest the timer interrupt will come at the end of the 
time slice, likely after the page has been swapped in.  That leaves smp 
reschedule interrupts and non-dyntick guests.


An advantage is that there is one code path for apf and non-apf.  
Another is that interrupts are processed, improving timekeeping and 
maybe responsiveness.



--
error compiling committee.c: too many arguments to function



Re: 2.6.31.4 panic: CRED: put_cred_rcu() sees ffff880204e58c00 with usage 82150912

2009-11-03 Thread Marcelo Tosatti
On Fri, Oct 30, 2009 at 12:15:34PM +0100, Nikola Ciprich wrote:
 Ouch, typo in subject, it's 2.6.31.1 of course. sorry about that.
 also CCing kvm.
 n.
 
 On Fri, Oct 30, 2009 at 12:06:32PM +0100, Nikola Ciprich wrote:
  Hi,
  some time ago, I updated my KVM hosting machine to 2.6.31.1 and it just 
  died horribly:

Nikola,

Upgraded from what? Did you experience the crash again?

  Oct 30 10:45:17 vbox [706369.133516] Kernel panic - not syncing: CRED: 
  put_cred_rcu() sees 880204e58c00 with usage 82150912
  Oct 30 10:45:17 vbox [706369.133519]
  Oct 30 10:45:17 vbox [706369.144990] Pid: 19, comm: ksoftirqd/5 Not tainted 
  2.6.31lb.02 #1
  Oct 30 10:45:17 vbox [706369.151554] Call Trace:
  Oct 30 10:45:17 vbox [706369.154332]  IRQ
  Oct 30 10:45:17 vbox [8104c1fa] panic+0xaa/0x180
  Oct 30 10:45:17 vbox [706369.160280]  [81322b90] ? 
  _spin_unlock+0x30/0x60
  Oct 30 10:45:17 vbox [706369.166256]  [810f5671] ? 
  add_partial+0x21/0x90
  Oct 30 10:45:17 vbox [706369.172155]  [810f6a92] ? 
  __slab_free+0x92/0x3c0
  Oct 30 10:45:17 vbox [706369.178127]  [81102317] ? 
  file_free_rcu+0x37/0x50
  Oct 30 10:45:17 vbox [706369.184198]  [8106c655] 
  put_cred_rcu+0x75/0x80
  Oct 30 10:45:17 vbox [706369.190008]  [810a2525] 
  __rcu_process_callbacks+0x125/0x250
  Oct 30 10:45:17 vbox [706369.197020]  [810a2689] 
  rcu_process_callbacks+0x39/0x60
  Oct 30 10:45:17 vbox [706369.203624]  [81052a61] 
  __do_softirq+0xc1/0x250
  Oct 30 10:45:17 vbox [706369.209506]  [81053860] ? 
  ksoftirqd+0x0/0x1a0
  Oct 30 10:45:17 vbox [706369.215182]  [8100c4dc] 
  call_softirq+0x1c/0x30
  Oct 30 10:45:17 vbox [706369.220986]  [8100e46d] 
  do_softirq+0x3d/0x80
  Oct 30 10:45:17 vbox [706369.227317]  [81053860] ? 
  ksoftirqd+0x0/0x1a0
  Oct 30 10:45:17 vbox [706369.233020]  [810538e4] 
  ksoftirqd+0x84/0x1a0
  Oct 30 10:45:17 vbox [706369.238622]  [81066686] kthread+0xa6/0xb0
  Oct 30 10:45:17 vbox [706369.243956]  [8100c3da] 
  child_rip+0xa/0x20
  Oct 30 10:45:17 vbox [706369.249390]  [810665e0] ? 
  kthread+0x0/0xb0
  Oct 30 10:45:17 vbox [706369.254806]  [8100c3d0] ? 
  child_rip+0x0/0x20
  Oct 30 10:45:17 vbox [706369.260454] Rebooting in 10 seconds..
  (trace is obtained from netconsole, so hopefully it's not mangled).
  The machine was running ~30 KVM guests, it's 8CPU 16GB x86_64, when it 
  crashed, it was only
  moderately loaded. Never had this (or any other) kind of crash on it before.
  I know there's 2.6.31.5 already out, but I'm not sure if some related 
  problem has been
  reported/fixed and I'm obviously not able to quickly test/reproduce it with 
  latest kernel,
  so I'm rather reporting.
  Should more information/testing/etc be required, I'll be glad to help
  regards
  nik



Re: CPU change causes hanging of .NET apps

2009-11-03 Thread Erik Rull

Avi Kivity wrote:

On 11/02/2009 01:45 AM, Erik Rull wrote:

Hi Avi,



Please don't top-post.


the Host CPU is an Intel Core2Duo - VT capable and enabled!


The problem is that one of the flags that -cpu core2duo enables is 
implemented incorrectly, so it leads to .net breakage.


These flags are pni, lm, nx, ssse3, syscall.

Please try -cpu core2duo,-pni,-lm,-nx,-ssse3,-syscall.  If it works, 
remove features one by one until it doesn't and let us know the results.


I took all flags, same effect as without all of these flags.  :(

Any other idea?

- Erik


Re: CPU change causes hanging of .NET apps

2009-11-03 Thread Avi Kivity

On 11/03/2009 04:56 PM, Erik Rull wrote:


I took all flags, same effect as without all of these flags.  :(

Any other idea?


It's probably the cache size query.

Does -cpu host work?

--
error compiling committee.c: too many arguments to function



Re: CPU change causes hanging of .NET apps

2009-11-03 Thread Timur Safin
2009/11/3 Avi Kivity a...@redhat.com:
 On 11/03/2009 04:56 PM, Erik Rull wrote:

 I took all flags, same effect as without all of these flags.  :(

 Any other idea?

 It's probably the cache size query.

 Does -cpu host work?


My guess, as a total QEMU noob:
I bet it's CR4.OSFXSR, which is governed by the presence of
cpuid.1.edx[24], the FXSR bit (FXSAVE and FXRSTOR instructions).

I'm curious - is there any way in QEMU to redefine the returned cpuid leaf
values? It would be interesting to see the results with that bit
disabled in the configuration.
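
As a small illustration of the bit in question: FXSR is bit 24 of EDX for
CPUID leaf 1. The helpers below only test or clear that bit in a plain 32-bit
value; the names has_fxsr and mask_fxsr are hypothetical and nothing here
calls CPUID or touches QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.1:EDX bit 24 advertises FXSAVE/FXRSTOR support. */
#define CPUID_1_EDX_FXSR (UINT32_C(1) << 24)

/* Return 1 if the given EDX value advertises FXSR, else 0. */
static int has_fxsr(uint32_t edx)
{
    return (edx & CPUID_1_EDX_FXSR) != 0;
}

/* Return the EDX value a guest would see with FXSR hidden. */
static uint32_t mask_fxsr(uint32_t edx)
{
    uint32_t masked = edx & ~CPUID_1_EDX_FXSR;
    assert(!has_fxsr(masked));
    return masked;
}
```

Masking a feature this way only changes what the guest is told; whether the
guest then behaves differently is the experiment being proposed.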

Best Regards,
Timur


Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Anthony Liguori

Avi Kivity wrote:

On 11/03/2009 10:26 AM, Alexander Graf wrote:
Exactly. In fact, I'm even scared to reboot mine because I might end 
up in a 3270 terminal. The whole text only crap keeps people from 
using this platform! And that's what I want to change here.


Ok.  I oppose paravirtualization for its own sake and only support it 
if there's no other way to get performance.  In this case it buys us 
basic functionality which is surprisingly missing on native, that's 
arguably even more important.


There is no native on s390.  Everything is paravirtual.

Regards,

Anthony Liguori



Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Ondrej Zajicek
On Tue, Nov 03, 2009 at 11:38:18AM +0200, Avi Kivity wrote:
 On 11/03/2009 01:25 PM, Vincent Hanquez wrote:
  not sure if i'm missing the point here, but couldn't it be hypothetically
  extended to stuff 3d (or video  more 2d accel ?) commands too ? I can't
  imagine the cirrus or stdvga driver be able to do that ever ;)
 
 
 cirrus has pretty good 2d acceleration.  3D is a mega-project though.

Cirrus has no blending/compositing hardware support.
Paravirtualized graphics can easily support full XRender-style
2D acceleration.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
To err is human -- to blame it on a computer is even more so.




Re: [RFC] allow userspace to set MSR no-ops

2009-11-03 Thread Marcelo Tosatti
On Wed, Oct 28, 2009 at 01:23:07PM -0400, David Windsor wrote:
 Hi,
 
 I've encountered a situation in which I would like to allow userspace
 to set the MSRs which KVM should not emulate and instead implement
 these as no-ops.
 
 I have not seen any work in this space, furthermore there is an item
 on the KVM TODO that is very similar to what I'm trying to do.
 
 The userspace interface is an extension of kvm_vcpu_ioctl, adding the
 KVM_SET_MSRS_NOOP flag.  It takes a struct kvm_msrs as a list of which
 MSRs should be no-ops and adds the field noop to struct kvm_msr_entry.
  This patch only affects vmx, but if the approach is sane, I can
 extend it to support svm as well.

Does the ignore_msrs kvm.ko parameter achieve what you want?



RE: vhost-net patches

2009-11-03 Thread Shirley Ma
Hello Xiaohui,

On Tue, 2009-11-03 at 09:06 +0800, Xin, Xiaohui wrote:
 Hi, Michael,
 What's your deferring skb allocation patch mentioned here, may you
 elaborate it a little more detailed?

That's my patch. It was submitted a few months ago. Here is the link to
this RFC patch:
http://www.mail-archive.com/kvm@vger.kernel.org/msg20777.html

It is a patch for guest receiving. Right now, the kvm guest pre-allocates
skbs; in the worst case, when receiving a large packet with mergeable
buffers, 15/16 skbs need to be freed. Avi and Michael gave me some
comments. I will post the updated patch in a few days, after running a
few more tests.

Thanks
Shirley



Re: CPU change causes hanging of .NET apps

2009-11-03 Thread Avi Kivity

On 11/03/2009 05:11 PM, Timur Safin wrote:


My totally noob in QEMU guess -
my bet it's CR4.OSFXSR which is controlled by presence of
cpuid.1.edx[24] - FXSR bit (FXSAVE and FXRSTOR) instructions.
   


That would affect floating point as well.


I'm curious - is there any way in QEMU to redefine returned cpuid leaf
values? It will be interesting to see results if that given bit will
be disabled in configuration.
   


qemu -cpu host,-flag

--
error compiling committee.c: too many arguments to function



Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Avi Kivity

On 11/03/2009 05:14 PM, Anthony Liguori wrote:

Avi Kivity wrote:

On 11/03/2009 10:26 AM, Alexander Graf wrote:
Exactly. In fact, I'm even scared to reboot mine because I might end 
up in a 3270 terminal. The whole text only crap keeps people from 
using this platform! And that's what I want to change here.


Ok.  I oppose paravirtualization for its own sake and only support it 
if there's no other way to get performance.  In this case it buys us 
basic functionality which is surprisingly missing on native, that's 
arguably even more important.


There is no native on s390.  Everything is paravirtual.


I meant native as in what they usually do without our stuff.

--
error compiling committee.c: too many arguments to function



Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Avi Kivity

On 11/03/2009 05:29 PM, Ondrej Zajicek wrote:

On Tue, Nov 03, 2009 at 11:38:18AM +0200, Avi Kivity wrote:
   

On 11/03/2009 01:25 PM, Vincent Hanquez wrote:
 

not sure if i'm missing the point here, but couldn't it be hypothetically
extended to stuff 3d (or video   more 2d accel ?) commands too ? I can't
imagine the cirrus or stdvga driver be able to do that ever ;)

   

cirrus has pretty good 2d acceleration.  3D is a mega-project though.
 

Cirrus has no blending/compositing hardware support.
Paravirtualized graphics can easily support full XRender-style
2D acceleration.
   


What does that entail? 3/4 operand raster ops?

--
error compiling committee.c: too many arguments to function



kvm problem: bonding network interface breaks dhcp

2009-11-03 Thread Harald Dunkel
Hi folks,

I am trying to use a bonding network interface as a bridge
for a virtual machine (kvm). Host and guest are both running
2.6.31.5. Problem: The guest does not receive the DHCPOFFER
reply sent by my dhcp server. There is no such problem if
the host uses just a single network interface instead of
bond0.

Looking at tcpdump on the Linux guest, there are several DHCP
discover packets like

15:17:44.005306 00:16:36:2f:f1:d2 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), 
length 342: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), 
length 328) 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 
00:16:36:2f:f1:d2, length 300, xid 0x4c31213d, secs 10, Flags [none]
  Client-Ethernet-Address 00:16:36:2f:f1:d2 [|bootp]

The DHCP server receives these packets and sends out
a reply

15:17:45.927589 00:16:36:2f:f1:d2 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), 
length 342: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), 
length 328) 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 
00:16:36:2f:f1:d2, length 300, xid 0x4c31213d, secs 10, Flags [none]
  Client-Ethernet-Address 00:16:36:2f:f1:d2 [|bootp]
15:17:45.927658 00:15:17:94:16:65 > 00:16:36:2f:f1:d2, ethertype IPv4 (0x0800), 
length 364: (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), 
length 350) 172.19.96.123.67 > 172.19.97.243.68: BOOTP/DHCP, Reply, length 322, 
xid 0x4c31213d, secs 10, Flags [none]
  Your-IP 172.19.97.243
  Client-Ethernet-Address 00:16:36:2f:f1:d2 [|bootp]

This reply never shows up on the guest.


iptables is not set up, of course. sysctl.conf says

net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0


Any helpful comment would be highly appreciated.


Many thanx

Harri


[RFC] make cpu creation happen inside the right thread.

2009-11-03 Thread Glauber Costa
Right now, we issue cpu creation from the i/o thread, and then shoot a thread
from inside that code. Over the last months, a lot of subtle bugs were reported,
usually arising from the very fragile order of that initialization.

I propose we rethink that a little. This is a patch that received basic testing
only, and I'd like to hear opinions on the overall direction. The idea is to issue the
new thread as early as possible. The first direct benefits I can identify are that
we no longer have to rely on on_vcpu-like schemes for issuing vcpu ioctls, since
we are already on the right thread. Apic creation has far fewer spots for race
conditions as well.

I am implementing this on qemu-kvm first, since we can show the benefits of it
a bit better in there (since we already support smp)

Let me know what you guys think

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Marcelo Tosatti mtosa...@redhat.com
CC: Avi Kivity a...@redhat.com
CC: Jan Kiszka jan.kis...@siemens.com
CC: Anthony Liguori aligu...@us.ibm.com
---
 cpu-defs.h |2 +-
 hw/acpi.c  |2 +-
 hw/pc.c|   26 --
 hw/pc.h|2 +-
 qemu-kvm-x86.c |5 +
 qemu-kvm.c |   44 +++-
 qemu-kvm.h |2 ++
 vl.c   |2 --
 8 files changed, 53 insertions(+), 32 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index cf502e9..6d026e0 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -139,7 +139,7 @@ typedef struct CPUWatchpoint {
 struct qemu_work_item;
 
 struct KVMCPUState {
-pthread_t thread;
+pthread_t *thread;
 int signalled;
 struct qemu_work_item *queued_work_first, *queued_work_last;
 int regs_modified;
diff --git a/hw/acpi.c b/hw/acpi.c
index 7564abf..cc68188 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -781,7 +781,7 @@ void qemu_system_cpu_hot_add(int cpu, int state)
 CPUState *env;
 
 if (state && !qemu_get_cpu(cpu)) {
-env = pc_new_cpu(model);
+pc_new_cpu(model, env);
 if (!env) {
 fprintf(stderr, "cpu %d creation failed\n", cpu);
 return;
diff --git a/hw/pc.c b/hw/pc.c
index 83012a9..53e7273 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1013,29 +1013,26 @@ int cpu_is_bsp(CPUState *env)
 return env->cpuid_apic_id == 0;
 }
 
-CPUState *pc_new_cpu(const char *cpu_model)
+void pc_new_cpu(const char *cpu_model, CPUState **env)
 {
-CPUState *env;
-
-env = cpu_init(cpu_model);
-if (!env) {
+*env = cpu_init(cpu_model);
+if (!*env) {
 fprintf(stderr, "Unable to find x86 CPU definition\n");
 exit(1);
 }
-env->kvm_cpu_state.regs_modified = 1;
-if ((env->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
-env->cpuid_apic_id = env->cpu_index;
+(*env)->kvm_cpu_state.regs_modified = 1;
+if (((*env)->cpuid_features & CPUID_APIC) || smp_cpus > 1) {
+(*env)->cpuid_apic_id = (*env)->cpu_index;
 /* APIC reset callback resets cpu */
-apic_init(env);
+apic_init(*env);
 } else {
-qemu_register_reset((QEMUResetHandler*)cpu_reset, env);
+qemu_register_reset((QEMUResetHandler*)cpu_reset, *env);
 }
 
 /* kvm needs this to run after the apic is initialized. Otherwise,
  * it can access invalid state and crash.
  */
-qemu_init_vcpu(env);
-return env;
+qemu_init_vcpu(*env);
 }
 
 /* PC hardware initialisation */
@@ -1055,7 +1052,6 @@ static void pc_init1(ram_addr_t ram_size,
 PCIBus *pci_bus;
 ISADevice *isa_dev;
 int piix3_devfn = -1;
-CPUState *env;
 qemu_irq *cpu_irq;
 qemu_irq *isa_irq;
 qemu_irq *i8259;
@@ -1086,8 +1082,10 @@ static void pc_init1(ram_addr_t ram_size,
 if (kvm_enabled()) {
 kvm_set_boot_cpu_id(0);
 }
+
 for (i = 0; i < smp_cpus; i++) {
-env = pc_new_cpu(cpu_model);
+//pc_new_cpu(cpu_model, env);
+ kvm_init_vcpu(NULL);
 }
 
 vmport_init();
diff --git a/hw/pc.h b/hw/pc.h
index 93eb34d..f931380 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -100,7 +100,7 @@ extern int fd_bootchk;
 
 void ioport_set_a20(int enable);
 int ioport_get_a20(void);
-CPUState *pc_new_cpu(const char *cpu_model);
+void pc_new_cpu(const char *cpu_model, CPUState **env);
 
 /* acpi.c */
 extern int acpi_enabled;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 0d263ca..4084312 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -160,6 +160,11 @@ int kvm_arch_create(kvm_context_t kvm, unsigned long phys_mem_bytes,
return 0;
 }
 
+void kvm_arch_create_vcpu(const char *model, CPUState **env)
+{
+pc_new_cpu("qemu64", env);
+}
+
 #ifdef KVM_EXIT_TPR_ACCESS
 
 static int kvm_handle_tpr_access(CPUState *env)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index b58a457..f83d19a 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -436,10 +436,18 @@ void kvm_disable_pit_creation(kvm_context_t kvm)
 kvm->no_pit_creation = 1;
 }
 
-static void kvm_create_vcpu(CPUState *env, int id)
+
+static void kvm_create_vcpu(CPUState **_env)
 {
 long 

Re: libvirt bug #532480

2009-11-03 Thread Brian Jackson
On Tuesday 03 November 2009 06:02:42 am roma1390 wrote:
 libvirt thinks that bug #532480 must be addressed to the qemu/kvm team.
 
https://bugzilla.redhat.com/show_bug.cgi?id=532480


For future reference adding some overview to your email instead of making all 
the devs with arguably limited time go read through a bug report is probably a 
good idea.


 
 Any ideas how to fix this issue?


Iirc, it's being worked on. And yes, it is the responsibility of the developers 
of said drivers to do the signing. Keep watching the URL from the bug for 
updated drivers. Until then, there are workarounds for this issue, also 
mentioned at that URL.


 
 


Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Ondrej Zajicek
On Tue, Nov 03, 2009 at 06:05:13PM +0200, Avi Kivity wrote:
 cirrus has pretty good 2d acceleration.  3D is a mega-project though.
  
 Cirrus has no blending/compositing hardware support.
 Paravirtualized graphics can easily support full XRender-style
 2D acceleration.

 What does that entail? 3/4 operand raster ops?

Yes, basically three operand render/composite operation and rendering of
some 2D primitives (trapezoids).
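
A rough sketch of what a three-operand composite looks like, assuming
premultiplied-alpha ARGB pixels and the common "source IN mask OVER
destination" combination. The names mul8, struct pixel and composite_over are
hypothetical, and this models only one operator for a single pixel, not the
Render extension as a whole.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit fractions (0..255 standing for 0.0..1.0),
 * rounding to nearest. */
static uint8_t mul8(uint8_t a, uint8_t b)
{
    unsigned t = (unsigned)a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* One pixel with premultiplied alpha: each colour channel <= a. */
struct pixel { uint8_t a, r, g, b; };

/* Three operands: source, mask coverage, destination.
 * result = (src * mask) + dst * (1 - alpha(src * mask)) */
static struct pixel composite_over(struct pixel src, uint8_t mask,
                                   struct pixel dst)
{
    /* Premultiplied invariant; also guarantees no channel overflow. */
    assert(src.r <= src.a && src.g <= src.a && src.b <= src.a);

    struct pixel s = {
        mul8(src.a, mask), mul8(src.r, mask),
        mul8(src.g, mask), mul8(src.b, mask)
    };
    uint8_t inv = (uint8_t)(255 - s.a);
    struct pixel out = {
        (uint8_t)(s.a + mul8(dst.a, inv)),
        (uint8_t)(s.r + mul8(dst.r, inv)),
        (uint8_t)(s.g + mul8(dst.g, inv)),
        (uint8_t)(s.b + mul8(dst.b, inv))
    };
    return out;
}
```

The per-pixel mask is what makes this more than a classic raster op: it lets
antialiased glyphs and trapezoid coverage be blended in a single pass.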

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
To err is human -- to blame it on a computer is even more so.




Re: kvm problem: bonding network interface breaks dhcp

2009-11-03 Thread Matthew Palmer
On Tue, Nov 03, 2009 at 04:45:48PM +0100, Harald Dunkel wrote:
 I am trying to use a bonding network interface as a bridge
 for a virtual machine (kvm). Host and guest are both running
 2.6.31.5. Problem: The guest does not receive the DHCPOFFER
 reply sent by my dhcp server. There is no such problem if
 the host uses just a single network interface instead of
 bond0.

The output of brctl show, ip addr list, and cat /proc/net/bonding/bond*
might be helpful.

- Matt


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 + struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
 + unsigned head, out, in, s;
 + struct msghdr msg = {
 + .msg_name = NULL,
 + .msg_namelen = 0,
 + .msg_control = NULL,
 + .msg_controllen = 0,
 + .msg_iov = vq->iov,
 + .msg_flags = MSG_DONTWAIT,
 + };
 + size_t len, total_len = 0;
 + int err, wmem;
 + size_t hdr_size;
 + struct socket *sock = rcu_dereference(vq->private_data);
 + if (!sock)
 + return;
 +
 + wmem = atomic_read(&sock->sk->sk_wmem_alloc);
 + if (wmem >= sock->sk->sk_sndbuf)
 + return;
 +
 + use_mm(net->dev.mm);
 + mutex_lock(&vq->mutex);
 + vhost_no_notify(vq);
 +

using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect
that your use of RCU is not correct.

1) rcu_dereference() should be done inside an rcu_read_lock() section, and
   we are not allowed to sleep in such a section.
   (Quoting Documentation/RCU/whatisRCU.txt :
 It is illegal to block while in an RCU read-side critical section, )

2) mutex_lock() can sleep (ie block)



Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Gregory Haskins
Gregory Haskins wrote:
 Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 +   struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
 +   unsigned head, out, in, s;
 +   struct msghdr msg = {
 +   .msg_name = NULL,
 +   .msg_namelen = 0,
 +   .msg_control = NULL,
 +   .msg_controllen = 0,
 +   .msg_iov = vq->iov,
 +   .msg_flags = MSG_DONTWAIT,
 +   };
 +   size_t len, total_len = 0;
 +   int err, wmem;
 +   size_t hdr_size;
 +   struct socket *sock = rcu_dereference(vq->private_data);
 +   if (!sock)
 +   return;
 +
 +   wmem = atomic_read(&sock->sk->sk_wmem_alloc);
 +   if (wmem >= sock->sk->sk_sndbuf)
 +   return;
 +
 +   use_mm(net->dev.mm);
 +   mutex_lock(&vq->mutex);
 +   vhost_no_notify(vq);
 +
 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.

 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )

 2) mutex_lock() can sleep (ie block)

 
 
 Michael,
   I warned you that this needed better documentation ;)
 
 Eric,
   I think I flagged this once before, but Michael convinced me that it
 was indeed ok, if perhaps a bit unconventional.  I will try to
 find the thread.
 
 Kind Regards,
 -Greg
 

Here it is:

http://lkml.org/lkml/2009/8/12/173

Kind Regards,
-Greg





Re: [PATCHv7 2/3] mm: export use_mm/unuse_mm to modules

2009-11-03 Thread Gregory Haskins
Michael S. Tsirkin wrote:
 vhost net module wants to do copy to/from user from a kernel thread,
 which needs use_mm. Export it to modules.
 
 Acked-by: Andrea Arcangeli <aarca...@redhat.com>
 Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

I need this too:

Acked-by: Gregory Haskins <ghask...@novell.com>

 ---
  mm/mmu_context.c |    3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/mm/mmu_context.c b/mm/mmu_context.c
 index ded9081..0777654 100644
 --- a/mm/mmu_context.c
 +++ b/mm/mmu_context.c
 @@ -5,6 +5,7 @@
  
  #include <linux/mm.h>
  #include <linux/mmu_context.h>
 +#include <linux/module.h>
  #include <linux/sched.h>
  
  #include <asm/mmu_context.h>
 @@ -37,6 +38,7 @@ void use_mm(struct mm_struct *mm)
      if (active_mm != mm)
          mmdrop(active_mm);
  }
 +EXPORT_SYMBOL_GPL(use_mm);
  
  /*
   * unuse_mm
 @@ -56,3 +58,4 @@ void unuse_mm(struct mm_struct *mm)
      enter_lazy_tlb(mm, tsk);
      task_unlock(tsk);
  }
 +EXPORT_SYMBOL_GPL(unuse_mm);






[PATCH 2/2] qemu-kvm: x86: Add support for event states

2009-11-03 Thread Jan Kiszka
This patch extends the qemu-kvm state sync logic with the event substate
from the new VCPU state interface, giving access to yet missing
exception, interrupt and NMI states.

The patch does not switch the rest of qemu-kvm's code to the new
interface, as that code is expected to be morphed into upstream's version
anyway. Instead, a full conversion will be submitted for upstream.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
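
The NMI-mask handling in the diff below comes down to the usual set/clear/test idiom on a flags word. A minimal standalone sketch of that idiom follows. It assumes a placeholder mask value (the real HF2_NMI_MASK comes from the QEMU headers and is not reproduced here), and the helper names are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_NMI_MASK (1u << 2)   /* placeholder, not the real HF2_NMI_MASK */

/* Hypothetical helper: mirror a boolean "masked" state into the flags word. */
static uint32_t flags_set_nmi_masked(uint32_t hflags2, int masked)
{
    if (masked) {
        hflags2 |= EXAMPLE_NMI_MASK;
        assert((hflags2 & EXAMPLE_NMI_MASK) != 0);
    } else {
        hflags2 &= ~EXAMPLE_NMI_MASK;
        assert((hflags2 & EXAMPLE_NMI_MASK) == 0);
    }
    return hflags2;
}

/* Hypothetical helper: normalise the flag to exactly 0 or 1, as !!(x & MASK) does. */
static uint8_t flags_get_nmi_masked(uint32_t hflags2)
{
    return !!(hflags2 & EXAMPLE_NMI_MASK);
}
```

Only the one bit is touched in either direction, so unrelated bits in the flags word survive a get/set round trip.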

 qemu-kvm-x86.c        |   85 +
 target-i386/cpu.h     |    4 ++
 target-i386/machine.c |    4 ++
 3 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index e03a4ba..b12b103 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -903,6 +903,81 @@ static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs)
         | (rhs->avl * DESC_AVL_MASK);
 }
 
+static void kvm_get_events(CPUState *env)
+{
+#ifdef KVM_CAP_VCPU_STATE
+    struct {
+        struct kvm_vcpu_state header;
+        struct kvm_vcpu_substate substates[1];
+    } request;
+    struct kvm_x86_event_state events;
+    int r;
+
+    request.header.nsubstates = 1;
+    request.header.substates[0].type = KVM_X86_VCPU_STATE_EVENTS;
+    request.header.substates[0].offset = (size_t)&events - (size_t)&request;
+    r = kvm_vcpu_ioctl(env, KVM_GET_VCPU_STATE, &request);
+    if (r == 0) {
+        if (events.exception.injected) {
+            env->exception_index = events.exception.nr;
+            env->error_code = events.exception.error_code;
+        } else {
+            env->exception_index = -1;
+        }
+
+        env->interrupt_injected =
+            events.interrupt.injected ? events.interrupt.nr : -1;
+        env->soft_interrupt = events.interrupt.soft;
+
+        env->nmi_injected = events.nmi.injected;
+        env->nmi_pending = events.nmi.pending;
+        if (events.nmi.masked) {
+            env->hflags2 |= HF2_NMI_MASK;
+        } else {
+            env->hflags2 &= ~HF2_NMI_MASK;
+        }
+
+        env->sipi_vector = events.sipi_vector;
+
+        return;
+    }
+#endif
+    env->nmi_injected = 0;
+    env->nmi_pending = 0;
+    env->hflags2 &= ~HF2_NMI_MASK;
+}
+
+static void kvm_set_events(CPUState *env)
+{
+#ifdef KVM_CAP_VCPU_STATE
+    struct {
+        struct kvm_vcpu_state header;
+        struct kvm_vcpu_substate substates[1];
+    } request;
+    struct kvm_x86_event_state events;
+
+    request.header.nsubstates = 1;
+    request.header.substates[0].type = KVM_X86_VCPU_STATE_EVENTS;
+    request.header.substates[0].offset = (size_t)&events - (size_t)&request;
+
+    events.exception.injected = (env->exception_index >= 0);
+    events.exception.nr = env->exception_index;
+    events.exception.error_code = env->error_code;
+
+    events.interrupt.injected = (env->interrupt_injected >= 0);
+    events.interrupt.nr = env->interrupt_injected;
+    events.interrupt.soft = env->soft_interrupt;
+
+    events.nmi.injected = env->nmi_injected;
+    events.nmi.pending = env->nmi_pending;
+    events.nmi.masked = !!(env->hflags2 & HF2_NMI_MASK);
+
+    events.sipi_vector = env->sipi_vector;
+
+    kvm_vcpu_ioctl(env, KVM_SET_VCPU_STATE, &request);
+#endif
+}
+
 void kvm_arch_load_regs(CPUState *env)
 {
     struct kvm_regs regs;
@@ -1019,6 +1094,8 @@ void kvm_arch_load_regs(CPUState *env)
     rc = kvm_set_msrs(env, msrs, n);
     if (rc == -1)
         perror("kvm_set_msrs FAILED");
+
+    kvm_set_events(env);
 }
 
 void kvm_load_tsc(CPUState *env)
@@ -1215,6 +1292,8 @@ void kvm_arch_save_regs(CPUState *env)
             return;
         }
     }
+
+    kvm_get_events(env);
 }
 
 static void do_cpuid_ent(struct kvm_cpuid_entry2 *e, uint32_t function,
@@ -1383,7 +1462,10 @@ int kvm_arch_init_vcpu(CPUState *cenv)
     kvm_tpr_vcpu_start(cenv);
 #endif
 
+    cenv->exception_index = -1;
     cenv->interrupt_injected = -1;
+    cenv->nmi_injected = 0;
+    cenv->nmi_pending = 0;
 
     return 0;
 }
@@ -1453,7 +1535,10 @@ void kvm_arch_push_nmi(void *opaque)
 
 void kvm_arch_cpu_reset(CPUState *env)
 {
+    env->exception_index = -1;
     env->interrupt_injected = -1;
+    env->nmi_injected = 0;
+    env->nmi_pending = 0;
     kvm_arch_load_regs(env);
     if (!cpu_is_bsp(env)) {
         if (kvm_irqchip_in_kernel()) {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index a638e70..863c5a1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -711,6 +711,10 @@ typedef struct CPUX86State {
     /* For KVM */
     uint32_t mp_state;
     int32_t interrupt_injected;
+    uint8_t soft_interrupt;
+    uint8_t nmi_injected;
+    uint8_t nmi_pending;
+    uint32_t sipi_vector;
 
     /* in order to simplify APIC support, we leave this pointer to the
        user */
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 6bd447f..f066e6a 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -452,6 +452,10 @@ static const VMStateDescription vmstate_cpu = {
         VMSTATE_INT32_V(interrupt_injected, CPUState, 9),
 

Re: [RFC] make cpu creation happen inside the right thread.

2009-11-03 Thread Marcelo Tosatti
On Tue, Nov 03, 2009 at 12:35:08PM -0200, Glauber Costa wrote:
 Right now, we issue cpu creation from the i/o thread, and then shoot a thread
 from inside that code. Over the last months, a lot of subtle bugs were reported,
 usually arising from the very fragile order of that initialization.
 
 I propose we rethink that a little. This is a patch that received basic testing
 only, and I'd like to hear opinions on the overall direction. The idea is to
 issue the new thread as early as possible. The first direct benefit I can
 identify is that we no longer have to rely on on_vcpu-like schemes for issuing
 vcpu ioctls, since we are already on the right thread. Apic creation has far
 fewer spots for race conditions as well.
 
 I am implementing this on qemu-kvm first, since we can show the benefits of it
 a bit better in there (since we already support smp).
 
 Let me know what you guys think.

Makes sense to me. You still need on_vcpu for issuing vcpu ioctls though
(after initialization).



Re: [RFC] make cpu creation happen inside the right thread.

2009-11-03 Thread Glauber Costa
On Tue, Nov 03, 2009 at 03:46:07PM -0200, Marcelo Tosatti wrote:
 On Tue, Nov 03, 2009 at 12:35:08PM -0200, Glauber Costa wrote:
  Right now, we issue cpu creation from the i/o thread, and then shoot a thread
  from inside that code. Over the last months, a lot of subtle bugs were reported,
  usually arising from the very fragile order of that initialization.
  
  I propose we rethink that a little. This is a patch that received basic testing
  only, and I'd like to hear opinions on the overall direction. The idea is to
  issue the new thread as early as possible. The first direct benefit I can
  identify is that we no longer have to rely on on_vcpu-like schemes for issuing
  vcpu ioctls, since we are already on the right thread. Apic creation has far
  fewer spots for race conditions as well.
  
  I am implementing this on qemu-kvm first, since we can show the benefits of it
  a bit better in there (since we already support smp).
  
  Let me know what you guys think.
 
 Makes sense to me. You still need on_vcpu for issuing vcpu ioctls though
 (after initialization).

Yes, but I believe we can avoid most of them. There is a performance hit from
using it, but I am not so concerned with that. The nasty races that arise from
it are more of a concern to me.


[PATCH 1/2] qemu-kvm: x86: Refactor use of interrupt_bitmap

2009-11-03 Thread Jan Kiszka
Drop interrupt_bitmap from the cpustate and rely solely on the integer
interrupt_injected. This prepares us for the new injected-interrupt
interface, which will deprecate the bitmap, while preserving
compatibility.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---

Note: A corresponding version for upstream is on the way as well.
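
For readers following the conversion in the diff below, the mapping between the 256-bit bitmap and the single integer can be sketched in standalone C. This is only an illustration under stated assumptions: the helper names irq_to_bitmap and bitmap_to_irq are hypothetical, and __builtin_ctzll (a GCC/Clang builtin) stands in for QEMU's ctz64() from host-utils.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IRQ_BITMAP_WORDS (256 / 64)   /* same layout as interrupt_bitmap */

/* Hypothetical helper: encode one injected vector (or -1 for none). */
static void irq_to_bitmap(int32_t injected, uint64_t bitmap[IRQ_BITMAP_WORDS])
{
    memset(bitmap, 0, IRQ_BITMAP_WORDS * sizeof(bitmap[0]));
    if (injected >= 0) {
        assert(injected < 256);
        bitmap[injected / 64] |= (uint64_t)1 << (injected % 64);
    }
}

/* Hypothetical helper: return the lowest set vector, or -1 if none is set. */
static int32_t bitmap_to_irq(const uint64_t bitmap[IRQ_BITMAP_WORDS])
{
    for (int i = 0; i < IRQ_BITMAP_WORDS; i++) {
        if (bitmap[i]) {
            /* __builtin_ctzll stands in for QEMU's ctz64() here. */
            return i * 64 + __builtin_ctzll(bitmap[i]);
        }
    }
    return -1;
}
```

The round trip is lossless only because at most one bit is expected to be set at a time, which is the assumption the patch itself states in its kvm_get_sregs() comment.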

 qemu-kvm-x86.c        |   19 +--
 target-i386/cpu.h     |    3 +--
 target-i386/kvm.c     |   24 +---
 target-i386/machine.c |   24 ++--
 4 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 9df0d83..e03a4ba 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -946,7 +946,11 @@ void kvm_arch_load_regs(CPUState *env)
     fpu.mxcsr = env->mxcsr;
     kvm_set_fpu(env, &fpu);
 
-    memcpy(sregs.interrupt_bitmap, env->interrupt_bitmap, sizeof(sregs.interrupt_bitmap));
+    memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
+    if (env->interrupt_injected >= 0) {
+        sregs.interrupt_bitmap[env->interrupt_injected / 64] |=
+                (uint64_t)1 << (env->interrupt_injected % 64);
+    }
 
     if ((env->eflags & VM_MASK)) {
         set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
@@ -1104,7 +1108,14 @@ void kvm_arch_save_regs(CPUState *env)
 
     kvm_get_sregs(env, &sregs);
 
-    memcpy(env->interrupt_bitmap, sregs.interrupt_bitmap, sizeof(env->interrupt_bitmap));
+    env->interrupt_injected = -1;
+    for (i = 0; i < ARRAY_SIZE(sregs.interrupt_bitmap); i++) {
+        if (sregs.interrupt_bitmap[i]) {
+            n = ctz64(sregs.interrupt_bitmap[i]);
+            env->interrupt_injected = i * 64 + n;
+            break;
+        }
+    }
 
     get_seg(&env->segs[R_CS], &sregs.cs);
     get_seg(&env->segs[R_DS], &sregs.ds);
@@ -1371,6 +1382,9 @@ int kvm_arch_init_vcpu(CPUState *cenv)
 #ifdef KVM_EXIT_TPR_ACCESS
     kvm_tpr_vcpu_start(cenv);
 #endif
+
+    cenv->interrupt_injected = -1;
+
     return 0;
 }
 
@@ -1439,6 +1453,7 @@ void kvm_arch_push_nmi(void *opaque)
 
 void kvm_arch_cpu_reset(CPUState *env)
 {
+    env->interrupt_injected = -1;
     kvm_arch_load_regs(env);
     if (!cpu_is_bsp(env)) {
         if (kvm_irqchip_in_kernel()) {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 4605fd2..a638e70 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -709,8 +709,8 @@ typedef struct CPUX86State {
     MTRRVar mtrr_var[8];
 
     /* For KVM */
-    uint64_t interrupt_bitmap[256 / 64];
     uint32_t mp_state;
+    int32_t interrupt_injected;
 
     /* in order to simplify APIC support, we leave this pointer to the
        user */
@@ -727,7 +727,6 @@ typedef struct CPUX86State {
     uint16_t fpus_vmstate;
     uint16_t fptag_vmstate;
     uint16_t fpregs_format_vmstate;
-    int32_t pending_irq_vmstate;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 24c9903..33f7d65 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -23,6 +23,7 @@
 #include "kvm.h"
 #include "cpu.h"
 #include "gdbstub.h"
+#include "host-utils.h"
 
 #ifdef KVM_UPSTREAM
 //#define DEBUG_KVM
@@ -408,9 +409,11 @@ static int kvm_put_sregs(CPUState *env)
 {
     struct kvm_sregs sregs;
 
-    memcpy(sregs.interrupt_bitmap,
-           env->interrupt_bitmap,
-           sizeof(sregs.interrupt_bitmap));
+    memset(sregs.interrupt_bitmap, 0, sizeof(sregs.interrupt_bitmap));
+    if (env->interrupt_injected >= 0) {
+        sregs.interrupt_bitmap[env->interrupt_injected / 64] |=
+                (uint64_t)1 << (env->interrupt_injected % 64);
+    }
 
     if ((env->eflags & VM_MASK)) {
         set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
@@ -518,15 +521,22 @@ static int kvm_get_sregs(CPUState *env)
 {
     struct kvm_sregs sregs;
     uint32_t hflags;
-    int ret;
+    int bit, i, ret;
 
     ret = kvm_vcpu_ioctl(env, KVM_GET_SREGS, &sregs);
     if (ret < 0)
         return ret;
 
-    memcpy(env->interrupt_bitmap,
-           sregs.interrupt_bitmap,
-           sizeof(sregs.interrupt_bitmap));
+    /* There can only be one pending IRQ set in the bitmap at a time, so try
+       to find it and save its number instead (-1 for none). */
+    env->interrupt_injected = -1;
+    for (i = 0; i < ARRAY_SIZE(sregs.interrupt_bitmap); i++) {
+        if (sregs.interrupt_bitmap[i]) {
+            bit = ctz64(sregs.interrupt_bitmap[i]);
+            env->interrupt_injected = i * 64 + bit;
+            break;
+        }
+    }
 
     get_seg(&env->segs[R_CS], &sregs.cs);
     get_seg(&env->segs[R_DS], &sregs.ds);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 2b88fea..6bd447f 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -2,7 +2,6 @@
 #include "hw/boards.h"
 #include "hw/pc.h"
 #include "hw/isa.h"
-#include "host-utils.h"
 
 #include "exec-all.h"
 #include "kvm.h"
@@ -321,7 +320,7 @@ static const VMStateInfo vmstate_hack_uint64_as_uint32 = {
 static void cpu_pre_save(void *opaque)
 {
     CPUState *env = opaque;
-int i, 

[PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
  migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)
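
As background for the first bullet, eventfd signalling on Linux amounts to an 8-byte counter that one side increments with write() and the other side drains with read(). The sketch below shows only that generic mechanism from userspace; it is not taken from the vhost code, and the helper names notify and drain are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal the eventfd once; returns 0 on success, -1 on error. */
static int notify(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Consume all pending signals; returns how many were queued, or 0 on error. */
static uint64_t drain(int efd)
{
    uint64_t count = 0;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

Without EFD_SEMAPHORE, a single read() returns the accumulated count and resets it to zero, so several notifications can be coalesced into one wakeup.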

Common virtio-related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear.  I used
Rusty's lguest.c as the source for developing this part: this supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device.  Backend is also configured by userspace, including vlan/mac
etc.

Status: This works for me, and I haven't seen any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.

Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use

Acked-by: Arnd Bergmann <a...@arndb.de>
Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 MAINTAINERS                |    9 +
 arch/x86/kvm/Kconfig       |    1 +
 drivers/Makefile           |    1 +
 drivers/vhost/Kconfig      |   11 +
 drivers/vhost/Makefile     |    2 +
 drivers/vhost/net.c        |  633 +
 drivers/vhost/vhost.c      |  970 
 drivers/vhost/vhost.h      |  158 +++
 include/linux/Kbuild       |    1 +
 include/linux/miscdevice.h |    1 +
 include/linux/vhost.h      |  126 ++
 11 files changed, 1913 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/Kconfig
 create mode 100644 drivers/vhost/Makefile
 create mode 100644 drivers/vhost/net.c
 create mode 100644 drivers/vhost/vhost.c
 create mode 100644 drivers/vhost/vhost.h
 create mode 100644 include/linux/vhost.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 8824115..980a69b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5619,6 +5619,15 @@ S:   Maintained
 F: Documentation/filesystems/vfat.txt
 F: fs/fat/
 
+VIRTIO HOST (VHOST)
+M: Michael S. Tsirkin <m...@redhat.com>
+L: kvm@vger.kernel.org
+L: virtualizat...@lists.osdl.org
+L: net...@vger.kernel.org
+S: Maintained
+F: drivers/vhost/
+F: include/linux/vhost.h
+
 VIA RHINE NETWORK DRIVER
 M: Roger Luethi <r...@hellgate.ch>
 S: Maintained
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b84e571..94f44d9 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -64,6 +64,7 @@ config KVM_AMD
 
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
+source drivers/vhost/Kconfig
 source drivers/lguest/Kconfig
 source drivers/virtio/Kconfig
 
diff --git a/drivers/Makefile b/drivers/Makefile
index 6ee53c7..81e3659 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_HID)   += hid/
 obj-$(CONFIG_PPC_PS3)  += ps3/
 obj-$(CONFIG_OF)   += of/
 obj-$(CONFIG_SSB)  += ssb/
+obj-$(CONFIG_VHOST_NET)    += vhost/
 obj-$(CONFIG_VIRTIO)   += virtio/
 obj-$(CONFIG_VLYNQ)+= vlynq/
 obj-$(CONFIG_STAGING)  += staging/
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
new file mode 100644
index 0000000..d955406
--- /dev/null
+++ b/drivers/vhost/Kconfig
@@ -0,0 +1,11 @@
+config VHOST_NET
+   tristate "Host kernel accelerator for virtio net"
+   depends on NET && EVENTFD
+   ---help---
+ This kernel module can be loaded in host kernel to accelerate
+ guest networking with virtio_net. Not to be confused with virtio_net
+ module itself which needs to be loaded in guest kernel.
+
+ To compile this driver as a module, choose M here: the module will
+ be called vhost_net.
+
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
new file mode 100644
index 0000000..72dd020
--- /dev/null
+++ b/drivers/vhost/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_VHOST_NET) += vhost_net.o
+vhost_net-y := vhost.o net.o
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
new file mode 100644
index 0000000..4af3b98
--- /dev/null
+++ b/drivers/vhost/net.c
@@ -0,0 +1,633 @@
+/* Copyright (C) 2009 Red Hat, Inc.
+ * Author: Michael S. Tsirkin <m...@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * 

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Gregory Haskins
Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 +    struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
 +    unsigned head, out, in, s;
 +    struct msghdr msg = {
 +        .msg_name = NULL,
 +        .msg_namelen = 0,
 +        .msg_control = NULL,
 +        .msg_controllen = 0,
 +        .msg_iov = vq->iov,
 +        .msg_flags = MSG_DONTWAIT,
 +    };
 +    size_t len, total_len = 0;
 +    int err, wmem;
 +    size_t hdr_size;
 +    struct socket *sock = rcu_dereference(vq->private_data);
 +    if (!sock)
 +        return;
 +
 +    wmem = atomic_read(&sock->sk->sk_wmem_alloc);
 +    if (wmem >= sock->sk->sk_sndbuf)
 +        return;
 +
 +    use_mm(net->dev.mm);
 +    mutex_lock(&vq->mutex);
 +    vhost_no_notify(vq);
 +
 
 Using rcu_dereference() and mutex_lock() at the same time seems wrong; I suspect
 that your use of RCU is not correct.
 
 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
    we are not allowed to sleep in such a section.
    (Quoting Documentation/RCU/whatisRCU.txt:
     "It is illegal to block while in an RCU read-side critical section, ...")
 
 2) mutex_lock() can sleep (i.e. block)
 


Michael,
  I warned you that this needed better documentation ;)

Eric,
  I think I flagged this once before, but Michael convinced me that it
was indeed ok, if perhaps a bit unconventional.  I will try to
find the thread.

Kind Regards,
-Greg





Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Gregory Haskins a écrit :
 Gregory Haskins wrote:
 Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 +    struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
 +    unsigned head, out, in, s;
 +    struct msghdr msg = {
 +        .msg_name = NULL,
 +        .msg_namelen = 0,
 +        .msg_control = NULL,
 +        .msg_controllen = 0,
 +        .msg_iov = vq->iov,
 +        .msg_flags = MSG_DONTWAIT,
 +    };
 +    size_t len, total_len = 0;
 +    int err, wmem;
 +    size_t hdr_size;
 +    struct socket *sock = rcu_dereference(vq->private_data);
 +    if (!sock)
 +        return;
 +
 +    wmem = atomic_read(&sock->sk->sk_wmem_alloc);
 +    if (wmem >= sock->sk->sk_sndbuf)
 +        return;
 +
 +    use_mm(net->dev.mm);
 +    mutex_lock(&vq->mutex);
 +    vhost_no_notify(vq);
 +
 Using rcu_dereference() and mutex_lock() at the same time seems wrong; I suspect
 that your use of RCU is not correct.

 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
    we are not allowed to sleep in such a section.
    (Quoting Documentation/RCU/whatisRCU.txt:
     "It is illegal to block while in an RCU read-side critical section, ...")

 2) mutex_lock() can sleep (i.e. block)


 Michael,
   I warned you that this needed better documentation ;)

 Eric,
   I think I flagged this once before, but Michael convinced me that it
 was indeed ok, if perhaps a bit unconventional.  I will try to
 find the thread.

 Kind Regards,
 -Greg

 
 Here it is:
 
 http://lkml.org/lkml/2009/8/12/173
 

Yes, this doesn't convince me at all, and could set a precedent for wrong RCU use.
People wanting to use RCU grep the kernel sources to find out how to use RCU
correctly.

Michael, please use existing locking/barrier mechanisms, and do not pretend to
use RCU.

Some automatic tools might barf later.

For example, we could add a debugging facility to check that rcu_dereference()
is used in an appropriate context, i.e. a check that would conflict with the
existing mutex_lock() debugging facility.




Re: XP blue screen with qemu-kvm-0.11.0

2009-11-03 Thread Ross Boylan
On Sat, 2009-10-31 at 15:21 +0300, Michael Tokarev wrote:
 Ross Boylan wrote:
  My XP VM was working OK, and then started crashing shortly after it
  logged me in.  There were no obvious changes at the time.  I built the
  latest qemu-kvm, but the problem persists.
  
  I am running 32 bit XP on Intel(R) Xeon(R) CPU E5420  @ 2.50GHz (8 cores
  total), Debian GNU/Linux mostly Lenny (amd64), but with some more recent
  stuff.  In particular, the kernel is 2.6.30-8 and I pulled in the
  kernel-headers package to match before building kvm.  However, libc6 and
  libc6-dev are at Lenny's 2.7-18 version.
 
 Libc is basically irrelevant here.  What matters are the host kernel
 and kvm version.
 
  $ ./XP.sh
  ++ sudo vdeq bin/qemu-system-x86_64 -net nic,vlan=1,macaddr=52:54:a0:12:01:00 -net vde,vlan=1,sock=/var/run/vde2/tap0.ctl -boot c -vga std -hda /dev/turtle/XP01 -soundhw es1370 -localtime -m 1G -smp 2
  arg ,vlan=1,sock=/var/run/vde2/tap0.ctl
  TUNGETIFF ioctl() failed: Invalid argument
  TUNSETOFFLOAD ioctl() failed: Bad address
  oss: Could not initialize DAC
  oss: Failed to open `/dev/dsp'
  oss: Reason: Device or resource busy
  oss: Could not initialize DAC
  oss: Failed to open `/dev/dsp'
  oss: Reason: Device or resource busy
  audio: Failed to create voice `es1370.dac2'
  # and more sound-related complaints
 
 Switch to alsa to get your audio working.
I don't see an alsa option for kvm/qemu.  I'm already running alsa, but
under KDE which tends to grab the device.
 
  The VM starts; I see the initial XP screen with the 4 colors; I see the
  background I get when I log in (it logs me in directly without prompt);
  and then (pretty fast) I get a blue screen.  The stop code is 0x8E, and
  the text says to check disk space and BIOS options.
 
 Which bios files does your kvm use?  Are they by any chance
 from some old qemu install?
They appear to be from the latest install, since strace shows various
bios files loading from /usr/local/kvm/share.  The invoking environment
was a little different from the real run, since strace vdeq 
apparently traced vdeq but not kvm calls.  So I just ran the kvm bare.

 
 Does kvm deb from http://www.corpit.ru/debian/tls/kvm/ expose the same
 issue?
Yes.  However, as it fails it left a reverberating sound (fragment of
the Windows login tone).

I tried starting in safe mode.  XP said there was new hardware: the
video (-vga std).  It could not find a driver on the internet(!? the
device was identified as a VGA controller).  Then it told me the driver
had been installed (after I hit finish).  I rebooted in regular mode.
This time there was no sound, but the machine failed again with STOP
0x8E (as before).  The video appeared to be working throughout this,
showing a window that exceeded vanilla VGA resolution.

Ross




Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
On Tue, Nov 03, 2009 at 07:51:35PM +0100, Eric Dumazet wrote:
 Gregory Haskins a écrit :
  Gregory Haskins wrote:
  Eric Dumazet wrote:
  Michael S. Tsirkin a écrit :
  +static void handle_tx(struct vhost_net *net)
  +{
  +    struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
  +    unsigned head, out, in, s;
  +    struct msghdr msg = {
  +        .msg_name = NULL,
  +        .msg_namelen = 0,
  +        .msg_control = NULL,
  +        .msg_controllen = 0,
  +        .msg_iov = vq->iov,
  +        .msg_flags = MSG_DONTWAIT,
  +    };
  +    size_t len, total_len = 0;
  +    int err, wmem;
  +    size_t hdr_size;
  +    struct socket *sock = rcu_dereference(vq->private_data);
  +    if (!sock)
  +        return;
  +
  +    wmem = atomic_read(&sock->sk->sk_wmem_alloc);
  +    if (wmem >= sock->sk->sk_sndbuf)
  +        return;
  +
  +    use_mm(net->dev.mm);
  +    mutex_lock(&vq->mutex);
  +    vhost_no_notify(vq);
  +
  Using rcu_dereference() and mutex_lock() at the same time seems wrong; I suspect
  that your use of RCU is not correct.
 
  1) rcu_dereference() should be done inside an rcu_read_lock() section, and
     we are not allowed to sleep in such a section.
     (Quoting Documentation/RCU/whatisRCU.txt:
      "It is illegal to block while in an RCU read-side critical section, ...")
 
  2) mutex_lock() can sleep (i.e. block)
 
 
  Michael,
I warned you that this needed better documentation ;)
 
  Eric,
    I think I flagged this once before, but Michael convinced me that it
  was indeed ok, if perhaps a bit unconventional.  I will try to
  find the thread.
 
  Kind Regards,
  -Greg
 
  
  Here it is:
  
  http://lkml.org/lkml/2009/8/12/173
  
 
 Yes, this doesn't convince me at all, and could set a precedent for wrong RCU use.
 People wanting to use RCU grep the kernel sources to find out how to use RCU
 correctly.
 
 Michael, please use existing locking/barrier mechanisms, and do not pretend to
 use RCU.
 
 Some automatic tools might barf later.
 
 For example, we could add a debugging facility to check that rcu_dereference()
 is used in an appropriate context, i.e. a check that would conflict with the
 existing mutex_lock() debugging facility.


Paul, you acked this previously. Should I add your Acked-by line so
people calm down?  If you would rather I replace
rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
Or maybe patch Documentation to explain this RCU usage?
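
For readers weighing these options, the ordering guarantee that rcu_assign_pointer() and rcu_dereference() provide for a published pointer can be approximated in userspace with C11 release/acquire atomics. The sketch below is an analogy only: it is not kernel code and not the vhost implementation, the names (struct backend, slot, publish, lookup) are hypothetical, and it deliberately leaves out the other half of RCU, namely deciding when the old object may be freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct backend {
    int sndbuf;
};

/* Slot playing the role of vq->private_data in this analogy. */
static _Atomic(struct backend *) slot;

/* Publish: writes to *b made before this call are visible to any
 * reader that subsequently observes b through lookup(). */
static void publish(struct backend *b)
{
    atomic_store_explicit(&slot, b, memory_order_release);
}

/* Read side: returns NULL if nothing is currently published. */
static struct backend *lookup(void)
{
    return atomic_load_explicit(&slot, memory_order_acquire);
}
```

Explicit barriers would give the same ordering; what neither barriers nor this sketch express is the lifetime rule, which is where the disagreement in this thread lies.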

-- 
MST


Re: libvirt bug #532480

2009-11-03 Thread Vadim Rozenfeld

On 11/03/2009 06:48 PM, Brian Jackson wrote:

On Tuesday 03 November 2009 06:02:42 am roma1390 wrote:
   

Libvirt thinks that bug #532480 must be addressed to the qemu/kvm team.

https://bugzilla.redhat.com/show_bug.cgi?id=532480
 


For future reference, adding some overview to your email instead of making all
the devs, with arguably limited time, go read through a bug report is probably a
good idea.


   

Any ideas how to fix this issue?
 


IIRC, it's being worked on. And yes, it is the responsibility of the developers
of said drivers to do the signing. Keep watching the url from the bug for
updated drivers. Until then, there are workarounds for this issue, also
mentioned at that url.


   

http://sourceforge.net/projects/kvm/files/kvm-driver-disc/20080318/kvm-driver-disc-20080318.iso/download
doesn't contain viostor.sys (virtual block driver) at all.

http://people.redhat.com/~yvugenfi/24.09.2009/viostor.zip drivers are
not signed (there are no cat files inside),

so you need to do it yourself.
Download and install the WDK first, then take a look at the Selfsign_example.cmd
example. It is pretty self-explanatory, I think. You can also find tons of
information about driver signing by searching the Internet.



btw, as a developer I care about passing the WHQL tests, not about signing.

Regards,
Vadim




Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation

2009-11-03 Thread Gerd Hoffmann

On 10/30/09 09:12, Hannes Reinecke wrote:

Gerd Hoffmann wrote:

http://repo.or.cz/w/qemu/kraxel.git?a=shortlog;h=refs/heads/scsi.v1

It is far from being completed, will continue tomorrow.  Should give an
idea of the direction I'm heading in though.  Comments welcome.


Yep, this looks good.


More bits available now at:

http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v3

Please have a look at the new interface, this commit:

http://repo.or.cz/w/qemu/kraxel.git/commitdiff/9c825dac540282dd4d5f5f660ca13af617888037

The workflow I have in mind is:

  ->request_new()

Returns a parsed request, i.e. all fields of req->cmd are filled
already, so the host adapter can easily check whether it is a read or
write and how many bytes will be transferred.


  ->request_run()

Execute the command.  iovec is passed to the dma_bdrv_*() functions, 
which means it should contain guest physical addresses.


When the request is done the complete callback is called.  For commands 
which complete immediately the callback might be called before 
request_run() returns.  Or maybe don't call the callback at all then? 
We can easily indicate using the return value that we are done already.


  ->request_del()

Release request when done.
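
The three-step workflow above can be sketched as a small self-contained C program. Every name and field here is hypothetical and only mirrors the description in this mail; it is not the interface from the linked commit. The sketch takes the second option raised above: when a command completes immediately, request_run() says so through its return value and does not invoke the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* All names below are hypothetical; they only mirror the described workflow. */
struct request;
typedef void (*complete_fn)(struct request *req, int status);

struct request {
    bool is_write;        /* parsed from the command at creation time */
    size_t xfer_len;      /* bytes the command will transfer */
    complete_fn complete; /* called when the command finishes */
    int completions;      /* bookkeeping for this example only */
};

/* Step 1: return a request whose command fields are already parsed. */
static struct request *request_new(bool is_write, size_t xfer_len, complete_fn cb)
{
    struct request *req = calloc(1, sizeof(*req));
    if (!req)
        return NULL;
    req->is_write = is_write;
    req->xfer_len = xfer_len;
    req->complete = cb;
    return req;
}

/* Step 2: run the command. Returns true if it finished immediately,
 * in which case the completion callback is not invoked. */
static bool request_run(struct request *req)
{
    assert(req != NULL);
    if (req->xfer_len == 0)
        return true;          /* nothing to transfer: done already */
    /* A real adapter would start asynchronous I/O here; this sketch
     * completes synchronously through the callback instead. */
    req->complete(req, 0);
    return false;
}

/* Step 3: release the request when done. */
static void request_del(struct request *req)
{
    free(req);
}

/* Example callback: just counts how often it was called. */
static void count_completion(struct request *req, int status)
{
    (void)status;
    req->completions++;
}
```

Signalling immediate completion through the return value keeps the caller from having to cope with a callback that fires before request_run() has returned.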

Questions & comments are welcome.

cheers,
  Gerd


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
On Tue, Nov 03, 2009 at 07:03:55PM +0100, Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
  +static void handle_tx(struct vhost_net *net)
  +{
  +   struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
  +   unsigned head, out, in, s;
  +   struct msghdr msg = {
  +           .msg_name = NULL,
  +           .msg_namelen = 0,
  +           .msg_control = NULL,
  +           .msg_controllen = 0,
  +           .msg_iov = vq->iov,
  +           .msg_flags = MSG_DONTWAIT,
  +   };
  +   size_t len, total_len = 0;
  +   int err, wmem;
  +   size_t hdr_size;
  +   struct socket *sock = rcu_dereference(vq->private_data);
  +   if (!sock)
  +           return;
  +
  +   wmem = atomic_read(&sock->sk->sk_wmem_alloc);
  +   if (wmem >= sock->sk->sk_sndbuf)
  +           return;
  +
  +   use_mm(net->dev.mm);
  +   mutex_lock(&vq->mutex);
  +   vhost_no_notify(vq);
  +
 
 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.
 
 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )
 
 2) mutex_lock() can sleep (ie block)

This use is correct. See the comment in vhost.h. This use of RCU has been
acked by Paul E. McKenney (paul...@linux.vnet.ibm.com) as well.
There are many ways to use RCU, not all of which involve rcu_read_lock().



Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Paul E. McKenney
On Tue, Nov 03, 2009 at 01:14:06PM -0500, Gregory Haskins wrote:
 Gregory Haskins wrote:
  Eric Dumazet wrote:
  Michael S. Tsirkin a écrit :
  +static void handle_tx(struct vhost_net *net)
  +{
  + struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_TX];
  + unsigned head, out, in, s;
  + struct msghdr msg = {
  +         .msg_name = NULL,
  +         .msg_namelen = 0,
  +         .msg_control = NULL,
  +         .msg_controllen = 0,
  +         .msg_iov = vq->iov,
  +         .msg_flags = MSG_DONTWAIT,
  + };
  + size_t len, total_len = 0;
  + int err, wmem;
  + size_t hdr_size;
  + struct socket *sock = rcu_dereference(vq->private_data);
  + if (!sock)
  +         return;
  +
  + wmem = atomic_read(&sock->sk->sk_wmem_alloc);
  + if (wmem >= sock->sk->sk_sndbuf)
  +         return;
  +
  + use_mm(net->dev.mm);
  + mutex_lock(&vq->mutex);
  + vhost_no_notify(vq);
  +
  using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
  suspect
  that your use of RCU is not correct.
 
  1) rcu_dereference() should be done inside an rcu_read_lock() section, and
 we are not allowed to sleep in such a section.
 (Quoting Documentation/RCU/whatisRCU.txt :
   It is illegal to block while in an RCU read-side critical section, )
 
  2) mutex_lock() can sleep (ie block)
 
  
  
  Michael,
I warned you that this needed better documentation ;)
  
  Eric,
I think I flagged this once before, but Michael convinced me that it
  was indeed ok, if perhaps a bit unconventional.  I will try to
  find the thread.
  
  Kind Regards,
  -Greg
  
 
 Here it is:
 
 http://lkml.org/lkml/2009/8/12/173

What was happening in that case was that the rcu_dereference()
was being used in a workqueue item.  The role of rcu_read_lock()
was taken on by the start of execution of the workqueue item, of
rcu_read_unlock() by the end of execution of the workqueue item, and
of synchronize_rcu() by flush_workqueue().  This does work, at least
assuming that flush_workqueue() operates as advertised, which it appears
to at first glance.

The above code looks somewhat different, however -- I don't see
handle_tx() being executed in the context of a work queue.  Instead
it appears to be in an interrupt handler.

So what is the story?  Using synchronize_irq() or some such?

Thanx, Paul


Re: xp guest, blue screen c0000221 on boot

2009-11-03 Thread Andrew Olney

Thanks for the suggestion. I've successfully installed on a new qcow2 image.

Strangely all of my xp raw images no longer work. That makes me think 
that this is more than just a corrupt image.


I've been using kvm for several years with raw images, so I don't think 
the BIOS could be a factor either.


If there's a way to get my raw images to work, that would be nice. I'm 
trying to convert one of the raw images to qcow2 -- perhaps that will work.


Avi Kivity wrote:

On 10/28/2009 05:27 PM, Andrew Olney wrote:
Thanks. In pursuing this suggestion I discovered that I also can't 
make new XP VMs. Setup fails with "the disk may be damaged".


The image was created with

qemu-img create xp_new.img 13G

And the setup command is

kvm -cdrom xp/xp_pro.iso -hda xp_new.img -boot d -m 512 -no-acpi -usb 
-usbdevice tablet


What happens if you use a new qcow2 disk?  Does it fail in the same way?




Re: [Linux-fbdev-devel] [Qemu-devel] Re: [PATCH] Add VirtIO Frame Buffer Support

2009-11-03 Thread Paolo Bonzini

On 11/03/2009 05:05 PM, Avi Kivity wrote:

On 11/03/2009 05:29 PM, Ondrej Zajicek wrote:

On Tue, Nov 03, 2009 at 11:38:18AM +0200, Avi Kivity wrote:

On 11/03/2009 01:25 PM, Vincent Hanquez wrote:

not sure if i'm missing the point here, but couldn't it be
hypothetically
extended to stuff 3d (or video more 2d accel ?) commands too ? I can't
imagine the cirrus or stdvga driver be able to do that ever ;)


cirrus has pretty good 2d acceleration. 3D is a mega-project though.

Cirrus has no blending/compositing hardware support.
Paravirtualized graphics can easily support full XRender-style
2D acceleration.


What does that entail? 3/4 operand raster ops?


With alpha compositing.

Paolo



Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Gregory Haskins
Eric Dumazet wrote:
 Gregory Haskins a écrit :
 Gregory Haskins wrote:
 Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :

 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.

 1) rcu_dereference() should be done inside an rcu_read_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )

 2) mutex_lock() can sleep (ie block)

 Michael,
   I warned you that this needed better documentation ;)

 Eric,
   I think I flagged this once before, but Michael convinced me that it
 was indeed ok, if perhaps a bit unconventional.  I will try to
 find the thread.

 Kind Regards,
 -Greg

 Here it is:

 http://lkml.org/lkml/2009/8/12/173

 
 Yes, this doesn't convince me at all, and could be a precedent for a wrong RCU 
 use.
 People wanting to use RCU do a grep on kernel sources to find how to correctly
 use RCU.
 
 Michael, please use existing locking/barrier mechanisms, and not pretend to 
 use RCU.

Yes, I would tend to agree with you.  In fact, I think I suggested that
a normal barrier should be used instead of abusing rcu_dereference().

But as far as his code is concerned, I think it technically works
properly, and that was my main point.  Also note that the use of
rcu_dereference()+mutex_lock() is not necessarily broken, per se:  it
could be an srcu-based critical section created by the caller, for
instance.  It would be perfectly legal to sleep on the mutex if that
were the case.

To me, the bigger issue is that the rcu_dereference() without any
apparent hint of a corresponding RSCS is simply confusing as a reviewer.
 smp_rmb() (or whatever is proper in this case) is probably more
appropriate.

Kind Regards,
-Greg






Re: 2.6.31.4 panic: CRED: put_cred_rcu() sees ffff880204e58c00 with usage 82150912

2009-11-03 Thread Nikola Ciprich
Hello Marcelo,
well, my report might have been a bit misleading, 2.6.31.2 has been running 
there for almost 3 weeks.
So the problem didn't occur JUST after the upgrade. But it never occurred before 
it, while
we were using older kernels (various 2.6.30.x, 2.6.29.x and older).
I've updated the machine to 2.6.31.5, and it's been running without problems since 
then,
but it's been just a few days, so we'll see.
I'll report if the problem appears again.
Thanks.
regards
nik

On Tue, Nov 03, 2009 at 12:46:38PM -0200, Marcelo Tosatti wrote:
 On Fri, Oct 30, 2009 at 12:15:34PM +0100, Nikola Ciprich wrote:
  Ouch, typo in subject, it's 2.6.31.1 of course. sorry about that.
  also CCing kvm.
  n.
  
  On Fri, Oct 30, 2009 at 12:06:32PM +0100, Nikola Ciprich wrote:
   Hi,
   some time ago, I updated my KVM hosting machine to 2.6.31.1 and it just 
   died horribly:
 
 Nikola,
 
 Upgraded from what? Did you experience the crash again?
 
   Oct 30 10:45:17 vbox [706369.133516] Kernel panic - not syncing: CRED: 
   put_cred_rcu() sees ffff880204e58c00 with usage 82150912
   Oct 30 10:45:17 vbox [706369.133519]
   Oct 30 10:45:17 vbox [706369.144990] Pid: 19, comm: ksoftirqd/5 Not 
   tainted 2.6.31lb.02 #1
   Oct 30 10:45:17 vbox [706369.151554] Call Trace:
   Oct 30 10:45:17 vbox [706369.154332]  IRQ
   Oct 30 10:45:17 vbox [8104c1fa] panic+0xaa/0x180
   Oct 30 10:45:17 vbox [706369.160280]  [81322b90] ? 
   _spin_unlock+0x30/0x60
   Oct 30 10:45:17 vbox [706369.166256]  [810f5671] ? 
   add_partial+0x21/0x90
   Oct 30 10:45:17 vbox [706369.172155]  [810f6a92] ? 
   __slab_free+0x92/0x3c0
   Oct 30 10:45:17 vbox [706369.178127]  [81102317] ? 
   file_free_rcu+0x37/0x50
   Oct 30 10:45:17 vbox [706369.184198]  [8106c655] 
   put_cred_rcu+0x75/0x80
   Oct 30 10:45:17 vbox [706369.190008]  [810a2525] 
   __rcu_process_callbacks+0x125/0x250
   Oct 30 10:45:17 vbox [706369.197020]  [810a2689] 
   rcu_process_callbacks+0x39/0x60
   Oct 30 10:45:17 vbox [706369.203624]  [81052a61] 
   __do_softirq+0xc1/0x250
   Oct 30 10:45:17 vbox [706369.209506]  [81053860] ? 
   ksoftirqd+0x0/0x1a0
   Oct 30 10:45:17 vbox [706369.215182]  [8100c4dc] 
   call_softirq+0x1c/0x30
   Oct 30 10:45:17 vbox [706369.220986]  [8100e46d] 
   do_softirq+0x3d/0x80
   Oct 30 10:45:17 vbox [706369.227317]  [81053860] ? 
   ksoftirqd+0x0/0x1a0
   Oct 30 10:45:17 vbox [706369.233020]  [810538e4] 
   ksoftirqd+0x84/0x1a0
   Oct 30 10:45:17 vbox [706369.238622]  [81066686] 
   kthread+0xa6/0xb0
   Oct 30 10:45:17 vbox [706369.243956]  [8100c3da] 
   child_rip+0xa/0x20
   Oct 30 10:45:17 vbox [706369.249390]  [810665e0] ? 
   kthread+0x0/0xb0
   Oct 30 10:45:17 vbox [706369.254806]  [8100c3d0] ? 
   child_rip+0x0/0x20
   Oct 30 10:45:17 vbox [706369.260454] Rebooting in 10 seconds..
   (trace is obtained from netconsole, so hopefully it's not mangled).
   The machine was running ~30 KVM guests, it's 8CPU 16GB x86_64, when it 
   crashed, it was only
   moderately loaded. Never had this (or any other) kind of crash on it 
   before.
   I know there's 2.6.31.5 already out, but I'm not sure if some related 
   problem has been
   reported/fixed and I'm obviously not able to quickly test/reproduce it 
   with latest kernel,
   so I'm rather reporting.
   Should more information/testing/etc be required, I'll be glad to help
   regards
   nik
 

-- 
-
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-




Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Michael S. Tsirkin a écrit :
 
 Paul, you acked this previously. Should I add your acked-by line so
 people calm down?  If you would rather I replace
 rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
 Or maybe patch Documentation to explain this RCU usage?
 

So you believe I am over-reacting to this dubious use of RCU ?

RCU documentation is already very complex; we don't need to add yet another
subtle use and make it less readable.

It seems you use 'RCU api' in drivers/vhost/net.c as convenient macros :

#define rcu_dereference(p)     ({ \
                                typeof(p) _p1 = ACCESS_ONCE(p); \
                                smp_read_barrier_depends(); \
                                (_p1); \
                                })

#define rcu_assign_pointer(p, v) \
        ({ \
                if (!__builtin_constant_p(v) || \
                    ((v) != NULL)) \
                        smp_wmb(); \
                (p) = (v); \
        })


There are plenty of regular uses of smp_wmb() in the kernel, not related to Read Copy 
Update,
and there is nothing wrong with using barriers with appropriate comments.

(And you already use mb(), wmb(), rmb(), smp_wmb() in your patch)
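
For readers less familiar with the pattern under discussion, here is a
rough userspace analogue of "publish a pointer with a write barrier,
read it with a matching barrier", written with C11 atomics rather than
the kernel macros quoted above. The names struct backend,
current_backend, publish_backend() and read_backend_fd() are made up
for this sketch. It only shows the ordering half of the problem; it
says nothing about when an old object may be freed, which is the part
the flush/synchronize discussion is about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload; stands in for whatever the shared pointer refers to. */
struct backend {
    int fd;
};

/* Shared slot, playing the role of vq->private_data in the discussion. */
static _Atomic(struct backend *) current_backend;

/* Writer: initialise the object fully, then publish it. The release store
 * keeps the initialising write ordered before the pointer becomes visible. */
void publish_backend(struct backend *b, int fd)
{
    assert(b != NULL);
    b->fd = fd;
    atomic_store_explicit(&current_backend, b, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so a reader
 * that sees the pointer also sees the initialised fd. Returns -1 when
 * nothing has been published yet. */
int read_backend_fd(void)
{
    struct backend *b =
        atomic_load_explicit(&current_backend, memory_order_acquire);
    return b ? b->fd : -1;
}
```

Whether this is spelled with explicit barriers plus comments or with
the RCU accessors, the ordering requirement is the same; the argument
in this thread is about which spelling misleads reviewers less.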


BTW there is at least one locking bug in vhost_net_set_features()

Apparently, mutex_unlock() doesn't trigger a fault if the mutex is not locked
by current thread... even with DEBUG_MUTEXES / DEBUG_LOCK_ALLOC


static void vhost_net_set_features(struct vhost_net *n, u64 features)
{
   size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ?
           sizeof(struct virtio_net_hdr) : 0;
   int i;
!  mutex_unlock(&n->dev.mutex);
   n->dev.acked_features = features;
   smp_wmb();
   for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
           mutex_lock(&n->vqs[i].mutex);
           n->vqs[i].hdr_size = hdr_size;
           mutex_unlock(&n->vqs[i].mutex);
   }
   mutex_unlock(&n->dev.mutex);
   vhost_net_flush(n);
}
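
The line marked with "!" releases a mutex that was never taken. A toy
single-threaded model of the presumably intended pairing (take the
device mutex first, release it last) is sketched below. The toy_*
types and the bad_unlocks counter are invented for illustration and
are not the kernel mutex API; the counter simply makes an unbalanced
unlock visible, which is what the debug options above did not do here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NVQS 2  /* stand-in for VHOST_NET_VQ_MAX */

/* Toy lock that records misuse instead of blocking; not a real mutex. */
struct toy_mutex {
    bool held;
    int bad_unlocks;  /* number of unlocks seen while nobody held the lock */
};

void toy_lock(struct toy_mutex *m)
{
    assert(!m->held);  /* single-threaded model: no nested locking */
    m->held = true;
}

void toy_unlock(struct toy_mutex *m)
{
    if (!m->held)
        m->bad_unlocks++;  /* the situation created by the line marked "!" */
    m->held = false;
}

struct toy_vq {
    struct toy_mutex mutex;
    size_t hdr_size;
};

struct toy_net {
    struct toy_mutex mutex;  /* plays the role of n->dev.mutex */
    unsigned long long acked_features;
    struct toy_vq vqs[NVQS];
};

/* Same shape as the quoted function, with the first call turned into a lock
 * so that every unlock has a matching lock. */
void toy_set_features(struct toy_net *n, unsigned long long features,
                      size_t hdr_size)
{
    toy_lock(&n->mutex);
    n->acked_features = features;
    for (int i = 0; i < NVQS; ++i) {
        toy_lock(&n->vqs[i].mutex);
        n->vqs[i].hdr_size = hdr_size;
        toy_unlock(&n->vqs[i].mutex);
    }
    toy_unlock(&n->mutex);
}
```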


Re: xp guest, blue screen c0000221 on boot

2009-11-03 Thread Andrew Olney

OK, if I convert the raw image to qcow2, it boots fine.

Why should a raw image give a blue screen?

Avi Kivity wrote:

On 10/28/2009 05:27 PM, Andrew Olney wrote:
Thanks. In pursuing this suggestion I discovered that I also can't 
make new XP VMs. Setup fails with "the disk may be damaged".


The image was created with

qemu-img create xp_new.img 13G

And the setup command is

kvm -cdrom xp/xp_pro.iso -hda xp_new.img -boot d -m 512 -no-acpi -usb 
-usbdevice tablet


What happens if you use a new qcow2 disk?  Does it fail in the same way?




Re: vga_arb warning [was: mmotm 2009-11-01-10-01 uploaded]

2009-11-03 Thread Andrew Morton

Please cc me on mmotm bug reports!

On Sun, 01 Nov 2009 22:47:05 +0100 Jiri Slaby jirisl...@gmail.com wrote:

 On 11/01/2009 07:07 PM, a...@linux-foundation.org wrote:
  The mm-of-the-moment snapshot 2009-11-01-10-01 has been uploaded to
 
 Hi, I got the following warning while booting an image in qemu-kvm:
 
 WARNING: at fs/attr.c:158 notify_change+0x2da/0x310()
 Hardware name:
 Modules linked in:
 Pid: 1, comm: swapper Not tainted 2.6.32-rc5-mm1_64 #862
 Call Trace:
  [81038008] warn_slowpath_common+0x78/0xb0
  [8103804f] warn_slowpath_null+0xf/0x20
  [810d32ba] notify_change+0x2da/0x310
  [810c5b88] ? fsnotify_create+0x48/0x60
  [810c6d2b] ? vfs_mknod+0xbb/0xe0
  [812487b6] devtmpfs_create_node+0x1e6/0x270
  [811170d0] ? sysfs_addrm_finish+0x20/0x280
  [811175d6] ? __sysfs_add_one+0x26/0xf0
  [81117b6c] ? sysfs_do_create_link+0xcc/0x160
  [81241cf0] device_add+0x1e0/0x5b0
  [8124adb1] ? pm_runtime_init+0xa1/0xb0
  [81248f05] ? device_pm_init+0x65/0x70
  [812420d9] device_register+0x19/0x20
  [81242290] device_create_vargs+0xf0/0x120
  [812422ec] device_create+0x2c/0x30
  [810c0516] ? __register_chrdev+0x86/0xf0
  [81245599] ? __class_create+0x69/0xa0
  [814326e9] ? mutex_lock+0x19/0x50
  [811d4e23] misc_register+0x93/0x170
  [818994a0] ? vga_arb_device_init+0x0/0x77
  [818994b3] vga_arb_device_init+0x13/0x77
  [818994a0] ? vga_arb_device_init+0x0/0x77
  [810001e7] do_one_initcall+0x37/0x190
  [8187d6ce] kernel_init+0x172/0x1c8
  [81003c7a] child_rip+0xa/0x20
  [8187d55c] ? kernel_init+0x0/0x1c8
  [81003c70] ? child_rip+0x0/0x20

There's a -mm-only debug patch:

http://userweb.kernel.org/~akpm/mmotm/broken-out/notify_change-callers-must-hold-i_mutex.patch

--- a/fs/attr.c~notify_change-callers-must-hold-i_mutex
+++ a/fs/attr.c
@@ -155,6 +155,8 @@ int notify_change(struct dentry * dentry
return -EPERM;
}
 
+   WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex));
+
now = current_fs_time(inode->i_sb);
 
attr->ia_ctime = now;
_

I forget why it was added.


It looks like it's blaming the open-coded notify_change() call in
devtmpfs_create_node().



Re: [Autotest] [PATCH] [RFC] KVM test: Major control file cleanup

2009-11-03 Thread Yolkfull Chow
On Wed, Oct 28, 2009 at 02:04:59PM -0400, Michael Goldish wrote:
 
 - Lucas Meneghel Rodrigues l...@redhat.com wrote:
 
  On Wed, Oct 28, 2009 at 1:43 PM, Michael Goldish mgold...@redhat.com
  wrote:
   Sounds great, except it won't allow you to debug your configuration
   using kvm_config.py.  So the question now is what's more important
  --
   the ability to debug or ease of use when running from the server.
  
  Here we have 2 use cases:
  
  1) Users of the web interface, that (hopefully) have canned test sets
  that work reliably. Ability to debug stuff is less important on this
  scenario.
  2) People developing tests, and in this case ability to debug config
  is very important
  
  I see the following options:
  
  1) Document as part of the test development guide that, in order to
  be able to debug stuff, all the test sets are to be written to
  the config file, which can then be parsed using kvm_config.
  2) If we write all dictionaries generated by that particular
  configuration on files inside the job results directory, we still
  have
  debug ability for all use cases (I am starting to like this idea very
  much, as I type).
  
  So I'd then implement option 2) and refactor the control file with
  the
  test sets defined inside strings in the control file, then you can
  see
  how it looks? How about that?
 
 Sounds fine.
 - Where exactly will the test list appear?
 - We should also allow printing of verbose debug output (parsing variants
 block, 9000 dicts in current context...) by passing something to the
 constructor of the config object.
 - We should make it clear to the user that he/she must rename the control
 file (to control.lucas for example) or else it may be overwritten on the
 next git-fetch or -pull.
 
 I'm still not sure it's a great idea to make config debugging harder, so
 if anyone other than Lucas who uses the KVM test is reading this, please
 let us know if you ever use kvm_config.py and if you think the ability to
 print the list of test dicts is important.

Hi Michael,

I have often used kvm_config.py to print lists of selected test dicts,
and I think it's necessary to keep this feature. IMHO, option 2) that Lucas
proposed is a good idea. What do you think? Hope I haven't missed
something. :)

 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest


Re: xp guest, blue screen c0000221 on boot

2009-11-03 Thread Avi Kivity

On 11/04/2009 05:45 AM, Andrew Olney wrote:

OK, if I convert the raw image to qcow2, it boots fine.

Why should a raw image give a blue screen?


Sounds like a serious bug.  What is the host filesystem?  Did you 
upgrade the host kernel or qemu?


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: kvm problem: bonding network interface breaks dhcp

2009-11-03 Thread Harald Dunkel
Hi Matt,

Matthew Palmer wrote:
 
 The output of brctl show, ip addr list, and cat /proc/net/bonding/bond*
 might be helpful.
 

Sure. Using the bridge on the bonding interface (while the
guest was running) I got:


# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.001517ab0a59       no              bond0
                                                        vnet0
# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth2: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:30:48:c6:e0:98 brd ff:ff:ff:ff:ff:ff
4: eth3: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
5: _rename: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:30:48:c6:e0:99 brd ff:ff:ff:ff:ff:ff
6: eth4: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
7: eth5: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::215:17ff:feab:a59/64 scope link
       valid_lft forever preferred_lft forever
51: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
    inet 172.19.96.25/23 brd 172.19.97.255 scope global br0
    inet6 fe80::215:17ff:feab:a59/64 scope link
       valid_lft forever preferred_lft forever
52: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether c6:d7:7b:fb:02:35 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::c4d7:7bff:fefb:235/64 scope link
       valid_lft forever preferred_lft forever
# cat /proc/net/bonding/bond*
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:ab:0a:59

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:ab:0a:58

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:ab:0a:5b

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:ab:0a:5a



Without bonding I got

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.001517ab0a59       no              eth2
                                                        vnet0
# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth2: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::215:17ff:feab:a59/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:30:48:c6:e0:98 brd ff:ff:ff:ff:ff:ff
4: eth3: <BROADCAST,MULTICAST,PROMISC> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:15:17:ab:0a:58 brd ff:ff:ff:ff:ff:ff
5: _rename: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:30:48:c6:e0:99 brd ff:ff:ff:ff:ff:ff
6: eth4: <BROADCAST,MULTICAST,PROMISC> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:15:17:ab:0a:5b brd ff:ff:ff:ff:ff:ff
7: eth5: <BROADCAST,MULTICAST,PROMISC> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:15:17:ab:0a:5a brd ff:ff:ff:ff:ff:ff
8: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noqueue state DOWN
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
53: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:15:17:ab:0a:59 brd ff:ff:ff:ff:ff:ff
    inet 172.19.96.25/23 brd 172.19.97.255 scope global br0
    inet6 fe80::215:17ff:feab:a59/64 scope link
       valid_lft forever preferred_lft forever
54: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:2f:ce:cc:ec:ac brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc2f:ceff:fecc:ecac/64 scope link
   valid_lft