Re: [PATCH] kvm-autotest: add object addressing in sample cfg

2009-04-03 Thread Michael Goldish

- Ryan Harper ry...@us.ibm.com wrote:

 The wiki documents[1] object addressing quite well, but we should
 include it in the example config file as well.
 
 1. http://www.linux-kvm.org/page/KVM-Autotest/Parameters#Addressing_objects_.28VMs.2C_images.2C_NICs_etc.29
 
 
 -- 
 Ryan Harper
 Software Engineer; Linux Technology Center
 IBM Corp., Austin, Tx
 ry...@us.ibm.com
 
 
 diffstat output:
  kvm_tests.cfg.sample |    4 ++++
  1 files changed, 4 insertions(+)
 
 Signed-off-by: Ryan Harper ry...@us.ibm.com
 ---
 diff --git a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
 index 5619fa8..64f8e4b 100644
 --- a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
 +++ b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
 @@ -19,6 +19,10 @@ image_size = 10G
  ssh_port = 22
  display = vnc
  
 +# specify specific values for vm1 and nic1
 +mem_vm1 = 256
 +nic_model_nic1 = rtl8139
 +
  # Port redirections
  redirs = ssh
  guest_port_ssh = 22

This may not be a good idea, because we'll end up using only rtl8139.
Further down in the file we define virtio and e1000 variants. The e1000
one, for example, specifies 'nic_model = e1000'. So you'll get a dict that
contains:

nic_model_nic1 = rtl8139
nic_model = e1000

and the second statement will have no effect on nic1, because
object-specific statements take precedence over general ones, regardless of
order (as mentioned in the wiki).
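
To make the precedence rule concrete, here is a minimal Python sketch. It is
not the kvm-autotest implementation: the helper name get_object_params and
the general 'mem = 512' default are assumptions made purely for illustration.

```python
def get_object_params(params, obj_name):
    """Return the parameters as seen by one object (VM, image, NIC, ...).

    A key of the form '<key>_<obj_name>' overrides the general '<key>',
    regardless of the order in which the two were defined.
    """
    suffix = "_" + obj_name
    view = dict(params)
    for key, value in params.items():
        if key.endswith(suffix):
            view[key[:-len(suffix)]] = value
    return view


params = {
    "nic_model_nic1": "rtl8139",  # object-specific, as in the patch
    "nic_model": "e1000",         # general, set later by the e1000 variant
    "mem_vm1": "256",             # object-specific, as in the patch
    "mem": "512",                 # hypothetical general default
}

print(get_object_params(params, "nic1")["nic_model"])  # object-specific wins
print(get_object_params(params, "nic2")["nic_model"])  # falls back to general
```

In this model nic1 keeps rtl8139 even though the e1000 variant assigns
nic_model afterwards, which is the problem described above.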

Also, we'll end up always using mem = 256 (isn't that too little for some
guests?).

Soon we'll try to implement parsing of statements like
'nic_model.* ?= e1000', which will apply to any key that matches the regex
'nic_model.*'. This will make things a little easier.
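
A rough sketch of what such a regex statement could do, assuming it
overwrites every existing key that matches the regex. The function name
apply_regex_assignment is hypothetical, and the real parser's syntax and
semantics may well differ.

```python
import re


def apply_regex_assignment(params, key_regex, value):
    """Assign value to every existing key that fully matches key_regex."""
    pattern = re.compile(key_regex)
    for key in list(params):
        if pattern.fullmatch(key):
            params[key] = value
    return params


params = {
    "nic_model": "rtl8139",
    "nic_model_nic1": "rtl8139",
    "mem": "256",
}

# Models the statement 'nic_model.* ?= e1000'
apply_regex_assignment(params, r"nic_model.*", "e1000")
print(params)
```

Here both nic_model and nic_model_nic1 end up as e1000, while mem is left
alone, so a variant could override object-specific values as well.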

On the other hand, nic_model represents the default value for all VMs that
don't have their own values. It makes sense to work mainly with this
parameter, and give specific values only to VMs whose values we don't want
to change. For example, when we implement a load test that brings up
numerous VMs in the background, we may choose to always give them their own
specific nic_model or mem or anything, as well as their own specific guest
OS which excels at producing load, and leave our main_vm with the main OS
we're testing (which depends on the current variant).

Thanks,
Michael
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


How to Use LibVirt to manage KVM virtual machine

2009-04-03 Thread Zheng, Shaohui
Hi, guys
I am trying to manage my KVM VM with libvirt, but I am running into
trouble. If you have experience with this, could you take a look at my
issue? Before sending this email I searched libvirt.org and Google, but
found nothing useful.

If you have a step-by-step document, or know the URL of one, could you
forward it to me? I would very much appreciate your help.

I am working in RHEL5u3, with libvirt RPM packages installed. 
# rpm -qa |grep libvirt
libvirt-python-0.3.3-14.el5
libvirt-0.3.3-14.el5
libvirt-cim-0.5.1-4.el5
libvirt-devel-0.3.3-14.el5
libvirt-devel-0.3.3-14.el5
libvirt-0.3.3-14.el5

I built the host kernel with upstream kvm (2.6.29); the kvm modules are
also built.
I can successfully boot my Linux guest with the following command:
qemu-system-x86_64 -m 256 -smp 2 -no-acpi \
  -net nic,macaddr=00:16:3e:11:1d:c5,model=rtl8139 \
  -net tap,script=/etc/kvm/qemu-ifup \
  -hda /share/xvs/var/tmp-img_gbp23_1238745859_1

I can run virsh with the following command:
virsh --connect  qemu:///system
But when I run the 'list' command at the virsh prompt, it cannot find my
VM, which is very strange.
virsh # list
 Id Name                 State
----------------------------------

I tried to boot the guest from an XML file with the following content:
[r...@vt-mv1 libvirt]# cat /share/xvs/var/kvm.conf 
<domain type='kvm'>
  <name>demo2</name>
  <uuid>4dea24b3-1d52-d8f3-2516-782e98a23fa0</uuid>
  <memory>131072</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch="i686">hvm</type>
  </os>
  <clock sync="localtime"/>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/share/ia32p_rhel5u1.img'/>
      <target dev='hda'/>
    </disk>
    <interface type='network'>
      <source network='default'/>
      <mac address='24:42:53:21:52:45'/>
    </interface>
    <graphics type='vnc' port='-1' />
  </devices>
</domain>
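
One quick sanity check on a domain definition like this is to parse it and
confirm what it actually describes. The sketch below uses only Python's
standard library (not the libvirt bindings) on a trimmed copy of the
definition; summarize_domain is a hypothetical helper name. It relies on
libvirt's <memory> element being expressed in KiB by default.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the domain definition, for illustration only
DOMAIN_XML = """
<domain type='kvm'>
  <name>demo2</name>
  <memory>131072</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch="i686">hvm</type>
  </os>
</domain>
"""


def summarize_domain(xml_text):
    """Parse a libvirt domain definition and return a few key fields."""
    root = ET.fromstring(xml_text.strip())
    return {
        "type": root.get("type"),
        "name": root.findtext("name"),
        "memory_mib": int(root.findtext("memory")) // 1024,  # KiB -> MiB
        "vcpus": int(root.findtext("vcpu")),
        "arch": root.find("os/type").get("arch"),
    }


print(summarize_domain(DOMAIN_XML))
```

Parsed this way, the definition asks for 128 MiB and one vcpu, whereas the
working qemu command line above uses -m 256 and -smp 2.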

I ran the commands like this:
virsh # define /share/xvs/var/kvm.conf 
Domain demo2 defined from /share/xvs/var/kvm.conf
virsh # start demo2
error: Failed to start domain demo2

I checked the qemu log /var/log/libvirt/qemu/demo2.log; the file is empty.

I know that I am missing some steps, but I cannot find them. The
documentation on libvirt.org is very rough; I did not find any docs about
KVM on that website. I look forward to your help. Thanks.

-
Best Regards
Shaohui Zheng



Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-03 Thread Izik Eidus

Gerd Hoffmann wrote:

Izik Eidus wrote:
  

The main problem that ksm will face when removing the fd interface is:
right now when you register memory into ksm, you open fd, and then ksm
do get_task_mm(), we will do mmput when the file will be closed



Did you test whether it really cleans up in case you kill -9 qemu?

I recently did something similar with the result that the extra
reference held on mm_struct prevented the process memory from being
zapped ...

cheers,
  Gerd
  

Did you use mmput() after you called get_task_mm()?
get_task_mm() does nothing besides atomic_inc(&mm->mm_users);

and mmput() does nothing besides decrementing this counter and checking
whether any references to it remain.


Am I missing anything?


Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Gerd Hoffmann
Avi Kivity wrote:
 There is no choice.  Exiting from the guest to the kernel to userspace
 is prohibitively expensive, you can't do that on every packet.

I didn't look at virtio-net very closely yet.  I wonder why the
notification is that big an issue though.  It is easy to keep the number
of notifications low without increasing latency:

Check shared ring status when stuffing a request.  If there are requests
not (yet) consumed by the other end there is no need to send a
notification.  That scheme can even span multiple rings (nics with rx
and tx for example).

Host backend can put a limit on the number of requests it takes out of
the queue at once.  i.e. block backend can take out some requests, throw
them at the block layer, check whether any request in flight is done,
if so send back replies, start over again.  guest can put more requests
into the queue meanwhile without having to notify the host.  I've seen
the number of notifications going down to zero when running disk
benchmarks in the guest ;)

Of course that works best with one or more I/O threads, so the vcpu
doesn't have to stop running anyway to get the I/O work done ...
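
The scheme above can be sketched as a toy Python model. This is only an
illustration under simplifying assumptions (a single ring, no concurrency);
the Ring class and its method names are hypothetical and do not correspond
to any virtio or Xen code.

```python
from collections import deque


class Ring:
    """Toy shared ring that counts producer-to-consumer notifications."""

    def __init__(self):
        self.pending = deque()
        self.notifications = 0

    def push(self, request):
        # Notify only if the other end has no unconsumed requests left;
        # otherwise it will find the new request on its own.
        need_notify = not self.pending
        self.pending.append(request)
        if need_notify:
            self.notifications += 1
        return need_notify

    def consume(self, limit):
        # The backend takes at most `limit` requests out of the queue at once.
        batch = []
        while self.pending and len(batch) < limit:
            batch.append(self.pending.popleft())
        return batch


# Busy guest, batching backend: one notification covers many requests.
batched = Ring()
for i in range(8):
    batched.push(i)
batched.consume(4)
batched.push(8)
print("batched:", batched.notifications)

# Backend that drains the ring after every single request.
eager = Ring()
for i in range(8):
    eager.push(i)
    eager.consume(1)
print("eager:", eager.notifications)
```

In the first run the producer notifies once; in the second, where the ring
is empty again before each push, every request costs a notification.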

cheers,
  Gerd


Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Avi Kivity

Gerd Hoffmann wrote:

Avi Kivity wrote:
  

There is no choice.  Exiting from the guest to the kernel to userspace
is prohibitively expensive, you can't do that on every packet.



I didn't look at virtio-net very closely yet.  I wonder why the
notification is that big an issue though.  It is easy to keep the number
of notifications low without increasing latency:

Check shared ring status when stuffing a request.  If there are requests
not (yet) consumed by the other end there is no need to send a
notification.  That scheme can even span multiple rings (nics with rx
and tx for example).
  


If the host is able to consume a request immediately, and the guest is 
not able to batch requests, this breaks down.  And that is the current 
situation.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Commit 3d28613c225ba94062950dacbb2304b2d2024abc breaks linux boot

2009-04-03 Thread Avi Kivity

Sheng Yang wrote:

tip is still broken for me, did a fix go in for this?



Yes. The fix has already been picked up by Avi; please wait a while for the push.
  


Currently my queue is broken due to some qemu display regression.  You 
can find my queue in the 'pending' branch on kernel.org.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Qemu process in Guest

2009-04-03 Thread Avi Kivity

Kumar, Venkat wrote:


Thanks for the reply.

I had the wrong understanding that Qemu runs in the guest.

But now I understand that *ioctl(fd, KVM_RUN, 0);* will tell KVM to
load the guest and whenever there is an exception in the guest, KVM
traps it and executes the host code post ioctl depending on the reason
for exit.

Can you point me to the code where KVM traps the exception and
loads the host to execute the post-ioctl code?




That's what vmx.c and svm.c in the kernel are about, look at 
vmx_vcpu_run() and svm_vcpu_run().
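
The control flow described in the question (run the guest, return to the
host on an exit, dispatch on the exit reason, re-enter) can be modelled in a
few lines of Python. This is a toy model only: it makes no real
ioctl(fd, KVM_RUN, 0) call, and ExitReason, run_vcpu and the handlers are
hypothetical names, not KVM's API.

```python
from enum import Enum, auto


class ExitReason(Enum):
    """Illustrative subset of reasons for leaving guest mode."""
    IO = auto()
    MMIO = auto()
    HLT = auto()


def run_vcpu(guest_exits, handlers):
    """Dispatch each simulated exit to a host-side handler.

    Each item of guest_exits stands for one return from the KVM_RUN
    ioctl; the loop body plays the role of the host code after the ioctl.
    """
    handled = []
    for reason, payload in guest_exits:
        handled.append(handlers[reason](payload))
        if reason is ExitReason.HLT:
            break  # stop re-entering the guest in this toy model
    return handled


handlers = {
    ExitReason.IO: lambda port: f"emulate port I/O on {port:#x}",
    ExitReason.MMIO: lambda addr: f"emulate MMIO access at {addr:#x}",
    ExitReason.HLT: lambda _: "guest halted",
}

exits = [
    (ExitReason.IO, 0x3F8),
    (ExitReason.MMIO, 0xFEE00000),
    (ExitReason.HLT, None),
]
for line in run_vcpu(exits, handlers):
    print(line)
```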


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-03 Thread Gerd Hoffmann
Izik Eidus wrote:
 Gerd Hoffmann wrote:
 Did you test whether it really cleans up in case you kill -9 qemu?

 I recently did something similar with the result that the extra
 reference held on mm_struct prevented the process memory from being
 zapped ...

 cheers,
   Gerd
   
 Did you use mmput() after you called get_task_mm() ???
 get_task_mm() do nothing beside atomic_inc(&mm->mm_users);

mmput() call was in ->release() callback, ->release() in turn never was
called because the kernel didn't zap the mappings because of the
reference ...

The driver *also* created mappings which ksmctl doesn't, so it could be
you don't run into this issue.
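
The circular dependency can be illustrated with a small Python refcount
model. It is a deliberate simplification of process exit, written as one
possible reading of the description above; the classes and functions only
borrow kernel names and are not kernel code.

```python
class MM:
    """Toy mm_struct: a user count plus the files mapped into it."""

    def __init__(self):
        self.users = 1            # the owning process's own reference
        self.mapped_files = []
        self.torn_down = False


def get_task_mm(mm):
    mm.users += 1
    return mm


def mmput(mm):
    mm.users -= 1
    if mm.users == 0:
        # Last user gone: zap the mappings, dropping their file references.
        mm.torn_down = True
        files, mm.mapped_files = mm.mapped_files, []
        for f in files:
            f.put()


class DriverFile:
    """Toy driver fd that pins the mm until its release callback runs."""

    def __init__(self, mm):
        self.refs = 1             # the open file descriptor
        self.mm = get_task_mm(mm)

    def mmap(self):
        self.refs += 1            # a mapping holds a file reference
        self.mm.mapped_files.append(self)

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            mmput(self.mm)        # what the release callback does


def kill_process(mm, open_files):
    mmput(mm)                     # exit drops the process's own reference
    for f in open_files:
        f.put()                   # exit closes every fd
    return mm.torn_down


def scenario(create_mapping):
    mm = MM()
    f = DriverFile(mm)
    if create_mapping:
        f.mmap()
    return kill_process(mm, [f])


print("fd only (ksmctl-like):", scenario(create_mapping=False))
print("fd plus mapping:      ", scenario(create_mapping=True))
```

With only an fd, closing it at exit runs the release path and the mm is
torn down. With a mapping as well, the mapping keeps the file alive and the
file keeps the mm alive, so neither reference is ever dropped.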

cheers,
  Gerd


Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Andi Kleen
 Check shared ring status when stuffing a request.  If there are requests

That means you're bouncing cache lines all the time. Probably not a big
issue on single socket but could be on larger systems.

-Andi



Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Gregory Haskins
Gerd Hoffmann wrote:
 Avi Kivity wrote:
   
 There is no choice.  Exiting from the guest to the kernel to userspace
 is prohibitively expensive, you can't do that on every packet.
 

 I didn't look at virtio-net very closely yet.  I wonder why the
 notification is that big an issue though.  It is easy to keep the number
 of notifications low without increasing latency:

 Check shared ring status when stuffing a request.  If there are requests
 not (yet) consumed by the other end there is no need to send a
 notification.  That scheme can even span multiple rings (nics with rx
 and tx for example).
   

FWIW: I employ this scheme.  The shm-signal construct has a dirty and
pending flag (all on the same cacheline, which may or may not address
Andi's later point).  The first time you dirty the shm, it sets both
flags.  The consumer side has to clear pending before any subsequent
signals are sent.  Normally the consumer side will also clear enabled
(as part of the bidir napi thing) to further disable signals.
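
One possible reading of this flag protocol, as a Python sketch. The class
and method names are hypothetical, and details such as when dirty is
cleared are assumptions; the actual shm-signal code in the vbus patches may
behave differently.

```python
class ShmSignal:
    """Toy model of the enabled/dirty/pending flags described above."""

    def __init__(self):
        self.enabled = True
        self.dirty = False
        self.pending = False
        self.signals_sent = 0

    def inject(self):
        """Producer: mark the shared memory dirty, signal only if allowed."""
        self.dirty = True
        if self.enabled and not self.pending:
            self.pending = True
            self.signals_sent += 1

    def consumer_begin(self):
        """Consumer: acknowledge the signal and mask further ones."""
        self.enabled = False
        self.pending = False
        self.dirty = False

    def consumer_end(self):
        """Consumer: re-arm; True means new work arrived while polling."""
        self.enabled = True
        return self.dirty


sig = ShmSignal()
sig.inject()
sig.inject()                    # pending is still set: no second signal
print(sig.signals_sent)

sig.consumer_begin()
sig.inject()                    # signals are disabled while polling
needs_repoll = sig.consumer_end()
print(sig.signals_sent, needs_repoll)
```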

-Greg







Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer

2009-04-03 Thread Andi Kleen
 I'm wondering about something i suggested many moons ago: to look 
 into the KVM decoder+emulator (arch/x86/kvm/x86_emulate.c).

Hi Ingo,
Masami and I just discussed this a few emails ago in this thread :)

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Herbert Xu
On Fri, Apr 03, 2009 at 01:18:54PM +0200, Andi Kleen wrote:
  Check shared ring status when stuffing a request.  If there are requests
 
 That means you're bouncing cache lines all the time. Probably not a big
 issue on single socket but could be on larger systems.

If the backend is running on a core that doesn't share caches
with the guest queue then you've got bigger problems.

Right, this is unavoidable for guests with many CPUs, but that
should go away once we support multiqueue in virtio-net.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Avi Kivity

Gregory Haskins wrote:

Yes, but the important thing to point out is it doesn't *replace*
PCI. It's simply an alternative.
  
  

Does it offer substantial benefits over PCI?  If not, it's just extra
code.



First of all, do you think I would spend time designing it if I didn't
think so? :)
  


I'll rephrase.  What are the substantial benefits that this offers over PCI?


Second of all, I want to use vbus for other things that do not speak PCI
natively (like userspace for instance...and if I am gleaning this
correctly, lguest doesn't either).
  


And virtio supports lguest and s390.  virtio is not PCI specific.

However, for the PC platform, PCI has distinct advantages.  What 
advantages does vbus have for the PC platform?



PCI sounds good at first, but I believe it's a false economy.  It was
designed, of course, to be a hardware solution, so it carries all this
baggage derived from hardware constraints that simply do not exist in a
pure software world and that have to be emulated.  Things like the fixed
length and centrally managed PCI-IDs, 


Not a problem in practice.


PIO config cycles, BARs,
pci-irq-routing, etc.  


What are the problems with these?


While emulation of PCI is invaluable for
executing unmodified guests, it's not strictly necessary from a
paravirtual software perspective...PV software is inherently already
aware of its context and can therefore use the best mechanism
appropriate from a broader selection of choices.
  


It's also not necessary to invent a new bus.  We need a positive 
advantage, we don't do things just because we can (and then lose the 
real advantages PCI has).



If we insist that PCI is the only interface we can support and we want
to do something, say, in the kernel for instance, we have to have either
something like the ICH model in the kernel (and really all of the pci
chipset models that qemu supports), or a hacky hybrid userspace/kernel
solution.  I think this is what you are advocating, but I'm sorry. IMO
that's just gross and unnecessary gunk.


If we go for a kernel solution, a hybrid solution is the best IMO.  I 
have no idea what's wrong with it.


The guest would discover and configure the device using normal PCI 
methods.  Qemu emulates the requests, and configures the kernel part 
using normal Linux syscalls.  The nice thing is, kvm and the kernel part 
don't even know about each other, except for a way for hypercalls to 
reach the device and a way for interrupts to reach kvm.



Let's stop beating around the
bush and just define the 4-5 hypercall verbs we need and be done with
it.  :)

FYI: The guest support for this is not really *that* much code IMO.
 
 drivers/vbus/proxy/Makefile |    2
 drivers/vbus/proxy/kvm.c    |  726 +
  


Does it support device hotplug and hotunplug?  Can vbus interrupts be 
load balanced by irqbalance?  Can guest userspace enumerate devices?  
Module autoloading support?  pxe booting?


Plus a port to Windows, enterprise Linux distros based on 2.6.dead, and
possibly less mainstream OSes.



and plus, I'll gladly maintain it :)

I mean, it's not like new buses do not get defined from time to time.
Should the computing industry stop coming up with new bus types because
they are afraid that the windows ABI only speaks PCI?  No, they just
develop a new driver for whatever the bus is and be done with it.  This
is really no different.
  


As a matter of fact, a new bus was developed recently called PCI
express.  It uses new slots, new electricals, it's not even a bus
(routers + point-to-point links), new everything except that the
software model was 100% compatible with traditional PCI.
That's how much people are afraid of the Windows ABI.



Note that virtio is not tied to PCI, so "vbus is generic" doesn't count.


Well, preserving the existing virtio-net on x86 ABI is tied to PCI,
which is what I was referring to.  Sorry for the confusion.
  


virtio-net knows nothing about PCI.  If you have a problem with PCI, 
write virtio-blah for a new bus.  Though I still don't understand why.


 


I meant, move the development effort, testing, installed base, Windows
drivers.



Again, I will maintain this feature, and it's completely off to the
side.  Turn it off in the config, or do not enable it in qemu and it's
like it never existed.  Worst case is it gets reverted if you don't like
it.  Aside from the last few kvm specific patches, the rest is no
different than the greater linux environment.  E.g. if I update the
venet driver upstream, it's conceptually no different than someone else
updating e1000, right?
  


I have no objections to you maintaining vbus, though I'd much prefer if 
we can pool our efforts and cooperate on having one good set of drivers.


I think you're integrating too tightly with kvm, which is likely to 
cause problems when kvm evolves.  The way I'd do it is:


- drop all mmu integration; instead, have your devices maintain their 
own slots layout and use 

Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Avi Kivity

Herbert Xu wrote:

On Fri, Apr 03, 2009 at 02:03:45PM +0300, Avi Kivity wrote:
  
If the host is able to consume a request immediately, and the guest is  
not able to batch requests, this breaks down.  And that is the current  
situation.



Hang on, why is the host consuming the request immediately? It
has to write the packet to tap, which then calls netif_rx_ni so
it should actually go all the way, no?
  


The host writes the packet to tap, at which point it is consumed from 
its point of view.  The host would like to mention that if there was an 
API to notify it when the packet was actually consumed, then it would 
gladly use it.  Bonus points if this involves not copying the packet.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Avi Kivity

Andi Kleen wrote:

Check shared ring status when stuffing a request.  If there are requests



That means you're bouncing cache lines all the time. Probably not a big
issue on single socket but could be on larger systems.
  


That's why I'd like requests to be handled on the vcpu thread rather 
than an auxiliary thread.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Herbert Xu
On Fri, Apr 03, 2009 at 02:46:04PM +0300, Avi Kivity wrote:

 The host writes the packet to tap, at which point it is consumed from  
 its point of view.  The host would like to mention that if there was an  
 API to notify it when the packet was actually consumed, then it would  
 gladly use it.  Bonus points if this involves not copying the packet.

We're using write(2) for this, no? That should invoke netif_rx_ni
which blocks until the packet is processed, which usually means
that it's placed on the NIC's hardware queue.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


KVM performance

2009-04-03 Thread BRAUN, Stefanie

Hello,

as I want to switch from XEN to KVM, I've run some performance tests
to see if KVM is as performant as XEN. But tests with a VMU that receives
a streamed video, adds a small logo to the video and streams it to a
client have shown that XEN performs much better than KVM.
In XEN the vlc process (VideoLAN client, used to receive, process and send
the video) within the VMU has a CPU load of 33.8%, whereas in KVM
the vlc process has a CPU load of 99.9%.
I'm not sure why. Does anybody know of some settings to improve
KVM performance?

Thank you.
Regards, Stefanie.


Used hardware and settings:
In the tests I've used the same host hardware for XEN and KVM:
- Dual Core AMD 2.2 GHz, 8 GB RAM
- Tested OSes for KVM host: Fedora 10, 2.6.27.5-117.fc10.x86_64 with kvm
  version 10.fc10 version 74
  (also tested in January: compiled kernel with kvm-83)
- KVM guest settings:
  OS:  Fedora 9 2.6.25-14.fc9.x86_64 (i386 also tested)
  RAM: 256 MB (same for XEN vmu)
  CPU: 1 core with 2.2 GHz (same for XEN vmu)
  Tested NIC models: rtl8139, e1000, virtio

Tested Scenario: VMU receives a streamed video , adds a logo (watermark)
to the video stream and then streams it to a client

Results:

XEN:
Host cpu load (virt-manager):   23%
VMU  cpu load (virt-manager):   18%
VLC process within VMU (top):   33.8%

KVM:
no virt-manager cpu load as I started the vmu with the kvm command
Host cpu load:                  52%
qemu-kvm process (top):         77-100%
VLC process within VMU (top):   80-99.9%

KVM command to start vmu
/usr/bin/qemu-kvm -boot c -hda /images/vmu01.raw -m 256 \
  -net nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio \
  -net tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown \
  -vnc 127.0.0.1:1 -k de --daemonize







Alcatel-Lucent Deutschland AG
Bell Labs Germany
Service Infrastructure, ZFZ-SI
Stefanie Braun
Phone:   +49.711.821-34865
Fax: +49.711.821-32453

Postal address:
Alcatel-Lucent Deutschland AG
Lorenzstrasse 10
D-70435 STUTTGART

Mail: stefanie.br...@alcatel-lucent.de

 





Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer

2009-04-03 Thread Vegard Nossum
2009/4/3 Ingo Molnar mi...@elte.hu:

 * Avi Kivity a...@redhat.com wrote:

 Ingo Molnar wrote:
 kvm has three requirements not needed by kprobes:
 - it wants to execute instructions, not just decode them, including
   generating faults where appropriate
 - it is performance critical
 - it needs to support 16-bit, 32-bit, and 64-bit instructions 
 simultaneously

 If an arch/x86/ decoder/emulator gives me these I'll gladly switch
 to it.  x86_emulate.c is high on my list of most disliked code.


 Well, this has to be driven from the KVM side as the kprobes use
 will only be for decoding so if it's modified from the kprobes
 side the KVM-only functionality might regress.

 So ... we can do the library decoder for kprobes purposes, and
 someone versed in the KVM emulator can then combine the two.

 Problem is, anyone versed in the kvm emulator will want to run as
 far away from this work as possible.

 Are you suggesting that the KVM emulator should never have been
 merged in the first place? ;-)

 Anyway, we'll make sure the kprobes/library decoder is as clean as
 possible - so it ought to be hackable and extensible without the
 risk of permanent brain damage. Mmiotrace and kmemcheck has decoding
 smarts too, and i think the sw-breakpoint injection code of KGDB
 could use it as well - so there's broader utility in all this.

(Sorry in advance for jumping in -- my post may be irrelevant)

For the record, kmemcheck requirements for an instruction decoder are these:

For any instruction with memory operands, we need to know which are
the operands (so for movl %eax, (%ebx) we need to combine the
instruction with a struct pt_regs to get the actual address
dereferenced, i.e. the contents of %ebx), and their sizes (for movzbl,
the source operand is 8 bits, destination operand is 32 bits). For
things like movsb, we need to be able to get both %esi and %edi.

mmiotrace additionally needs to know what the actual values
read/written were, for instructions that read/write to memory (again,
combined with a struct pt_regs).

Maybe this doesn't really say much, since this is what a generic
instruction decoder would be able to do anyway. But kmemcheck and
mmiotrace both have very special-purpose decoders. I don't really know
what other decoders look like, but what I would wish for is this: Some
macros for iterating the operands, where each operand has a type (e.g.
input (for reads), output (for writes), target (for jumps), immediate
address, immediate value, etc.), a size (in bits), and a way to
evaluate the operand. So eval(op, regs) for op=%eax, it will return
regs->eax; for op=4(%eax), it will return regs->eax + 4; for op=4 it
will return 4, etc.
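
A minimal Python sketch of the operand interface wished for above. The
Operand type, its field names, and the use of a plain dict in place of
struct pt_regs are all assumptions for illustration; a real decoder would
live in C and fill these in from the instruction bytes.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    kind: str        # "reg", "mem" (displacement + base register) or "imm"
    size: int        # operand size in bits
    reg: str = ""    # register name for "reg" and "mem" operands
    disp: int = 0    # displacement for "mem" operands
    value: int = 0   # constant for "imm" operands


def eval_operand(op, regs):
    """Evaluate an operand against a register snapshot (a dict here)."""
    if op.kind == "reg":
        return regs[op.reg]
    if op.kind == "mem":
        return regs[op.reg] + op.disp
    if op.kind == "imm":
        return op.value
    raise ValueError(f"unknown operand kind: {op.kind}")


regs = {"eax": 0x1000, "ebx": 0x2000}

print(hex(eval_operand(Operand("reg", 32, reg="eax"), regs)))          # %eax
print(hex(eval_operand(Operand("mem", 32, reg="eax", disp=4), regs)))  # 4(%eax)
print(eval_operand(Operand("imm", 32, value=4), regs))                 # 4
```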

Both kmemcheck and mmiotrace could gain SMP support with instruction
emulation, though it is not strictly necessary. In that case, though,
we would not want to emulate fault handling, etc. (i.e. the fault
should always be generated by the CPU itself).

Please do put me on Cc for future discussions, though.


Vegard

-- 
The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation.
-- E. W. Dijkstra, EWD1036


Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Avi Kivity

Gregory Haskins wrote:

Avi Kivity wrote:
  

Gregory Haskins wrote:


So again, I am proposing for consideration of accepting my work (either
in its current form, or something we agree on after the normal review
process) not only on the basis of the future development of the
platform, but also to keep current components in their running to their
full potential.  I will again point out that the code is almost
completely off to the side, can be completely disabled with config
options, and I will maintain it.  Therefore the only real impact is to
people who care to even try it, and to me.
  
  

Your work is a whole stack.  Let's look at the constituents.

- a new virtual bus for enumerating devices.

Sorry, I still don't see the point.  It will just make writing drivers
more difficult.  The only advantage I've heard from you is that it
gets rid of the gunk.  Well, we still have to support the gunk for
non-pv devices so the gunk is basically free.  The clean version is
expensive since we need to port it to all guests and implement
exciting features like hotplug.


My real objection to PCI is fast-path related.  I don't object, per se,
to using PCI for discovery and hotplug.  If you use PCI just for these
types of things, but then allow fastpath to use more hypercall oriented
primitives, then I would agree with you.  We can leave PCI emulation in
user-space, and we get it for free, and things are relatively tidy.
  


PCI has very little to do with the fast path (nothing, if we use MSI).


It's once you start requiring that we stay ABI compatible with something
like the existing virtio-net in x86 KVM where I think it starts to get
ugly when you try to move it into the kernel.  So that is what I had a
real objection to.  I think as long as we are not talking about trying
to make something like that work, it's a much more viable prospect.
  


I don't see why the fast path of virtio-net would be bad.  Can you 
elaborate?


Obviously all the pci glue stays in userspace.

So what I propose is the following: 


1) The core vbus design stays the same (or close to it)
  


Sorry, I still don't see what advantage this has over PCI, and how you 
deal with the disadvantages.



2) the vbus-proxy and kvm-guest patch go away
3) the kvm-host patch changes to work with coordination from the
userspace-pci emulation for things like MSI routing
4) qemu will know to create some MSI shim 1:1 with whatever it
instantiates on the bus (and can communicate changes
  


Don't understand.  What's this MSI shim?


5) any drivers that are written for these new PCI-IDs that might be
present are allowed to use a hypercall ABI to talk after they have been
probed for that ID (e.g. they are not limited to PIO or MMIO BAR type
access methods).
  


The way we'd do it with virtio is to add a feature bit that says "you can
hypercall here instead of pio".  This way old drivers continue to work.


Note that nothing prevents us from trapping pio in the kernel (in fact, 
we do) and forwarding it to the device.  It shouldn't be any slower than 
hypercalls.
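
The feature-bit idea can be sketched in Python as follows. The bit name
F_NOTIFY_HYPERCALL and both helper functions are hypothetical, introduced
only for this example; no such virtio feature bit is being described here,
only the general pattern of a driver acknowledging the subset of offered
features it understands.

```python
F_NOTIFY_HYPERCALL = 1 << 0   # hypothetical feature bit, for illustration


def negotiate(device_features, driver_features):
    """Return the features both the device and the driver understand."""
    return device_features & driver_features


def notify_method(negotiated):
    """Pick how the driver kicks the device after negotiation."""
    return "hypercall" if negotiated & F_NOTIFY_HYPERCALL else "pio"


device = F_NOTIFY_HYPERCALL          # host offers the new mechanism

old_driver = 0                       # predates the bit, ignores it
new_driver = F_NOTIFY_HYPERCALL

print("old driver:", notify_method(negotiate(device, old_driver)))
print("new driver:", notify_method(negotiate(device, new_driver)))
```

An old driver never acknowledges the bit and keeps using pio, which is why
it continues to work unchanged.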



Once I get here, I might have greater clarity to see how hard it would
make to emulate fast path components as well.  It might be easier than I
think.

This is all off the cuff so it might need some fine tuning before it's
actually workable.

Does that sound reasonable?
  


The vbus part (I assume you mean device enumeration) worries me.  I 
don't think you've yet set down what its advantages are.  Being pure and 
clean doesn't count, unless you rip out PCI from all existing installed 
hardware and from Windows.



- finer-grained point-to-point communication abstractions

Where virtio has ring+signalling together, you layer the two.  For
networking, it doesn't matter.  For other applications, it may be
helpful, perhaps you have something in mind.



Yeah, actually.  Thanks for bringing that up.

So the reason why signaling and the ring are distinct constructs in the
design is to facilitate constructs other than rings.  For instance,
there may be some models where having a flat shared page is better than
a ring.  A ring will naturally preserve all values in flight, whereas a
flat shared page would not (last update is always current).  There are
some algorithms where a previously posted value is obsoleted by an
update, and therefore rings are inherently bad for this update model.
And as we know, there are plenty of algorithms where a ring works
perfectly.  So I wanted that flexibility to be able to express both.
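
The difference between the two update models can be sketched in a few
lines of C. This is a single-threaded toy with invented names, not the
vbus or virtio data structures, and it leaves out the memory barriers a
real shared-memory producer/consumer pair would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 4  /* illustrative size */

/* Ring: every posted value stays queued until it is consumed. */
struct ring {
    uint32_t slot[RING_SLOTS];
    size_t head;  /* count of values posted so far */
    size_t tail;  /* count of values consumed so far */
};

/* Returns 0 on success, -1 if the ring is full. */
static int ring_post(struct ring *r, uint32_t v)
{
    assert(r != NULL);
    if (r->head - r->tail == RING_SLOTS)
        return -1;
    r->slot[r->head % RING_SLOTS] = v;
    r->head++;
    return 0;
}

/* Returns 0 and stores the oldest value, or -1 if the ring is empty. */
static int ring_consume(struct ring *r, uint32_t *out)
{
    assert(r != NULL && out != NULL);
    if (r->head == r->tail)
        return -1;
    *out = r->slot[r->tail % RING_SLOTS];
    r->tail++;
    return 0;
}

/* Flat page: a new post overwrites the old value (last update wins). */
struct flat_page {
    uint32_t value;
    int pending;  /* set by the producer, cleared by the consumer */
};

static void flat_post(struct flat_page *p, uint32_t v)
{
    assert(p != NULL);
    p->value = v;
    p->pending = 1;
}

/* Returns 0 and stores the current value, or -1 if nothing new was posted. */
static int flat_consume(struct flat_page *p, uint32_t *out)
{
    assert(p != NULL && out != NULL);
    if (!p->pending)
        return -1;
    *out = p->value;
    p->pending = 0;
    return 0;
}
```

Posting 1, 2, 3 to the ring yields 1, 2, 3 on the consumer side; posting
the same values to the flat page yields only 3, which is what you want
when an older value is obsoleted by the update.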
  


I agree that there is significant potential here.


One of the things I have in mind for the flat page model is that RT vcpu
priority thing.  Another thing I am thinking of is coming up with a PV
LAPIC type replacement (where we can avoid doing the EOI trap by having
the PIC's state shared).
  


You keep falling into the paravirtualize the entire universe trap.  If 
you look deep down, you can see Jeremy struggling in there trying to 
bring dom0 support to 

Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer

2009-04-03 Thread Masami Hiramatsu
Avi Kivity wrote:
 Ingo Molnar wrote:
 ok, the structure and concept looks quite good now, really nice!

 I'm wondering about something i suggested many moons ago: to look into
 the KVM decoder+emulator (arch/x86/kvm/x86_emulate.c).

 I remember there were some issues with that (one problem being that
 the KVM decoder is a special-purpose thing covering specific range of
 execution environments - not a near-full integer-ops decoder like the
 one we are aiming for here) - are there any other fundamental problems
 beyond 'it has to be done' ?

 Conceptually we want just a single piece of decoder logic in
 arch/x86/. If the KVM folks are cool with it we could factor out the
 KVM one into arch/x86/lib/. But ... if there are compelling reasons to
 leave the KVM one alone in its limited environment we can do that too.
   
 
 kvm has three requirements not needed by kprobes:
 - it wants to execute instructions, not just decode them, including
 generating faults where appropriate
 - it is performance critical
 - it needs to support 16-bit, 32-bit, and 64-bit instructions
 simultaneously

Hmm, I'd like to know whether kvm actually aims to emulate all kinds of
instructions. If so, I might find some bugs in x86_emulate.c.
However, I don't know all bugs. To find all of them, we have to
port x86_emulate.c to user-space, decode binaries with it, and
compare its output with another decoder, as Jim had done with insn.c.

https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html


Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer

2009-04-03 Thread Ingo Molnar

* Masami Hiramatsu mhira...@redhat.com wrote:

 Hmm, I'd like to know whether kvm actually aims to emulate all kinds of 
 instructions. If so, I might find some bugs in x86_emulate.c. 
 However, I don't know all bugs. To find all of them, we have to 
 port x86_emulate.c to user-space, decode binaries with it, and 
 compare its output with another decoder, as Jim had done with 
 insn.c.
 
 https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html

btw., i'd suggest we put a build time check for this into the kernel 
version as well. For example to decode the vmlinux via objdump, run 
it through your decoder as well and compare the results. Put under a 
CONFIG_DEBUG_X86_DECODER_TEST kind of (default-off) build-time 
self-test.

This would ensure that the kernel we are running is fully supported 
by the decoder - even as GCC/GAS starts using new instructions, etc. 

How does this sound to you?

Ingo


Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer

2009-04-03 Thread Avi Kivity

Masami Hiramatsu wrote:

Hmm, I'd like to know whether kvm actually aims to emulate all kinds of
instructions. 


We're less interested in fpu/sse.  The interesting instructions are 
those used for page table management, mmio, and real mode execution.



If so, I might find some bugs in x86_emulate.c.
However, I don't know all bugs. To find all of them, we have to
port x86_emulate.c to user-space, decode binaries with it, and
compare its output with another decoder, as Jim had done with insn.c.

  


That would be very useful.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Gregory Haskins
Hi Avi,

 I think we have since covered these topics later in the thread, but in
case you wanted to know my thoughts here:

Avi Kivity wrote:
 Gregory Haskins wrote:
 Yes, but the important thing to point out is it doesn't *replace*
 PCI. It is simply an alternative.
 
 Does it offer substantial benefits over PCI?  If not, it's just extra
 code.
 

 First of all, do you think I would spend time designing it if I didn't
 think so? :)
   

 I'll rephrase.  What are the substantial benefits that this offers
 over PCI?

Simplicity and optimization.  You don't need most of the junk that comes
with PCI.  It's all overhead and artificial constraints.  You really only
need things like a handful of hypercall verbs and that's it.


 Second of all, I want to use vbus for other things that do not speak PCI
 natively (like userspace for instance...and if I am gleaning this
 correctly, lguest doesn't either).
   

 And virtio supports lguest and s390.  virtio is not PCI specific.

I understand that.  We keep getting wrapped around the axle on this
one.   At some point in the discussion we were talking about supporting
the existing guest ABI without changing the guest at all.  So while I
totally understand that virtio can work over various transports, I am
referring to what would be needed to have existing ABI guests work with
an in-kernel version.  This may or may not be an actual requirement.


 However, for the PC platform, PCI has distinct advantages.  What
 advantages does vbus have for the PC platform?

To reiterate: IMO simplicity and optimization.  It's designed
specifically for PV use, which is software to software.


 PCI sounds good at first, but I believe it's a false economy.  It was
 designed, of course, to be a hardware solution, so it carries all this
 baggage derived from hardware constraints that simply do not exist in a
 pure software world and that have to be emulated.  Things like the fixed
 length and centrally managed PCI-IDs, 

 Not a problem in practice.

Perhaps, but it's just one more constraint that isn't actually needed.
It's like the cvs vs git debate.  Why have it centrally managed when you
don't technically need it?  Sure, centrally managed works, but I'd
rather not deal with it if there was a better option.


 PIO config cycles, BARs,
 pci-irq-routing, etc.  

 What are the problems with these?

1) PIOs are still less efficient to decode than a hypercall vector.  We
don't need to pretend we are hardware; the guest already knows what's
underneath them.  Use the most efficient call method.

2) BARs?  No one in their right mind should use an MMIO BAR for PV. :)
The last thing we want to do is cause page faults here.  Don't use them,
period.  (This is where something like the vbus::shm() interface comes in)

3) pci-irq routing was designed to accommodate etch constraints on a
piece of silicon that doesn't actually exist in kvm.  Why would I want
to pretend I have PCI A,B,C,D lines that route to a pin on an IOAPIC? 
Forget all that stuff and just inject an IRQ directly.  This gets much
better with MSI, I admit, but you hopefully catch my drift now.

One of my primary design objectives with vbus was to a) reduce the
signaling as much as possible, and b) reduce the cost of signaling.  
That is why I do things like use explicit hypercalls, aggregated
interrupts, bidir napi to mitigate signaling, the shm_signal::pending
mitigation, and avoiding going to userspace by running in the kernel. 
All of these things together help to form what I envision would be a
maximum performance transport.  Not all of these tricks are
interdependent (for instance, the bidir + full-duplex threading that I
do can be done in userspace too, as discussed).  They are just the
collective design elements that I think we need to make a guest perform
very close to its peak.  That is what I am after.
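
One generic way to "reduce the signaling" is to suppress a new signal
while an earlier one is still unhandled. The sketch below shows that
pattern in C with invented names; it is not the shm_signal code from the
patch series, and a real implementation would need atomic operations and
a re-check of the queue after the ack to avoid lost wakeups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sig_state {
    bool pending;        /* a signal was raised and not yet acknowledged */
    unsigned delivered;  /* number of (expensive) signals actually sent */
};

/* Producer side: returns true only if a real signal had to be sent. */
static bool signal_if_needed(struct sig_state *s)
{
    assert(s != NULL);
    if (s->pending)
        return false;  /* the consumer is already due to look at the queue */
    s->pending = true;
    s->delivered++;
    return true;
}

/* Consumer side: acknowledge once it starts draining the queue. */
static void signal_ack(struct sig_state *s)
{
    assert(s != NULL);
    s->pending = false;
}
```

Three back-to-back posts then cost one signal instead of three, and the
next signal is only paid for after the consumer has acknowledged.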


 While emulation of PCI is invaluable for
 executing unmodified guests, it's not strictly necessary from a
 paravirtual software perspective...PV software is inherently already
 aware of its context and can therefore use the best mechanism
 appropriate from a broader selection of choices.
   

 It's also not necessary to invent a new bus.

You are right, it's not strictly necessary in order to work.  It just presents
the opportunity to optimize as much as possible and to move away from
legacy constraints that no longer apply.  And since PV's sole purpose is
optimization, I was not really interested in going half-way.

   We need a positive advantage, we don't do things just because we can
 (and then lose the real advantages PCI has).

Agreed, but I assert there are advantages.  You may not think they
outweigh the cost, and that's your prerogative, but I think they are
still there nonetheless.


 If we insist that PCI is the only interface we can support and we want
 to do something, say, in the kernel for instance, we have to have either
 something like the ICH model in the kernel (and really all of the pci
 chipset models that qemu 

VM cpuTime discrepancy

2009-04-03 Thread Zvi Dubitzky
The cpuTime of a VM reported by kvm-72 is OK (real seconds), while
that reported by kvm-84 is not.
Are you aware of this?  Was it fixed in the latest kvm releases since 84?

I access cpuTime via libvirt (same version in both cases).

thanks

Zvi Dubitzky
Virtualization and System Architecture   Email: d...@il.ibm.com
IBM Haifa Research Laboratory            Phone: +972-4-8296182
Haifa, 31905, ISRAEL




Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Avi Kivity

Gregory Haskins wrote:

I'll rephrase.  What are the substantial benefits that this offers
over PCI?



Simplicity and optimization.  You don't need most of the junk that comes
with PCI.  It's all overhead and artificial constraints.  You really only
need things like a handful of hypercall verbs and that's it.

  


Simplicity:

The guest already supports PCI.  It has to, since it was written to the 
PC platform, and since today it is fashionable to run kernels that 
support both bare metal and a hypervisor.  So you can't remove PCI from 
the guest.


The host also already supports PCI.  It has to, since it must support 
guests which do not support vbus.  We can't remove PCI from the host.


You don't gain simplicity by adding things.  Sure, lguest is simple 
because it doesn't support PCI.  But Linux will forever support PCI, and 
Qemu will always support PCI.  You aren't simplifying anything by adding 
vbus.


Optimization:

Most of PCI (in our context) deals with configuration.  So removing it 
doesn't optimize anything, unless you're counting hotplugs-per-second or 
something.




Second of all, I want to use vbus for other things that do not speak PCI
natively (like userspace for instance...and if I am gleaning this
correctly, lguest doesn't either).
  
  

And virtio supports lguest and s390.  virtio is not PCI specific.


I understand that.  We keep getting wrapped around the axle on this
one.   At some point in the discussion we were talking about supporting
the existing guest ABI without changing the guest at all.  So while I
totally understand that virtio can work over various transports, I am
referring to what would be needed to have existing ABI guests work with
an in-kernel version.  This may or may not be an actual requirement.
  


There would be no problem supporting an in-kernel host virtio endpoint with 
the existing guest/host ABI.  Nothing in the ABI assumes the host 
endpoint is in userspace.  Nothing in the implementation requires us to 
move any of the PCI stuff into the kernel.


In fact, we already have in-kernel sources of PCI interrupts, these are 
assigned PCI devices (obviously, these have to use PCI).



However, for the PC platform, PCI has distinct advantages.  What
advantages does vbus have for the PC platform?


To reiterate: IMO simplicity and optimization.  It's designed
specifically for PV use, which is software to software.
  


To avoid reiterating, please be specific about these advantages.

  

PCI sounds good at first, but I believe it's a false economy.  It was
designed, of course, to be a hardware solution, so it carries all this
baggage derived from hardware constraints that simply do not exist in a
pure software world and that have to be emulated.  Things like the fixed
length and centrally managed PCI-IDs, 
  

Not a problem in practice.



Perhaps, but it's just one more constraint that isn't actually needed.
It's like the cvs vs git debate.  Why have it centrally managed when you
don't technically need it?  Sure, centrally managed works, but I'd
rather not deal with it if there was a better option.
  


We've allocated 3 PCI device IDs so far.  It's not a problem.  There are 
enough real problems out there.


  

PIO config cycles, BARs,
pci-irq-routing, etc.  
  

What are the problems with these?



1) PIOs are still less efficient to decode than a hypercall vector.  We
don't need to pretend we are hardware; the guest already knows what's
underneath them.  Use the most efficient call method.
  


Last time we measured, hypercall overhead was the same as pio overhead.  
Both vmx and svm decode pio completely (except for string pio ...)



2) BARs?  No one in their right mind should use an MMIO BAR for PV. :)
The last thing we want to do is cause page faults here.  Don't use them,
period.  (This is where something like the vbus::shm() interface comes in)
  


So don't use BARs for your fast path.  virtio places the ring in guest 
memory (like most real NICs).



3) pci-irq routing was designed to accommodate etch constraints on a
piece of silicon that doesn't actually exist in kvm.  Why would I want
to pretend I have PCI A,B,C,D lines that route to a pin on an IOAPIC?
Forget all that stuff and just inject an IRQ directly.  This gets much
better with MSI, I admit, but you hopefully catch my drift now.
  


True, PCI interrupts suck.  But this was fixed with MSI.  Why fix it again?


One of my primary design objectives with vbus was to a) reduce the
signaling as much as possible, and b) reduce the cost of signaling.
That is why I do things like use explicit hypercalls, aggregated
interrupts, bidir napi to mitigate signaling, the shm_signal::pending
mitigation, and avoiding going to userspace by running in the kernel.
All of these things together help to form what I envision would be a
maximum performance transport.  Not all of these tricks are
interdependent (for instance, the bidir + full-duplex threading that I
do can be done 

[PATCH -tip 2/6 V4.1] x86: add arch-dep register and stack access API to ptrace

2009-04-03 Thread Masami Hiramatsu
Add following APIs for accessing registers and stack entries from pt_regs.
- query_register_offset(const char *name)
   Query the offset of name register.

- query_register_name(unsigned offset)
   Query the name of register by its offset.

- get_register(struct pt_regs *regs, unsigned offset)
   Get the value of a register by its offset.

- valid_stack_address(struct pt_regs *regs, unsigned long addr)
   Check the address is in the stack.

- get_stack_nth(struct pt_regs *reg, unsigned nth)
   Get Nth entry of the stack. (N >= 0)

- get_argument_nth(struct pt_regs *reg, unsigned nth)
   Get Nth argument at function call. (N >= 0)
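
For readers who want to see the name/offset lookup idea in isolation, here
is a small user-space sketch in C. It uses a mock register frame rather
than the kernel's struct pt_regs, the function names are different from
the ones in this patch, and the -1 / NULL error returns as well as the
alignment check are assumptions of the sketch, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock register frame for illustration; NOT the kernel's struct pt_regs. */
struct mock_regs {
    unsigned long bx, cx, dx, ip, sp;
};

struct reg_offset {
    const char *name;
    int offset;
};

#define REG_ENTRY(r) { #r, (int)offsetof(struct mock_regs, r) }
#define MOCK_MAX_REG_OFFSET (offsetof(struct mock_regs, sp))

/* NULL-terminated table built with offsetof(), one entry per register. */
static const struct reg_offset reg_table[] = {
    REG_ENTRY(bx), REG_ENTRY(cx), REG_ENTRY(dx), REG_ENTRY(ip), REG_ENTRY(sp),
    { NULL, 0 },
};

/* Name -> byte offset, or -1 if the name is unknown. */
static int lookup_register_offset(const char *name)
{
    const struct reg_offset *e;

    assert(name != NULL);
    for (e = reg_table; e->name != NULL; e++)
        if (strcmp(e->name, name) == 0)
            return e->offset;
    return -1;
}

/* Byte offset -> name, or NULL if no register starts at that offset. */
static const char *lookup_register_name(unsigned offset)
{
    const struct reg_offset *e;

    for (e = reg_table; e->name != NULL; e++)
        if ((unsigned)e->offset == offset)
            return e->name;
    return NULL;
}

/* Read a register by offset; out-of-range or misaligned offsets yield 0. */
static unsigned long read_register(const struct mock_regs *regs, unsigned offset)
{
    assert(regs != NULL);
    if (offset > MOCK_MAX_REG_OFFSET || offset % sizeof(unsigned long) != 0)
        return 0;
    return *(const unsigned long *)((const char *)regs + offset);
}
```

A caller would typically resolve the name once and then reuse the offset
for every later read.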

changes from v4:
 - support querying ss register.
 - remove unneeded cast.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
---

 arch/x86/include/asm/ptrace.h |   66 +
 arch/x86/kernel/ptrace.c  |   60 +
 2 files changed, 126 insertions(+), 0 deletions(-)


diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index aed0894..51e5844 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@

 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif

 #ifndef __ASSEMBLY__
@@ -215,6 +216,71 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
    return regs->sp;
 }

+/* Query offset/name of register from its name/offset */
+extern int query_register_offset(const char *name);
+extern const char *query_register_name(unsigned offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/* Get register value from its offset */
+static inline unsigned long get_register(struct pt_regs *regs, unsigned offset)
+{
+   if (unlikely(offset > MAX_REG_OFFSET))
+   return 0;
+   return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/* Check the address in the stack */
+static inline int valid_stack_address(struct pt_regs *regs, unsigned long addr)
+{
+   return ((addr & ~(THREAD_SIZE - 1))  ==
+   (kernel_trap_sp(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/* Get Nth entry of the stack */
+static inline unsigned long get_stack_nth(struct pt_regs *regs, unsigned n)
+{
+   unsigned long *addr = (unsigned long *)kernel_trap_sp(regs);
+   addr += n;
+   if (valid_stack_address(regs, (unsigned long)addr))
+   return *addr;
+   else
+   return 0;
+}
+
+/* Get Nth argument at function call */
+static inline unsigned long get_argument_nth(struct pt_regs *regs, unsigned n)
+{
+#ifdef CONFIG_X86_32
+#define NR_REGPARMS 3
+   if (n < NR_REGPARMS) {
+   switch (n) {
+   case 0: return regs->ax;
+   case 1: return regs->dx;
+   case 2: return regs->cx;
+   }
+   return 0;
+#else /* CONFIG_X86_64 */
+#define NR_REGPARMS 6
+   if (n < NR_REGPARMS) {
+   switch (n) {
+   case 0: return regs->di;
+   case 1: return regs->si;
+   case 2: return regs->dx;
+   case 3: return regs->cx;
+   case 4: return regs->r8;
+   case 5: return regs->r9;
+   }
+   return 0;
+#endif
+   } else {
+   /*
+* The typical case: arg n is on the stack.
+* (Note: stack[0] = return address, so skip it)
+*/
+   return get_stack_nth(regs, 1 + n - NR_REGPARMS);
+   }
+}
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 5c6e463..3f504fd 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -46,6 +46,66 @@ enum x86_regset {
REGSET_IOPERM32,
 };

+struct pt_regs_offset {
+   const char *name;
+   int offset;
+};
+
+#define REG_OFFSET(r) offsetof(struct pt_regs, r)
+#define REG_OFFSET_NAME(r) {.name = #r, .offset = REG_OFFSET(r)}
+#define REG_OFFSET_END {.name = NULL, .offset = 0}
+
+static struct pt_regs_offset regoffset_table[] = {
+#ifdef CONFIG_X86_64
+   REG_OFFSET_NAME(r15),
+   REG_OFFSET_NAME(r14),
+   REG_OFFSET_NAME(r13),
+   REG_OFFSET_NAME(r12),
+   REG_OFFSET_NAME(r11),
+   REG_OFFSET_NAME(r10),
+   REG_OFFSET_NAME(r9),
+   REG_OFFSET_NAME(r8),
+#endif
+   REG_OFFSET_NAME(bx),
+   REG_OFFSET_NAME(cx),
+   REG_OFFSET_NAME(dx),
+   REG_OFFSET_NAME(si),
+   REG_OFFSET_NAME(di),
+   REG_OFFSET_NAME(bp),
+   REG_OFFSET_NAME(ax),
+#ifdef CONFIG_X86_32
+   REG_OFFSET_NAME(ds),
+   REG_OFFSET_NAME(es),
+   REG_OFFSET_NAME(fs),
+   REG_OFFSET_NAME(gs),
+#endif
+   REG_OFFSET_NAME(orig_ax),
+   REG_OFFSET_NAME(ip),
+   REG_OFFSET_NAME(cs),
+   

Re: [PATCH 5/4] update ksm userspace interfaces

2009-04-03 Thread Chris Wright
* Gerd Hoffmann (kra...@redhat.com) wrote:
 mmput() call was in ->release() callback, ->release() in turn never was
 called because the kernel didn't zap the mappings because of the
 reference ...

Don't have this issue.  That mmput() is not tied to zapping mappings,
rather zapping files.  IOW, I think you're saying exit_mmap() wasn't
running due to your get_task_mm() (quite understandable, you still hold
a ref), whereas this ref is tied to exit_files().

So do_exit would do:

exit_mm
  mmput -- not dropped yet
exit_files
  ->release
mmput -- dropped here
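
The ordering above can be modelled with a toy reference count. The C
sketch below uses invented names and a mock structure; it is not kernel
code, it only shows why the mm is still alive after exit_mm when a file's
release callback holds the last reference.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an mm with a user count; NOT the kernel's mm_struct. */
struct mock_mm {
    int users;      /* outstanding references */
    int torn_down;  /* set when the last reference goes away */
};

static void mock_get(struct mock_mm *mm)
{
    assert(mm != NULL);
    mm->users++;
}

static void mock_put(struct mock_mm *mm)
{
    assert(mm != NULL && mm->users > 0);
    if (--mm->users == 0)
        mm->torn_down = 1;  /* stands in for the final teardown */
}

/*
 * Exit path where an extra reference is dropped from a file's release
 * callback. *state_after_exit_mm records whether teardown had already
 * happened right after the exit_mm step.
 */
static void mock_do_exit(struct mock_mm *mm, int *state_after_exit_mm)
{
    assert(state_after_exit_mm != NULL);
    mock_put(mm);                          /* exit_mm: the task's own ref */
    *state_after_exit_mm = mm->torn_down;  /* still 0: extra ref is held */
    mock_put(mm);                          /* exit_files: release drops it */
}
```

With one extra reference taken before exit, teardown only happens at the
second put, i.e. in the exit_files step.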

thanks,
-chris


Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Gregory Haskins
Avi Kivity wrote:
 Gregory Haskins wrote:
 Avi Kivity wrote:
  
 Gregory Haskins wrote:

 So again, I am proposing for consideration of accepting my work
 (either
 in its current form, or something we agree on after the normal review
 process) not only on the basis of the future development of the
 platform, but also to keep current components in their running to
 their
 full potential.  I will again point out that the code is almost
 completely off to the side, can be completely disabled with config
 options, and I will maintain it.  Therefore the only real impact is to
 people who care to even try it, and to me.
 
 Your work is a whole stack.  Let's look at the constituents.

 - a new virtual bus for enumerating devices.

 Sorry, I still don't see the point.  It will just make writing drivers
 more difficult.  The only advantage I've heard from you is that it
 gets rid of the gunk.  Well, we still have to support the gunk for
 non-pv devices so the gunk is basically free.  The clean version is
 expensive since we need to port it to all guests and implement
 exciting features like hotplug.
 
 My real objection to PCI is fast-path related.  I don't object, per se,
 to using PCI for discovery and hotplug.  If you use PCI just for these
 types of things, but then allow fastpath to use more hypercall oriented
 primitives, then I would agree with you.  We can leave PCI emulation in
 user-space, and we get it for free, and things are relatively tidy.
   

 PCI has very little to do with the fast path (nothing, if we use MSI).

At the very least, PIOs are slightly slower than hypercalls.  Perhaps
not enough to care, but the last time I measured them they were slower,
and therefore my clean slate design doesn't use them.

But I digress.  I think I was actually kind of agreeing with you that we
could do this. :P


 It's once you start requiring that we stay ABI compatible with something
 like the existing virtio-net in x86 KVM where I think it starts to get
 ugly when you try to move it into the kernel.  So that is what I had a
 real objection to.  I think as long as we are not talking about trying
 to make something like that work, it's a much more viable prospect.
   

 I don't see why the fast path of virtio-net would be bad.  Can you
 elaborate?

I'm not.  I am saying I think we might be able to do this.


 Obviously all the pci glue stays in userspace.

 So what I propose is the following:
 1) The core vbus design stays the same (or close to it)
   

 Sorry, I still don't see what advantage this has over PCI, and how you
 deal with the disadvantages.

I think you are confusing the vbus-proxy (guest side) with the vbus
backend.  (1) is saying "keep the vbus backend" and (2) is saying "drop
the guest side stuff".  In this proposal, the guest would speak a PCI ABI
as far as it's concerned.  Devices in the vbus backend would render as
PCI objects in the ICH (or whatever) model in userspace.


 2) the vbus-proxy and kvm-guest patch go away
 3) the kvm-host patch changes to work with coordination from the
 userspace-pci emulation for things like MSI routing
 4) qemu will know to create some MSI shim 1:1 with whatever it
 instantiates on the bus (and can communicate changes
   

 Don't understand.  What's this MSI shim?

Well, if the device model was an object in vbus down in the kernel, yet
PCI emulation was up in qemu, presumably we would want something to
handle things like PCI config-cycles up in userspace.  Like, for
instance, if the guest re-routes the MSI.  The shim/proxy would handle
the config-cycle, and then turn around and do an ioctl to the kernel to
configure the change with the in-kernel device model (or the irq
infrastructure, as required).

But, TBH, I haven't really looked into what's actually required to make
this work yet.  I am just spitballing to try to find a compromise.


 5) any drivers that are written for these new PCI-IDs that might be
 present are allowed to use a hypercall ABI to talk after they have been
 probed for that ID (e.g. they are not limited to PIO or MMIO BAR type
 access methods).
   

 The way we'd do it with virtio is to add a feature bit that says you
 can hypercall here instead of pio.  This way old drivers continue to
 work.

Yep, agreed.  This is what I was thinking we could do.  But now that I
have the possibility that I just need to write a virtio-vbus module to
co-exist with virtio-pci, perhaps it doesn't even need to be explicit.


 Note that nothing prevents us from trapping pio in the kernel (in
 fact, we do) and forwarding it to the device.  It shouldn't be any
 slower than hypercalls.

Sure, it's just slightly slower, so I would prefer pure hypercalls if at
all possible.


 Once I get here, I might have greater clarity to see how hard it would
 be to emulate fast path components as well.  It might be easier than I
 think.

 This is all off the cuff so it might need some fine tuning before it's
 actually workable.

 Does that sound reasonable?
   

 The 

Re: How to Use LibVirt to mange KVM virtual machine

2009-04-03 Thread Charles Duffy
You should ask about this on the libvirt mailing list and IRC channel, 
not here.


That said, a few quick points:

1. the libvirt you're running is very old.
2. you might consider setting the emulator to point to a shell script 
which records the command line it's called with to a file before doing 
an exec /usr/bin/kvm "$@"




Re: [PATCH] Add shared memory PCI device that shares a memory object betweens VMs

2009-04-03 Thread Cam Macdonell

Avi Kivity wrote:

Cam Macdonell wrote:
I think there is value for static memory sharing.   It can be used for 
fast, simple synchronization and communication between guests (and the 
host) that need to share frequently updated data 
(such as a simple cache or notification system).  It may not be a 
common task, but I think static sharing has its place and that's what 
this device is for at this point.


It would be good to detail a use case for reference.


I'll try my best...

We are using the (static) shared memory region for fast, interprocess 
communications (IPC).  Of course, shared-memory IPC is an old idea, and 
the IPC here is actually between VMs (i.e., ivshmem), not processes 
inside a single VM.  But, fast IPC is useful for shared caches, OS 
bypass (guest-to-guest, and host-to-guest), and low-latency IPC use-cases.


For example, one use of ivshmem is as a file cache between VMs.  Note 
that, unlike stream-oriented IPC, this file cache can be shared between, 
say, four VMs simultaneously.  In using VMs as sandboxes for distributed 
computing (condor, cloud, etc.), if two (or more) VMs are co-located on 
the same server, they can effectively share a single, unified cache. 
Any VM can bring in the data, and other VMs can use it.  Otherwise, two 
VMs might transfer (over the WAN, in the worst case, as in a cloud) and 
buffer cache the same file in multiple VMs.  In some ways, the 
configuration would look like an in-memory cluster file system, but 
instead of shared disks, we have shared memory.


Alternative forms of file sharing between VMs (e.g., via SAMBA or NFS) 
are possible, but also result in multiple cached copies of the same 
file data on the same physical server.  Furthermore, ivshmem has the 
(usual, planned) latency (e.g., for file metadata stats) and bandwidth 
advantages over most forms of stream-oriented IPC for file sharing 
protocols.


Other (related) use cases include bulk-data movement between the host 
and guest VMs, due to the OS bypass properties of ivshmem.  Since 
static shared memory shares a file (or memory object) on the host, 
host-guest sharing is simpler than with dynamic shared memory.


We acknowledge that work has to be done with thread/process scheduling 
to truly gain low IPC latency; that is to come, possibly with
PCI interrupts.  And, as the VMware experience shows (see below), VM 
migration *is* affected by ivshmem, but we think a good (but 
non-trivial) attach-to-ivshmem and detach-from-ivshmem protocol (in the 
future) can mostly address that issue.


As an aside, VMware ditched shared memory as part of their VMCI 
interface.  We emailed with some of their people who suggested to use 
sockets since shared memory de-virtualizes the VM (i.e. it breaks 
migration).  But on their forums there were users that used shared 
memory for their work and were disappointed to see it go.  One person I 
emailed with used shared memory for simulations running across VMs. 
Using shared memory freed him from having to come up with a protocol to 
exchange updates and having a central VM responsible for receiving and 
broadcasting updates.  When he did try to use a socket-based approach, 
the performance dropped substantially due to the communication overhead.


Then you need a side channel to communicate the information to the 
guest.


Couldn't one of the registers in BAR0 be used to store the actual 
(non-power-of-two) size?


The PCI config space (where the BARs reside) is a good place for it.  
Registers 0x40+ are device specific IIRC.
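
A rough sketch in C of what that could look like: the BAR advertises the
size rounded up to a power of two, while the exact size is kept in a
device-specific config register. The register offset used here and all
helper names are hypothetical, and the config space is just a byte array
standing in for the emulated device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE 256
#define SHM_CFG_SIZE_REG   0x40  /* hypothetical device-specific register */

/* Store a 32-bit value in little-endian byte order, as PCI config space uses. */
static void cfg_write32(uint8_t *cfg, size_t off, uint32_t v)
{
    assert(cfg != NULL && off + 4 <= PCI_CFG_SPACE_SIZE);
    cfg[off + 0] = (uint8_t)(v & 0xff);
    cfg[off + 1] = (uint8_t)((v >> 8) & 0xff);
    cfg[off + 2] = (uint8_t)((v >> 16) & 0xff);
    cfg[off + 3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read back a little-endian 32-bit value. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
    assert(cfg != NULL && off + 4 <= PCI_CFG_SPACE_SIZE);
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Smallest power of two >= size: the size the BAR itself would advertise. */
static uint32_t bar_size_for(uint32_t size)
{
    uint32_t bar = 1;

    assert(size > 0 && size <= UINT32_C(0x80000000));
    while (bar < size)
        bar <<= 1;
    return bar;
}
```

The guest driver would map the full BAR but only treat the first
cfg_read32(cfg, SHM_CFG_SIZE_REG) bytes as valid shared memory.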




Ok.

Cam


Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer

2009-04-03 Thread Masami Hiramatsu
Ingo Molnar wrote:
 * Masami Hiramatsu mhira...@redhat.com wrote:
 
 Hmm, I'd like to know whether kvm actually aims to emulate all kinds of 
 instructions. If so, I might find some bugs in x86_emulate.c. 
 However, I don't know all bugs. To find all of them, we have to 
 port x86_emulate.c to user-space, decode binaries with it, and 
 compare its output with another decoder, as Jim had done with 
 insn.c.

 https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html
 
 btw., i'd suggest we put a build time check for this into the kernel 
 version as well. For example to decode the vmlinux via objdump, run 
 it through your decoder as well and compare the results. Put under a 
 CONFIG_DEBUG_X86_DECODER_TEST kind of (default-off) build-time 
 self-test.
 
 This would ensure that the kernel we are running is fully supported 
 by the decoder - even as GCC/GAS starts using new instructions, etc. 
 
 How does this sound to you?

Thanks! That is a good idea.
Jim, do you think you could port your script into the kernel tree?

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



PCI device assignment to Guest

2009-04-03 Thread Eric Liu


Does anyone have experience assigning a PCI-E based InfiniBand card to a
guest OS (RHEL5U2 with kernel 2.6.18-92) on a recent AMD system with IOMMU
support? The host OS runs kernel 2.6.29.

Steps I used:

 $ echo -n "8086 10de" > /sys/bus/pci/drivers/pci-stub/new_id
 $ echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
 $ echo -n 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind

Then I started the guest with -pcidevice host=id. After the guest started, it
successfully detected the PCI device with lspci, but the kernel
can't bring up the device. dmesg shows that the InfiniBand kernel module can't
detect the card's firmware properly and then aborts. I think it is a
KVM issue rather than an InfiniBand kernel module issue. Can anyone
suggest what to try?

Thanks
Eric


Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Chris Wright
* Gregory Haskins (ghask...@novell.com) wrote:
 Let me ask you this:  If you had a clean slate and were designing a
 hypervisor and a guest OS from scratch:  What would you make the bus
 look like?

Well, virtio did have a relatively clean slate. And PCI (as _one_
transport option) is what it looks like.  It's not the only transport
(as Avi already mentioned it works for s390, for example).

BTW, from my brief look at vbus, it seems pretty similar to xenbus.

thanks,
-chris


Re: PCI device assignment to Guest

2009-04-03 Thread Chris Wright
* Eric Liu (ericliu2...@hotmail.com) wrote:
 Anyone has experience to assign PCI-E based InfiniBand card to
 guest OS(RHEL5U2 with kernel 2.6.18-92) on latest AMD with IOMMU
 support. Host OS has kernel 2.6.29.
 
 Steps I used:
 
  $ echo -n "8086 10de" > /sys/bus/pci/drivers/pci-stub/new_id
  $ echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
  $ echo -n 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind

The steps above are specific to an e1000e device.

 Then
 I started guest with -pcidevice host=id. After guest is started, it
 successfully detected pci device with lspci command, however kernel
 can't bring up the device. dmesg shows infiniband kernel module can't
 detect infiniband card firmware properly then aborted. I think it is
 KVM issue rather than infiniband kernel module issue. Can anyone
 suggest?

Sounds like you may have two drivers for this device.

Can you include (on the host) lspci -tv and lspci -vvv?

thanks,
-chris


Re: [PATCH -tip 4/6 V4.1] x86: kprobes checks safeness of insertion address.

2009-04-03 Thread Jim Keniston
On Fri, 2009-04-03 at 12:02 -0400, Masami Hiramatsu wrote:
 Ensure safeness of inserting kprobes by checking whether the specified
 address is at the first byte of a instruction. This is done by decoding
 probed function from its head to the probe point.
 
 changes from v4:
  - change a comment according to Ananth's suggestion.
 
 Signed-off-by: Masami Hiramatsu mhira...@redhat.com
 Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
 Cc: Jim Keniston jkeni...@us.ibm.com
 Cc: Ingo Molnar mi...@elte.hu
 ---
 
  arch/x86/kernel/kprobes.c |   51 
 +
  1 files changed, 51 insertions(+), 0 deletions(-)
 
 
 diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
...
 
 +/* Recover original instruction */

/* Recover the probed instruction at addr for further analysis. */
See below.

 +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
 +{
 +	struct kprobe *kp;
 +	kp = get_kprobe((void *)addr);
 +	if (!kp)
 +		return -EINVAL;
 +
 +	/* Don't use p->ainsn.insn; which will be modified by fix_riprel */

fix_riprel doesn't affect the instruction's length, which is what
concerns this patch.  But we want this function to be useful for
unforeseen uses as well, so I like the code you have.  Just consider the
suggested comment changes.

/*
 * Don't use p->ainsn.insn, which could be modified -- e.g.,
 * by fix_riprel().
 */

 +	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
 +	buf[0] = kp->opcode;
 +	return 0;
 +}

Jim Keniston



RE: PCI device assignment to Guest

2009-04-03 Thread Eric Liu



Here are the exact steps I used:

1. lspci -n on host:
06:00.0 0c06: 15b3:634a (rev a0)


I want to assign this device to guest.

2. Uninstall driver for this device.

3. Unbind the device with the following commands:

echo "15b3 634a" > /sys/bus/pci/drivers/pci-stub/new_id
echo 0000:06:00.0 > /sys/bus/pci/devices/0000:06:00.0/driver/unbind
echo 0000:06:00.0 > /sys/bus/pci/drivers/pci-stub/bind

4. Start the guest with... -pcidevice host=06:00.0

5. The guest OS detects the device with lspci, but the device fails to start.
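
The unbind/bind sequence in step 3 can be sketched as a small helper that builds the list of sysfs writes. The function names are made up for this illustration; the paths follow the commands above and assume the address is given in full domain:bus:device.function form.

```python
import re

# Full PCI address: domain:bus:device.function, e.g. 0000:06:00.0
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")
STUB_DIR = "/sys/bus/pci/drivers/pci-stub"


def pci_stub_steps(vendor_id, device_id, bdf):
    """Return the ordered (sysfs_path, value) writes that hand a device to pci-stub."""
    if not BDF_RE.match(bdf):
        raise ValueError("expected domain:bus:dev.fn, got %r" % bdf)
    return [
        (STUB_DIR + "/new_id", "%s %s" % (vendor_id, device_id)),
        ("/sys/bus/pci/devices/%s/driver/unbind" % bdf, bdf),
        (STUB_DIR + "/bind", bdf),
    ]


def apply_steps(steps):
    """Perform the writes; needs root and a host kernel with pci-stub."""
    for path, value in steps:
        with open(path, "w") as f:
            f.write(value)
```

For the card above this would be pci_stub_steps("15b3", "634a", "0000:06:00.0"), with apply_steps() run as root on the host before starting the guest.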

lspci -tv on host:(last device here is what i want to assign)

-[0000:00]-+-01.0-[0000:03-04]--+-0d.0-[0000:04]--
           |                    \-0e.0  Broadcom BCM5785 [HT1000] SATA (PATA/IDE Mode)
           +-02.0  Broadcom BCM5785 [HT1000] Legacy South Bridge
           +-02.1  Broadcom BCM5785 [HT1000] IDE
           +-02.2  Broadcom BCM5785 [HT1000] LPC
           +-03.0  Broadcom BCM5785 [HT1000] USB
           +-03.1  Broadcom BCM5785 [HT1000] USB
           +-03.2  Broadcom BCM5785 [HT1000] USB
           +-04.0  ATI Technologies Inc ES1000
           +-07.0-[0000:05]--
           +-08.0-[0000:01]----00.0  Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express
           +-09.0-[0000:02]----00.0  Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express
           +-0a.0-[0000:06]----00.0  Mellanox Technologies MT25418 [ConnectX IB DDR]



lspci -vvv: (only related portion)

06:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR] (rev a0)
Subsystem: Mellanox Technologies Unknown device 0007
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fastTAbort- SERR-  
Date: Fri, 3 Apr 2009 10:13:22 -0700
 From: chr...@sous-sol.org
 To: ericliu2...@hotmail.com
 CC: kvm@vger.kernel.org
 Subject: Re: PCI device assignment to Guest

 * Eric Liu (ericliu2...@hotmail.com) wrote:
 Anyone has experience to assign PCI-E based InfiniBand card to
 guest OS(RHEL5U2 with kernel 2.6.18-92) on latest AMD with IOMMU
 support. Host OS has kernel 2.6.29.

 Steps I used:

 $ echo -n "8086 10de" > /sys/bus/pci/drivers/pci-stub/new_id
 $ echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
 $ echo -n 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind

 The steps above are specific to an e1000e device.

 Then
 I started guest with -pcidevice host=id. After guest is started, it
 successfully detected pci device with lspci command, however kernel
 can't bring up the device. dmesg shows infiniband kernel module can't
 detect infiniband card firmware properly then aborted. I think it is
 KVM issue rather than infiniband kernel module issue. Can anyone
 suggest?

 Sounds like you may have two drivers for this device.

 Can you include (on the host) lspci -tv and lspci -vvv?

 thanks,
 -chris



Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer

2009-04-03 Thread Jim Keniston
On Fri, 2009-04-03 at 12:55 -0400, Masami Hiramatsu wrote:
 Ingo Molnar wrote:
  * Masami Hiramatsu mhira...@redhat.com wrote:
  
  Hmm, I'd like to know actually kvm aims to emulate all kinds of 
  instructions. If so, I might find some bugs in x86_emulate.c. 
  However, I don't know all bugs. To find all of them, we have to 
  port x86_emulate.c to user-space, decode binaries with it, and 
  compare its output with another decoder, as Jim had done with 
  insn.c.
 
  https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html
  
  btw., i'd suggest we put a build time check for this into the kernel 
  version as well. For example to decode the vmlinux via objdump, run 
  it through your decoder as well and compare the results. Put under a 
  CONFIG_DEBUG_X86_DECODER_TEST kind of (default-off) build-time 
  self-test.
  
  This would ensure that the kernel we are running is fully supported 
  by the decoder - even as GCC/GAS starts using new instructions, etc. 
  
  How does this sound to you?
 
 Thanks! That is a good idea.
 Jim, would you think you can port your script into kernel tree?
...

I'd be happy to do what's needed to make it happen, and maintain it in
the face of x86 changes.  The script itself is practically nothing (~100
lines of awk and C), but what I don't know about the kernel build is a
lot, so I'd need some help from a kernel-build expert.

Jim



Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Gregory Haskins
Avi Kivity wrote:
 Gregory Haskins wrote:
 I'll rephrase.  What are the substantial benefits that this offers
 over PCI?
 

 Simplicity and optimization.  You don't need most of the junk that comes
 with PCI.  It's all overhead and artificial constraints.  You really only
 need things like a handful of hypercall verbs and that's it.

   

 Simplicity:

 The guest already supports PCI.  It has to, since it was written to
 the PC platform, and since today it is fashionable to run kernels that
 support both bare metal and a hypervisor.  So you can't remove PCI
 from the guest.

Agreed

 The host also already supports PCI.  It has to, since it must supports
 guests which do not support vbus.  We can't remove PCI from the host.

Agreed

 You don't gain simplicity by adding things.

But you are failing to account for the fact that we still have to add
something for PCI if we go with something like the in-kernel model.  It's
nice for the userspace side because a) it was already in qemu, and b) we
need it for proper guest support.  But we presumably don't have it for
this new thing, so something has to be created (unless this support is
somehow already there and I don't know it?)

   Sure, lguest is simple because it doesn't support PCI.  But Linux
 will forever support PCI, and Qemu will always support PCI.  You
 aren't simplifying anything by adding vbus.

 Optimization:

 Most of PCI (in our context) deals with configuration.  So removing it
 doesn't optimize anything, unless you're counting hotplugs-per-second
 or something.

Most, but not all ;)  (Sorry, you left the window open on that one).

What about IRQ routing?  What if I want to coalesce interrupts to
minimize injection overhead?  How do I do that in PCI?

How do I route those interrupts in an arbitrarily nested fashion, say,
to a guest userspace?

What about scale?  What if Herbert decides to implement a 2048 ring MQ
device ;)  There's no great way to do that in x86 with PCI, yet I can do
it in vbus.  (And yes, I know, this is ridiculous... I just want to get
you thinking.)



 Second of all, I want to use vbus for other things that do not
 speak PCI
 natively (like userspace for instance...and if I am gleaning this
 correctly, lguest doesnt either).
 
 And virtio supports lguest and s390.  virtio is not PCI specific.
 
 I understand that.  We keep getting wrapped around the axle on this
 one.   At some point in the discussion we were talking about supporting
 the existing guest ABI without changing the guest at all.  So while I
 totally understand the virtio can work over various transports, I am
 referring to what would be needed to have existing ABI guests work with
 an in-kernel version.  This may or may not be an actual requirement.
   

 There is be no problem supporting an in-kernel host virtio endpoint
 with the existing guest/host ABI.  Nothing in the ABI assumes the host
 endpoint is in userspace.  Nothing in the implementation requires us
 to move any of the PCI stuff into the kernel.
Well, that's not really true.  If the device is a PCI device, there is
*some* stuff that has to go into the kernel.  Not an ICH model or
anything, but at least an ability to interact with userspace for
config-space changes, etc.


 In fact, we already have in-kernel sources of PCI interrupts, these
 are assigned PCI devices (obviously, these have to use PCI).

This will help.


 However, for the PC platform, PCI has distinct advantages.  What
 advantages does vbus have for the PC platform?
 
 To reiterate: IMO simplicity and optimization.  Its designed
 specifically for PV use, which is software to software.
   

 To avoid reiterating, please be specific about these advantages.
We are both reading the same thread, right?


  
 PCI sounds good at first, but I believe its a false economy.  It was
 designed, of course, to be a hardware solution, so it carries all this
 baggage derived from hardware constraints that simply do not exist
 in a
 pure software world and that have to be emulated.  Things like the
 fixed
 length and centrally managed PCI-IDs,   
 Not a problem in practice.
 

 Perhaps, but its just one more constraint that isn't actually needed.
 Its like the cvs vs git debate.  Why have it centrally managed when you
 don't technically need it.  Sure, centrally managed works, but I'd
 rather not deal with it if there was a better option.
   

 We've allocated 3 PCI device IDs so far.  It's not a problem.  There
 are enough real problems out there.

  
 PIO config cycles, BARs,
 pci-irq-routing, etc.
 What are the problems with these?
 

 1) PIOs are still less efficient to decode than a hypercall vector.  We
 dont need to pretend we are hardware..the guest already knows whats
 underneath them.  Use the most efficient call method.
   

 Last time we measured, hypercall overhead was the same as pio
 overhead.  Both vmx and svm decode pio completely (except for string
 pio ...)
Not on my woodcrests last time I looked, but 

Re: [RFC PATCH 00/17] virtual-bus

2009-04-03 Thread Gregory Haskins
Chris Wright wrote:
 * Gregory Haskins (ghask...@novell.com) wrote:
   
 Let me ask you this:  If you had a clean slate and were designing a
 hypervisor and a guest OS from scratch:  What would you make the bus
 look like?
 

 Well, virtio did have a relatively clean slate. And PCI (as _one_
 transport option) is what it looks like.  It's not the only transport
 (as Avi already mentioned it works for s390, for example).
   

Got it.  Thanks.

 BTW, from my brief look at vbus, it seems pretty similar to xenbus.
   

If you are referring to the guest side interface, it was actually
inspired by lguest's bus (I forget what Rusty called it now, though).  
I think I actually declared that in the original patch series I put out
1.5 years ago, but I might have inadvertently omitted that on this
go-round.

I think XenBus is more of an event channel infrastructure, isn't it? 
But in any case, I think the nature of getting PV drivers into a guest
is relatively similar, so I wouldn't be surprised if there were
parallels in quite a few of the implementations.  In fact, I chose a
generic name like vbus in hopes that it could be used across different
hypervisors. :)

-Greg





Re: [PATCH -tip 4/6 V4.1] x86: kprobes checks safeness of insertion address.

2009-04-03 Thread Masami Hiramatsu
Jim Keniston wrote:
 On Fri, 2009-04-03 at 12:02 -0400, Masami Hiramatsu wrote:
 Ensure safeness of inserting kprobes by checking whether the specified
 address is at the first byte of a instruction. This is done by decoding
 probed function from its head to the probe point.

 changes from v4:
  - change a comment according to Ananth's suggestion.

 Signed-off-by: Masami Hiramatsu mhira...@redhat.com
 Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
 Cc: Jim Keniston jkeni...@us.ibm.com
 Cc: Ingo Molnar mi...@elte.hu
 ---

  arch/x86/kernel/kprobes.c |   51 
 +
  1 files changed, 51 insertions(+), 0 deletions(-)


 diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
 ...
 +/* Recover original instruction */
 
 /* Recover the probed instruction at addr for further analysis. */
 See below.

Sure.

 
 +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
 +{
 +	struct kprobe *kp;
 +	kp = get_kprobe((void *)addr);
 +	if (!kp)
 +		return -EINVAL;
 +
 +	/* Don't use p->ainsn.insn; which will be modified by fix_riprel */
 
 fix_riprel doesn't affect the instruction's length, which is what
 concerns this patch.  But we want this function to be useful for
 unforeseen uses as well, so I like the code you have.  Just consider the
 suggested comment changes.
 
 /*
  * Don't use p->ainsn.insn, which could be modified -- e.g.,
  * by fix_riprel().
  */

Thanks, I'll update the comments then!

 
 +	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
 +	buf[0] = kp->opcode;
 +	return 0;
 +}
 
 Jim Keniston
 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: PCI passtthrought intel 82574L can't boot from disk

2009-04-03 Thread Hauke Hoffmann
On Thursday 02 April 2009 18:58:26 you wrote:
 It is my understanding that you need vt-d/iommu support. I didn't think any
 existing amd chipsets had iommu support. You may want to look into that.

Hi Brian,

thanks for your response.

I found a tool [1] from Intel that disables the boot ROM on the NIC. That
resolves the boot problem.

Regards
Hauke

[1] 
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProductID=412&DwnldID=8242


 --Brian Jackson

 On Thursday 02 April 2009 07:00:07 Hauke Hoffmann wrote:
  Hi,
 
  qemu-system-x86_64 runs well and i can boot and run the guest system.
  Thats works very well.
 
  Command:
  /usr/local/kvm/bin/qemu-system-x86_64 -m
  512 -hda /var/VM/roadrunner.local/hda.qcow2 -smp 1 -vnc
  192.168.2.30: -net nic,macaddr=DE:AD:BE:EF:90:26 -net
  tap,ifname=tap0,script=no,downscript=no -boot c
 
  Then i tried to add an intel 82574L network adapter to the guest.
  Just the same command with addtionally -pcidevice host=07:00.0
 
  Then i connected via VNC and see BIOS startpage and the following lines:
  Initializing Intel(r) boot agent ge v1.3.21
  pxe 2.1 build 086 (WfM 2.0)
  Press f12 for moot menu
 
  You can see a screenshot at http://nxt7.de/download/qemu.png
 
  The guests keep on this point and nothing changes. (I have wait hours.)
 
  I tried to press F12 in ThightVNC but no action.
  I must say that ThightVNC has problems with special chars (in my case).
 
  At this point, i need your help.
 
 
  Here are some details of my system
 
  Kernel: 2.6.29 form kernel.org (self compiled)
  kvm userspace: kvm-84 (self compiled)
  OS: Ubuntu 8.04.2 server
 
  r...@ls:~# lspci
  00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
  00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
  00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
  00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
  00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
  00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
  00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
  00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
  00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
  00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
  00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
  00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
  00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
  00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
  00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
  00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
  00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
  00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
  00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
  HyperTransport Technology Configuration
  00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
  Address Map
  00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
  DRAM Controller
  00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
  Miscellaneous Control
  01:09.0 Ethernet controller: Lite-On Communications Inc LNE100TX [Linksys
  EtherFast 10/100] (rev 25)
  01:0a.0 VGA compatible controller: XGI Technology Inc. (eXtreme Graphics
  Innovation) Volari Z7
  06:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363
  AHCI Controller (rev 03)
  06:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363
  AHCI Controller (rev 03)
  07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
  Connection
 
 
  r...@ls:~# lspci -tvvv
  -[:00]-+-00.0  nVidia Corporation MCP55 Memory Controller
 +-01.0  nVidia Corporation MCP55 LPC Bridge
 +-01.1  nVidia Corporation MCP55 SMBus
 +-02.0  nVidia Corporation MCP55 USB Controller
 +-02.1  nVidia Corporation MCP55 USB Controller
 +-04.0  nVidia Corporation MCP55 IDE
 +-05.0  nVidia Corporation MCP55 SATA Controller
 +-05.1  nVidia Corporation MCP55 SATA Controller
 +-05.2  nVidia Corporation MCP55 SATA Controller
 +-06.0-[:01]--+-09.0  Lite-On Communications Inc LNE100TX
  [Linksys EtherFast 10/100]
 
 | \-0a.0  XGI Technology Inc. (eXtreme
 | Graphics
 
  Innovation) Volari Z7
 +-08.0  nVidia Corporation MCP55 Ethernet
 +-09.0  nVidia Corporation MCP55 Ethernet
 +-0a.0-[:02]--
 +-0b.0-[:03]--
 +-0c.0-[:04]--
 +-0d.0-[:05]--
 +-0e.0-[:06]--+-00.0  JMicron Technologies, Inc. JMicron
  20360/20363 AHCI Controller
 
 | \-00.1  JMicron Technologies, Inc. 

Re: [kvm] [PATCH 06/16] Support for device capability

2009-04-03 Thread Alex Williamson
On Tue, 2009-03-17 at 11:50 +0800, Sheng Yang wrote:
 This framework can be easily extended to support device capability, like
 MSI/MSI-x.

Sheng,

Are you already looking at adding support for PM and EXP capabilities?
The bnx2 driver is an example that won't claim the device if these
capabilities aren't present.  Thanks,

Alex



KVM cpuTime discrepancy

2009-04-03 Thread Zvi Dubitzky
The cpuTime of a VM reported by kvm-72 is OK (real seconds), while
that reported by kvm-84 is not.
Are you aware of this? Was it fixed in KVM releases after kvm-84?

I access the cpuTime via libvirt (same version in both cases).

thanks

Zvi Dubitzky 
Virtualization and System Architecture   Email:d...@il.ibm.com
IBM Haifa Research LaboratoryPhone: +972-4-8296182
Haifa, 31905, ISRAEL 




[RFC PATCH] PCI pass-through fixups

2009-04-03 Thread Alex Williamson

I'm wondering if we need a spot for device specific fixups for PCI
pass-through.  In the example below, I want to expose a single port of
an Intel 82571EB quad port copper NIC to a guest.  It works great until
I shut down the guest, at which point the guest e1000e driver knows by
the device ID that the NIC is a quad port, and blindly attempts to
twiddle some bits on the bridge above it (that doesn't exist).
Obviously some robustness could be added to the driver, but would it
make sense to do something like below and automatically remap these
devices to identical single port device IDs?  Thanks,

Alex

Signed-off-by: Alex Williamson alex.william...@hp.com
--

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index b7f9fa6..1c6d1e8 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -427,6 +427,35 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     return 0;
 }
 
+static void assigned_device_fixup(AssignedDevice *pci_dev)
+{
+    uint16_t vendor_id, device_id;
+
+    vendor_id = pci_dev->dev.config[0] | pci_dev->dev.config[1] << 8;
+    device_id = pci_dev->dev.config[2] | pci_dev->dev.config[3] << 8;
+
+    switch (vendor_id) {
+    case 0x8086:
+        switch (device_id) {
+        case 0x10A4:
+        case 0x10BC:
+            /* quad port copper -> single port copper */
+            pci_dev->dev.config[2] = 0x5E;
+            break;
+        case 0x10A5:
+            /* quad port fiber -> single port fiber */
+            pci_dev->dev.config[2] = 0x5F;
+            break;
+        case 0x10DA:
+        case 0x10D9:
+            /* dual/quad port serdes -> single port serdes */
+            pci_dev->dev.config[2] = 0x60;
+            break;
+        }
+        break;
+    }
+}
+
 static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
                            uint8_t r_dev, uint8_t r_func)
 {
@@ -524,6 +553,8 @@ again:
     }
     fclose(f);
 
+    assigned_device_fixup(pci_dev);
+
     /* dealing with virtual function device */
     snprintf(name, sizeof(name), "%sphysfn/", dir);
     if (!stat(name, &statbuf))
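
As a table-driven illustration of the same remapping (a sketch, not part of the patch): config space is modelled as a byte array, the full 16-bit replacement IDs are derived from the low bytes written in the patch (the high byte stays 0x10), and the helper names are invented here.

```python
PCI_VENDOR_ID = 0x00   # config space offset of the vendor ID
PCI_DEVICE_ID = 0x02   # config space offset of the device ID
VENDOR_INTEL = 0x8086

# multi-port device ID -> single-port device ID, per the patch above
SINGLE_PORT_ID = {
    0x10A4: 0x105E,  # quad port copper
    0x10BC: 0x105E,  # quad port copper
    0x10A5: 0x105F,  # quad port fiber
    0x10DA: 0x1060,  # dual/quad port serdes
    0x10D9: 0x1060,  # dual/quad port serdes
}


def read_u16(config, offset):
    """Read a little-endian 16-bit config register."""
    return config[offset] | (config[offset + 1] << 8)


def write_u16(config, offset, value):
    """Write a little-endian 16-bit config register."""
    config[offset] = value & 0xFF
    config[offset + 1] = (value >> 8) & 0xFF


def fixup_device_id(config):
    """Remap known multi-port Intel NIC IDs; return True if config changed."""
    if read_u16(config, PCI_VENDOR_ID) != VENDOR_INTEL:
        return False
    new_id = SINGLE_PORT_ID.get(read_u16(config, PCI_DEVICE_ID))
    if new_id is None:
        return False
    write_u16(config, PCI_DEVICE_ID, new_id)
    return True
```

A table keeps the list of affected IDs in one place, which may matter if more device-specific fixups accumulate.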




Re: [PATCH -tip 2/6 V4.1] x86: add arch-dep register and stack access API to ptrace

2009-04-03 Thread Roland McGrath
 +static struct pt_regs_offset regoffset_table[] = {
         ^ const


Re: kvm-autotest: weird memory error during stepmaker test

2009-04-03 Thread Eduardo Habkost
Excerpts from Ryan Harper's message of Qua Abr 01 12:55:58 -0300 2009:
 Wondering if anyone else using kvm-autotest stepmaker has ever seen this
 error:
 
 Traceback (most recent call last):
   File
 /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepmaker.
 py, line 146, in update
 self.set_image_from_file(self.screendump_filename)
   File
 /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor
 .py, line 499, in set_image_from_file
 self.set_image(w, h, data)
   File
 /home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor
 .py, line 485, in set_image
 w, h, w*3))
 MemoryError

I've seen this error twice today, while trying to create a step file to
install a Windows 2008 R2 64-bit guest (the Win2008-64 step file
available on the git repository doesn't work for me). This happened when
the guest was being rebooted by the windows installer. The contents of
the screen dump file are this:

$ cat 
/home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm
P6
0 0
255
$ 

And the 0x0 pixmap really makes gdk panic:

>>> (w, h, data) = ppm_utils.image_read_from_ppm_file('/home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm')
>>> w,h,data
(0, 0, '')
>>> gtk.gdk.pixbuf_new_from_data(data, gtk.gdk.COLORSPACE_RGB, False, 8, w, h, w*3)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
MemoryError
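
One possible guard, sketched here with invented helper names in modern Python 3 rather than the Python 2 used by kvm-autotest at the time: read the width and height from the PPM header and skip the update when either is zero. It does not handle '#' comments in the header.

```python
def read_ppm_size(data):
    """Return (width, height) from the header of a binary PPM (P6) image."""
    fields = data.split(None, 3)   # magic, width, height, remainder
    if len(fields) < 3 or fields[0] != b"P6":
        raise ValueError("not a binary PPM (P6) image")
    return int(fields[1]), int(fields[2])


def is_displayable(data):
    """True only if the screendump has a valid header and a non-empty raster."""
    try:
        width, height = read_ppm_size(data)
    except ValueError:
        return False
    return width > 0 and height > 0


EMPTY_DUMP = b"P6\n0 0\n255\n"                 # what the guest produced mid-reboot
TINY_DUMP = b"P6\n2 1\n255\n" + bytes(2 * 1 * 3)
```

A caller could check is_displayable() before building the pixbuf and simply keep the previous image when it returns False.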
 



 
 The guest is still running, but stepmaker isn't recording any more so it's
 boned at that point.  And of course, it's near the end of a guest install so
 one has lost a decent amount of time...
 
-- 
Eduardo


[PATCH -tip 4/6 V4.2] x86: kprobes checks safeness of insertion address.

2009-04-03 Thread Masami Hiramatsu
x86: kprobes checks safeness of insertion address.

From: Masami Hiramatsu mhira...@redhat.com

Ensure safeness of inserting kprobes by checking whether the specified
address is at the first byte of an instruction. This is done by decoding
the probed function from its head to the probe point.
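
The boundary walk described above can be illustrated with a toy model. This is only a sketch: TOY_LENGTHS is a made-up length table, not an x86 decoder, and the step that recovers an instruction already overwritten by a breakpoint is left out.

```python
# Hypothetical first-byte -> instruction length table, for illustration only.
TOY_LENGTHS = {0x55: 1, 0x48: 3, 0xE8: 5, 0xC3: 1}


def toy_insn_length(code, offset):
    """Length of the instruction at offset, or 0 if it cannot be decoded."""
    return TOY_LENGTHS.get(code[offset], 0)


def can_probe(code, func_start, paddr, insn_length=toy_insn_length):
    """Walk instructions from func_start; True if paddr is a boundary."""
    addr = func_start
    while addr < paddr:
        length = insn_length(code, addr)
        if length <= 0:
            return False   # undecodable instruction: refuse the probe
        addr += length
    return addr == paddr


# push; 3-byte mov; 5-byte call; ret -> boundaries at offsets 0, 1, 4, 9
CODE = bytes([0x55, 0x48, 0x89, 0xE5, 0xE8, 0, 0, 0, 0, 0xC3])
```

Here can_probe(CODE, 0, 4) accepts the address, while can_probe(CODE, 0, 2) rejects it because offset 2 lies inside the 3-byte instruction.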

changes from v4.1:
 - update comments according to Jim's suggestion.
 - s/lookup_symbol_attrs/kallsyms_lookup/ in a comment.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |   54 +
 1 files changed, 54 insertions(+), 0 deletions(-)


diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 7b5169d..3d5e85f 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,12 +48,14 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>

 #include <asm/cacheflush.h>
 #include <asm/desc.h>
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
+#include <asm/insn.h>

 void jprobe_return_end(void);

@@ -244,6 +246,56 @@ retry:
}
 }

+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+	struct kprobe *kp;
+	kp = get_kprobe((void *)addr);
+	if (!kp)
+		return -EINVAL;
+
+	/*
+	 * Don't use p->ainsn.insn, which could be modified -- e.g.,
+	 * by fix_riprel().
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+	int ret;
+	unsigned long addr, offset = 0;
+	struct insn insn;
+	kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	/* Lookup symbol including addr */
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+		return 0;
+
+	/* Decode instructions */
+	addr = paddr - offset;
+	while (addr < paddr) {
+		insn_init_kernel(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+		if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
+			ret = recover_probed_instruction(buf, addr);
+			if (ret)
+				return 0;
+			insn_init_kernel(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+	}
+
+	return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -359,6 +411,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)

 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+		return -EILSEQ;
 	/* insn: must be on special executable page on x86. */
 	p->ainsn.insn = get_insn_slot();
 	if (!p->ainsn.insn)

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [kvm] [PATCH 13/16] kvm: enable MSI-X capabilty for assigned device

2009-04-03 Thread Alex Williamson
On Tue, 2009-03-17 at 11:50 +0800, Sheng Yang wrote:
 +    if (*ctrl_word & PCI_MSIX_ENABLE) {
 +        if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
 +            perror("assigned_dev_update_msix_mmio");
 +            return;
 +        }
 +        if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0) {
 +            perror("assigned_dev_enable_msix: assign irq");
 +            return;
 +        }
 +        assigned_dev->irq_requested_type = assigned_irq_data.flags;
 +    }
 +}

Do we need some disable logic here?  If I toggle a bnx2 NIC in a guest,
I get the following when it attempts to come back up:

MSI-X entry number is zero!
assigned_dev_update_msix_mmio: No such device or address

Thanks,

Alex



Re: kvm-autotest: weird memory error during stepmaker test

2009-04-03 Thread Michael Goldish

- Eduardo Habkost ehabk...@raisama.net wrote:

 Excerpts from Ryan Harper's message of Wed Apr 01 12:55:58 -0300 2009:
  Wondering if anyone else using kvm-autotest stepmaker has ever seen this
  error:
  
  Traceback (most recent call last):
    File "/home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepmaker.py", line 146, in update
      self.set_image_from_file(self.screendump_filename)
    File "/home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor.py", line 499, in set_image_from_file
      self.set_image(w, h, data)
    File "/home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor.py", line 485, in set_image
      w, h, w*3))
  MemoryError
 
 I've seen this error twice today, while trying to create a step file to
 install a Windows 2008 R2 64-bit guest (the Win2008-64 step file
 available on the git repository doesn't work for me). This happened when
 the guest was being rebooted by the Windows installer. The contents of
 the screen dump file are this:
 
 $ cat /home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm
 P6
 0 0
 255
 $ 
 
 And the 0x0 pixmap really makes gdk panic:
 
 >>> (w, h, data) = ppm_utils.image_read_from_ppm_file('/home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm')
 >>> w,h,data
 (0, 0, '')
 >>> gtk.gdk.pixbuf_new_from_data(data, gtk.gdk.COLORSPACE_RGB, False, 8, w, h, w*3)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 MemoryError
 

This is very useful information. I've seen qemu/kvm produce 0x0 screendumps
before, but it's never happened to me while working with stepmaker.

A reasonable solution would be to make sure a screendump is OK before feeding
it to gdk. I'll try to commit this ASAP so it doesn't bother people any more.
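
A check along these lines might look like the sketch below. It is an illustration under stated assumptions, not the committed fix: screendump_is_usable() is a hypothetical helper, and the 3-bytes-per-pixel size check is inferred from the w*3 rowstride that appears in the traceback.

```python
def screendump_is_usable(width, height, data):
    """Return True if (width, height, data) describes a non-empty
    24-bit RGB image that is safe to hand to a pixbuf constructor."""
    if width <= 0 or height <= 0:
        return False
    # Assumption: 3 bytes per pixel (RGB), matching a rowstride of width * 3.
    return len(data) >= width * height * 3


# The 0x0 screendump from the report is rejected...
print(screendump_is_usable(0, 0, b""))
# ...while a 2x1 image with 6 bytes of pixel data is accepted.
print(screendump_is_usable(2, 1, b"\x00" * 6))
```

The caller would then clear the displayed image instead of building a pixbuf whenever the check fails.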

Thanks,
Michael


Re: [PATCH -tip 2/6 V4.1] x86: add arch-dep register and stack access API to ptrace

2009-04-03 Thread Masami Hiramatsu
Roland McGrath wrote:
 +static struct pt_regs_offset regoffset_table[] = {
 
  ^ const

Oops, exactly.
Perhaps, I need to update insn.c to make bitmap tables
static const too.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



[PATCH] kvm-autotest: stepeditor: clear image if width, height, or data are invalid

2009-04-03 Thread Eduardo Habkost

This patch fixes the following issue:

Excerpts from Eduardo Habkost's message of Fri Apr 03 17:37:56 -0300 2009:
 Excerpts from Ryan Harper's message of Wed Apr 01 12:55:58 -0300 2009:
  Wondering if anyone else using kvm-autotest stepmaker has ever seen this
  error:
  
  Traceback (most recent call last):
    File "/home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepmaker.py", line 146, in update
      self.set_image_from_file(self.screendump_filename)
    File "/home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor.py", line 499, in set_image_from_file
      self.set_image(w, h, data)
    File "/home/rharper/work/git/build/kvm-autotest/client/tests/kvm_runtest_2/stepeditor.py", line 485, in set_image
      w, h, w*3))
  MemoryError
 
 I've seen this error twice today, while trying to create a step file to
 install a Windows 2008 R2 64-bit guest (the Win2008-64 step file
 available on the git repository doesn't work for me). This happened when
 the guest was being rebooted by the Windows installer. The contents of
 the screen dump file are this:
 
 $ cat /home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm
 P6
 0 0
 255
 $ 
 
 And the 0x0 pixmap really makes gdk panic:
 
 >>> (w, h, data) = ppm_utils.image_read_from_ppm_file('/home/ehabkost/autotest/kvm-autotest/client/results/default/kvm_runtest_2.Win2008.64.install/debug/scrdump.ppm')
 >>> w,h,data
 (0, 0, '')
 >>> gtk.gdk.pixbuf_new_from_data(data, gtk.gdk.COLORSPACE_RGB, False, 8, w, h, w*3)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 MemoryError
 


Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 client/tests/kvm_runtest_2/stepeditor.py |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/client/tests/kvm_runtest_2/stepeditor.py 
b/client/tests/kvm_runtest_2/stepeditor.py
index caaf47b..383834b 100755
--- a/client/tests/kvm_runtest_2/stepeditor.py
+++ b/client/tests/kvm_runtest_2/stepeditor.py
@@ -488,14 +488,18 @@ Utilities
         vscrollbar = self.scrolledwindow.get_vscrollbar()
         vscrollbar.set_range(0, h)
 
+    def clear_image(self):
+        self.image.clear()
+        self.image_width = 0
+        self.image_height = 0
+        self.image_data = ""
+
     def set_image_from_file(self, filename):
         if not filename or not os.path.exists(filename):
-            self.image.clear()
-            self.image_width = 0
-            self.image_height = 0
-            self.image_data = ""
-            return
+            return self.clear_image()
         (w, h, data) = ppm_utils.image_read_from_ppm_file(filename)
+        if w <= 0 or h <= 0 or not data:
+            return self.clear_image()
         self.set_image(w, h, data)
 
     def get_step_lines(self, output_dir=None, current_step=None):

-- 
1.5.5.6
-- 
Eduardo


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-04-03 Thread Marcelo Tosatti
On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
 index 2ea8262..48169d7 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
  			kvm_write_guest_time(vcpu);
  		if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
  			kvm_mmu_sync_roots(vcpu);
 +		if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
 +			kvm_mmu_sync_global(vcpu);
  		if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
  			kvm_x86_ops->tlb_flush(vcpu);
  		if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS

 Windows will (I think) write a PDE on every context switch, so this  
 effectively disables global unsync for that guest.

 What about recursively syncing the newly linked page in FNAME(fetch)()?  
 If the page isn't global, this becomes a no-op, so no new overhead.  The  
 only question is the expense when linking a populated top-level page,  
 especially in long mode.

How about this?

KVM: MMU: sync global pages on fetch()

If an unsync global page becomes unreachable via the shadow tree, which
can happen if one of its parent pages is zapped, invlpg will fail to
invalidate translations for gvas contained in such unreachable pages.

So sync global pages in fetch().

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 09782a9..728be72 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			break;
 		}
 
-		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
+			if (level-1 == PT_PAGE_TABLE_LEVEL) {
+				shadow_page = page_header(__pa(sptep));
+				if (shadow_page->unsync && shadow_page->global)
+					kvm_sync_page(vcpu, shadow_page);
+			}
 			continue;
+		}
 
 		if (is_large_pte(*sptep)) {
 			rmap_remove(vcpu->kvm, sptep);


[PATCH -tip 2/6 V4.2] x86: add arch-dep register and stack access API to ptrace

2009-04-03 Thread Masami Hiramatsu
Add the following APIs for accessing registers and stack entries from pt_regs.
- query_register_offset(const char *name)
   Query the offset of the register called name.

- query_register_name(unsigned offset)
   Query the name of a register by its offset.

- get_register(struct pt_regs *regs, unsigned offset)
   Get the value of a register by its offset.

- valid_stack_address(struct pt_regs *regs, unsigned long addr)
   Check whether the address is in the stack.

- get_stack_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth entry of the stack. (N >= 0)

- get_argument_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth argument at function call. (N >= 0)

changes from v4.1:
 - make regoffset_table constant.
 - remove needless local variable initialization in query_register_*.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Roland McGrath rol...@redhat.com
---

 arch/x86/include/asm/ptrace.h |   66 +
 arch/x86/kernel/ptrace.c  |   60 +
 2 files changed, 126 insertions(+), 0 deletions(-)


diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index aed0894..51e5844 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@

 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif

 #ifndef __ASSEMBLY__
@@ -215,6 +216,71 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 	return regs->sp;
 }

+/* Query offset/name of register from its name/offset */
+extern int query_register_offset(const char *name);
+extern const char *query_register_name(unsigned offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/* Get register value from its offset */
+static inline unsigned long get_register(struct pt_regs *regs, unsigned offset)
+{
+	if (unlikely(offset > MAX_REG_OFFSET))
+		return 0;
+	return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/* Check the address in the stack */
+static inline int valid_stack_address(struct pt_regs *regs, unsigned long addr)
+{
+	return ((addr & ~(THREAD_SIZE - 1))  ==
+		(kernel_trap_sp(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/* Get Nth entry of the stack */
+static inline unsigned long get_stack_nth(struct pt_regs *regs, unsigned n)
+{
+	unsigned long *addr = (unsigned long *)kernel_trap_sp(regs);
+	addr += n;
+	if (valid_stack_address(regs, (unsigned long)addr))
+		return *addr;
+	else
+		return 0;
+}
+
+/* Get Nth argument at function call */
+static inline unsigned long get_argument_nth(struct pt_regs *regs, unsigned n)
+{
+#ifdef CONFIG_X86_32
+#define NR_REGPARMS 3
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0: return regs->ax;
+		case 1: return regs->dx;
+		case 2: return regs->cx;
+		}
+		return 0;
+#else /* CONFIG_X86_64 */
+#define NR_REGPARMS 6
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0: return regs->di;
+		case 1: return regs->si;
+		case 2: return regs->dx;
+		case 3: return regs->cx;
+		case 4: return regs->r8;
+		case 5: return regs->r9;
+		}
+		return 0;
+#endif
+	} else {
+		/*
+		 * The typical case: arg n is on the stack.
+		 * (Note: stack[0] = return address, so skip it)
+		 */
+		return get_stack_nth(regs, 1 + n - NR_REGPARMS);
+	}
+}
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index fe9345c..8a8af27 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -46,6 +46,66 @@ enum x86_regset {
REGSET_IOPERM32,
 };

+struct pt_regs_offset {
+   const char *name;
+   int offset;
+};
+
+#define REG_OFFSET(r) offsetof(struct pt_regs, r)
+#define REG_OFFSET_NAME(r) {.name = #r, .offset = REG_OFFSET(r)}
+#define REG_OFFSET_END {.name = NULL, .offset = 0}
+
+static const struct pt_regs_offset regoffset_table[] = {
+#ifdef CONFIG_X86_64
+   REG_OFFSET_NAME(r15),
+   REG_OFFSET_NAME(r14),
+   REG_OFFSET_NAME(r13),
+   REG_OFFSET_NAME(r12),
+   REG_OFFSET_NAME(r11),
+   REG_OFFSET_NAME(r10),
+   REG_OFFSET_NAME(r9),
+   REG_OFFSET_NAME(r8),
+#endif
+   REG_OFFSET_NAME(bx),
+   REG_OFFSET_NAME(cx),
+   REG_OFFSET_NAME(dx),
+   REG_OFFSET_NAME(si),
+   REG_OFFSET_NAME(di),
+   REG_OFFSET_NAME(bp),
+   REG_OFFSET_NAME(ax),
+#ifdef CONFIG_X86_32
+   REG_OFFSET_NAME(ds),
+   REG_OFFSET_NAME(es),
+   REG_OFFSET_NAME(fs),
+   REG_OFFSET_NAME(gs),
+#endif
+   
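
For readers following the header part of this patch, the lookup order in get_argument_nth() can be summarised with a small Python sketch. The register orders are copied from the CONFIG_X86_32 and CONFIG_X86_64 branches of the patch; argument_location() itself is a hypothetical helper that only reports where an argument would be read from and does not touch any real register state.

```python
# Register-passed arguments, in the order used by get_argument_nth().
REGPARMS_32 = ("ax", "dx", "cx")
REGPARMS_64 = ("di", "si", "dx", "cx", "r8", "r9")


def argument_location(n, is_64bit):
    """Return ("reg", name) or ("stack", index) for the Nth argument (N >= 0)."""
    regs = REGPARMS_64 if is_64bit else REGPARMS_32
    if n < len(regs):
        return ("reg", regs[n])
    # stack[0] holds the return address, so stack arguments start at index 1.
    return ("stack", 1 + n - len(regs))


print(argument_location(0, is_64bit=True))
print(argument_location(6, is_64bit=True))
print(argument_location(3, is_64bit=False))
```

The "1 +" in the stack branch mirrors the comment in the patch: the first stack slot at function entry is the return address, not an argument.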

[PATCH -tip 3/6 V4.1] x86: instruction decorder API

2009-04-03 Thread Masami Hiramatsu
Add x86 instruction decoder to arch-specific libraries. This decoder
can decode all x86 instructions into prefix, opcode, modrm, sib,
displacement and immediates. This can also show the length of
instructions.

changes from v4:
 - make bitmap tables static.

Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
---

 arch/x86/include/asm/insn.h |  130 +
 arch/x86/lib/Makefile   |1
 arch/x86/lib/insn.c |  627 +++
 3 files changed, 758 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/insn.c


diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
new file mode 100644
index 000..488001f
--- /dev/null
+++ b/arch/x86/include/asm/insn.h
@@ -0,0 +1,130 @@
+#ifndef _ASM_X86_INSN_H
+#define _ASM_X86_INSN_H
+/*
+ * x86 instruction analysis
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2009
+ */
+
+#include <linux/types.h>
+
+/* legacy instruction prefixes */
+#define X86_PFX_OPNDSZ	0x1	/* 0x66 */
+#define X86_PFX_ADDRSZ	0x2	/* 0x67 */
+#define X86_PFX_CS	0x4	/* 0x2E */
+#define X86_PFX_DS	0x8	/* 0x3E */
+#define X86_PFX_ES	0x10	/* 0x26 */
+#define X86_PFX_FS	0x20	/* 0x64 */
+#define X86_PFX_GS	0x40	/* 0x65 */
+#define X86_PFX_SS	0x80	/* 0x36 */
+#define X86_PFX_LOCK	0x100	/* 0xF0 */
+#define X86_PFX_REPE	0x200	/* 0xF3 */
+#define X86_PFX_REPNE	0x400	/* 0xF2 */
+/* REX prefix */
+#define X86_PFX_REX	0x800	/* 0x4X */
+/* REX prefix dissected */
+#define X86_PFX_REX_BASE 0x1000
+#define X86_PFX_REXB	0x1000	/* 0x41 bit */
+#define X86_PFX_REXX	0x2000	/* 0x42 bit */
+#define X86_PFX_REXR	0x4000	/* 0x44 bit */
+#define X86_PFX_REXW	0x8000	/* 0x48 bit */
+
+struct insn_field {
+	union {
+		s32 value;
+		u8 bytes[4];
+	};
+	bool got;	/* true if we've run insn_get_xxx() for this field */
+	u8 nbytes;
+};
+
+struct insn {
+	struct insn_field prefixes;	/* prefixes.value is a bitmap */
+	struct insn_field opcode;	/*
+					 * opcode.bytes[0]: opcode1
+					 * opcode.bytes[1]: opcode2
+					 * opcode.bytes[2]: opcode3
+					 */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+	union {
+		struct insn_field immediate;
+		struct insn_field moffset1;	/* for 64bit MOV */
+		struct insn_field immediate1;	/* for 64bit imm or off16/32 */
+	};
+	union {
+		struct insn_field moffset2;	/* for 64bit MOV */
+		struct insn_field immediate2;	/* for 64bit imm or seg16 */
+	};
+
+	u8 opnd_bytes;
+	u8 addr_bytes;
+	u8 length;
+	bool x86_64;
+
+	const u8 *kaddr;	/* kernel address of insn (copy) to analyze */
+	const u8 *next_byte;
+};
+
+#define OPCODE1(insn) ((insn)->opcode.bytes[0])
+#define OPCODE2(insn) ((insn)->opcode.bytes[1])
+#define OPCODE3(insn) ((insn)->opcode.bytes[2])
+
+#define MODRM_MOD(insn) (((insn)->modrm.value & 0xc0) >> 6)
+#define MODRM_REG(insn) (((insn)->modrm.value & 0x38) >> 3)
+#define MODRM_RM(insn) ((insn)->modrm.value & 0x07)
+
+#define SIB_SCALE(insn) (((insn)->sib.value & 0xc0) >> 6)
+#define SIB_INDEX(insn) (((insn)->sib.value & 0x38) >> 3)
+#define SIB_BASE(insn) ((insn)->sib.value & 0x07)
+
+#define MOFFSET64(insn)	(((u64)((insn)->moffset2.value) << 32) | \
+			  (u32)((insn)->moffset1.value))
+
+#define IMMEDIATE64(insn)	(((u64)((insn)->immediate2.value) << 32) | \
+			  (u32)((insn)->immediate1.value))
+
+extern void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64);
+extern void insn_get_prefixes(struct insn *insn);
+extern void insn_get_opcode(struct insn *insn);
+extern void insn_get_modrm(struct 
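
The MODRM_* and SIB_* accessors declared in this header split one byte into a 2-bit field and two 3-bit fields. A rough Python sketch of that split is shown here; split_modrm() is a hypothetical helper, and the example byte 0xD8 was chosen for illustration and is not taken from the patch.

```python
def split_modrm(byte):
    """Split a ModRM (or SIB) byte into its 2-bit, 3-bit and 3-bit fields.

    For ModRM the result is (mod, reg, rm); for SIB it is
    (scale, index, base). The masks and shifts are 0xc0 >> 6,
    0x38 >> 3 and 0x07.
    """
    return ((byte & 0xC0) >> 6, (byte & 0x38) >> 3, byte & 0x07)


# 0xD8 is 0b11_011_000: top field 3, middle field 3, low field 0.
print(split_modrm(0xD8))
```

Because the SIB byte uses the same 2/3/3 layout, one helper serves both in this sketch, whereas the header keeps separate macro names for readability.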

Re: Can't boot guest with more than 3585MB when using large pages

2009-04-03 Thread Marcelo Tosatti
On Tue, Mar 24, 2009 at 04:57:46PM -0500, Ryan Harper wrote:
 * Alex Williamson alex.william...@hp.com [2009-03-24 16:07]:
  
  On a 2.6.29, x86_64 host/guest, what's special about specifying a guest
  size of -m 3586 when using -mem-path backed by hugetlbfs?  3585 works,
  3586 hangs here:
  
  ...
  PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
  Placing 64MB software IO TLB between 88002000 - 88002400
  software IO TLB at phys 0x2000 - 0x2400
  Memory: 3504832k/4196352k available (2926k kernel code, 524740k absent, 
  166780k reserved, 1260k data, 496k init)
  
  I can back -mem-path by tmpfs or disk and it works fine.  Also works
  with no -mem-path, but it would obviously be nice to benefit from large
  pages on big guests.  The system has plenty of huge pages to back the
  request, and booting with -mem-prealloc makes no difference.  Tested on
  latest git as of today.  Thanks,
 
 I've seen this as well, haven't had a chance to dig into the issue yet
 either.  Certainly can test patches if anyone has an idea of what's
 wrong here.

Can you please try the following?

--

qemu: kvm: fixup 4GB+ memslot large page alignment

Need to align the 4GB+ memslot after we know its address, not before.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index d4a4320..cc84772 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -866,6 +866,7 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
 
     /* above 4giga memory allocation */
     if (above_4g_mem_size > 0) {
+        ram_addr = qemu_ram_alloc(above_4g_mem_size);
         if (hpagesize) {
             if (ram_addr & (hpagesize-1)) {
                 unsigned long aligned_addr;
@@ -874,7 +875,6 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
                 ram_addr = aligned_addr;
             }
         }
-        ram_addr = qemu_ram_alloc(above_4g_mem_size);
         cpu_register_physical_memory(0x100000000ULL,
                                      above_4g_mem_size,
                                      ram_addr);
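
The point of the reordering is that the round-up has to be applied to the address actually returned by the allocator. A minimal Python sketch of the round-up arithmetic follows; align_up() and place_slot() are hypothetical helpers, and the 2 MiB huge page size and the sample address are assumptions for illustration, not values from the report.

```python
HPAGE = 2 * 1024 * 1024  # assumed huge page size (2 MiB)


def align_up(addr, size):
    """Round addr up to the next multiple of size (size must be a power of two)."""
    return (addr + size - 1) & ~(size - 1)


def place_slot(alloc_addr, hpagesize):
    """Given the address the allocator returned, compute the aligned
    address to register and the padding that has to be skipped."""
    aligned = align_up(alloc_addr, hpagesize)
    return aligned, aligned - alloc_addr


aligned, padding = place_slot(0x12345000, HPAGE)
print(hex(aligned), hex(padding))  # 0x12400000 0xbb000
```

Aligning a value and then overwriting it with a fresh allocation, as the old ordering did, throws the alignment away.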


Re: [PATCH -tip 3/6 V4.1] x86: instruction decorder API

2009-04-03 Thread H. Peter Anvin

Masami Hiramatsu wrote:

Add x86 instruction decoder to arch-specific libraries. This decoder
can decode all x86 instructions into prefix, opcode, modrm, sib,
displacement and immediates. This can also show the length of
instructions.

changes from v4:
 - make bitmap tables static.


Hi Masami,

On the surface the overall structure looks fine, but I have a couple of 
concerns:


1. is this meant to be able to decode userspace code or just kernel 
code?  If it is supposed to be able to decode userspace code, is there a 
reason you're not dealing with 16-bit or V86 mode code at all?  If not, 
why are you including the 32-bit decoder in a 64-bit kernel (as well as 
instructions which we're pretty much guaranteed to never use in the 
kernel, such as ENTER.)


2. you're already not dealing with all existing three-byte opcode 
spaces, nor with DREX or VEX encodings for upcoming processors.  This 
doesn't matter so much for the kernel, but it does matter if this is 
supposed to be used for user-space code.


3. is there any need to deal with instruction set differences among 
processors?  (Again, this depends on the usage model.)


4. you have a bunch of magic opcode constants all over the place.  This 
means that as new instructions come in -- and they're going to be coming 
in -- this is going to be hard to update.  It would be cleaner if we 
could have an intermediate format that preprocesses down to all the 
relevant tables and perhaps even some of the code rather than 
open-coding everything in C.


This matters... for example you have:

+   } else if (opcode == 0xea /* jmp far seg:offs */) {
+   __get_immptr(insn);

... but nothing similar for opcode 0x9a.  This is extremely hard to spot 
with this kind of structure.


The more data-driven we can make it (without bloating the code too much) 
the better off we are, I believe.
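
As a rough illustration of the data-driven approach, the sketch below keeps per-opcode attributes in one table so that 0x9a and 0xea sit side by side and an omission is easy to spot. It is written in Python for brevity; the flag name, table layout and helper are hypothetical and are not part of the posted decoder.

```python
# Hypothetical attribute flag: the instruction carries a far pointer
# (segment:offset) as its immediate operand.
IMM_FAR_PTR = 0x1

# One table entry per opcode instead of open-coded comparisons.
OPCODE_ATTRS = {
    0x9A: IMM_FAR_PTR,  # call far seg:offs
    0xEA: IMM_FAR_PTR,  # jmp far seg:offs
}


def takes_far_pointer(opcode):
    """Look the opcode up in the table; unknown opcodes have no attributes."""
    return bool(OPCODE_ATTRS.get(opcode, 0) & IMM_FAR_PTR)


print(takes_far_pointer(0xEA), takes_far_pointer(0x9A), takes_far_pointer(0x90))
```

With a table like this, adding an instruction means adding a row, and the decoding code itself does not change.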


-hpa


Re: [PATCH -tip 3/6 V4.1] x86: instruction decorder API

2009-04-03 Thread Masami Hiramatsu
Hi Peter,

H. Peter Anvin wrote:
 Masami Hiramatsu wrote:
 Add x86 instruction decoder to arch-specific libraries. This decoder
 can decode all x86 instructions into prefix, opcode, modrm, sib,
 displacement and immediates. This can also show the length of
 instructions.

 changes from v4:
  - make bitmap tables static.
 
 Hi Masami,
 
 On the surface the overall structure looks fine, but I have a couple of 
 concerns:
 
 1. is this meant to be able to decode userspace code or just kernel 
 code?  If it is supposed to be able to decode userspace code, is there a 
 reason you're not dealing with 16-bit or V86 mode code at all?  If not, 
 why are you including the 32-bit decoder in a 64-bit kernel (as well as 
 instructions which we're pretty much guaranteed to never use in the 
 kernel, such as ENTER.)

Actually, this aims to decode both user-space and kernel code.
At this point, it only needs to cover kernel code, because kprobes
only needs to decode the kernel binary.
However, this is just a starting point; uprobe developers want to
use it to decode user-space code. In that case, it will need to be
enhanced.


 2. you're already not dealing with all existing three-byte opcode 
 spaces, nor with DREX or VEX encodings for upcoming processors.  This 
 doesn't matter so much for the kernel, but it does matter if this is 
 supposed to be used for user-space code.
 
 3. is there any need to deal with instruction set differences among 
 processors?  (Again, this depends on the usage model.)

Agreed. When it supports decoding user-space code, it should
support all kinds of instructions.

 
 4. you have a bunch of magic opcode constants all over the place.  This 
 means that as new instructions come in -- and they're going to be coming 
 in -- this is going to be hard to update.  It would be cleaner if we 
 could have an intermediate format that preprocesses down to all the 
 relevant tables and perhaps even some of the code rather than 
 open-coding everything in C.
 
 This matters... for example you have:
 
 + } else if (opcode == 0xea /* jmp far seg:offs */) {
 + __get_immptr(insn);
 
 ... but nothing similar for opcode 0x9a.  This is extremely hard to spot 
 with this kind of structure.

Oops, that is a bug. Hmm, I think we'd better use bit-flag tables
for classifying opcodes.
Jim, can your INAT idea help in this situation?

http://sources.redhat.com/ml/systemtap/2009-q2/msg00109.html

 
 The more data-driven we can make it (without bloating the code too much) 
 the better off we are, I believe.
 
   -hpa

Thank you for the good advice!

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



[ kvm-Bugs-2729621 ] usb_add on garmin gps fails

2009-04-03 Thread SourceForge.net
Bugs item #2729621, was opened at 2009-04-03 21:02
Message generated for change (Tracker Item Submitted) made by byron_clark
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2729621&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Byron Clark (byron_clark)
Assigned to: Nobody/Anonymous (nobody)
Summary: usb_add on garmin gps fails

Initial Comment:
When I attempt to usb_add my Garmin Venture HC GPS with this command:
usb_add host:091e:0003

I get the following error:
usb_linux_update_endp_table: Broken pipe

strace log:
write(1, "husb: open device 5.3\n", 22) = 22
open("/proc/bus/usb/005/003", O_RDWR|O_NONBLOCK) = 26
read(26, "\22\1\20\1\377\377\...@\36\t\3\0\1\0\0\0\0\1\t\2'\0\1\1\0\300\0\t\4\0\0\3"..., 1024) = 57
write(1, "husb: config #1 need -1\n", 24) = 24
ioctl(26, USBDEVFS_IOCTL, 0x7fff0bb74300) = -1 ENODATA (No data available)
ioctl(26, USBDEVFS_CLAIMINTERFACE, 0x7fff0bb7431c) = 0
write(1, "husb: 1 interfaces claimed for c"..., 47) = 47
ioctl(26, USBDEVFS_CONNECTINFO, 0x7fff0bb74750) = 0
write(1, "husb: grabbed usb device 5.3\n", 29) = 29
ioctl(26, USBDEVFS_CONTROL, 0x7fff0bb742f0) = -1 EPIPE (Broken pipe)
dup(2)                                  = 27
fcntl(27, F_GETFL)                      = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
fstat(27, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 9), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e03b67000
lseek(27, 0, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
write(27, "usb_linux_update_endp_table: Bro"..., 41) = 41
close(27)                               = 0

cpu: 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 Duo CPU T7500  @ 2.20GHz
stepping: 11
cpu MHz : 2194.427
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est 
tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi flexpriority
bogomips: 4388.85
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

host distribution: debian sid
bitness: 64

guest distribution: windows xp
bitness: 32



[ kvm-Bugs-2729621 ] usb_add on garmin gps fails

2009-04-03 Thread SourceForge.net
Bugs item #2729621, was opened at 2009-04-03 21:02
Message generated for change (Comment added) made by byron_clark
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2729621&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Byron Clark (byron_clark)
Assigned to: Nobody/Anonymous (nobody)
Summary: usb_add on garmin gps fails

--

Comment By: Byron Clark (byron_clark)
Date: 2009-04-03 21:08

Message:
kvm-84, kernel 2.6.29.



Re: NetBSD and device trees

2009-04-03 Thread 刘宇
On Fri, Apr 3, 2009 at 3:32 AM, Hollis Blanchard holl...@us.ibm.com wrote:
 (I'll address the MMU issue in a separate mail.)

 On Thu, 2009-04-02 at 11:56 -0700, Rahul Kulkarni wrote:
 Another potential issue could be the initial environment (described
 earlier as option 2) not being what BSD expects. Do you use u-boot?
 You
 can see the initial environment set up in kvm_arch_vcpu_setup() in KVM
 and mpc8544ds_init() in Qemu.

 Rahul Yes..I will look into those functions..We do use uboot..Are
 you hinting to go with option 1?

 If you use u-boot then you might not have much work to do (option 2 will
 probably work for you with few changes).

 Does NetBSD use flattened device trees at all? KVM (Qemu) supplies a
 stripped-down device tree to the guest so that the guest won't try to
 access IO devices not currently emulated by qemu. If BSD has a
 hardcoded device configuration system (e.g. "we built for 8544,
 therefore we always have the following SoC devices") that will be an
 issue.

 Rahul: The device config is hardcoded in our NetBSD code base (more so
 because of its embedded nature, where that is the preferred way), but
 since I see NetBSD supported on Qemu, I would think there is support
 available for a flattened device tree to be passed in from qemu. I'll
 look at the x86 implementations.

 Really quick history: Traditionally, desktop/server PowerPC had Open
 Firmware (IEEE1275). Open Firmware provides runtime services (sometimes
 including IP stack, disk drivers, filesystems, etc), and those services
 allow the kernel to retrieve a device tree describing the physical
 topology of the system. The runtime services (callbacks) are relatively
 high overhead for embedded systems, so traditionally embedded PowerPC
 systems used something simpler (ppcboot/u-boot, redboot, CFE, homebrew,
 etc). These systems usually hardcoded the expected set of IO devices at
 build time.

 However, in recent years Linux developers have found that the
 flexibility granted by the device tree is invaluable, even without the
 runtime services. So they developed a flat device tree data structure
 (flat because it's a contiguous in-memory format representing a tree),
 and had firmware (especially u-boot) pass that tree to the kernel as a
 binary blob.
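
To make the "contiguous in-memory format representing a tree" idea concrete, here is a minimal Python sketch that serializes a nested dict into a structure block and a strings block. The token values are those of the flattened device tree format; everything else (the function names, the dict-based input, the sample tree) is hypothetical, and the header and memory reservation map of a real blob are left out.

```python
import struct

# Structure-block token values from the flattened device tree format.
FDT_BEGIN_NODE, FDT_END_NODE, FDT_PROP, FDT_END = 0x1, 0x2, 0x3, 0x9


def pad4(data):
    """Zero-pad a byte string to a 4-byte boundary."""
    return data + b"\x00" * (-len(data) % 4)


def flatten_node(name, node, strings, offsets):
    """Serialize one node depth-first as big-endian tokens.

    `node` is a dict: bytes values are properties, dict values are children.
    Property names go into the shared `strings` bytearray, stored once each.
    """
    out = struct.pack(">I", FDT_BEGIN_NODE) + pad4(name.encode() + b"\x00")
    # Properties come first, then child nodes.
    for key, value in node.items():
        if isinstance(value, dict):
            continue
        if key not in offsets:
            offsets[key] = len(strings)
            strings.extend(key.encode() + b"\x00")
        out += struct.pack(">III", FDT_PROP, len(value), offsets[key])
        out += pad4(value)
    for key, value in node.items():
        if isinstance(value, dict):
            out += flatten_node(key, value, strings, offsets)
    return out + struct.pack(">I", FDT_END_NODE)


def build_blob(root):
    """Return (structure_block, strings_block) for a nested-dict tree."""
    strings, offsets = bytearray(), {}
    struct_block = flatten_node("", root, strings, offsets)
    return struct_block + struct.pack(">I", FDT_END), bytes(strings)


# Hypothetical two-node tree: a root with one property and one empty child.
tree = {"model": b"demo\x00", "cpus": {}}
struct_block, strings_block = build_blob(tree)
```

The result is a flat run of bytes that firmware can hand to a kernel as-is; the kernel rebuilds the hierarchy by walking the begin/end tokens.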

 The takeaway here is that the flat device tree is so far mostly a
 PowerPC Linux specific concept. Although the idea is beginning to catch
 on with other architectures and kernels, I expect that NetBSD doesn't
 know anything about it, and x86 Linux doesn't either.

Hmm, I learnt a lot. Thanks.
It seems qemu is going to adopt the flat device tree. :)


 So, since PowerPC NetBSD has build-time tables describing the hardware
 it will try to use, I see the following options:
 1) Teach NetBSD about flat device trees. Probably a lot of work.
 2) Emulate more 85xx hardware in qemu. Maybe an easy to medium amount of
 work, depending on the complexity and number of the IO devices.
 3) Build a special NetBSD kernel with modified tables appropriate for
 qemu. Probably the easiest/quickest way, but if your long-term goal is
 to run unmodified NetBSD kernels built for real hardware, this is only a
 prototyping step.

 If you have more than one person playing with this, #2 could be done in
 conjunction with #3, until you've emulated all the necessary devices.

 Also, if you do #2, you could actually use qemu (without KVM) as a
 development environment on normal x86 Linux or Windows workstations (I
 think "virtual prototyping" or "virtual platforms" is the buzzword
 these days). This might be a benefit for your internal software
 development processes.

btw, why did you give up virtio-net and change to e1000 on guest 440?


 If there is interest (or maybe even existing work) in the NetBSD
 community for flat device tree support, you may be able to team up with
 other developers to tackle problem #1. To find out, I would post to
 devicetree-disc...@ozlabs.org asking if they've heard of NetBSD work,
 and also NetBSD/PowerPC mailing lists to see if they've heard of device
 tree work.


It will be better to cc here...
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: MMU tricks for NetBSD guests

2009-04-03 Thread Hollis Blanchard
On Fri, 2009-04-03 at 00:52 +0200, Alexander Graf wrote:
 
 That sounds a lot like what I implemented for real mode on 970. I
 assume the PID is similar to a full SLB context and AS=1/AS=0 is just
 another bit that could as well be in the PID?

Mostly... however, when an interrupt occurs, AS is set to 0 and PID
remains unchanged. Also, AS can have different settings for instruction
and data fetches. (I've been abbreviating as MSR[AS], but technically
I should be writing MSR[IS] for instructions or MSR[DS] for data).

 So what we do on 970[1] is we treat real mode as yet another vsid.
 970 translates EA -> VA -> RA. It looks like booke does the same, with
 the VSID coming from the PID.

Exactly -- Book E uses AS | PID to provide the VSID, while Book S uses
the SLB. The Book E way is much simpler, and also avoids the effective
address collision problem we ran into on 970, because AS/PID don't
depend on the EA.
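
A rough Python sketch of the scheme described above, for illustration only: the effective page number is tagged with AS and PID to form the virtual address, and that tag is what the TLB matches on. The class and function names, the dict-based TLB, and the fixed 4 KiB page size are all assumptions; real Book E TLBs support several page sizes and have details (such as entries shared across PIDs) that are not modeled here.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: 4 KiB pages only


@dataclass
class Cpu:
    pid: int = 0     # process ID register
    msr_is: int = 0  # address space for instruction fetches (MSR[IS])
    msr_ds: int = 0  # address space for data accesses (MSR[DS])


def virtual_key(cpu, ea, is_fetch):
    """EA -> VA: tag the effective page number with AS and PID."""
    as_bit = cpu.msr_is if is_fetch else cpu.msr_ds
    return (as_bit, cpu.pid, ea >> PAGE_SHIFT)


def translate(tlb, cpu, ea, is_fetch=False):
    """VA -> RA: look up the tagged page; None models a TLB miss."""
    rpn = tlb.get(virtual_key(cpu, ea, is_fetch))
    if rpn is None:
        return None
    return (rpn << PAGE_SHIFT) | (ea & ((1 << PAGE_SHIFT) - 1))


def take_interrupt(cpu):
    """On an interrupt, AS is set to 0 for fetches and data; PID is kept."""
    cpu.msr_is = 0
    cpu.msr_ds = 0


# Hypothetical mappings: the same effective page in both address spaces.
tlb = {(1, 5, 0x10): 0x200, (0, 5, 0x10): 0x300}
cpu = Cpu(pid=5, msr_is=1, msr_ds=1)
before = translate(tlb, cpu, 0x10ABC)  # uses the AS=1 mapping
take_interrupt(cpu)
after = translate(tlb, cpu, 0x10ABC)   # same EA, now the AS=0 mapping
```

Because the tag comes from AS and PID rather than from bits of the EA, the same effective address can be mapped independently in each address space without colliding.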

-- 
Hollis Blanchard
IBM Linux Technology Center
