Re: [PATCH 1/4] kvm: Add support for querying supported cpu features

2009-05-09 Thread Avi Kivity

Anthony Liguori wrote:

Anthony Liguori wrote:


kvm_check_extension doesn't exist in upstream QEMU.  It's a good idea 
though so I added it in a previous commit.  However, I changed the 
signature to it to take a KVMState * as the first argument (which is 
available in env->kvm_state).  I updated this patch to pass the extra 
's' parameter.


Ah, you had a patch, I just didn't notice in my queue.



Yeah, I should have made the dependency explicit.


Still wanted to make the KVMState change though.



I think the KVM_REQUIRE_EXTENSION bit didn't like it.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/3] generic hypercall support

2009-05-09 Thread Avi Kivity

David S. Ahern wrote:

I ran another test case with SMT disabled, and while I was at it
converted TSC delta to operations/sec. The results without SMT are
confusing -- to me anyways. I'm hoping someone can explain it.
Basically, using a count of 10,000,000 (per your web page) with SMT
disabled the guest detected a soft lockup on the CPU. So, I dropped the
count down to 1,000,000. So, for 1e6 iterations:

without SMT, with EPT:
HC:   259,455 ops/sec
PIO:  226,937 ops/sec
MMIO: 113,180 ops/sec

without SMT, without EPT:
HC:   274,825 ops/sec
PIO:  247,910 ops/sec
MMIO: 111,535 ops/sec

Converting the prior TSC deltas:

with SMT, with EPT:
HC:994,655 ops/sec
PIO:   875,116 ops/sec
MMIO:  439,738 ops/sec

with SMT, without EPT:
HC:994,304 ops/sec
PIO:   903,057 ops/sec
MMIO:  423,244 ops/sec

Running the tests repeatedly I did notice a fair variability (as much as
-10% down from these numbers).

Also, just to make sure I converted the delta to ops/sec, the formula I
used was cpu_freq / dTSC * count = operations/sec

  


The only thing I can think of is cpu frequency scaling lying about the 
cpu frequency.  Really the test needs to use time and not the time stamp 
counter.


Are the results expressed in cycles/op more reasonable?
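
For reference, a minimal sketch of the two conversions under discussion. The function names and the sample numbers are hypothetical (they are not values from David's runs); only the formula cpu_freq / dTSC * count is taken from the thread.

```python
def ops_per_sec(cpu_freq_hz, dtsc, count):
    """ops/sec = cpu_freq / dTSC * count (the formula quoted above)."""
    return cpu_freq_hz / dtsc * count


def cycles_per_op(dtsc, count):
    """TSC cycles per operation; the reported CPU frequency is not used."""
    return dtsc / count


# Hypothetical run: 1e6 iterations taking 4e9 TSC ticks on a nominal 2 GHz clock.
print(ops_per_sec(2.0e9, 4.0e9, 1_000_000))   # 500000.0
print(cycles_per_op(4.0e9, 1_000_000))        # 4000.0
```

If the reported cpu_freq is wrong, only the first figure is skewed by it; the cycles/op figure leaves that term out entirely.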

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: DMA errors in guest caused by corrupted(?) disk image

2009-05-09 Thread René Pfeiffer
On Mar 31, 2009 at 1355 +1100, Matthew Palmer appeared and said:
> Hi,
> 
> I've just come across a somewhat strange problem that was suggested I
> report to the list.
> 
> The problem manifested itself as DMA errors and the like popping up in the
> guest, like I'd expect to see if a disk in a physical machine was dying,
> like this:
> 
>   hda: dma_timer_expiry: dma status == 0x21
> [...]

I've seen a similar effect on my KVM host server. There were I/O errors
on one partition. When I tried to reformat the partition and restore its
content from backups, mkfs was reporting I/O errors, always at the same
block number. I added a second qcow2 file and moved the partition there.

> The VM has previously been quite stable until this problem started part of
> the way through today.  Another guest on the same host machine is fine.

The KVM host has been running very stably since January. The system is a Debian
Lenny with custom kernel 2.6.28.8 and kvm-84. The filesystem for images
is ext4 and the guests use ext4 in part, too.

I cannot provide the images since they are quite large (about 20 to 44
GB) and they contain sensitive data.

I plan to move to 2.6.29.x or 2.6.30.x for reasons of the ext4 patches
regarding delayed allocation and to upgrade kvm, but I can do tests with
the current system if you wish.

Best,
René.

-- 
  )\._.,--,'``.  fL  Let GNU/Linux work for you while you take a nap.
 /,   _.. \   _\  (`._ ,. R. Pfeiffer  + http://web.luchs.at/
`._.-(,_..'--(,_..'`-.;.'  - System administration + Consulting + Teaching -
Got mail delivery problems?  http://web.luchs.at/information/blockedmail.php




Re: [RFC][KVM-AUTOTEST] Work to get autotest kvm merged back upstream

2009-05-09 Thread Alexey Eremenko
Hi Lucas Meneghel Rodrigues !

I'm Alexey from Red Hat Israel, and I do QA on both KVM and KVM Autotest.

>  4 We want to make it easy for people to add new tests to the framework.
> We are unsure if the current way we define the tests (all of them under
> the file kvm_tests.py) is good maintainability-wise. Unfortunately I am
> new to KVM and I still don't have a clear idea if we could use a
> different approach.
>
Someone sent us a good idea: we should make some of our tests
dynamically loadable.

The layout would look like:
client/tests/kvm_runtest_2/tests

Having two levels of "tests" is useful: the first "tests" directory
holds tests that depend on the generic autotest framework, and the
second holds tests that depend on the KVM Autotest framework.
Basically, it was proposed that every module in the second "tests"
directory be named like "test_testname.py", and that kvm_runtest2.py
load those modules dynamically according to filename
and automatically run the function "run_testname()".

This makes it very easy for external contributors to contribute,
without breaking someone else's code.
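
A minimal sketch of how such loading could work, assuming only the naming convention described above (a file "test_<testname>.py" containing a function "run_<testname>()"). The helper name load_and_run and its arguments are hypothetical; this is not the actual kvm_runtest2.py code.

```python
import importlib.util
import os


def load_and_run(tests_dir, testname, *args, **kwargs):
    """Load tests_dir/test_<testname>.py and call its run_<testname>()."""
    path = os.path.join(tests_dir, "test_%s.py" % testname)
    # Build a module object directly from the file path, so the tests
    # directory does not have to be on sys.path.
    spec = importlib.util.spec_from_file_location("test_" + testname, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Look up the entry point by convention and run it.
    run = getattr(module, "run_" + testname)
    return run(*args, **kwargs)
```

With this kind of scheme, adding a test means dropping one new file into the directory; no existing file has to be edited.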

> For 3 and 4, we will still need more time to work on this. However, they
> are not crucial to get all the stuff merged back with upstream.
>
>  5 Though there are plans of moving from subversion to git, we still
> don't know when that will happen. So it might take a little time.
>
> I think 5 doesn't hold us back from merging back with upstream as soon
> as possible. I talked with some folks at google and they agree that we
> should merge soon.
>

Git is not a problem, because KVM Autotest is internally developed
using Git at Red Hat.

> So the current plan is:
>
>  * Finish the patches to make the code base comply with upstream
> standards, give it good testing.
>  * Finish replacing kvm_log with the logging module and test everything
>  * Send a patch upstream adding the kvm test module
>  * From there, development would happen using cross posting between the
> autotest mailing list and the KVM one.
>
> I hope this all makes sense. Comments?
>
> Best regards,
>
> --
> Lucas Meneghel Rodrigues
> Software Engineer (Virtualization)
> Red Hat - Emerging Technologies
>



-- 
-Alexey Eromenko "Technologov"


Re: KVM on Via Nano (Isaiah) CPUs?

2009-05-09 Thread Avi Kivity

Craig Metz wrote:

In message <49d396ab.6090...@redhat.com>, you write:
  
Via engineers have contacted me and confirmed that this is a problem in 
the processor.



  Is there a known-fixed CPU revision?

  Is there a way to identify working vs. non-working chips, either from IC
stamp or from /proc/cpuinfo? (Bonus: is it possible to put a check and an error
into the kvm-intel kernel module?)
  


I have no idea.  Please contact Via for this information.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: DMA errors in guest caused by corrupted(?) disk image

2009-05-09 Thread Avi Kivity

René Pfeiffer wrote:

On Mar 31, 2009 at 1355 +1100, Matthew Palmer appeared and said:
  

Hi,

I've just come across a somewhat strange problem that was suggested I
report to the list.

The problem manifested itself as DMA errors and the like popping up in the
guest, like I'd expect to see if a disk in a physical machine was dying,
like this:

  hda: dma_timer_expiry: dma status == 0x21
[...]



I've seen a similar effect on my KVM host server. There were I/O errors
on one partition. When I tried to reformat the partition and restore its
content from backups, mkfs was reporting I/O errors, always at the same
block number. I added a second qcow2 file and moved the partition there.

  


It's worth trying qemu-img convert to move the image to an LVM volume.
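
A minimal sketch of the command this refers to, built as an argument list. The image path and volume name are hypothetical placeholders, and the source format is assumed to be qcow2 as in René's report; the logical volume has to exist already and be at least as large as the image's virtual size.

```python
def qemu_img_convert_cmd(src_image, dst_volume, src_fmt="qcow2"):
    """Return the argv for converting an image file to raw data on a block device."""
    return ["qemu-img", "convert", "-f", src_fmt, "-O", "raw",
            src_image, dst_volume]


# Hypothetical paths, for illustration only.
cmd = qemu_img_convert_cmd("/var/lib/kvm/guest.qcow2", "/dev/vg0/guest")
print(" ".join(cmd))
```

The list can be passed to subprocess.check_call() with the guest shut down.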


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH 0/3] generic hypercall support

2009-05-09 Thread Gregory Haskins
Avi Kivity wrote:
> David S. Ahern wrote:
>> I ran another test case with SMT disabled, and while I was at it
>> converted TSC delta to operations/sec. The results without SMT are
>> confusing -- to me anyways. I'm hoping someone can explain it.
>> Basically, using a count of 10,000,000 (per your web page) with SMT
>> disabled the guest detected a soft lockup on the CPU. So, I dropped the
>> count down to 1,000,000. So, for 1e6 iterations:
>>
>> without SMT, with EPT:
>> HC:   259,455 ops/sec
>> PIO:  226,937 ops/sec
>> MMIO: 113,180 ops/sec
>>
>> without SMT, without EPT:
>> HC:   274,825 ops/sec
>> PIO:  247,910 ops/sec
>> MMIO: 111,535 ops/sec
>>
>> Converting the prior TSC deltas:
>>
>> with SMT, with EPT:
>> HC:994,655 ops/sec
>> PIO:   875,116 ops/sec
>> MMIO:  439,738 ops/sec
>>
>> with SMT, without EPT:
>> HC:994,304 ops/sec
>> PIO:   903,057 ops/sec
>> MMIO:  423,244 ops/sec
>>
>> Running the tests repeatedly I did notice a fair variability (as much as
>> -10% down from these numbers).
>>
>> Also, just to make sure I converted the delta to ops/sec, the formula I
>> used was cpu_freq / dTSC * count = operations/sec
>>
>>   
>
> The only thing I can think of is cpu frequency scaling lying about the
> cpu frequency.  Really the test needs to use time and not the time
> stamp counter.
>
> Are the results expressed in cycles/op more reasonable?

FWIW: I always used kvm_stat instead of my tsc printk






Re: [RFC PATCH 0/3] generic hypercall support

2009-05-09 Thread Gregory Haskins
Anthony Liguori wrote:
> Avi Kivity wrote:
>>
>> Hmm, reminds me of something I thought of a while back.
>>
>> We could implement an 'mmio hypercall' that does mmio reads/writes
>> via a hypercall instead of an mmio operation.  That will speed up
>> mmio for emulated devices (say, e1000).  It's easy to hook into Linux
>> (readl/writel), is pci-friendly, non-x86 friendly, etc.
>
> By the time you get down to userspace for an emulated device, that 2us
> difference between mmio and hypercalls is simply not going to make a
> difference.

I don't care about this path for emulated devices.  I am interested in
in-kernel vbus devices.

>   I'm surprised so much effort is going into this, is there any
> indication that this is even close to a bottleneck in any circumstance?

Yes.  Each 1us of overhead is a 4% regression in something as trivial as
a 25us UDP/ICMP rtt "ping".
>
>
> We have much, much lower hanging fruit to attack.  The basic fact that
> we still copy data multiple times in the networking drivers is clearly
> more significant than a few hundred nanoseconds that should occur less
> than once per packet.
For request-response, this is generally for *every* packet since you
cannot exploit buffering/deferring.

Can you back up your claim that PPC has no difference in performance
with an MMIO exit and a "hypercall" (yes, I understand PPC has no "VT"
like instructions, but clearly there are ways to cause a trap, so
presumably we can measure the difference between a PF exit and something
more explicit).

We need numbers before we can really decide to abandon this
optimization.  If PPC mmio has no penalty over hypercall, I am not sure
the 350ns on x86 is worth this effort (especially if I can shrink this
with some RCU fixes).  Otherwise, the margin is quite a bit larger.
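
The percentages above follow from simple arithmetic. A minimal sketch, using the 25us round trip, the 1us overhead, and the 350ns figure from this message (the function name is hypothetical):

```python
def overhead_percent(overhead_ns, baseline_ns):
    """Added latency as a percentage of the baseline round-trip time."""
    return 100.0 * overhead_ns / baseline_ns


print(overhead_percent(1000, 25000))  # 4.0  (1us on a 25us rtt)
print(overhead_percent(350, 25000))   # 1.4  (350ns on a 25us rtt)
```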

-Greg







Re: virtio net regression

2009-05-09 Thread Antoine Martin
Hi,

Here is another one, any ideas?
These oopses do look quite deep. Is it normal to end up in tcp_send_ack
from pdflush??

Cheers
Antoine

[929492.154634] pdflush: page allocation failure. order:0, mode:0x20
[929492.154637] Pid: 291, comm: pdflush Not tainted 2.6.29.2 #5
[929492.154639] Call Trace:
[929492.154641]  [] __alloc_pages_internal+0x3e1/0x401
[929492.154649]  [] try_fill_recv+0xa1/0x182
[929492.154652]  [] virtnet_poll+0x533/0x5ab
[929492.154655]  [] net_rx_action+0x70/0x143
[929492.154658]  [] __do_softirq+0x83/0x123
[929492.154661]  [] call_softirq+0x1c/0x28
[929492.154664]  [] do_softirq+0x3c/0x85
[929492.154666]  [] irq_exit+0x3f/0x7a
[929492.154668]  [] do_IRQ+0x12b/0x14f
[929492.154670]  [] ret_from_intr+0x0/0x29
[929492.154672]  [] __set_page_dirty_buffers+0x0/0x8f
[929492.154677]  [] bget_one+0x0/0xb
[929492.154680]  [] walk_page_buffers+0x2/0x8b
[929492.154682]  [] ext3_ordered_writepage+0xae/0x134
[929492.154685]  [] __writepage+0xa/0x25
[929492.154687]  [] write_cache_pages+0x206/0x322
[929492.154689]  [] __writepage+0x0/0x25
[929492.154691]  [] do_writepages+0x27/0x2d
[929492.154694]  [] __writeback_single_inode+0x1a7/0x3b5
[929492.154696]  [] __switch_to+0xb4/0x38c
[929492.154698]  [] generic_sync_sb_inodes+0x2a7/0x458
[929492.154701]  [] writeback_inodes+0x8d/0xe6
[929492.154704]  [] _spin_lock+0x5/0x7
[929492.155056]  [] wb_kupdate+0x9f/0x116
[929492.155058]  [] pdflush+0x14b/0x202
[929492.155061]  [] wb_kupdate+0x0/0x116
[929492.155063]  [] pdflush+0x0/0x202
[929492.155065]  [] pdflush+0x0/0x202
[929492.155068]  [] kthread+0x47/0x73
[929492.155070]  [] child_rip+0xa/0x20
[929492.155072]  [] kthread+0x0/0x73
[929492.183142]  [] child_rip+0x0/0x20
[929492.183145] Mem-Info:
[929492.183147] DMA per-cpu:
[929492.183149] CPU0: hi:0, btch:   1 usd:   0
[929492.183151] DMA32 per-cpu:
[929492.183154] CPU0: hi:  186, btch:  31 usd: 184
[929492.183158] Active_anon:2755 active_file:39849 inactive_anon:2972
[929492.183159]  inactive_file:70353 unevictable:0 dirty:4172
writeback:1580 unstable:0
[929492.183161]  free:734 slab:5619 mapped:15047 pagetables:927 bounce:0
[929492.183166] DMA free:1968kB min:28kB low:32kB high:40kB
active_anon:0kB inactive_anon:40kB active_file:2116kB
inactive_file:1880kB unevictable:0kB present:5448kB pages_scanned:0
all_unreclaimable? no
[929492.183169] lowmem_reserve[]: 0 489 489 489
[929492.183176] DMA32 free:968kB min:2812kB low:3512kB high:4216kB
active_anon:11020kB inactive_anon:11848kB active_file:157280kB
inactive_file:279532kB unevictable:0kB present:500896kB pages_scanned:0
all_unreclaimable? no
[929492.183180] lowmem_reserve[]: 0 0 0 0
[929492.183183] DMA: 6*4kB 2*8kB 3*16kB 1*32kB 1*64kB 2*128kB 0*256kB
1*512kB 1*1024kB 0*2048kB 0*4096kB = 1976kB
[929492.183235] DMA32: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 3*128kB 2*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB = 968kB
[929492.183244] 110992 total pagecache pages
[929492.183246] 739 pages in swap cache
[929492.183248] Swap cache stats: add 8996, delete 8257, find 92604/93191
[929492.183250] Free swap  = 1040016kB
[929492.183252] Total swap = 1048568kB
[929492.186003] 131056 pages RAM
[929492.186006] 4799 pages reserved
[929492.186007] 44697 pages shared
[929492.186008] 90516 pages non-shared
[930274.380075] eth0: no IPv6 routers present
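
As a sanity check on the Mem-Info dump above, a small sketch that re-adds the per-order free blocks of the DMA32 line and compares the result with the min watermark printed in the same log. The helper names are hypothetical; the input strings and the 2812kB value are copied from the log, with the mail line wrap joined.

```python
import re


def buddy_total_kb(line):
    """Sum count*size over the 'N*SkB' terms of a free-list line."""
    return sum(int(n) * int(size)
               for n, size in re.findall(r"(\d+)\*(\d+)kB", line))


def reported_total_kb(line):
    """Return the total the kernel printed after '='."""
    return int(re.search(r"=\s*(\d+)kB", line).group(1))


dma32 = ("DMA32: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 3*128kB 2*256kB "
         "0*512kB 0*1024kB 0*2048kB 0*4096kB = 968kB")
dma32_min_kb = 2812  # "DMA32 free:968kB min:2812kB" in the log above

print(buddy_total_kb(dma32))                 # 968
print(buddy_total_kb(dma32) < dma32_min_kb)  # True
```

So at the time of the failure the DMA32 zone had less free memory than its min watermark, even though most of the guest's memory was in reclaimable page cache.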







Antoine Martin wrote:
> Hi
> 
> Still getting (some but less) network issues with a 2.6.28.9 host.
> 
> Found quite a few of these call traces in the 2.6.29.1 guests:
> Guest has 512MB of memory and was not all that busy (just network
> traffic), so I don't understand why it would fail to allocate a page...
> 
> 
> [701453.834571] kjournald: page allocation failure. order:0, mode:0x4020
> [701453.834574] Pid: 4806, comm: kjournald Not tainted 2.6.29.1 #4
> [701453.834576] Call Trace:
> [701453.834578]  [] __alloc_pages_internal+0x3e1/0x401
> [701453.834586]  [] __slab_alloc+0x17f/0x4ca
> [701453.834590]  [] tcp_send_ack+0x23/0x105
> [701453.834592]  [] tcp_send_ack+0x23/0x105
> [701453.834595]  [] __kmalloc_track_caller+0xac/0xe1
> [701453.834598]  [] __alloc_skb+0x61/0x11e
> [701453.834600]  [] tcp_send_ack+0x23/0x105
> [701453.834603]  [] tcp_rcv_established+0x6c7/0x9e6
> [701453.834605]  [] tcp_v4_do_rcv+0x19e/0x324
> [701453.834608]  [] tcp_v4_rcv+0x488/0x73b
> [701453.834611]  [] nf_hook_slow+0x62/0xc3
> [701453.834615]  [] ip_local_deliver_finish+0x0/0x1ee
> [701453.834617]  [] ip_local_deliver_finish+0x11c/0x1ee
> [701453.834620]  [] ip_rcv_finish+0x2cf/0x2e9
> [701453.834622]  [] ip_rcv+0x233/0x277
> [701453.834626]  [] virtnet_poll+0x4ca/0x5ab
> [701453.834628]  [] net_rx_action+0x70/0x143
> [701453.834631]  [] __do_softirq+0x83/0x145
> [701453.834634]  [] timer_interrupt+0x1a/0x21
> [701453.834637]  [] call_softirq+0x1c/0x28
> [701453.834639]  [] do_softirq+0x3c/0x85
> [701453.834641]  [] irq_exit+0x3f/0x7a
> [701453.834643]  [] do_IRQ+0x12b/0x14f
> [701453.834646]  [] ret_from_intr+0x0/0x29
> [701453.834647][] vp_notify+0x0/0x1c
> [701453.

Re: [PATCH -tip v5 4/7] tracing: add kprobe-based event tracer

2009-05-09 Thread Frédéric Weisbecker
Hi,

2009/5/9 Masami Hiramatsu :
> Add kprobes based event tracer on ftrace.
>
> This tracer is similar to the events tracer which is based on Tracepoint
> infrastructure. Instead of Tracepoint, this tracer is based on kprobes(kprobe
> and kretprobe). It probes anywhere where kprobes can probe(this means, all
> functions body except for __kprobes functions).
>
> Changes from v4:
>  - Change interface name from 'kprobe_probes' to 'kprobe_events'
>  - Skip comments (words after '#') from inputs of 'kprobe_events'.
>
> Signed-off-by: Masami Hiramatsu 
> Cc: Steven Rostedt 
> Cc: Ananth N Mavinakayanahalli 
> Cc: Ingo Molnar 
> Cc: Frederic Weisbecker 
> ---
>
>  Documentation/trace/ftrace.txt |   55 +
>  kernel/trace/Kconfig           |    9 +
>  kernel/trace/Makefile          |    1
>  kernel/trace/trace_kprobe.c    |  404 
> 
>  4 files changed, 469 insertions(+), 0 deletions(-)
>  create mode 100644 kernel/trace/trace_kprobe.c
>
> diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
> index fd9a3e6..2b8ead6 100644
> --- a/Documentation/trace/ftrace.txt
> +++ b/Documentation/trace/ftrace.txt
> @@ -1310,6 +1310,61 @@ dereference in a kernel module:
>  [...]
>
>
> +kprobe-based event tracer
> +---
> +
> +This tracer is similar to the events tracer which is based on Tracepoint
> +infrastructure. Instead of Tracepoint, this tracer is based on kprobes(kprobe
> +and kretprobe). It probes anywhere where kprobes can probe(this means, all
> +functions body except for __kprobes functions).
> +
> +Unlike the function tracer, this tracer can probe instructions inside of
> +kernel functions. It allows you to check which instruction has been executed.
> +
> +Unlike the Tracepoint based events tracer, this tracer can add new probe points
> +on the fly.
> +
> +Similar to the events tracer, this tracer doesn't need to be activated via
> +current_tracer, instead of that, just set probe points via
> +/debug/tracing/kprobe_events.
> +
> +Synopsis of kprobe_events:
> +  p SYMBOL[+offs|-offs]|MEMADDR        : set a probe
> +  r SYMBOL[+0]                 : set a return probe
> +
> +E.g.
> +  echo p sys_open > /debug/tracing/kprobe_events
> +
> + This sets a kprobe on the top of sys_open() function.
> +
> +  echo r sys_open >> /debug/tracing/kprobe_events
> +
> + This sets a kretprobe on the return point of sys_open() function.
> +
> +  echo > /debug/tracing/kprobe_events
> +
> + This clears all probe points. and you can see the traced information via
> +/debug/tracing/trace.
> +
> +  cat /debug/tracing/trace
> +# tracer: nop
> +#
> +#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
> +#              | |       |          |         |
> +           <...>-5117  [003]   416.481638: sys_open: @sys_open+0
> +           <...>-5117  [003]   416.481662: syscall_call: <-sys_open+0
> +           <...>-5117  [003]   416.481739: sys_open: @sys_open+0
> +           <...>-5117  [003]   416.481762: sysenter_do_call: <-sys_open+0
> +           <...>-5117  [003]   416.481818: sys_open: @sys_open+0
> +           <...>-5117  [003]   416.481842: sysenter_do_call: <-sys_open+0
> +           <...>-5117  [003]   416.481882: sys_open: @sys_open+0
> +           <...>-5117  [003]   416.481905: sysenter_do_call: <-sys_open+0
> +
> + @SYMBOL means that kernel hits a probe, and <-SYMBOL means kernel returns
> +from SYMBOL(e.g. "sysenter_do_call: <-sys_open+0" means kernel returns from
> +sys_open to sysenter_do_call).
> +
> +
>  function graph tracer
>  ---
>
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 7370253..914df9c 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -398,6 +398,15 @@ config BLK_DEV_IO_TRACE
>
>          If unsure, say N.
>
> +config KPROBE_TRACER
> +       depends on KPROBES
> +       depends on X86
> +       bool "Trace kprobes"
> +       select TRACING
> +       help
> +         This tracer probes everywhere where kprobes can probe it, and
> +         records various registers and memories specified by user.
> +
>  config DYNAMIC_FTRACE
>        bool "enable/disable ftrace tracepoints dynamically"
>        depends on FUNCTION_TRACER
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index 06b8585..166c859 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -51,5 +51,6 @@ obj-$(CONFIG_EVENT_TRACING) += trace_export.o
>  obj-$(CONFIG_FTRACE_SYSCALLS) += trace_syscalls.o
>  obj-$(CONFIG_EVENT_PROFILE) += trace_event_profile.o
>  obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
> +obj-$(CONFIG_KPROBE_TRACER) += trace_kprobe.o
>
>  libftrace-y := ftrace.o
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> new file mode 100644
> index 000..8112505
> --- /dev/null
> +++ b/kernel/trace/trace_kprobe.c
> @@ -0,0 +1,404 @@
> +/*
> + * kprobe based kernel tracer
> + *
> + * Created by Masami Hiramatsu 
> + *
> + * This program

Re: [PATCH -tip v5 4/7] tracing: add kprobe-based event tracer

2009-05-09 Thread Masami Hiramatsu
Frédéric Weisbecker wrote:
> Hi,
> 
> 2009/5/9 Masami Hiramatsu :
[...]
>> +
>> +/* event recording functions */
>> +static void kprobe_trace_record(unsigned long ip, struct trace_probe *tp,
>> +   struct pt_regs *regs)
>> +{
>> +   __trace_bprintk(ip, "%s%s%+ld\n",
>> +   probe_is_return(tp) ? "<-" : "@",
>> +   probe_symbol(tp), probe_offset(tp));
>> +}
> 
> 
> 
> What happens here if you have:
> 
> kprobe_trace_record() {
>   probe_symbol() {
>  probes_open() {
>   cleanup_all_probes() {
>  free_trace_probe();
>  return tp->symbol ? ; //crack!
>
> I wonder if you shouldn't use a per_cpu list of probes,
> spinlocked/irqsaved  accessed
> and also a kind of prevention against nmi.

Sure, cleanup_all_probes() invokes unregister_kprobe() via
unregister_trace_probe(), which waits for running probe handlers by
using synchronize_sched() (because kprobes disables preemption
around its handlers), before free_trace_probe().

So you don't need any locks there :-)
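
For readers following the quoted kprobe_trace_record(), a minimal Python sketch of what its "%s%s%+ld" format string produces. The Python function is a hypothetical stand-in written for illustration; it is not part of the patch.

```python
def format_probe_record(symbol, offset, is_return):
    """Mirror the "%s%s%+ld" format: prefix, symbol, signed offset."""
    prefix = "<-" if is_return else "@"
    return "%s%s%+d" % (prefix, symbol, offset)


print(format_probe_record("sys_open", 0, False))  # @sys_open+0
print(format_probe_record("sys_open", 0, True))   # <-sys_open+0
```

These are the same "@sys_open+0" and "<-sys_open+0" strings that appear in the example trace output in the patch's documentation.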

Thank you,


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



boot=on option fails on win 2k/xp double boot

2009-05-09 Thread Federico Fissore
I have a Windows (virtual) box with Windows XP originally installed and 
Windows 2k installed later.


If I run kvm from the command line for testing purposes, everything is 
fine. If I run it with libvirt on the host, it does not boot, complaining 
about an I/O error.


the problem arises since libvirt adds boot=on on the first disk

without that option, everything works fine

do you have any hint on this?

thanks




Re: [Qemu-devel] Re: Question about KVM and PC speaker

2009-05-09 Thread Sebastian Herbszt

Jan Kiszka wrote:

Sebastian Herbszt wrote:

Jan Kiszka wrote:

Sebastian Herbszt wrote:

Simon Bienlein wrote:

Is a support for BIOS worked on right now?


The vgabios (vgabios.c) has a "FIXME should beep". Volker, do you plan
to fix this?

Which frequency should be used for the beep? Which delay?


I would try 1 KHz and some hundred milliseconds.


I just looked at some vga bios and it uses about 896.45 Hz.
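
A small sketch relating the frequencies mentioned above to the divisor that would be programmed into the PC timer driving the speaker. It assumes the standard PIT input clock of 1.193182 MHz; the function names are hypothetical.

```python
PIT_INPUT_HZ = 1193182  # standard PC PIT (8253/8254) input clock, in Hz


def pit_divisor(freq_hz):
    """Nearest integer divisor for the requested tone frequency."""
    return round(PIT_INPUT_HZ / freq_hz)


def actual_freq(divisor):
    """Tone frequency actually produced by a given divisor."""
    return PIT_INPUT_HZ / divisor


print(pit_divisor(1000))     # 1193
print(pit_divisor(896.45))   # 1331
```

Under that assumption, the roughly 896.45 Hz tone corresponds to a divisor of 1331, and a 1 kHz beep to a divisor of 1193.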


Getting a delay using "inb(0x61) & 0x10" is still a no go on qemu,
right?


Looks like (should be far too inaccurate for longer delays). What about
0x40:0x6c, the BIOS' daily timer counter?


The bios i looked at used the refresh request port 0x61. This is
supported by bochs and
there is also a patch for qemu to replace the dummy [1]. The rombios
uses this to provide
INT 15h AH=86h functionality; this is likely broken with the dummy code
in qemu.


I see no problem with improving qemu's emulation accuracy this way a
bit. But I wouldn't build new delay implementations on top of it,
specifically if the code is in fact aware of running over a hypervisor.
Such micro-timings are far too inaccurate for longer delays in an
environment where you cannot be sure of running all the time during that
period.


Volker, would you mind resubmitting the patch?


Anyway, using "timer ticks since midnight" should be possible (INT 08h
handler is set up
before vga bios is called).
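
A rough sketch of how a delay in milliseconds would translate into timer ticks with this approach. It assumes the usual BIOS setup, where the tick counter advances at the PIT input clock divided by 65536 (about 18.2 Hz); the function name is hypothetical.

```python
import math

PIT_INPUT_HZ = 1193182           # standard PC PIT input clock, in Hz
TICK_HZ = PIT_INPUT_HZ / 65536   # about 18.2 timer ticks per second


def ticks_for_delay(ms):
    """Number of whole ticks to wait so that at least `ms` have elapsed."""
    return math.ceil(ms * TICK_HZ / 1000.0)


print(ticks_for_delay(300))    # 6
print(ticks_for_delay(1000))   # 19
```

One tick is about 55 ms, so this is only suitable for delays of "some hundred milliseconds", not for fine-grained timing.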


I have implemented the beep for vgabios-6b and it works on bochs but fails on 
qemu.
With "-soundhw pcspk" I should hear it through the host's PC speaker, right?

- Sebastian



Re: [PATCH 1/3] virtio: find_vqs/del_vqs virtio operations

2009-05-09 Thread Rusty Russell
On Fri, 8 May 2009 10:18:22 pm Michael S. Tsirkin wrote:
> On Fri, May 08, 2009 at 04:37:06PM +0930, Rusty Russell wrote:
> > On Thu, 7 May 2009 11:40:39 pm Michael S. Tsirkin wrote:
> > > This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
> > > and updates all drivers. This is needed for MSI support, because MSI
> > > needs to know the total number of vectors upfront.
> >
> > Hmm, I have a similar need for a dev to vq mapping (debugging stats). 
> > How's this as a common basis?
>
> This helps. Should I redo mine on top of this?

Yep, it should make yours smaller as well.

> >  void vring_del_virtqueue(struct virtqueue *vq)
> >  {
> > +   list_del(&vq->list);
> > kfree(to_vvq(vq));
> >  }
> >  EXPORT_SYMBOL_GPL(vring_del_virtqueue);
>
> I note lack of locking here. This is okay in practice as
> drivers don't really call find/del vq in parallel,
> but making this explicit with find_vqs will be best, yes?

Yes, and in fact a rough look at your patch reveals that we don't actually 
need del_vq: now we track them, we can just do that as part of vdev 
destruction, right?

If you agree, please do that patch first, then do the find_vqs change on top of 
that.

Thanks!
Rusty.


Re: [RFC PATCH 0/3] generic hypercall support

2009-05-09 Thread David S. Ahern


Avi Kivity wrote:
> David S. Ahern wrote:
>> I ran another test case with SMT disabled, and while I was at it
>> converted TSC delta to operations/sec. The results without SMT are
>> confusing -- to me anyways. I'm hoping someone can explain it.
>> Basically, using a count of 10,000,000 (per your web page) with SMT
>> disabled the guest detected a soft lockup on the CPU. So, I dropped the
>> count down to 1,000,000. So, for 1e6 iterations:
>>
>> without SMT, with EPT:
>> HC:   259,455 ops/sec
>> PIO:  226,937 ops/sec
>> MMIO: 113,180 ops/sec
>>
>> without SMT, without EPT:
>> HC:   274,825 ops/sec
>> PIO:  247,910 ops/sec
>> MMIO: 111,535 ops/sec
>>
>> Converting the prior TSC deltas:
>>
>> with SMT, with EPT:
>> HC:994,655 ops/sec
>> PIO:   875,116 ops/sec
>> MMIO:  439,738 ops/sec
>>
>> with SMT, without EPT:
>> HC:994,304 ops/sec
>> PIO:   903,057 ops/sec
>> MMIO:  423,244 ops/sec
>>
>> Running the tests repeatedly I did notice a fair variability (as much as
>> -10% down from these numbers).
>>
>> Also, just to make sure I converted the delta to ops/sec, the formula I
>> used was cpu_freq / dTSC * count = operations/sec
>>
>>   
> 
> The only thing I can think of is cpu frequency scaling lying about the
> cpu frequency.  Really the test needs to use time and not the time stamp
> counter.
> 
> Are the results expressed in cycles/op more reasonable?
> 

Power settings seem to be the root cause. With this HP server the SMT
mode must be disabling or overriding a power setting that is enabled in
the bios. I found one power-based knob that gets non-SMT performance
close to SMT numbers. Not very intuitive that SMT/non-SMT can differ so
dramatically.

david


Re: [RFC PATCH 0/3] generic hypercall support

2009-05-09 Thread David S. Ahern


Gregory Haskins wrote:
> Avi Kivity wrote:
>> David S. Ahern wrote:
>>> I ran another test case with SMT disabled, and while I was at it
>>> converted TSC delta to operations/sec. The results without SMT are
>>> confusing -- to me anyways. I'm hoping someone can explain it.
>>> Basically, using a count of 10,000,000 (per your web page) with SMT
>>> disabled the guest detected a soft lockup on the CPU. So, I dropped the
>>> count down to 1,000,000. So, for 1e6 iterations:
>>>
>>> without SMT, with EPT:
>>> HC:   259,455 ops/sec
>>> PIO:  226,937 ops/sec
>>> MMIO: 113,180 ops/sec
>>>
>>> without SMT, without EPT:
>>> HC:   274,825 ops/sec
>>> PIO:  247,910 ops/sec
>>> MMIO: 111,535 ops/sec
>>>
>>> Converting the prior TSC deltas:
>>>
>>> with SMT, with EPT:
>>> HC:994,655 ops/sec
>>> PIO:   875,116 ops/sec
>>> MMIO:  439,738 ops/sec
>>>
>>> with SMT, without EPT:
>>> HC:994,304 ops/sec
>>> PIO:   903,057 ops/sec
>>> MMIO:  423,244 ops/sec
>>>
>>> Running the tests repeatedly I did notice a fair variability (as much as
>>> -10% down from these numbers).
>>>
>>> Also, just to make sure I converted the delta to ops/sec, the formula I
>>> used was cpu_freq / dTSC * count = operations/sec
>>>
>>>   
>> The only thing I can think of is cpu frequency scaling lying about the
>> cpu frequency.  Really the test needs to use time and not the time
>> stamp counter.
>>
>> Are the results expressed in cycles/op more reasonable?
> 
> FWIW: I always used kvm_stat instead of my tsc printk
> 

kvm_stat shows same approximate numbers as with the TSC-->ops/sec
conversions. Interestingly, MMIO writes are not showing up as mmio_exits
in kvm_stat; they are showing up as insn_emulation.

david
> 


Re: boot=on option fails on win 2k/xp double boot

2009-05-09 Thread Gleb Natapov
On Sat, May 09, 2009 at 07:53:50PM +0200, Federico Fissore wrote:
> I've a windows (virtual) box with windows xp originally installed and a  
> windows 2k lately installed
>
Are they both installed on the same disk?

> If I run kvm from the command line for testing purposes everything is  
> fine. If I run it with libvirt on the host, it does not boot, complaining  
> with an I/O error
>
> the problem arises since libvirt adds boot=on on the first disk
>
> without that option, everything works fine
>
> do you have any hint on this?
>
libvirt should not add boot=on if the interface type is IDE. If the disk
type is not IDE, then booting is supported only from one disk (the one
that has boot=on).



--
Gleb.


Re: [RFC PATCH 0/3] generic hypercall support

2009-05-09 Thread Avi Kivity

David S. Ahern wrote:

kvm_stat shows same approximate numbers as with the TSC-->ops/sec
conversions. Interestingly, MMIO writes are not showing up as mmio_exits
in kvm_stat; they are showing up as insn_emulation.
  


That's a bug, mmio_exits ignores mmios that are handled in the kernel.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.
