Re: [Qemu-devel] [PULL 14/15] tb hash: track translated blocks with qht

2016-08-10 Thread Igor Mammedov
On Fri, 10 Jun 2016 07:26:52 -0700
Richard Henderson wrote:

This patch makes QEMU SIGSEGV when debugging a KVM guest with gdb.

Steps to reproduce:

SeaBIOS:
  clone and build current master with defconfig plus
  CONFIG_RELOCATE_INIT=n on top of it

QEMU:
  ./configure --enable-debug --target-list=x86_64-softmmu
  ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -bios ../seabios/out/bios.bin -s -S

GDB:
  gdb ../seabios/out/rom.o
  (gdb) b smp_setup
  (gdb) target remote localhost:1234
  (gdb) c
  Continuing.
  Remote connection closed


QEMU's backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x55c19c6f in qht_reset_size (ht=0x56240058, n_elems=0x8000) at util/qht.c:422
422 if (n_buckets != map->n_buckets) {
(gdb) bt
#0  0x55c19c6f in qht_reset_size (ht=0x56240058, n_elems=0x8000) at util/qht.c:422
#1  0x55754bdd in tb_flush (cpu=0x0) at /home/imammedo/builds/qemu/translate-all.c:855
#2  0x5579026e in gdb_vm_state_change (opaque=0x0, running=0x0, state=RUN_STATE_DEBUG) at /home/imammedo/builds/qemu/gdbstub.c:1276
#3  0x558be59f in vm_state_notify (running=0x0, state=RUN_STATE_DEBUG) at vl.c:1585
#4  0x5578137e in do_vm_stop (state=RUN_STATE_DEBUG) at /home/imammedo/builds/qemu/cpus.c:743
#5  0x55782b5e in vm_stop (state=RUN_STATE_DEBUG) at /home/imammedo/builds/qemu/cpus.c:1476
#6  0x558bebdf in main_loop_should_exit () at vl.c:1856
#7  0x558bed57 in main_loop () at vl.c:1912
#8  0x558c678a in main (argc=0x6, argv=0x7fffe0c8, envp=0x7fffe100) at vl.c:4603
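
One possible reading of the backtrace, not verified here: with -enable-kvm
the TB hash table may never have been initialized, so tb_flush() ends up
resetting a table whose map pointer is still NULL, and the read of
map->n_buckets at util/qht.c:422 faults. The toy below only illustrates that
failure mode and a guard against it. All names (toy_ht, toy_map,
toy_ht_init, toy_ht_reset_size) are hypothetical and are not the QEMU
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the QEMU qht structures. */
struct toy_map {
    size_t n_buckets;
};

struct toy_ht {
    struct toy_map *map;   /* stays NULL until toy_ht_init() runs */
};

static void toy_ht_init(struct toy_ht *ht, size_t n_buckets)
{
    ht->map = malloc(sizeof(*ht->map));
    assert(ht->map != NULL);
    ht->map->n_buckets = n_buckets;
}

/*
 * Returns true if the table was reset, false if the call was skipped
 * because the table was never initialized. Without the NULL check,
 * reading ht->map->n_buckets on an uninitialized table dereferences NULL.
 */
static bool toy_ht_reset_size(struct toy_ht *ht, size_t n_buckets)
{
    if (ht->map == NULL) {
        return false;
    }
    if (n_buckets != ht->map->n_buckets) {
        ht->map->n_buckets = n_buckets;
    }
    return true;
}
```

Whether the guard belongs in the table code or in the caller (skip the flush
entirely when no translated blocks can exist) is a separate design question
that this sketch does not settle.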


> From: "Emilio G. Cota" 
> 
> Having a fixed-size hash table for keeping track of all translation blocks
> is suboptimal: some workloads are just too big or too small to get maximum
> performance from the hash table. The MRU promotion policy helps improve
> performance when the hash table is a little undersized, but it cannot
> make up for severely undersized hash tables.
> 
> Furthermore, frequent MRU promotions result in writes that are a scalability
> bottleneck. For scalability, lookups should only perform reads, not writes.
> This is not a big deal for now, but it will become one once MTTCG matures.
> 
> The appended patch fixes these issues by using qht as the implementation of
> the TB hash table. This solution is superior to other alternatives considered,
> namely:
> 
> - master: implementation in QEMU before this patchset
> - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
> - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
>   MRU is implemented here by adding an intermediate struct
>   that contains the u32 hash and a pointer to the TB; this
>   allows us, on an MRU promotion, to copy said struct (that is not
>   at the head), and put this new copy at the head. After a grace
>   period, the original non-head struct can be eliminated, and
>   after another grace period, freed.
> - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
>   no MRU for lookups; MRU for inserts.
>
> The appended solution is the following:
> - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
>   no MRU for lookups; MRU for inserts.
> 
> The plots below compare the considered solutions. The Y axis shows the
> boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
> sweeps the number of buckets (or initial number of buckets for qht-autoresize).
> The plots in PNG format (and with errorbars) can be seen here:
>   http://imgur.com/a/Awgnq
> 
> Each test runs 5 times, and the entire QEMU process is pinned to a
> single core for repeatability of results.
> 
> Host: Intel Xeon E5-2690
> 
> [Plot: boot time in seconds vs. number of buckets; series: master, xxhash,
> xxhash-rcu, qht-fixed-nomru, qht-dyn-mru, qht-dyn-nomru]

[Qemu-devel] [PULL 14/15] tb hash: track translated blocks with qht

2016-06-10 Thread Richard Henderson
From: "Emilio G. Cota" 

Having a fixed-size hash table for keeping track of all translation blocks
is suboptimal: some workloads are just too big or too small to get maximum
performance from the hash table. The MRU promotion policy helps improve
performance when the hash table is a little undersized, but it cannot
make up for severely undersized hash tables.

Furthermore, frequent MRU promotions result in writes that are a scalability
bottleneck. For scalability, lookups should only perform reads, not writes.
This is not a big deal for now, but it will become one once MTTCG matures.
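
To make the point about writes concrete, here is a minimal sketch of a
chained bucket with and without MRU promotion. It is a toy under stated
assumptions, not the code being replaced: struct node, lookup_mru() and
lookup_ro() are hypothetical names, and the hash stands in for the full key.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bucket entry; the hash doubles as the key in this toy. */
struct node {
    uint32_t hash;
    struct node *next;
};

/*
 * Lookup with MRU promotion: a hit that is not already at the head is
 * unlinked and moved to the head, so the lookup writes to shared state.
 */
static struct node *lookup_mru(struct node **head, uint32_t hash)
{
    struct node **pp = head;
    struct node *n;

    while ((n = *pp) != NULL) {
        if (n->hash == hash) {
            if (pp != head) {
                *pp = n->next;      /* unlink from current position */
                n->next = *head;    /* relink in front of the old head */
                *head = n;
            }
            return n;
        }
        pp = &n->next;
    }
    return NULL;
}

/* Lookup without promotion: it only reads, so the chain is left as is. */
static struct node *lookup_ro(struct node *head, uint32_t hash)
{
    for (struct node *n = head; n != NULL; n = n->next) {
        if (n->hash == hash) {
            return n;
        }
    }
    return NULL;
}
```

With concurrent readers, lookup_mru() would need a lock or an RCU-style copy
on every promoted hit, whereas lookup_ro() only needs the chain to stay
readable.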

The appended patch fixes these issues by using qht as the implementation of
the TB hash table. This solution is superior to other alternatives considered,
namely:

- master: implementation in QEMU before this patchset
- xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
- xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
  MRU is implemented here by adding an intermediate struct
  that contains the u32 hash and a pointer to the TB; this
  allows us, on an MRU promotion, to copy said struct (that is not
  at the head), and put this new copy at the head. After a grace
  period, the original non-head struct can be eliminated, and
  after another grace period, freed.
- qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
  no MRU for lookups; MRU for inserts.

The appended solution is the following:
- qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
  no MRU for lookups; MRU for inserts.
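
The combination named above (dynamic number of buckets, read-only lookups,
new entries placed first) can be sketched with a plain chained table. This
is only an illustration and not how qht is implemented: the names (struct
table, table_insert(), table_lookup(), table_grow()) are hypothetical, the
hash stands in for the key, and the "double when entries reach the bucket
count" threshold is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct entry {
    uint32_t hash;          /* stands in for the full key in this toy */
    void *p;
    struct entry *next;
};

struct table {
    struct entry **buckets;
    size_t n_buckets;       /* always a power of two */
    size_t n_entries;
};

static void table_init(struct table *t, size_t n_buckets)
{
    t->buckets = calloc(n_buckets, sizeof(*t->buckets));
    assert(t->buckets != NULL);
    t->n_buckets = n_buckets;
    t->n_entries = 0;
}

/* Double the bucket array and relink every entry into its new bucket. */
static void table_grow(struct table *t)
{
    size_t new_n = t->n_buckets * 2;
    struct entry **nb = calloc(new_n, sizeof(*nb));

    assert(nb != NULL);
    for (size_t i = 0; i < t->n_buckets; i++) {
        struct entry *e = t->buckets[i];
        while (e != NULL) {
            struct entry *next = e->next;
            size_t b = e->hash & (new_n - 1);
            e->next = nb[b];
            nb[b] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->n_buckets = new_n;
}

/* Inserts go to the head of the chain, so recent entries are found first. */
static void table_insert(struct table *t, uint32_t hash, void *p)
{
    if (t->n_entries >= t->n_buckets) {
        table_grow(t);      /* assumed threshold: one entry per bucket */
    }
    struct entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    size_t b = hash & (t->n_buckets - 1);
    e->hash = hash;
    e->p = p;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->n_entries++;
}

/* Lookups never reorder the chain: they only read. */
static void *table_lookup(const struct table *t, uint32_t hash)
{
    const struct entry *e = t->buckets[hash & (t->n_buckets - 1)];

    for (; e != NULL; e = e->next) {
        if (e->hash == hash) {
            return e->p;
        }
    }
    return NULL;
}
```

Starting from a deliberately small table and inserting many entries shows
the point of the dynamic variant: the initial bucket count stops mattering
once the table has grown to fit the workload.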

The plots below compare the considered solutions. The Y axis shows the
boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
sweeps the number of buckets (or initial number of buckets for qht-autoresize).
The plots in PNG format (and with errorbars) can be seen here:
  http://imgur.com/a/Awgnq

Each test runs 5 times, and the entire QEMU process is pinned to a
single core for repeatability of results.

Host: Intel Xeon E5-2690

[Plot: boot time in seconds (Y axis, 20 to 28) vs. log2 number of buckets
(X axis, 14 to 24); series: master (A), xxhash (B), xxhash-rcu (C),
qht-fixed-nomru (D), qht-dyn-mru (E), qht-dyn-nomru (F)]

 Host: Intel i7-4790K

[Plot: boot time in seconds (Y axis, 10 to 14.5) vs. number of buckets;
series: master (A), xxhash (B), xxhash-rcu (C), qht-fixed-nomru (D),
qht-dyn-mru (E), qht-dyn-nomru (F)]