Re: KVM with hugepages generate huge load with two guests

2010-12-13 Thread Dmitry Golubev
Hi,

So, nobody has any idea what's going wrong with all these massive IRQs
and spin_locks that cause virtual machines to almost completely stop?
:(

Thanks,
Dmitry

On Wed, Dec 1, 2010 at 5:38 AM, Dmitry Golubev  wrote:
> Hi,
>
> Sorry it took so long to reply - there are only a few moments when I
> can poke at the production server, and I need to notify people in
> advance about that :(
>
>> Can you post kvm_stat output while slowness is happening? 'perf top' on the 
>> host?  and on the guest?
>
> I ran 'perf top', and the first thing I saw is that while the guest is
> on acpi_pm, it shows a more or less normal number of IRQs (under
> 1000/s); however, when I switched back to the default (which is nohz
> with kvm_clock), there are 40 times (!!!) more IRQs under normal
> operation (about 40 000/s). When the slowdown is happening, there are a
> lot of _spin_lock events and a lot of messages like: "WARNING: failed
> to keep up with mmap data.  Last read 810 msecs ago."
>
> As I said before, switching to acpi_pm does not save the day, but it
> makes the situation a lot more workable (i.e., the servers recover
> faster from the periods of slowness). During slowdowns on acpi_pm I
> also see "_spin_lock".
>
> Raw data follows:
>
>
>
> vmstat 5 on the host:
>
> procs ---memory-- ---swap-- -io -system-- cpu
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 131904  13952 205872    0    0     0    24 2495 9813  6  3 91  0
>  0  0      0 132984  13952 205872    0    0     0    47 2596 9851  5  3 91  1
>  1  0      0 132148  13952 205872    0    0     0    54 2644 10559  3  3 93  1
>  0  1      0 129084  13952 205872    0    0     0    38 3039 9752  7  3 87  2
>  6  0      0 126388  13952 205872    0    0     0   311 15619 9009 42 17 39  2
>  9  0      0 125868  13960 205872    0    0     6    86 4659 6504 98  2  0  0
>  8  0      0 123320  13960 205872    0    0     0    26 4682 6649 98  2  0  0
>  8  0      0 126252  13960 205872    0    0     0   124 4923 6776 98  2  0  0
>  8  0      0 125376  13960 205872    0    0   136    11 4287 5865 98  2  0  0
>  9  0      0 123812  13960 205872    0    0   205    51 4497 6134 98  2  0  0
>  8  0      0 126020  13960 205872    0    0   904    26 4483 5999 98  2  0  0
>  8  0      0 124052  13960 205872    0    0    15    10 4397 6200 98  2  0  0
>  8  0      0 125928  13960 205872    0    0    14    41 4335 5823 98  2  0  0
>  8  0      0 126184  13960 205872    0    0     6    14 4966 6588 98  2  0  0
>  8  0      0 123588  13960 205872    0    0   143    18 5234 6891 98  2  0  0
>  8  0      0 126640  13960 205872    0    0     6    91 5554 7334 98  2  0  0
>  8  0      0 123144  13960 205872    0    0   146    11 5235 7145 98  2  0  0
>  8  0      0 125856  13968 205872    0    0  1282    98 5481 7159 98  2  0  0
>  9 19      0 124124  13968 205872    0    0   782  2433 8587 8987 97  3  0  0
>  8  0      0 122584  13968 205872    0    0   432    90 5359 6960 98  2  0  0
>  8  0      0 125320  13968 205872    0    0  3074    52 5448 7095 97  3  0  0
>  8  0      0 121436  13968 205872    0    0  2519    81 5714 7279 98  2  0  0
>  8  0      0 124436  13968 205872    0    0     1    56 5242 6864 98  2  0  0
>  8  0      0 111324  13968 205872    0    0     2    22 10660 6686 97  3  0  0
>  8  0      0 107824  13968 205872    0    0     0    24 14329 8147 97  3  0  0
>  8  0      0 110420  13968 205872    0    0     0    68 13486 6985 98  2  0  0
>  8  0      0 110024  13968 205872    0    0     0    19 13085 6659 98  2  0  0
>  8  0      0 109932  13968 205872    0    0     0     3 12952 6415 98  2  0  0
>  8  0      0 108552  13968 205880    0    0     2    41 13400 7349 98  2  0  0
>
> A few snapshots of kvm_stat on the host:
>
> Every 2.0s: kvm_stat -1
>
>  Wed Dec  1 04:45:47 2010
>
> efer_reload                    0         0
> exits                   56264102     14074
> fpu_reload                311506        50
> halt_exits               4733166       935
> halt_wakeup              3845079       840
> host_state_reload        8795964      4085
> hypercalls                     0         0
> insn_emulation          13573212      7249
> insn_emulation_fail            0         0
> invlpg                   1846050        20
> io_exits                 3579406       843
> irq_exits                3038887      4879
> irq_injections           5242157      3681
> irq_window                124361       540
> largepages                  2253         0
> mmio_exits                 64274        20
> mmu_cache_miss            664011        16
> mmu_flooded               164506         1
> mmu_pde_zapped            212686         8
> mmu_pte_updated           729268         0
> mmu_pte_write           81323616       551
> mmu_recycled                 277         0
> mmu_shadow_zapped         652691        23
> mmu_unsync                  5630         8
> nmi_injections                 0         0
> nmi_window                     0   

Re: KVM with hugepages generate huge load with two guests

2010-11-30 Thread Dmitry Golubev
Hi,

Sorry it took so long to reply - there are only a few moments when I
can poke at the production server, and I need to notify people in
advance about that :(

> Can you post kvm_stat output while slowness is happening? 'perf top' on the 
> host?  and on the guest?

I ran 'perf top', and the first thing I saw is that while the guest is
on acpi_pm, it shows a more or less normal number of IRQs (under
1000/s); however, when I switched back to the default (which is nohz
with kvm_clock), there are 40 times (!!!) more IRQs under normal
operation (about 40 000/s). When the slowdown is happening, there are a
lot of _spin_lock events and a lot of messages like: "WARNING: failed
to keep up with mmap data.  Last read 810 msecs ago."

As I said before, switching to acpi_pm does not save the day, but it
makes the situation a lot more workable (i.e., the servers recover
faster from the periods of slowness). During slowdowns on acpi_pm I
also see "_spin_lock".

Raw data follows:



vmstat 5 on the host:

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 131904  13952 205872    0    0     0    24 2495 9813  6  3 91  0
 0  0      0 132984  13952 205872    0    0     0    47 2596 9851  5  3 91  1
 1  0      0 132148  13952 205872    0    0     0    54 2644 10559  3  3 93  1
 0  1      0 129084  13952 205872    0    0     0    38 3039 9752  7  3 87  2
 6  0      0 126388  13952 205872    0    0     0   311 15619 9009 42 17 39  2
 9  0      0 125868  13960 205872    0    0     6    86 4659 6504 98  2  0  0
 8  0      0 123320  13960 205872    0    0     0    26 4682 6649 98  2  0  0
 8  0      0 126252  13960 205872    0    0     0   124 4923 6776 98  2  0  0
 8  0      0 125376  13960 205872    0    0   136    11 4287 5865 98  2  0  0
 9  0      0 123812  13960 205872    0    0   205    51 4497 6134 98  2  0  0
 8  0      0 126020  13960 205872    0    0   904    26 4483 5999 98  2  0  0
 8  0      0 124052  13960 205872    0    0    15    10 4397 6200 98  2  0  0
 8  0      0 125928  13960 205872    0    0    14    41 4335 5823 98  2  0  0
 8  0      0 126184  13960 205872    0    0     6    14 4966 6588 98  2  0  0
 8  0      0 123588  13960 205872    0    0   143    18 5234 6891 98  2  0  0
 8  0      0 126640  13960 205872    0    0     6    91 5554 7334 98  2  0  0
 8  0      0 123144  13960 205872    0    0   146    11 5235 7145 98  2  0  0
 8  0      0 125856  13968 205872    0    0  1282    98 5481 7159 98  2  0  0
 9 19      0 124124  13968 205872    0    0   782  2433 8587 8987 97  3  0  0
 8  0      0 122584  13968 205872    0    0   432    90 5359 6960 98  2  0  0
 8  0      0 125320  13968 205872    0    0  3074    52 5448 7095 97  3  0  0
 8  0      0 121436  13968 205872    0    0  2519    81 5714 7279 98  2  0  0
 8  0      0 124436  13968 205872    0    0     1    56 5242 6864 98  2  0  0
 8  0      0 111324  13968 205872    0    0     2    22 10660 6686 97  3  0  0
 8  0      0 107824  13968 205872    0    0     0    24 14329 8147 97  3  0  0
 8  0      0 110420  13968 205872    0    0     0    68 13486 6985 98  2  0  0
 8  0      0 110024  13968 205872    0    0     0    19 13085 6659 98  2  0  0
 8  0      0 109932  13968 205872    0    0     0     3 12952 6415 98  2  0  0
 8  0      0 108552  13968 205880    0    0     2    41 13400 7349 98  2  0  0

A few snapshots of kvm_stat on the host:

Every 2.0s: kvm_stat -1

  Wed Dec  1 04:45:47 2010

efer_reload                    0         0
exits                   56264102     14074
fpu_reload                311506        50
halt_exits               4733166       935
halt_wakeup              3845079       840
host_state_reload        8795964      4085
hypercalls                     0         0
insn_emulation          13573212      7249
insn_emulation_fail            0         0
invlpg                   1846050        20
io_exits                 3579406       843
irq_exits                3038887      4879
irq_injections           5242157      3681
irq_window                124361       540
largepages                  2253         0
mmio_exits                 64274        20
mmu_cache_miss            664011        16
mmu_flooded               164506         1
mmu_pde_zapped            212686         8
mmu_pte_updated           729268         0
mmu_pte_write           81323616       551
mmu_recycled                 277         0
mmu_shadow_zapped         652691        23
mmu_unsync                  5630         8
nmi_injections                 0         0
nmi_window                     0         0
pf_fixed                17470658       218
pf_guest1335220581
remote_tlb_flush 189893096
request_irq                    0         0
signal_exits                   0         0
tlb_flush                5827433       108

Every 2.0s: kvm_stat -1

  Wed Dec  1 04:47:33 2010

efer_reload                    0         0
exits   58

Re: KVM with hugepages generate huge load with two guests

2010-11-21 Thread Dmitry Golubev
Thanks for the answer.

> Are you sure it is hugepages related?

Well, empirically it looked like either something hugepages-related or a
regression from qemu-kvm 0.12.3 -> 0.12.5, as this did not happen until
I upgraded (needed to avoid disk corruption caused by a bug in 0.12.3)
and enabled hugepages. However, since the frequency of the problem does
seem related to how much memory each guest consumes (more memory = the
problem appears faster), and in the beginning the memory consumption of
the guests might simply not have hit some kind of threshold, maybe it is
not really hugepages-related.

> Can you post kvm_stat output while slowness is happening? 'perf top' on the 
> host?  and on the guest?

OK, I will test this and write back.

Thanks,
Dmitry


Re: KVM with hugepages generate huge load with two guests

2010-11-21 Thread Avi Kivity

On 11/21/2010 02:24 AM, Dmitry Golubev wrote:

Hi,

Seems that nobody is interested in this bug :(



It's because the information is somewhat confused.  There's a way to 
prepare bug reports that gets developers competing to see who solves it 
first.




Anyway I wanted to add a bit more to this investigation.

Once I put "nohz=off highres=off clocksource=acpi_pm" in guest kernel
options, the guests started to behave better - they do not stay in the
slow state, but rather get there for some seconds (usually up to
minute, but sometimes 2-3 minutes) and then get out of it (this cycle
repeats once in a while - every approx 3-6 minutes). Once the
situation became stable, so that I am able to leave the guests without
very much worries, I also noticed that sometimes the predicted
swapping occurs, although rarely (I waited about half an hour to catch
the first swapping on the host). Here is a fragment of vmstat. Note
that when the first column shows 8-9 - the slowness and huge load
happens. You can also see how is appears and disappears (with nohz and
kvm-clock it did not go out of slowness period, but with tsc clock the
probability of getting out is significantly lower):



Are you sure it is hugepages related?

Can you post kvm_stat output while slowness is happening? 'perf top' on 
the host?  and on the guest?
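
(Something along these lines should do - assuming the kvm_stat script
shipped with qemu-kvm and perf from the host kernel are installed;
adjust as needed:

  watch -n 2 kvm_stat -1    # per-exit-reason counters on the host
  perf top                  # hottest symbols on the host

and the same 'perf top' inside a guest, if perf is available there.)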


--
error compiling committee.c: too many arguments to function



Re: KVM with hugepages generate huge load with two guests

2010-11-21 Thread Dmitry Golubev
> Just out of curiosity: did you try updating the BIOS on your
> motherboard?  The issues you're facing seem to be quite unique,
> and I've seen more than once how various weird issues
> were fixed just by updating the BIOS.  Provided they actually
> did their own homework and fixed something and released the fixes
> too... ;)

Thank you for the reply, I really appreciate that somebody found time to
answer. Unfortunately for this investigation, I already upgraded the BIOS
a few months ago. I just checked - there are no newer versions.

I do see, however, that many people advise switching to the acpi_pm
clocksource (and, thus, disabling the nohz option) when similar problems
are experienced - I did not invent this workaround (I got the idea here:
http://forum.proxmox.com/threads/5144-100-CPU-on-host-VM-hang-every-night?p=29143#post29143
). It looks like an ancient bug. I even upgraded my qemu-kvm to version
0.13 without any significant change in this behavior.
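
(For completeness, the workaround itself is just the guest kernel command
line quoted earlier in the thread; on a grub2-based guest that is roughly
the following - the exact file and update command differ per distro, so
treat it as a sketch:

  # /etc/default/grub inside the guest
  GRUB_CMDLINE_LINUX_DEFAULT="nohz=off highres=off clocksource=acpi_pm"
  # then run update-grub and reboot the guest)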

It is really weird, however, how one guest can work fine but two start
messing with each other. Shouldn't there be some kind of isolation
between them? They both start to behave exactly the same way at exactly
the same time. And it does not happen once a month or a year, but
pretty frequently.

Thanks,
Dmitry


Re: KVM with hugepages generate huge load with two guests

2010-11-21 Thread Michael Tokarev
21.11.2010 03:24, Dmitry Golubev wrote:
> Hi,
> 
> Seems that nobody is interested in this bug :(
> 
> Anyway I wanted to add a bit more to this investigation.
> 
> Once I put "nohz=off highres=off clocksource=acpi_pm" in the guest
> kernel options, the guests started to behave better - they do not stay
> in the slow state, but rather get into it for some seconds (usually up
> to a minute, but sometimes 2-3 minutes) and then get out of it (this cycle

Just out of curiosity: did you try updating the BIOS on your
motherboard?  The issues you're facing seem to be quite unique,
and I've seen more than once how various weird issues
were fixed just by updating the BIOS.  Provided they actually
did their own homework and fixed something and released the fixes
too... ;)

P.S.  I'm Not A Guru (tm) :)

/mjt


Re: KVM with hugepages generate huge load with two guests

2010-11-20 Thread Dmitry Golubev
Hi,

Seems that nobody is interested in this bug :(

Anyway I wanted to add a bit more to this investigation.

Once I put "nohz=off highres=off clocksource=acpi_pm" in guest kernel
options, the guests started to behave better - they do not stay in the
slow state, but rather get there for some seconds (usually up to
minute, but sometimes 2-3 minutes) and then get out of it (this cycle
repeats once in a while - every approx 3-6 minutes). Once the
situation became stable, so that I am able to leave the guests without
very much worries, I also noticed that sometimes the predicted
swapping occurs, although rarely (I waited about half an hour to catch
the first swapping on the host). Here is a fragment of vmstat. Note
that when the first column shows 8-9 - the slowness and huge load
happens. You can also see how is appears and disappears (with nohz and
kvm-clock it did not go out of slowness period, but with tsc clock the
probability of getting out is significantly lower):

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 8  0  0  60456  19708 25368800 6   170 5771 1712 97  3  0  0
 9  5  0  58752  19708 253688001157 6457 1500 96  4  0  0
 8  0  0  58192  19708 2536880055   106 5112 1588 98  3  0  0
 8  0  0  58068  19708 2536880021 0 2609 1498 100  0  0  0
 8  2  0  57728  19708 25368800 996 2645 1620 100  0  0  0
 8  0  0  53852  19716 25368000 2   186 6321 1935 97  4  0  0
 8  0  0  49636  19716 25368800 045 3482 1484 99  1  0  0
 8  0  0  49452  19716 25368800 034 3253 1851 100  0  0  0
 4  1   1468 126252  16780 182256   53  317   393   788 29318 3498 79 21  0  0
 4  0   1468 135596  16780 18233200 7   360 26782 2459 79 21  0  0
 1  0   1468 169720  16780 182340007581 22024 3194 40 15 42  3
 3  0   1464 167608  16780 1823406026  1579 9404 5526 22  8 35 35
 0  0   1460 164232  16780 1825040085   170 4955 3345 21  5 69  5
 0  0   1460 163636  16780 18250400 090 1288 1855  5  2 90  3
 1  0   1460 164836  16780 18250400 034 1166 1789  4  2 93  1
 1  0   1452 165628  16780 18250400   28570 1981 2692 10  2 83  4
 1  0   1452 160044  16952 18484060   832   146 5046 3303 11  6 76  7
 1  0   1452 161416  16960 1848400019   170 1732 2577 10  2 74 13
 0  1   1452 161920  16960 18484000   11153 1084 1986  0  1 96  3
 0  0   1452 161332  16960 18484000   25434  856 1505  2  1 95  3
 1  0   1452 159168  16960 18484000   36646 2137 2774  3  2 94  1
 1  0   1452 157408  16968 18484000 069 2423 2991  9  5 84  2
 0  0   1444 157876  16968 18484000 045 6343 3079 24 10 65  1
 0  0   1428 159644  16968 18484460 852  724 1276  0  0 98  2
 0  0   1428 160336  16968 184844003198 1115 1835  1  1 92  6
 1  0   1428 161360  16968 18484400 045 1333 1849  2  1 95  2
 0  0   1428 162092  16968 18484400 0   408 3517 4267 11  2 78  8
 1  1   1428 163868  16968 1848440024   121 1714 2036 10  2 86  2
 1  3   1428 161292  16968 18484400 3   143 2906 3503 16  4 77  3
 0  0   1428 156448  16976 18483600 1   781 5661 4464 16  7 74  3
 1  0   1428 156924  16976 18484400   58892 2341 3845  7  2 87  4
 0  0   1428 158816  16976 1848440027   119 2052 3830  5  1 89  4
 0  0   1428 161420  16976 18484400 156 3923 3132 26  4 68  1
 0  0   1428 162724  16976 1848440010   107 2806 3558 10  2 86  2
 1  0   1428 165244  16976 1848440034   155 2084 2469  8  2 78 12
 0  0   1428 165204  16976 18484400   390   282 9568 4924 17 11 55 17
 1  0   1392 163864  16976 185064  1020   218   411 11762 16591  6  9 68 17
 8  0   1384 164992  16984 18505600 988 7540 5761 73  6 17  4
 8  0   1384 163620  16984 18507600 189 21936 45040 90 10  0  0
 8  0   1384 165324  16992 18507600 5   194 3330 1678 99  1  0  0
 8  0   1384 165704  16992 18507600 154 2651 1457 99  1  0  0
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 8  0   1384 163016  17000 18507600 0   126 4988 1536 97  3  0  0
 9  1   1384 162608  17000 1850760034   477 20106 2351 83 17  0  0
 0  0   1384 184052  17000 18507600   102  1198 48951 3628 48 38  6  8
 0  0   1384 183088  17008 18507600 8   156 1228 1419  2  2 82 14
 0  0   1384 184436  17008 1851640028   113 3176 2785 12  7 75  6
 0  0   1384 184568  17008 1851640030   107 1547 1821  4  3 87  6
 4  2   1228 228808  17008

Re: KVM with hugepages generate huge load with two guests

2010-11-17 Thread Dmitry Golubev
Hi,

Sorry to bother you again. I have more info:

> 1. router with 32MB of RAM (hugepages) and 1VCPU
...
> Is it too much to have 3 guests with hugepages?

OK, this router is also out of the equation - I disabled hugepages for
it. There should also be additional pages available to the guests
because of that. I think this should be pretty reproducible... Two
identical 64-bit Linux 2.6.32 guests with 3500MB of virtual RAM and 4
VCPUs each, running on a Core2Quad (4 real cores) machine with 8GB of
RAM and 3546 2MB hugepages, on a 64-bit Linux 2.6.35 host (libvirt
0.8.3) from Ubuntu Maverick.

Still no swapping, and the effect is pretty much the same: one guest
runs well; two guests work for some minutes - then slow down a few
hundred times, showing huge load both inside (unlimited rapid growth of
the load average) and outside (the host load does not make it
unresponsive, though - but it is loaded to the max). The load growth on
the host is instant and finite (the change in the 'r' column indicates
this sudden rise):

# vmstat 5
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 1  3  0 194220  30680  7671200   31928 2633 1960  6  6 67 20
 1  2  0 193776  30680  7671200 4   231 55081 78491  3 39 17 41
10  1  0 185508  30680  7671200 487 53042 34212 55 27  9  9
12  0  0 185180  30680  7671200 295 41007 21990 84 16  0  0

Thanks,
Dmitry

On Wed, Nov 17, 2010 at 4:19 AM, Dmitry Golubev  wrote:
> Hi,
>
> Maybe you remember that I wrote a few weeks ago about a KVM CPU load
> problem with hugepages. The problem was left hanging; however, I now
> have some new information. So the description remains, but I have
> decreased both the guest memory and the number of hugepages:
>
> Ram = 8GB, hugepages = 3546
>
> Total of 2 virtual machines:
> 1. router with 32MB of RAM (hugepages) and 1VCPU
> 2. linux guest with 3500MB of RAM (hugepages) and 4VCPU
>
> Everything works fine until I start the second Linux guest with the
> same 3500MB of guest RAM, also in hugepages and also with 4 VCPUs. The
> rest of the description is the same as before: after a while the host
> shows a load average of about 8 (on a Core2Quad) and it seems that both
> big guests consume exactly the same amount of resources. The host seems
> responsive, though. Inside the guests, however, things are not so good
> - the load skyrockets to at least 20. The guests are not responsive and
> even a 'ps' executes unreasonably slowly (it may take a few minutes -
> here, however, the load builds up and it seems that the machine becomes
> slower with time, unlike the host, which shows the jump in resource
> consumption instantly). It also seems that the more memory the guests
> use, the faster the problem appears. Still, at least a gig of RAM is
> free in each guest and there is no swap activity inside the guests.
>
> The most important thing - and the reason I went back and quoted an
> older message rather than the last one - is that there is no more swap
> activity on the host, so the previous line of thought may also be wrong
> and I am back at the beginning. There is plenty of RAM now and swap on
> the host is always at 0 as seen in 'top'. And there is 100% CPU load,
> shared equally between the two large guests. To stop the load I can
> destroy either large guest. Additionally, I have just discovered that
> suspending either large guest works as well. Moreover, after a resume,
> the load does not come back for a while. Both methods stop the high
> load instantly (in less than a second). As you were asking for 'top'
> inside the guest, here it is:
>
> top - 03:27:27 up 42 min,  1 user,  load average: 18.37, 7.68, 3.12
> Tasks: 197 total,  23 running, 174 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.0%us, 89.2%sy,  0.0%ni, 10.5%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
> Mem:   3510912k total,  1159760k used,  2351152k free,    62568k buffers
> Swap:  4194296k total,        0k used,  4194296k free,   484492k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 12303 root      20   0     0    0    0 R  100  0.0   0:33.72
> vpsnetclean
> 11772 99        20   0  149m  11m 2104 R   82  0.3   0:15.10 httpd
> 10906 99        20   0  149m  11m 2124 R   73  0.3   0:11.52 httpd
> 10247 99        20   0  149m  11m 2128 R   31  0.3   0:05.39 httpd
>  3916 root      20   0 86468  11m 1476 R   16  0.3   0:15.14
> cpsrvd-ssl
> 10919 99        20   0  149m  11m 2124 R    8  0.3   0:03.43 httpd
> 11296 99        20   0  149m  11m 2112 R    7  0.3   0:03.26 httpd
> 12265 99        20   0  149m  11m 2088 R    7  0.3   0:08.01 httpd
> 12317 root      20   0 99.6m 1384  716 R    7  0.0   0:06.57 crond
> 12326 503       20   0  8872   96   72 R    7  0.0   0:01.13 php
>  3634 root      20   0 74804 1176  596 R    6  0.0   0:12.15 crond
> 11864 32005     20   0 87224  13m 2528 R    6  0.4   0:30.84
> cpsrvd-ssl
> 12275 root      20   0 30628 9976 1364 R    6  0.3   0:24.68 cpgs_chk
> 11305 99        20  

Re: KVM with hugepages generate huge load with two guests

2010-11-16 Thread Dmitry Golubev
Hi,

Maybe you remember that I wrote a few weeks ago about a KVM CPU load
problem with hugepages. The problem was left hanging; however, I now
have some new information. So the description remains, but I have
decreased both the guest memory and the number of hugepages:

Ram = 8GB, hugepages = 3546

Total of 2 virtual machines:
1. router with 32MB of RAM (hugepages) and 1VCPU
2. linux guest with 3500MB of RAM (hugepages) and 4VCPU

Everything works fine until I start the second Linux guest with the
same 3500MB of guest RAM, also in hugepages and also with 4 VCPUs. The
rest of the description is the same as before: after a while the host
shows a load average of about 8 (on a Core2Quad) and it seems that both
big guests consume exactly the same amount of resources. The host seems
responsive, though. Inside the guests, however, things are not so good
- the load skyrockets to at least 20. The guests are not responsive and
even a 'ps' executes unreasonably slowly (it may take a few minutes -
here, however, the load builds up and it seems that the machine becomes
slower with time, unlike the host, which shows the jump in resource
consumption instantly). It also seems that the more memory the guests
use, the faster the problem appears. Still, at least a gig of RAM is
free in each guest and there is no swap activity inside the guests.

The most important thing - and the reason I went back and quoted an
older message rather than the last one - is that there is no more swap
activity on the host, so the previous line of thought may also be wrong
and I am back at the beginning. There is plenty of RAM now and swap on
the host is always at 0 as seen in 'top'. And there is 100% CPU load,
shared equally between the two large guests. To stop the load I can
destroy either large guest. Additionally, I have just discovered that
suspending either large guest works as well. Moreover, after a resume,
the load does not come back for a while. Both methods stop the high
load instantly (in less than a second). As you were asking for 'top'
inside the guest, here it is:

top - 03:27:27 up 42 min,  1 user,  load average: 18.37, 7.68, 3.12
Tasks: 197 total,  23 running, 174 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 89.2%sy,  0.0%ni, 10.5%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   3510912k total,  1159760k used,  2351152k free,    62568k buffers
Swap:  4194296k total,        0k used,  4194296k free,   484492k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12303 root      20   0     0    0    0 R  100  0.0   0:33.72
vpsnetclean
11772 99        20   0  149m  11m 2104 R   82  0.3   0:15.10 httpd
10906 99        20   0  149m  11m 2124 R   73  0.3   0:11.52 httpd
10247 99        20   0  149m  11m 2128 R   31  0.3   0:05.39 httpd
 3916 root      20   0 86468  11m 1476 R   16  0.3   0:15.14
cpsrvd-ssl
10919 99        20   0  149m  11m 2124 R    8  0.3   0:03.43 httpd
11296 99        20   0  149m  11m 2112 R    7  0.3   0:03.26 httpd
12265 99        20   0  149m  11m 2088 R    7  0.3   0:08.01 httpd
12317 root      20   0 99.6m 1384  716 R    7  0.0   0:06.57 crond
12326 503       20   0  8872   96   72 R    7  0.0   0:01.13 php
 3634 root      20   0 74804 1176  596 R    6  0.0   0:12.15 crond
11864 32005     20   0 87224  13m 2528 R    6  0.4   0:30.84
cpsrvd-ssl
12275 root      20   0 30628 9976 1364 R    6  0.3   0:24.68 cpgs_chk
11305 99        20   0  149m  11m 2104 R    6  0.3   0:02.53 httpd
12278 root      20   0  8808 1328  968 R    6  0.0   0:04.63 sim
 1534 root      20   0     0    0    0 S    6  0.0   0:03.29
flush-254:2
 3626 root      20   0  149m  13m 5324 R    6  0.4   0:27.62 httpd
12279 32008     20   0 87472 7668 2480 R    6  0.2   0:27.63
munin-update
10243 99        20   0  149m  11m 2128 R    5  0.3   0:08.47 httpd
12321 root      20   0 99.6m 1460  792 R    5  0.0   0:07.43 crond
12325 root      20   0 74804  672   92 R    5  0.0   0:00.76 crond
 1531 root      20   0     0    0    0 S    2  0.0   0:02.26 kjournald
    1 root      20   0 10316  756  620 S    0  0.0   0:02.10 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.01 kthreadd
    3 root      RT   0     0    0    0 S    0  0.0   0:01.08
migration/0
    4 root      20   0     0    0    0 S    0  0.0   0:00.02
ksoftirqd/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00
watchdog/0
    6 root      RT   0     0    0    0 S    0  0.0   0:00.47
migration/1
    7 root      20   0     0    0    0 S    0  0.0   0:00.03
ksoftirqd/1
    8 root      RT   0     0    0    0 S    0  0.0   0:00.00
watchdog/1


The tasks change in the 'top' view, so it is nothing like a single task
hanging - it is more like a machine working off swap. The problem,
however, is that according to vmstat there is no swap activity during
this time. Should I try to decrease the RAM I give to my guests even
more? Is it too much to have 3 guests with hugepages? Should I try
something else? Unfortunately it is a production system and I can't
play with it very much.

Here is 'top' on the host:

top - 03:32:12 up 25

Re: KVM with hugepages generate huge load with two guests

2010-10-04 Thread Dmitry Golubev
> Please don't top post.

Sorry

> Please use 'top' to find out which processes are busy, the aggregate
> statistics don't help to find out what the problem is.

The thing is that all more or less active processes become busy - httpd,
etc. - so I can't identify any single process that generates all the
load. I see at least 10 different processes in the list that look busy
in each guest... From what I see, there is nothing out of the ordinary
in the guest 'top', except that the whole guest becomes extremely slow.
But OK, I will try to reproduce the problem a few hours later and send
you the whole 'top' output if it is required.

Thanks,
Dmitry


Re: KVM with hugepages generate huge load with two guests

2010-10-04 Thread Avi Kivity

 On 10/03/2010 10:24 PM, Dmitry Golubev wrote:

So, I started anew. I decreased the memory allocated to each guest to
3500MB (from 3550MB as I told earlier), but have not decreased the
number of hugepages - it is still 3696.



Please don't top post.

Please use 'top' to find out which processes are busy, the aggregate 
statistics don't help to find out what the problem is.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: KVM with hugepages generate huge load with two guests

2010-10-03 Thread Dmitry Golubev
So, I started anew. I decreased the memory allocated to each guest to
3500MB (from 3550MB as I told earlier), but have not decreased the
number of hugepages - it is still 3696.

On one host I started one guest. It looked like this:

HugePages_Total:3696
HugePages_Free: 1933
HugePages_Rsvd:   19
HugePages_Surp:0
Hugepagesize:   2048 kB

top - 22:05:53 up 2 days,  3:44,  1 user,  load average: 0.29, 0.33, 0.29
Tasks: 131 total,   1 running, 130 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  4.6%sy,  0.0%ni, 90.8%id,  1.0%wa,  0.0%hi,  2.7%si,  0.0%st
Mem:   8193472k total,  8118248k used,    75224k free,    29036k buffers
Swap: 11716412k total,        0k used, 11716412k free,    75864k cached

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0  74668  29036  75864    0    0     1     8   54   51  1  7 91  1

Now I am starting the second virtual machine, and that's what happens:

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 0  0  0  74272  29216  7666400 0 0  447  961  0  0 100  0
 0  0  0  73172  29216  7746400   19216  899 1575  1  1 96  2
 0  0  0  72528  29224  7746400 014  475 1022  1  0 99  0
 0  0  0  72720  29232  7745600 049  519  999  0  0 97  3
 1  0 52  77988  28776  404920   10  119117  988 2285  8  9 72 11
 4  0 52  68868  28784  4049200  285438 7452 2817 17 16 67  1
 2  0 52  66052  28784  4098400  190618 24057 4620 25 20 48  7
 1  0 52  67044  28792  4098400  163035 3175 3966  9 12 72  7
 0  0 52  63684  28800  4098000  1433   228 6021 4479 10 11 65 14
 0  1 52  65516  28800  4098400  1288   109 4143 4179 10 10 58 21
 2  2 52  62216  28808  4098400  1698   241 4357 4183  9  8 58 25
 2  2 52  60292  28816  4098400  2874   258 11538 5324 15 14 39 33
 2  2 52  57352  28816  4098400  5303   278 8528 5176  9 11 39 42
 0  7 52  54000  28824  4098000  5263   249 10580 6214 16 10 32 42
 0  4396  55180  19740  401880   70 10304   315 7359 9633 19  8 44 28
 1  0320  61520  19748  4048000  5361   302 2509 5743 23  2 50 25
 1  5316  59940  19748  4072800  2343 8 2225 4690 13  3 75 10
 3  1316  55616  19748  4072800  4435   215 7660 6057 15  6 51 28
 0 16   2528  53596  17392  384680  529   832   834 6600 4675  8  5 11 76
 3  0   2404  56176  17392  3848010  6530   301 8371 5646 20  7 14 59
 2  5   7480  58012  14836  33720   13 1082  3666  3155 12290 7752 17 10 20 54
 2  1   7340  59628  14836  3388400  5550   690 9513 7258 13  9 38 41
 2  1   7288  59124  14844  3447200  1524   481 4597 4688  5  6 58 31
 0  3   7284  58848  14844  3447200  1365   364 2171 3813  3  2 58 38
 0  1   7056  59324  14844  3447270   841   372 2159 3940  3  2 48 47
 0 30   7056  54456  14844  3447200 2   248 1402 2705  2  1 85 13
 0  1   6892  55336  14828  3839610   888   268 1927 4124  2  2 41 55
 0  0   6892  57808  14060  36988001792  948 1682  1  1 93  5
 0  0   6888  58616  14060  3769600   14043  747 1566  1  1 94  5
 1  0   6884  59444  14060  3769600 714  942 1747  3  1 95  1
 1  0   6884  58820  14060  3769600 046  722 1480  1  1 97  2
 0  0   6884  58608  14060  3769600 041  858 1564  3  1 93  3
 3  8   6884  51752  14060  3779200   354   147 8243 2447 20  7 71  2
 2  0   6880  52840  14060  3779200   604   281 10430 5859 21 15 50 14
 0  0   6880  55176  14060  3779200   699   232 3271 3656 20  4 66 10
 0  0   6880  56120  14060  3779200 0   280 1064 2116  1  1 85 14
 0  0   6880  55628  14060  3779200 0 0  616 1367  1  0 98  0
 1  0   6880  56388  14060  3779200 018  689 1381  1  1 97  2

Contrary to what I expected - given that in the previous case I had only
6 unreserved pages, I thought I would have 56 now - I actually have 156
free unreserved pages:

HugePages_Total:3696
HugePages_Free: 1113
HugePages_Rsvd:  957
HugePages_Surp:0

Then at one moment both guests almost stopped working for a minute or
so - both went up to a huge load and became unresponsive. I didn't
manage to catch how they looked in 'top', but they did not use any swap
themselves (they each have at least 1GB of free memory) and their load
average went to something like 10. vmstat from the host looked like
this:

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 0  0   6740  61948  11140  3410400 0 3  663 1435  1  0 99  

Re: KVM with hugepages generate huge load with two guests

2010-10-03 Thread Avi Kivity

 On 09/30/2010 11:07 AM, Dmitry Golubev wrote:

Hi,

I am not sure what's really happening, but every few hours
(unpredictable) two virtual machines (Linux 2.6.32) start to generate
huge cpu loads. It looks like some kind of loop is unable to complete
or something...



What does 'top' inside the guest show when this is happening?

--
error compiling committee.c: too many arguments to function



Re: KVM with hugepages generate huge load with two guests

2010-10-02 Thread Michael Tokarev
02.10.2010 03:50, Dmitry Golubev wrote:
> Hi,
> 
> Thanks for reply. Well, although there is plenty of RAM left (about
> 100MB), some swap space was used during the operation:
> 
> Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
> Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached

If you want to see swapping, run vmstat with, say, a 5-second interval:
 $ vmstat 5

The amount of swap used is interesting, but the number of swap-ins and
swap-outs per second is much more so.

JFYI.

/mjt


Re: KVM with hugepages generate huge load with two guests

2010-10-01 Thread Dmitry Golubev
OK, I have repeated the problem. The two machines were working fine for
a few hours with some services not running (these would take up about
another gigabyte in total); I started these services again and some 40
minutes later the problem reappeared (it may be a coincidence, but I
don't think so). In the 'top' output it looks like this:

top - 03:38:10 up 2 days, 20:08,  1 user,  load average: 9.60, 6.92, 5.36
Tasks: 143 total,   3 running, 140 sleeping,   0 stopped,   0 zombie
Cpu(s): 85.7%us,  4.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 10.0%si,  0.0%st
Mem:   8193472k total,  8056700k used,   136772k free,     4912k buffers
Swap: 11716412k total,    64884k used, 11651528k free,    55640k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
21306 libvirt-  20   0 3781m  10m 2408 S  190  0.1  31:36.09 kvm
 4984 libvirt-  20   0 3771m  19m 1440 S  180  0.2 390:30.04 kvm

Compared to the previous shot I sent before (that one was taken a few
hours ago), you will not see much difference, in my opinion.

Note that I have 8GB of RAM and in total both VMs take up 7GB. There is
nothing else running on the server except the VMs and the cluster
software (drbd, pacemaker, etc). Right now the drbd sync process is
taking some CPU resources - that is why the two kvm processes do not
show 200% (physically, it is a quad-core processor). Is almost 1GB
really not enough for KVM to support two 3.5GB guests? I see 136MB of
free memory right now - it is not even being used...
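
(A rough back-of-the-envelope with the numbers already in this thread,
just as a sketch: 3696 huge pages x 2MB = 7392MB is locked away in
hugetlbfs, out of 8193472k (roughly 8001MB) total, which leaves only
about 600MB for everything that lives outside the guests' RAM - the host
kernel, page cache, the non-guest-memory parts of each qemu-kvm process,
drbd and the cluster stack. That is consistent with the small
free/buffers/cached figures in the 'top' output above.)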

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 2:50 AM, Dmitry Golubev  wrote:
> Hi,
>
> Thanks for the reply. Well, although there is plenty of RAM left (about
> 100MB), some swap space was used during the operation:
>
> Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
> Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached
>
> I am not sure why, though. Are you saying that there are bursts of
> memory usage that push some pages to swap, and that they are not
> swapped back in even though they are used? I will try to replicate the
> problem now and send you a better printout from the moment the problem
> happens. I have not noticed anything unusual when I was watching the
> system - there was plenty of RAM free and a few megabytes in swap... Is
> there any kind of check I can try while the problem is occurring? Or
> should I free 50-100MB from hugepages so that the system is stable
> again?
>
> Thanks,
> Dmitry
>
> On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti  wrote:
>> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
>>> Hi,
>>>
>>> I am not sure what's really happening, but every few hours
>>> (unpredictable) two virtual machines (Linux 2.6.32) start to generate
>>> huge cpu loads. It looks like some kind of loop is unable to complete
>>> or something...
>>>
>>> So the idea is:
>>>
>>> 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
>>> running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
>>> Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3, and another small
>>> 32-bit Linux virtual machine (16MB of RAM) with a router inside (I
>>> doubt it contributes to the problem).
>>>
>>> 2. All these machines use hugetlbfs. The server has 8GB of RAM, I
>>> reserved 3696 huge pages (page size is 2MB) on the server, and I am
>>> running the main guests each having 3550MB of virtual memory. The
>>> third guest, as I wrote before, takes 16MB of virtual memory.
>>>
>>> 3. Once run, the guests reserve huge pages for themselves normally. As
>>> mem-prealloc is default, they grab all the memory they should have,
>>> leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all
>>> times - so as I understand they should not want to get any more,
>>> right?
>>>
>>> 4. All virtual machines run perfectly normal without any disturbances
>>> for few hours. They do not, however, use all their memory, so maybe
>>> the issue arises when they pass some kind of a threshold.
>>>
>>> 5. At some point of time both guests exhibit cpu load over the top
>>> (16-24). At the same time, host works perfectly well, showing load of
>>> 8 and that both kvm processes use CPU equally and fully. This point of
>>> time is unpredictable - it can be anything from one to twenty hours,
>>> but it will be less than a day. Sometimes the load disappears in a
>>> moment, but usually it stays like that, and everything works extremely
>>> slow (even a 'ps' command executes some 2-5 minutes).
>>>
>>> 6. If I am patient, I can start rebooting the guest systems - once
>>> they have restarted, everything returns to normal. If I destroy one of
>>> the guests (virsh destroy), the other one starts working normally at
>>> once (!).
>>>
>>> I am relatively new to kvm and I am absolutely lost here. I have not
>>> experienced such problems before, but recently I upgraded from ubuntu
>>> lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
>>> and started to use hugepages. These two virtual machines are not
>>> normally ru

Re: KVM with hugepages generate huge load with two guests

2010-10-01 Thread Dmitry Golubev
Hi,

Thanks for the reply. Well, although there is plenty of RAM left (about
100MB), some swap space was used during the operation:

Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached

I am not sure why, though. Are you saying that there are bursts of
memory usage that push some pages to swap, and that they are not swapped
back in even though they are used? I will try to replicate the problem
now and send you a better printout from the moment the problem happens.
I have not noticed anything unusual when I was watching the system -
there was plenty of RAM free and a few megabytes in swap... Is there any
kind of check I can try while the problem is occurring? Or should I free
50-100MB from hugepages so that the system is stable again?
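
(If it comes to that, shrinking the hugepage pool at runtime is just a
matter of writing a smaller value to the usual knob - e.g. something like
the following to give back about 100MB (50 pages of 2MB); only pages that
are currently free and unreserved can actually be returned:

  echo 3646 > /proc/sys/vm/nr_hugepages

or the equivalent sysctl vm.nr_hugepages setting.)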

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti  wrote:
> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
>> Hi,
>>
>> I am not sure what's really happening, but every few hours
>> (unpredictable) two virtual machines (Linux 2.6.32) start to generate
>> huge cpu loads. It looks like some kind of loop is unable to complete
>> or something...
>>
>> So the idea is:
>>
>> 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
>> running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
>> Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3, and another small
>> 32-bit Linux virtual machine (16MB of RAM) with a router inside (I
>> doubt it contributes to the problem).
>>
>> 2. All these machines use hugetlbfs. The server has 8GB of RAM, I
>> reserved 3696 huge pages (page size is 2MB) on the server, and I am
>> running the main guests each having 3550MB of virtual memory. The
>> third guest, as I wrote before, takes 16MB of virtual memory.
>>
>> 3. Once run, the guests reserve huge pages for themselves normally. As
>> mem-prealloc is default, they grab all the memory they should have,
>> leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all
>> times - so as I understand they should not want to get any more,
>> right?
>>
>> 4. All virtual machines run perfectly normal without any disturbances
>> for few hours. They do not, however, use all their memory, so maybe
>> the issue arises when they pass some kind of a threshold.
>>
>> 5. At some point of time both guests exhibit cpu load over the top
>> (16-24). At the same time, host works perfectly well, showing load of
>> 8 and that both kvm processes use CPU equally and fully. This point of
>> time is unpredictable - it can be anything from one to twenty hours,
>> but it will be less than a day. Sometimes the load disappears in a
>> moment, but usually it stays like that, and everything works extremely
>> slow (even a 'ps' command executes some 2-5 minutes).
>>
>> 6. If I am patient, I can start rebooting the guest systems - once
>> they have restarted, everything returns to normal. If I destroy one of
>> the guests (virsh destroy), the other one starts working normally at
>> once (!).
>>
>> I am relatively new to kvm and I am absolutely lost here. I have not
>> experienced such problems before, but recently I upgraded from ubuntu
>> lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
>> and started to use hugepages. These two virtual machines are not
>> normally run on the same host system (i have a corosync/pacemaker
>> cluster with drbd storage), but when one of the hosts is not
>> available, they start running on the same host. That is the reason I
>> have not noticed this earlier.
>>
>> Unfortunately, I don't have any spare hardware to experiment and this
>> is a production system, so my debugging options are rather limited.
>>
>> Do you have any ideas, what could be wrong?
>
> Is there swapping activity on the host when this happens?
>
>


Re: KVM with hugepages generate huge load with two guests

2010-10-01 Thread Marcelo Tosatti
On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
> Hi,
> 
> I am not sure what's really happening, but every few hours
> (unpredictable) two virtual machines (Linux 2.6.32) start to generate
> huge cpu loads. It looks like some kind of loop is unable to complete
> or something...
> 
> So the idea is:
> 
> 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
> running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
> Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3, and another small
> 32-bit Linux virtual machine (16MB of RAM) with a router inside (I
> doubt it contributes to the problem).
>
> 2. All these machines use hugetlbfs. The server has 8GB of RAM, I
> reserved 3696 huge pages (page size is 2MB) on the server, and I am
> running the main guests each having 3550MB of virtual memory. The
> third guest, as I wrote before, takes 16MB of virtual memory.
> 
> 3. Once run, the guests reserve huge pages for themselves normally. As
> mem-prealloc is default, they grab all the memory they should have,
> leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all
> times - so as I understand they should not want to get any more,
> right?
> 
> 4. All virtual machines run perfectly normal without any disturbances
> for few hours. They do not, however, use all their memory, so maybe
> the issue arises when they pass some kind of a threshold.
> 
> 5. At some point of time both guests exhibit cpu load over the top
> (16-24). At the same time, host works perfectly well, showing load of
> 8 and that both kvm processes use CPU equally and fully. This point of
> time is unpredictable - it can be anything from one to twenty hours,
> but it will be less than a day. Sometimes the load disappears in a
> moment, but usually it stays like that, and everything works extremely
> slow (even a 'ps' command executes some 2-5 minutes).
> 
> 6. If I am patient, I can start rebooting the guest systems - once
> they have restarted, everything returns to normal. If I destroy one of
> the guests (virsh destroy), the other one starts working normally at
> once (!).
> 
> I am relatively new to kvm and I am absolutely lost here. I have not
> experienced such problems before, but recently I upgraded from ubuntu
> lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
> and started to use hugepages. These two virtual machines are not
> normally run on the same host system (i have a corosync/pacemaker
> cluster with drbd storage), but when one of the hosts is not
> available, they start running on the same host. That is the reason I
> have not noticed this earlier.
> 
> Unfortunately, I don't have any spare hardware to experiment and this
> is a production system, so my debugging options are rather limited.
> 
> Do you have any ideas, what could be wrong?

Is there swapping activity on the host when this happens? 



KVM with hugepages generate huge load with two guests

2010-09-30 Thread Dmitry Golubev
Hi,

I am not sure what's really happening, but every few hours
(unpredictable) two virtual machines (Linux 2.6.32) start to generate
huge cpu loads. It looks like some kind of loop is unable to complete
or something...

So the idea is:

1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3, and another small
32-bit Linux virtual machine (16MB of RAM) with a router inside (I
doubt it contributes to the problem).

2. All these machines use hugetlbfs. The server has 8GB of RAM, I
reserved 3696 huge pages (page size is 2MB) on the server (roughly the
setup sketched after this list), and I am running the main guests each
having 3550MB of virtual memory. The third guest, as I wrote before,
takes 16MB of virtual memory.

3. Once run, the guests reserve huge pages for themselves normally. As
mem-prealloc is default, they grab all the memory they should have,
leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all
times - so as I understand they should not want to get any more,
right?

4. All virtual machines run perfectly normal without any disturbances
for few hours. They do not, however, use all their memory, so maybe
the issue arises when they pass some kind of a threshold.

5. At some point of time both guests exhibit cpu load over the top
(16-24). At the same time, host works perfectly well, showing load of
8 and that both kvm processes use CPU equally and fully. This point of
time is unpredictable - it can be anything from one to twenty hours,
but it will be less than a day. Sometimes the load disappears in a
moment, but usually it stays like that, and everything works extremely
slow (even a 'ps' command executes some 2-5 minutes).

6. If I am patient, I can start rebooting the guest systems - once
they have restarted, everything returns to normal. If I destroy one of
the guests (virsh destroy), the other one starts working normally at
once (!).
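
(The hugepage setup referred to in point 2 is nothing special - roughly
the following on the host, assuming the guests are defined through
libvirt; the exact mount point is from memory, so treat this as a sketch
rather than my literal configuration:

  echo 3696 > /proc/sys/vm/nr_hugepages
  mount -t hugetlbfs hugetlbfs /dev/hugepages

Each big guest then gets a <memoryBacking><hugepages/></memoryBacking>
element in its libvirt XML, which as far as I understand makes qemu-kvm
use -mem-path against the hugetlbfs mount.)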

I am relatively new to kvm and I am absolutely lost here. I have not
experienced such problems before, but recently I upgraded from ubuntu
lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
and started to use hugepages. These two virtual machines are not
normally run on the same host system (i have a corosync/pacemaker
cluster with drbd storage), but when one of the hosts is not
available, they start running on the same host. That is the reason I
have not noticed this earlier.

Unfortunately, I don't have any spare hardware to experiment and this
is a production system, so my debugging options are rather limited.

Do you have any ideas, what could be wrong?

Thanks,
Dmitry