Note that after switching to the Ubuntu Cloud Archive, which ships a
newer libvirt, this issue went away in the gate.
--
** Changed in: nova
Status: Confirmed => In Progress
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1643911
Title:
libvirt randomly crashes on xenial nodes with "*** Error in
@ChristianEhrhardt -
After a few days of stress, the issue could not be reproduced by the
script I provided.
So I fell back to a simpler script, the one I used at the beginning to
reproduce the issue in my environment: it just starts/stops the
instances and sleeps (without any VM status check).
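For illustration, a loop of that shape might look like the sketch below. Instance names, counts and timings are assumptions, not the actual script; setting OS_CMD="echo openstack" turns it into a dry run:

```shell
#!/bin/bash
# Sketch of the simplified repro: start/stop a handful of instances in
# parallel, sleep, repeat - no VM status checks. Instance names, count
# and pause length are assumptions; override OS_CMD to dry-run.
OS_CMD="${OS_CMD:-openstack}"
INSTANCES="${INSTANCES:-cirros-1 cirros-2 cirros-3 cirros-4 cirros-5}"
PAUSE="${PAUSE:-60}"

repro_cycle() {
    # Fire all starts at once so nova/libvirt handle them concurrently.
    for vm in $INSTANCES; do $OS_CMD server start "$vm" & done
    wait
    sleep "$PAUSE"
    # Stop them all in parallel as well.
    for vm in $INSTANCES; do $OS_CMD server stop "$vm" & done
    wait
    sleep "$PAUSE"
}
```

For a real stress run one would wrap `repro_cycle` in `while true; do repro_cycle; done`.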
After
At least on my end it ran fine for ~7 days now with 20 guests.
Looking forward to hearing what you might find to be the hidden trigger.
--
@ChristianEhrhardt -
In the newly deployed environment (with libvirt 1.3.1-1ubuntu10.8),
libvirt has been working fine for more than 24 hours.
The stress test will keep running for two more days to check whether the
issue is reproducible.
However, so far I think I may be missing some key point, so the new
@ChristianEhrhardt, I stopped the stress script after stressing PPA-2619 for 7
days without issue; libvirt still works fine, but there is no stress load now,
so I will start the stress script against PPA-2619 again.
On the other hand, since I can reproduce this issue in my environment, I
will
After about a day my keystone and percona died for exceeding the limited size
of the system I had, but no libvirt/qemu crash/failure so far.
I need to look into it again in more detail to see if I can get it to hold up
longer, until the libvirt issue occurs.
@Davidchen - is the -19 PPA for you still (would be
@ChristianEhrhardt -
In my test environment, I have encountered this issue with different
flavors and images,
so I think using a different image and flavor to reproduce is OK :)
--
After slightly more than three days without a crash, of which almost
exactly a day of pure CPU cycles was spent in libvirt, I am starting to
think that this won't trigger the bug as I had hoped.
I deployed a new openstack and now have a loop running based on openstack
start/stop (using 10x m1.small as I have
Thanks Davidchen,
FYI, it has now been running for ~48 hours, currently on round 3511, and continuing.
So far no crash yet - same libvirt PID still running and no logs/crashdumps/...
I contacted a few people to consider an openstack setup on my test node just in
case starting/stopping it via that might - other
@ChristianEhrhardt - I chose m1.nano because nano is just the smallest and
easiest flavor to use. My test system has about 20 GB of memory.
--
OK, thank you - my test is still running fine, but you reported 2-12
hours, so I'll give it at least a few days.
It also came to my mind: 5 x m1.nano, which defaults to 5 x 64 MB = 320 MB,
in case it might be related to fragmentation or overall memory
shortage. Is your system low on memory, or was a
Hi ChristianEhrhardt,
You're welcome :)
My test environment does not have any other workload or guest VMs, only the 5
cirros instances,
so I guess starting/stopping instances in parallel is what triggers the
libvirt crash.
--
@Davidchen - just to clarify details: is there any other work going on
on your system - other guests or virtualization activity that might
influence this?
I almost assumed that we need the bigger version that was in PPA-2619
(which is why I created it right away) - big thanks for verifying both
PPAs for
Hi Davidchen - thank you for providing the info on your repro:
m1.nano is really small; I'd hope that with a slightly bigger size of 512M and
10 instead of 5 systems I might crash it earlier.
If it doesn't trigger, I will modify the setup to follow the smaller sizes you
had. I wanted to get openstack out of the
Note: the test script is based on Davidchen's suggestion, but without
openstack and converted into an infinite run; it is now running on Horsea.
** Attachment added: "crash-loop.sh"
https://bugs.launchpad.net/nova/+bug/1643911/+attachment/4870727/+files/crash-loop.sh
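The attachment itself is the script running on Horsea; purely for illustration, a loop of the same shape that drives libvirt directly (guest names, counts and pauses are assumptions, not the attached crash-loop.sh):

```shell
#!/bin/bash
# Illustrative only - not the attached crash-loop.sh. Drives libvirt
# directly with virsh, no openstack in between. Guest names and the
# pause are assumptions; set VIRSH="echo virsh" for a dry run.
VIRSH="${VIRSH:-virsh}"
GUESTS="${GUESTS:-guest-1 guest-2 guest-3 guest-4 guest-5}"
PAUSE="${PAUSE:-30}"

crash_cycle() {
    # Start all guests in parallel so libvirtd serves them concurrently.
    for g in $GUESTS; do $VIRSH start "$g" & done
    wait
    sleep "$PAUSE"
    # virsh destroy is a hard power-off, matching the abrupt stops.
    for g in $GUESTS; do $VIRSH destroy "$g" & done
    wait
}

crash_loop() {
    # Bounded for testing; the real run replaces this with an endless loop.
    local n="${1:-1}" i
    for ((i = 0; i < n; i++)); do crash_cycle; done
}
```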
--
@ChristianEhrhardt:
I can successfully reproduce this issue in my environment; it can be
reproduced by stopping and starting multiple instances in parallel (in my
case, 5 cirros instances).
I have tried both PPAs: PPA-2619 fixes this issue - after 7 days of stress
libvirt still works fine - but PPA-2620 still hits it.
Hi (cross post I know),
there is a bit of a "somewhat duplicate" context across the following list of
bugs:
- bug 1646779
- bug 1643911
- bug 1638982
- bug 1673483
Unfortunately, I've never hit these bugs in any of my libvirt/qemu runs,
and those runs are literally thousands every day due to all the
@Matt Booth: This is not the same as bug 1673483, which DanB debugged the
other day and identified fixes for, as the Nova stacktraces are different
for the two.
For bug 1673483, the Nova crash directly relates to the libvirt commits
mentioned in its comment #5 (of bug 1673483).
In this memory corruption
If it is the other one, bug 1673483, there are test builds of a wide and a
more narrow backport of the fixes mentioned in comment #5
--
Is this the same bug we saw the other week? I thought Dan B had found a
couple of patches missing from the libvirt shipped in Ubuntu which are
likely candidates for fixing this. Kashyap, do you have those to hand?
I don't think we're likely to get much traction on this from upstream
libvirt
(I just sent this to the list, but putting it here too.)
While I agree that a coredump is not that likely to help, I would also
like to come to that conclusion after inspecting a coredump :) I've
found things in the heap before that give clues as to what the real
problems are.
To this end, I've proposed
Valgrind would be great, but it is the 100-pound-gorilla approach. I'll
play with maybe some lighter-weight things like Electric Fence, which
could give us some insight. Something like that is going to segfault, so
getting cores seems a top priority. I'm probably more optimistic about the
general usefulness
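For reference, a sketch of making sure libvirtd can actually leave a core when glibc aborts on the heap corruption. The unit name, crash directory and core pattern below are assumptions for a Xenial-era system, not verified settings:

```shell
# Allow core dumps for the libvirtd service (unit name "libvirt-bin" is
# an assumption for Ubuntu 16.04; adjust to the actual unit):
mkdir -p /etc/systemd/system/libvirt-bin.service.d
cat > /etc/systemd/system/libvirt-bin.service.d/coredump.conf <<'EOF'
[Service]
LimitCORE=infinity
EOF
systemctl daemon-reload
systemctl restart libvirt-bin

# Give glibc's abort() on "malloc(): memory corruption" a place to write
# the core (path and pattern are assumptions):
mkdir -p /var/crash
sysctl -w kernel.core_pattern='/var/crash/core.%e.%p'
```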
We've also somewhat recently gotten the OOM-killer problems to go away,
and yet these problems remain. I doubt it is related to the OOM killer.
--
I hit the same call trace on
http://logs.openstack.org/59/426459/4/check/gate-tempest-dsvm-neutron-
full-ubuntu-xenial/0222b58/logs
The backtrace of libvirtd is
*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:
0x5560f082e8d0 ***
======= Backtrace: =========
FYI, Armando suspects that this failure is a result of general high
memory consumption in the gate, something that affects all projects:
http://lists.openstack.org/pipermail/openstack-
dev/2017-February/111413.html
--
Now we seem to be stuck in limbo here, unable to diagnose this down to
the root cause. So I asked the upstream libvirt maintainers on IRC, and
Dan Berrange responded [text formatted a little bit for readability
here]:
"Running libvirt under Valgrind will likely point to a root cause.
However,
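A minimal way to try that suggestion might look like the following. The foreground invocation and log path are assumptions; the memcheck flags are standard Valgrind options:

```shell
# Stop the packaged daemon first (unit name is an assumption for Xenial):
systemctl stop libvirt-bin
# Run libvirtd in the foreground under Valgrind; memcheck should flag the
# invalid write that precedes the "malloc(): memory corruption" abort.
# Expect a substantial (roughly 10-20x) slowdown while it runs.
valgrind --tool=memcheck --leak-check=full --track-origins=yes \
    /usr/sbin/libvirtd 2> /tmp/libvirtd-valgrind.log
```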
There seem to be 6+ people hitting it -> marking Confirmed
** Changed in: libvirt (Ubuntu)
Status: New => Confirmed
--
** Also affects: libvirt (Ubuntu)
Importance: Undecided
Status: New
--