Re: [libvirt-users] domains paused without any obvious reason

2019-05-14 Thread Lentes, Bernd


- On 14 May 2019 at 11:08, Daniel P. Berrangé berra...@redhat.com wrote:

> 
> 'virsh domstate --reason $GUEST'
> 
> will tell you what event caused the guest to pause in the first place.
> 
> If you can resume successfully, this indicates the event was a transient
> problem. Given the domblkerror message 'no space', it looks like you
> had a problem running out of disk space temporarily, which then
> resolved itself.
> 
> Regards,
> Daniel


Hi,

I have an idea of what happened.
The script shuts down the domains, snapshots them, restarts them and then
copies the backing files to a CIFS server. After the copy is done (which
takes several hours), the domains are blockcommitted. Finally, the script
deletes the local snapshot files. I think the snapshot files grew too big,
because the logical volume that holds them has only 20 GB and I'm currently
snapshotting 8 domains, so the limit of the LV was reached. And because the
script deleted the snapshot files at the end, I didn't notice.
I will now monitor the LV for the snapshot files in my script to see how big
they grow.
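
A minimal sketch of such a check, assuming the snapshot LV carries a
filesystem that is mounted somewhere on the host. The function names and the
80 percent threshold are made up for illustration and are not part of the
backup script discussed here; only shutil.disk_usage() is a real Python
standard-library call.

```python
import shutil


def usage_percent(path):
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a size of zero; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def check_snapshot_space(path, warn_at=80.0):
    """Return (percent_used, warn) for the filesystem holding the snapshots.

    `warn_at` is an assumed threshold in percent; pick whatever leaves
    enough headroom for the writes expected while the copy is running.
    """
    percent = usage_percent(path)
    return percent, percent >= warn_at
```

Called periodically while the copy to the CIFS server is running, for
example as check_snapshot_space("/path/to/snapshots") with the real mount
point substituted for that placeholder, this would show how quickly the
snapshot files grow and whether they approach the limit of the LV before the
blockcommit.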

Bernd
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Deputy Chairman of the Supervisory Board: MinDirig. Dr. Manfred Wolter
Management: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler,
Kerstin Guenther
Register court: Amtsgericht Muenchen HRB 6466
VAT ID: DE 129521671


___
libvirt-users mailing list
libvirt-users@redhat.com
https://www.redhat.com/mailman/listinfo/libvirt-users

Re: [libvirt-users] domains paused without any obvious reason

2019-05-14 Thread Daniel P . Berrangé
On Mon, May 13, 2019 at 06:19:05PM +0200, Lentes, Bernd wrote:
> 
> 
> - On May 13, 2019, at 3:34 PM, Bernd Lentes 
> bernd.len...@helmholtz-muenchen.de wrote:
> 
> > Hi,
> > 
> > I have a two-node HA cluster with several domains as resources.
> > Currently it's running in test mode.
> > Some domains (all on the same host) stopped running; virsh list shows
> > them as "paused".
> > All stopped at the same time (11 May, 7:00 am), and my monitoring system
> > began to yell.
> > I don't have any clue why this happened.
> > virsh domblkerror says "no space" for all of these domains (5). In the
> > days before, the domains were running fine, and I know that all disks
> > inside the domains should have enough space.
> > The host is not running out of space either.
> > The logs don't say anything useful. Unfortunately I didn't have a log
> > for the libvirtd daemon; I have only just configured that.
> > The domains are stopped each day by cron at 10:30 pm for a short moment,
> > a snapshot is taken, the domains are started again, the backing file is
> > copied to a CIFS server, and once that has finished the snapshot is
> > blockcommitted into the backing file.
> > This has already been working fine for several days. The cron job writes
> > a log, and it looks fine.
> > The domains reside in bare logical volumes; the respective volume group
> > has enough space.
> > 
> > 
> 
> I resumed one of the guests and it continued without any problem.
> The log doesn't indicate any problem, and df -h shows enough space on
> all partitions.

'virsh domstate --reason $GUEST'

will tell you what event caused the guest to pause in the first place.

If you can resume successfully, this indicates the event was a transient
problem. Given the domblkerror message 'no space', it looks like you
had a problem running out of disk space temporarily, which then
resolved itself.
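
With several guests paused at once, the same two checks can be collected in
one pass. The following is only a sketch: the helper names are hypothetical,
it assumes virsh is on the PATH and may talk to the local libvirtd, and it
returns whatever text virsh prints without interpreting it. The subcommands
used are 'virsh list --state-paused --name', 'virsh domstate --reason' and
'virsh domblkerror'.

```python
import subprocess


def virsh(*args):
    """Run a virsh subcommand and return its standard output as text."""
    result = subprocess.run(
        ["virsh", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_names(list_output):
    """Split the output of `virsh list --name` into domain names."""
    return [line.strip() for line in list_output.splitlines() if line.strip()]


def report_paused(run=virsh):
    """Return {domain: (state_with_reason, block_errors)} for paused guests.

    `run` is injectable so the logic can be exercised without libvirt.
    """
    report = {}
    for name in parse_names(run("list", "--state-paused", "--name")):
        state = run("domstate", "--reason", name).strip()
        errors = run("domblkerror", name).strip()
        report[name] = (state, errors)
    return report
```

Printing the result of report_paused() right after the monitoring alert,
before resuming anything, keeps a record of the pause reason and the block
errors for each guest.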

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
