Re: Instability in VM blocked more than 120s

2012-04-20 Thread Thomas Bendler
Hi Niels,

2012/4/20 Niels_Walet niels.wa...@manchester.ac.uk

 [...]
 Does anyone have any suggestion what I could try to further diagnose this
 problem, or maybe even a solution?


The error says that the filesystem is not responding. In this case it is
not the filesystem in the virtual machine but the host system that
provides the virtual disk. So without further knowledge of the storage
infrastructure I can't give you any further advice. The last time I saw
this kind of error, it was related to a non-responding NFS share.
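A minimal sketch of one way to narrow this down from inside the guest, assuming Python 3 is available there: list the tasks stuck in uninterruptible sleep (state "D" in /proc/<pid>/stat), which is what the 120-second warning reports, together with the kernel function each one is waiting in (/proc/<pid>/wchan). Wait channels with nfs_ or rpc_ in their names would suggest a network share; ones in the block or journal layer would suggest the virtual disk. The function names are hypothetical, and the script has to be run while the guest still responds (e.g. from an already-open root shell).

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    pid_part, rest = text.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]
    return int(pid_part), comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for every process in state 'D'.

    Only thread-group leaders are checked, not every thread
    under /proc/<pid>/task.
    """
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process exited while we were reading it
        if state != "D":
            continue
        # wchan names the kernel function the task is sleeping in, if exposed.
        try:
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print("%7d  %-20s %s" % (pid, comm, wchan))
```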

Kind regards
Thomas
-- 
Linux ... enjoy the ride!


Re: Instability in VM blocked more than 120s

2012-04-20 Thread Niels R. Walet

On 20/04/12 08:34, Thomas Bendler wrote:

[...]
The VMs are stored on the RAID array of the underlying machine, which is
not maintained by me; dmesg suggests an LSI SAS-based MegaRAID driver.

Niels

--
Prof. Niels R. Walet   Phone:  +44(0)1613063693
School of Physics and Astronomy   Fax:    +44(0)1613064303
The University of Manchester   Mobile: +44(0)7905438934
Manchester, M13 9PL,  UK   room 7.7, Schuster Building
email: niels.wa...@manchester.ac.uk
web:   http://www.theory.physics.manchester.ac.uk/~mccsnrw



Re: Instability in VM blocked more than 120s

2012-04-20 Thread Thomas Bendler
Hi Niels,

2012/4/20 Niels R. Walet niels.wa...@manchester.ac.uk

  [...]
 The VMs are stored on the RAID array of the underlying machine, which is
 not maintained by me; dmesg suggests an LSI SAS-based MegaRAID driver.


Then you need to get in touch with the people maintaining the underlying host.
They need to check whether your virtual disk (either an image file or something
physical) is fully accessible during the move from one server to the other.
To me it looks like the virtual disks are not being moved correctly between
the servers (this could be some kind of access problem, such as wrong ACLs,
network routing problems, and so on). But this is something the people
running the host servers should be able to investigate.

Kind regards
Thomas


Re: Instability in VM blocked more than 120s

2012-04-20 Thread Steven Timm

On Fri, 20 Apr 2012, Niels_Walet wrote:


When moving my virtual machines (libvirt/qemu-kvm) from one server to
another (from AMD to Intel hardware), I seem to have suddenly hit the
time-out issue that has been discussed in many places (the dreaded
"blocked for more than 120 seconds" message), after which both systems
become totally unresponsive. Since the time-out involves the filesystem, I can
only take screenshots, which I attach; nothing appears in the syslog.


This timeout issue is not specific to virtual machines; at Fermilab
we see it on bare-metal machines just as often as we do on
virtual machines.
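The "blocked for more than 120 seconds" warning comes from the kernel's generic hung-task check, which fires whenever a task has sat in uninterruptible sleep longer than kernel.hung_task_timeout_secs (120 seconds by default; writing 0 disables the message), so it shows up on bare metal and in guests alike. A small sketch, assuming Python 3, that reads the current threshold and pulls the task name, PID and timeout out of each warning in the kernel log. The regular expression is written against the usual message format and is an assumption, not a kernel-defined interface; the function names are hypothetical.

```python
import re
import subprocess

# Usual format of the hung-task warning, e.g.
#   INFO: task kjournald:1234 blocked for more than 120 seconds.
# Task names may contain ':' (e.g. kworker/0:1), so the greedy (.+)
# backtracks to the last ":<pid>" before " blocked".
HUNG_TASK_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")

TIMEOUT_FILE = "/proc/sys/kernel/hung_task_timeout_secs"


def read_timeout(path=TIMEOUT_FILE):
    """Return the current hung-task timeout in seconds (0 = check disabled)."""
    with open(path) as f:
        return int(f.read().strip())


def hung_tasks(log_text):
    """Return (task, pid, seconds) for every hung-task warning in log_text."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in HUNG_TASK_RE.finditer(log_text)]


if __name__ == "__main__":
    print("hung_task_timeout_secs =", read_timeout())
    log = subprocess.check_output(["dmesg"], universal_newlines=True)
    for task, pid, secs in hung_tasks(log):
        print("%s (pid %d) blocked for more than %d s" % (task, pid, secs))
```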



I have updated the virtual machines from SL 5.5 to 5.7, which changed the
behaviour somewhat but still gives similar crashes; I added a few boot
parameters (having to do with idle=), with no change at all.

Does anyone have any suggestion what I could try to further diagnose this
problem, or maybe even a solution?

Niels Walet


More than half the time when I've seen those timeouts, they have
been blocking on some sort of network task or other. Do you have
any kind of network file system mounted, such as NFS, AFS, GFS, etc.?
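A quick way to answer that question on the guest, as a sketch assuming Python 3: read /proc/mounts and print any network or cluster file system mounts with their options. The set of type names below is an illustrative assumption, not a complete list, and the function name is hypothetical. For NFS the options column matters: with the default "hard" mount option, a process touching a share whose server has stopped responding waits in uninterruptible sleep until the server comes back, which is exactly the situation the 120-second warning reports.

```python
# Illustrative, not exhaustive: the file system types mentioned above plus CIFS.
NETWORK_FS_TYPES = {"nfs", "nfs4", "afs", "gfs", "gfs2", "cifs"}


def network_mounts(mounts_text):
    """Return (device, mount point, fs type, options) for network/cluster mounts.

    Each /proc/mounts line is: device mountpoint fstype options dump pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in NETWORK_FS_TYPES:
            found.append(tuple(fields[:4]))
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for dev, mnt, fstype, opts in network_mounts(f.read()):
            print("%-30s %-6s %s  %s" % (mnt, fstype, dev, opts))
```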

Steve Timm


--
Steven C. Timm, Ph.D  (630) 840-8525
t...@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.