Re: [ovirt-users] vm pauses with "vm has paused due to unknown storage error

2016-06-26 Thread Krutika Dhananjay
Hi Bill,

After glusterfs 3.7.11, around 4-5 bugs were found and fixed in the
sharding and replicate modules, some of which caused VMs to pause. Could
you share the glusterfs client logs from around the time the issue was
seen? That will help me confirm whether it is the same issue, or debug
further if it is a new one.
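
If it helps with pulling out the relevant part of the client log, here is a
minimal Python sketch (not an official tool) that keeps only the lines
within a few minutes of a given time. It assumes each log entry starts with
a bracketed timestamp such as [2016-06-23 21:54:03.123456]; the function
name lines_near and the 10-minute default window are made up for
illustration. The timestamps in the gluster log may be in a different
timezone from the engine's event times, so widen the window if nothing
matches.

```python
import re
from datetime import datetime, timedelta

# Assumed entry prefix: "[2016-06-23 21:54:03.123456] E [...] message"
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def lines_near(lines, when, minutes=10):
    """Yield log lines stamped within +/- `minutes` of the datetime `when`."""
    window = timedelta(minutes=minutes)
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # skip lines that do not start with a timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if abs(stamp - when) <= window:
            yield line.rstrip("\n")
```

To use it, open the mount log, pass the file object as `lines`, and pass
the time of the pause event (for example
datetime(2016, 6, 23, 21, 54, 3)) as `when`.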

-Krutika

On Fri, Jun 24, 2016 at 10:02 AM, Sahina Bose wrote:

> Can you post the gluster mount logs from the node where the paused VM was
> running (under
> /var/log/glusterfs/rhev-datarhev-data-center-mnt-glusterSD.log)?
>
> Which version of glusterfs are you running?
>
> On 06/24/2016 07:49 AM, Bill Bill wrote:
>
> Hello,
>
> I have 3 nodes running both oVirt and Gluster on 4 SSDs each. At the
> moment, there are two physical NICs: one has public internet access and the
> other is on a non-routable network used for ovirtmgmt and gluster.
>
> In the logical networks, I have selected gluster for the non-routable
> network running ovirtmgmt and gluster. However, two VMs randomly pause for
> what seems like no reason. They can both be resumed without issue.
>
> One test VM has 4GB of memory and a small disk, and it has no problems.
> Two others have 800GB disks and 32GB of RAM, and both of those VMs exhibit
> the same issue.
>
> I also see these in the oVirt dashboard:
>
> Failed to update OVF disks 9e60328d-29af-4533-84f9-633d87f548a7, OVF data
> isn't updated on those OVF stores (Data Center x, Storage Domain
> sr-volume01).
>
> Jun 23, 2016 9:54:03 PM
>
> VDSM command failed: Could not acquire resource. Probably resource factory
> threw an exception.: ()
>
> ///
>
> VM x has been paused due to unknown storage error.
>
> ///
>
> In the error log on the engine, I see these:
>
> ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-7) [10caf93e] Correlation ID: null, Call Stack:
> null, Custom Event ID: -1, Message: VM xx has been paused due to
> unknown storage error.
>
> INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-11) [10caf93e] Correlation ID: null, Call Stack:
> null, Custom Event ID: -1, Message: VM xx has recovered from paused
> back to up.
>
> ///
>
> Hostnames are all local to /etc/hosts on all servers – they also resolve
> without issue from each host.
>
> //
>
> 2016-06-23 22:08:59,611 WARN
> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
> (DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick
> 'ovirt3435:/mnt/data/sr-volume01' of volume
> '93e36cdc-ab1b-41ec-ac7f-966cf3856b59' with correct network as no gluster
> network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'
>
> 2016-06-23 22:08:59,614 WARN
> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
> (DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick
> 'ovirt3637:/mnt/data/sr-volume01' of volume
> '93e36cdc-ab1b-41ec-ac7f-966cf3856b59' with correct network as no gluster
> network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'
>
> 2016-06-23 22:08:59,616 WARN
> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
> (DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick
> 'ovirt3839:/mnt/data/sr-volume01' of volume
> '93e36cdc-ab1b-41ec-ac7f-966cf3856b59' with correct network as no gluster
> network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'
>
> 2016-06-23 22:08:59,618 WARN
> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
> (DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick
> 'ovirt3435:/mnt/data/distributed' of volume
> 'b887b05e-2ea6-496e-9552-155d658eeaa6' with correct network as no gluster
> network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'
>
> 2016-06-23 22:08:59,620 WARN
> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
> (DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick
> 'ovirt3637:/mnt/data/distributed' of volume
> 'b887b05e-2ea6-496e-9552-155d658eeaa6' with correct network as no gluster
> network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'
>
> 2016-06-23 22:08:59,622 WARN
> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
> (DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick
> 'ovirt3839:/mnt/data/distributed' of volume
> 'b887b05e-2ea6-496e-9552-155d658eeaa6' with correct network as no gluster
> network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'
>
> 2016-06-23 22:08:59,624 WARN
> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
> (DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick
> 'ovirt3435:/mnt/data/iso' of volume '89f32457-c8c3-490e-b491-16dd27de0073'
> with correct network as no 

Re: [ovirt-users] vm pauses with "vm has paused due to unknown storage error

2016-06-23 Thread Sahina Bose
Can you post the gluster mount logs from the node where the paused VM was
running (under
/var/log/glusterfs/rhev-datarhev-data-center-mnt-glusterSD.log)?

Which version of glusterfs are you running?

On 06/24/2016 07:49 AM, Bill Bill wrote:


Hello,

I have 3 nodes running both oVirt and Gluster on 4 SSDs each. At the
moment, there are two physical NICs: one has public internet access
and the other is on a non-routable network used for ovirtmgmt and gluster.

In the logical networks, I have selected gluster for the non-routable
network running ovirtmgmt and gluster. However, two VMs randomly pause
for what seems like no reason. They can both be resumed without issue.

One test VM has 4GB of memory and a small disk, and it has no problems.
Two others have 800GB disks and 32GB of RAM, and both of those VMs
exhibit the same issue.


I also see these in the oVirt dashboard:

Failed to update OVF disks 9e60328d-29af-4533-84f9-633d87f548a7, OVF 
data isn't updated on those OVF stores (Data Center x, Storage 
Domain sr-volume01).


Jun 23, 2016 9:54:03 PM

VDSM command failed: Could not acquire resource. Probably resource 
factory threw an exception.: ()


///

VM x has been paused due to unknown storage error.

///

In the error log on the engine, I see these:

ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-7) [10caf93e] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM xx has been paused due to 
unknown storage error.


INFO 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-11) [10caf93e] Correlation ID: null, Call 
Stack: null, Custom Event ID: -1, Message: VM xx has recovered 
from paused back to up.


///

Hostnames are all local to /etc/hosts on all servers – they also 
resolve without issue from each host.


//

2016-06-23 22:08:59,611 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] 
(DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick 
'ovirt3435:/mnt/data/sr-volume01' of volume 
'93e36cdc-ab1b-41ec-ac7f-966cf3856b59' with correct network as no 
gluster network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'


2016-06-23 22:08:59,614 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] 
(DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick 
'ovirt3637:/mnt/data/sr-volume01' of volume 
'93e36cdc-ab1b-41ec-ac7f-966cf3856b59' with correct network as no 
gluster network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'


2016-06-23 22:08:59,616 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] 
(DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick 
'ovirt3839:/mnt/data/sr-volume01' of volume 
'93e36cdc-ab1b-41ec-ac7f-966cf3856b59' with correct network as no 
gluster network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'


2016-06-23 22:08:59,618 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] 
(DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick 
'ovirt3435:/mnt/data/distributed' of volume 
'b887b05e-2ea6-496e-9552-155d658eeaa6' with correct network as no 
gluster network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'


2016-06-23 22:08:59,620 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] 
(DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick 
'ovirt3637:/mnt/data/distributed' of volume 
'b887b05e-2ea6-496e-9552-155d658eeaa6' with correct network as no 
gluster network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'


2016-06-23 22:08:59,622 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] 
(DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick 
'ovirt3839:/mnt/data/distributed' of volume 
'b887b05e-2ea6-496e-9552-155d658eeaa6' with correct network as no 
gluster network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'


2016-06-23 22:08:59,624 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] 
(DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick 
'ovirt3435:/mnt/data/iso' of volume 
'89f32457-c8c3-490e-b491-16dd27de0073' with correct network as no 
gluster network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'


2016-06-23 22:08:59,626 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] 
(DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not associate brick 
'ovirt3637:/mnt/data/iso' of volume 
'89f32457-c8c3-490e-b491-16dd27de0073' with correct network as no 
gluster network found in cluster '75bd64de-04b2-4a99-9cd0-b63e919b9aca'


2016-06-23 22:08:59,628 WARN 
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] 
(DefaultQuartzScheduler_Worker-76) [1c1cf4f] Could not