I have the same issue when I run backup tasks during the night.

I have a Gluster setup with a 1TB SSD on each of the three nodes. Maybe it's
related to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1430847

sanlock.log:
2017-11-23 00:46:42 3410597 [1114]: s15 check_our_lease warning 60 last_success 3410537
2017-11-23 00:46:43 3410598 [1114]: s15 check_our_lease warning 61 last_success 3410537
2017-11-23 00:46:44 3410599 [1114]: s15 check_our_lease warning 62 last_success 3410537
2017-11-23 00:46:45 3410600 [1114]: s15 check_our_lease warning 63 last_success 3410537
2017-11-23 00:46:46 3410601 [1114]: s15 check_our_lease warning 64 last_success 3410537
2017-11-23 00:46:47 3410602 [1114]: s15 check_our_lease warning 65 last_success 3410537
2017-11-23 00:46:48 3410603 [1114]: s15 check_our_lease warning 66 last_success 3410537
2017-11-23 00:46:49 3410603 [28384]: s15 delta_renew long write time 46 sec
2017-11-23 00:46:49 3410603 [28384]: s15 renewed 3410557 delta_length 46 too long
2017-11-23 02:48:04 3417878 [28384]: s15 delta_renew long write time 10 sec
2017-11-23 02:57:23 3418438 [28384]: s15 delta_renew long write time 34 sec
2017-11-23 02:57:23 3418438 [28384]: s15 renewed 3418404 delta_length 34 too long
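
In case it helps anyone compare against their own nodes, here is a rough
Python sketch for pulling the slow lease renewals out of sanlock.log. The
regular expression is only my guess based on the "delta_renew long write
time" lines above, not an official sanlock log format, and the function name
and threshold are made up for illustration.

```python
import re

# Pattern assumed from the sample lines above (not an official format):
# "<date> <time> <uptime> [<pid>]: <lockspace> delta_renew long write time <N> sec"
LONG_WRITE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ \[\d+\]: (\S+) "
    r"delta_renew long write time (\d+) sec"
)


def long_renewals(lines, threshold=10):
    """Return (timestamp, lockspace, seconds) for renewals >= threshold seconds."""
    hits = []
    for line in lines:
        match = LONG_WRITE.match(line.strip())
        if match and int(match.group(3)) >= threshold:
            hits.append((match.group(1), match.group(2), int(match.group(3))))
    return hits


# Sample input copied from the log excerpt above.
sample = [
    "2017-11-23 00:46:48 3410603 [1114]: s15 check_our_lease warning 66 last_success 3410537",
    "2017-11-23 00:46:49 3410603 [28384]: s15 delta_renew long write time 46 sec",
    "2017-11-23 02:48:04 3417878 [28384]: s15 delta_renew long write time 10 sec",
    "2017-11-23 02:57:23 3418438 [28384]: s15 delta_renew long write time 34 sec",
]

for timestamp, lockspace, seconds in long_renewals(sample, threshold=30):
    print(timestamp, lockspace, seconds)
```

Against a real file, something like long_renewals(open(path)) with the path to
your own sanlock.log should work the same way, since the function only needs an
iterable of lines.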


grep "WARN" vdsm.log:
2017-11-23 00:20:05,544+0100 WARN  (jsonrpc/0) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=63.7199999997) (vm:5109)
2017-11-23 00:20:06,840+0100 WARN  (check/loop) [storage.check] Checker
u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
is blocked for 10.00 seconds (check:279)
2017-11-23 00:20:13,853+0100 WARN  (periodic/170)
[virt.periodic.VmDispatcher] could not run <class
'vdsm.virt.periodic.UpdateVolumes'> on
[u'e1f26ea9-9294-4d9c-8f70-d59f96dec5f7'] (periodic:308)
2017-11-23 00:20:15,031+0100 WARN  (jsonrpc/2) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=73.21) (vm:5109)
2017-11-23 00:20:20,586+0100 WARN  (jsonrpc/4) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=78.7599999998) (vm:5109)
2017-11-23 00:21:06,849+0100 WARN  (check/loop) [storage.check] Checker
u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
is blocked for 10.01 seconds (check:279)
2017-11-23 00:21:13,847+0100 WARN  (periodic/167)
[virt.periodic.VmDispatcher] could not run <class
'vdsm.virt.periodic.UpdateVolumes'> on
[u'd8f22423-9fe3-4c06-97dc-5c9e9f5b33c8'] (periodic:308)
2017-11-23 00:22:13,854+0100 WARN  (periodic/172)
[virt.periodic.VmDispatcher] could not run <class
'vdsm.virt.periodic.UpdateVolumes'> on
[u'd8f22423-9fe3-4c06-97dc-5c9e9f5b33c8'] (periodic:308)
2017-11-23 00:22:16,846+0100 WARN  (check/loop) [storage.check] Checker
u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
is blocked for 9.99 seconds (check:279)
2017-11-23 00:23:06,040+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=64.2199999997) (vm:5109)
2017-11-23 00:23:06,850+0100 WARN  (check/loop) [storage.check] Checker
u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
is blocked for 9.98 seconds (check:279)
2017-11-23 00:23:13,845+0100 WARN  (periodic/169)
[virt.periodic.VmDispatcher] could not run <class
'vdsm.virt.periodic.UpdateVolumes'> on
[u'5ef506de-44b9-4ced-9b7f-b90ee098f4f7'] (periodic:308)
2017-11-23 00:23:16,855+0100 WARN  (jsonrpc/7) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=75.0300000003) (vm:5109)
2017-11-23 00:23:21,082+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=79.2599999998) (vm:5109)
2017-11-23 00:25:31,488+0100 WARN  (libvirt/events) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') unknown eventid 8 args
('/rhev/data-center/00000001-0001-0001-0001-000000000370/f0e21aae-1237-4dd3-88ec-81254d29c372/images/1a1b9620-52fc-4008-9047-15cd725f8bd8/90b913ba-e03f-46c5-bccf-bae011fcdd55',
4, 3, 8) (clientIF:549)
2017-11-23 00:25:32,372+0100 WARN  (libvirt/events) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') unknown eventid 8 args
('/rhev/data-center/00000001-0001-0001-0001-000000000370/f0e21aae-1237-4dd3-88ec-81254d29c372/images/1a1b9620-52fc-4008-9047-15cd725f8bd8/90b913ba-e03f-46c5-bccf-bae011fcdd55',
4, 0, 8) (clientIF:549)
2017-11-23 00:45:56,851+0100 WARN  (check/loop) [storage.check] Checker
u'/rhev/data-center/mnt/glusterSD/x-c01-n03:_fastIO/f0e21aae-1237-4dd3-88ec-81254d29c372/dom_md/metadata'
is blocked for 10.00 seconds (check:279)
2017-11-23 00:46:13,850+0100 WARN  (periodic/172)
[virt.periodic.VmDispatcher] could not run <class
'vdsm.virt.periodic.UpdateVolumes'> on
[u'e1f26ea9-9294-4d9c-8f70-d59f96dec5f7',
u'5ef506de-44b9-4ced-9b7f-b90ee098f4f7'] (periodic:308)
2017-11-23 00:46:36,013+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
(command timeout, age=64.0899999999) (vm:5109)
2017-11-23 00:46:38,805+0100 WARN  (jsonrpc/2) [virt.vm]
(vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
(command timeout, age=66.8799999999) (vm:5109)
2017-11-23 00:46:40,439+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='930ecaca-ef2f-490a-a4df-e4f0dad218aa') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:40,440+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='e1f26ea9-9294-4d9c-8f70-d59f96dec5f7') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:40,441+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:40,442+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:40,442+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='0cf9b0cb-7c53-4bab-b879-0bdf190b293c') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:40,443+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:40,444+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:40,445+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='d8f22423-9fe3-4c06-97dc-5c9e9f5b33c8') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:40,446+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='ea36f7bd-1790-4b42-b7e1-6d8e2ef0487b') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:40,446+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='82ed235e-37bb-4d67-8db9-61d39340f951') monitor became unresponsive
(command timeout, age=68.5199999996) (vm:5109)
2017-11-23 00:46:46,116+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='930ecaca-ef2f-490a-a4df-e4f0dad218aa') monitor became unresponsive
(command timeout, age=74.1899999995) (vm:5109)
2017-11-23 00:46:46,118+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='e1f26ea9-9294-4d9c-8f70-d59f96dec5f7') monitor became unresponsive
(command timeout, age=74.1899999995) (vm:5109)
2017-11-23 00:46:46,119+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=74.1999999993) (vm:5109)
2017-11-23 00:46:46,120+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
(command timeout, age=74.1999999993) (vm:5109)
2017-11-23 00:46:46,121+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='0cf9b0cb-7c53-4bab-b879-0bdf190b293c') monitor became unresponsive
(command timeout, age=74.1999999993) (vm:5109)
2017-11-23 00:46:46,123+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
(command timeout, age=74.1999999993) (vm:5109)
2017-11-23 00:46:46,124+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
(command timeout, age=74.1999999993) (vm:5109)
2017-11-23 00:46:46,125+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='d8f22423-9fe3-4c06-97dc-5c9e9f5b33c8') monitor became unresponsive
(command timeout, age=74.1999999993) (vm:5109)
2017-11-23 00:46:46,127+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='ea36f7bd-1790-4b42-b7e1-6d8e2ef0487b') monitor became unresponsive
(command timeout, age=74.1999999993) (vm:5109)
2017-11-23 00:46:46,128+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='82ed235e-37bb-4d67-8db9-61d39340f951') monitor became unresponsive
(command timeout, age=74.21) (vm:5109)
2017-11-23 00:46:46,509+0100 WARN  (jsonrpc/3) [virt.vm]
(vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
(command timeout, age=74.5899999999) (vm:5109)
2017-11-23 00:46:48,187+0100 WARN  (jsonrpc/7) [virt.vm]
(vmId='0bcf7520-3c60-42a1-8e6b-683af670e6cb') monitor became unresponsive
(command timeout, age=76.2599999998) (vm:5109)
2017-11-23 00:46:49,825+0100 WARN  (periodic/173)
[virt.sampling.StatsCache] dropped stale old sample: sampled 7705208.650000
stored 7705268.650000 (sampling:442)
2017-11-23 00:46:49,835+0100 WARN  (periodic/176)
[virt.sampling.StatsCache] dropped stale old sample: sampled 7705253.650000
stored 7705268.650000 (sampling:442)
2017-11-23 00:46:49,854+0100 WARN  (periodic/171)
[virt.sampling.StatsCache] dropped stale old sample: sampled 7705238.650000
stored 7705268.650000 (sampling:442)
2017-11-23 00:46:49,866+0100 WARN  (periodic/174)
[virt.sampling.StatsCache] dropped stale old sample: sampled 7705223.650000
stored 7705268.650000 (sampling:442)
2017-11-23 00:46:55,488+0100 WARN  (jsonrpc/0) [virt.vm]
(vmId='e1f26ea9-9294-4d9c-8f70-d59f96dec5f7') monitor became unresponsive
(command timeout, age=83.5699999994) (vm:5109)
2017-11-23 00:46:55,488+0100 WARN  (jsonrpc/0) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=83.5699999994) (vm:5109)
2017-11-23 00:46:55,489+0100 WARN  (jsonrpc/0) [virt.vm]
(vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
(command timeout, age=83.5699999994) (vm:5109)
2017-11-23 00:46:55,491+0100 WARN  (jsonrpc/0) [virt.vm]
(vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
(command timeout, age=83.5699999994) (vm:5109)
2017-11-23 00:47:01,742+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='e1f26ea9-9294-4d9c-8f70-d59f96dec5f7') monitor became unresponsive
(command timeout, age=89.8199999994) (vm:5109)
2017-11-23 00:47:01,743+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=89.8199999994) (vm:5109)
2017-11-23 00:47:01,744+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
(command timeout, age=89.8199999994) (vm:5109)
2017-11-23 00:47:01,746+0100 WARN  (jsonrpc/1) [virt.vm]
(vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
(command timeout, age=89.8199999994) (vm:5109)
2017-11-23 00:47:10,531+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=98.6099999994) (vm:5109)
2017-11-23 00:47:10,532+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
(command timeout, age=98.6099999994) (vm:5109)
2017-11-23 00:47:10,534+0100 WARN  (jsonrpc/6) [virt.vm]
(vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
(command timeout, age=98.6099999994) (vm:5109)
2017-11-23 00:47:16,950+0100 WARN  (jsonrpc/7) [virt.vm]
(vmId='0a83954f-56d1-42d0-88b9-825435055fd0') monitor became unresponsive
(command timeout, age=105.029999999) (vm:5109)
2017-11-23 00:47:16,951+0100 WARN  (jsonrpc/7) [virt.vm]
(vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
(command timeout, age=105.029999999) (vm:5109)
2017-11-23 00:47:16,953+0100 WARN  (jsonrpc/7) [virt.vm]
(vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
(command timeout, age=105.029999999) (vm:5109)
2017-11-23 00:47:25,578+0100 WARN  (jsonrpc/4) [virt.vm]
(vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
(command timeout, age=113.659999999) (vm:5109)
2017-11-23 00:47:25,581+0100 WARN  (jsonrpc/4) [virt.vm]
(vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
(command timeout, age=113.659999999) (vm:5109)
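
To see which VMs were hit hardest, a small Python sketch like the following
can summarize the "monitor became unresponsive" warnings by VM. The pattern is
only inferred from the lines pasted above (it is not a documented vdsm format),
and the function name is made up. It tolerates the line wrapping that the mail
client introduced, so it should work on the raw vdsm.log as well.

```python
import re
from collections import defaultdict

# Pattern inferred from the pasted warnings; \s+ allows for wrapped lines.
UNRESPONSIVE = re.compile(
    r"vmId='([0-9a-f-]+)'\)\s+monitor became unresponsive\s+"
    r"\(command timeout, age=([0-9.]+)\)"
)


def worst_monitor_age(text):
    """Map each VM id to the largest 'age' value reported for it."""
    worst = defaultdict(float)
    for vm_id, age in UNRESPONSIVE.findall(text):
        worst[vm_id] = max(worst[vm_id], float(age))
    return dict(worst)


# Sample input copied from the end of the log excerpt above.
sample = """2017-11-23 00:47:16,953+0100 WARN  (jsonrpc/7) [virt.vm]
(vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
(command timeout, age=105.029999999) (vm:5109)
2017-11-23 00:47:25,578+0100 WARN  (jsonrpc/4) [virt.vm]
(vmId='245e104f-2bd5-4f77-81de-d75a593d77c5') monitor became unresponsive
(command timeout, age=113.659999999) (vm:5109)
2017-11-23 00:47:25,581+0100 WARN  (jsonrpc/4) [virt.vm]
(vmId='5ef506de-44b9-4ced-9b7f-b90ee098f4f7') monitor became unresponsive
(command timeout, age=113.659999999) (vm:5109)
"""

for vm_id, age in sorted(worst_monitor_age(sample).items()):
    print(vm_id, age)
```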

Kind regards,


Florian Nolden

Head of IT at Xilloc Medical B.V.


2017-11-23 11:25 GMT+01:00 Sven Achtelik <sven.achte...@eps.aero>:

> Hi All,
>
>
>
> I’m experiencing huge issues when working with big VMs on Gluster volumes.
> Taking a snapshot or removing a big disk causes the SPM node to become
> non-responsive. Fencing then kicks in and takes the node down with a hard
> reset/reboot.
>
>
>
> My setup has three nodes with 10Gbit/s NICs for the Gluster network. The
> bricks are on RAID-6 with a 1GB cache on the RAID controller and the
> volumes are set up as follows:
>
>
>
> Volume Name: data
> Type: Replicate
> Volume ID: c734d678-91e3-449c-8a24-d26b73bef965
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: ovirt-node01-gfs.storage.lan:/gluster/brick2/data
> Brick2: ovirt-node02-gfs.storage.lan:/gluster/brick2/data
> Brick3: ovirt-node03-gfs.storage.lan:/gluster/brick2/data
> Options Reconfigured:
> features.barrier: disable
> cluster.granular-entry-heal: enable
> performance.readdir-ahead: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: on
> cluster.eager-lock: enable
> network.remote-dio: off
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> features.shard-block-size: 512MB
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 6
> network.ping-timeout: 30
> user.cifs: off
> nfs.disable: on
> performance.strict-o-direct: on
> server.event-threads: 4
> client.event-threads: 4
>
>
>
> It feels like the system locks up while snapshotting or removing a big
> disk, and this delay triggers things to go wrong. Is there anything that is
> not set up right on my Gluster, or is this behavior normal with bigger disks
> (50GB+)? Is there a reliable option for caching with SSDs?
>
>
>
> Thank you,
>
> Sven
>
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>