Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-21 Thread Krutika Dhananjay
Hi Martin, Glad it worked! And yes, 3.7.6 is really old! :) So the issue occurs when the VM flushes outstanding data to disk. This takes > 120s because there are a lot of buffered writes to flush, possibly followed by an fsync too, which needs to sync them to disk (volume profile would

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-20 Thread Martin
Hi Krutika, > Also, gluster version please? I am running the old 3.7.6. (Yes, I know I should upgrade ASAP.) I first applied "network.remote-dio off"; the behaviour did not change, and the VMs got stuck again after some time. Then I set "performance.strict-o-direct on" and the problem completely

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Krutika Dhananjay
OK. In that case, can you check if the following two changes help:
# gluster volume set $VOL network.remote-dio off
# gluster volume set $VOL performance.strict-o-direct on
preferably one option changed at a time, its impact tested and then the next change applied and tested. Also, gluster
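
The one-change-at-a-time approach suggested above could be scripted roughly as follows. This is only a sketch: the helper name `apply_option`, the `GLUSTER` override variable and the volume name `myvol` are assumptions made for illustration and do not come from the thread. Only `gluster volume set` and `gluster volume info` are actual CLI calls.

```shell
# GLUSTER may be overridden (for example GLUSTER=echo) to do a dry run.
# This override is a hypothetical convenience, not part of gluster itself.
GLUSTER="${GLUSTER:-gluster}"

# Apply a single volume option, then print the volume info so the
# reconfigured option can be confirmed before testing the VMs.
apply_option() {
  vol="$1"; key="$2"; value="$3"
  $GLUSTER volume set "$vol" "$key" "$value" || return 1
  $GLUSTER volume info "$vol"
}

# Usage: run one, observe the VMs for a while, then run the next.
#   apply_option myvol network.remote-dio off
#   apply_option myvol performance.strict-o-direct on
```

Keeping each option in its own call makes it easier to tell which of the two settings actually changed the behaviour.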

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Andrey Volodin
What is the context from dmesg? On Mon, May 13, 2019 at 7:33 AM Andrey Volodin wrote: > as per > https://helpful.knobs-dials.com/index.php/INFO:_task_blocked_for_more_than_120_seconds. > , > the informational warning could be suppressed with: > > "echo 0 >

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Andrey Volodin
As per https://helpful.knobs-dials.com/index.php/INFO:_task_blocked_for_more_than_120_seconds. , the informational warning can be suppressed with: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" Moreover, according to that page: "*This message is not an error*. It is an indication that a
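
Before changing the timeout mentioned above, it may be worth checking its current value inside the guest. The sketch below assumes a helper name (`show_hung_task_timeout`) and an override variable (`HUNG_TASK_FILE`), both invented here for illustration; the `/proc` path is the one quoted in the message. Note that writing 0 only disables the hung-task check, so the warning disappears while the underlying I/O stall remains.

```shell
# Path quoted in the thread; the override variable is a hypothetical
# convenience so the helper can be pointed at another file.
HUNG_TASK_FILE="${HUNG_TASK_FILE:-/proc/sys/kernel/hung_task_timeout_secs}"

# Print the current hung-task timeout, or "unavailable" if the kernel
# does not expose the setting.
show_hung_task_timeout() {
  if [ -r "$HUNG_TASK_FILE" ]; then
    printf 'hung_task_timeout_secs=%s\n' "$(cat "$HUNG_TASK_FILE")"
  else
    printf 'hung_task_timeout_secs=unavailable\n'
  fi
}

# Usage:
#   show_hung_task_timeout
```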

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Martin Toth
Cache in qemu is none. That should be correct. This is the full command: /usr/bin/qemu-system-x86_64 -name one-312 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 -no-user-config -nodefaults

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Krutika Dhananjay
Also, what's the caching policy that qemu is using on the affected VMs? Is it cache=none? Or something else? You can get this information from the command line of the qemu-kvm process corresponding to your VM in the ps output. -Krutika On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay wrote: > What
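
One possible way to pull the cache setting out of the ps output, as described above, is sketched here. The function name `extract_cache_modes` is an assumption for illustration. It simply splits a qemu command line on spaces and commas and keeps the `cache=` tokens, so with several VMs running the results of all of them are printed together.

```shell
# Read one or more qemu command lines on stdin and print every
# cache=... token found in them (one per line).
extract_cache_modes() {
  tr ' ,' '\n\n' | grep '^cache='
}

# Usage: the [q] in the pattern keeps grep from matching itself.
#   ps -eo args | grep '[q]emu' | extract_cache_modes
```

If nothing is printed, no `cache=` option was given on the command line and qemu is using its default for that drive.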

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Krutika Dhananjay
What version of gluster are you using? Also, can you capture and share volume-profile output for a run where you manage to recreate this issue? https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command Let me know if you have any
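
Capturing the volume profile requested above might look roughly like this. It is a sketch under assumptions: the helper name `profile_volume`, the `GLUSTER` override variable, the volume name `myvol` and the 600-second window are all illustrative. The `start`, `info` and `stop` subcommands are the ones described in the linked guide.

```shell
# GLUSTER may be overridden (for example GLUSTER=echo) to do a dry run.
# This override is a hypothetical convenience, not part of gluster itself.
GLUSTER="${GLUSTER:-gluster}"

# Start profiling, wait while the problem is reproduced, then print the
# collected statistics and stop profiling again.
profile_volume() {
  vol="$1"; secs="$2"
  $GLUSTER volume profile "$vol" start || return 1
  sleep "$secs"
  $GLUSTER volume profile "$vol" info
  $GLUSTER volume profile "$vol" stop
}

# Usage: profile for ten minutes and keep the output for sharing.
#   profile_volume myvol 600 > profile-myvol.txt
```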

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Martin Toth
Hi, there is no healing operation, no peer disconnects, no read-only filesystem. Yes, storage is slow and unavailable for 120 seconds, but why? It's SSD with 10G, and performance is good. > you'd have it's log on qemu's standard output, If you mean /var/log/libvirt/qemu/vm.log, there is nothing. I

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread lemonnierk
On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: > Hi all, Hi > > I am running replica 3 on SSDs with 10G networking, everything works OK but > VMs stored in Gluster volume occasionally freeze with “Task XY blocked for > more than 120 seconds”. > Only solution is to poweroff