Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-31 Thread Jason Dillaman
The exclusive-lock feature should only require grabbing the lock on the very first IO, so if this is an issue that pops up after extended use, it is most likely either not related to exclusive-lock, or you had a client<->OSD link hiccup. In the latter case, you will see a log message like
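For anyone following along, the lock state of an image can be inspected with the rbd CLI. A minimal sketch against a live cluster (the pool and image names below are placeholders, not from the thread); note that on jewel, disabling exclusive-lock first requires disabling the features that depend on it:

```shell
# Check which features are enabled on the image (placeholder pool/image)
rbd info rbd/vm-disk-1

# See whether any client currently holds the exclusive lock
rbd lock ls rbd/vm-disk-1

# To rule exclusive-lock out, dependent features must be disabled first,
# then exclusive-lock itself (order matters on jewel)
rbd feature disable rbd/vm-disk-1 journaling
rbd feature disable rbd/vm-disk-1 fast-diff
rbd feature disable rbd/vm-disk-1 object-map
rbd feature disable rbd/vm-disk-1 exclusive-lock
```

These are command fragments that require a running cluster; they are shown for orientation only, not as a recommendation to disable the feature.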

Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-30 Thread Peter Maloney
On 03/28/17 17:28, Brian Andrus wrote:
> Just adding some anecdotal input. It likely won't be ultimately
> helpful other than a +1..
>
> Seemingly, we also have the same issue since enabling exclusive-lock
> on images. We experienced these messages at a large scale when making
> a CRUSH map change

Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Brian Andrus
Just adding some anecdotal input. It likely won't be ultimately helpful other than a +1.. Seemingly, we also have the same issue since enabling exclusive-lock on images. We experienced these messages at a large scale when making a CRUSH map change a few weeks ago that resulted in many many VMs

Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Jason Dillaman
Eric, If you already have debug level 20 logs captured from one of these events, I would love to be able to take a look at them to see what's going on. Depending on the size, you could either attach the log to a new RBD tracker ticket [1] or use the ceph-post-file helper to upload a large file.
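The debug capture Jason asks for can be set up on the hypervisor side. A sketch, assuming librbd clients read their local ceph.conf (the log path and file name below are placeholders):

```shell
# In the hypervisor's /etc/ceph/ceph.conf, raise rbd client logging:
#
#   [client]
#       debug rbd = 20
#       log file = /var/log/ceph/client.$name.$pid.log
#
# Restart or re-attach the affected VM so librbd picks up the setting,
# reproduce the hang, then upload the resulting log with the helper:
ceph-post-file -d "rbd disk timeout debug log" /var/log/ceph/client.libvirt.12345.log
```

ceph-post-file prints an upload ID to reference in the tracker ticket; the log file name here is illustrative.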

Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-28 Thread Marius Vaitiekunas
On Mon, Mar 27, 2017 at 11:17 PM, Peter Maloney < peter.malo...@brockmann-consult.de> wrote:
> I can't guarantee it's the same as my issue, but from that it sounds the
> same.
>
> Jewel 10.2.4, 10.2.5 tested
> hypervisors are proxmox qemu-kvm, using librbd
> 3 ceph nodes with mon+osd on each
>

Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-27 Thread Peter Maloney
I can't guarantee it's the same as my issue, but from that it sounds the same.

Jewel 10.2.4, 10.2.5 tested
hypervisors are proxmox qemu-kvm, using librbd
3 ceph nodes with mon+osd on each
- faster journals, more disks, bcache, rbd_cache, fewer VMs on ceph, iops and bw limits on client side, jumbo
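The client-side iops and bandwidth limits Peter mentions can be applied from the hypervisor without touching ceph at all, e.g. via libvirt's block device throttling. A sketch (domain and device names are placeholders):

```shell
# Cap a VM disk at 500 IOPS and 50 MB/s total, live, via libvirt
# ("vm100" and "vda" are placeholder domain/device names)
virsh blkdeviotune vm100 vda \
    --total-iops-sec 500 \
    --total-bytes-sec 52428800

# Verify the limits now in effect
virsh blkdeviotune vm100 vda
```

On proxmox specifically, the equivalent limits can also be set per-disk in the VM configuration; the virsh form above is the generic libvirt route.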

[ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-27 Thread Hall, Eric
In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel), using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and ceph hosts, we occasionally see hung processes (usually during boot, but at other times as well), with errors reported in the instance logs as shown