On Tue, Sep 26, 2017 at 04:31:22PM +0200, Nicolas Ecarnot wrote:
> On 21/09/2017 at 16:31, Stefan Hajnoczi wrote:
> > On Tue, Sep 19, 2017 at 12:09:06PM +0200, Nicolas Ecarnot wrote:
> > > Hello,
> > > 
> > > First post here, so maybe I should introduce myself :
> > > - I've been a sysadmin for decades and currently manage 4 oVirt clusters,
> > > made up of tens of hypervisors, all CentOS 7.2+ based.
> > > - I'm very happy with this solution we chose, especially because it is
> > > based on qemu-kvm (open source, reliable, documented).
> > > 
> > > On one VM, we experienced the following :
> > > - oVirt/vdsm is detecting an issue on the image
> > > - following the hints at https://access.redhat.com/solutions/1173623, I
> > > managed to detect one error and fix it
> > > - the VM is now running perfectly
> > > 
> > > On two other VMs, we experienced a similar situation, except the check
> > > stage is showing something like 14000+ errors, and the relevant logs are:
> > > 
> > > Repairing refcount block 14 is outside image
> > > ERROR could not resize image: Invalid argument
> > > ERROR cluster 425984 refcount=0 reference=1
> > > ERROR cluster 425985 refcount=0 reference=1
> > > [... repeating the previous line 7000+ times...]
> > > ERROR cluster 457166 refcount=0 reference=1
> > > Rebuilding refcount structure
> > > ERROR writing refblock: No space left on device
> > > qemu-img: Check failed: No space left on device
> > 
> > Please run strace qemu-img info /the/relevant/logical/volume/path.  It

Sorry, "qemu-img info" should be your "qemu-img check" command.

> > will print all the syscalls that qemu-img makes.  That way we'll be able
> > to verify that the ENOSPC error is coming from a pwritev syscall.
> I did, but I'm not skilled enough to determine where the ENOSPC error is
> coming from.
> 
> Does your question mean that the reads and/or the writes may come from or go
> to places outside the expected boundaries?

I was interested in the syscall (probably pwritev or similar) related to
the following output from qemu-img check:

  ERROR writing refblock: No space left on device

Feel free to post your strace log so we can analyze it.
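
If it helps, here is a minimal sketch of how a saved strace log (for example
one written with "strace -o strace.log ...") could be scanned for the failing
write. It assumes the log contains lines shaped like the example in the
comment below; that shape, the function names, and the sample numbers are
illustrative assumptions, not taken from your log. The device size to compare
against could come from something like "blockdev --getsize64" on the LV.

```python
import re

# Assumed strace line shape (illustrative, not from the thread):
#   pwrite64(9, "..."..., 65536, 27917287424) = -1 ENOSPC (No space left on device)
# For pwrite64 and pwritev the file offset is the last argument.
FAILED_WRITE = re.compile(
    r"(?P<call>pwritev|pwrite64)\((?P<fd>\d+), .*, (?P<offset>\d+)\)"
    r"\s+= -1 ENOSPC"
)


def failed_write_offsets(lines):
    """Yield (syscall, fd, offset) for each positional write that failed with ENOSPC."""
    for line in lines:
        match = FAILED_WRITE.search(line)
        if match:
            yield (
                match.group("call"),
                int(match.group("fd")),
                int(match.group("offset")),
            )


def is_within_device(offset, device_size):
    """Return True if a write starting at `offset` begins inside the device."""
    return 0 <= offset < device_size
```

Feeding the lines of the log to failed_write_offsets() and passing each offset
to is_within_device() together with the LV size in bytes would show whether
the failing write starts beyond the end of the LV.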

> > > You surely know that oVirt/RHEV is storing its qcow2 images in dedicated
> > > logical volumes.
> > > 
> > > pvs/vgs/lvs are all showing there is plenty of space available, so I
> > > understand that I don't understand what "No space left on device" means.
> > 
> > After you have the strace data you can look at the file offset from the
> > failing pwritev syscall and check that it's really within the LV.
> > 
> > I think there is no fancy thin provisioning going on at the LVM level
> > with oVirt, but if there is then perhaps a write within the LV could
> > still result in an ENOSPC error.  It would be worth confirming that
> > these are classic "thick" LVs.
> 
> I think there is no such thin prov. at the LVM level, but I wouldn't swear.
> Would you mind if I forward your question to the oVirt mailing-list?

Sure, feel free to CC other mailing lists.  I have added oVirt devel.

Stefan
