29.03.2019 20:39, Max Reitz wrote: > On 29.03.19 18:31, Vladimir Sementsov-Ogievskiy wrote: >> 29.03.2019 20:15, Max Reitz wrote: >>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote: >>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on >>>> loosening permissions. However file-locking operations may fail even >>>> in this case, for example on NFS. And this leads to Qemu crash. >>>> >>>> Let's avoid such errors. Note, that we ignore such things anyway on >>>> permission update commit and abort. >>>> >>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]> >>>> --- >>>> block/file-posix.c | 12 ++++++++++++ >>>> 1 file changed, 12 insertions(+) >>>> >>>> diff --git a/block/file-posix.c b/block/file-posix.c >>>> index db4cccbe51..1cf4ee49eb 100644 >>>> --- a/block/file-posix.c >>>> +++ b/block/file-posix.c >>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs, >>>> >>>> switch (op) { >>>> case RAW_PL_PREPARE: >>>> + if ((s->perm | new_perm) == s->perm && >>>> + (s->shared_perm & new_shared) == s->shared_perm) >>>> + { >>>> + /* >>>> + * We are going to unlock bytes, it should not fail. If it >>>> fail due >>>> + * to some fs-dependent permission-unrelated reasons (which >>>> occurs >>>> + * sometimes on NFS and leads to abort in bdrv_replace_child) >>>> we >>>> + * can't prevent such errors by any check here. And we ignore >>>> them >>>> + * anyway in ABORT and COMMIT. >>>> + */ >>>> + return 0; >>>> + } >>>> ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm, >>>> ~s->shared_perm | ~new_shared, >>>> false, errp); >>> >>> Help me understand the exact issue, please. I understand that there are >>> operations like bdrv_replace_child() that pass &error_abort to >>> bdrv_check_perm() because they just loosen the permissions, so it should >>> not fail. >>> >>> However, if the whole effect really would be to loosen permissions, >>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway: >>> @unlock is passed as false, so no bytes will be unlocked. And if >>> permissions are just loosened (as your condition checks), it should not >>> lock any bytes. >> >> but it does qemu_lock_fd unconditionally on any locked byte (it may be >> loosening, >> but not of all bytes). > > It only locks bytes that aren't already locked ("!(locked_perm & bit)"). > So if (s->perm | new_perm) & ~s->locked_perm == 0 (same for shared), it
Oops, missed that. We don't have this code in rhel base.. > won't lock anything. That's the same as s->perm | new_perm == s->perm > and s->perm == s->locked_perm. So if this patch prevents qemu_lock_fd() > errors in raw_apply_lock_bytes(), s->perm must be different from > s->locked_perm. > >> and fcntl is called, which may fail. > > Hm, yeah, and as Kevin said, raw_check_lock_bytes() if new_perm and > new_shared are already covered (though this could be fixed by passing > new_perm & ~s->perm and... new_shared | ~s->shared_perm? to > raw_check_lock_bytes). > > It just seems to me that we sometimes do end up in a position where > s->perm and s->locked_perm (or ~s->shared_perm and > s->locked_shared_perm) do not align, which should not be happening: This > would mean that the block layer thinks the permission locks are taken, > but file-posix disagrees. So if it is it's possibly not my case, as all my crashes are on old code, which don't have this additional checking in raw_apply_lock_bytes > > Max > >>> So why does it attempt lock any bytes in the first place? There must be >>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm >>> and s->locked_shared_perm. How does that occur? >>> >>> Max >>> >> >> > > -- Best regards, Vladimir
