On Sat, Nov 23, 2019 at 3:14 AM Nir Soffer <nsof...@redhat.com> wrote:

> On Fri, Nov 22, 2019 at 10:41 PM Strahil Nikolov <hunter86...@yahoo.com>
> wrote:
>
>> On Thu, Nov 21, 2019 at 8:20 AM Sahina Bose <sab...@redhat.com> wrote:
>>
>>
>>
>> On Thu, Nov 21, 2019 at 6:03 AM Strahil Nikolov <hunter86...@yahoo.com>
>> wrote:
>>
>> Hi All,
>>
>> another clue in the logs:
>> [2019-11-21 00:29:50.536631] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
>> remote operation failed. Path:
>> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
>> (00000000-0000-0000-0000-000000000000) [Permission denied]
>> [2019-11-21 00:29:50.536798] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
>> remote operation failed. Path:
>> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
>> (00000000-0000-0000-0000-000000000000) [Permission denied]
>> [2019-11-21 00:29:50.536959] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
>> remote operation failed. Path:
>> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
>> (00000000-0000-0000-0000-000000000000) [Permission denied]
>> [2019-11-21 00:29:50.537007] E [MSGID: 133010]
>> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
>> shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae
>> [Permission denied]
>> [2019-11-21 00:29:50.537066] W [fuse-bridge.c:2830:fuse_readv_cbk]
>> 0-glusterfs-fuse: 12458: READ => -1
>> gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae fd=0x7fc63c00fe18 (Permission
>> denied)
>> [2019-11-21 00:30:01.177665] I [MSGID: 133022]
>> [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of
>> gfid=eb103fbf-80dc-425d-882f-1e4efe510db5 from backend
>> [2019-11-21 00:30:13.132756] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
>> remote operation failed. Path:
>> /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79
>> (00000000-0000-0000-0000-000000000000) [Permission denied]
>> [2019-11-21 00:30:13.132824] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
>> remote operation failed. Path:
>> /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79
>> (00000000-0000-0000-0000-000000000000) [Permission denied]
>> [2019-11-21 00:30:13.133217] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
>> remote operation failed. Path:
>> /.shard/17c663c2-f582-455b-b806-3b9d01fb2c6c.79
>> (00000000-0000-0000-0000-000000000000) [Permission denied]
>> [2019-11-21 00:30:13.133238] E [MSGID: 133010]
>> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
>> shard 79 failed. Base file gfid = 17c663c2-f582-455b-b806-3b9d01fb2c6c
>> [Permission denied]
>> [2019-11-21 00:30:13.133264] W [fuse-bridge.c:2830:fuse_readv_cbk]
>> 0-glusterfs-fuse: 12660: READ => -1
>> gfid=17c663c2-f582-455b-b806-3b9d01fb2c6c fd=0x7fc63c007038 (Permission
>> denied)
>> [2019-11-21 00:30:38.489449] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
>> remote operation failed. Path:
>> /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6
>> (00000000-0000-0000-0000-000000000000) [Permission denied]
>> [2019-11-21 00:30:38.489520] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
>> remote operation failed. Path:
>> /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6
>> (00000000-0000-0000-0000-000000000000) [Permission denied]
>> [2019-11-21 00:30:38.489669] W [MSGID: 114031]
>> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
>> remote operation failed. Path:
>> /.shard/a10a5ae8-108b-4d78-9e65-cca188c27fc4.6
>> (00000000-0000-0000-0000-000000000000) [Permission denied]
>> [2019-11-21 00:30:38.489717] E [MSGID: 133010]
>> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
>> shard 6 failed. Base file gfid = a10a5ae8-108b-4d78-9e65-cca188c27fc4
>> [Permission denied]
>> [2019-11-21 00:30:38.489777] W [fuse-bridge.c:2830:fuse_readv_cbk]
>> 0-glusterfs-fuse: 12928: READ => -1
>> gfid=a10a5ae8-108b-4d78-9e65-cca188c27fc4 fd=0x7fc63c01a058 (Permission
>> denied)
>>
>>
>> Anyone got an idea why this is happening?
>> I checked user/group and selinux permissions - all OK
>>
>>
>> >Can you share the commands (and output) used to check this?
>> I first thought that the file was cached in memory and that's why the vdsm
>> user can read the file, but the following shows the opposite:
>>
>> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# ll
>> total 562145
>> -rw-rw----. 1 vdsm kvm 5368709120 Nov 12 23:29
>> 5b1d3113-5cca-4582-9029-634b16338a2f
>> -rw-rw----. 1 vdsm kvm    1048576 Nov 11 14:11
>> 5b1d3113-5cca-4582-9029-634b16338a2f.lease
>> -rw-r--r--. 1 vdsm kvm        313 Nov 11 14:11
>> 5b1d3113-5cca-4582-9029-634b16338a2f.meta
>> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# pwd
>>
>> /rhev/data-center/mnt/glusterSD/gluster1:_data__fast/396604d9-2a9e-49cd-9563-fdc79981f67b/images/94f763e9-fd96-4bee-a6b2-31af841a918b
>> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 >
>> /proc/sys/vm/drop_caches
>>
>
> I would use iflag=direct instead, no need to mess with caches. Vdsm always
> uses direct I/O.
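>
> For example, something like this (a minimal sketch, reusing the image path
> from your listing above):
>
>     # read the image the way vdsm does, bypassing the page cache
>     sudo -u vdsm dd if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M iflag=direct status=progress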
>
>
>> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd
>> if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
>> dd: error reading ‘5b1d3113-5cca-4582-9029-634b16338a2f’: Permission
>> denied
>>
>
> You got permissions denied...
>
>
>> 16+0 records in
>> 16+0 records out
>> 67108864 bytes (67 MB) copied, 0.198372 s, 338 MB/s
>>
>
Seems like it could read up to ~67 MB (67108864 bytes, i.e. exactly 64 MiB)
successfully before it hit the 'Permission denied' errors. Assuming a
shard-block-size of 64 MB, it looks like none of the shards under /.shard
could be accessed.
Could you share the following pieces of information:
1.  brick logs of data_fast
2. ls -la of .shard relative to the bricks (NOT the mount) on all the
bricks of data_fast
3. ls -la of all shards under .shard of data_fast (perhaps a handful of them
somehow ended up owned by root, which would cause access to be denied -
possibly while resolving pending heals manually?). A command sketch for (2)
and (3) follows below.
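
Something along these lines on each gluster node should cover (2) and (3)
(a sketch; the brick path below is an assumption - take the real paths from
gluster volume info data_fast):

    gluster volume info data_fast                                        # lists the brick paths
    ls -la /gluster_bricks/data_fast/data_fast/.shard                    # assumed brick path, adjust to yours
    find /gluster_bricks/data_fast/data_fast/.shard -not -user vdsm -ls  # any shards not owned by vdsm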

-Krutika

>
> And dd continues to read data?!
>
> I have never seen anything like this.
>
> It will be helpful to run this with strace:
>
>     strace -t -TT -o dd.strace dd if=vol-id of=/dev/null iflag=direct
> bs=8M status=progress
>
> And share dd.strace.
>
> Logs in /var/log/glusterfs/exportname.log will contain useful info for
> this test.
>
>
>> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# dd
>> if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
>> 5356126208 bytes (5.4 GB) copied, 12.061393 s, 444 MB/s
>> 1280+0 records in
>> 1280+0 records out
>> 5368709120 bytes (5.4 GB) copied, 12.0876 s, 444 MB/s
>> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd
>> if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
>> 3598712832 bytes (3.6 GB) copied, 1.000540 s, 3.6 GB/s
>> 1280+0 records in
>> 1280+0 records out
>> 5368709120 bytes (5.4 GB) copied, 1.47071 s, 3.7 GB/s
>> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# echo 3 >
>> /proc/sys/vm/drop_caches
>> [root@ovirt1 94f763e9-fd96-4bee-a6b2-31af841a918b]# sudo -u vdsm dd
>> if=5b1d3113-5cca-4582-9029-634b16338a2f of=/dev/null bs=4M status=progress
>> 5171576832 bytes (5.2 GB) copied, 12.071837 s, 428 MB/s
>> 1280+0 records in
>> 1280+0 records out
>> 5368709120 bytes (5.4 GB) copied, 12.4873 s, 430 MB/s
>>
>> As you can see, once the root user reads the file, the vdsm user can also
>> read it.
>>
>
> Smells like an issue on the gluster side.
>
>
>> >I would try this on the hypervisor to check what vdsm/qemu see:
>>
>> >$ ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster-server:_path
>> I'm attaching the output of the find I ran, but this one should be enough:
>> [root@ovirt1 ~]# find /rhev/data-center/mnt/glusterSD/*/[0-9]*/images/
>> -not -user vdsm -print
>>
>
> A full output of ls -lahRZ, showing user, group, permission bits, and the
> selinux label of the entire tree, would be more useful.
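>
> Something like this should capture it in one go (a sketch, using the mount
> path from your earlier output):
>
>     ls -lahRZ /rhev/data-center/mnt/glusterSD/gluster1:_data__fast > /tmp/data_fast_tree.txt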
>
>
>> >Also, to make sure we don't have a selinux issue on the hypervisor, you
>> can change
>> >selinux to permissive mode:
>>
>>   >  setenforce 0
>>
>> This is the first thing I did and the systems were still in permissive
>> when I tried again. I'm 99.99% sure it's not selinux.
>>
>>
>> >And then try again. If this was a selinux issue, the permission denied
>> issue will disappear.
>> >If this is the case please provide the output of:
>>
>>   >  ausearch -m AVC -ts today
>>
>> >If the issue still exists, we eliminated selinux, and you can enable it
>> again:
>>
>>   >  setenforce 1
>>
>> [root@ovirt3 ~]# ausearch -m AVC -ts today
>> <no matches>
>> [root@ovirt2 ~]# ausearch -m AVC -ts today
>> <no matches>
>> [root@ovirt1 ~]# ausearch -m AVC -ts today
>> <no matches>
>>
>
> So this is not selinux on the hypervisor. I wonder if it could be selinux
> on the gluster side?
>
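> Worth checking directly on the gluster nodes too (a sketch; the brick path
> is an assumption, adjust to your layout):
>
>     getenforce                                                   # enforcing/permissive on the brick hosts
>     ausearch -m AVC -ts today                                    # any denials logged there
>     ls -laZ /gluster_bricks/data_fast/data_fast/.shard | head    # selinux labels on the shards themselves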
>
>> I have a vague feeling that the issue is related to the gluster v6.5 to 6.6
>> upgrade, which I did several days before... So if any logs are needed (or
>> debug enabled), just mention it.
>>
>
> If this was the last change, and it worked before, that is the most likely cause.
>
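> To rule out a partial upgrade, it may be worth confirming the versions
> everywhere (a sketch):
>
>     gluster --version                          # on each gluster node and on the client/hypervisor
>     rpm -q glusterfs-server glusterfs-fuse     # confirm the installed packages match
>     gluster volume get all cluster.op-version  # cluster op-version after the upgrade
>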
> Nir
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OO47Q3WXPVKFQARBS4QXPVJK7M3BI2EK/
