Re: [Gluster-users] [ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

2019-12-02 Thread Strahil
Hi Jiffin,

Thanks for the info.
As I'm now on Gluster v7, I hope it won't happen again.
It's nice to know it got fixed.

Best Regards,
Strahil Nikolov

Re: [Gluster-users] [ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

2019-12-02 Thread Jiffin Thottan
Hi Krutika,

Apparently the in-context ACL info got corrupted; see the brick logs:

[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:107,gid:107,perm:1,ngrps:4),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]

which resulted in this situation. A similar bug was reported
(https://bugzilla.redhat.com/show_bug.cgi?id=1668286), and IMO it was fixed
in the 6.6 release (https://review.gluster.org/#/c/glusterfs/+/23233/).
But here he mentioned that he saw the issue when he upgraded from 6.5 to 6.6.

One workaround is to perform a dummy setfacl (preferably as root) on the
corrupted files, which will force the ACL info to be fetched again from the
backend and update the context. Another approach is to restart the brick
process (kill it, then run volume start with force).
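
To find which gfids the workaround would need to touch, the brick log can be scanned for the denial entries shown above. The following is only a rough sketch: the function name is made up for illustration, the pattern is based solely on the log lines quoted in this thread, and treating "perm:000 together with updated-fop:INVALID" as the sign of a corrupted context is an assumption, not something confirmed by the posix-acl code.

```python
import re

# Pattern for the gfid/req(...)/ctx(...) part of a posix-acl "permit denied"
# entry. It is derived only from the log lines quoted in this thread.
DENIED_RE = re.compile(
    r"gfid:\s*(?P<gfid>[0-9a-f-]{36}),\s*"
    r"req\(uid:(?P<req_uid>\d+),gid:(?P<req_gid>\d+),"
    r"perm:(?P<req_perm>\d+),ngrps:(?P<ngrps>\d+)\),\s*"
    r"ctx\(uid:(?P<ctx_uid>\d+),gid:(?P<ctx_gid>\d+),"
    r"in-groups:(?P<in_groups>\d+),perm:(?P<ctx_perm>\d+),"
    r"updated-fop:(?P<fop>[A-Z_]+),\s*acl:(?P<acl>[^)]*)\)"
)


def suspect_gfids(log_text):
    """Return gfids whose cached ctx looks unpopulated (hypothetical helper).

    Assumption: ctx perm 000 plus updated-fop INVALID marks a bad context.
    """
    # Entries pasted into mail are wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    found = []
    for match in DENIED_RE.finditer(flat):
        looks_bad = (match.group("ctx_perm") == "000"
                     and match.group("fop") == "INVALID")
        if looks_bad and match.group("gfid") not in found:
            found.append(match.group("gfid"))
    return found
```

Feeding it the contents of a brick log gives a de-duplicated list of gfids, in the order they first appear, for which access was denied with an empty-looking context.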

Regards,
Jiffin


Re: [Gluster-users] [ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

2019-12-01 Thread Krutika Dhananjay
Sorry about the late response.

I looked at the logs. These errors originate from the posix-acl
translator:



[2019-11-17 07:55:47.090065] E [MSGID: 115050]
[server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162496:
LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6
(be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.6),
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
error-xlator: data_fast-access-control [Permission denied]

[2019-11-17 07:55:47.090174] I [MSGID: 139001]
[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:36,gid:36,perm:1,ngrps:3),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]

[2019-11-17 07:55:47.090209] E [MSGID: 115050]
[server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162497:
LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.7
(be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.7),
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
error-xlator: data_fast-access-control [Permission denied]

[2019-11-17 07:55:47.090299] I [MSGID: 139001]
[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:36,gid:36,perm:1,ngrps:3),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]

Jiffin/Raghavendra Talur,
Can you help?

-Krutika

On Wed, Nov 27, 2019 at 2:11 PM Strahil Nikolov wrote:

> Hi Nir, All,
>
> It seems that 4.3.7 RC3 (and even RC4) is not the problem here (attached
> screenshot of oVirt running on Gluster v7).
> It seems strange that both of my serious issues with oVirt were related to
> Gluster issues (first the Gluster v3 to v5 migration, and now this one).
>
> I have just updated to Gluster v7.0 (CentOS 7 repos) and rebooted all
> nodes.
> Now both the Engine and all my VMs are back online, so if you hit issues with
> 6.6, you should give 7.0 a try (and even 7.1 is coming soon) before
> deciding to wipe everything.
>
> @Krutika,
>
> I guess you will ask for the logs, so shall we switch to gluster-users for
> this one?
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 25 November 2019, 16:45:48 GMT-5, Strahil Nikolov
> <hunter86...@yahoo.com> wrote:
>
>
> Hi Krutika,
>
> I have enabled the TRACE log level for the volume data_fast,
> but the issue is still not very clear.
>
> FUSE reports:
>
> [2019-11-25 21:31:53.478130] I [MSGID: 133022]
> [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of
> gfid=6d9ed2e5-d4f2-4749-839b-2f13a68ed472 from backend
> [2019-11-25 21:32:43.564694] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-25 21:32:43.565653] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-25 21:32:43.565689] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-25 21:32:43.565770] E [MSGID: 133010]
> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
> shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae
> [Permission denied]
> [2019-11-25 21:32:43.565858] W [fuse-bridge.c:2830:fuse_readv_cbk]
> 0-glusterfs-fuse: 279: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae
> fd=0x7fbf40005ea8 (Permission denied)
>
>
> While the BRICK logs on ovirt1/gluster1 report:
> [2019-11-25 21:32:43.564177] D [MSGID: 0] [io-threads.c:376:iot_schedule]
> 0-data_fast-io-threads: LOOKUP scheduled as fast priority fop
> [2019-11-25 21:32:43.564194] T [MSGID: 0]
> [defaults.c:2008:default_lookup_resume] 0-stack-trace: stack-address:
> 0x7fc02c00bbf8, winding from data_fast-io-threads to data_fast-upcall
> [2019-11-25 21:32:43.564206] T [MSGID: 0] [upcall.c:790:up_lookup]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-upcall
> to data_fast-leases
> [2019-11-25 21:32:43.564215] T [MSGID: 0] [defaults.c:2766:default_lookup]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-leases
> to