Re: [Gluster-users] Corrupted File readable via FUSE?

2019-02-06 Thread David Spisla
Hello Raghavendra,

I can not give you the output of the gluster commands because I repaired
the system already. But beside of this this errors occurs randomly. I am
sure that only one copy of the file was corrupted because it is part of a
test and I corrupt one copy of the file manually on brick level and after
this I check if it is still readable. During this conversation the error
occurs again.
Here is the Log from brick of node1:

[2019-02-06 14:15:09.524638] E [MSGID: 115070]
> [server-rpc-fops_v2.c:1503:server4_open_cbk] 0-
> archive1-server: 32: OPEN /data/file1.txt
> (23b623cb-7256-4fe6-85b0-1026b1531a86), client: CTX_
>
> ID:e3871169-af62-44aa-a990-fa4248283c08-GRAPH_ID:0-PID:31830-HOST:fs-lrunning-c1-n1-PC_NAME:ar
> chive1-client-0-RECON_NO:-0, error-xlator: archive1-bitrot-stub
> [Input/output error]
> [2019-02-06 14:15:09.535587] E [MSGID: 115070]
> [server-rpc-fops_v2.c:1503:server4_open_cbk] 0-
> archive1-server: 56: OPEN /data/file1.txt
> (23b623cb-7256-4fe6-85b0-1026b1531a86), client: CTX_
>
> ID:e3871169-af62-44aa-a990-fa4248283c08-GRAPH_ID:0-PID:31830-HOST:fs-lrunning-c1-n1-PC_NAME:ar
> chive1-client-0-RECON_NO:-0, error-xlator: archive1-bitrot-stub
> [Input/output error]
> The message "E [MSGID: 116020]
> [bit-rot-stub.c:647:br_stub_check_bad_object] 0-archive1-bitrot
> -stub: 23b623cb-7256-4fe6-85b0-1026b1531a86 is a bad object. Returning"
> repeated 2 times betwe
> en [2019-02-06 14:15:09.524599] and [2019-02-06 14:15:09.549409]
> [2019-02-06 14:15:09.549427] E [MSGID: 115070]
> [server-rpc-fops_v2.c:1503:server4_open_cbk] 0-
> archive1-server: 70: OPEN /data/file1.txt
> (23b623cb-7256-4fe6-85b0-1026b1531a86), client: CTX_
>
> ID:e3871169-af62-44aa-a990-fa4248283c08-GRAPH_ID:0-PID:31830-HOST:fs-lrunning-c1-n1-PC_NAME:ar
> chive1-client-0-RECON_NO:-0, error-xlator: archive1-bitrot-stub
> [Input/output error]
> [2019-02-06 14:15:09.561450] I [MSGID: 115036]
> [server.c:469:server_rpc_notify] 0-archive1-ser
> ver: disconnecting connection from
> CTX_ID:e3871169-af62-44aa-a990-fa4248283c08-GRAPH_ID:0-PID:
> 31830-HOST:fs-lrunning-c1-n1-PC_NAME:archive1-client-0-RECON_NO:-0
> [2019-02-06 14:15:09.561568] I [MSGID: 101055]
> [client_t.c:435:gf_client_unref] 0-archive1-ser
> ver: Shutting down connection
> CTX_ID:e3871169-af62-44aa-a990-fa4248283c08-GRAPH_ID:0-PID:31830-HOST:fs-lrunning-c1-n1-PC_NAME:archive1-client-0-RECON_NO:-0
> [2019-02-06 14:15:10.188406] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2019-02-06 14:15:10.201029] I [glusterfsd-mgmt.c:2005:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile,continuing
> [2019-02-06 14:15:10.514721] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2019-02-06 14:15:10.526216] I [glusterfsd-mgmt.c:2005:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile,continuing
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 79 times between [2019-02-06 14:15:09.499105] and
> [2019-02-06 14:15:10.682592]
> [2019-02-06 14:15:10.684204] E [MSGID: 116020]
> [bit-rot-stub.c:647:br_stub_check_bad_object] 0-archive1-bitrot-stub:
> 23b623cb-7256-4fe6-85b0-1026b1531a86 is a bad object. Returning
> [2019-02-06 14:15:10.684262] E [MSGID: 115070]
> [server-rpc-fops_v2.c:1503:server4_open_cbk] 0-archive1-server: 2146148:
> OPEN /data/file1.txt (23b623cb-7256-4fe6-85b0-1026b1531a86), client:
> CTX_ID:0545b52c-2843-4833-a5fc-b11e062a72d3-GRAPH_ID:0-PID:2458-HOST:fs-lrunning-c1-n1-PC_NAME:archive1-client-0-RECON_NO:-3,
> error-xlator: archive1-bitrot-stub [Input/output error]
> [2019-02-06 14:15:10.684949] E [MSGID: 116020]
> [bit-rot-stub.c:647:br_stub_check_bad_object] 0-archive1-bitrot-stub:
> 23b623cb-7256-4fe6-85b0-1026b1531a86 is a bad object. Returning
> [2019-02-06 14:15:10.684982] E [MSGID: 115070]
> [server-rpc-fops_v2.c:1503:server4_open_cbk] 0-archive1-server: 2146149:
> OPEN /data/file1.txt (23b623cb-7256-4fe6-85b0-1026b1531a86), client:
> CTX_ID:0545b52c-2843-4833-a5fc-b11e062a72d3-GRAPH_ID:0-PID:2458-HOST:fs-lrunning-c1-n1-PC_NAME:archive1-client-0-RECON_NO:-3,
> error-xlator: archive1-bitrot-stub [Input/output error]
> [2019-02-06 14:15:10.686566] E [MSGID: 116020]
> [bit-rot-stub.c:647:br_stub_check_bad_object] 0-archive1-bitrot-stub:
> 23b623cb-7256-4fe6-85b0-1026b1531a86 is a bad object. Returning
> [2019-02-06 14:15:10.686600] E [MSGID: 115070]
> [server-rpc-fops_v2.c:1503:server4_open_cbk] 0-archive1-server: 2146150:
> OPEN /data/file1.txt (23b623cb-7256-4fe6-85b0-1026b1531a86), client:
> CTX_ID:0545b52c-2843-4833-a5fc-b11e062a72d3-GRAPH_ID:0-PID:2458-HOST:fs-lrunning-c1-n1-PC_NAME:archive1-client-0-RECON_NO:-3,
> error-xlator: archive1-bitrot-stub [Input/output error]
> [2019-02-06 14:15:11.189361] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2019-02-06 14:15:11.207835] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2019-02-06 14:15:11.220763] I 

Re: [Gluster-users] Corrupted File readable via FUSE?

2019-02-05 Thread FNU Raghavendra Manjunath
Hi David,

Do you have any bricks down? Can you please share the output of the
following commands and also the logs of the server and the client nodes?

1) gluster volume info
2) gluster volume status
3) gluster volume bitrot  scrub status

Few more questions

1) How many copies of the file were corrupted? (All? Or Just one?)

2 things I am trying to understand

A) IIUC, if only one copy is corrupted, then the replication module from
the gluster client should serve the data from the
remaining good copy
B) If all the copies were corrupted (or say more than quorum copies were
corrupted which means 2 in case of 3 way replication)
then there will be an error to the application. But the error to be
reported should 'Input/Output Error'. Not 'Transport endpoint not connected'
   'Transport endpoint not connected' error usually comes when a brick
where the operation is being directed to is not connected to the client.



Regards,
Raghavendra

On Mon, Feb 4, 2019 at 6:02 AM David Spisla  wrote:

> Hello Amar,
> sounds good. Until now this patch is only merged into master. I think it
> should be part of the next v5.x patch release!
>
> Regards
> David
>
> Am Mo., 4. Feb. 2019 um 09:58 Uhr schrieb Amar Tumballi Suryanarayan <
> atumb...@redhat.com>:
>
>> Hi David,
>>
>> I guess https://review.gluster.org/#/c/glusterfs/+/21996/ helps to fix
>> the issue. I will leave it to Raghavendra Bhat to reconfirm.
>>
>> Regards,
>> Amar
>>
>> On Fri, Feb 1, 2019 at 8:45 PM David Spisla  wrote:
>>
>>> Hello Gluster Community,
>>> I have got a 4 Node Cluster with a Replica 4 Volume, so each node has a
>>> brick with a copy of a file. Now I tried out the bitrot functionality and
>>> corrupt the copy on the brick of node1. After this I scrub ondemand and the
>>> file is marked correctly as corrupted.
>>>
>>> No I try to read that file from FUSE on node1 (with corrupt copy):
>>> $ cat file1.txt
>>> cat: file1.txt: Transport endpoint is not connected
>>> FUSE log says:
>>>
>>> *[2019-02-01 15:02:19.191984] E [MSGID: 114031]
>>> [client-rpc-fops_v2.c:281:client4_0_open_cbk] 0-archive1-client-0: remote
>>> operation failed. Path: /data/file1.txt
>>> (b432c1d6-ece2-42f2-8749-b11e058c4be3) [Input/output error]*
>>> [2019-02-01 15:02:19.192269] W [dict.c:761:dict_ref]
>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>> [0x7fc642471329]
>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>> [0x7fc642682af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>> [0x7fc64a78d218] ) 0-dict: dict is NULL [Invalid argument]
>>> [2019-02-01 15:02:19.192714] E [MSGID: 108009]
>>> [afr-open.c:220:afr_openfd_fix_open_cbk] 0-archive1-replicate-0: Failed to
>>> open /data/file1.txt on subvolume archive1-client-0 [Input/output error]
>>> *[2019-02-01 15:02:19.193009] W [fuse-bridge.c:2371:fuse_readv_cbk]
>>> 0-glusterfs-fuse: 147733: READ => -1
>>> gfid=b432c1d6-ece2-42f2-8749-b11e058c4be3 fd=0x7fc60408bbb8 (Transport
>>> endpoint is not connected)*
>>> [2019-02-01 15:02:19.193653] W [MSGID: 114028]
>>> [client-lk.c:347:delete_granted_locks_owner] 0-archive1-client-0: fdctx not
>>> valid [Invalid argument]
>>>
>>> And from FUSE on node2 (with heal copy):
>>> $ cat file1.txt
>>> file1
>>>
>>> It seems to be that node1 wants to get the file from its own brick, but
>>> the copy there is broken. Node2 gets the file from its own brick with a
>>> heal copy, so reading the file succeed.
>>> But I am wondering myself because sometimes reading the file from node1
>>> with the broken copy succeed
>>>
>>> What is the expected behaviour here? Is it possibly to read files with a
>>> corrupted copy from any client access?
>>>
>>> Regards
>>> David Spisla
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> --
>> Amar Tumballi (amarts)
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Corrupted File readable via FUSE?

2019-02-04 Thread David Spisla
Hello Amar,
sounds good. Until now this patch is only merged into master. I think it
should be part of the next v5.x patch release!

Regards
David

Am Mo., 4. Feb. 2019 um 09:58 Uhr schrieb Amar Tumballi Suryanarayan <
atumb...@redhat.com>:

> Hi David,
>
> I guess https://review.gluster.org/#/c/glusterfs/+/21996/ helps to fix
> the issue. I will leave it to Raghavendra Bhat to reconfirm.
>
> Regards,
> Amar
>
> On Fri, Feb 1, 2019 at 8:45 PM David Spisla  wrote:
>
>> Hello Gluster Community,
>> I have got a 4 Node Cluster with a Replica 4 Volume, so each node has a
>> brick with a copy of a file. Now I tried out the bitrot functionality and
>> corrupt the copy on the brick of node1. After this I scrub ondemand and the
>> file is marked correctly as corrupted.
>>
>> No I try to read that file from FUSE on node1 (with corrupt copy):
>> $ cat file1.txt
>> cat: file1.txt: Transport endpoint is not connected
>> FUSE log says:
>>
>> *[2019-02-01 15:02:19.191984] E [MSGID: 114031]
>> [client-rpc-fops_v2.c:281:client4_0_open_cbk] 0-archive1-client-0: remote
>> operation failed. Path: /data/file1.txt
>> (b432c1d6-ece2-42f2-8749-b11e058c4be3) [Input/output error]*
>> [2019-02-01 15:02:19.192269] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fc642471329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fc642682af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fc64a78d218] ) 0-dict: dict is NULL [Invalid argument]
>> [2019-02-01 15:02:19.192714] E [MSGID: 108009]
>> [afr-open.c:220:afr_openfd_fix_open_cbk] 0-archive1-replicate-0: Failed to
>> open /data/file1.txt on subvolume archive1-client-0 [Input/output error]
>> *[2019-02-01 15:02:19.193009] W [fuse-bridge.c:2371:fuse_readv_cbk]
>> 0-glusterfs-fuse: 147733: READ => -1
>> gfid=b432c1d6-ece2-42f2-8749-b11e058c4be3 fd=0x7fc60408bbb8 (Transport
>> endpoint is not connected)*
>> [2019-02-01 15:02:19.193653] W [MSGID: 114028]
>> [client-lk.c:347:delete_granted_locks_owner] 0-archive1-client-0: fdctx not
>> valid [Invalid argument]
>>
>> And from FUSE on node2 (with heal copy):
>> $ cat file1.txt
>> file1
>>
>> It seems to be that node1 wants to get the file from its own brick, but
>> the copy there is broken. Node2 gets the file from its own brick with a
>> heal copy, so reading the file succeed.
>> But I am wondering myself because sometimes reading the file from node1
>> with the broken copy succeed
>>
>> What is the expected behaviour here? Is it possibly to read files with a
>> corrupted copy from any client access?
>>
>> Regards
>> David Spisla
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> --
> Amar Tumballi (amarts)
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Corrupted File readable via FUSE?

2019-02-04 Thread Amar Tumballi Suryanarayan
Hi David,

I guess https://review.gluster.org/#/c/glusterfs/+/21996/ helps to fix the
issue. I will leave it to Raghavendra Bhat to reconfirm.

Regards,
Amar

On Fri, Feb 1, 2019 at 8:45 PM David Spisla  wrote:

> Hello Gluster Community,
> I have got a 4 Node Cluster with a Replica 4 Volume, so each node has a
> brick with a copy of a file. Now I tried out the bitrot functionality and
> corrupt the copy on the brick of node1. After this I scrub ondemand and the
> file is marked correctly as corrupted.
>
> No I try to read that file from FUSE on node1 (with corrupt copy):
> $ cat file1.txt
> cat: file1.txt: Transport endpoint is not connected
> FUSE log says:
>
> *[2019-02-01 15:02:19.191984] E [MSGID: 114031]
> [client-rpc-fops_v2.c:281:client4_0_open_cbk] 0-archive1-client-0: remote
> operation failed. Path: /data/file1.txt
> (b432c1d6-ece2-42f2-8749-b11e058c4be3) [Input/output error]*
> [2019-02-01 15:02:19.192269] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fc642471329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fc642682af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fc64a78d218] ) 0-dict: dict is NULL [Invalid argument]
> [2019-02-01 15:02:19.192714] E [MSGID: 108009]
> [afr-open.c:220:afr_openfd_fix_open_cbk] 0-archive1-replicate-0: Failed to
> open /data/file1.txt on subvolume archive1-client-0 [Input/output error]
> *[2019-02-01 15:02:19.193009] W [fuse-bridge.c:2371:fuse_readv_cbk]
> 0-glusterfs-fuse: 147733: READ => -1
> gfid=b432c1d6-ece2-42f2-8749-b11e058c4be3 fd=0x7fc60408bbb8 (Transport
> endpoint is not connected)*
> [2019-02-01 15:02:19.193653] W [MSGID: 114028]
> [client-lk.c:347:delete_granted_locks_owner] 0-archive1-client-0: fdctx not
> valid [Invalid argument]
>
> And from FUSE on node2 (with heal copy):
> $ cat file1.txt
> file1
>
> It seems to be that node1 wants to get the file from its own brick, but
> the copy there is broken. Node2 gets the file from its own brick with a
> heal copy, so reading the file succeed.
> But I am wondering myself because sometimes reading the file from node1
> with the broken copy succeed
>
> What is the expected behaviour here? Is it possibly to read files with a
> corrupted copy from any client access?
>
> Regards
> David Spisla
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Amar Tumballi (amarts)
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Corrupted File readable via FUSE?

2019-02-01 Thread David Spisla
Hello Gluster Community,
I have got a 4 Node Cluster with a Replica 4 Volume, so each node has a
brick with a copy of a file. Now I tried out the bitrot functionality and
corrupt the copy on the brick of node1. After this I scrub ondemand and the
file is marked correctly as corrupted.

No I try to read that file from FUSE on node1 (with corrupt copy):
$ cat file1.txt
cat: file1.txt: Transport endpoint is not connected
FUSE log says:

*[2019-02-01 15:02:19.191984] E [MSGID: 114031]
[client-rpc-fops_v2.c:281:client4_0_open_cbk] 0-archive1-client-0: remote
operation failed. Path: /data/file1.txt
(b432c1d6-ece2-42f2-8749-b11e058c4be3) [Input/output error]*
[2019-02-01 15:02:19.192269] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fc642471329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fc642682af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fc64a78d218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 15:02:19.192714] E [MSGID: 108009]
[afr-open.c:220:afr_openfd_fix_open_cbk] 0-archive1-replicate-0: Failed to
open /data/file1.txt on subvolume archive1-client-0 [Input/output error]
*[2019-02-01 15:02:19.193009] W [fuse-bridge.c:2371:fuse_readv_cbk]
0-glusterfs-fuse: 147733: READ => -1
gfid=b432c1d6-ece2-42f2-8749-b11e058c4be3 fd=0x7fc60408bbb8 (Transport
endpoint is not connected)*
[2019-02-01 15:02:19.193653] W [MSGID: 114028]
[client-lk.c:347:delete_granted_locks_owner] 0-archive1-client-0: fdctx not
valid [Invalid argument]

And from FUSE on node2 (with heal copy):
$ cat file1.txt
file1

It seems to be that node1 wants to get the file from its own brick, but the
copy there is broken. Node2 gets the file from its own brick with a heal
copy, so reading the file succeed.
But I am wondering myself because sometimes reading the file from node1
with the broken copy succeed

What is the expected behaviour here? Is it possibly to read files with a
corrupted copy from any client access?

Regards
David Spisla
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users