[Gluster-users] VM freeze issue on simple gluster setup.

2019-12-05 Thread WK

I have a replica2+arbiter setup that is used for VMs.

IP #.1 is the arbiter.

IP #.2 and #.3 are the KVM hosts.

Two volumes are involved, running Gluster 6.5 on Ubuntu 18.04 with FUSE mounts.
The Gluster network uses a two-Ethernet-card teamd/round-robin setup,
which *should* have stayed up if one of the ports had failed.


I just had a number of VMs go read-only due to the communication
failure below at 22:00, but only on KVM host #2.


VMs on the same Gluster volumes on KVM host #3 were unaffected.

The logs on host #2 show the following:

[2019-12-05 22:00:43.739804] C 
[rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-GL1image-client-2: 
server 10.255.1.1:49153 has not responded in the last 21 seconds, 
disconnecting.
[2019-12-05 22:00:43.757095] C 
[rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-GL1image-client-1: 
server 10.255.1.3:49152 has not responded in the last 21 seconds, 
disconnecting.
[2019-12-05 22:00:43.757191] I [MSGID: 114018] 
[client.c:2323:client_rpc_notify] 0-GL1image-client-2: disconnected from 
GL1image-client-2. Client process will keep trying to connect to 
glusterd until brick's port is available
[2019-12-05 22:00:43.757246] I [MSGID: 114018] 
[client.c:2323:client_rpc_notify] 0-GL1image-client-1: disconnected from 
GL1image-client-1. Client process will keep trying to connect to 
glusterd until brick's port is available
[2019-12-05 22:00:43.757266] W [MSGID: 108001] 
[afr-common.c:5608:afr_notify] 0-GL1image-replicate-0: Client-quorum is 
not met
[2019-12-05 22:00:43.790639] E [rpc-clnt.c:346:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x139)[0x7f030d045f59] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcbb0)[0x7f030cdf0bb0] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xccce)[0x7f030cdf0cce] 
(--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x95)[0x7f030cdf1c45] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe890)[0x7f030cdf2890] 
) 0-GL1image-client-2: forced unwinding frame type(GlusterFS 4.x v1) 
op(FXATTROP(34)) called at 2019-12-05 22:00:19.736456 (xid=0x825bffb)
[2019-12-05 22:00:43.790655] W [MSGID: 114031] 
[client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-GL1image-client-2: 
remote operation failed
[2019-12-05 22:00:43.790686] E [rpc-clnt.c:346:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x139)[0x7f030d045f59] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcbb0)[0x7f030cdf0bb0] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xccce)[0x7f030cdf0cce] 
(--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x95)[0x7f030cdf1c45] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe890)[0x7f030cdf2890] 
) 0-GL1image-client-1: forced unwinding frame type(GlusterFS 4.x v1) 
op(FXATTROP(34)) called at 2019-12-05 22:00:19.736428 (xid=0x89fee01)
[2019-12-05 22:00:43.790703] W [MSGID: 114031] 
[client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-GL1image-client-1: 
remote operation failed
[2019-12-05 22:00:43.790774] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1393:client4_0_finodelk_cbk] 0-GL1image-client-1: 
remote operation failed [Transport endpoint is not connected]
[2019-12-05 22:00:43.790777] E [rpc-clnt.c:346:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x139)[0x7f030d045f59] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcbb0)[0x7f030cdf0bb0] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xccce)[0x7f030cdf0cce] 
(--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x95)[0x7f030cdf1c45] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe890)[0x7f030cdf2890] 
) 0-GL1image-client-2: forced unwinding frame type(GlusterFS 4.x v1) 
op(FXATTROP(34)) called at 2019-12-05 22:00:19.736542 (xid=0x825bffc)
[2019-12-05 22:00:43.790794] W [MSGID: 114029] 
[client-rpc-fops_v2.c:4873:client4_0_finodelk] 0-GL1image-client-1: 
failed to send the fop
[2019-12-05 22:00:43.790806] W [MSGID: 114031] 
[client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-GL1image-client-2: 
remote operation failed
[2019-12-05 22:00:43.790825] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1393:client4_0_finodelk_cbk] 0-GL1image-client-2: 
remote operation failed [Transport endpoint is not connected]
[2019-12-05 22:00:43.790842] W [MSGID: 114029] 
[client-rpc-fops_v2.c:4873:client4_0_finodelk] 0-GL1image-client-2: 
failed to send the fop


The fop/"Transport endpoint is not connected" errors repeat for another 50 lines
or so until 22:00:46, at which point the volumes appear to
be fine (though the VMs stayed read-only until I rebooted).


[2019-12-05 22:00:46.987242] W [fuse-bridge.c:2827:fuse_readv_cbk] 
0-glusterfs-fuse: 91701328: READ => -1 
gfid=d883b7c4-97f5-4f12-9373-7987cfc7dee4 fd=0x7f02f005b708 (Transport 
endpoint is not connected)
[2019-12-05 22:00:47.029947] W [fuse-bridge.c:2827:fuse_readv_cbk] 
0-glusterfs-fuse: 91701329: READ => -1 
gfid=d883b7c4-97f5-4f12-9373-7987cfc7dee4 
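
As a reading aid for the log above, here is a minimal Python sketch that pulls the ping-timeout events out of a client log and checks whether a majority of bricks was still reachable. The sample text is the wrapped log lines from this post joined back into single lines. The helper names (ping_timeouts, majority_reachable) are hypothetical, and the majority rule is a simplified assumption about how client quorum is evaluated on a three-brick replica set, not Gluster's actual code.

```python
import re

# Log lines from the post above, re-joined (the archive wrapped them).
SAMPLE_LOG = """\
[2019-12-05 22:00:43.739804] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-GL1image-client-2: server 10.255.1.1:49153 has not responded in the last 21 seconds, disconnecting.
[2019-12-05 22:00:43.757095] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-GL1image-client-1: server 10.255.1.3:49152 has not responded in the last 21 seconds, disconnecting.
[2019-12-05 22:00:43.757266] W [MSGID: 108001] [afr-common.c:5608:afr_notify] 0-GL1image-replicate-0: Client-quorum is not met
"""

# Matches only the critical (C) ping-timer-expired lines.
PING_TIMEOUT = re.compile(
    r"^\[(?P<when>[^\]]+)\] C \[[^\]]*rpc_clnt_ping_timer_expired\] "
    r"(?P<client>\S+): server (?P<server>\S+) has not responded "
    r"in the last (?P<seconds>\d+) seconds"
)


def ping_timeouts(log_text):
    """Return (timestamp, client, server, seconds) for each ping timeout."""
    events = []
    for line in log_text.splitlines():
        match = PING_TIMEOUT.match(line)
        if match:
            events.append(
                (match["when"], match["client"], match["server"],
                 int(match["seconds"]))
            )
    return events


def majority_reachable(total_bricks, lost_bricks):
    """Simplified assumption: quorum needs more than half of the bricks."""
    return (total_bricks - lost_bricks) > total_bricks / 2


events = ping_timeouts(SAMPLE_LOG)
lost_servers = {server for _, _, server, _ in events}
for when, client, server, seconds in events:
    print(f"{when}  {client} lost {server} after {seconds}s")
print("majority reachable:", majority_reachable(3, len(lost_servers)))
```

Under that assumption, losing both the arbiter (10.255.1.1) and the other data brick (10.255.1.3) at once leaves host #2 with one brick out of three, which is consistent with the "Client-quorum is not met" warning in the log.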

Re: [Gluster-users] In-place volume type conversion

2019-12-05 Thread Vijay Bellur
On Thu, Dec 5, 2019 at 7:37 AM Dmitry Antipov wrote:

> Is it technically possible/feasible to implement an in-place volume conversion,
> at least for volumes with the same number of bricks (say, from 'replica 3' to
> 'disperse 3')? If so, any thoughts on initial steps?
>
>
The backend layouts for replicate and disperse are different. Hence it is
not recommended to try out an in-place volume conversion for these types.

Regards,
Vijay
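
To illustrate why the two backend layouts differ, here is a rough Python sketch of how many bytes each brick would hold for one file. It assumes that 'disperse 3' means one redundancy brick, and it ignores the padding and metadata that a real disperse volume adds. The function names are hypothetical; this is arithmetic only, not Gluster code.

```python
import math


def replica_bytes_per_brick(file_size, replica_count):
    """Replicate: every brick stores a full copy of the file."""
    return [file_size] * replica_count


def disperse_bytes_per_brick(file_size, bricks, redundancy):
    """Disperse: every brick stores one erasure-coded fragment.

    Each fragment is roughly file_size / (bricks - redundancy) bytes.
    Padding to the stripe size is ignored here (simplifying assumption).
    """
    if bricks <= 2 * redundancy:
        raise ValueError("bricks must be greater than 2 * redundancy")
    data_bricks = bricks - redundancy
    fragment = math.ceil(file_size / data_bricks)
    return [fragment] * bricks


size = 1000  # bytes, arbitrary example
print("replica 3 :", replica_bytes_per_brick(size, 3))
print("disperse 3:", disperse_bytes_per_brick(size, 3, 1))
```

With the same three bricks, a replica brick holds the whole file while a disperse brick holds only an encoded fragment, so the data already on the bricks could not simply be relabelled as the other type.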


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] In-place volume type conversion

2019-12-05 Thread Dmitry Antipov

Is it technically possible/feasible to implement an in-place volume conversion,
at least for volumes with the same number of bricks (say, from 'replica 3' to
'disperse 3')? If so, any thoughts on initial steps?

Dmitry




[Gluster-users] glusterfind can't find module utils.py

2019-12-05 Thread David Spisla
Dear Gluster Community,

I tried to run glusterfind on my SLES15-SP1 machine using GlusterFS v5.10,
but I got this error when running it as root:

fs-davids-c1-n1:~ # glusterfind create --help
Traceback (most recent call last):
  File "/usr/bin/glusterfind", line 14, in <module>
    from glusterfind.main import main
  File "/usr/lib/glusterfs/glusterfind/main.py", line 26, in <module>
    from utils import execute, is_host_local, mkdirp, fail
ModuleNotFoundError: No module named 'utils'

This does not seem to be a standard Python 3 utils module, because this
path contains a utils.py that defines the functions execute, is_host_local,
mkdirp, and fail:

fs-davids-c1-n1:~ # ll /usr/lib/glusterfs/glusterfind/
total 112
-rwxr-xr-x 1 root root  1846 Oct 11 04:53 S57glusterfind-delete-post.py
-rw-r--r-- 1 root root   381 Oct 11 04:53 __init__.py
drwxr-xr-x 2 root root  4096 Dec  4 10:03 __pycache__
-rwxr-xr-x 1 root root  3737 Oct 11 04:53 brickfind.py
-rwxr-xr-x 1 root root 15342 Oct 11 04:53 changelog.py
-rw-r--r-- 1 root root 16079 Oct 11 04:53 changelogdata.py
-rw-r--r-- 1 root root   851 Oct 11 04:53 conf.py
-rw-r--r-- 1 root root  2507 Oct 11 04:53 libgfchangelog.py
-rw-r--r-- 1 root root 33317 Oct 11 04:53 main.py
-rwxr-xr-x 1 root root  5092 Oct 11 04:53 nodeagent.py
-rw-r--r-- 1 root root   365 Oct 11 04:53 tool.conf
-rw-r--r-- 1 root root  7195 Oct 11 04:53 utils.py

Why can't glusterfind find this utils.py?
Regards
David Spisla
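
One plausible explanation, offered as an assumption rather than a diagnosis of this SLES package: in Python 3, a bare "from utils import ..." inside a package module only works if the package's own directory is on sys.path, because implicit relative imports were removed. The self-contained Python sketch below reproduces that behaviour with a throwaway package. The names demo_find and demo_utils are hypothetical stand-ins for glusterfind and utils.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root):
    """Create demo_find/{__init__,demo_utils,main}.py under root."""
    pkg_dir = os.path.join(root, "demo_find")
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "demo_utils.py"), "w") as handle:
        handle.write("def execute():\n    return 'ok'\n")
    with open(os.path.join(pkg_dir, "main.py"), "w") as handle:
        # Same style as the failing line in the traceback: a bare import
        # of a sibling module.
        handle.write("from demo_utils import execute\n")
    return pkg_dir


def try_import(name):
    """Import name from a clean state; return 'ok' or the error text."""
    for cached in [m for m in sys.modules
                   if m == "demo_utils" or m.startswith("demo_find")]:
        del sys.modules[cached]
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return "ok"
    except ModuleNotFoundError as exc:
        return str(exc)


root = tempfile.mkdtemp()
pkg_dir = build_demo_package(root)

# Only the parent directory is importable: the bare import fails.
sys.path.insert(0, root)
result_without = try_import("demo_find.main")

# With the package directory itself on sys.path, the bare import resolves.
sys.path.insert(1, pkg_dir)
result_with = try_import("demo_find.main")

print("package dir not on sys.path:", result_without)
print("package dir on sys.path    :", result_with)
```

If the same mechanism applies here, it would be worth checking which directories /usr/bin/glusterfind adds to sys.path before line 14 and whether they match /usr/lib/glusterfs/glusterfind on this system.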




Re: [Gluster-users] Trying to fix files that don't want to heal

2019-12-05 Thread Gudrun Mareike Amedick
On Monday, 02.12.2019 at 02:15 -0500, Ashish Pandey wrote:
> 
> 
> From: "Gudrun Mareike Amedick" 
> To: "Ashish Pandey" 
> Cc: "Gluster-users" 
> Sent: Friday, November 29, 2019 8:45:13 PM
> Subject: Re: [Gluster-users] Trying to fix files that don't want to heal
> 
> Hi Ashish,
> 
> thanks for your reply. To fulfill the "no IO" requirement, I'll have to wait
> until the second week of December (9th – 14th).
> 
> We originally planned to update GlusterFS from 4.1.7 to 5 and then to 6 in
> December. Should we do that upgrade before or after running those scripts?
> 
> >> It would be best to do it before upgrading to a newer version.
> BTW, why are you not planning to upgrade to Gluster 7?

Okay, will do.

We have a ~400 TB volume, and restoring it is no fun. As a result, we keep a pretty
conservative update schedule.

> 
> 
> Kind regards
> 
> Gudrun
> 
> On Friday, 29.11.2019 at 00:38 -0500, Ashish Pandey wrote:
> > Hey Gudrun,
> > 
> > Could you please try to resolve it using the scripts?
> > We have written some scripts, and they are in the final phase before being merged:
> > https://review.gluster.org/#/c/glusterfs/+/23380/
> > 
> > You can find the steps for using these scripts in the README.md file.
> > 
> > ---
> > Ashish
> > 
> > From: "Gudrun Mareike Amedick" 
> > To: "Gluster-users" 
> > Sent: Thursday, November 28, 2019 3:57:18 PM
> > Subject: [Gluster-users] Trying to fix files that don't want to heal
> > 
> > Hi,
> > 
> > I have a distributed dispersed volume with files that don't want to heal. 
> > I'm trying to fix them manually. 
> > 
> > I'm currently working on a file that is present on all bricks. Its GFID exists
> > in the .glusterfs structure, and getfattr shows identical attributes for all
> > copies of the file. They look like this:
> > 
> > # getfattr -m. -d -e hex $brick/$somepath/libssl.so.1.1
> > getfattr: Removing leading '/' from absolute path names
> > # file: $brick/$somepath/libssl.so.1.1
> > trusted.ec.config=0x080602000200
> > trusted.ec.dirty=0x0001
> > trusted.ec.size=0x000a
> > trusted.ec.version=0x00040005
> > trusted.gfid=0xdd7dd64f6bb34b5f891a5e32fe83874f
> > trusted.gfid2path.0c3a5b76c518ef60=0x34663064396234332d343730342d343634352d613834342d3338303532336137346632662f6c696273736c2e736f2e312e31
> > trusted.gfid2path.578ce2ec37aa0f9d=0x31636136323433342d396132642d343039362d616265352d6463353065613131333066632f6c696273736c2e736f2e312e31
> > trusted.glusterfs.quota.1ca62434-9a2d-4096-abe5-dc50ea1130fc.contri.3=0x00029201
> > trusted.glusterfs.quota.4f0d9b43-4704-4645-a844-380523a74f2f.contri.3=0x00029201
> > trusted.pgfid.1ca62434-9a2d-4096-abe5-dc50ea1130fc=0x0001
> > trusted.pgfid.4f0d9b43-4704-4645-a844-380523a74f2f=0x0001
> > 
> > pgfid is "parent gfid", right? Both GFIDs refer to a directory in my volume, and
> > both of those directories contain a file named libssl.so.1.1. They seem to be
> > hard links:
> > 
> > find  $brick/$somepath  -samefile  $brick/$someotherpath/libssl.so.1.1
> > $brick/$somepath/libssl.so.1
> > 
> > This exceeds the limits of my GlusterFS knowledge. Is that something that
> > can/should happen? If not, is it the reason the file won't heal, and how do
> > I fix that?
> > 
> > Kind regards
> > 
> > Gudrun Amedick
> > 
> > 
> > 
> 
> 
> 
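
The trusted.gfid2path values in the quoted getfattr output are hex-encoded text, so they can be decoded to see which parent directory and file name each entry records. The Python sketch below does that for the two values from this thread and compares the result with the trusted.pgfid keys. The helper names are hypothetical, and reading each value as "<parent gfid>/<basename>" is an interpretation of the decoded strings, not a statement about Gluster internals.

```python
# Values copied from the getfattr output quoted in this thread.
XATTRS = {
    "trusted.gfid2path.0c3a5b76c518ef60":
        "0x34663064396234332d343730342d343634352d613834342d"
        "3338303532336137346632662f6c696273736c2e736f2e312e31",
    "trusted.gfid2path.578ce2ec37aa0f9d":
        "0x31636136323433342d396132642d343039362d616265352d"
        "6463353065613131333066632f6c696273736c2e736f2e312e31",
    "trusted.pgfid.1ca62434-9a2d-4096-abe5-dc50ea1130fc": "0x0001",
    "trusted.pgfid.4f0d9b43-4704-4645-a844-380523a74f2f": "0x0001",
}


def decode_hex_xattr(value):
    """Turn a '0x...' getfattr value into text."""
    digits = value[2:] if value.startswith("0x") else value
    return bytes.fromhex(digits).decode("utf-8")


def parent_links(xattrs):
    """Return (parent_gfid, basename) for every gfid2path entry."""
    links = []
    for key, value in sorted(xattrs.items()):
        if key.startswith("trusted.gfid2path."):
            parent_gfid, _, basename = decode_hex_xattr(value).partition("/")
            links.append((parent_gfid, basename))
    return links


links = parent_links(XATTRS)
pgfids = {key[len("trusted.pgfid."):]
          for key in XATTRS if key.startswith("trusted.pgfid.")}

for parent_gfid, basename in links:
    print(parent_gfid, basename, "pgfid present:", parent_gfid in pgfids)
```

Both entries decode to libssl.so.1.1 under a different parent GFID, and each parent GFID also appears as a trusted.pgfid key. That matches the observation in the thread that the file is hard-linked into two directories.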
