Re: [Gluster-users] Does replace-brick migrate data?

2019-06-07 Thread Nithya Balachandran
On Sat, 8 Jun 2019 at 01:29, Alan Orth  wrote:

> Dear Ravi,
>
> In the last week I have completed a fix-layout and a full INDEX heal on
> this volume. Now I've started a rebalance and I see a few terabytes of data
> going around on different bricks since yesterday, which I'm sure is good.
>
> While I wait for the rebalance to finish, I'm wondering if you know what
> would cause directories to be missing from the FUSE mount point? If I list
> the directories explicitly I can see their contents, but they do not appear
> in their parent directories' listing. In the case of duplicated files it is
> always because the files are not on the correct bricks (according to the
> Dynamo/Elastic Hash algorithm), and I can fix it by copying the file to the
> correct brick(s) and removing it from the others (along with their
> .glusterfs hard links). So what could cause directories to be missing?
>
Hi Alan,

The directories that don't show up in the parent directory listing probably
do not exist on the hashed subvolume. Please check the backend bricks to see
whether the directories are missing on any of them.
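
For example, something along these lines from any node (a rough, untested
sketch; the host:brick pairs are placeholders for the volume's real bricks,
which `gluster volume info <volname>` will list):

for brick in host1:/bricks/brick1 host2:/bricks/brick2; do
    host=${brick%%:*}; path=${brick#*:}
    echo "== $brick =="
    ssh "$host" "ls -ld $path/path/to/missing-directory"
done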

Regards,
Nithya

> Thank you,
>
> On Wed, Jun 5, 2019 at 1:08 AM Alan Orth  wrote:
>
>> Hi Ravi,
>>
>> You're right that I had mentioned using rsync to copy the brick content
>> to a new host, but in the end I actually decided not to bring it up on a
>> new brick. Instead I added the original brick back into the volume. So the
>> xattrs and symlinks to .glusterfs on the original brick are fine. I think
>> the problem probably lies with a remove-brick that got interrupted. A few
>> weeks ago during the maintenance I had tried to remove a brick and then
>> after twenty minutes and no obvious progress I stopped it—after that the
>> bricks were still part of the volume.
>>
>> In the last few days I have run a fix-layout that took 26 hours and
>> finished successfully. Then I started a full index heal and it has healed
>> about 3.3 million files in a few days and I see a clear increase of network
>> traffic from old brick host to new brick host over that time. Once the full
>> index heal completes I will try to do a rebalance.
>>
>> Thank you,
>>
>>
>> On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N 
>> wrote:
>>
>>>
>>> On 01/06/19 9:37 PM, Alan Orth wrote:
>>>
>>> Dear Ravi,
>>>
>>> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I
>>> could verify them for six bricks and millions of files, though... :\
>>>
>>> Hi Alan,
>>>
>>> The reason I asked this is because you had mentioned in one of your
>>> earlier emails that when you moved content from the old brick to the new
>>> one, you had skipped the .glusterfs directory. So I was assuming that when
>>> you added back this new brick to the cluster, it might have been missing
>>> the .glusterfs entries. If that is the case, one way to verify could be to
>>> check using a script if all files on the brick have a link-count of at
>>> least 2 and all dirs have valid symlinks inside .glusterfs pointing to
>>> themselves.
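>>>
>>> For example, something along these lines, run as root on each brick root
>>> (a rough, untested sketch; BRICK is a placeholder):
>>>
>>> BRICK=/path/to/brick/root
>>> # flag regular files outside .glusterfs whose link count is only 1
>>> # (it should be at least 2: the file plus its .glusterfs/xx/yy/<gfid> hard link)
>>> find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -links 1 -print
>>> # directory gfid entries under .glusterfs are symlinks; list any that do
>>> # not resolve to a directory
>>> find "$BRICK/.glusterfs" -type l ! -xtype d -print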
>>>
>>>
>>> I had a small success in fixing some issues with duplicated files on the
>>> FUSE mount point yesterday. I read quite a bit about the elastic hashing
>>> algorithm that determines which files get placed on which bricks based on
>>> the hash of their filename and the trusted.glusterfs.dht xattr on brick
>>> directories (thanks to Joe Julian's blog post and Python script for showing
>>> how it works¹). With that knowledge I looked closer at one of the files
>>> that was appearing as duplicated on the FUSE mount and found that it was
>>> also duplicated on more than `replica 2` bricks. For this particular file I
>>> found two "real" files and several zero-size files with
>>> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on
>>> the correct brick as far as the DHT layout is concerned, so I copied one of
>>> them to the correct brick, deleted the others and their hard links, and did
>>> a `stat` on the file from the FUSE mount point and it fixed itself. Yay!
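>>>
>>> (In case it helps anyone else, a rough sketch of that cleanup with
>>> placeholder paths; the matching .glusterfs hard link can be found by inode:)
>>>
>>> BRICK=/bricks/brick2                     # a brick holding a stale copy
>>> F=$BRICK/some/dir/duplicated.file        # the stale copy itself
>>> ino=$(stat -c %i "$F")
>>> find "$BRICK/.glusterfs" -inum "$ino"    # shows the matching gfid hard link
>>> # after double-checking:
>>> # find "$BRICK/.glusterfs" -inum "$ino" -delete && rm "$F"
>>> # then trigger a fresh lookup from a client:
>>> stat /mnt/volume/some/dir/duplicated.file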
>>>
>>> Could this have been caused by a replace-brick that got interrupted and
>>> didn't finish re-labeling the xattrs?
>>>
>>> No, replace-brick only initiates AFR self-heal, which just copies the
>>> contents from the other brick(s) of the *same* replica pair into the
>>> replaced brick.  The link-to files are created by DHT when you rename a
>>> file from the client. If the new name hashes to a different brick, DHT
>>> does not move the entire file there. It instead creates the link-to file
>>> (the one with the dht.linkto xattrs) on the hashed subvol. The value of
>>> this xattr points to the brick where the actual data resides (`getfattr -e
>>> text` to see it for yourself).  Perhaps you had attempted a rebalance or
>>> remove-brick earlier and interrupted that?
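>>>
>>> For example, on a brick (the path is just a placeholder):
>>>
>>> getfattr -d -m . -e text /bricks/brick1/some/dir/somefile
>>> # a link-to file is typically a 0-byte file with only the sticky bit set
>>> # (---------T) whose trusted.glusterfs.dht.linkto value names the subvolume
>>> # that holds the data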
>>>
>>> Should I be thinking of some heuristics to identify and fix these issues
>>> with a script (incorrect brick placement), or is this something a fix
>>> layout or repeated vo

Re: [Gluster-users] Does replace-brick migrate data?

2019-06-07 Thread Alan Orth
Dear Ravi,

In the last week I have completed a fix-layout and a full INDEX heal on
this volume. Now I've started a rebalance and I see a few terabytes of data
going around on different bricks since yesterday, which I'm sure is good.

While I wait for the rebalance to finish, I'm wondering if you know what
would cause directories to be missing from the FUSE mount point? If I list
the directories explicitly I can see their contents, but they do not appear
in their parent directories' listing. In the case of duplicated files it is
always because the files are not on the correct bricks (according to the
Dynamo/Elastic Hash algorithm), and I can fix it by copying the file to the
correct brick(s) and removing it from the others (along with their
.glusterfs hard links). So what could cause directories to be missing?

Thank you,

On Wed, Jun 5, 2019 at 1:08 AM Alan Orth  wrote:

> Hi Ravi,
>
> You're right that I had mentioned using rsync to copy the brick content to
> a new host, but in the end I actually decided not to bring it up on a new
> brick. Instead I added the original brick back into the volume. So the
> xattrs and symlinks to .glusterfs on the original brick are fine. I think
> the problem probably lies with a remove-brick that got interrupted. A few
> weeks ago during the maintenance I had tried to remove a brick and then
> after twenty minutes and no obvious progress I stopped it—after that the
> bricks were still part of the volume.
>
> In the last few days I have run a fix-layout that took 26 hours and
> finished successfully. Then I started a full index heal and it has healed
> about 3.3 million files in a few days and I see a clear increase of network
> traffic from old brick host to new brick host over that time. Once the full
> index heal completes I will try to do a rebalance.
>
> Thank you,
>
>
> On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N 
> wrote:
>
>>
>> On 01/06/19 9:37 PM, Alan Orth wrote:
>>
>> Dear Ravi,
>>
>> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I
>> could verify them for six bricks and millions of files, though... :\
>>
>> Hi Alan,
>>
>> The reason I asked this is because you had mentioned in one of your
>> earlier emails that when you moved content from the old brick to the new
>> one, you had skipped the .glusterfs directory. So I was assuming that when
>> you added back this new brick to the cluster, it might have been missing
>> the .glusterfs entries. If that is the case, one way to verify could be to
>> check using a script if all files on the brick have a link-count of at
>> least 2 and all dirs have valid symlinks inside .glusterfs pointing to
>> themselves.
>>
>>
>> I had a small success in fixing some issues with duplicated files on the
>> FUSE mount point yesterday. I read quite a bit about the elastic hashing
>> algorithm that determines which files get placed on which bricks based on
>> the hash of their filename and the trusted.glusterfs.dht xattr on brick
>> directories (thanks to Joe Julian's blog post and Python script for showing
>> how it works¹). With that knowledge I looked closer at one of the files
>> that was appearing as duplicated on the FUSE mount and found that it was
>> also duplicated on more than `replica 2` bricks. For this particular file I
>> found two "real" files and several zero-size files with
>> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on
>> the correct brick as far as the DHT layout is concerned, so I copied one of
>> them to the correct brick, deleted the others and their hard links, and did
>> a `stat` on the file from the FUSE mount point and it fixed itself. Yay!
>>
>> Could this have been caused by a replace-brick that got interrupted and
>> didn't finish re-labeling the xattrs?
>>
>> No, replace-brick only initiates AFR self-heal, which just copies the
>> contents from the other brick(s) of the *same* replica pair into the
>> replaced brick.  The link-to files are created by DHT when you rename a
>> file from the client. If the new name hashes to a different brick, DHT
>> does not move the entire file there. It instead creates the link-to file
>> (the one with the dht.linkto xattrs) on the hashed subvol. The value of
>> this xattr points to the brick where the actual data resides (`getfattr -e
>> text` to see it for yourself).  Perhaps you had attempted a rebalance or
>> remove-brick earlier and interrupted that?
>>
>> Should I be thinking of some heuristics to identify and fix these issues
>> with a script (incorrect brick placement), or is this something a fix
>> layout or repeated volume heals can fix? I've already completed a whole
>> heal on this particular volume this week and it did heal about 1,000,000
>> files (mostly data and metadata, but about 20,000 entry heals as well).
>>
>> Maybe you should let the AFR self-heals complete first and then attempt a
>> full rebalance to take care of the dht link-to files. But  if the files are
>> in millions, it could t

[Gluster-users] Gluster quorum lost

2019-06-07 Thread Edward Clay
Hello, I have a replica 3 volume that has lost quorum twice this week, causing
us much pain. What seems to happen is that one of the sans thinks one of the
other two peers has disconnected, and a few seconds later another disconnects,
causing quorum to be lost. This is painful because we have 7 ovirt hosts
connected to this gluster volume and they never seem to reattach on their own.
I was able to unmount the gluster mount manually on the ovirt hosts and then
run the commands to mount it again, and that seemed to get things working again.
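
For reference, the remount on each host was roughly along these lines (not the
exact commands; the mount point is taken from the log file name below, and
backup-volfile-servers is optional):

umount /rhev/data-center/mnt/glusterSD/10.4.16.11:gv1
mount -t glusterfs -o backup-volfile-servers=10.4.16.12:10.4.16.19 \
    10.4.16.11:/gv1 /rhev/data-center/mnt/glusterSD/10.4.16.11:gv1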

We have 3 sans running glusterfs 3.12.14-1 and nothing else.

# gluster volume info gv1

Volume Name: gv1
Type: Replicate
Volume ID: ea12f72d-a228-43ba-a360-4477cada292a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.4.16.19:/glusterfs/data1/gv1
Brick2: 10.4.16.11:/glusterfs/data1/gv1
Brick3: 10.4.16.12:/glusterfs/data1/gv1
Options Reconfigured:
nfs.register-with-portmap: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.self-heal-daemon: enable
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: 10.4.16.*
nfs.rpc-auth-allow: 10.4.16.*
nfs.disable: off
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36
nfs.addr-namelookup: off
nfs.export-volumes: on
network.ping-timeout: 50
cluster.server-quorum-ratio: 51%

They produced the following logs this morning; the first entry shown is the
first entry for 2019-06-07.

san3 seems to have an issue first:
[2019-06-07 14:23:20.670561] I [MSGID: 106004] 
[glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer 
<10.4.16.12> (), in state , has disconnected from glusterd.

[2019-06-07 14:23:20.774127] I [MSGID: 106004] 
[glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer 
<10.4.16.11> (<0f3090ee-080b-4a6b-9964-0ca86d801469>), in state , has disconnected from glusterd.

[2019-06-07 14:23:20.774413] C [MSGID: 106002] 
[glusterd-server-quorum.c:360:glusterd_do_volume_quorum_action] 0-management: 
Server quorum lost for volume gv1. Stopping local bricks.

san1 follows:
[2019-06-07 14:23:22.137405] I [MSGID: 106004] 
[glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer 
<10.4.16.12> (), in state , has disconnected from glusterd.

[2019-06-07 14:23:22.229343] I [MSGID: 106004] 
[glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer 
<10.4.16.19> (<238af98a-d2f1-491d-a1f1-64ace4eb6d3d>), in state , has disconnected from glusterd.

[2019-06-07 14:23:22.229618] C [MSGID: 106002] 
[glusterd-server-quorum.c:360:glusterd_do_volume_quorum_action] 0-management: 
Server quorum lost for volume gv1. Stopping local bricks.

san2 seems to be the last one standing but quorum gets lost:
[2019-06-07 14:23:26.611435] I [MSGID: 106004] 
[glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer 
<10.4.16.11> (<0f3090ee-080b-4a6b-9964-0ca86d801469>), in state , has disconnected from glusterd.

[2019-06-07 14:23:26.714137] I [MSGID: 106004] 
[glusterd-handler.c:6317:__glusterd_peer_rpc_notify] 0-management: Peer 
<10.4.16.19> (<238af98a-d2f1-491d-a1f1-64ace4eb6d3d>), in state , has disconnected from glusterd.

[2019-06-07 14:23:26.714405] C [MSGID: 106002] 
[glusterd-server-quorum.c:360:glusterd_do_volume_quorum_action] 0-management: 
Server quorum lost for volume gv1. Stopping local bricks.
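
For reference, the basic checks after the peers reconnect (gv1 is the volume
above; the force start is only needed if bricks stay offline once quorum
returns):

gluster peer status            # every peer should show "Peer in Cluster (Connected)"
gluster volume status gv1      # bricks stopped by the quorum action show N under Online
gluster volume start gv1 force # restarts any bricks that stayed offline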

On the ovirt hosts I see the following type of entries for the gluster volume
that's mounted, in
/var/log/glusterfs/rhev-data-center-mnt-glusterSD-10.4.16.11:gv1.log. They are
all pretty much the same entries on all 7 hosts.

hv6 seems to be the first host to complain:
[2019-06-07 14:23:22.190493] I [glusterfsd-mgmt.c:2424:mgmt_rpc_notify] 
0-glusterfsd-mgmt: disconnected from remote-host: 10.4.16.11
[2019-06-07 14:23:22.190540] I [glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 
0-glusterfsd-mgmt: connecting to next volfile server 10.4.16.19
[2019-06-07 14:23:32.618071] I [glusterfsd-mgmt.c:2005:mgmt_getspec_cbk] 
0-glusterfs: No change in volfile,continuing
[2019-06-07 14:23:33.651755] W [socket.c:719:__socket_rwv] 0-gv1-client-4: 
readv on 10.4.16.12:49152 failed (No data available)
[2019-06-07 14:23:33.651806] I [MSGID: 114018] 
[client.c:2288:client_rpc_notify] 0-gv1-client-4: disconnected from 
gv1-client-4. Client process will keep trying to connect to glusterd until 
brick's port is available

One thing I should point out here that is probably important: we are running
glusterfs 3.12.14-1 on the sans, but the ovirt hosts have been upgraded to
5.6-1. We stopped updating the gluster version on the sans after a previous
version had a memory leak that caused the sans to go down randomly, and
3.12.14-1 seems to have stopped that from happening. What I'm not finding is
whether there is an incompatibility between these versions that could

Re: [Gluster-users] healing of disperse volume

2019-06-07 Thread Ashish Pandey
Hi, 

First of all, the following command is not for disperse volumes -
gluster volume heal elastic-volume info split-brain

It is applicable to replicate volumes only.

Could you please let us know what exactly you want to test?

If you want to test a disperse volume against the failure of bricks or servers,
you can kill some of the brick processes - at most the redundancy count. In a
4+2 volume, 2 is the redundancy count. After killing two brick processes with
the kill command, you can write some data to the volume and then do a force
start of the volume:
gluster v <volname> start force
This will also restart the killed brick processes. At the end you should see
that the heal is done by the self-heal daemon and the volume becomes healthy
again.
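
A rough outline of that test with your volume (the brick PIDs come from the
status output):

gluster volume status elastic-volume        # note the PIDs of two bricks
kill <brick-pid-1> <brick-pid-2>            # at most 2 bricks in a 4+2 volume
# ... write some data through the mount point ...
gluster volume start elastic-volume force   # brings the killed bricks back
gluster volume heal elastic-volume info     # entries should drain as the self-heal daemon works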

--- 
Ashish 




- Original Message -

From: "fusillator"  
To: gluster-users@gluster.org 
Sent: Friday, June 7, 2019 2:09:01 AM 
Subject: [Gluster-users] healing of disperse volume 

Hi all, I'm pretty new to glusterfs. I managed to set up a dispersed
volume (4+2) using release 6.1 from the CentOS repository. Is it a stable
release?
Then I forced the volume to stop while the application was writing on the
mount point, deliberately producing an inconsistent state, and I'm
wondering what the best practices are to resolve this kind of
situation. I found a detailed explanation of how to resolve the
split-brain state of a replicated volume at
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
but it does not seem applicable to the disperse volume type.
Did I miss some important piece of documentation? Please point
me to some reference.
Here's some command detail: 

#gluster volume info elastic-volume 

Volume Name: elastic-volume 
Type: Disperse 
Volume ID: 96773fef-c443-465b-a518-6630bcf83397 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 1 x (4 + 2) = 6 
Transport-type: tcp 
Bricks: 
Brick1: dev-netflow01.fineco.it:/data/gfs/lv_elastic/brick1/brick 
Brick2: dev-netflow02.fineco.it:/data/gfs/lv_elastic/brick1/brick 
Brick3: dev-netflow03.fineco.it:/data/gfs/lv_elastic/brick1/brick 
Brick4: dev-netflow04.fineco.it:/data/gfs/lv_elastic/brick1/brick 
Brick5: dev-netflow05.fineco.it:/data/gfs/lv_elastic/brick1/brick 
Brick6: dev-netflow06.fineco.it:/data/gfs/lv_elastic/brick1/brick 
Options Reconfigured: 
performance.io-cache: off 
performance.io-thread-count: 64 
performance.write-behind-window-size: 100MB 
performance.cache-size: 1GB 
nfs.disable: on 
transport.address-family: inet 


# gluster volume heal elastic-volume info 
Brick dev01:/data/gfs/lv_elastic/brick1/brick 
 
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log 
 
 
 
 
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log 
Status: Connected 
Number of entries: 12 

Brick dev02:/data/gfs/lv_elastic/brick1/brick 
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log 
 
 
 
 
 
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log 
Status: Connected 
Number of entries: 12 

Brick dev03:/data/gfs/lv_elastic/brick1/brick 
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log 
 
 
 
 
 
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log 
Status: Connected 
Number of entries: 12 

Brick dev04:/data/gfs/lv_elastic/brick1/brick 
 
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-lib-httpwrapper.2019060615.log 
 
 
 
 
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-fns.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-chart.2019060615.log 
Status: Connected 
Number of entries: 12 

Brick dev05:/data/gfs/lv_elastic/brick1/brick 
/data/logs/20190606/ns-coreiol-iol-app-news.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-app-trkd.2019060615.log 
 
 
 
 
 
/data/logs/20190606/ns-coreiol-iol-app-listini.2019060615.log 
/data/logs/20190606/ns-coreiol-iol-lib-managers.2019060615.log 
/data/logs/2