Re: [Gluster-users] Gluster 11.1 - heal hangs (again)

2024-04-23 Thread Hu Bert
Howdy,
I was able to solve the problem. I had 2 options: reset-brick (i.e.
reconfigure) or replace-brick (i.e. full sync). I tried reset-brick
first...

gluster volume reset-brick sourceimages gluster190:/gluster/md3/sourceimages start
[... do nothing ...]
gluster volume reset-brick sourceimages gluster190:/gluster/md3/sourceimages gluster190:/gluster/md3/sourceimages commit force

After that the pending heals started, dropped to 0 pretty fast, and the
connected clients are now identical on all 3 servers.
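
A quick way to double-check the result afterwards (both commands are the same
ones used elsewhere in this thread, repeated here only for completeness):

# pending heals should drop to 0 on every brick
gluster volume heal sourceimages statistics heal-count

# every brick should now report the same set of connected clients
gluster volume status sourceimages clients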


Thx for reading,

Hubert


On Tue, 23 Apr 2024 at 08:46, Hu Bert wrote:
>
> Hi,
>
> referring to this thread:
> https://lists.gluster.org/pipermail/gluster-users/2024-January/040465.html
> especially:
> https://lists.gluster.org/pipermail/gluster-users/2024-January/040513.html
>
> [...]

Re: [Gluster-users] Gluster 11.1 - heal hangs (again)

2024-04-23 Thread Hu Bert
Ah, logs: nothing in glustershd.log on the 3 gluster servers. But one
client shows this in /var/log/glusterfs/data-sourceimages.log:

[2024-04-23 06:54:21.456157 +] W [MSGID: 114061]
[client-common.c:796:client_pre_lk_v2] 0-sourceimages-client-2:
remote_fd is -1. EBADFD [{gfid=a1817071-2949-4145-a96a-874159e46511},
{errno=77}, {error=File descriptor in bad state}]
[2024-04-23 06:54:21.456195 +] E [MSGID: 108028]
[afr-open.c:361:afr_is_reopen_allowed_cbk] 0-sourceimages-replicate-0:
Failed getlk for a1817071-2949-4145-a96a-874159e46511 [File descriptor
in bad state]
[2024-04-23 06:54:21.488511 +] W [MSGID: 114061]
[client-common.c:530:client_pre_flush_v2] 0-sourceimages-client-2:
remote_fd is -1. EBADFD [{gfid=a1817071-2949-4145-a96a-874159e46511},
{errno=77}, {error=File descriptor in bad state}]


On Tue, 23 Apr 2024 at 08:46, Hu Bert wrote:
>
> Hi,
>
> referring to this thread:
> https://lists.gluster.org/pipermail/gluster-users/2024-January/040465.html
> especially:
> https://lists.gluster.org/pipermail/gluster-users/2024-January/040513.html
>
> [...]

[Gluster-users] Gluster 11.1 - heal hangs (again)

2024-04-23 Thread Hu Bert
Hi,

referring to this thread:
https://lists.gluster.org/pipermail/gluster-users/2024-January/040465.html
especially: 
https://lists.gluster.org/pipermail/gluster-users/2024-January/040513.html

I've updated+rebooted 3 servers (Debian bookworm) running Gluster 11.1.
The first 2 servers went fine - gluster volume ok, no pending heals - so
after a couple of minutes I rebooted the 3rd server. And now I have the
same problem again: pending heals are counting up, but no heals happen.
gluster volume status+info are ok, gluster peer status is ok.

Full volume status+info: https://pastebin.com/aEEEKn7h

Volume Name: sourceimages
Type: Replicate
Volume ID: d6a559a1-ca4c-48c7-8adf-89048333bb58
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster188:/gluster/md3/sourceimages
Brick2: gluster189:/gluster/md3/sourceimages
Brick3: gluster190:/gluster/md3/sourceimages

Internal IPs:
gluster188: 192.168.0.188
gluster189: 192.168.0.189
gluster190: 192.168.0.190

After rebooting the 3rd server (gluster190) the client info looks like this:

gluster volume status sourceimages clients
Client connections for volume sourceimages
--
Brick : gluster188:/gluster/md3/sourceimages
Clients connected : 17
Hostname                   BytesRead    BytesWritten   OpVersion
--------                   ---------    ------------   ---------
192.168.0.188:49151          1047856          988364   11
192.168.0.189:49149           930792          654096   11
192.168.0.109:49147           271598          279908   11
192.168.0.223:49147           126764          130964   11
192.168.0.222:49146           125848          130144   11
192.168.0.2:49147             273756        43400387   11
192.168.0.15:49147          57248531        14327465   11
192.168.0.126:49147         32282645       671284763   11
192.168.0.94:49146            125520          128864   11
192.168.0.66:49146          34086248       666519388   11
192.168.0.99:49146           3051076       522652843   11
192.168.0.16:49146         149773024         1049035   11
192.168.0.110:49146          1574768       566124922   11
192.168.0.106:49146        152640790       146483580   11
192.168.0.91:49133          89548971        82709793   11
192.168.0.190:49149             4132            6540   11
192.168.0.118:49133            92176           92884   11
--
Brick : gluster189:/gluster/md3/sourceimages
Clients connected : 17
Hostname                   BytesRead    BytesWritten   OpVersion
--------                   ---------    ------------   ---------
192.168.0.188:49146           935172          658268   11
192.168.0.189:49151          1039048          977920   11
192.168.0.126:49146         27106555       231766764   11
192.168.0.110:49147          1121696       226426262   11
192.168.0.16:49147         147165735          994015   11
192.168.0.106:49147        152476618         1091156   11
192.168.0.94:49147            109612          112688   11
192.168.0.109:49146           180819         1489715   11
192.168.0.223:49146           110708          114316   11
192.168.0.99:49147           2573412       157737429   11
192.168.0.2:49145             242696        26088710   11
192.168.0.222:49145           109728          113064   11
192.168.0.66:49145          27003740       215124678   11
192.168.0.15:49145          57217513          594699   11
192.168.0.91:49132          89463431         2714920   11
192.168.0.190:49148             4132            6540   11
192.168.0.118:49131            92380           94996   11
--
Brick : gluster190:/gluster/md3/sourceimages
Clients connected : 2
Hostname 

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-02-15 Thread Hu Bert
Hello,
just to bring this to an end... the servers and the volume are "out of
service", so I tried to repair them.

- unmounted all related mounts
- rebooted the misbehaving server
- mounted the volume on all clients

Well, no healing happens. 'gluster volume status workdata clients'
looks good btw.

gluster volume heal workdata statistics heal-count: empty.
gluster volume heal workdata info: lists lots of files
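
As a hedged aside (not something tried in this thread): newer Gluster releases
also have a summary view that can be quicker than the full listing when
heal-count returns nothing:

# per-brick overview of entries pending heal / in split-brain
gluster volume heal workdata info summary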

glustershd on the "good" servers:
[2024-02-15 09:31:32.427779 +] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-13:
remote operation failed.
[{path=},
{gfid=2a9dfe1d-c617-4ca5-9842-52675581c880}, {errno=2}, {error=No such file or directory}]

glustershd on the "bad" server:
[2024-02-15 09:32:18.613343 +] E [MSGID: 108008]
[afr-self-heal-common.c:399:afr_gfid_split_brain_source]
0-workdata-replicate-2: Gfid mismatch detected for
/854>, bb8e53c7-0446-4f82-bd23-12253e8484db on workdata-client-8 and
a42769e2-f6ba-44b0-ad8c-1e451ba943a6 on workdata-client-6.
[2024-02-15 09:32:18.613550 +] E [MSGID: 108008]
[afr-self-heal-entry.c:465:afr_selfheal_detect_gfid_and_type_mismatch]
0-workdata-replicate-2: Skipping conservative merge on the file.

Well, I won't put any more work into this. The volume is screwed up
and has been replaced by a different solution. The servers will be
dismissed soon.


Thx for all your efforts,

Hubert

Am Mi., 31. Jan. 2024 um 17:10 Uhr schrieb Strahil Nikolov
:
>
> Hi,
>
> This is a simplified description, see the links below for a more detailed one.
> When a client makes a change to a file - it  commits that change to all 
> bricks simultaneously and if the change passes on a quorate number of bricks 
> (in your case 2 out of 3 is enough) it is treated as successful.
> During that phase the 2 bricks, that successfully have completed the task, 
> will mark the 3rd brick as 'dirty' and you will see that in the heal report.
> Only when the heal daemon syncs the file to the final brick, that heal will 
> be cleaned from the remaining bricks.
>
>
> If a client has only 2 out of 3 bricks connected, it will constantly create 
> new files for healing (as it can't save it on all 3) and this can even get 
> worse with the increase of the number of clients that fail to connect to the 
> 3rd brick.
>
> Check that all client's IPs are connected to all bricks and those that are 
> not - remount the volume. After remounting the behavior should not persist. 
> If it does - check with your network/firewall team for troubleshooting the 
> problem.
>
> You can use 'gluster volume status all client-list'  and 'gluster volume 
> status all clients' (where 'all' can be replaced by the volume name) to find 
> more details on that side.
>
> You can find a more detailed explanation of the whole process at this blog:
> https://ravispeaks.wordpress.com/2019/04/05/glusterfs-afr-the-complete-guide/
>
> https://ravispeaks.wordpress.com/2019/04/15/gluster-afr-the-complete-guide-part-2/
>
> https://ravispeaks.wordpress.com/2019/05/14/gluster-afr-the-complete-guide-part-3/
>
>
> Best Regards,
> Strahil Nikolov
>
>
>
> On Tue, Jan 30, 2024 at 15:26, Hu Bert
>  wrote:
> Hi Strahil,
> hm, not sure what the clients have to do with the situation. "gluster
> volume status workdata clients" - lists all clients with their IP
> addresses.
>
> "gluster peer status" and "gluster volume status" are ok, the latter
> one says that all bricks are online, have a port etc. The network is
> okay, ping works etc. Well, made a check on one client: umount gluster
> volume, remount, now the client appears in the list. Yeah... but why
> now? Will try a few more... not that easy as most of these systems are
> in production...
>
> I had enabled the 3 self-heal values, but that didn't have any effect
> back then. And, honestly, i won't do it now, because: if the heal
> started now that would probably slow down the live system (with the
> clients). I'll try it when the cluster isn't used anymore.
>
> Interesting - new messages incoming on the "bad" server:
>
> [2024-01-30 14:15:11,820] INFO [utils - 67:log_event] - {'nodeid':
> '8ea1e6b4-9c77-4390-96a7-8724c3f9dc0f', 'ts': 1706620511, 'event':
> 'AFR_SPLIT_BRAIN', 'message': {'client-pid': '-6', 'subvol':
> 'workdata-replicate-2', 'type': 'gfid', '
> file': '/756>', 'count':
> '2', 'child-2': 'workdata-client-8', 'gfid-2':
> '39807be6-b7de-4a82-8a22-cf61b1415208', 'child-0':
> 'workdata-client-6', 'gfid-0': 'bb4a12ec-f9b7-46bc-9fb3-c57730f1fc49'}
> }
> [2024-01-30 14:15:17,028] INFO [utils - 67:log_event] - {'nodeid':
> '8ea1e6b4-9c77-4390-96a7-8724c3f9dc0f', 'ts': 1706620517, 'event':
> 'AFR_SPLIT_BRAIN', 'message': {'client-pid': '-6', 'subvol':
> 'workdata-replic

Re: [Gluster-users] Challenges with Replicated Gluster volume after stopping Gluster on any node.

2024-02-06 Thread Hu Bert
Hi Anant,
e.g. if I reboot a gluster server, this is the only script that I use - I
don't use systemctl at all.
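
For reference, a rough sketch of what that looks like on our side; the script
path is where the glusterfs-server package usually installs it, so treat it as
an assumption and check your distribution:

# stop bricks, the self-heal daemon and the other gluster processes before the reboot
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh

# make sure nothing gluster-related is left running
pgrep -fa gluster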

Best regards,
Hubert


On Mon, 5 Feb 2024 at 19:22, Anant Saraswat <anant.saras...@techblue.co.uk> wrote:

> Hi Hu,
>
> Yes, I have used the "stop-all-gluster-processes.sh" after systemctl stop.
>
> I consider "stop-all-gluster-processes.sh" as the last resort to kill all
> the remaining gluster processes. Do you use it primarily to stop the
> gluster?
>
> Thanks,
> Anant
>
> --
> *From:* Hu Bert 
> *Sent:* Monday, February 5, 2024 6:15 PM
> *To:* Anant Saraswat 
> *Cc:* gluster-users@gluster.org 
> *Subject:* Re: [Gluster-users] Challenges with Replicated Gluster volume
> after stopping Gluster on any node.
>
>
> Hi,
> normally, when we shut down or reboot one of the (server) nodes, we call
> the "stop-all-gluster-processes.sh" script. But i think you did that, right?
>
>
> Best regards,
> Hubert
>
>
On Mon, 5 Feb 2024 at 13:35, Anant Saraswat <anant.saras...@techblue.co.uk> wrote:
>
>> Hello Everyone,
>>
>> We have a replicated Gluster volume with three nodes, and we face a
>> strange issue whenever we need to restart one of the nodes in this cluster.
>>
>> As per my understanding, if we shut down one node, the Gluster mount
>> should smoothly connect to another remaining Gluster server and shouldn't
>> create any issues.
>>
>> In our setup, when we stop Gluster on any of the nodes, we mostly get the
>> error 'Transport endpoint is not connected' on the clients. When we run the
>> commands to check the connected clients on the Gluster server, we get the
>> following error:
>>
>> gluster volume status tier1data clients
>> FAILED: Commit failed on master1. Error: Unable to decode brick op
>> response.
>>
>> Could anyone recommend a potential solution for this?
>>
>> Thanks,
>> Anant
>>
>> Anant Saraswat
>>
>> DevOps Lead
>>
>> IT | Technology Blueprint Ltd.
>> +44-8450047142 (5020) | +91-9818761614
>> anant.saras...@techblue.co.uk
>> https://www.technologyblueprint.co.uk
>> Salisbury House, Station Road, Cambridge, Cambridgeshire, CB1 2LA
>>

Re: [Gluster-users] Challenges with Replicated Gluster volume after stopping Gluster on any node.

2024-02-05 Thread Hu Bert
Hi,
normally, when we shut down or reboot one of the (server) nodes, we call
the "stop-all-gluster-processes.sh" script. But I think you did that, right?


Best regards,
Hubert


On Mon, 5 Feb 2024 at 13:35, Anant Saraswat <anant.saras...@techblue.co.uk> wrote:

> Hello Everyone,
>
> We have a replicated Gluster volume with three nodes, and we face a
> strange issue whenever we need to restart one of the nodes in this cluster.
>
> As per my understanding, if we shut down one node, the Gluster mount
> should smoothly connect to another remaining Gluster server and shouldn't
> create any issues.
>
> In our setup, when we stop Gluster on any of the nodes, we mostly get the
> error 'Transport endpoint is not connected' on the clients. When we run the
> commands to check the connected clients on the Gluster server, we get the
> following error:
>
> gluster volume status tier1data clients
> FAILED: Commit failed on master1. Error: Unable to decode brick op
> response.
>
> Could anyone recommend a potential solution for this?
>
> Thanks,
> Anant
>
> Anant Saraswat
>
> DevOps Lead
>
> IT | Technology Blueprint Ltd.
> +44-8450047142 (5020) | +91-9818761614
> anant.saras...@techblue.co.uk
> https://www.technologyblueprint.co.uk
> Salisbury House, Station Road, Cambridge, Cambridgeshire, CB1 2LA
>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-30 Thread Hu Bert
Hi Strahil,
hm, not sure what the clients have to do with the situation. "gluster
volume status workdata clients" - lists all clients with their IP
addresses.

"gluster peer status" and "gluster volume status" are ok, the latter
one says that all bricks are online, have a port etc. The network is
okay, ping works etc. Well, I made a check on one client: unmounted the
gluster volume, remounted it, and now the client appears in the list.
Yeah... but why now? I will try a few more... not that easy, as most of
these systems are in production...
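
As a side note, one way to spot which clients are missing on a brick without
remounting them one by one is to post-process that output; a rough sketch (not
from this thread, field positions assumed from the listing below):

# dump the per-brick client list once
gluster volume status workdata clients > /tmp/clients.txt

# build "brick  client-ip" pairs, then count on how many bricks each IP shows up;
# on a 5 x 3 volume every client should appear once per brick, i.e. 15 times
awk '/^Brick/ {brick=$3} /^[0-9]+\./ {split($1,a,":"); print brick, a[1]}' /tmp/clients.txt \
  | sort -u | awk '{print $2}' | sort | uniq -c | sort -n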

I had enabled the 3 self-heal values, but that didn't have any effect
back then. And, honestly, I won't do it now, because if the heal
started now, it would probably slow down the live system (with the
clients). I'll try it when the cluster isn't used anymore.

Interesting - new messages incoming on the "bad" server:

[2024-01-30 14:15:11,820] INFO [utils - 67:log_event] - {'nodeid':
'8ea1e6b4-9c77-4390-96a7-8724c3f9dc0f', 'ts': 1706620511, 'event':
'AFR_SPLIT_BRAIN', 'message': {'client-pid': '-6', 'subvol':
'workdata-replicate-2', 'type': 'gfid', 'file': '/756>', 'count':
'2', 'child-2': 'workdata-client-8', 'gfid-2':
'39807be6-b7de-4a82-8a22-cf61b1415208', 'child-0':
'workdata-client-6', 'gfid-0': 'bb4a12ec-f9b7-46bc-9fb3-c57730f1fc49'}}
[2024-01-30 14:15:17,028] INFO [utils - 67:log_event] - {'nodeid':
'8ea1e6b4-9c77-4390-96a7-8724c3f9dc0f', 'ts': 1706620517, 'event':
'AFR_SPLIT_BRAIN', 'message': {'client-pid': '-6', 'subvol':
'workdata-replicate-4', 'type': 'gfid', 'file': '/94259611>',
'count': '2', 'child-2': 'workdata-client-14', 'gfid-2':
'01234675-17b9-4523-a598-5e331a72c4fa', 'child-0':
'workdata-client-12', 'gfid-0': 'b11140bd-355b-4583-9a85-5d0608589f97'}}

They didn't appear in the beginning. Looks like a funny state that
this volume is in :D


Thx & best regards,

Hubert

On Tue, 30 Jan 2024 at 07:14, Strahil Nikolov wrote:
>
> This is your problem : bad server has only 3 clients.
>
> I remember there is another gluster volume command to list the IPs of the 
> clients. Find it and run it to find which clients are actually OK (those 3) 
> and the remaining 17 are not.
>
> Then try to remount those 17 clients and if the situation persists - work
> with your Network Team to identify why the 17 clients can't reach the brick.
>
> Do you have selfheal enabled?
>
> cluster.data-self-heal
> cluster.entry-self-heal
> cluster.metadata-self-heal
>
>
> Best Regards,
>
> Strahil Nikolov
>
> On Mon, Jan 29, 2024 at 10:26, Hu Bert
>  wrote:
> Hi,
> not sure what you mean with "clients" - do you mean the clients that
> mount the volume?
>
> gluster volume status workdata clients
> [...]
>
> 192.168.0.44 (glusterpub3) is the "bad" server. Not sure what you mean
> by "old" - probably not the age of the server, but rather the gluster
> version. op-version is 11 on all servers+clients, upgraded from
> 10.4 -> 11.1
>
> "Have you checked if a client is not allowed to update all 3 copies ?"
> -> are there special log messages for that?
>
> "If it's only 1 system, you can remove the brick, reinitialize it and
> then bring it back for a full sync."
> -> 
> https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#replace-brick
> -> Replacing bricks in Replicate/Distributed Replicate volumes
>
> this part, right? Well, can't do this right now, as there are ~33TB of
> data (many small files) to copy, that would slow down the servers /
> the volume. But if the replacement is running i could do it
> afterwards, just to see what happens.
>
>
> Huber

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-29 Thread Hu Bert
Hi,
not sure what you mean by "clients" - do you mean the clients that
mount the volume?

gluster volume status workdata clients
--
Brick : glusterpub2:/gluster/md3/workdata
Clients connected : 20
Hostname                   BytesRead    BytesWritten   OpVersion
--------                   ---------    ------------   ---------
192.168.0.222:49140         43698212        41152108   11
[...shortened...]
192.168.0.126:49123       8362352021     16445401205   11
--
Brick : glusterpub3:/gluster/md3/workdata
Clients connected : 3
Hostname                   BytesRead    BytesWritten   OpVersion
--------                   ---------    ------------   ---------
192.168.0.44:49150        5855740279     63649538575   11
192.168.0.44:49137         308958200       319216608   11
192.168.0.126:49120       7524915770     15489813449   11

192.168.0.44 (glusterpub3) is the "bad" server. Not sure what you mean
by "old" - probably not the age of the server, but rather the gluster
version. op-version is 11 on all servers+clients, upgraded from
10.4 -> 11.1

"Have you checked if a client is not allowed to update all 3 copies ?"
-> are there special log messages for that?

"If it's only 1 system, you can remove the brick, reinitialize it and
then bring it back for a full sync."
-> 
https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#replace-brick
-> Replacing bricks in Replicate/Distributed Replicate volumes

this part, right? Well, I can't do this right now, as there are ~33TB of
data (many small files) to copy, and that would slow down the servers /
the volume. But once the replacement is running I could do it
afterwards, just to see what happens.
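
For the record, the procedure from the linked docs boils down to something like
this sketch; the new brick path (md3_new) is only a placeholder, and self-heal
then copies the data over from the two good replicas:

# swap the brick on the bad server for an empty one (placeholder path)
gluster volume replace-brick workdata \
    glusterpub3:/gluster/md3/workdata glusterpub3:/gluster/md3_new/workdata \
    commit force

# watch the resync
gluster volume heal workdata statistics heal-count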


Hubert

On Mon, 29 Jan 2024 at 08:21, Strahil Nikolov wrote:
>
> 2800 is too much. Most probably you are affected by a bug. How old are the 
> clients ? Is only 1 server affected ?
> Have you checked if a client is not allowed to update all 3 copies ?
>
> If it's only 1 system, you can remove the brick, reinitialize it and then 
> bring it back for a full sync.
>
> Best Regards,
> Strahil Nikolov
>
> On Mon, Jan 29, 2024 at 8:44, Hu Bert
>  wrote:
> Morning,
> a few bad apples - but which ones? Checked glustershd.log on the "bad"
> server and counted todays "gfid mismatch" entries (2800 in total):
>
> 44 /212>,
> 44 /174>,
> 44 /94037803>,
> 44 /94066216>,
> 44 /249771609>,
> 44 /64235523>,
> 44 /185>,
>
> etc. But as i said, these are pretty new and didn't appear when the
> volume/servers started missbehaving. Are there scripts/snippets
> available how one could handle this?
>
> Healing would be very painful for the running system (still connected,
> but not very long anymore), as there surely are 4-5 million entries to
> be healed. I can't do this now - maybe, when the replacement is in
> productive state, one could give it a try.
>
> Thx,
> Hubert
>
> On Sun, 28 Jan 2024 at 23:12, Strahil Nikolov wrote:
> >
> > From gfid mismatch a manual effort is needed but you can script it.
> > I think that a few bad "apples" can break the healing and if you fix them 
> > the healing might be recovered.
> >
> > Also, check why the client is not updating all copies. Most probably you 
> > have a client that is not able to connect to a brick.
> >
> > gluster volume status VOLUME_NAME clients
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Sun, Jan 28, 2024 at 20:55, Hu Bert
> >  wrote:
> > Hi Strahil,
> > there's no arbiter: 3 servers with 5 bricks each.
> >
> > Volume Name: workdata
> > Type: Distributed-Replicate
> > Volume ID: 7d1e23e5-0308-4443-a832-d36f85ff7959
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 5 x 3 = 15
> >
> > The "problem" is: the number of files/entries to-be-healed has
> > continuously grown since the beginning, and now we're talking about
> > way too many files to do this manually. Last time i checked: 700K per
> > brick, should be >900K at the moment. The command 'gluster volume heal
> > workdata statistics heal-count' is unable to finish. Doesn't look that
> > good :D
> >
> > Interesting, the glustershd.log on the "bad" server now shows errors like 
> > these:
> >

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-28 Thread Hu Bert
Morning,
a few bad apples - but which ones? I checked glustershd.log on the "bad"
server and counted today's "gfid mismatch" entries (2800 in total):

44 /212>,
44 /174>,
44 /94037803>,
44 /94066216>,
44 /249771609>,
44 /64235523>,
44 /185>,

etc. But as I said, these are pretty new and didn't appear when the
volume/servers started misbehaving. Are there scripts/snippets
available on how one could handle this?
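
For the counting itself, something along these lines does the job (a minimal
sketch; the log path is the default one and the message format is taken from
the excerpt quoted further down):

# count today's "Gfid mismatch" entries per parent entry in glustershd.log
grep 'Gfid mismatch detected for' /var/log/glusterfs/glustershd.log \
  | grep "$(date -u +%Y-%m-%d)" \
  | sed -e 's/.*detected for //' -e 's/,.*//' \
  | sort | uniq -c | sort -rn | head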

Healing would be very painful for the running system (still connected,
but not for much longer), as there surely are 4-5 million entries to
be healed. I can't do this now - maybe, once the replacement is in
production, one could give it a try.

Thx,
Hubert

On Sun, 28 Jan 2024 at 23:12, Strahil Nikolov wrote:
>
> From gfid mismatch a manual effort is needed but you can script it.
> I think that a few bad "apples" can break the healing and if you fix them the 
> healing might be recovered.
>
> Also, check why the client is not updating all copies. Most probably you have 
> a client that is not able to connect to a brick.
>
> gluster volume status VOLUME_NAME clients
>
> Best Regards,
> Strahil Nikolov
>
> On Sun, Jan 28, 2024 at 20:55, Hu Bert
>  wrote:
> Hi Strahil,
> there's no arbiter: 3 servers with 5 bricks each.
>
> Volume Name: workdata
> Type: Distributed-Replicate
> Volume ID: 7d1e23e5-0308-4443-a832-d36f85ff7959
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 5 x 3 = 15
>
> The "problem" is: the number of files/entries to-be-healed has
> continuously grown since the beginning, and now we're talking about
> way too many files to do this manually. Last time i checked: 700K per
> brick, should be >900K at the moment. The command 'gluster volume heal
> workdata statistics heal-count' is unable to finish. Doesn't look that
> good :D
>
> Interesting, the glustershd.log on the "bad" server now shows errors like 
> these:
>
> [2024-01-28 18:48:33.734053 +] E [MSGID: 108008]
> [afr-self-heal-common.c:399:afr_gfid_split_brain_source]
> 0-workdata-replicate-3: Gfid mismatch detected for
> /803620716>,
> 82d7939a-8919-40ea-
> 9459-7b8af23d3b72 on workdata-client-11 and
> bb9399a3-0a5c-4cd1-b2b1-3ee787ec835a on workdata-client-9
>
> Shouldn't the heals happen on the 2 "good" servers?
>
> Anyway... we're currently preparing a different solution for our data
> and we'll throw away this gluster volume - no critical data will be
> lost, as these are derived from source data (on a different volume on
> different servers). Will be a hard time (calculating tons of data),
> but the chosen solution should have a way better performance.
>
> Well... thx to all for your efforts, really appreciate that :-)
>
>
> Hubert
>
> On Sun, 28 Jan 2024 at 08:35, Strahil Nikolov wrote:
> >
> > What about the arbiter node ?
> > Actually, check on all nodes and script it - you might need it in the 
> > future.
> >
> > Simplest way to resolve is to make the file disappear (rename it to something
> > else and then rename it back). Another easy trick is to read the whole
> > file: dd if=file of=/dev/null status=progress
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Sat, Jan 27, 2024 at 8:24, Hu Bert
> >  wrote:
> > Morning,
> >
> > gfid1:
> > getfattr -d -e hex -m.
> > /gluster/md{3,4,5,6,7}/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> >
> > glusterpub1 (good one):
> > getfattr: Removing leading '/' from absolute path names
> > # file: 
> > gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> > trusted.afr.dirty=0x
> > trusted.afr.workdata-client-11=0x00020001
> > trusted.gfid=0xfaf5956610f54ddd8b0ca87bc6a334fb
> > trusted.gfid2path.c2845024cc9b402e=0x38633139626234612d396236382d343532652d623434652d3664616331666434616465652f31323878313238732e6a7067
> > trusted.glusterfs.mdata=0x0165aaecff2695ebb765aaecff2695ebb765aaecff2533f110
> >
> > glusterpub3 (bad one):
> > getfattr: 
> > /gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb:
> > No such file or directory
> >
> > gfid 2:
> > getfattr -d -e hex -m.
> > /gluster/md{3,4,5,6,7}/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
> >
> > glusterpub1 (good one):
> > getfattr: Removing leading '/' from absolute path names
> > # file: 
> > gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
> > truste

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-28 Thread Hu Bert
Hi Strahil,
there's no arbiter: 3 servers with 5 bricks each.

Volume Name: workdata
Type: Distributed-Replicate
Volume ID: 7d1e23e5-0308-4443-a832-d36f85ff7959
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 3 = 15

The "problem" is: the number of files/entries to be healed has grown
continuously since the beginning, and now we're talking about way too
many files to handle manually. Last time I checked: 700K per brick,
probably >900K by now. The command 'gluster volume heal workdata
statistics heal-count' is unable to finish. Doesn't look that good :D

Interesting, the glustershd.log on the "bad" server now shows errors like these:

[2024-01-28 18:48:33.734053 +] E [MSGID: 108008]
[afr-self-heal-common.c:399:afr_gfid_split_brain_source]
0-workdata-replicate-3: Gfid mismatch detected for
/803620716>, 82d7939a-8919-40ea-9459-7b8af23d3b72 on workdata-client-11 and
bb9399a3-0a5c-4cd1-b2b1-3ee787ec835a on workdata-client-9

Shouldn't the heals happen on the 2 "good" servers?

Anyway... we're currently preparing a different solution for our data
and we'll throw away this gluster volume - no critical data will be
lost, as these are derived from source data (on a different volume on
different servers). It will be a hard time (recalculating tons of data),
but the chosen solution should have way better performance.

Well... thx to all for your efforts, really appreciate that :-)


Hubert

On Sun, 28 Jan 2024 at 08:35, Strahil Nikolov wrote:
>
> What about the arbiter node ?
> Actually, check on all nodes and script it - you might need it in the future.
>
> Simplest way to resolve is to make the file disappear (rename it to something
> else and then rename it back). Another easy trick is to read the whole file:
> dd if=file of=/dev/null status=progress
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Jan 27, 2024 at 8:24, Hu Bert
>  wrote:
> Morning,
>
> gfid1:
> getfattr -d -e hex -m.
> /gluster/md{3,4,5,6,7}/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
>
> glusterpub1 (good one):
> getfattr: Removing leading '/' from absolute path names
> # file: 
> gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> trusted.afr.dirty=0x
> trusted.afr.workdata-client-11=0x00020001
> trusted.gfid=0xfaf5956610f54ddd8b0ca87bc6a334fb
> trusted.gfid2path.c2845024cc9b402e=0x38633139626234612d396236382d343532652d623434652d3664616331666434616465652f31323878313238732e6a7067
> trusted.glusterfs.mdata=0x0165aaecff2695ebb765aaecff2695ebb765aaecff2533f110
>
> glusterpub3 (bad one):
> getfattr: 
> /gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb:
> No such file or directory
>
> gfid 2:
> getfattr -d -e hex -m.
> /gluster/md{3,4,5,6,7}/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
>
> glusterpub1 (good one):
> getfattr: Removing leading '/' from absolute path names
> # file: 
> gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
> trusted.afr.dirty=0x
> trusted.afr.workdata-client-8=0x00020001
> trusted.gfid=0x604657235dc04ebeaced9f2c12e52642
> trusted.gfid2path.ac4669e3c4faf926=0x33366463366137392d666135642d343238652d613738642d6234376230616662316562642f31323878313238732e6a7067
> trusted.glusterfs.mdata=0x0165aaecfe0c5403bd65aaecfe0c5403bd65aaecfe0ad61ee4
>
> glusterpub3 (bad one):
> getfattr: 
> /gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642:
> No such file or directory
>
> thx,
> Hubert
>
> On Sat, 27 Jan 2024 at 06:13, Strahil Nikolov wrote:
> >
> > You don't need to mount it.
> > Like this :
> > # getfattr -d -e hex -m. 
> > /path/to/brick/.glusterfs/00/46/00462be8-3e61-4931-8bda-dae1645c639e
> > # file: 00/46/00462be8-3e61-4931-8bda-dae1645c639e
> > trusted.gfid=0x00462be83e6149318bdadae1645c639e
> > trusted.gfid2path.05fcbdafdeea18ab=0x3032673930632d386637622d346436652d393464362d3936393132313930643131312f66696c656c6f636b696e672e7079
> > trusted.glusterfs.mdata=0x016170340c25b6a7456170340c20efb5776170340c20d42b07
> > trusted.glusterfs.shard.block-size=0x0400
> > trusted.glusterfs.shard.file-size=0x00cd0001
> >
> >
> > Best Regards,
> > Strahil Nikolov
> >
> >
> >
> > On Thursday, 25 January 2024 at 09:42:46 GMT+2, Hu Bert wrote:
> >
> >
> >
> >
> >
>

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-26 Thread Hu Bert
Morning,

gfid1:
getfattr -d -e hex -m.
/gluster/md{3,4,5,6,7}/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb

glusterpub1 (good one):
getfattr: Removing leading '/' from absolute path names
# file: 
gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
trusted.afr.dirty=0x
trusted.afr.workdata-client-11=0x00020001
trusted.gfid=0xfaf5956610f54ddd8b0ca87bc6a334fb
trusted.gfid2path.c2845024cc9b402e=0x38633139626234612d396236382d343532652d623434652d3664616331666434616465652f31323878313238732e6a7067
trusted.glusterfs.mdata=0x0165aaecff2695ebb765aaecff2695ebb765aaecff2533f110

glusterpub3 (bad one):
getfattr: 
/gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb:
No such file or directory

gfid 2:
getfattr -d -e hex -m.
/gluster/md{3,4,5,6,7}/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642

glusterpub1 (good one):
getfattr: Removing leading '/' from absolute path names
# file: 
gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
trusted.afr.dirty=0x
trusted.afr.workdata-client-8=0x00020001
trusted.gfid=0x604657235dc04ebeaced9f2c12e52642
trusted.gfid2path.ac4669e3c4faf926=0x33366463366137392d666135642d343238652d613738642d6234376230616662316562642f31323878313238732e6a7067
trusted.glusterfs.mdata=0x0165aaecfe0c5403bd65aaecfe0c5403bd65aaecfe0ad61ee4

glusterpub3 (bad one):
getfattr: 
/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642:
No such file or directory
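
For reading those trusted.afr.* values: per the split-brain documentation
linked further down in this thread, the hex string holds three big-endian
32-bit counters (data, metadata and entry pending operations). The values
above look shortened by the archive, so the snippet below uses a made-up
full-length value purely as an illustration:

# decode a trusted.afr.<volume>-client-<n> value (example value, not taken from the output above)
xattr=000000020000000100000000          # hex string without the leading 0x
printf 'data pending:     %d\n' $((16#${xattr:0:8}))    # -> 2
printf 'metadata pending: %d\n' $((16#${xattr:8:8}))    # -> 1
printf 'entry pending:    %d\n' $((16#${xattr:16:8}))   # -> 0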

thx,
Hubert

On Sat, 27 Jan 2024 at 06:13, Strahil Nikolov wrote:
>
> You don't need to mount it.
> Like this :
> # getfattr -d -e hex -m. 
> /path/to/brick/.glusterfs/00/46/00462be8-3e61-4931-8bda-dae1645c639e
> # file: 00/46/00462be8-3e61-4931-8bda-dae1645c639e
> trusted.gfid=0x00462be83e6149318bdadae1645c639e
> trusted.gfid2path.05fcbdafdeea18ab=0x3032673930632d386637622d346436652d393464362d3936393132313930643131312f66696c656c6f636b696e672e7079
> trusted.glusterfs.mdata=0x016170340c25b6a7456170340c20efb5776170340c20d42b07
> trusted.glusterfs.shard.block-size=0x0400
> trusted.glusterfs.shard.file-size=0x00cd0001
>
>
> Best Regards,
> Strahil Nikolov
>
>
>
On Thursday, 25 January 2024 at 09:42:46 GMT+2, Hu Bert wrote:
>
>
>
>
>
> Good morning,
>
> hope i got it right... using:
> https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3.1/html/administration_guide/ch27s02
>
> mount -t glusterfs -o aux-gfid-mount glusterpub1:/workdata /mnt/workdata
>
> gfid 1:
> getfattr -n trusted.glusterfs.pathinfo -e text
> /mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> trusted.glusterfs.pathinfo="(
> (
> 
>  uster/md6/workdata/images/133/283/13328349/128x128s.jpg>))"
>
> gfid 2:
> getfattr -n trusted.glusterfs.pathinfo -e text
> /mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
> trusted.glusterfs.pathinfo="(
> (
> 
>  ):glusterpub1:/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642>))"
>
> glusterpub1 + glusterpub2 are the good ones, glusterpub3 is the
> misbehaving (not healing) one.
>
> The file with gfid 1 is available under
> /gluster/md6/workdata/images/133/283/13328349/ on glusterpub1+2
> bricks, but missing on glusterpub3 brick.
>
> gfid 2: 
> /gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
> is present on glusterpub1+2, but not on glusterpub3.
>
>
> Thx,
> Hubert
>
On Wed, 24 Jan 2024 at 17:36, Strahil Nikolov wrote:
>
> >
> > Hi,
> >
> > Can you find and check the files with gfids:
> > 60465723-5dc0-4ebe-aced-9f2c12e52642
> > faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> >
> > Use 'getfattr -d -e hex -m. ' command from 
> > https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/#analysis-of-the-output
> >  .
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Sat, Jan 20, 2024 at 9:44, Hu Bert
> >  wrote:
> > Good morning,
> >
> > thx Gilberto, did the first three (set to WARNING), but the last one
> > doesn't work. Anyway, with setting these three some new messages
> > appea

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-24 Thread Hu Bert
Good morning,

hope I got it right... using:
https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3.1/html/administration_guide/ch27s02

mount -t glusterfs -o aux-gfid-mount glusterpub1:/workdata /mnt/workdata

gfid 1:
getfattr -n trusted.glusterfs.pathinfo -e text
/mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
getfattr: Removing leading '/' from absolute path names
# file: mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
trusted.glusterfs.pathinfo="(
(

))"

gfid 2:
getfattr -n trusted.glusterfs.pathinfo -e text
/mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
getfattr: Removing leading '/' from absolute path names
# file: mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
trusted.glusterfs.pathinfo="(
(

))"

glusterpub1 + glusterpub2 are the good ones, glusterpub3 is the
misbehaving (not healing) one.

The file with gfid 1 is available under
/gluster/md6/workdata/images/133/283/13328349/ on glusterpub1+2
bricks, but missing on glusterpub3 brick.

gfid 2: 
/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
is present on glusterpub1+2, but not on glusterpub3.
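
A small helper to do the same check on all three servers in one go (a sketch;
hostnames and brick paths are the ones from this thread, passwordless ssh
assumed):

gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb
for h in glusterpub1 glusterpub2 glusterpub3; do
  echo "== $h"
  # the .glusterfs hardlink lives under <first-2-chars>/<next-2-chars>/<gfid> on each brick
  ssh "$h" "ls -l /gluster/md*/workdata/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid 2>/dev/null"
done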


Thx,
Hubert

On Wed, 24 Jan 2024 at 17:36, Strahil Nikolov wrote:
>
> Hi,
>
> Can you find and check the files with gfids:
> 60465723-5dc0-4ebe-aced-9f2c12e52642
> faf59566-10f5-4ddd-8b0c-a87bc6a334fb
>
> Use 'getfattr -d -e hex -m. ' command from 
> https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/#analysis-of-the-output
>  .
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Jan 20, 2024 at 9:44, Hu Bert
>  wrote:
> Good morning,
>
> thx Gilberto, did the first three (set to WARNING), but the last one
> doesn't work. Anyway, with setting these three some new messages
> appear:
>
> [2024-01-20 07:23:58.561106 +] W [MSGID: 114061]
> [client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd
> is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
> {errno=77}, {error=File descriptor in bad state}]
> [2024-01-20 07:23:58.561177 +] E [MSGID: 108028]
> [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-3:
> Failed getlk for faf59566-10f5-4ddd-8b0c-a87bc6a334fb [File descriptor
> in bad state]
> [2024-01-20 07:23:58.562151 +] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-11:
> remote operation failed.
> [{path=},
> {gfid=faf59566-10f5-4ddd-8b0c-a87b
> c6a334fb}, {errno=2}, {error=No such file or directory}]
> [2024-01-20 07:23:58.562296 +] W [MSGID: 114061]
> [client-common.c:530:client_pre_flush_v2] 0-workdata-client-11:
> remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
> {errno=77}, {error=File descriptor in bad state}]
> [2024-01-20 07:23:58.860552 +] W [MSGID: 114061]
> [client-common.c:796:client_pre_lk_v2] 0-workdata-client-8: remote_fd
> is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
> {errno=77}, {error=File descriptor in bad state}]
> [2024-01-20 07:23:58.860608 +] E [MSGID: 108028]
> [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-2:
> Failed getlk for 60465723-5dc0-4ebe-aced-9f2c12e52642 [File descriptor
> in bad state]
> [2024-01-20 07:23:58.861520 +] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-8:
> remote operation failed.
> [{path=},
> {gfid=60465723-5dc0-4ebe-aced-9f2c1
> 2e52642}, {errno=2}, {error=No such file or directory}]
> [2024-01-20 07:23:58.861640 +] W [MSGID: 114061]
> [client-common.c:530:client_pre_flush_v2] 0-workdata-client-8:
> remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
> {errno=77}, {error=File descriptor in bad state}]
>
> Not many log entries appear, only a few. Has someone seen error
> messages like these? Setting diagnostics.brick-sys-log-level to DEBUG
> shows way more log entries, uploaded it to:
> https://file.io/spLhlcbMCzr8 - not sure if that helps.
>
>
> Thx,
> Hubert
>
On Fri, 19 Jan 2024 at 16:24, Gilberto Ferreira wrote:
>
> >
> > gluster volume set testvol diagnostics.brick-log-level WARNING
> > gluster volume set testvol diagnostics.brick-sys-log-level WARNING
> > gluster volume set testvol diagnostics.client-log-level ERROR
> > gluster --log-level=ERROR volume status
> >
> > ---
> > Gilberto Nunes Ferreira
> >
> >
> >
> >
> >
> >
> >> On Fri, 19 Jan 2024 at 05:49, Hu Bert wrote:
> >>
> >> Hi Strahil,
> >> hm, don't get me wrong, it may sound a bit stupid, but... where do i
> >> set the log level? Using debian...
> >>
> >> https://access.redhat.com/documentation/de-de/red_hat_gluste

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-19 Thread Hu Bert
Good morning,

thx Gilberto, I did the first three (set to WARNING), but the last one
doesn't work. Anyway, after setting these three some new messages
appear:

[2024-01-20 07:23:58.561106 +] W [MSGID: 114061]
[client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd
is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
{errno=77}, {error=File descriptor in bad state}]
[2024-01-20 07:23:58.561177 +] E [MSGID: 108028]
[afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-3:
Failed getlk for faf59566-10f5-4ddd-8b0c-a87bc6a334fb [File descriptor
in bad state]
[2024-01-20 07:23:58.562151 +] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-11:
remote operation failed.
[{path=},
{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=2}, {error=No such file or directory}]
[2024-01-20 07:23:58.562296 +] W [MSGID: 114061]
[client-common.c:530:client_pre_flush_v2] 0-workdata-client-11:
remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
{errno=77}, {error=File descriptor in bad state}]
[2024-01-20 07:23:58.860552 +] W [MSGID: 114061]
[client-common.c:796:client_pre_lk_v2] 0-workdata-client-8: remote_fd
is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
{errno=77}, {error=File descriptor in bad state}]
[2024-01-20 07:23:58.860608 +] E [MSGID: 108028]
[afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-2:
Failed getlk for 60465723-5dc0-4ebe-aced-9f2c12e52642 [File descriptor
in bad state]
[2024-01-20 07:23:58.861520 +] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-8:
remote operation failed.
[{path=},
{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=2}, {error=No such file or directory}]
[2024-01-20 07:23:58.861640 +] W [MSGID: 114061]
[client-common.c:530:client_pre_flush_v2] 0-workdata-client-8:
remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
{errno=77}, {error=File descriptor in bad state}]

Not many log entries appear, only a few. Has anyone seen error
messages like these? Setting diagnostics.brick-sys-log-level to DEBUG
shows way more log entries, uploaded it to:
https://file.io/spLhlcbMCzr8 - not sure if that helps.


Thx,
Hubert

On Fri, 19 Jan 2024 at 16:24, Gilberto Ferreira wrote:
>
> gluster volume set testvol diagnostics.brick-log-level WARNING
> gluster volume set testvol diagnostics.brick-sys-log-level WARNING
> gluster volume set testvol diagnostics.client-log-level ERROR
> gluster --log-level=ERROR volume status
>
> ---
> Gilberto Nunes Ferreira
>
>
>
>
>
>
> On Fri, 19 Jan 2024 at 05:49, Hu Bert wrote:
>>
>> Hi Strahil,
>> hm, don't get me wrong, it may sound a bit stupid, but... where do i
>> set the log level? Using debian...
>>
>> https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
>>
>> ls /etc/glusterfs/
>> eventsconfig.json  glusterfs-georep-logrotate
>> gluster-rsyslog-5.8.conf  group-db-workload   group-gluster-block
>>  group-nl-cache  group-virt.example  logger.conf.example
>> glusterd.vol   glusterfs-logrotate
>> gluster-rsyslog-7.2.conf  group-distributed-virt  group-metadata-cache
>>  group-samba gsyncd.conf thin-arbiter.vol
>>
>> checked: /etc/glusterfs/logger.conf.example
>>
>> # To enable enhanced logging capabilities,
>> #
>> # 1. rename this file to /etc/glusterfs/logger.conf
>> #
>> # 2. rename /etc/rsyslog.d/gluster.conf.example to
>> #/etc/rsyslog.d/gluster.conf
>> #
>> # This change requires restart of all gluster services/volumes and
>> # rsyslog.
>>
>> tried (to test): /etc/glusterfs/logger.conf with " LOG_LEVEL='WARNING' "
>>
>> restart glusterd on that node, but this doesn't work, log-level stays
>> on INFO. /etc/rsyslog.d/gluster.conf.example does not exist. Probably
>> /etc/rsyslog.conf on debian. But first it would be better to know
>> where to set the log-level for glusterd.
>>
>> Depending on how much the DEBUG log-level talks ;-) i could assign up
>> to 100G to /var
>>
>>
>> Thx & best regards,
>> Hubert
>>
>>
On Thu, 18 Jan 2024 at 22:58, Strahil Nikolov wrote:
>> >
>> > Are you able to set the logs to debug level ?
>> > It might provide a clue what it is going on.
>> >
>> > Best Regards,
>> > Strahil Nikolov
>> >
>> > On Thu, Jan 18, 2024 at 13:08, Diego Zuccato
>> >  wrote:
>> > That's the same kind of errors I keep seeing on my 2 clusters,
>> > regenerated some months ago. Seems a pseudo-split-brain that should be
>> &

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-19 Thread Hu Bert
Hi Strahil,
hm, don't get me wrong, it may sound a bit stupid, but... where do I
set the log level? Using Debian...

https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level

ls /etc/glusterfs/
eventsconfig.json  glusterfs-georep-logrotate
gluster-rsyslog-5.8.conf  group-db-workload   group-gluster-block
 group-nl-cache  group-virt.example  logger.conf.example
glusterd.vol   glusterfs-logrotate
gluster-rsyslog-7.2.conf  group-distributed-virt  group-metadata-cache
 group-samba gsyncd.conf thin-arbiter.vol

checked: /etc/glusterfs/logger.conf.example

# To enable enhanced logging capabilities,
#
# 1. rename this file to /etc/glusterfs/logger.conf
#
# 2. rename /etc/rsyslog.d/gluster.conf.example to
#/etc/rsyslog.d/gluster.conf
#
# This change requires restart of all gluster services/volumes and
# rsyslog.

tried (to test): /etc/glusterfs/logger.conf with " LOG_LEVEL='WARNING' "

I restarted glusterd on that node, but this doesn't work; the log level
stays at INFO. /etc/rsyslog.d/gluster.conf.example does not exist -
probably it's /etc/rsyslog.conf on Debian. But first it would be better
to know where to set the log level for glusterd.

Depending on how much the DEBUG log level talks ;-) I could assign up
to 100G to /var.
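
One hedged approach (the exact layout depends on the packaging): glusterd
itself accepts a log-level option on its command line, and the systemd unit
normally builds that command line from an environment file, so one of the
following should do it:

# check how the service is started and which variables/files it reads
systemctl cat glusterd

# if the unit references an EnvironmentFile (e.g. /etc/default/glusterd or
# /etc/sysconfig/glusterd - an assumption, verify with the output above),
# set LOG_LEVEL=DEBUG there; otherwise a drop-in works:
systemctl edit glusterd      # add: [Service] / Environment=LOG_LEVEL=DEBUG
systemctl restart glusterd

# glusterd also accepts the level directly: glusterd --log-level=DEBUG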


Thx & best regards,
Hubert


On Thu, 18 Jan 2024 at 22:58, Strahil Nikolov wrote:
>
> Are you able to set the logs to debug level ?
> It might provide a clue what it is going on.
>
> Best Regards,
> Strahil Nikolov
>
> On Thu, Jan 18, 2024 at 13:08, Diego Zuccato
>  wrote:
> That's the same kind of errors I keep seeing on my 2 clusters,
> regenerated some months ago. Seems a pseudo-split-brain that should be
> impossible on a replica 3 cluster but keeps happening.
> Sadly going to ditch Gluster ASAP.
>
> Diego
>
> On 18/01/2024 07:11, Hu Bert wrote:
> > Good morning,
> > heal still not running. Pending heals now sum up to 60K per brick.
> > Heal was starting instantly e.g. after server reboot with version
> > 10.4, but doesn't with version 11. What could be wrong?
> >
> > I only see these errors on one of the "good" servers in glustershd.log:
> >
> > [2024-01-18 06:08:57.328480 +] W [MSGID: 114031]
> > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
> > remote operation failed.
> > [{path=},
> > {gfid=cb39a1e4-2a4c-4727-861d-3ed9e
> > f00681b}, {errno=2}, {error=No such file or directory}]
> > [2024-01-18 06:08:57.594051 +] W [MSGID: 114031]
> > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
> > remote operation failed.
> > [{path=},
> > {gfid=3e9b178c-ae1f-4d85-ae47-fc539
> > d94dd11}, {errno=2}, {error=No such file or directory}]
> >
> > About 7K today. Any ideas? Someone?
> >
> >
> > Best regards,
> > Hubert
> >
> > On Wed, 17 Jan 2024 at 11:24, Hu Bert wrote:
> >>
> >> ok, finally managed to get all servers, volumes etc running, but it took
> >> a couple of restarts, cksum checks etc.
> >>
> >> One problem: a volume doesn't heal automatically or doesn't heal at all.
> >>
> >> gluster volume status
> >> Status of volume: workdata
> >> Gluster process                            TCP Port  RDMA Port  Online  Pid
> >> ---------------------------------------------------------------------------
> >> Brick glusterpub1:/gluster/md3/workdata    58832     0          Y       3436
> >> Brick glusterpub2:/gluster/md3/workdata    59315     0          Y       1526
> >> Brick glusterpub3:/gluster/md3/workdata    56917     0          Y       1952
> >> Brick glusterpub1:/gluster/md4/workdata    59688     0          Y       3755
> >> Brick glusterpub2:/gluster/md4/workdata    60271     0          Y       2271
> >> Brick glusterpub3:/gluster/md4/workdata    49461     0          Y       2399
> >> Brick glusterpub1:/gluster/md5/workdata    54651     0          Y       4208
> >> Brick glusterpub2:/gluster/md5/workdata    49685     0          Y       2751
> >> Brick glusterpub3:/gluster/md5/workdata    59202     0          Y       2803
> >> Brick glusterpub1:/gluster/md6/workdata    55829     0          Y       4583
> >> Brick glusterpub2:/gluster/md6/workdata    50455     0          Y       3296
> >> Brick glusterpub3:/gluster/md6/workdata    50262     0          Y       3237
> >> Brick glusterpub1:/gluster/md7/workdata    52238     0          Y       5014
> >> Brick glusterpub2:/gluster/md7/workdata    52474     0          Y       3673
> >> Brick glusterpub3:/gluster/md7/workdata    57966     0          Y       3653
> >&g

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-18 Thread Hu Bert
Thx for your answer. We don't have that much data (33 TB anyway), but
millions of files in total, on normal SATA disks. So copying stuff
away and back, possibly with downtime, is not manageable.

The good thing is: the data can be re-calculated, as it is derived from
source data. But one needs some new hardware for that. And it's
probably time to think of a new solution anyway, as we all know about
the state of the gluster project.

Thx,
Hubert

On Thu, Jan 18, 2024 at 09:33, Diego Zuccato wrote:
>
> Since glusterd does not consider it a split brain, you can't solve it
> with standard split brain tools.
> I've found no way to resolve it except by manually handling one file at
> a time: completely unmanageable with thousands of files and having to
> juggle between actual path on brick and metadata files!
> Previously I "fixed" it by:
> 1) moving all the data from the volume to a temp space
> 2) recovering from the bricks what was inaccessible from the mountpoint
> (keeping different file revisions for the conflicting ones)
> 3) destroying and recreating the volume
> 4) copying back the data from the backup
>
> When gluster gets used because you need lots of space (we had more than
> 400TB on 3 nodes with 30x12TB SAS disks in "replica 3 arbiter 1"), where
> do you park the data? Is the official solution "just have a second
> cluster idle for when you need to fix errors"?
> It took more than a month of downtime this summer, and after less than 6
> months I'd have to repeat it? Users are rightly quite upset...
>
> Diego
>
> Il 18/01/2024 09:17, Hu Bert ha scritto:
> > were you able to solve the problem? Can it be treated like a "normal"
> > split brain? 'gluster peer status' and 'gluster volume status' are ok,
> > so kinda looks like "pseudo"...
> >
> >
> > hubert
> >
> > Am Do., 18. Jan. 2024 um 08:28 Uhr schrieb Diego Zuccato
> > :
> >>
> >> That's the same kind of errors I keep seeing on my 2 clusters,
> >> regenerated some months ago. Seems a pseudo-split-brain that should be
> >> impossible on a replica 3 cluster but keeps happening.
> >> Sadly going to ditch Gluster ASAP.
> >>
> >> Diego
> >>
> >> Il 18/01/2024 07:11, Hu Bert ha scritto:
> >>> Good morning,
> >>> heal still not running. Pending heals now sum up to 60K per brick.
> >>> Heal was starting instantly e.g. after server reboot with version
> >>> 10.4, but doesn't with version 11. What could be wrong?
> >>>
> >>> I only see these errors on one of the "good" servers in glustershd.log:
> >>>
> >>> [2024-01-18 06:08:57.328480 +] W [MSGID: 114031]
> >>> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
> >>> remote operation failed.
> >>> [{path=},
> >>> {gfid=cb39a1e4-2a4c-4727-861d-3ed9e
> >>> f00681b}, {errno=2}, {error=No such file or directory}]
> >>> [2024-01-18 06:08:57.594051 +] W [MSGID: 114031]
> >>> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
> >>> remote operation failed.
> >>> [{path=},
> >>> {gfid=3e9b178c-ae1f-4d85-ae47-fc539
> >>> d94dd11}, {errno=2}, {error=No such file or directory}]
> >>>
> >>> About 7K today. Any ideas? Someone?
> >>>
> >>>
> >>> Best regards,
> >>> Hubert
> >>>
> >>> Am Mi., 17. Jan. 2024 um 11:24 Uhr schrieb Hu Bert 
> >>> :
> >>>>
> >>>> ok, finally managed to get all servers, volumes etc runnung, but took
> >>>> a couple of restarts, cksum checks etc.
> >>>>
> >>>> One problem: a volume doesn't heal automatically or doesn't heal at all.
> >>>>
> >>>> gluster volume status
> >>>> Status of volume: workdata
> >>>> Gluster process TCP Port  RDMA Port  Online  
> >>>> Pid
> >>>> --
> >>>> Brick glusterpub1:/gluster/md3/workdata 58832 0  Y   
> >>>> 3436
> >>>> Brick glusterpub2:/gluster/md3/workdata 59315 0  Y   
> >>>> 1526
> >>>> Brick glusterpub3:/gluster/md3/workdata 56917 0  Y   
> >>>> 1952
> >>>> Brick glusterpub1:/gluster/md4/workdata 59688 0  Y   
> >>>> 3755
> >>>> Bri

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-18 Thread Hu Bert
were you able to solve the problem? Can it be treated like a "normal"
split brain? 'gluster peer status' and 'gluster volume status' are ok,
so kinda looks like "pseudo"...


hubert

On Thu, Jan 18, 2024 at 08:28, Diego Zuccato wrote:
>
> That's the same kind of errors I keep seeing on my 2 clusters,
> regenerated some months ago. Seems a pseudo-split-brain that should be
> impossible on a replica 3 cluster but keeps happening.
> Sadly going to ditch Gluster ASAP.
>
> Diego
>
> Il 18/01/2024 07:11, Hu Bert ha scritto:
> > Good morning,
> > heal still not running. Pending heals now sum up to 60K per brick.
> > Heal was starting instantly e.g. after server reboot with version
> > 10.4, but doesn't with version 11. What could be wrong?
> >
> > I only see these errors on one of the "good" servers in glustershd.log:
> >
> > [2024-01-18 06:08:57.328480 +] W [MSGID: 114031]
> > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
> > remote operation failed.
> > [{path=},
> > {gfid=cb39a1e4-2a4c-4727-861d-3ed9e
> > f00681b}, {errno=2}, {error=No such file or directory}]
> > [2024-01-18 06:08:57.594051 +] W [MSGID: 114031]
> > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
> > remote operation failed.
> > [{path=},
> > {gfid=3e9b178c-ae1f-4d85-ae47-fc539
> > d94dd11}, {errno=2}, {error=No such file or directory}]
> >
> > About 7K today. Any ideas? Someone?
> >
> >
> > Best regards,
> > Hubert
> >
> > Am Mi., 17. Jan. 2024 um 11:24 Uhr schrieb Hu Bert :
> >>
> >> ok, finally managed to get all servers, volumes etc runnung, but took
> >> a couple of restarts, cksum checks etc.
> >>
> >> One problem: a volume doesn't heal automatically or doesn't heal at all.
> >>
> >> gluster volume status
> >> Status of volume: workdata
> >> Gluster process TCP Port  RDMA Port  Online  
> >> Pid
> >> --
> >> Brick glusterpub1:/gluster/md3/workdata 58832 0  Y   
> >> 3436
> >> Brick glusterpub2:/gluster/md3/workdata 59315 0  Y   
> >> 1526
> >> Brick glusterpub3:/gluster/md3/workdata 56917 0  Y   
> >> 1952
> >> Brick glusterpub1:/gluster/md4/workdata 59688 0  Y   
> >> 3755
> >> Brick glusterpub2:/gluster/md4/workdata 60271 0  Y   
> >> 2271
> >> Brick glusterpub3:/gluster/md4/workdata 49461 0  Y   
> >> 2399
> >> Brick glusterpub1:/gluster/md5/workdata 54651 0  Y   
> >> 4208
> >> Brick glusterpub2:/gluster/md5/workdata 49685 0  Y   
> >> 2751
> >> Brick glusterpub3:/gluster/md5/workdata 59202 0  Y   
> >> 2803
> >> Brick glusterpub1:/gluster/md6/workdata 55829 0  Y   
> >> 4583
> >> Brick glusterpub2:/gluster/md6/workdata 50455 0  Y   
> >> 3296
> >> Brick glusterpub3:/gluster/md6/workdata 50262 0  Y   
> >> 3237
> >> Brick glusterpub1:/gluster/md7/workdata 52238 0  Y   
> >> 5014
> >> Brick glusterpub2:/gluster/md7/workdata 52474 0  Y   
> >> 3673
> >> Brick glusterpub3:/gluster/md7/workdata 57966 0  Y   
> >> 3653
> >> Self-heal Daemon on localhost   N/A   N/AY   
> >> 4141
> >> Self-heal Daemon on glusterpub1 N/A   N/AY   
> >> 5570
> >> Self-heal Daemon on glusterpub2 N/A   N/AY   
> >> 4139
> >>
> >> "gluster volume heal workdata info" lists a lot of files per brick.
> >> "gluster volume heal workdata statistics heal-count" shows thousands
> >> of files per brick.
> >> "gluster volume heal workdata enable" has no effect.
> >>
> >> gluster volume heal workdata full
> >> Launching heal operation to perform full self heal on volume workdata
> >> has been successful
> >> Use heal info commands to check status.
> >>
> >> -> not doing anything at all. And nothing happening on the 2 "good"
> >> servers in e.g. glustershd.log. Heal was working as expected on
> >> version 10.4, but here... silence. So

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-17 Thread Hu Bert
Good morning,
heal still not running. Pending heals now sum up to 60K per brick.
Heal was starting instantly e.g. after server reboot with version
10.4, but doesn't with version 11. What could be wrong?

I only see these errors on one of the "good" servers in glustershd.log:

[2024-01-18 06:08:57.328480 +] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
remote operation failed.
[{path=},
{gfid=cb39a1e4-2a4c-4727-861d-3ed9e
f00681b}, {errno=2}, {error=No such file or directory}]
[2024-01-18 06:08:57.594051 +] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
remote operation failed.
[{path=},
{gfid=3e9b178c-ae1f-4d85-ae47-fc539
d94dd11}, {errno=2}, {error=No such file or directory}]

About 7K today. Any ideas? Someone?


Best regards,
Hubert

On Wed, Jan 17, 2024 at 11:24, Hu Bert wrote:
>
> ok, finally managed to get all servers, volumes etc runnung, but took
> a couple of restarts, cksum checks etc.
>
> One problem: a volume doesn't heal automatically or doesn't heal at all.
>
> gluster volume status
> Status of volume: workdata
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick glusterpub1:/gluster/md3/workdata 58832 0  Y   3436
> Brick glusterpub2:/gluster/md3/workdata 59315 0  Y   1526
> Brick glusterpub3:/gluster/md3/workdata 56917 0  Y   1952
> Brick glusterpub1:/gluster/md4/workdata 59688 0  Y   3755
> Brick glusterpub2:/gluster/md4/workdata 60271 0  Y   2271
> Brick glusterpub3:/gluster/md4/workdata 49461 0  Y   2399
> Brick glusterpub1:/gluster/md5/workdata 54651 0  Y   4208
> Brick glusterpub2:/gluster/md5/workdata 49685 0  Y   2751
> Brick glusterpub3:/gluster/md5/workdata 59202 0  Y   2803
> Brick glusterpub1:/gluster/md6/workdata 55829 0  Y   4583
> Brick glusterpub2:/gluster/md6/workdata 50455 0  Y   3296
> Brick glusterpub3:/gluster/md6/workdata 50262 0  Y   3237
> Brick glusterpub1:/gluster/md7/workdata 52238 0  Y   5014
> Brick glusterpub2:/gluster/md7/workdata 52474 0  Y   3673
> Brick glusterpub3:/gluster/md7/workdata 57966 0  Y   3653
> Self-heal Daemon on localhost   N/A   N/AY   4141
> Self-heal Daemon on glusterpub1 N/A   N/AY   5570
> Self-heal Daemon on glusterpub2 N/A   N/AY   4139
>
> "gluster volume heal workdata info" lists a lot of files per brick.
> "gluster volume heal workdata statistics heal-count" shows thousands
> of files per brick.
> "gluster volume heal workdata enable" has no effect.
>
> gluster volume heal workdata full
> Launching heal operation to perform full self heal on volume workdata
> has been successful
> Use heal info commands to check status.
>
> -> not doing anything at all. And nothing happening on the 2 "good"
> servers in e.g. glustershd.log. Heal was working as expected on
> version 10.4, but here... silence. Someone has an idea?
>
>
> Best regards,
> Hubert
>
> Am Di., 16. Jan. 2024 um 13:44 Uhr schrieb Gilberto Ferreira
> :
> >
> > Ah! Indeed! You need to perform an upgrade in the clients as well.
> >
> >
> >
> >
> >
> >
> >
> >
> > Em ter., 16 de jan. de 2024 às 03:12, Hu Bert  
> > escreveu:
> >>
> >> morning to those still reading :-)
> >>
> >> i found this: 
> >> https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
> >>
> >> there's a paragraph about "peer rejected" with the same error message,
> >> telling me: "Update the cluster.op-version" - i had only updated the
> >> server nodes, but not the clients. So upgrading the cluster.op-version
> >> wasn't possible at this time. So... upgrading the clients to version
> >> 11.1 and then the op-version should solve the problem?
> >>
> >>
> >> Thx,
> >> Hubert
> >>
> >> Am Mo., 15. Jan. 2024 um 09:16 Uhr schrieb Hu Bert 
> >> :
> >> >
> >> > Hi,
> >> > just upgraded some gluster servers from version 10.4 to version 11.1.
> >> > Debian bullseye & bookworm. When only installing the packages: good,
> >> > servers, volu

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-17 Thread Hu Bert
hm, i only see such messages in glustershd.log on the 2 good servers:

[2024-01-17 12:18:48.912952 +] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-6:
remote operation failed.
[{path=},
{gfid=ee28b56c-e352-48f8-bbb5-dbf31
babe073}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:18:48.913015 +] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-7:
remote operation failed.
[{path=},
{gfid=ee28b56c-e352-48f8-bbb5-dbf31
babe073}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:19:09.450335 +] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-10:
remote operation failed.
[{path=},
{gfid=ea4a63e3-1470-40a5-8a7e-2a10
61a8fcb0}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:19:09.450771 +] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-9:
remote operation failed.
[{path=},
{gfid=ea4a63e3-1470-40a5-8a7e-2a106
1a8fcb0}, {errno=2}, {error=No such file or directory}]

not sure if this is important.

On Wed, Jan 17, 2024 at 11:24, Hu Bert wrote:
>
> ok, finally managed to get all servers, volumes etc runnung, but took
> a couple of restarts, cksum checks etc.
>
> One problem: a volume doesn't heal automatically or doesn't heal at all.
>
> gluster volume status
> Status of volume: workdata
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick glusterpub1:/gluster/md3/workdata 58832 0  Y   3436
> Brick glusterpub2:/gluster/md3/workdata 59315 0  Y   1526
> Brick glusterpub3:/gluster/md3/workdata 56917 0  Y   1952
> Brick glusterpub1:/gluster/md4/workdata 59688 0  Y   3755
> Brick glusterpub2:/gluster/md4/workdata 60271 0  Y   2271
> Brick glusterpub3:/gluster/md4/workdata 49461 0  Y   2399
> Brick glusterpub1:/gluster/md5/workdata 54651 0  Y   4208
> Brick glusterpub2:/gluster/md5/workdata 49685 0  Y   2751
> Brick glusterpub3:/gluster/md5/workdata 59202 0  Y   2803
> Brick glusterpub1:/gluster/md6/workdata 55829 0  Y   4583
> Brick glusterpub2:/gluster/md6/workdata 50455 0  Y   3296
> Brick glusterpub3:/gluster/md6/workdata 50262 0  Y   3237
> Brick glusterpub1:/gluster/md7/workdata 52238 0  Y   5014
> Brick glusterpub2:/gluster/md7/workdata 52474 0  Y   3673
> Brick glusterpub3:/gluster/md7/workdata 57966 0  Y   3653
> Self-heal Daemon on localhost   N/A   N/AY   4141
> Self-heal Daemon on glusterpub1 N/A   N/AY   5570
> Self-heal Daemon on glusterpub2 N/A   N/AY   4139
>
> "gluster volume heal workdata info" lists a lot of files per brick.
> "gluster volume heal workdata statistics heal-count" shows thousands
> of files per brick.
> "gluster volume heal workdata enable" has no effect.
>
> gluster volume heal workdata full
> Launching heal operation to perform full self heal on volume workdata
> has been successful
> Use heal info commands to check status.
>
> -> not doing anything at all. And nothing happening on the 2 "good"
> servers in e.g. glustershd.log. Heal was working as expected on
> version 10.4, but here... silence. Someone has an idea?
>
>
> Best regards,
> Hubert
>
> Am Di., 16. Jan. 2024 um 13:44 Uhr schrieb Gilberto Ferreira
> :
> >
> > Ah! Indeed! You need to perform an upgrade in the clients as well.
> >
> >
> >
> >
> >
> >
> >
> >
> > Em ter., 16 de jan. de 2024 às 03:12, Hu Bert  
> > escreveu:
> >>
> >> morning to those still reading :-)
> >>
> >> i found this: 
> >> https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
> >>
> >> there's a paragraph about "peer rejected" with the same error message,
> >> telling me: "Update the cluster.op-version" - i had only updated the
> >> server nodes, but not the clients. So upgrading the cluster.op-version
> >> wasn't possible at this time. So... upgrading the clients to version
> >> 11.1 and then the op-version should solve the problem?
> >>
> >>
> >> Thx,
> >> Hubert
> >>
> >> Am Mo., 15. Jan. 2024 um 09:16 Uhr schrieb Hu Bert 
> >> :
> >> >
> >&

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-17 Thread Hu Bert
ok, finally managed to get all servers, volumes etc running, but it took
a couple of restarts, cksum checks etc.

One problem: a volume doesn't heal automatically or doesn't heal at all.

gluster volume status
Status of volume: workdata
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick glusterpub1:/gluster/md3/workdata 58832 0  Y   3436
Brick glusterpub2:/gluster/md3/workdata 59315 0  Y   1526
Brick glusterpub3:/gluster/md3/workdata 56917 0  Y   1952
Brick glusterpub1:/gluster/md4/workdata 59688 0  Y   3755
Brick glusterpub2:/gluster/md4/workdata 60271 0  Y   2271
Brick glusterpub3:/gluster/md4/workdata 49461 0  Y   2399
Brick glusterpub1:/gluster/md5/workdata 54651 0  Y   4208
Brick glusterpub2:/gluster/md5/workdata 49685 0  Y   2751
Brick glusterpub3:/gluster/md5/workdata 59202 0  Y   2803
Brick glusterpub1:/gluster/md6/workdata 55829 0  Y   4583
Brick glusterpub2:/gluster/md6/workdata 50455 0  Y   3296
Brick glusterpub3:/gluster/md6/workdata 50262 0  Y   3237
Brick glusterpub1:/gluster/md7/workdata 52238 0  Y   5014
Brick glusterpub2:/gluster/md7/workdata 52474 0  Y   3673
Brick glusterpub3:/gluster/md7/workdata 57966 0  Y   3653
Self-heal Daemon on localhost   N/A   N/AY   4141
Self-heal Daemon on glusterpub1 N/A   N/AY   5570
Self-heal Daemon on glusterpub2 N/A   N/AY   4139

"gluster volume heal workdata info" lists a lot of files per brick.
"gluster volume heal workdata statistics heal-count" shows thousands
of files per brick.
"gluster volume heal workdata enable" has no effect.

gluster volume heal workdata full
Launching heal operation to perform full self heal on volume workdata
has been successful
Use heal info commands to check status.

-> not doing anything at all. And nothing happening on the 2 "good"
servers in e.g. glustershd.log. Heal was working as expected on
version 10.4, but here... silence. Someone has an idea?
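
Things i still want to check (a plain-CLI sketch, nothing exotic):

gluster volume heal workdata info summary    # per-brick counts of pending/split-brain entries
grep -iE 'connect|disconnect' /var/log/glusterfs/glustershd.log | tail -20
gluster volume start workdata force          # respawns any missing shd/brick processes
gluster volume heal workdata                 # then trigger an index heal again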


Best regards,
Hubert

On Tue, Jan 16, 2024 at 13:44, Gilberto Ferreira wrote:
>
> Ah! Indeed! You need to perform an upgrade in the clients as well.
>
>
>
>
>
>
>
>
> Em ter., 16 de jan. de 2024 às 03:12, Hu Bert  
> escreveu:
>>
>> morning to those still reading :-)
>>
>> i found this: 
>> https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
>>
>> there's a paragraph about "peer rejected" with the same error message,
>> telling me: "Update the cluster.op-version" - i had only updated the
>> server nodes, but not the clients. So upgrading the cluster.op-version
>> wasn't possible at this time. So... upgrading the clients to version
>> 11.1 and then the op-version should solve the problem?
>>
>>
>> Thx,
>> Hubert
>>
>> Am Mo., 15. Jan. 2024 um 09:16 Uhr schrieb Hu Bert :
>> >
>> > Hi,
>> > just upgraded some gluster servers from version 10.4 to version 11.1.
>> > Debian bullseye & bookworm. When only installing the packages: good,
>> > servers, volumes etc. work as expected.
>> >
>> > But one needs to test if the systems work after a daemon and/or server
>> > restart. Well, did a reboot, and after that the rebooted/restarted
>> > system is "out". Log message from working node:
>> >
>> > [2024-01-15 08:02:21.585694 +] I [MSGID: 106163]
>> > [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
>> > 0-management: using the op-version 10
>> > [2024-01-15 08:02:21.589601 +] I [MSGID: 106490]
>> > [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
>> > 0-glusterd: Received probe from uuid:
>> > b71401c3-512a-47cb-ac18-473c4ba7776e
>> > [2024-01-15 08:02:23.608349 +] E [MSGID: 106010]
>> > [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
>> > Version of Cksums sourceimages differ. local cksum = 2204642525,
>> > remote cksum = 1931483801 on peer gluster190
>> > [2024-01-15 08:02:23.608584 +] I [MSGID: 106493]
>> > [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
>> > Responded to gluster190 (0), ret: 0, op_ret: -1
>> > [2024-01-15 08:02:23.613553 +] I [MSGID: 106493]
>> > [glusterd-rpc-ops.c:467:__glusterd

Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-15 Thread Hu Bert
morning to those still reading :-)

i found this: 
https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them

there's a paragraph about "peer rejected" with the same error message,
telling me: "Update the cluster.op-version" - i had only updated the
server nodes, but not the clients. So upgrading the cluster.op-version
wasn't possible at this time. So... upgrading the clients to version
11.1 and then the op-version should solve the problem?
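
The op-version bump itself should then just be (sketch; 110000 is my assumption for release 11 - check max-op-version first):

gluster volume get all cluster.max-op-version   # highest op-version the cluster supports
gluster volume get all cluster.op-version       # currently active op-version
gluster volume set all cluster.op-version 110000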


Thx,
Hubert

On Mon, Jan 15, 2024 at 09:16, Hu Bert wrote:
>
> Hi,
> just upgraded some gluster servers from version 10.4 to version 11.1.
> Debian bullseye & bookworm. When only installing the packages: good,
> servers, volumes etc. work as expected.
>
> But one needs to test if the systems work after a daemon and/or server
> restart. Well, did a reboot, and after that the rebooted/restarted
> system is "out". Log message from working node:
>
> [2024-01-15 08:02:21.585694 +] I [MSGID: 106163]
> [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
> 0-management: using the op-version 10
> [2024-01-15 08:02:21.589601 +] I [MSGID: 106490]
> [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
> 0-glusterd: Received probe from uuid:
> b71401c3-512a-47cb-ac18-473c4ba7776e
> [2024-01-15 08:02:23.608349 +] E [MSGID: 106010]
> [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
> Version of Cksums sourceimages differ. local cksum = 2204642525,
> remote cksum = 1931483801 on peer gluster190
> [2024-01-15 08:02:23.608584 +] I [MSGID: 106493]
> [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to gluster190 (0), ret: 0, op_ret: -1
> [2024-01-15 08:02:23.613553 +] I [MSGID: 106493]
> [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
> Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
> gluster190, port: 0
>
> peer status from rebooted node:
>
> root@gluster190 ~ # gluster peer status
> Number of Peers: 2
>
> Hostname: gluster189
> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
> State: Peer Rejected (Connected)
>
> Hostname: gluster188
> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
> State: Peer Rejected (Connected)
>
> So the rebooted gluster190 is not accepted anymore. And thus does not
> appear in "gluster volume status". I then followed this guide:
>
> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
>
> Remove everything under /var/lib/glusterd/ (except glusterd.info) and
> restart glusterd service etc. Data get copied from other nodes,
> 'gluster peer status' is ok again - but the volume info is missing,
> /var/lib/glusterd/vols is empty. When syncing this dir from another
> node, the volume then is available again, heals start etc.
>
> Well, and just to be sure that everything's working as it should,
> rebooted that node again - the rebooted node is kicked out again, and
> you have to restart bringing it back again.
>
> Sry, but did i miss anything? Has someone experienced similar
> problems? I'll probably downgrade to 10.4 again, that version was
> working...
>
>
> Thx,
> Hubert




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-15 Thread Hu Bert
just downgraded one node to 10.4, did a reboot - same result: cksum
error. i'm able to bring it back in again, but if that error persists
when downgrading all servers...

On Mon, Jan 15, 2024 at 09:16, Hu Bert wrote:
>
> Hi,
> just upgraded some gluster servers from version 10.4 to version 11.1.
> Debian bullseye & bookworm. When only installing the packages: good,
> servers, volumes etc. work as expected.
>
> But one needs to test if the systems work after a daemon and/or server
> restart. Well, did a reboot, and after that the rebooted/restarted
> system is "out". Log message from working node:
>
> [2024-01-15 08:02:21.585694 +] I [MSGID: 106163]
> [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
> 0-management: using the op-version 10
> [2024-01-15 08:02:21.589601 +] I [MSGID: 106490]
> [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
> 0-glusterd: Received probe from uuid:
> b71401c3-512a-47cb-ac18-473c4ba7776e
> [2024-01-15 08:02:23.608349 +] E [MSGID: 106010]
> [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
> Version of Cksums sourceimages differ. local cksum = 2204642525,
> remote cksum = 1931483801 on peer gluster190
> [2024-01-15 08:02:23.608584 +] I [MSGID: 106493]
> [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to gluster190 (0), ret: 0, op_ret: -1
> [2024-01-15 08:02:23.613553 +] I [MSGID: 106493]
> [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
> Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
> gluster190, port: 0
>
> peer status from rebooted node:
>
> root@gluster190 ~ # gluster peer status
> Number of Peers: 2
>
> Hostname: gluster189
> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
> State: Peer Rejected (Connected)
>
> Hostname: gluster188
> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
> State: Peer Rejected (Connected)
>
> So the rebooted gluster190 is not accepted anymore. And thus does not
> appear in "gluster volume status". I then followed this guide:
>
> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
>
> Remove everything under /var/lib/glusterd/ (except glusterd.info) and
> restart glusterd service etc. Data get copied from other nodes,
> 'gluster peer status' is ok again - but the volume info is missing,
> /var/lib/glusterd/vols is empty. When syncing this dir from another
> node, the volume then is available again, heals start etc.
>
> Well, and just to be sure that everything's working as it should,
> rebooted that node again - the rebooted node is kicked out again, and
> you have to restart bringing it back again.
>
> Sry, but did i miss anything? Has someone experienced similar
> problems? I'll probably downgrade to 10.4 again, that version was
> working...
>
>
> Thx,
> Hubert




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Upgrade 10.4 -> 11.1 making problems

2024-01-15 Thread Hu Bert
Hi,
just upgraded some gluster servers from version 10.4 to version 11.1.
Debian bullseye & bookworm. When only installing the packages: good,
servers, volumes etc. work as expected.

But one needs to test if the systems work after a daemon and/or server
restart. Well, did a reboot, and after that the rebooted/restarted
system is "out". Log message from working node:

[2024-01-15 08:02:21.585694 +] I [MSGID: 106163]
[glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 10
[2024-01-15 08:02:21.589601 +] I [MSGID: 106490]
[glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
0-glusterd: Received probe from uuid:
b71401c3-512a-47cb-ac18-473c4ba7776e
[2024-01-15 08:02:23.608349 +] E [MSGID: 106010]
[glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
Version of Cksums sourceimages differ. local cksum = 2204642525,
remote cksum = 1931483801 on peer gluster190
[2024-01-15 08:02:23.608584 +] I [MSGID: 106493]
[glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to gluster190 (0), ret: 0, op_ret: -1
[2024-01-15 08:02:23.613553 +] I [MSGID: 106493]
[glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
gluster190, port: 0

peer status from rebooted node:

root@gluster190 ~ # gluster peer status
Number of Peers: 2

Hostname: gluster189
Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
State: Peer Rejected (Connected)

Hostname: gluster188
Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
State: Peer Rejected (Connected)

So the rebooted gluster190 is not accepted anymore. And thus does not
appear in "gluster volume status". I then followed this guide:

https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/

Remove everything under /var/lib/glusterd/ (except glusterd.info) and
restart the glusterd service etc. Data gets copied from the other nodes,
'gluster peer status' is ok again - but the volume info is missing,
/var/lib/glusterd/vols is empty. After syncing this dir from another
node, the volume is available again and heals start.
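
Roughly the sequence, as a sketch (run on the rejected node; gluster188 is one of the good nodes):

systemctl stop glusterd
find /var/lib/glusterd/ -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
systemctl start glusterd          # peer data gets copied, but vols/ stays empty
rsync -a gluster188:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
systemctl restart glusterd
gluster volume status             # volume shows up again, heals start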

Well, and just to be sure that everything's working as it should,
rebooted that node again - the rebooted node is kicked out again, and
you have to restart bringing it back again.

Sry, but did i miss anything? Has someone experienced similar
problems? I'll probably downgrade to 10.4 again, that version was
working...


Thx,
Hubert




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Announcing Gluster release 11.1

2023-11-27 Thread Hu Bert
Hi,
on console with wget i see this:

2023-11-27 15:04:35 (317 MB/s) - Read error at byte 32408/3166108
(Error decoding the received TLS packet.). Retrying.

that looks strange :-)

Best regards,
Hubert


On Mon, Nov 27, 2023 at 14:44, Gilberto Ferreira wrote:
>
> I am getting this errors:
>
> Err:10 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt
>  bookworm/main amd64 glusterfs-server amd64 11.1-1
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> Fetched 35.9 kB in 36s (1,006 B/s)
> E: Failed to fetch 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt/pool/main/g/glusterfs/libglusterfs0_11.1-1_amd64.deb
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> E: Failed to fetch 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt/pool/main/g/glusterfs/libgfxdr0_11.1-1_amd64.deb
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> E: Failed to fetch 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt/pool/main/g/glusterfs/libgfrpc0_11.1-1_amd64.deb
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> E: Failed to fetch 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt/pool/main/g/glusterfs/libgfchangelog0_11.1-1_amd64.deb
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> E: Failed to fetch 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt/pool/main/g/glusterfs/libgfapi0_11.1-1_amd64.deb
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> E: Failed to fetch 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt/pool/main/g/glusterfs/glusterfs-common_11.1-1_amd64.deb
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> E: Failed to fetch 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt/pool/main/g/glusterfs/glusterfs-client_11.1-1_amd64.deb
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> E: Failed to fetch 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt/pool/main/g/glusterfs/glusterfs-cli_11.1-1_amd64.deb
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> E: Failed to fetch 
> https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/bookworm/amd64/apt/pool/main/g/glusterfs/glusterfs-server_11.1-1_amd64.deb
>   Error reading from server - read (5: Input/output error) [IP: 8.43.85.185 
> 443]
> E: Unable to fetch some archives, maybe run apt-get update or try with 
> --fix-missing?
>
> Anybody can help me???
> Thanks a lot.
> ---
> Gilberto Nunes Ferreira
> (47) 99676-7530 - Whatsapp / Telegram
>
>
>
>
>
>
> Em sáb., 25 de nov. de 2023 às 09:00, Strahil Nikolov  
> escreveu:
>>
>> Great news!
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Fri, Nov 24, 2023 at 3:32, Shwetha Acharya
>>  wrote:
>> The Gluster community is pleased to announce the release of Gluster11.1
>> Packages available at [1].
>> Release notes for the release can be found at [2].
>>
>> Highlights of Release:
>>
>> -  Fix upgrade issue by reverting posix change related to storage.reserve 
>> value
>> -  Fix possible data loss during rebalance if there is any link file on the 
>> system
>> -  Fix maximum op-version for release 11
>>
>> Thanks,
>> Shwetha
>>
>> References:
>>
>> [1] Packages for 11.1
>> https://download.gluster.org/pub/gluster/glusterfs/11/11.1/
>>
>> [2] Release notes for 11.1:
>> https://docs.gluster.org/en/latest/release-notes/11.1/
>> 
>>
>>
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://meet.google.com/cpu-eiue-hvk
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> 
>>
>>
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://meet.google.com/cpu-eiue-hvk
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Full documentation of available volume settings

2023-09-21 Thread Hu Bert
Hi,
i use this:

gluster volume set help > /path/to/file.txt

Then you have a text file you can search in.
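
If it's only about one or two options, grepping that output (or the live values) works as well, e.g.:

gluster volume set help | grep -A 3 'cluster.min-free-disk'
gluster volume get <volname> all | grep readdir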



Best regards,
Hubert


On Tue, Sep 19, 2023 at 18:55, Omri Sarig wrote:

> Hello,
>
> I'm a new Gluster user, currently mostly investigating the potential use
> of the system for our company.
>
> I've read a lot of the official documentation
> , and I'm able to do most basic
> operations with the system.
>
> However, one thing I'm having difficulty finding is complete documentation
> of the available options to fine-tune volumes.
> I've seen that with the command "gluster volume get  all"​  I
> can see the full list of available options for the volume, but I could not
> find documentation for these options.
> I've tried looking in Gluster documentation, as well as through Google
> searches, but did not find information about most of these commands.
>
> What I'm looking for is a page containing a list of all the available
> options, with a short description for each option.
> Does something like this exist?
>
> Any help will be greatly appreciated.
>
> With Kind Regards,
>
>
> *Omri Sarig, *
>
> Software Engineer
>
> *[image: e-brev2]*
>
> Visit: Lyskær 3EF, DK-2730 Herlev
>
> Mobile: +45 28 49 42 27
>
> omri.sa...@prevas.dk
>
> www.prevas.dk
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] gluster 10.3: task glfs_fusenoti blocked for more than 120 seconds

2023-05-05 Thread Hu Bert
Hi Mohit,
strange, after the latest kernel update (5.10.0-22-amd64 #1 Debian
5.10.178-3) and with gluster 10.4 running, i got this message again
today. Result: for a certain time period e.g. 'df -h' hangs. And as
gluster is the only thing (at least that i'm aware of) that uses FUSE
i thought there might... ;-)

Maybe i'll have to check if there's anything that can be done on
kernel/system side. Searching... :-)


Thx,
Hubert


On Tue, May 2, 2023 at 14:28, Mohit Agrawal wrote:
>
> I don't think the issue is on gluster side, it seems the issue is on kernel 
> side (possible deadlock in fuse_reverse_inval_entry)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bda9a71980e083699a0360963c0135657b73f47a
>
> On Tue, May 2, 2023 at 5:48 PM Hu Bert  wrote:
>>
>> Good morning,
>>
>> we've recently had some strange message in /var/log/syslog.
>>
>> System:
>> debian bullseye, kernel 5.10.0-21-amd64 and 5.10.0-22-amd64
>> gluster 10.3
>>
>> The message look like:
>>
>> Apr 27 13:30:18 piggy kernel: [24287.715229] INFO: task
>> glfs_fusenoti:2787 blocked for more than 120 seconds.
>> Apr 27 13:30:18 piggy kernel: [24287.715327]   Not tainted
>> 5.10.0-22-amd64 #1 Debian 5.10.178-3
>> Apr 27 13:30:18 piggy kernel: [24287.715419] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Apr 27 13:30:18 piggy kernel: [24287.715575] task:glfs_fusenoti
>> state:D stack:0 pid: 2787 ppid: 1 flags:0x
>> Apr 27 13:30:18 piggy kernel: [24287.715734] Call Trace:
>> Apr 27 13:30:18 piggy kernel: [24287.715847]  __schedule+0x282/0x870
>> Apr 27 13:30:18 piggy kernel: [24287.715959]  schedule+0x46/0xb0
>> Apr 27 13:30:18 piggy kernel: [24287.716073]
>> rwsem_down_write_slowpath+0x257/0x4d0
>> Apr 27 13:30:18 piggy kernel: [24287.716194]
>> fuse_reverse_inval_entry+0x3b/0x1e0 [fuse]
>>
>> etc. Full excerpt here: https://pastebin.com/6gDHgh16
>>
>> This may cause our application to "hang". In the latest release i read
>> something about fuse:
>> https://docs.gluster.org/en/latest/release-notes/10.4/
>> "Fix fuse concurrency problems"
>>
>> but i checked the tickets and wasn't able to find something.
>>
>> However, i upgraded gluster client 10.3 -> 10.4, hoping that this
>> fixes the hang-issue.
>>
>> Has anyone seen these messages before?
>>
>>
>> Best regards,
>> Hubert
>> ---
>>
>> Community Meeting Calendar:
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>
>> Gluster-devel mailing list
>> gluster-de...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance: lots of small files, hdd, nvme etc.

2023-05-01 Thread Hu Bert
Hi there,
well... i know that you can read from the bricks themselves, but when
there are 7 bricks, each holding 1/7 of the data - which one do you
choose? ;-) Maybe ONE raid1 or raid10 plus a plain replica 3 on top
performs better than a "Number of Bricks: 5 x 3 = 15" Distributed-Replicate...

The systems are under heavy load. I did some reading regarding tuning;
most of the small-file-scenario stuff was already done, and i added some
more (just by guessing, as the documentation is a bit... poor):

performance.io-cache on
performance.io-cache-size 6GB
performance.quick-read-cache-size 6GB
group nl-cache
network.inode-lru-limit 40
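
(For completeness: those are set as normal volume options, roughly like this - "group nl-cache" pulls in a whole option profile:)

gluster volume set workdata performance.io-cache on
gluster volume set workdata performance.quick-read-cache-size 6GB
gluster volume set workdata group nl-cache
gluster volume get workdata all | grep -iE 'nl-cache|io-cache'   # verify what is active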

Well, doesn't really help. Here are some images from one server (doing
some reads, 100K for each server per day) and one client (doing
frequent operations on the volume):

server:
cpu: https://abload.de/img/server-cpub3cf8.png
diskstats: https://abload.de/img/server-diskstats_iopsbleui.png
throughput: https://abload.de/img/server-diskstats_thro0kfma.png
disk util: https://abload.de/img/server-diskstats_utili2isl.png
network: https://abload.de/img/server-if_ens5-dayeac51.png
interrupts: https://abload.de/img/server-interruptsxycwd.png
load: https://abload.de/img/server-loadp7cid.png

client:
cpu: https://abload.de/img/client-cpuiuc0h.png
load: https://abload.de/img/client-loadsadod.png

Well, i hope you people can see why i keep asking such stuff
frequently. 4 options:

1) find some good tuning
2) check if a raid10 (with 10-14 huge hdds) performs better
3) migrate to nvme (JBOD or raid10)
4) or, if none of the above is feasible or reasonable, migrate to a
different solution (like ceph, minio, ...)


Thx for reading && best regards,

Hubert

On Mon, Apr 3, 2023 at 19:10,  wrote:
>
> hello
> you can read files from underlying filesystem first (ext4,xfs...), for
> ex: /srv/glusterfs//brick.
>
> as fall back you can check mounted glusterfs path, to heal missing local
> node entries. ex: /mnt/shared/www/...
>
> you need only to write to mount.glusterfs mount point.
>
>
>
>
>
> On 3/30/2023 11:26 AM, Hu Bert wrote:
> > - workload: the (un)famous "lots of small files" setting
> > - currently 70% of the of the volume is used: ~32TB
> > - file size: few KB up to 1MB
> > - so there are hundreds of millions of files (and millions of directories)
> > - each image has an ID
> > - under the base dir the IDs are split into 3 digits
> > - dir structure: /basedir/(000-999)/(000-999)/ID/[lotsoffileshere]
> > - example for ID 123456789: /basedir/123/456/123456789/default.jpg
> > - maybe this structure isn't good and e.g. this would be better:
> > /basedir/IDs/[here the files] - so millions of ID-dirs directly under
> > /basedir/
> > - frequent access to the files by webservers (nginx, tomcat): lookup
> > if file exists, read/write images etc.
> > - Strahil mentioned: "Keep in mind that negative searches (searches of
> > non-existing/deleted objects) has highest penalty." <--- that happens
> > very often...
> > - server load on high traffic days: > 100 (mostly iowait)
> > - bad are server reboots (read filesystem info etc.)
> > - really bad is a sw raid rebuild/resync
>
>
> --
> S pozdravom / Yours sincerely
> Ing. Jan Hudoba
>
> http://www.jahu.sk
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] gluster 10.3: task glfs_fusenoti blocked for more than 120 seconds

2023-04-27 Thread Hu Bert
Good morning,

we've recently had some strange messages in /var/log/syslog.

System:
debian bullseye, kernel 5.10.0-21-amd64 and 5.10.0-22-amd64
gluster 10.3

The messages look like:

Apr 27 13:30:18 piggy kernel: [24287.715229] INFO: task
glfs_fusenoti:2787 blocked for more than 120 seconds.
Apr 27 13:30:18 piggy kernel: [24287.715327]   Not tainted
5.10.0-22-amd64 #1 Debian 5.10.178-3
Apr 27 13:30:18 piggy kernel: [24287.715419] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 27 13:30:18 piggy kernel: [24287.715575] task:glfs_fusenoti
state:D stack:0 pid: 2787 ppid: 1 flags:0x
Apr 27 13:30:18 piggy kernel: [24287.715734] Call Trace:
Apr 27 13:30:18 piggy kernel: [24287.715847]  __schedule+0x282/0x870
Apr 27 13:30:18 piggy kernel: [24287.715959]  schedule+0x46/0xb0
Apr 27 13:30:18 piggy kernel: [24287.716073]
rwsem_down_write_slowpath+0x257/0x4d0
Apr 27 13:30:18 piggy kernel: [24287.716194]
fuse_reverse_inval_entry+0x3b/0x1e0 [fuse]

etc. Full excerpt here: https://pastebin.com/6gDHgh16

This may cause our application to "hang". In the latest release i read
something about fuse:
https://docs.gluster.org/en/latest/release-notes/10.4/
"Fix fuse concurrency problems"

but i checked the tickets and wasn't able to find anything.

However, i upgraded gluster client 10.3 -> 10.4, hoping that this
fixes the hang-issue.

Has anyone seen these messages before?


Best regards,
Hubert




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance: lots of small files, hdd, nvme etc.

2023-03-30 Thread Hu Bert
Hi Diego,

> > Just an observation: is there a performance difference between a sw
> > raid10 (10 disks -> one brick) or 5x raid1 (each raid1 a brick)
> Err... RAID10 is not 10 disks unless you stripe 5 mirrors of 2 disks.

Maybe i was imprecise?

md3 : active raid10 sdh1[7] sde1[4] sda1[0] sdg1[6] sdc1[2] sdd1[3]
sdf1[5] sdb1[1] sdi1[8] sdj1[9]
 48831518720 blocks super 1.2 512K chunks 2 near-copies [10/10] [UU]

mdadm --detail /dev/md3
/dev/md3:
  Version : 1.2
Creation Time : Fri Jan 18 08:59:51 2019
   Raid Level : raid10
[...]
   Number   Major   Minor   RaidDevice State
  0   810  active sync set-A   /dev/sda1
  1   8   171  active sync set-B   /dev/sdb1
  2   8   332  active sync set-A   /dev/sdc1
  3   8   493  active sync set-B   /dev/sdd1
  4   8   654  active sync set-A   /dev/sde1
  5   8   815  active sync set-B   /dev/sdf1
  9   8  1456  active sync set-A   /dev/sdj1
  8   8  1297  active sync set-B   /dev/sdi1
  7   8  1138  active sync set-A   /dev/sdh1
  6   8   979  active sync set-B   /dev/sdg1


> > with
> > the same disks (10TB hdd)? The heal processes on the 5xraid1-scenario
> > seems faster. Just out of curiosity...
> It should be, since the bricks are smaller. But given you're using a
> replica 3 I don't understand why you're also using RAID1: for each 10T
> of user-facing capacity you're keeping 60TB of data on disks.
> I'd ditch local RAIDs to double the space available. Unless you
> desperately need the extra read performance.

Well, a long time ago we used the 10TB disks directly as bricks (JBOD),
replicate 3 setup. Then one of the bricks failed: the volume was ok
(since 2 bricks were left), but after the hdd change the reset-brick
produced very high load/iowait. So a raid1 or raid10 is an attempt to
avoid the reset-brick in favor of a sw raid rebuild - iirc that can run
at a lower priority -> fewer problems in the running system.
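
(The priority knob i mean is the md resync speed limit - a sketch, the values are only examples:)

sysctl -w dev.raid.speed_limit_min=5000         # KB/s per device, system-wide
sysctl -w dev.raid.speed_limit_max=50000
echo 50000 > /sys/block/md3/md/sync_speed_max   # or per array, e.g. md3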


Best regards,
Hubert




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Performance: lots of small files, hdd, nvme etc.

2023-03-30 Thread Hu Bert
Hello there,

as Strahil suggested a separate thread might be better.

current state:
- servers with 10TB hdds
- 2 hdds build up a sw raid1
- each raid1 is a brick
- so 5 bricks per server
- Volume info (complete below):
Volume Name: workdata
Type: Distributed-Replicate
Number of Bricks: 5 x 3 = 15
Bricks:
Brick1: gls1:/gluster/md3/workdata
Brick2: gls2:/gluster/md3/workdata
Brick3: gls3:/gluster/md3/workdata
Brick4: gls1:/gluster/md4/workdata
Brick5: gls2:/gluster/md4/workdata
Brick6: gls3:/gluster/md4/workdata
etc.

- workload: the (un)famous "lots of small files" setting
- currently 70% of the of the volume is used: ~32TB
- file size: few KB up to 1MB
- so there are hundreds of millions of files (and millions of directories)
- each image has an ID
- under the base dir the IDs are split into 3 digits
- dir structure: /basedir/(000-999)/(000-999)/ID/[lotsoffileshere]
- example for ID 123456789: /basedir/123/456/123456789/default.jpg (see the small path sketch after this list)
- maybe this structure isn't good and e.g. this would be better:
/basedir/IDs/[here the files] - so millions of ID-dirs directly under
/basedir/
- frequent access to the files by webservers (nginx, tomcat): lookup
if file exists, read/write images etc.
- Strahil mentioned: "Keep in mind that negative searches (searches of
non-existing/deleted objects) has highest penalty." <--- that happens
very often...
- server load on high traffic days: > 100 (mostly iowait)
- bad are server reboots (read filesystem info etc.)
- really bad is a sw raid rebuild/resync
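
(The path sketch referenced above - how an ID maps to its directory, plain shell:)

id=123456789
echo "/basedir/${id:0:3}/${id:3:3}/${id}/default.jpg"
# -> /basedir/123/456/123456789/default.jpg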

Some images:
https://abload.de/img/gls-diskutilfti5d.png
https://abload.de/img/gls-io6cfgp.png
https://abload.de/img/gls-throughput3oicf.png

Our conclusion: the hardware is too slow, the disks are too big. For a
future setup we need to improve the performance (or switch to a
different solution). HW-Raid-controller might be an option, but SAS
disks are not available.

Options:
- scale broader: more servers with smaller disks
- faster disks: nvme

Both are costly. Any suggestions, recommendations, ideas?

Just an observation: is there a performance difference between a sw
raid10 (10 disks -> one brick) or 5x raid1 (each raid1 a brick) with
the same disks (10TB hdd)? The heal processes on the 5xraid1-scenario
seems faster. Just out of curiosity...

whoa, lots of text - thx for reading if you reached this point :-)


Best regards

Hubert

Volume Name: workdata
Type: Distributed-Replicate
Volume ID: 7d1e23e5-0308-4443-a832-d36f85ff7959
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 3 = 15
Transport-type: tcp
Bricks:
Brick1: glusterpub1:/gluster/md3/workdata
Brick2: glusterpub2:/gluster/md3/workdata
Brick3: glusterpub3:/gluster/md3/workdata
Brick4: glusterpub1:/gluster/md4/workdata
Brick5: glusterpub2:/gluster/md4/workdata
Brick6: glusterpub3:/gluster/md4/workdata
Brick7: glusterpub1:/gluster/md5/workdata
Brick8: glusterpub2:/gluster/md5/workdata
Brick9: glusterpub3:/gluster/md5/workdata
Brick10: glusterpub1:/gluster/md6/workdata
Brick11: glusterpub2:/gluster/md6/workdata
Brick12: glusterpub3:/gluster/md6/workdata
Brick13: glusterpub1:/gluster/md7/workdata
Brick14: glusterpub2:/gluster/md7/workdata
Brick15: glusterpub3:/gluster/md7/workdata
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.read-ahead: off
performance.io-cache: off
performance.quick-read: on
cluster.self-heal-window-size: 16
cluster.heal-wait-queue-length: 1
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
network.inode-lru-limit: 20
cluster.shd-max-threads: 8
server.outstanding-rpc-limit: 128
transport.listen-backlog: 100
performance.least-prio-threads: 8
performance.cache-size: 6GB
cluster.min-free-disk: 1%
performance.io-thread-count: 32
performance.write-behind-window-size: 16MB
performance.cache-max-file-size: 128MB
client.event-threads: 8
server.event-threads: 8
performance.parallel-readdir: on
performance.cache-refresh-timeout: 4
cluster.readdir-optimize: off
performance.md-cache-timeout: 600
performance.nl-cache: off
cluster.lookup-unhashed: on
cluster.shd-wait-qlength: 1
performance.readdir-ahead: on
storage.build-pgfid: off




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] hardware issues and new server advice

2023-03-25 Thread Hu Bert
Hi,
sry if i hijack this, but maybe it's helpful for other gluster users...

> pure NVME-based volume will be waste of money. Gluster excells when you have 
> more servers and clients to consume that data.
> I would choose  LVM cache (NVMEs) + HW RAID10 of SAS 15K disks to cope with 
> the load. At least if you decide to go with more disks for the raids, use 
> several  (not the built-in ones) controllers.

Well, we have to take what our provider (hetzner) offers - SATA hdds
or sata|nvme ssds.

Volume Name: workdata
Type: Distributed-Replicate
Number of Bricks: 5 x 3 = 15
Bricks:
Brick1: gls1:/gluster/md3/workdata
Brick2: gls2:/gluster/md3/workdata
Brick3: gls3:/gluster/md3/workdata
Brick4: gls1:/gluster/md4/workdata
Brick5: gls2:/gluster/md4/workdata
Brick6: gls3:/gluster/md4/workdata
etc.
Below are the volume settings.

Each brick is a sw raid1 (made out of 10TB hdds). File access to the
backends is pretty slow even at low system load (and the load reaches
>100 on the servers on high-traffic days); even a simple 'ls' on a
directory with ~1000 sub-directories takes a couple of seconds.

Some images:
https://abload.de/img/gls-diskutilfti5d.png
https://abload.de/img/gls-io6cfgp.png
https://abload.de/img/gls-throughput3oicf.png

As you mentioned it: is a raid10 better than x*raid1? Anything misconfigured?


Thx a lot & best regards,

Hubert

Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.read-ahead: off
performance.io-cache: off
performance.quick-read: on
cluster.self-heal-window-size: 16
cluster.heal-wait-queue-length: 1
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
network.inode-lru-limit: 20
cluster.shd-max-threads: 8
server.outstanding-rpc-limit: 128
transport.listen-backlog: 100
performance.least-prio-threads: 8
performance.cache-size: 6GB
cluster.min-free-disk: 1%
performance.io-thread-count: 32
performance.write-behind-window-size: 16MB
performance.cache-max-file-size: 128MB
client.event-threads: 8
server.event-threads: 8
performance.parallel-readdir: on
performance.cache-refresh-timeout: 4
cluster.readdir-optimize: off
performance.md-cache-timeout: 600
performance.nl-cache: off
cluster.lookup-unhashed: on
cluster.shd-wait-qlength: 1
performance.readdir-ahead: on
storage.build-pgfid: off




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] hardware issues and new server advice

2023-03-23 Thread Hu Bert
Hi,

On Tue, Mar 21, 2023 at 23:36, Martin Bähr wrote:
> the primary data is photos. we get an average of 5 new files per
> day, with a peak if 7 to 8 times as much during christmas.
>
> gluster has always been able to keep up with that, only when raid resync
> or checks happen the server load sometimes increases to cause issues.

Interesting, we have a similar workload: hundreds of millions of
images, small files, and especially on weekends with high traffic the
load+iowait is really heavy. Or if a hdd fails, or during a raid
check.

our hardware:
10x 10TB hdds -> 5x raid1, each raid1 is a brick, replicate 3 setup.
About 40TB of data.

Well, the bricks are bigger than recommended... Sooner or later we
will have to migrate that stuff, and use nvme for that, either 3.5TB
or bigger ones. Those should be faster... *fingerscrossed*


regards,
Hubert




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Debian repository problem

2022-09-25 Thread Hu Bert
Hi Emmanuel,

just checked these:
https://download.gluster.org/pub/gluster/glusterfs/10/10.2/Debian/12/amd64/apt/pool/main/g/glusterfs/
https://download.gluster.org/pub/gluster/glusterfs/9/9.6/Debian/12/amd64/apt/pool/main/g/glusterfs/

Everything's there, as expected. Which paths did you check?


Best Regards,

Hubert

On Fri, Sep 23, 2022 at 08:06, Emmanuel BENOIT wrote:
>
> Hi,
>
> I believe there is a problem with the Debian package repository for Gluster.
>
> I've had errors from machines that are supposed to use it as a source,
> and when I tried to take a look at it, the directories which are
> supposed to contain the repository structure contain a single tarball.
>
> I kind of expected this to fix itself automatically, but it didn't.
> Where should I report this problem ?
>
> Best regards,
>
> --
> Emmanuel Benoît 
> AON ⊆ DSI ⊆ INSSAAHP ⊆ INESAAE
> Agrocampus-Ouest - Angers
> 2, rue André Le Nôtre / 49045 ANGERS Cedex 01
> @ebenoit sur https://chat.agrocampus-ouest.fr
> PGP: F5F013961BDCC6EC7B11A98B2356DC6956CF54EF (keys.openpgp.org)
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Upgrade 7.8 -> 8.4: transport endpoint not connected

2021-03-03 Thread Hu Bert
Hi there,

i just did an upgrade on my test servers from version 7.8 to version
8.4 (debian buster, replicate 3 setup, 2 volumes), and noticed that
one of two mounts wasn't working afterwards.

procedure (same i used for 6.x -> 7.x):
- stop all gluster processes on the server
- apt update && apt upgrade
- check volume, services, mounts etc. afterwards
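In terms of commands that was roughly (sketch from memory, volume name
is just a placeholder):

/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
apt update && apt upgrade
systemctl status glusterd
gluster volume status
gluster volume heal <volname> info summary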

The volumes themselves are working, but one of the transport endpoints
is not connected.

Before the upgrade:
ls -lah /shared/
drwxr-xr-x  4 root root   75 Apr  1  2020 private
drwxr-xr-x  5 root root   46 Apr 16  2019 public

after the upgrade:

ls -lah /shared/
ls: cannot access '/shared/private': Transport endpoint is not connected
d????????? ? ?    ?    ?            ? private
drwxr-xr-x  2 root root 4.0K Feb 11  2019 public

directory permissions and ownership screwed up? And even the date of
the 2nd dir (public) has changed.

Job for glusterfssharedstorage.service failed because the control
process exited with error code.
See "systemctl status glusterfssharedstorage.service" and "journalctl
-xe" for details.

systemctl status glusterfssharedstorage.service
● glusterfssharedstorage.service - Mount glusterfs sharedstorage
  Loaded: loaded (/lib/systemd/system/glusterfssharedstorage.service;
enabled; vendor preset: enabled)
  Active: activating (start) since Wed 2021-03-03 09:55:16 CET; 1s ago
Cntrl PID: 2148 (mount-shared-st)
   Tasks: 2 (limit: 4915)
  Memory: 4.1M
  CGroup: /system.slice/glusterfssharedstorage.service
  ├─2148 /bin/bash /usr/libexec/glusterfs/mount-shared-storage.sh
  └─2213 sleep 10

Mar 03 09:55:16 dirac systemd[1]: Starting Mount glusterfs sharedstorage...
Mar 03 09:55:16 dirac mount-shared-storage.sh[2148]: ERROR: Mount
point does not exist
Mar 03 09:55:16 dirac mount-shared-storage.sh[2148]: Please specify a
mount point
Mar 03 09:55:16 dirac mount-shared-storage.sh[2148]: Usage:
Mar 03 09:55:16 dirac mount-shared-storage.sh[2148]: man 8 /sbin/mount.glusterfs
Mar 03 09:55:26 dirac mount-shared-storage.sh[2148]: /shared/private
failed to mount
Mar 03 09:55:36 dirac mount-shared-storage.sh[2148]: /shared/public
has been mounted
Mar 03 09:55:36 dirac systemd[1]: glusterfssharedstorage.service:
Control process exited, code=exited, status=1/FAILURE

Well... does anyone have an idea what might have gone wrong?

While writing/thinking i was able to fix it:
- umount /shared/private -> directory looks normal again (no ???)
- systemctl start glusterfssharedstorage.service
- check if mount is done -> yes

ah ok, read it here...
https://docs.gluster.org/en/latest/Upgrade-Guide/generic-upgrade-procedure/

"Upgrade procedure for clients
Unmount all glusterfs mount points on the client"

My servers are mounting the volumes as clients as well. During the
last version upgrade i never had done the unmount before - is this v8
related? Just curious...
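So for the next upgrade i'll probably do the client part explicitly,
something like this (untested in exactly this order, mount points as on
my servers):

umount /shared/private /shared/public
[... upgrade packages as usual ...]
mount /shared/private
mount /shared/public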


Best regards,
Hubert




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] upgrade gluster from old version

2020-08-26 Thread Hu Bert
Hi,

i'd check the release logs of every x.0-version and the upgrade guide
if there are any params that are not supported anymore. If you have
one of these params set, you need to disable them, e.g.:

https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_6/
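A quick way to check is to dump the configured options and reset
anything the release notes list as removed, roughly (volume name and
option are just examples):

gluster volume get myvolume all | grep -i <option-name>
gluster volume reset myvolume <option-name>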


Best regards,
Hubert

Am Mi., 26. Aug. 2020 um 12:11 Uhr schrieb Sanju Rakonde :
>
> Hi,
>
> I believe you can do an offline upgrade (I have never tried upgrading from 
> 3.7 to 7.7, so there might be issues).
>
> If you want to do a fresh install, after installing the 7.7 packages, you can 
> use the same old bricks to create the volumes. but you need to add force at 
> the end of volume create command.
>
> On Wed, Aug 26, 2020 at 5:27 AM liu zhijing  wrote:
>>
>> hi everyone!
> >> I found it is hard to upgrade from gluster version 3.7.6 to 7.7, so I
> >> removed all the installed packages and deleted all config files except the bricks,
> >> and I reinstalled glusterfs 7.7, then reconfigured the same volume name but
> >> used a temp brick. After everything was ok, I stopped the volume, replaced
> >> the old brick, and changed the brick volume id to the new volume id; at last I
> >> started the volume successfully.
> >> What I want to know is if this is the right way to upgrade.
>>
>> 
>>
>>
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> --
> Thanks,
> Sanju
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] How safe are major version upgrades?

2020-08-26 Thread Hu Bert
Hi,
we have 2 replicate-3 systems, and i upgraded both online from 5.12 to
6.8 and then to 7.6. No (big) problems here, upgrade took between 10
to 20 minutes (wait until healing is done) - but no geo replication,
so i can't say anything about that part.

Best regards,
Hubert

Am Di., 25. Aug. 2020 um 05:47 Uhr schrieb David Cunningham
:
>
> Hello,
>
> We have a production system with around 50GB of data running GlusterFS 5.13. 
> It has 3 replicating/mirrored nodes, and also geo-replicates to another site.
>
> How safe would it be to upgrade to a more recent major version, eg 7.x? I'm 
> not sure how recommended in-place upgrades are, or if a complete re-install 
> is necessary for safety.
>
> We have a maximum window of around 4 hours for this upgrade and would not 
> want any significant risk of an unsuccessful upgrade at the end of that time.
>
> Is version 8.0 considered stable?
>
> Thanks in advance,
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] State of Gluster project

2020-06-22 Thread Hu Bert
Am So., 21. Juni 2020 um 19:43 Uhr schrieb Gionatan Danti :

> For the RAID6/10 setup, I found no issues: simply replace the broken
> disk without involing Gluster at all. However, this also means facing
> the "iops wall" I described earlier for single-brick node. Going
> full-Guster with JBODs would be interesting from a performance
> standpoint, but this complicate eventual recovery from bad disks.
>
> Does someone use Gluster in JBOD mode? If so, can you share your
> experience?
> Thanks.

Hi,
we once used gluster with disks in JBOD mode (3 servers, 4x10TB hdd
each, 4 x 3 = 12), and to make it short: in our special case it wasn't
that funny. Big HDDs, lots of small files, (highly) concurrent access
through our application. It was running quite fine, until a disk
failed. The reset-disk took ~30 (!) days, as you have gluster
copying/restoring the data and the normal application read/write.
After the first reset had finished, a couple of days later another
disk died, and the fun started again :-) Maybe a bad use case.

With this experience, the next setup was: splitting data into 2 chunks
(high I/O, low I/O), 3 servers with 2 raid10 (same type of disk), each
raid used as a brick, resulting in replicate 3: 1 x 3 = 3. Changing a
failed disk now results in a complete raid resync, but regarding I/O
this is far better than using reset-disk with a HDD only. Only the
regularly running raid check was a bit of a performance issue.

Latest setup (for the high I/O part) looks like this: 3 servers, 10
disks with 10TB each -> 5 raid1, forming a distribute replicate with 5
bricks, 5 x 3 = 15. No disk has failed so far (fingers crossed), but
if now a disk fails, gluster is still running with all bricks
available, and after changing the failed disk, there's one raid resync
running, affecting only 1/5 of the volume. In theory that should be
better ;-) The regularly running raid checks are no problem so far,
for 15 raid1 only 1 is running, none parallel.
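If it helps to picture the layout: the create command for such a 5 x 3
setup looks roughly like this (hostnames/paths are just examples; each
group of 3 consecutive bricks forms one replica set):

gluster volume create workdata replica 3 \
  gluster1:/gluster/md3/workdata gluster2:/gluster/md3/workdata gluster3:/gluster/md3/workdata \
  gluster1:/gluster/md4/workdata gluster2:/gluster/md4/workdata gluster3:/gluster/md4/workdata \
  [... and so on for md5, md6, md7 ...]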

disclaimer: JBOD may work better with SSDs/NVMes - untested ;-)


Best regards,
Hubert




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Issues with replicated gluster volume

2020-06-16 Thread Hu Bert
Hi,

if you simply reboot or shutdown one of the gluster nodes, there might
be a (short or medium) unavailability of the volume on the nodes. To
avoid this there's script:

/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh (path may
be different depending on distribution)

If i remember correctly: this notifies the clients that this node is
going to be unavailable (please correct me if the details are wrong).
If i do reboots of one gluster node, i always call this script and
never have seen unavailability issues on the clients.
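In practice a reboot of one node then looks like this for me (rough
sketch, volume name is a placeholder):

/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
systemctl reboot
[... wait until the node is back ...]
gluster volume status
gluster volume heal <volname> info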


Regards,
Hubert

Am Mo., 15. Juni 2020 um 19:36 Uhr schrieb ahemad shaik
:
>
> Hi There,
>
> I have created 3 replica gluster volume with 3 bricks from 3 nodes.
>
> "gluster volume create glustervol replica 3 transport tcp node1:/data 
> node2:/data node3:/data force"
>
> mounted on client node using below command.
>
> "mount -t glusterfs node4:/glustervol/mnt/"
>
> when any of the node (either node1,node2 or node3) goes down, gluster 
> mount/volume (/mnt) not accessible at client (node4).
>
> purpose of replicated volume is high availability but not able to achieve it.
>
> Is it a bug or i am missing anything.
>
>
> Any suggestions will be great help!!!
>
> kindly suggest.
>
> Thanks,
> Ahemad
>
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] unable to connect to download.gluster.org

2020-06-08 Thread Hu Bert
Hi,

On some of my other servers i had similar problems with wget/curl,
which failed to download some xml files, and the current problem
reminded me of that former one; so i did a try with curl:

curl --local-port 5-51000
https://download.gluster.org/pub/gluster/glusterfs/6/6.9/Debian/buster/amd64/apt/

That works. So it's a problem with the local port apt creates/uses.
There was this setting in /etc/sysctl.conf:

net.ipv4.ip_local_port_range = 12000 65000

I had disabled that setting weeks ago, but somehow it didn't
become active (even with sysctl -p /etc/sysctl.conf) on these 2
servers; i now rebooted the servers, and finally it works now.
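For anyone hitting the same thing: checking and re-applying the value
on the fly would be something like this (the default range is from
memory):

sysctl net.ipv4.ip_local_port_range
sysctl -w net.ipv4.ip_local_port_range="32768 60999"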

Sorry for the disturbance in the force ;-)


Best regards,
Hubert

Am Mo., 8. Juni 2020 um 17:39 Uhr schrieb sankarshan :
>
> I can confirm that I see contents at
> https://download.gluster.org/pub/gluster/glusterfs/6/6.9/Debian/buster/amd64/apt/
> Is the network unable to specifically resolve this URL or a general
> gluster.com domain?
>
> On Mon, 8 Jun 2020 at 19:47, Hu Bert  wrote:
> >
> > - hosts are all debian buster.
> > - identical kernel
> > - identical /etc/resolv.conf
> > - identical /etc/hosts
> > - identical /etc/systemd/resolved.conf
> > - identical firewall settings
> > - identical /etc/sysctl.conf
> >
> > normal debian updates are no problem at all. I first noticed the
> > problem during the upgrade v5.11 to v6.8, but that may be accidental,
> > so probably no relationship here. strange.
> >
> >
> > Am Mo., 8. Juni 2020 um 15:36 Uhr schrieb Strahil Nikolov
> > :
> > >
> > > I'm using OS repos.
> > >
> > > Have you checked for system/package manager proxy ?
> > > Is there any difference between the nodes (for example some package)?
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > На 8 юни 2020 г. 15:26:08 GMT+03:00, Hu Bert  
> > > написа:
> > > >Hi @ll,
> > > >
> > > >on 2 of 3 identical servers (hosts, resolv.conf, ...) i keep having
> > > >this problem:
> > > >
> > > >Err:7
> > > >https://download.gluster.org/pub/gluster/glusterfs/6/6.9/Debian/buster/amd64/apt
> > > >buster InRelease
> > > >  Could not connect to download.gluster.org:443 (8.43.85.185),
> > > >connection timed out
> > > >
> > > >W: Failed to fetch
> > > >https://download.gluster.org/pub/gluster/glusterfs/6/6.9/Debian/buster/amd64/apt/dists/buster/InRelease
> > > > Could not connect to download.gluster.org:443 (8.43.85.185),
> > > >connection timed out
> > > >W: Some index files failed to download. They have been ignored, or old
> > > >ones used instead.
> > > >
> > > >I've checked everything twice, but can't find the reason for this.
> > > >Someone else having this problem?
> > > >
> > > >
> > > >Best regards,
> > > >Hubert
>
>
>
>
> --
> sankars...@kadalu.io | TZ: UTC+0530 | +91 99606 03294
> kadalu.io : Making it easy to provision storage in k8s!




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] unable to connect to download.gluster.org

2020-06-08 Thread Hu Bert
- hosts are all debian buster.
- identical kernel
- identical /etc/resolv.conf
- identical /etc/hosts
- identical /etc/systemd/resolved.conf
- identical firewall settings
- identical /etc/sysctl.conf

normal debian updates are no problem at all. I first noticed the
problem during the upgrade v5.11 to v6.8, but that may be accidental,
so probably no relationship here. strange.


Am Mo., 8. Juni 2020 um 15:36 Uhr schrieb Strahil Nikolov
:
>
> I'm using OS repos.
>
> Have you checked for system/package manager proxy ?
> Is there any difference between the nodes (for example some package)?
>
> Best Regards,
> Strahil Nikolov
>
> На 8 юни 2020 г. 15:26:08 GMT+03:00, Hu Bert  написа:
> >Hi @ll,
> >
> >on 2 of 3 identical servers (hosts, resolv.conf, ...) i keep having
> >this problem:
> >
> >Err:7
> >https://download.gluster.org/pub/gluster/glusterfs/6/6.9/Debian/buster/amd64/apt
> >buster InRelease
> >  Could not connect to download.gluster.org:443 (8.43.85.185),
> >connection timed out
> >
> >W: Failed to fetch
> >https://download.gluster.org/pub/gluster/glusterfs/6/6.9/Debian/buster/amd64/apt/dists/buster/InRelease
> > Could not connect to download.gluster.org:443 (8.43.85.185),
> >connection timed out
> >W: Some index files failed to download. They have been ignored, or old
> >ones used instead.
> >
> >I've checked everything twice, but can't find the reason for this.
> >Someone else having this problem?
> >
> >
> >Best regards,
> >Hubert
> >
> >
> >
> >
> >Community Meeting Calendar:
> >
> >Schedule -
> >Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >Bridge: https://bluejeans.com/441850968
> >
> >Gluster-users mailing list
> >Gluster-users@gluster.org
> >https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] One error/warning message after upgrade 5.11 -> 6.8

2020-06-08 Thread Hu Bert
Hi,

the "process", if we wanna call it so, has finished. Maybe there was a
process running that accessed/deleted/... files that haven't been
accessed for a while, resulting in ctime mdata fixes. However, heal
count is down to 0 on all bricks. Very strange, i see ~34K such log
entries for each brick.
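For the record, i watch the counters with something like this (nothing
fancy):

watch -n 30 'gluster volume heal persistent statistics heal-count'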

Let's think positive: gluster is running properly and doing what it
should do. Great! :D


Hubert

Am Mo., 8. Juni 2020 um 15:36 Uhr schrieb Strahil Nikolov
:
>
> Hm... That's something I didn't expect.
>
>
> By the way, have you checked  if all clients are connected to all bricks (if 
> using FUSE)?
>
> Maybe you have some clients that cannot reach a brick.
>
> Best Regards,
> Strahil Nikolov
>
> На 8 юни 2020 г. 12:48:22 GMT+03:00, Hu Bert  написа:
> >Hi Strahil,
> >
> >thx for your answer, but i assume that your approach won't help. It
> >seems like that this behaviour is permanent; e.g. a log entry like
> >this:
> >
> >[2020-06-08 09:40:03.948269] E [MSGID: 113001]
> >[posix-metadata.c:234:posix_store_mdata_xattr] 0-persistent-posix:
> >file:
> >/gluster/md3/persistent/.glusterfs/38/30/38306ef8-6588-40cf-8be3-c0a022714612:
> >gfid: 38306ef8-6588-40cf-8be3-c0a022714612 key:trusted.glusterfs.mdata
> > [No such file or directory]
> >[2020-06-08 09:40:03.948333] E [MSGID: 113114]
> >[posix-metadata.c:433:posix_set_mdata_xattr_legacy_files]
> >0-persistent-posix: gfid: 38306ef8-6588-40cf-8be3-c0a022714612
> >key:trusted.glusterfs.mdata  [No such file or directory]
> >[2020-06-08 09:40:03.948422] I [MSGID: 115060]
> >[server-rpc-fops.c:938:_gf_server_log_setxattr_failure]
> >0-persistent-server: 14193413: SETXATTR
> >/images/generated/207/039/2070391/484x425r.jpg
> >(38306ef8-6588-40cf-8be3-c0a022714612) ==> set-ctime-mdata, client:
> >CTX_ID:b738017c-20a3-4547-afba-5b8933d8e6e5-GRAPH_ID:0-PID:1078-HOST:pepe-PC_NAME:persistent-client-2-RECON_NO:-1,
> >error-xlator: persistent-posix
> >
> >tells me that an error (ctime-mdata) is found and fixed. And this is
> >happening over and over again. A couple of minutes ago i wanted to
> >begin with what you suggested and called 'gluster volume heal
> >persistent info' and suddenly saw:
> >
> >Brick gluster1:/gluster/md3/persistent
> >Status: Connected
> >Number of entries: 0
> >
> >Brick gluster2:/gluster/md3/persistent
> >Status: Connected
> >Number of entries: 0
> >
> >Brick gluster3:/gluster/md3/persistent
> >Status: Connected
> >Number of entries: 0
> >
> >I thought 'wtf...'; the heal-count was 0 as well; but the next call
> >~15s later showed this again:
> >
> >Brick gluster1:/gluster/md3/persistent
> >Number of entries: 31
> >
> >Brick gluster2:/gluster/md3/persistent
> >Number of entries: 27
> >
> >Brick gluster3:/gluster/md3/persistent
> >Number of entries: 4
> >
> >For me it looks like the 'error found -> heal it' process works as it
> >should, but due to the permanent errors (log file entries) the heal
> >count of zero is almost impossible to read.
> >
> >Well, one could deactivate features.ctime as this seems to be the
> >reason (as the log entries suggest), but i don't know if that is
> >reasonable, i.e. if this feature is needed.
> >
> >
> >Best regards,
> >Hubert
> >
> >Am Mo., 8. Juni 2020 um 11:22 Uhr schrieb Strahil Nikolov
> >:
> >>
> >> Hi Hubert,
> >>
> >> Here is one idea:
> >> Using 'gluster volume  heal VOL  info' can provide  the gfids of
> >files pending heal.
> >> Once  you have them, you can find the inode of each file via 'ls  -li
> >/gluster/brick/.gfid///gfid
> >>
> >> Then you can search the brick with find for that inode number (don't
> >forget the 'ionice' to reduce the pressure).
> >>
> >> Once  you have the list of files, stat them via the FUSE client and
> >check if they got healed.
> >>
> >> I fully agree that you need to first heal the volumes before
> >proceeding further or you might get into a nasty situation.
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
> >>
> >> На 8 юни 2020 г. 8:30:57 GMT+03:00, Hu Bert 
> >написа:
> >> >Good morning,
> >> >
> >> >i just wanted to update the version from 6.8 to 6.9 on our replicate
> >3
> >> >system (formerly was version 5.11), and i see tons of these
> >messages:
> >> >
> >> >[2020-06-08 05:25:55.192301] E [MSGID: 113001]
> >> >[posix-metadata.c:23

[Gluster-users] unable to connect to download.gluster.org

2020-06-08 Thread Hu Bert
Hi @ll,

on 2 of 3 identical servers (hosts, resolv.conf, ...) i keep having
this problem:

Err:7 
https://download.gluster.org/pub/gluster/glusterfs/6/6.9/Debian/buster/amd64/apt
buster InRelease
  Could not connect to download.gluster.org:443 (8.43.85.185),
connection timed out

W: Failed to fetch
https://download.gluster.org/pub/gluster/glusterfs/6/6.9/Debian/buster/amd64/apt/dists/buster/InRelease
 Could not connect to download.gluster.org:443 (8.43.85.185),
connection timed out
W: Some index files failed to download. They have been ignored, or old
ones used instead.

I've checked everything twice, but can't find the reason for this.
Someone else having this problem?


Best regards,
Hubert




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] One error/warning message after upgrade 5.11 -> 6.8

2020-06-08 Thread Hu Bert
Hi Strahil,

thx for your answer, but i assume that your approach won't help. It
seems like that this behaviour is permanent; e.g. a log entry like
this:

[2020-06-08 09:40:03.948269] E [MSGID: 113001]
[posix-metadata.c:234:posix_store_mdata_xattr] 0-persistent-posix:
file: 
/gluster/md3/persistent/.glusterfs/38/30/38306ef8-6588-40cf-8be3-c0a022714612:
gfid: 38306ef8-6588-40cf-8be3-c0a022714612 key:trusted.glusterfs.mdata
 [No such file or directory]
[2020-06-08 09:40:03.948333] E [MSGID: 113114]
[posix-metadata.c:433:posix_set_mdata_xattr_legacy_files]
0-persistent-posix: gfid: 38306ef8-6588-40cf-8be3-c0a022714612
key:trusted.glusterfs.mdata  [No such file or directory]
[2020-06-08 09:40:03.948422] I [MSGID: 115060]
[server-rpc-fops.c:938:_gf_server_log_setxattr_failure]
0-persistent-server: 14193413: SETXATTR
/images/generated/207/039/2070391/484x425r.jpg
(38306ef8-6588-40cf-8be3-c0a022714612) ==> set-ctime-mdata, client:
CTX_ID:b738017c-20a3-4547-afba-5b8933d8e6e5-GRAPH_ID:0-PID:1078-HOST:pepe-PC_NAME:persistent-client-2-RECON_NO:-1,
error-xlator: persistent-posix

tells me that an error (ctime-mdata) is found and fixed. And this is
happening over and over again. A couple of minutes ago i wanted to
begin with what you suggested and called 'gluster volume heal
persistent info' and suddenly saw:

Brick gluster1:/gluster/md3/persistent
Status: Connected
Number of entries: 0

Brick gluster2:/gluster/md3/persistent
Status: Connected
Number of entries: 0

Brick gluster3:/gluster/md3/persistent
Status: Connected
Number of entries: 0

I thought 'wtf...'; the heal-count was 0 as well; but the next call
~15s later showed this again:

Brick gluster1:/gluster/md3/persistent
Number of entries: 31

Brick gluster2:/gluster/md3/persistent
Number of entries: 27

Brick gluster3:/gluster/md3/persistent
Number of entries: 4

For me it looks like the 'error found -> heal it' process works as it
should, but due to the permanent errors (log file entries) the heal
count of zero is almost impossible to read.

Well, one could deactivate features.ctime as this seems to be the
reason (as the log entries suggest), but i don't know if that is
reasonable, i.e. if this feature is needed.
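If i read the docs correctly, turning it off would be something like
this (not tried here, and i'm not sure about side effects):

gluster volume set persistent features.ctime off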


Best regards,
Hubert

Am Mo., 8. Juni 2020 um 11:22 Uhr schrieb Strahil Nikolov
:
>
> Hi Hubert,
>
> Here is one idea:
> Using 'gluster volume  heal VOL  info' can provide  the gfids of files 
> pending heal.
> Once  you have them, you can find the inode of each file via 'ls  -li 
> /gluster/brick/.gfid///gfid
>
> Then you can search the brick with find for that inode number (don't forget 
> the 'ionice' to reduce the pressure).
>
> Once  you have the list of files, stat them via the FUSE client and check if 
> they got healed.
>
> I fully agree that you need to first heal the volumes before proceeding
> further or you might get into a nasty situation.
>
> Best Regards,
> Strahil Nikolov
>
>
> На 8 юни 2020 г. 8:30:57 GMT+03:00, Hu Bert  написа:
> >Good morning,
> >
> >i just wanted to update the version from 6.8 to 6.9 on our replicate 3
> >system (formerly was version 5.11), and i see tons of these messages:
> >
> >[2020-06-08 05:25:55.192301] E [MSGID: 113001]
> >[posix-metadata.c:234:posix_store_mdata_xattr] 0-persistent-posix:
> >file:
> >/gluster/md3/persistent/.glusterfs/43/31/43312aba-75c6-42c2-855c-e0db66d7748f:
> >gfid: 43312aba-75c6-42c2-855c-e0db66d7748f key:trusted.glusterfs.mdata
> > [No such file or directory]
> >[2020-06-08 05:25:55.192375] E [MSGID: 113114]
> >[posix-metadata.c:433:posix_set_mdata_xattr_legacy_files]
> >0-persistent-posix: gfid: 43312aba-75c6-42c2-855c-e0db66d7748f
> >key:trusted.glusterfs.mdata  [No such file or directory]
> >[2020-06-08 05:25:55.192426] I [MSGID: 115060]
> >[server-rpc-fops.c:938:_gf_server_log_setxattr_failure]
> >0-persistent-server: 13382741: SETXATTR
> >
> >(43312aba-75c6-42c2-855c-e0db66d7748f) ==> set-ctime-mdata, client:
> >CTX_ID:e223ca30-6c30-4a40-ae98-a418143ce548-GRAPH_ID:0-PID:1006-HOST:sam-PC_NAME:persistent-client-2-RECON_NO:-1,
> >error-xlator: persistent-posix
> >
> >Still the ctime-message. And a lot of these messages:
> >
> >[2020-06-08 05:25:53.016606] W [MSGID: 101159]
> >[inode.c:1330:__inode_unlink] 0-inode:
> >7043eed7-dbd7-4277-976f-d467349c1361/21194684.jpg: dentry not found in
> >839512f0-75de-414f-993d-1c35892f8560
> >
> >Well... the problem is: the volume seems to be in a permanent heal
> >status:
> >
> >Gathering count of entries to be healed on volume persistent has been
> >successful
> >Brick gluster1:/gluster/md3/persistent
> >Number of entries: 31
> >Brick gluster2:/gluster/md3/persistent
> >Number of entries: 6
> >Brick gluster3:/gluster/md3/

Re: [Gluster-users] One error/warning message after upgrade 5.11 -> 6.8

2020-06-08 Thread Hu Bert
Good morning,

i just wanted to update the version from 6.8 to 6.9 on our replicate 3
system (formerly was version 5.11), and i see tons of these messages:

[2020-06-08 05:25:55.192301] E [MSGID: 113001]
[posix-metadata.c:234:posix_store_mdata_xattr] 0-persistent-posix:
file: 
/gluster/md3/persistent/.glusterfs/43/31/43312aba-75c6-42c2-855c-e0db66d7748f:
gfid: 43312aba-75c6-42c2-855c-e0db66d7748f key:trusted.glusterfs.mdata
 [No such file or directory]
[2020-06-08 05:25:55.192375] E [MSGID: 113114]
[posix-metadata.c:433:posix_set_mdata_xattr_legacy_files]
0-persistent-posix: gfid: 43312aba-75c6-42c2-855c-e0db66d7748f
key:trusted.glusterfs.mdata  [No such file or directory]
[2020-06-08 05:25:55.192426] I [MSGID: 115060]
[server-rpc-fops.c:938:_gf_server_log_setxattr_failure]
0-persistent-server: 13382741: SETXATTR

(43312aba-75c6-42c2-855c-e0db66d7748f) ==> set-ctime-mdata, client:
CTX_ID:e223ca30-6c30-4a40-ae98-a418143ce548-GRAPH_ID:0-PID:1006-HOST:sam-PC_NAME:persistent-client-2-RECON_NO:-1,
error-xlator: persistent-posix

Still the ctime-message. And a lot of these messages:

[2020-06-08 05:25:53.016606] W [MSGID: 101159]
[inode.c:1330:__inode_unlink] 0-inode:
7043eed7-dbd7-4277-976f-d467349c1361/21194684.jpg: dentry not found in
839512f0-75de-414f-993d-1c35892f8560

Well... the problem is: the volume seems to be in a permanent heal status:

Gathering count of entries to be healed on volume persistent has been successful
Brick gluster1:/gluster/md3/persistent
Number of entries: 31
Brick gluster2:/gluster/md3/persistent
Number of entries: 6
Brick gluster3:/gluster/md3/persistent
Number of entries: 5

a bit later:
Gathering count of entries to be healed on volume persistent has been successful
Brick gluster1:/gluster/md3/persistent
Number of entries: 100
Brick gluster2:/gluster/md3/persistent
Number of entries: 74
Brick gluster3:/gluster/md3/persistent
Number of entries: 1

The number of entries never reaches 0-0-0; i already updated one of the
systems from 6.8 to 6.9, but updating the other 2 when heal isn't zero
doesn't seem to be a good idea. Well... any idea?


Best regards,
Hubert

Am Fr., 8. Mai 2020 um 21:47 Uhr schrieb Strahil Nikolov
:
>
> On April 21, 2020 8:00:32 PM GMT+03:00, Amar Tumballi  wrote:
> >There seems to be a burst of issues when people upgraded to 5.x or 6.x
> >from
> >3.12 (Thanks to you and Strahil, who have reported most of them).
> >
> >Latest update from Strahil is that if files are copied fresh on 7.5
> >series,
> >there are no issues.
> >
> >We are in process of identifying the patch, and also provide an option
> >to
> >disable 'acl' for testing. Will update once we identify the issue.
> >
> >Regards,
> >Amar
> >
> >
> >
> >On Sat, Apr 11, 2020 at 11:10 AM Hu Bert 
> >wrote:
> >
> >> Hi,
> >>
> >> no one has seen such messages?
> >>
> >> Regards,
> >> Hubert
> >>
> >> Am Mo., 6. Apr. 2020 um 06:13 Uhr schrieb Hu Bert
> > >> >:
> >> >
> >> > Hello,
> >> >
> >> > i just upgraded my servers and clients from 5.11 to 6.8; besides
> >one
> >> > connection problem to the gluster download server everything went
> >> > fine.
> >> >
> >> > On the 3 gluster servers i mount the 2 volumes as well, and only
> >there
> >> > (and not on all the other clients) there are some messages in the
> >log
> >> > file of both mount logs:
> >> >
> >> > [2020-04-06 04:10:53.552561] W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-2: remote operation failed [Permission denied]
> >> > [2020-04-06 04:10:53.552635] W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-1: remote operation failed [Permission denied]
> >> > [2020-04-06 04:10:53.552639] W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-0: remote operation failed [Permission denied]
> >> > [2020-04-06 04:10:53.553226] E [MSGID: 148002]
> >> > [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime:
> >dict
> >> > set of key for set-ctime-mdata failed [Permission denied]
> >> > The message "W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-2: remote operation failed [Permission denied]"
> >> > repeated 4 times between [2020-04-06 04:10:53.552561] and
> >[2020-04-06
> >> > 04:10:53.745542]
> >> > The message "W [MSGID: 114031]
> >&

Re: [Gluster-users] One error/warning message after upgrade 5.11 -> 6.8

2020-05-11 Thread Hu Bert
Hi,

no problem. Output is identical for all 3 bricks (1 x 3 = 3)

:xfs_info /gluster/md4
meta-data=/dev/md4   isize=512    agcount=32, agsize=152598400 blks
 =   sectsz=4096  attr=2, projid32bit=1
 =   crc=1finobt=1, sparse=0, rmapbt=0
 =   reflink=0
data =   bsize=4096   blocks=4883148800, imaxpct=5
 =   sunit=128swidth=256 blks
naming   =version 2  bsize=8192   ascii-ci=0, ftype=1
log  =internal log   bsize=4096   blocks=521728, version=2
 =   sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

Best Regards,
Hubert

Am Fr., 8. Mai 2020 um 21:47 Uhr schrieb Strahil Nikolov
:
>
> On April 21, 2020 8:00:32 PM GMT+03:00, Amar Tumballi  wrote:
> >There seems to be a burst of issues when people upgraded to 5.x or 6.x
> >from
> >3.12 (Thanks to you and Strahil, who have reported most of them).
> >
> >Latest update from Strahil is that if files are copied fresh on 7.5
> >series,
> >there are no issues.
> >
> >We are in process of identifying the patch, and also provide an option
> >to
> >disable 'acl' for testing. Will update once we identify the issue.
> >
> >Regards,
> >Amar
> >
> >
> >
> >On Sat, Apr 11, 2020 at 11:10 AM Hu Bert 
> >wrote:
> >
> >> Hi,
> >>
> >> no one has seen such messages?
> >>
> >> Regards,
> >> Hubert
> >>
> >> Am Mo., 6. Apr. 2020 um 06:13 Uhr schrieb Hu Bert
> > >> >:
> >> >
> >> > Hello,
> >> >
> >> > i just upgraded my servers and clients from 5.11 to 6.8; besides
> >one
> >> > connection problem to the gluster download server everything went
> >> > fine.
> >> >
> >> > On the 3 gluster servers i mount the 2 volumes as well, and only
> >there
> >> > (and not on all the other clients) there are some messages in the
> >log
> >> > file of both mount logs:
> >> >
> >> > [2020-04-06 04:10:53.552561] W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-2: remote operation failed [Permission denied]
> >> > [2020-04-06 04:10:53.552635] W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-1: remote operation failed [Permission denied]
> >> > [2020-04-06 04:10:53.552639] W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-0: remote operation failed [Permission denied]
> >> > [2020-04-06 04:10:53.553226] E [MSGID: 148002]
> >> > [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime:
> >dict
> >> > set of key for set-ctime-mdata failed [Permission denied]
> >> > The message "W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-2: remote operation failed [Permission denied]"
> >> > repeated 4 times between [2020-04-06 04:10:53.552561] and
> >[2020-04-06
> >> > 04:10:53.745542]
> >> > The message "W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-1: remote operation failed [Permission denied]"
> >> > repeated 4 times between [2020-04-06 04:10:53.552635] and
> >[2020-04-06
> >> > 04:10:53.745610]
> >> > The message "W [MSGID: 114031]
> >> > [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> >> > 0-persistent-client-0: remote operation failed [Permission denied]"
> >> > repeated 4 times between [2020-04-06 04:10:53.552639] and
> >[2020-04-06
> >> > 04:10:53.745632]
> >> > The message "E [MSGID: 148002]
> >> > [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime:
> >dict
> >> > set of key for set-ctime-mdata failed [Permission denied]" repeated
> >4
> >> > times between [2020-04-06 04:10:53.553226] and [2020-04-06
> >> > 04:10:53.746080]
> >> >
> >> > Anything to worry about?
> >> >
> >> >
> >> > Regards,
> >> > Hubert
> >> 
> >>
> >>
> >>
> >> Community Meeting Calendar:
> >>
> >> Schedule -
> >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >> Bridge: https://bluejeans.com/441850968
> >>
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> https://lists.gluster.org/mailman/listinfo/gluster-users
> >>
>
> Hi,
>
> Can you provide the xfs_info for the bricks from the volume ?
>
> I have a theory that I want to confirm or reject.
>
> Best Regards,
> Strahil Nikolov




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster 6.8: brick logs flooded by Information messages

2020-04-21 Thread Hu Bert
well, the log entries still appear, up to tens of millions of log entries
in the brick logs. Do they appear simply because the ctime feature is
enabled in v6 by default?

https://docs.gluster.org/en/latest/release-notes/6.0/#7-ctime-feature-is-enabled-by-default

So if the problem solves itself once the ctime-mdata has been set for
every file/dir sooner or later, it should just be a matter of time, right?
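One can check single files directly on the brick to see whether the
mdata xattr has appeared in the meantime, roughly like this (path is
just a placeholder, run as root on the brick):

getfattr -n trusted.glusterfs.mdata -e hex /gluster/<brick>/<some-file>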


Best regards,
Hubert

Am Sa., 11. Apr. 2020 um 15:38 Uhr schrieb Hu Bert :
>
> as i wrote in the opening post: the previous version was 5.11.
>
> Best regards,
> Hubert
>
> Am Sa., 11. Apr. 2020 um 13:25 Uhr schrieb Strahil Nikolov
> :
> >
> > On April 11, 2020 2:06:22 PM GMT+03:00, Hu Bert  
> > wrote:
> > >Hi Strahil,
> > >
> > >looking into the mount logs i think i see the same (or at least very
> > >similar) log messages:
> > >
> > >[2020-04-11 11:01:21.676613] I [MSGID: 108031]
> > >[afr-common.c:2548:afr_local_discovery_cbk] 0-workdata-replicate-0:
> > >selecting local read_child workdata-client-0
> > >[2020-04-11 11:01:21.791039] W [MSGID: 114031]
> > >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-0:
> > >remote operation failed [Permission denied]
> > >[2020-04-11 11:01:21.791099] W [MSGID: 114031]
> > >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-1:
> > >remote operation failed [Permission denied]
> > >[2020-04-11 11:01:21.791172] W [MSGID: 114031]
> > >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-2:
> > >remote operation failed [Permission denied]
> > >[2020-04-11 11:01:21.791598] E [MSGID: 148002]
> > >[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-workdata-utime: dict
> > >set of key for set-ctime-mdata failed [Permission denied]
> > >The message "I [MSGID: 108031]
> > >[afr-common.c:2548:afr_local_discovery_cbk] 0-workdata-replicate-0:
> > >selecting local read_child workdata-client-0" repeated 947 times
> > >between [2020-04-11 11:01:21.676613] and [2020-04-11 11:03:21.405740]
> > >The message "W [MSGID: 114031]
> > >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-0:
> > >remote operation failed [Permission denied]" repeated 3947 times
> > >between [2020-04-11 11:01:21.791039] and [2020-04-11 11:03:21.443929]
> > >The message "W [MSGID: 114031]
> > >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-2:
> > >remote operation failed [Permission denied]" repeated 3947 times
> > >between [2020-04-11 11:01:21.791172] and [2020-04-11 11:03:21.443958]
> > >The message "W [MSGID: 114031]
> > >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-1:
> > >remote operation failed [Permission denied]" repeated 3947 times
> > >between [2020-04-11 11:01:21.791099] and [2020-04-11 11:03:21.444008]
> > >The message "E [MSGID: 148002]
> > >[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-workdata-utime: dict
> > >set of key for set-ctime-mdata failed [Permission denied]" repeated
> > >3947 times between [2020-04-11 11:01:21.791598] and [2020-04-11
> > >11:03:21.444357]
> > >
> > >Best regards,
> > >Hubert
> > >
> > >Am Sa., 11. Apr. 2020 um 11:11 Uhr schrieb Strahil Nikolov
> > >:
> > >>
> > >> On April 11, 2020 8:35:41 AM GMT+03:00, Hu Bert
> > > wrote:
> > >> >Hi,
> > >> >
> > >> >this week i upgraded from 5.11 to 6.8. Since that the brick logs get
> > >> >flooded by such messages:
> > >> >
> > >> >[2020-04-11 05:22:48.774688] I [MSGID: 139001]
> > >> >[posix-acl.c:263:posix_acl_log_permit_denied]
> > >> >0-workdata-access-control: client:
> > >>
> > >>CTX_ID:a1b55e2b-af03-484a-9be0-9314fec4bb61-GRAPH_ID:0-PID:8204-HOST:gluster1-PC_NAME:workdata-client-0-RECON_NO:-0,
> > >> >gfid: 25f487a5-2973-407c-87d2-74979c16c5ab,
> > >> >req(uid:33,gid:33,perm:2,ngrps:1),
> > >> >ctx(uid:109,gid:114,in-groups:0,perm:644,updated-fop:LOOKUP, acl:-)
> > >> >[Permission denied]
> > >> >[2020-04-11 05:22:48.774768] I [MSGID: 115060]
> > >> >[server-rpc-fops.c:938:_gf_server_log_setxattr_failure]
> > >> >0-workdata-server: 60517534: SETXATTR
> > >> >/images/383/200/38320023/800x800f.jpg
> > >> >(25f487a5-2973-407c-87d2-74979c16c5ab) ==> set-ctime-mdata, client:
> > >>
> > >>

Re: [Gluster-users] gluster 6.8: brick logs flooded by Information messages

2020-04-11 Thread Hu Bert
as i wrote in the opening post: the previous version was 5.11.

Best regards,
Hubert

Am Sa., 11. Apr. 2020 um 13:25 Uhr schrieb Strahil Nikolov
:
>
> On April 11, 2020 2:06:22 PM GMT+03:00, Hu Bert  
> wrote:
> >Hi Strahil,
> >
> >looking into the mount logs i think i see the same (or at least very
> >similar) log messages:
> >
> >[2020-04-11 11:01:21.676613] I [MSGID: 108031]
> >[afr-common.c:2548:afr_local_discovery_cbk] 0-workdata-replicate-0:
> >selecting local read_child workdata-client-0
> >[2020-04-11 11:01:21.791039] W [MSGID: 114031]
> >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-0:
> >remote operation failed [Permission denied]
> >[2020-04-11 11:01:21.791099] W [MSGID: 114031]
> >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-1:
> >remote operation failed [Permission denied]
> >[2020-04-11 11:01:21.791172] W [MSGID: 114031]
> >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-2:
> >remote operation failed [Permission denied]
> >[2020-04-11 11:01:21.791598] E [MSGID: 148002]
> >[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-workdata-utime: dict
> >set of key for set-ctime-mdata failed [Permission denied]
> >The message "I [MSGID: 108031]
> >[afr-common.c:2548:afr_local_discovery_cbk] 0-workdata-replicate-0:
> >selecting local read_child workdata-client-0" repeated 947 times
> >between [2020-04-11 11:01:21.676613] and [2020-04-11 11:03:21.405740]
> >The message "W [MSGID: 114031]
> >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-0:
> >remote operation failed [Permission denied]" repeated 3947 times
> >between [2020-04-11 11:01:21.791039] and [2020-04-11 11:03:21.443929]
> >The message "W [MSGID: 114031]
> >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-2:
> >remote operation failed [Permission denied]" repeated 3947 times
> >between [2020-04-11 11:01:21.791172] and [2020-04-11 11:03:21.443958]
> >The message "W [MSGID: 114031]
> >[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-1:
> >remote operation failed [Permission denied]" repeated 3947 times
> >between [2020-04-11 11:01:21.791099] and [2020-04-11 11:03:21.444008]
> >The message "E [MSGID: 148002]
> >[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-workdata-utime: dict
> >set of key for set-ctime-mdata failed [Permission denied]" repeated
> >3947 times between [2020-04-11 11:01:21.791598] and [2020-04-11
> >11:03:21.444357]
> >
> >Best regards,
> >Hubert
> >
> >Am Sa., 11. Apr. 2020 um 11:11 Uhr schrieb Strahil Nikolov
> >:
> >>
> >> On April 11, 2020 8:35:41 AM GMT+03:00, Hu Bert
> > wrote:
> >> >Hi,
> >> >
> >> >this week i upgraded from 5.11 to 6.8. Since that the brick logs get
> >> >flooded by such messages:
> >> >
> >> >[2020-04-11 05:22:48.774688] I [MSGID: 139001]
> >> >[posix-acl.c:263:posix_acl_log_permit_denied]
> >> >0-workdata-access-control: client:
> >>
> >>CTX_ID:a1b55e2b-af03-484a-9be0-9314fec4bb61-GRAPH_ID:0-PID:8204-HOST:gluster1-PC_NAME:workdata-client-0-RECON_NO:-0,
> >> >gfid: 25f487a5-2973-407c-87d2-74979c16c5ab,
> >> >req(uid:33,gid:33,perm:2,ngrps:1),
> >> >ctx(uid:109,gid:114,in-groups:0,perm:644,updated-fop:LOOKUP, acl:-)
> >> >[Permission denied]
> >> >[2020-04-11 05:22:48.774768] I [MSGID: 115060]
> >> >[server-rpc-fops.c:938:_gf_server_log_setxattr_failure]
> >> >0-workdata-server: 60517534: SETXATTR
> >> >/images/383/200/38320023/800x800f.jpg
> >> >(25f487a5-2973-407c-87d2-74979c16c5ab) ==> set-ctime-mdata, client:
> >>
> >>CTX_ID:a1b55e2b-af03-484a-9be0-9314fec4bb61-GRAPH_ID:0-PID:8204-HOST:gluster1-PC_NAME:workdata-client-0-RECON_NO:-0,
> >> >error-xlator: workdata-access-control
> >> >
> >> >The file in the filesystem:
> >> >
> >> >-rw-r--r--  1 tomcat8 tomcat8 283K Jun 13  2019 800x800f.jpg
> >> >
> >> >So this file exists for 9 months now. The log messages appear for
> >> >files and directories. And there are over 9.5 million log entries in
> >> >the last 10 hours.
> >> >
> >> >Is this behaviour normal for 6.8? Will it disappear? Anyone an idea
> >> >what is happening here?
> >> >
> >> >
> >> >Regards,
> >> >Hubert
> >> >
> >> >
> >> >
> >> &

Re: [Gluster-users] gluster v6.8: systemd units disabled after install

2020-04-11 Thread Hu Bert
Hi Alex & Strahil,

I (roughly) documented the installation process:

- install gluster packages
- start services (glusterd, glustereventsd) manually
- create volume & mount

After that i did a reboot of server1 to see if everything's fine
afterwards - no, services weren't started. Same on server2. On server3
i enabled both services -> after the reboot everything was fine.

That was the status on server1:

systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
vendor preset: enabled)
   Active: inactive (dead)
 Docs: man:glusterd(8)

preset: enabled; but status disabled - i don't know how that may have
happened, as i surely didn't disable them. Strange.
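So the fix on server1 and server2 should simply be the same enable as
on server3, e.g. as a one-liner (if i remember the systemctl option
correctly, --now also starts them right away):

systemctl enable --now glusterd.service glustereventsd.service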


Best Regards,
Hubert

Am Sa., 11. Apr. 2020 um 15:25 Uhr schrieb Strahil Nikolov
:
>
> On April 11, 2020 2:19:54 PM GMT+03:00, Alexander Iliev 
>  wrote:
> >Hi Hubert,
> >
> >I think this would vary from distribution to distribution and it is up
> >to the package maintainers of the particular distribution to decide
> >what
> >the default should be.
> >
> >I am using Gluster 6.6 on CentOS and the Gluster-specific services
> >there
> >were also disabled (although not exactly as in your original post - the
> >
> >vendor preset was also disabled for me, while it is enabled for you).
> >
> >This is only a speculation for this particular case, but I think the
> >idea in general is to have the system administrator explicitly enable
> >the services he wants running on reboot.
> >
> >I would argue that this is the safer approach as opposed to enabling a
> >service automatically after its installation. An example scenario would
> >
> >be - you install a service, the system is rebooted, e.g. due to a power
> >
> >outage, mistyped command, etc., the service is started automatically
> >even though it hasn't been properly configured yet.
> >
> >I guess, to really know the reasoning, the respective package
> >maintainers would need to jump in and share their idea behind this
> >decision.
> >
> >Best regards,
> >--
> >alexander iliev
> >
> >On 4/11/20 7:40 AM, Hu Bert wrote:
> >> Hi,
> >>
> >> so no one has seen the problem of disabled systemd units before?
> >>
> >> Regards,
> >> Hubert
> >>
> >> Am Mo., 6. Apr. 2020 um 12:30 Uhr schrieb Hu Bert
> >:
> >>>
> >>> Hello,
> >>>
> >>> after a server reboot (with a fresh gluster 6.8 install) i noticed
> >>> that the gluster services weren't running.
> >>>
> >>> systemctl status glusterd.service
> >>> ● glusterd.service - GlusterFS, a clustered file-system server
> >>> Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
> >>> vendor preset: enabled)
> >>> Active: inactive (dead)
> >>>   Docs: man:glusterd(8)
> >>>
> >>> Apr 06 11:34:18 glfsserver1 systemd[1]:
> >>> /lib/systemd/system/glusterd.service:9: PIDFile= references path
> >below
> >>> legacy directory /var/run/, updating /var/run/glusterd.pid →
> >>> /run/glusterd.pid; please update the unit file accordingly.
> >>>
> >>> systemctl status glustereventsd.service
> >>> ● glustereventsd.service - Gluster Events Notifier
> >>> Loaded: loaded (/lib/systemd/system/glustereventsd.service;
> >>> disabled; vendor preset: enabled)
> >>> Active: inactive (dead)
> >>>   Docs: man:glustereventsd(8)
> >>>
> >>> Apr 06 11:34:27 glfsserver1 systemd[1]:
> >>> /lib/systemd/system/glustereventsd.service:11: PIDFile= references
> >>> path below legacy directory /var/run/, updating
> >>> /var/run/glustereventsd.pid → /run/glustereventsd.pid; please update
> >>> the unit file accordingly.
> >>>
> >>> You have to enable them manually:
> >>>
> >>> systemctl enable glusterd.service
> >>> Created symlink
> >>> /etc/systemd/system/multi-user.target.wants/glusterd.service →
> >>> /lib/systemd/system/glusterd.service.
> >>> systemctl enable glustereventsd.service
> >>> Created symlink
> >>> /etc/systemd/system/multi-user.target.wants/glustereventsd.service →
> >>> /lib/systemd/system/glustereventsd.service.
> >>>
> >>> Is this a bug? If so: already known?
> >>>
> >>>
> >>> Regards,
> >>

Re: [Gluster-users] gluster v6.8: systemd units disabled after install

2020-04-11 Thread Hu Bert
Hi Strahil,

hmm... i still don't think it has anything to do with the mounts
not being ready. See other mail :-)


Best regards,
Hubert

Am Sa., 11. Apr. 2020 um 13:22 Uhr schrieb Strahil Nikolov
:
>
> On April 11, 2020 1:41:55 PM GMT+03:00, Hu Bert  
> wrote:
> >Hi Strahil,
> >
> >i use entries in /etc/fstab . But the disabled systemd units are the
> >one of gluster itself. Well, no big problem once you notice what's
> >wrong, but it's strange since it's a fresh install.
> >
> >
> >Best Regards,
> >Hubert
> >
> >Am Sa., 11. Apr. 2020 um 11:12 Uhr schrieb Strahil Nikolov
> >:
> >>
> >> On April 11, 2020 8:40:47 AM GMT+03:00, Hu Bert
> > wrote:
> >> >Hi,
> >> >
> >> >so no one has seen the problem of disabled systemd units before?
> >> >
> >> >Regards,
> >> >Hubert
> >> >
> >> >Am Mo., 6. Apr. 2020 um 12:30 Uhr schrieb Hu Bert
> >> >:
> >> >>
> >> >> Hello,
> >> >>
> >> >> after a server reboot (with a fresh gluster 6.8 install) i noticed
> >> >> that the gluster services weren't running.
> >> >>
> >> >> systemctl status glusterd.service
> >> >> ● glusterd.service - GlusterFS, a clustered file-system server
> >> >>Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
> >> >> vendor preset: enabled)
> >> >>Active: inactive (dead)
> >> >>  Docs: man:glusterd(8)
> >> >>
> >> >> Apr 06 11:34:18 glfsserver1 systemd[1]:
> >> >> /lib/systemd/system/glusterd.service:9: PIDFile= references path
> >> >below
> >> >> legacy directory /var/run/, updating /var/run/glusterd.pid →
> >> >> /run/glusterd.pid; please update the unit file accordingly.
> >> >>
> >> >> systemctl status glustereventsd.service
> >> >> ● glustereventsd.service - Gluster Events Notifier
> >> >>Loaded: loaded (/lib/systemd/system/glustereventsd.service;
> >> >> disabled; vendor preset: enabled)
> >> >>Active: inactive (dead)
> >> >>  Docs: man:glustereventsd(8)
> >> >>
> >> >> Apr 06 11:34:27 glfsserver1 systemd[1]:
> >> >> /lib/systemd/system/glustereventsd.service:11: PIDFile= references
> >> >> path below legacy directory /var/run/, updating
> >> >> /var/run/glustereventsd.pid → /run/glustereventsd.pid; please
> >update
> >> >> the unit file accordingly.
> >> >>
> >> >> You have to enable them manually:
> >> >>
> >> >> systemctl enable glusterd.service
> >> >> Created symlink
> >> >> /etc/systemd/system/multi-user.target.wants/glusterd.service →
> >> >> /lib/systemd/system/glusterd.service.
> >> >> systemctl enable glustereventsd.service
> >> >> Created symlink
> >> >> /etc/systemd/system/multi-user.target.wants/glustereventsd.service
> >→
> >> >> /lib/systemd/system/glustereventsd.service.
> >> >>
> >> >> Is this a bug? If so: already known?
> >> >>
> >> >>
> >> >> Regards,
> >> >> Hubert
> >> >
> >> >
> >> >
> >> >
> >> >Community Meeting Calendar:
> >> >
> >> >Schedule -
> >> >Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >> >Bridge: https://bluejeans.com/441850968
> >> >
> >> >Gluster-users mailing list
> >> >Gluster-users@gluster.org
> >> >https://lists.gluster.org/mailman/listinfo/gluster-users
> >>
> >> Are you using fstab or systemd mount units for the bricks ?
> >>
> >> Best Regards,
> >> Strahil Nikolov
>
> Most probably your mount points are not ready on time for the gluster.
>
> Try  with systemd  mount units (with 'Before=glusterd.service' in them and in 
> glusterd.service 'After=.mount'
>
> If I'm wrong it won't start again.
>
> If you need  help with that,  I can lend  some help.
>
> Best Regards,
> Strahil Nikolov




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster 6.8: brick logs flooded by Information messages

2020-04-11 Thread Hu Bert
Hi Strahil,

looking into the mount logs i think i see the same (or at least very
similar) log messages:

[2020-04-11 11:01:21.676613] I [MSGID: 108031]
[afr-common.c:2548:afr_local_discovery_cbk] 0-workdata-replicate-0:
selecting local read_child workdata-client-0
[2020-04-11 11:01:21.791039] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-0:
remote operation failed [Permission denied]
[2020-04-11 11:01:21.791099] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-1:
remote operation failed [Permission denied]
[2020-04-11 11:01:21.791172] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-2:
remote operation failed [Permission denied]
[2020-04-11 11:01:21.791598] E [MSGID: 148002]
[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-workdata-utime: dict
set of key for set-ctime-mdata failed [Permission denied]
The message "I [MSGID: 108031]
[afr-common.c:2548:afr_local_discovery_cbk] 0-workdata-replicate-0:
selecting local read_child workdata-client-0" repeated 947 times
between [2020-04-11 11:01:21.676613] and [2020-04-11 11:03:21.405740]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-0:
remote operation failed [Permission denied]" repeated 3947 times
between [2020-04-11 11:01:21.791039] and [2020-04-11 11:03:21.443929]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-2:
remote operation failed [Permission denied]" repeated 3947 times
between [2020-04-11 11:01:21.791172] and [2020-04-11 11:03:21.443958]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk] 0-workdata-client-1:
remote operation failed [Permission denied]" repeated 3947 times
between [2020-04-11 11:01:21.791099] and [2020-04-11 11:03:21.444008]
The message "E [MSGID: 148002]
[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-workdata-utime: dict
set of key for set-ctime-mdata failed [Permission denied]" repeated
3947 times between [2020-04-11 11:01:21.791598] and [2020-04-11
11:03:21.444357]

Best regards,
Hubert

Am Sa., 11. Apr. 2020 um 11:11 Uhr schrieb Strahil Nikolov
:
>
> On April 11, 2020 8:35:41 AM GMT+03:00, Hu Bert  
> wrote:
> >Hi,
> >
> >this week i upgraded from 5.11 to 6.8. Since that the brick logs get
> >flooded by such messages:
> >
> >[2020-04-11 05:22:48.774688] I [MSGID: 139001]
> >[posix-acl.c:263:posix_acl_log_permit_denied]
> >0-workdata-access-control: client:
> >CTX_ID:a1b55e2b-af03-484a-9be0-9314fec4bb61-GRAPH_ID:0-PID:8204-HOST:gluster1-PC_NAME:workdata-client-0-RECON_NO:-0,
> >gfid: 25f487a5-2973-407c-87d2-74979c16c5ab,
> >req(uid:33,gid:33,perm:2,ngrps:1),
> >ctx(uid:109,gid:114,in-groups:0,perm:644,updated-fop:LOOKUP, acl:-)
> >[Permission denied]
> >[2020-04-11 05:22:48.774768] I [MSGID: 115060]
> >[server-rpc-fops.c:938:_gf_server_log_setxattr_failure]
> >0-workdata-server: 60517534: SETXATTR
> >/images/383/200/38320023/800x800f.jpg
> >(25f487a5-2973-407c-87d2-74979c16c5ab) ==> set-ctime-mdata, client:
> >CTX_ID:a1b55e2b-af03-484a-9be0-9314fec4bb61-GRAPH_ID:0-PID:8204-HOST:gluster1-PC_NAME:workdata-client-0-RECON_NO:-0,
> >error-xlator: workdata-access-control
> >
> >The file in the filesystem:
> >
> >-rw-r--r--  1 tomcat8 tomcat8 283K Jun 13  2019 800x800f.jpg
> >
> >So this file exists for 9 months now. The log messages appear for
> >files and directories. And there are over 9.5 million log entries in
> >the last 10 hours.
> >
> >Is this behaviour normal for 6.8? Will it disappear? Anyone an idea
> >what is happening here?
> >
> >
> >Regards,
> >Hubert
> >
> >
> >
> >
> >Community Meeting Calendar:
> >
> >Schedule -
> >Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >Bridge: https://bluejeans.com/441850968
> >
> >Gluster-users mailing list
> >Gluster-users@gluster.org
> >https://lists.gluster.org/mailman/listinfo/gluster-users
>
> Hi Hubert,
> Not  normal at  all.
> It looks  like  my ACL problem.
> The  workaround for me is to copy the file/dir and replace it:
> cp -a  orig new
> mv orig  old && mv new orig
>
> @Amar,  is  the 'https://review.gluster.org/#/c/glusterfs/+/24264/'  in 6.8  ?
> The output seems just like mine.
>
> Best Regards,
> Strahil Nikolov






Re: [Gluster-users] gluster v6.8: systemd units disabled after install

2020-04-11 Thread Hu Bert
Hi Strahil,

i use entries in /etc/fstab. But the disabled systemd units are the
ones of gluster itself. Well, no big problem once you notice what's
wrong, but it's strange since it's a fresh install.


Best Regards,
Hubert

Am Sa., 11. Apr. 2020 um 11:12 Uhr schrieb Strahil Nikolov
:
>
> On April 11, 2020 8:40:47 AM GMT+03:00, Hu Bert  
> wrote:
> >Hi,
> >
> >so no one has seen the problem of disabled systemd units before?
> >
> >Regards,
> >Hubert
> >
> >Am Mo., 6. Apr. 2020 um 12:30 Uhr schrieb Hu Bert
> >:
> >>
> >> Hello,
> >>
> >> after a server reboot (with a fresh gluster 6.8 install) i noticed
> >> that the gluster services weren't running.
> >>
> >> systemctl status glusterd.service
> >> ● glusterd.service - GlusterFS, a clustered file-system server
> >>Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
> >> vendor preset: enabled)
> >>Active: inactive (dead)
> >>  Docs: man:glusterd(8)
> >>
> >> Apr 06 11:34:18 glfsserver1 systemd[1]:
> >> /lib/systemd/system/glusterd.service:9: PIDFile= references path
> >below
> >> legacy directory /var/run/, updating /var/run/glusterd.pid →
> >> /run/glusterd.pid; please update the unit file accordingly.
> >>
> >> systemctl status glustereventsd.service
> >> ● glustereventsd.service - Gluster Events Notifier
> >>Loaded: loaded (/lib/systemd/system/glustereventsd.service;
> >> disabled; vendor preset: enabled)
> >>Active: inactive (dead)
> >>  Docs: man:glustereventsd(8)
> >>
> >> Apr 06 11:34:27 glfsserver1 systemd[1]:
> >> /lib/systemd/system/glustereventsd.service:11: PIDFile= references
> >> path below legacy directory /var/run/, updating
> >> /var/run/glustereventsd.pid → /run/glustereventsd.pid; please update
> >> the unit file accordingly.
> >>
> >> You have to enable them manually:
> >>
> >> systemctl enable glusterd.service
> >> Created symlink
> >> /etc/systemd/system/multi-user.target.wants/glusterd.service →
> >> /lib/systemd/system/glusterd.service.
> >> systemctl enable glustereventsd.service
> >> Created symlink
> >> /etc/systemd/system/multi-user.target.wants/glustereventsd.service →
> >> /lib/systemd/system/glustereventsd.service.
> >>
> >> Is this a bug? If so: already known?
> >>
> >>
> >> Regards,
> >> Hubert
> >
> >
> >
> >
> >Community Meeting Calendar:
> >
> >Schedule -
> >Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >Bridge: https://bluejeans.com/441850968
> >
> >Gluster-users mailing list
> >Gluster-users@gluster.org
> >https://lists.gluster.org/mailman/listinfo/gluster-users
>
> Are you using fstab or systemd mount units for the bricks ?
>
> Best Regards,
> Strahil Nikolov






Re: [Gluster-users] gluster v6.8: systemd units disabled after install

2020-04-10 Thread Hu Bert
Hi,

so no one has seen the problem of disabled systemd units before?

Regards,
Hubert

Am Mo., 6. Apr. 2020 um 12:30 Uhr schrieb Hu Bert :
>
> Hello,
>
> after a server reboot (with a fresh gluster 6.8 install) i noticed
> that the gluster services weren't running.
>
> systemctl status glusterd.service
> ● glusterd.service - GlusterFS, a clustered file-system server
>Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
> vendor preset: enabled)
>Active: inactive (dead)
>  Docs: man:glusterd(8)
>
> Apr 06 11:34:18 glfsserver1 systemd[1]:
> /lib/systemd/system/glusterd.service:9: PIDFile= references path below
> legacy directory /var/run/, updating /var/run/glusterd.pid →
> /run/glusterd.pid; please update the unit file accordingly.
>
> systemctl status glustereventsd.service
> ● glustereventsd.service - Gluster Events Notifier
>Loaded: loaded (/lib/systemd/system/glustereventsd.service;
> disabled; vendor preset: enabled)
>Active: inactive (dead)
>  Docs: man:glustereventsd(8)
>
> Apr 06 11:34:27 glfsserver1 systemd[1]:
> /lib/systemd/system/glustereventsd.service:11: PIDFile= references
> path below legacy directory /var/run/, updating
> /var/run/glustereventsd.pid → /run/glustereventsd.pid; please update
> the unit file accordingly.
>
> You have to enable them manually:
>
> systemctl enable glusterd.service
> Created symlink
> /etc/systemd/system/multi-user.target.wants/glusterd.service →
> /lib/systemd/system/glusterd.service.
> systemctl enable glustereventsd.service
> Created symlink
> /etc/systemd/system/multi-user.target.wants/glustereventsd.service →
> /lib/systemd/system/glustereventsd.service.
>
> Is this a bug? If so: already known?
>
>
> Regards,
> Hubert






Re: [Gluster-users] One error/warning message after upgrade 5.11 -> 6.8

2020-04-10 Thread Hu Bert
Hi,

no one has seen such messages?

Regards,
Hubert

Am Mo., 6. Apr. 2020 um 06:13 Uhr schrieb Hu Bert :
>
> Hello,
>
> i just upgraded my servers and clients from 5.11 to 6.8; besides one
> connection problem to the gluster download server everything went
> fine.
>
> On the 3 gluster servers i mount the 2 volumes as well, and only there
> (and not on all the other clients) there are some messages in the log
> file of both mount logs:
>
> [2020-04-06 04:10:53.552561] W [MSGID: 114031]
> [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> 0-persistent-client-2: remote operation failed [Permission denied]
> [2020-04-06 04:10:53.552635] W [MSGID: 114031]
> [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> 0-persistent-client-1: remote operation failed [Permission denied]
> [2020-04-06 04:10:53.552639] W [MSGID: 114031]
> [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> 0-persistent-client-0: remote operation failed [Permission denied]
> [2020-04-06 04:10:53.553226] E [MSGID: 148002]
> [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict
> set of key for set-ctime-mdata failed [Permission denied]
> The message "W [MSGID: 114031]
> [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> 0-persistent-client-2: remote operation failed [Permission denied]"
> repeated 4 times between [2020-04-06 04:10:53.552561] and [2020-04-06
> 04:10:53.745542]
> The message "W [MSGID: 114031]
> [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> 0-persistent-client-1: remote operation failed [Permission denied]"
> repeated 4 times between [2020-04-06 04:10:53.552635] and [2020-04-06
> 04:10:53.745610]
> The message "W [MSGID: 114031]
> [client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
> 0-persistent-client-0: remote operation failed [Permission denied]"
> repeated 4 times between [2020-04-06 04:10:53.552639] and [2020-04-06
> 04:10:53.745632]
> The message "E [MSGID: 148002]
> [utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict
> set of key for set-ctime-mdata failed [Permission denied]" repeated 4
> times between [2020-04-06 04:10:53.553226] and [2020-04-06
> 04:10:53.746080]
>
> Anything to worry about?
>
>
> Regards,
> Hubert






[Gluster-users] gluster 6.8: brick logs flooded by Information messages

2020-04-10 Thread Hu Bert
Hi,

this week i upgraded from 5.11 to 6.8. Since then, the brick logs get
flooded by such messages:

[2020-04-11 05:22:48.774688] I [MSGID: 139001]
[posix-acl.c:263:posix_acl_log_permit_denied]
0-workdata-access-control: client:
CTX_ID:a1b55e2b-af03-484a-9be0-9314fec4bb61-GRAPH_ID:0-PID:8204-HOST:gluster1-PC_NAME:workdata-client-0-RECON_NO:-0,
gfid: 25f487a5-2973-407c-87d2-74979c16c5ab,
req(uid:33,gid:33,perm:2,ngrps:1),
ctx(uid:109,gid:114,in-groups:0,perm:644,updated-fop:LOOKUP, acl:-)
[Permission denied]
[2020-04-11 05:22:48.774768] I [MSGID: 115060]
[server-rpc-fops.c:938:_gf_server_log_setxattr_failure]
0-workdata-server: 60517534: SETXATTR
/images/383/200/38320023/800x800f.jpg
(25f487a5-2973-407c-87d2-74979c16c5ab) ==> set-ctime-mdata, client:
CTX_ID:a1b55e2b-af03-484a-9be0-9314fec4bb61-GRAPH_ID:0-PID:8204-HOST:gluster1-PC_NAME:workdata-client-0-RECON_NO:-0,
error-xlator: workdata-access-control

The file in the filesystem:

-rw-r--r--  1 tomcat8 tomcat8 283K Jun 13  2019 800x800f.jpg

So this file exists for 9 months now. The log messages appear for
files and directories. And there are over 9.5 million log entries in
the last 10 hours.

Is this behaviour normal for 6.8? Will it disappear? Does anyone have
an idea what is happening here?


Regards,
Hubert
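
Until the root cause is found, the flood itself can at least be damped
by raising the brick log level, roughly like this (a sketch, assuming
the volume name workdata from the messages above; informational "I"
entries are then no longer written to the brick logs):

gluster volume set workdata diagnostics.brick-log-level WARNING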






[Gluster-users] gluster v6.8: systemd units disabled after install

2020-04-06 Thread Hu Bert
Hello,

after a server reboot (with a fresh gluster 6.8 install) i noticed
that the gluster services weren't running.

systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/lib/systemd/system/glusterd.service; disabled;
vendor preset: enabled)
   Active: inactive (dead)
 Docs: man:glusterd(8)

Apr 06 11:34:18 glfsserver1 systemd[1]:
/lib/systemd/system/glusterd.service:9: PIDFile= references path below
legacy directory /var/run/, updating /var/run/glusterd.pid →
/run/glusterd.pid; please update the unit file accordingly.

systemctl status glustereventsd.service
● glustereventsd.service - Gluster Events Notifier
   Loaded: loaded (/lib/systemd/system/glustereventsd.service;
disabled; vendor preset: enabled)
   Active: inactive (dead)
 Docs: man:glustereventsd(8)

Apr 06 11:34:27 glfsserver1 systemd[1]:
/lib/systemd/system/glustereventsd.service:11: PIDFile= references
path below legacy directory /var/run/, updating
/var/run/glustereventsd.pid → /run/glustereventsd.pid; please update
the unit file accordingly.

You have to enable them manually:

systemctl enable glusterd.service
Created symlink
/etc/systemd/system/multi-user.target.wants/glusterd.service →
/lib/systemd/system/glusterd.service.
systemctl enable glustereventsd.service
Created symlink
/etc/systemd/system/multi-user.target.wants/glustereventsd.service →
/lib/systemd/system/glustereventsd.service.

Is this a bug? If so: already known?


Regards,
Hubert
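
As a side note: the PIDFile warnings shown above are unrelated to the
disabled state and can be silenced with a systemd drop-in, roughly like
this (a sketch, assuming the unit names above):

systemctl edit glusterd.service

and put into the override file:

[Service]
PIDFile=/run/glusterd.pid

followed by a systemctl daemon-reload. The same works for
glustereventsd.service with /run/glustereventsd.pid.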






[Gluster-users] One error/warning message after upgrade 5.11 -> 6.8

2020-04-05 Thread Hu Bert
Hello,

i just upgraded my servers and clients from 5.11 to 6.8; besides one
connection problem to the gluster download server everything went
fine.

On the 3 gluster servers i mount the 2 volumes as well, and only there
(and not on all the other clients) some messages show up in both
mount logs:

[2020-04-06 04:10:53.552561] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-2: remote operation failed [Permission denied]
[2020-04-06 04:10:53.552635] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-1: remote operation failed [Permission denied]
[2020-04-06 04:10:53.552639] W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-0: remote operation failed [Permission denied]
[2020-04-06 04:10:53.553226] E [MSGID: 148002]
[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict
set of key for set-ctime-mdata failed [Permission denied]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-2: remote operation failed [Permission denied]"
repeated 4 times between [2020-04-06 04:10:53.552561] and [2020-04-06
04:10:53.745542]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-1: remote operation failed [Permission denied]"
repeated 4 times between [2020-04-06 04:10:53.552635] and [2020-04-06
04:10:53.745610]
The message "W [MSGID: 114031]
[client-rpc-fops_v2.c:851:client4_0_setxattr_cbk]
0-persistent-client-0: remote operation failed [Permission denied]"
repeated 4 times between [2020-04-06 04:10:53.552639] and [2020-04-06
04:10:53.745632]
The message "E [MSGID: 148002]
[utime.c:146:gf_utime_set_mdata_setxattr_cbk] 0-persistent-utime: dict
set of key for set-ctime-mdata failed [Permission denied]" repeated 4
times between [2020-04-06 04:10:53.553226] and [2020-04-06
04:10:53.746080]

Anything to worry about?


Regards,
Hubert
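
If the messages turn out to be harmless but keep flooding, one possible
test - assuming these setxattr calls come from the ctime/utime feature,
and assuming the option name features.ctime exists in this release -
would be to switch that feature off on a test volume:

gluster volume set persistent features.ctime off

This loses the consistent ctime handling the feature provides, so it is
only meant as a way to narrow the problem down.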






Re: [Gluster-users] Repository down ?

2020-04-05 Thread Hu Bert
Good morning,

upgraded from 5.11 to 6.8 today; 2 servers worked smoothly, one again
had connection problems:

Err:1 
https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt
buster/main amd64 libglusterfs-dev amd64 6.8-1
  Could not connect to download.gluster.org:443 (8.43.85.185),
connection timed out
Err:2 
https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt
buster/main amd64 libgfxdr0 amd64 6.8-1
  Unable to connect to download.gluster.org:https:

As a workaround i downloaded the packages manually on one of the other
2 servers, copied them to server3 and installed them manually.
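A sketch of what that manual step can look like (the package list and
hostname are only examples):

# on a server that can reach download.gluster.org
apt-get download glusterfs-server glusterfs-client glusterfs-common \
  libgfapi0 libgfchangelog0 libgfrpc0 libgfxdr0 libglusterfs0
scp ./*.deb server3:/tmp/
# on server3
dpkg -i /tmp/*.deb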

Any idea why this happens? /etc/hosts, /etc/resolv.conf are identical.
The servers are behind the same gateway (switch in datacenter of
provider), the server IPs differ only in the last number.


Best regards,
Hubert

Am Fr., 3. Apr. 2020 um 10:33 Uhr schrieb Hu Bert :
>
> ok, half an hour later it worked. Not funny during an upgrade. Strange... :-)
>
>
> Regards,
> Hubert
>
> Am Fr., 3. Apr. 2020 um 10:19 Uhr schrieb Hu Bert :
> >
> > Hi,
> >
> > i'm currently preparing an upgrade 5.x -> 6.8; the download of the
> > repository key works on 2 of 3 servers. nameserver settings are
> > identical. On the 3rd server i get this:
> >
> > wget -O - https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> > | apt-key add -
> > --2020-04-03 10:15:43--
> > https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> > Resolving download.gluster.org (download.gluster.org)... 8.43.85.185
> > Connecting to download.gluster.org
> > (download.gluster.org)|8.43.85.185|:443... failed: Connection timed
> > out.
> > Retrying.
> >
> > and this goes on and on... Which errors do you see?
> >
> >
> > Regards,
> > Hubert
> >
> > Am Mo., 30. März 2020 um 20:40 Uhr schrieb Renaud Fortier
> > :
> > >
> > > Hi,
> > >
> > > I’m trying to download packages from the gluster repository 
> > > https://download.gluster.org/ but it failed for every download I’ve tried.
> > >
> > >
> > >
> > > Is it happening only to me ?
> > >
> > > Thank you
> > >
> > >
> > >
> > > Renaud Fortier
> > >
> > >
> > >
> > > 
> > >
> > >
> > >
> > > Community Meeting Calendar:
> > >
> > > Schedule -
> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > Bridge: https://bluejeans.com/441850968
> > >
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-users






Re: [Gluster-users] Repository down ?

2020-04-03 Thread Hu Bert
ok, half an hour later it worked. Not funny during an upgrade. Strange... :-)


Regards,
Hubert

Am Fr., 3. Apr. 2020 um 10:19 Uhr schrieb Hu Bert :
>
> Hi,
>
> i'm currently preparing an upgrade 5.x -> 6.8; the download of the
> repository key works on 2 of 3 servers. nameserver settings are
> identical. On the 3rd server i get this:
>
> wget -O - https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> | apt-key add -
> --2020-04-03 10:15:43--
> https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
> Resolving download.gluster.org (download.gluster.org)... 8.43.85.185
> Connecting to download.gluster.org
> (download.gluster.org)|8.43.85.185|:443... failed: Connection timed
> out.
> Retrying.
>
> and this goes on and on... Which errors do you see?
>
>
> Regards,
> Hubert
>
> Am Mo., 30. März 2020 um 20:40 Uhr schrieb Renaud Fortier
> :
> >
> > Hi,
> >
> > I’m trying to download packages from the gluster repository 
> > https://download.gluster.org/ but it failed for every download I’ve tried.
> >
> >
> >
> > Is it happening only to me ?
> >
> > Thank you
> >
> >
> >
> > Renaud Fortier
> >
> >
> >
> > 
> >
> >
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://bluejeans.com/441850968
> >
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users






Re: [Gluster-users] Repository down ?

2020-04-03 Thread Hu Bert
Hi,

i'm currently preparing an upgrade 5.x -> 6.8; the download of the
repository key works on 2 of 3 servers. nameserver settings are
identical. On the 3rd server i get this:

wget -O - https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
| apt-key add -
--2020-04-03 10:15:43--
https://download.gluster.org/pub/gluster/glusterfs/6/rsa.pub
Resolving download.gluster.org (download.gluster.org)... 8.43.85.185
Connecting to download.gluster.org
(download.gluster.org)|8.43.85.185|:443... failed: Connection timed
out.
Retrying.

and this goes on and on... Which errors do you see?


Regards,
Hubert

Am Mo., 30. März 2020 um 20:40 Uhr schrieb Renaud Fortier
:
>
> Hi,
>
> I’m trying to download packages from the gluster repository 
> https://download.gluster.org/ but it failed for every download I’ve tried.
>
>
>
> Is it happening only to me ?
>
> Thank you
>
>
>
> Renaud Fortier
>
>
>
> 
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users






[Gluster-users] Gluster 6.8: some error messages during op-version-update

2020-04-01 Thread Hu Bert
Hi,

i just upgraded a test cluster from version 5.12 to 6.8; that went
fine, but iirc after setting the new op-version i saw some error
messages:

3 servers: becquerel, dirac, tesla
2 volumes:
workload, mounted on /shared/public
persistent, mounted on /shared/private
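
For reference, such an op-version bump is usually done along these
lines (a sketch; check cluster.max-op-version first, the exact number
depends on the installed version):

gluster volume get all cluster.max-op-version
gluster volume set all cluster.op-version 60000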

server becquerel, volume persistent:

[2020-04-01 08:36:29.029953] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2020-04-01 08:36:29.317342] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2020-04-01 08:36:29.341508] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2020-04-01 08:36:29.402862] E [MSGID: 101002]
[graph.y:134:new_volume] 0-parser: new volume
(persistent-write-behind) definition in line 308 unexpected
[2020-04-01 08:36:29.402924] E [MSGID: 101098]
[xlator.c:938:xlator_tree_free_members] 0-parser: Translator tree not
found
[2020-04-01 08:36:29.402945] E [MSGID: 101098]
[xlator.c:959:xlator_tree_free_memacct] 0-parser: Translator tree not
found
[2020-04-01 08:36:29.407428] E [MSGID: 101019]
[graph.y:352:graphyyerror] 0-parser: line 309: duplicate 'type'
defined for volume 'xlator_tree_free_memacct'
[2020-04-01 08:36:29.410943] E [MSGID: 101021]
[graph.y:363:graphyyerror] 0-parser: syntax error: line 309 (volume
'xlator_tree_free_memacct'): "performance/write-behind"
allowed tokens are 'volume', 'type', 'subvolumes', 'option', 'end-volume'()

server becquerel, volume workload:

[2020-04-01 08:36:29.029953] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2020-04-01 08:36:29.317385] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2020-04-01 08:36:29.341511] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2020-04-01 08:36:29.400282] E [MSGID: 101002]
[graph.y:134:new_volume] 0-parser: new volume (workdata-write-behind)
definition in line 308 unexpected
[2020-04-01 08:36:29.400338] E [MSGID: 101098]
[xlator.c:938:xlator_tree_free_members] 0-parser: Translator tree not
found
[2020-04-01 08:36:29.400354] E [MSGID: 101098]
[xlator.c:959:xlator_tree_free_memacct] 0-parser: Translator tree not
found
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2020-04-01 08:36:29
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.12
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x25c3f)[0x7facd212cc3f]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x323)[0x7facd2137163]
/lib/x86_64-linux-gnu/libc.so.6(+0x37840)[0x7facd1846840]
/lib/x86_64-linux-gnu/libc.so.6(+0x15c1a7)[0x7facd196b1a7]
/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0x1fff)[0x7facd18609ef]
/lib/x86_64-linux-gnu/libc.so.6(__vasprintf_chk+0xc8)[0x7facd19190f8]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg+0x1b0)[0x7facd212dd40]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0xa1970)[0x7facd21a8970]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0xa1d86)[0x7facd21a8d86]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(glusterfs_graph_construct+0x344)[0x7facd21a9a24]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(glusterfs_volfile_reconfigure+0x30)[0x7facd2165cc0]
/usr/sbin/glusterfs(mgmt_getspec_cbk+0x2e1)[0x55a5e0a6de71]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xec60)[0x7facd20f7c60]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xefbf)[0x7facd20f7fbf]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7facd20f44e3]
/usr/lib/x86_64-linux-gnu/glusterfs/5.12/rpc-transport/socket.so(+0xbdb0)[0x7faccde83db0]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x83e7f)[0x7facd218ae7f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7facd1cc0fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7facd19084cf]
-

server tesla: nothing related
server dirac, log for mount volume persistent on /shared/private

[2020-04-01 08:36:29.029845] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2020-04-01 08:36:29.317253] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2020-04-01 08:36:29.341371] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2020-04-01 08:36:29.397448] E [MSGID: 101002]
[graph.y:134:new_volume] 0-parser: new volume
(persistent-write-behind) definition in line 2554 unexpected
[2020-04-01 08:36:29.397546] E [MSGID: 101098]
[xlator.c:938:xlator_tree_free_members] 0-parser: Translator tree not
found
[2020-04-01 08:36:29.397567] E [MSGID: 101098]
[xlator.c:959:xlator_tree_free_memacct] 0-parser: Translator tree not
found
[2020-04-01 08:36:29.403301] E [MSGID: 101021]
[graph.y:377:graphyyerror] 0-parser: syntax error in line 2555: "type"
(allowed tokens are 'volume', 'type', 'subvolumes', 'option', 'end-volume')

[2020-04-01 08:36:29.407495] E [MSGID: 101021]
[graph.y:377:graphyyerror] 0-parser: syntax error in line 2555:

Re: [Gluster-users] Gluster 6.8 & debian

2020-04-01 Thread Hu Bert
Hi Sheetal,

thx for updating. I just upgraded my test gluster from 5.12 to 6.8 and
everything went fine. The services restart wasn't consistent: on one
server I had to restart the services, but that's OK I think.


Best regards
Hubert

Am Di., 31. März 2020 um 12:23 Uhr schrieb Sheetal Pamecha
:
>
> Hi,
> The packages are rebuilt with the missing dependencies and updated.
>
> Regards,
> Sheetal Pamecha
>
>
> On Mon, Mar 30, 2020 at 6:53 PM Sheetal Pamecha  wrote:
>>
>> Hi Hubert,
>>
>> This time we triggered the automation scripts for package building instead 
>> of doing it manually. It seems this is a bug in the script that all lib 
>> packages are excluded.
>> Thanks for trying and pointing it out. We are working to resolve this. I 
>> will update the package once build is complete.
>>
>> Regards,
>> Sheetal Pamecha
>>
>>
>> On Mon, Mar 30, 2020 at 5:28 PM Hu Bert  wrote:
>>>
>>> Hi Sheetal,
>>>
>>> thx so far, but some additional packages are missing: libgfapi0,
>>> libgfchangelog0, libgfrpc0, libgfxdr0, libglusterfs0
>>>
>>> The following packages have unmet dependencies:
>>>  glusterfs-common : Depends: libgfapi0 (>= 6.8) but it is not going to
>>> be installed
>>> Depends: libgfchangelog0 (>= 6.8) but it is not
>>> going to be installed
>>> Depends: libgfrpc0 (>= 6.8) but it is not going to
>>> be installed
>>> Depends: libgfxdr0 (>= 6.8) but it is not going to
>>> be installed
>>> Depends: libglusterfs0 (>= 6.8) but it is not
>>> going to be installed
>>>  glusterfs-server : Depends: libgfapi0 (>= 6.8) but it is not going to
>>> be installed
>>> Depends: libgfrpc0 (>= 6.8) but it is not going to
>>> be installed
>>> Depends: libgfxdr0 (>= 6.8) but it is not going to
>>> be installed
>>> Depends: libglusterfs0 (>= 6.8) but it is not
>>> going to be installed
>>>
>>> All the lib* packages are simply missing for version 6.8, but are
>>> there for version 6.7.
>>>
>>> https://download.gluster.org/pub/gluster/glusterfs/6/6.7/Debian/buster/amd64/apt/pool/main/g/glusterfs/
>>> vs.
>>> https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/pool/main/g/glusterfs/
>>>
>>> Can you please check?
>>>
>>>
>>> Thx,
>>> Hubert
>>>
>>> Am Mo., 30. März 2020 um 12:57 Uhr schrieb Sheetal Pamecha
>>> :
>>> >
>>> > Hi,
>>> >
>>> > I have updated the path now and now latest points to 6.8 and packages in 
>>> > place.
>>> > Regards,
>>> > Sheetal Pamecha
>>> >
>>> >
>>> > On Mon, Mar 30, 2020 at 2:23 PM Hu Bert  wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> now the packages appeared:
>>> >>
>>> >> https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/pool/main/g/glusterfs/
>>> >>
>>> >> Dated: 2020-03-17 - so this looks good, right? Thx to the one who... ;-)
>>> >>
>>> >>
>>> >> Best Regards,
>>> >> Hubert
>>> >>
>>> >> Am Do., 26. März 2020 um 15:03 Uhr schrieb Ingo Fischer 
>>> >> :
>>> >> >
>>> >> > Hey,
>>> >> >
>>> >> > I also asked for "when 6.8 comes to LATEST" in two mails here the last
>>> >> > weeks ... I would also be very interested in the reasons.
>>> >> >
>>> >> > Ingo
>>> >> >
>>> >> > Am 26.03.20 um 07:15 schrieb Hu Bert:
>>> >> > > Hello,
>>> >> > >
>>> >> > > i just wanted to test an upgrade from version 5.12 to version 6.8, 
>>> >> > > but
>>> >> > > there are no packages for debian buster in version 6.8.
>>> >> > >
>>> >> > > https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/
>>> >> > >
>>> >> > > This directory is empty. LATEST still links to version 6.7
>>> >> > >
>>> >> > > https://download.gluster.org/pub/gluster/glusterfs/6/LATEST/ -> 6.7
>>> >> > >
>>> >> > > 6.8 was released on 2nd of march - is there any reason why there are
>>> >> > > no packages? bugs?
>>> >> > >
>>> >> > >
>>> >> > > Best regards
>>> >> > >
>>> >> > > Hubert
>>> >> > > 
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > Community Meeting Calendar:
>>> >> > >
>>> >> > > Schedule -
>>> >> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>> >> > > Bridge: https://bluejeans.com/441850968
>>> >> > >
>>> >> > > Gluster-users mailing list
>>> >> > > Gluster-users@gluster.org
>>> >> > > https://lists.gluster.org/mailman/listinfo/gluster-users
>>> >> > >
>>> >> 
>>> >>
>>> >>
>>> >>
>>> >> Community Meeting Calendar:
>>> >>
>>> >> Schedule -
>>> >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>> >> Bridge: https://bluejeans.com/441850968
>>> >>
>>> >> Gluster-users mailing list
>>> >> Gluster-users@gluster.org
>>> >> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>






Re: [Gluster-users] Gluster 6.8 & debian

2020-03-30 Thread Hu Bert
Hi Sheetal,

thx so far, but some additional packages are missing: libgfapi0,
libgfchangelog0, libgfrpc0, libgfxdr0, libglusterfs0

The following packages have unmet dependencies:
 glusterfs-common : Depends: libgfapi0 (>= 6.8) but it is not going to
be installed
Depends: libgfchangelog0 (>= 6.8) but it is not
going to be installed
Depends: libgfrpc0 (>= 6.8) but it is not going to
be installed
Depends: libgfxdr0 (>= 6.8) but it is not going to
be installed
Depends: libglusterfs0 (>= 6.8) but it is not
going to be installed
 glusterfs-server : Depends: libgfapi0 (>= 6.8) but it is not going to
be installed
Depends: libgfrpc0 (>= 6.8) but it is not going to
be installed
Depends: libgfxdr0 (>= 6.8) but it is not going to
be installed
Depends: libglusterfs0 (>= 6.8) but it is not
going to be installed

All the lib* packages are simply missing for version 6.8, but are
there for version 6.7.

https://download.gluster.org/pub/gluster/glusterfs/6/6.7/Debian/buster/amd64/apt/pool/main/g/glusterfs/
vs.
https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/pool/main/g/glusterfs/

Can you please check?


Thx,
Hubert

Am Mo., 30. März 2020 um 12:57 Uhr schrieb Sheetal Pamecha
:
>
> Hi,
>
> I have updated the path now and now latest points to 6.8 and packages in 
> place.
> Regards,
> Sheetal Pamecha
>
>
> On Mon, Mar 30, 2020 at 2:23 PM Hu Bert  wrote:
>>
>> Hello,
>>
>> now the packages appeared:
>>
>> https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/pool/main/g/glusterfs/
>>
>> Dated: 2020-03-17 - so this looks good, right? Thx to the one who... ;-)
>>
>>
>> Best Regards,
>> Hubert
>>
>> Am Do., 26. März 2020 um 15:03 Uhr schrieb Ingo Fischer :
>> >
>> > Hey,
>> >
>> > I also asked for "when 6.8 comes to LATEST" in two mails here the last
>> > weeks ... I would also be very interested in the reasons.
>> >
>> > Ingo
>> >
>> > Am 26.03.20 um 07:15 schrieb Hu Bert:
>> > > Hello,
>> > >
>> > > i just wanted to test an upgrade from version 5.12 to version 6.8, but
>> > > there are no packages for debian buster in version 6.8.
>> > >
>> > > https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/
>> > >
>> > > This directory is empty. LATEST still links to version 6.7
>> > >
>> > > https://download.gluster.org/pub/gluster/glusterfs/6/LATEST/ -> 6.7
>> > >
>> > > 6.8 was released on 2nd of march - is there any reason why there are
>> > > no packages? bugs?
>> > >
>> > >
>> > > Best regards
>> > >
>> > > Hubert
>> > > 
>> > >
>> > >
>> > >
>> > > Community Meeting Calendar:
>> > >
>> > > Schedule -
>> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> > > Bridge: https://bluejeans.com/441850968
>> > >
>> > > Gluster-users mailing list
>> > > Gluster-users@gluster.org
>> > > https://lists.gluster.org/mailman/listinfo/gluster-users
>> > >
>> 
>>
>>
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users






Re: [Gluster-users] Gluster 6.8 & debian

2020-03-30 Thread Hu Bert
Hello,

now the packages appeared:

https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/pool/main/g/glusterfs/

Dated: 2020-03-17 - so this looks good, right? Thx to the one who... ;-)


Best Regards,
Hubert

Am Do., 26. März 2020 um 15:03 Uhr schrieb Ingo Fischer :
>
> Hey,
>
> I also asked for "when 6.8 comes to LATEST" in two mails here the last
> weeks ... I would also be very interested in the reasons.
>
> Ingo
>
> Am 26.03.20 um 07:15 schrieb Hu Bert:
> > Hello,
> >
> > i just wanted to test an upgrade from version 5.12 to version 6.8, but
> > there are no packages for debian buster in version 6.8.
> >
> > https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/
> >
> > This directory is empty. LATEST still links to version 6.7
> >
> > https://download.gluster.org/pub/gluster/glusterfs/6/LATEST/ -> 6.7
> >
> > 6.8 was released on 2nd of march - is there any reason why there are
> > no packages? bugs?
> >
> >
> > Best regards
> >
> > Hubert
> > 
> >
> >
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://bluejeans.com/441850968
> >
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> >






[Gluster-users] Gluster 6.8 & debian

2020-03-26 Thread Hu Bert
Hello,

i just wanted to test an upgrade from version 5.12 to version 6.8, but
there are no packages for debian buster in version 6.8.

https://download.gluster.org/pub/gluster/glusterfs/6/6.8/Debian/buster/amd64/apt/

This directory is empty. LATEST still links to version 6.7

https://download.gluster.org/pub/gluster/glusterfs/6/LATEST/ -> 6.7

6.8 was released on 2nd of march - is there any reason why there are
no packages? bugs?


Best regards

Hubert
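
For reference, the apt source used here follows that repository layout
and looks roughly like this (a sketch, e.g. in
/etc/apt/sources.list.d/gluster.list; LATEST can be replaced by 6.8
once the directory is populated):

deb [arch=amd64] https://download.gluster.org/pub/gluster/glusterfs/6/LATEST/Debian/buster/amd64/apt buster main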






Re: [Gluster-users] mount of 2 volumes fails at boot (/etc/fstab)

2020-03-23 Thread Hu Bert
Hi Strahil,

problem found: the driver for the network card was buggy or not
working properly, which caused the network interface to go up and
down.

I'll check if a systemd mount unit is an option. Thx!
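
For reference, such a mount unit might look roughly like this (an
untested sketch; the unit file name must match the mount point, i.e.
/etc/systemd/system/data-repository-shared-private.mount):

[Unit]
Description=GlusterFS mount for /data/repository/shared/private
Wants=network-online.target
After=network-online.target

[Mount]
What=gluster1:/persistent
Where=/data/repository/shared/private
Type=glusterfs
Options=defaults,_netdev,attribute-timeout=0,entry-timeout=0,backup-volfile-servers=gluster2:gluster3

[Install]
WantedBy=multi-user.target

enabled with: systemctl enable --now data-repository-shared-private.mount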


Best regards,
Hubert

Am Sa., 21. März 2020 um 21:06 Uhr schrieb Strahil Nikolov
:
>
> On March 20, 2020 8:13:28 AM GMT+02:00, Hu Bert  
> wrote:
> >Hello,
> >
> >i just reinstall a server (debian buster). I added 2 entries to
> >/etc/fstab:
> >
> >gluster1:/persistent /data/repository/shared/private glusterfs
> >defaults,_netdev,attribute-timeout=0,entry-timeout=0,backup-volfile-servers=gluster2:gluster3
> >0 0
> >gluster1:/workdata /data/repository/shared/public glusterfs
> >defaults,_netdev,attribute-timeout=0,entry-timeout=0,backup-volfile-servers=gluster2:gluster3
> >0 0
> >
> >Entries in /etc/hosts for gluster1+2+3 are there. But after a reboot
> >the mount of the 2 gluster volumes fails. There are additional servers
> >with exact the same entries, they don't have a problem with mounting.
> >But only this server does. The log entries for one of the volumes
> >shows this:
> >
> >[2020-03-20 05:32:13.089703] I [MSGID: 100030]
> >[glusterfsd.c:2725:main] 0-/usr/sbin/glusterfs: Started running
> >/usr/sbin/glusterfs version 5.11 (args: /usr/sbin/glusterfs
> >--attribute-timeout=0 --entry-timeout=0 --process-name fuse
> >--volfile-server=gluster1 --volfile-server=gluster2
> >--volfile-server=gluster3 --volfile-id=/persistent
> >/data/repository/shared/private)
> >[2020-03-20 05:32:13.120904] I [MSGID: 101190]
> >[event-epoll.c:621:event_dispatch_epoll_worker] 0-epoll: Started
> >thread with index 1
> >[2020-03-20 05:32:16.196568] I
> >[glusterfsd-mgmt.c:2424:mgmt_rpc_notify] 0-glusterfsd-mgmt:
> >disconnected from remote-host: gluster1
> >[2020-03-20 05:32:16.196614] I
> >[glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
> >to next volfile server gluster2
> >[2020-03-20 05:32:20.164538] I
> >[glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
> >to next volfile server gluster3
> >[2020-03-20 05:32:26.180546] I
> >[glusterfsd-mgmt.c:2444:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted
> >all volfile servers
> >[2020-03-20 05:32:26.181618] W [glusterfsd.c:1500:cleanup_and_exit]
> >(-->/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xee13) [0x7fbf38a98e13]
> >-->/usr/sbin/glusterfs(+0x127d7) [0x55bac75517d7]
> >-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55bac7549f54] ) 0-:
> >received signum (1), shutting down
> >[2020-03-20 05:32:26.181744] I [fuse-bridge.c:5914:fini] 0-fuse:
> >Unmounting '/data/repository/shared/private'.
> >[2020-03-20 05:32:26.200708] I [fuse-bridge.c:5919:fini] 0-fuse:
> >Closing fuse connection to '/data/repository/shared/private'.
> >[2020-03-20 05:32:26.200885] W [glusterfsd.c:1500:cleanup_and_exit]
> >(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7fbf38661fa3]
> >-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xfd) [0x55bac754a0fd]
> >-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55bac7549f54] ) 0-:
> >received signum (15), shutting down
> >
> >The messages for the other volume are identical. If i do a 'mount -a',
> >the volumes get mounted.
> >
> >Did i miss anything?
> >
> >
> >Regards,
> >Hubert
> >
> >
> >
> >
> >Community Meeting Calendar:
> >
> >Schedule -
> >Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >Bridge: https://bluejeans.com/441850968
> >
> >Gluster-users mailing list
> >Gluster-users@gluster.org
> >https://lists.gluster.org/mailman/listinfo/gluster-users
>
> Why don't you set it up as a systemd '.mount' unit ?
> You can define dependencies.
>
> Currently,  after  a  reboot  you can check the following:
> 'systemctl  status persistent-data-repository-shared-private.mount'
>
>
> Best Regards,
> Strahil Nikolov






[Gluster-users] mount of 2 volumes fails at boot (/etc/fstab)

2020-03-20 Thread Hu Bert
Hello,

I just reinstalled a server (debian buster). I added 2 entries to /etc/fstab:

gluster1:/persistent /data/repository/shared/private glusterfs
defaults,_netdev,attribute-timeout=0,entry-timeout=0,backup-volfile-servers=gluster2:gluster3
0 0
gluster1:/workdata /data/repository/shared/public glusterfs
defaults,_netdev,attribute-timeout=0,entry-timeout=0,backup-volfile-servers=gluster2:gluster3
0 0

Entries in /etc/hosts for gluster1+2+3 are there. But after a reboot
the mount of the 2 gluster volumes fails. There are additional servers
with exactly the same entries, and they don't have a problem with
mounting; only this server does. The log entries for one of the volumes
show this:

[2020-03-20 05:32:13.089703] I [MSGID: 100030]
[glusterfsd.c:2725:main] 0-/usr/sbin/glusterfs: Started running
/usr/sbin/glusterfs version 5.11 (args: /usr/sbin/glusterfs
--attribute-timeout=0 --entry-timeout=0 --process-name fuse
--volfile-server=gluster1 --volfile-server=gluster2
--volfile-server=gluster3 --volfile-id=/persistent
/data/repository/shared/private)
[2020-03-20 05:32:13.120904] I [MSGID: 101190]
[event-epoll.c:621:event_dispatch_epoll_worker] 0-epoll: Started
thread with index 1
[2020-03-20 05:32:16.196568] I
[glusterfsd-mgmt.c:2424:mgmt_rpc_notify] 0-glusterfsd-mgmt:
disconnected from remote-host: gluster1
[2020-03-20 05:32:16.196614] I
[glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
to next volfile server gluster2
[2020-03-20 05:32:20.164538] I
[glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting
to next volfile server gluster3
[2020-03-20 05:32:26.180546] I
[glusterfsd-mgmt.c:2444:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted
all volfile servers
[2020-03-20 05:32:26.181618] W [glusterfsd.c:1500:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xee13) [0x7fbf38a98e13]
-->/usr/sbin/glusterfs(+0x127d7) [0x55bac75517d7]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55bac7549f54] ) 0-:
received signum (1), shutting down
[2020-03-20 05:32:26.181744] I [fuse-bridge.c:5914:fini] 0-fuse:
Unmounting '/data/repository/shared/private'.
[2020-03-20 05:32:26.200708] I [fuse-bridge.c:5919:fini] 0-fuse:
Closing fuse connection to '/data/repository/shared/private'.
[2020-03-20 05:32:26.200885] W [glusterfsd.c:1500:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7fbf38661fa3]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xfd) [0x55bac754a0fd]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55bac7549f54] ) 0-:
received signum (15), shutting down

The messages for the other volume are identical. If i do a 'mount -a',
the volumes get mounted.

Did i miss anything?


Regards,
Hubert






Re: [Gluster-users] Disk use with GlusterFS

2020-03-06 Thread Hu Bert
Hi David,

it simply checks the inode usage. There may be enough disk space, but
no more inodes available, e.g.:

Filesystem            Inodes  IUsed   IFree IUse% Mounted on
/dev/mapper/vg0-root  655360  59698  595662   10% /

If IUse% is at 100%... ;-)


Regards,
Hubert

Am Fr., 6. März 2020 um 09:20 Uhr schrieb David Cunningham
:
>
> Hi Hu.
>
> Just to clarify, what should we be looking for with "df -i"?
>
>
> On Fri, 6 Mar 2020 at 18:51, Hu Bert  wrote:
>>
>> Hi,
>>
>> just a guess and easy to test/try: inodes? df -i?
>>
>> regards,
>> Hubert
>>
>> Am Fr., 6. März 2020 um 04:42 Uhr schrieb David Cunningham
>> :
>> >
>> > Hi Aravinda,
>> >
>> > That's what was reporting 54% used, at the same time that GlusterFS was 
>> > giving no space left on device errors. It's a bit worrying that they're 
>> > not reporting the same thing.
>> >
>> > Thank you.
>> >
>> >
>> > On Fri, 6 Mar 2020 at 16:33, Aravinda VK  wrote:
>> >>
>> >> Hi David,
>> >>
>> >> What is it reporting for brick’s `df` output?
>> >>
>> >> ```
>> >> df /nodirectwritedata/gluster/gvol0
>> >> ```
>> >>
>> >> —
>> >> regards
>> >> Aravinda Vishwanathapura
>> >> https://kadalu.io
>> >>
>> >> On 06-Mar-2020, at 2:52 AM, David Cunningham  
>> >> wrote:
>> >>
>> >> Hello,
>> >>
>> >> A major concern we have is that "df" was reporting only 54% used and yet 
>> >> GlusterFS was giving "No space left on device" errors. We rely on "df" to 
>> >> report the correct result to monitor the system and ensure stability. 
>> >> Does anyone know what might have been going on here?
>> >>
>> >> Thanks in advance.
>> >>
>> >>
>> >> On Thu, 5 Mar 2020 at 21:35, David Cunningham  
>> >> wrote:
>> >>>
>> >>> Hi Aravinda,
>> >>>
>> >>> Thanks for the reply. This test server is indeed the master server for 
>> >>> geo-replication to a slave.
>> >>>
>> >>> I'm really surprised that geo-replication simply keeps writing logs 
>> >>> until all space is consumed, without cleaning them up itself. I didn't 
>> >>> see any warning about it in the geo-replication install documentation 
>> >>> which is unfortunate. We'll come up with a solution to delete log files 
>> >>> older than the LAST_SYNCED time in the geo-replication status. Is anyone 
>> >>> aware of any other potential gotchas like this?
>> >>>
>> >>> Does anyone have an idea why in my previous note some space in the 2GB 
>> >>> GlusterFS partition apparently went missing? We had 0.47GB of data, 1GB 
>> >>> reported used by .glusterfs, which even if they were separate files 
>> >>> would only add up to 1.47GB used, meaning 0.53GB should have been left 
>> >>> in the partition. If less space is actually being used because of the 
>> >>> hard links then it's even harder to understand where the other 1.53GB 
>> >>> went. So why would GlusterFS report "No space left on device"?
>> >>>
>> >>> Thanks again for any assistance.
>> >>>
>> >>>
>> >>> On Thu, 5 Mar 2020 at 17:31, Aravinda VK  wrote:
>> >>>>
>> >>>> Hi David,
>> >>>>
>> >>>> Is this Volume is uses Geo-replication? Geo-replication feature enables 
>> >>>> Changelog to identify the latest changes happening in the GlusterFS 
>> >>>> volume.
>> >>>>
>> >>>> Content of .glusterfs directory also includes hardlinks to the actual 
>> >>>> data, so the size shown in .glusterfs is including data. Please refer 
>> >>>> the comment by Xavi 
>> >>>> https://github.com/gluster/glusterfs/issues/833#issuecomment-594436009
>> >>>>
>> >>>> If Changelogs files are causing issue, you can use archival tool to 
>> >>>> remove processed changelogs.
>> >>>> https://github.com/aravindavk/archive_gluster_changelogs
>> >>>>
>> >>>> —
>> >>>> regards
>> >>>> Aravinda Vishwanathapura
>> >>>

Re: [Gluster-users] Disk use with GlusterFS

2020-03-05 Thread Hu Bert
Hi,

just a guess and easy to test/try: inodes? df -i?

regards,
Hubert

Am Fr., 6. März 2020 um 04:42 Uhr schrieb David Cunningham
:
>
> Hi Aravinda,
>
> That's what was reporting 54% used, at the same time that GlusterFS was 
> giving no space left on device errors. It's a bit worrying that they're not 
> reporting the same thing.
>
> Thank you.
>
>
> On Fri, 6 Mar 2020 at 16:33, Aravinda VK  wrote:
>>
>> Hi David,
>>
>> What is it reporting for brick’s `df` output?
>>
>> ```
>> df /nodirectwritedata/gluster/gvol0
>> ```
>>
>> —
>> regards
>> Aravinda Vishwanathapura
>> https://kadalu.io
>>
>> On 06-Mar-2020, at 2:52 AM, David Cunningham  
>> wrote:
>>
>> Hello,
>>
>> A major concern we have is that "df" was reporting only 54% used and yet 
>> GlusterFS was giving "No space left on device" errors. We rely on "df" to 
>> report the correct result to monitor the system and ensure stability. Does 
>> anyone know what might have been going on here?
>>
>> Thanks in advance.
>>
>>
>> On Thu, 5 Mar 2020 at 21:35, David Cunningham  
>> wrote:
>>>
>>> Hi Aravinda,
>>>
>>> Thanks for the reply. This test server is indeed the master server for 
>>> geo-replication to a slave.
>>>
>>> I'm really surprised that geo-replication simply keeps writing logs until 
>>> all space is consumed, without cleaning them up itself. I didn't see any 
>>> warning about it in the geo-replication install documentation which is 
>>> unfortunate. We'll come up with a solution to delete log files older than 
>>> the LAST_SYNCED time in the geo-replication status. Is anyone aware of any 
>>> other potential gotchas like this?
>>>
>>> Does anyone have an idea why in my previous note some space in the 2GB 
>>> GlusterFS partition apparently went missing? We had 0.47GB of data, 1GB 
>>> reported used by .glusterfs, which even if they were separate files would 
>>> only add up to 1.47GB used, meaning 0.53GB should have been left in the 
>>> partition. If less space is actually being used because of the hard links 
>>> then it's even harder to understand where the other 1.53GB went. So why 
>>> would GlusterFS report "No space left on device"?
>>>
>>> Thanks again for any assistance.
>>>
>>>
>>> On Thu, 5 Mar 2020 at 17:31, Aravinda VK  wrote:

 Hi David,

 Is this Volume is uses Geo-replication? Geo-replication feature enables 
 Changelog to identify the latest changes happening in the GlusterFS volume.

 Content of .glusterfs directory also includes hardlinks to the actual 
 data, so the size shown in .glusterfs is including data. Please refer the 
 comment by Xavi 
 https://github.com/gluster/glusterfs/issues/833#issuecomment-594436009

 If Changelogs files are causing issue, you can use archival tool to remove 
 processed changelogs.
 https://github.com/aravindavk/archive_gluster_changelogs

 —
 regards
 Aravinda Vishwanathapura
 https://kadalu.io


 On 05-Mar-2020, at 9:02 AM, David Cunningham  
 wrote:

 Hello,

 We are looking for some advice on disk use. This is on a single node 
 GlusterFS test server.

 There's a 2GB partition for GlusterFS. Of that, 470MB is used for actual 
 data, and 1GB is used by the .glusterfs directory. The .glusterfs 
 directory is mostly used by the two-character directories and the 
 "changelogs" directory. Why is so much used by .glusterfs, and can we 
 reduce that overhead?

 We also have a problem with this test system where GlusterFS is giving "No 
 space left on device" errors. That's despite "df" reporting only 54% used, 
 and even if we add the 470MB to 1GB used above, that still comes out to 
 less than the 2GB available, so there should be some spare.

 Would anyone be able to advise on these please? Thank you in advance.

 The GlusterFS version is 5.11 and here is the volume information:

 Volume Name: gvol0
 Type: Distribute
 Volume ID: 33ed309b-0e63-4f9a-8132-ab1b0fdcbc36
 Status: Started
 Snapshot Count: 0
 Number of Bricks: 1
 Transport-type: tcp
 Bricks:
 Brick1: myhost:/nodirectwritedata/gluster/gvol0
 Options Reconfigured:
 transport.address-family: inet
 nfs.disable: on
 geo-replication.indexing: on
 geo-replication.ignore-pid-check: on
 changelog.changelog: on

 --
 David Cunningham, Voisonics Limited
 http://voisonics.com/
 USA: +1 213 221 1092
 New Zealand: +64 (0)28 2558 3782
 



 Community Meeting Calendar:

 Schedule -
 Every Tuesday at 14:30 IST / 09:00 UTC
 Bridge: https://bluejeans.com/441850968

 Gluster-users mailing list
 Gluster-users@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-users






>>>
>>>
>>> --
>>> David Cunningham, Voisonics Limited
>>> http://voisonics.com/
>>> USA: +1 213 221 1092
>>> New Zealand: +64 

Re: [Gluster-users] recommendation: gluster version upgrade and/or OS dist-upgrade

2020-02-20 Thread Hu Bert
Hi Mika,

I did a test on my 5.x test cluster. I started with the dist-upgrade
stretch -> buster, which went fine. But "updating" the glusterfs
packages from stretch -> buster was not possible; I had to deinstall the
stretch packages and reinstall the buster packages.

For me it looks like it isn't important which one you do first.


Regards,
Hubert

Am Mi., 19. Feb. 2020 um 09:50 Uhr schrieb Michael Böhm
:
>
>
> Am Di., 18. Feb. 2020 um 08:51 Uhr schrieb Hu Bert :
>>
>> Hello,
>>
>> i currently have a replicate 3 setup, gluster version 5.11 and debian
>> stretch. In the next weeks i want to migrate to gluster version 6.x
>> and upgrade the OS to debian buster.
>>
>> So... any recommendation of what to do first? First upgrade gluster or
>> the operating system?
>
>
> Hello Hubert,
>
> i needed to do this a while ago, in my case it was a Proxmox-Cluster based on 
> the Debian Stretch upgrade to Buster while also upgrading gluster up to v7. 
> The way i did it:
>
> 1. i had the "normal" stretch gluster packages installed (3.8) not the 
> backports (4.1), so i first needed to upgrade gluster to 3.12 and move on 
> from there (see [1]). You may be able to skip the next step.
> 2. i activated the old-releases gluster repos for 3.12 [2] and upgrade from 
> 3.8(debian) to 3.12(gluster)
> 3. after this i switched to the "latest" repo and upgraded to v7 while still 
> running stretch
> 4. then i upgraded stretch to buster, while gluster repos where commented out
> 5. after the dist-upgrade i switched the gluster repos to the buster ones and 
> activated them again
>
> I write this all from memory as i am not at work atm, but this should be it. 
> I also can say, that i did all the upgrades online (couldn't afford to stop 
> all the vm's) - and to my surprise everything went without a big problem.
>
> Good luck
>
> Mika
>
> [1] https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_7/ or 
> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_6/
> [2] 
> https://download.gluster.org/pub/gluster/glusterfs/old-releases/3.12/LATEST/Debian/9/




Re: [Gluster-users] recommendation: gluster version upgrade and/or OS dist-upgrade

2020-02-18 Thread Hu Bert
ah, the reason why i was asking: i wanted to do the dist-upgrade and
the installation of the debian buster glusterfs-packages in one step,
which led me to the following error message:

Errors were encountered while processing:
 /tmp/apt-dpkg-install-FRTZ1D/139-glusterfs-common_5.10-1_amd64.deb
 /tmp/apt-dpkg-install-FRTZ1D/140-libglusterfs0_5.10-1_amd64.deb
 /tmp/apt-dpkg-install-FRTZ1D/141-libgfxdr0_5.10-1_amd64.deb
 /tmp/apt-dpkg-install-FRTZ1D/142-libgfrpc0_5.10-1_amd64.deb
 /tmp/apt-dpkg-install-FRTZ1D/143-libgfapi0_5.10-1_amd64.deb
 /tmp/apt-dpkg-install-FRTZ1D/144-libgfchangelog0_5.10-1_amd64.deb
 /tmp/apt-dpkg-install-FRTZ1D/145-libgfdb0_5.10-1_amd64.deb
 /tmp/apt-dpkg-install-FRTZ1D/150-libglusterfs-dev_5.10-1_amd64.deb
 /tmp/apt-dpkg-install-FRTZ1D/163-glusterfs-client_5.10-1_amd64.deb
 /tmp/apt-dpkg-install-FRTZ1D/164-glusterfs-server_5.10-1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

so I first did the dist-upgrade; but the buster glusterfs packages
still didn't want to install, so I had to remove the "old" glusterfs
packages and install the buster packages. Strange.
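
A sketch of what that remove/reinstall can look like (the exact package
list may differ; a plain remove keeps the configuration in
/etc/glusterfs and /var/lib/glusterd):

apt-get remove glusterfs-server glusterfs-client glusterfs-common \
  libgfapi0 libgfchangelog0 libgfdb0 libgfrpc0 libgfxdr0 libglusterfs0
apt-get update
apt-get install glusterfs-server glusterfs-client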

Am Di., 18. Feb. 2020 um 08:51 Uhr schrieb Hu Bert :
>
> Hello,
>
> i currently have a replicate 3 setup, gluster version 5.11 and debian
> stretch. In the next weeks i want to migrate to gluster version 6.x
> and upgrade the OS to debian buster.
>
> So... any recommendation of what to do first? First upgrade gluster or
> the operating system?
>
>
> Thx,
> Hubert




[Gluster-users] recommendation: gluster version upgrade and/or OS dist-upgrade

2020-02-18 Thread Hu Bert
Hello,

i currently have a replicate 3 setup, gluster version 5.11 and debian
stretch. In the next weeks i want to migrate to gluster version 6.x
and upgrade the OS to debian buster.

So... any recommendation of what to do first? First upgrade gluster or
the operating system?


Thx,
Hubert




Re: [Gluster-users] No possible to mount a gluster volume via /etc/fstab?

2020-01-23 Thread Hu Bert
Hi Sherry,

maybe name resolution is not yet working at the time when the mount
from /etc/fstab is supposed to take place? In your case I'd try to place
proper entries in /etc/hosts and test it with a reboot.
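
Something along these lines in /etc/hosts on the client should be
enough (the addresses are only placeholders):

192.168.0.11  gluster01.home gluster01
192.168.0.12  gluster02.home gluster02
192.168.0.13  gluster03.home gluster03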


regards
Hubert

Am Fr., 24. Jan. 2020 um 02:37 Uhr schrieb Sherry Reese :
>
> Hello everyone,
>
> I am using the following entry on a CentOS server.
>
> gluster01.home:/videos /data2/plex/videos glusterfs _netdev 0 0
> gluster01.home:/photos /data2/plex/photos glusterfs _netdev 0 0
>
> I am able to use sudo mount -a to mount the volumes without any problems. 
> When I reboot my server, nothing is mounted.
>
> I can see errors in /var/log/glusterfs/data2-plex-photos.log:
>
> ...
> [2020-01-24 01:24:18.302191] I [glusterfsd.c:2594:daemonize] 0-glusterfs: Pid 
> of current running process is 3679
> [2020-01-24 01:24:18.310017] E [MSGID: 101075] 
> [common-utils.c:505:gf_resolve_ip6] 0-resolver: getaddrinfo failed (family:2) 
> (Name or service not known)
> [2020-01-24 01:24:18.310046] E 
> [name.c:266:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution 
> failed on host gluster01.home
> [2020-01-24 01:24:18.310187] I [MSGID: 101190] 
> [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with 
> index 0
> ...
>
> I am able to do nslookup on gluster01 and gluster01.home without problems, so 
> "DNS resolution failed" is confusing to me. What happens here?
>
> Output of my volumes.
>
> sudo gluster volume status
> Status of volume: documents
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick gluster01.home:/data/documents49152 0  Y   5658
> Brick gluster02.home:/data/documents49152 0  Y   5340
> Brick gluster03.home:/data/documents49152 0  Y   5305
> Self-heal Daemon on localhost   N/A   N/AY   5679
> Self-heal Daemon on gluster03.home  N/A   N/AY   5326
> Self-heal Daemon on gluster02.home  N/A   N/AY   5361
>
> Task Status of Volume documents
> --
> There are no active volume tasks
>
> Status of volume: photos
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick gluster01.home:/data/photos   49153 0  Y   5779
> Brick gluster02.home:/data/photos   49153 0  Y   5401
> Brick gluster03.home:/data/photos   49153 0  Y   5366
> Self-heal Daemon on localhost   N/A   N/AY   5679
> Self-heal Daemon on gluster03.home  N/A   N/AY   5326
> Self-heal Daemon on gluster02.home  N/A   N/AY   5361
>
> Task Status of Volume photos
> --
> There are no active volume tasks
>
> Status of volume: videos
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick gluster01.home:/data/videos   49154 0  Y   5883
> Brick gluster02.home:/data/videos   49154 0  Y   5452
> Brick gluster03.home:/data/videos   49154 0  Y   5416
> Self-heal Daemon on localhost   N/A   N/AY   5679
> Self-heal Daemon on gluster03.home  N/A   N/AY   5326
> Self-heal Daemon on gluster02.home  N/A   N/AY   5361
>
> Task Status of Volume videos
> --
> There are no active volume tasks
>
> On the server (Ubuntu) following versions are installed.
>
> glusterfs-client/bionic,now 7.2-ubuntu1~bionic1 armhf [installed,automatic]
> glusterfs-common/bionic,now 7.2-ubuntu1~bionic1 armhf [installed,automatic]
> glusterfs-server/bionic,now 7.2-ubuntu1~bionic1 armhf [installed]
>
> On the client (CentOS) following versions are installed.
>
> sudo rpm -qa | grep gluster
> glusterfs-client-xlators-7.2-1.el7.x86_64
> glusterfs-cli-7.2-1.el7.x86_64
> glusterfs-libs-7.2-1.el7.x86_64
> glusterfs-7.2-1.el7.x86_64
> glusterfs-api-7.2-1.el7.x86_64
> libvirt-daemon-driver-storage-gluster-4.5.0-23.el7_7.3.x86_64
> centos-release-gluster7-1.0-1.el7.centos.noarch
> glusterfs-fuse-7.2-1.el7.x86_64
>
> I tried to disable IPv6 on the client voa sysctl with following parameters.
>
> net.ipv6.conf.all.disable_ipv6 = 1
> net.ipv6.conf.default.disable_ipv6 = 1
>
> That did not help.
>
> Volumes are configured with inet.
>
> sudo gluster volume info videos
>
> Volume Name: videos
> Type: Replicate
> Volume ID: 8fddde82-66b3-447f-8860-ed3768c51876
> Status: Started
> 

Re: [Gluster-users] To RAID or not to RAID...

2020-01-14 Thread Hu Bert
Hi,

our old setup is not really comparable, but i thought i'd drop some
lines... we once had a Distributed-Replicate setup with 4 x 3 = 12
disks (10 TB hdd). Simple JBOD, every disk == brick. Was running
pretty good, until one of the disks died. The restore (reset-brick)
took about a month, because the application has quite high I/O and
therefore slows down the volume and the disks.

Next step: take servers with 10x10TB disks and build a RAID 10; raid
array == brick, replicate volume (1 x 3 = 3). When a disk fails, you
only have to rebuild the SW RAID which takes about 3-4 days, plus the
periodic redundancy checks. This was way better than the
JBOD/reset-scenario before. But still not optimal. Upcoming step:
build a distribute-replicate with lots of SSDs (maybe again with a
RAID underneath) .
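
Roughly what one such brick looks like - device names, mount point and disk
layout are only examples:

# 10 disks -> one SW RAID 10, formatted with XFS and used as a single brick
mdadm --create /dev/md0 --level=10 --raid-devices=10 /dev/sd[b-k]
mkfs.xfs -i size=512 /dev/md0   # 512-byte inodes, often recommended for gluster bricks
mount /dev/md0 /gluster/brick1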

tl;dr what i wanted to say: we waste a lot of disks. It simply depends
on which setup you have and how to handle the situation when one of
the disks fails - and it will! ;-(


regards
Hubert

Am Di., 14. Jan. 2020 um 12:36 Uhr schrieb Markus Kern :
>
>
> Greetings again!
>
> After reading RedHat documentation regarding optimizing Gluster storage
> another question comes to my mind:
>
> Let's presume that I want to go the distributed dispersed volume way.
> Three nodes which two bricks each.
> According to RedHat's recommendation, I should use RAID6 as underlying
> RAID for my planned workload.
> I am frightened by that "waste" of disks in such a case:
> When each brick is a RAID6, I would "lose" two disks per brick - 12
> lost disks in total.
> In addition to this, a distributed dispersed volume adds another layer of
> lost disk space.
>
> Am I wrong here? Maybe I misunderstood the recommendations?
>
> Markus
> 
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/441850968
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Announcing Gluster release 5.10

2019-10-21 Thread Hu Bert
Hi Shwetha,

thx for your efforts, now i see the amd64 packages - i need those and
not arm64 ;-)

Now i noticed something new: the 5.9 folder now contains the 5.10
packages - see:

https://download.gluster.org/pub/gluster/glusterfs/5/5.9/Debian/stretch/amd64/apt/pool/main/g/glusterfs/

So it seems like the 5.9 packages got deleted/replaced? No problem
here, 5.9 is running, but everyone that does a simple "apt update &&
apt upgrade" and doesn't check versions might unintentionally receive
the 5.10 packages.
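
If you want to stay on a given version until you are ready to upgrade, one
option (just an example) is to put the packages on hold:

apt-mark hold glusterfs-server glusterfs-client glusterfs-common
# and later, when the upgrade is actually intended:
apt-mark unhold glusterfs-server glusterfs-client glusterfs-common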


Thx,
Hubert

Am Mo., 21. Okt. 2019 um 14:34 Uhr schrieb Shwetha Acharya
:
>
> Hu Bert,
>
> Find my reply inline.
>
> Regards,
> Shwetha
> On Mon, Oct 21, 2019 at 1:22 PM Hu Bert  wrote:
>>
>> Hi Shwetha,
>>
>> thx, now there are the 5.10 packages. But maybe I should've been more 
>> precise:
>>
>> https://download.gluster.org/pub/gluster/glusterfs/5/LATEST/Debian/stretch/
>> -> only arm64 available - where is amd64?
>>
> Currently we are unable to build for arm64.  
> https://download.gluster.org/pub/gluster/glusterfs/5/LATEST/Debian/stretch/ 
> now correctly points to amd64.
>>
>> For 5.9 there's also no amd64 anymore (but has been at least back in
>> August). 5.8 still has the amd64 directory.
>>
>> And the packages in arm64 are named amd64, e.g.
>> glusterfs-client_5.10-1_amd64.deb
>
> This is fixed now.
>>
>> Is there no difference between arm64 and amd64? And one should simply
>> use arm64 path/dir instead of amd64?
>>
> They are two different architectures.
>>
>>
>> Best regards,
>> Hubert
>>
>> Am Mo., 21. Okt. 2019 um 09:25 Uhr schrieb Shwetha Acharya
>> :
>> >
>> > Hi Hu Bert,
>> >
>> > Thanks for informing about the issue.
>> > Now, you can find correct packages at 
>> > https://download.gluster.org/pub/gluster/glusterfs/5/LATEST/Debian/stretch/arm64/apt/pool/main/g/glusterfs/
>> >
>> > Regards,
>> >
>> > Shwetha
>> >
>> > On Mon, Oct 21, 2019 at 10:53 AM Hu Bert  wrote:
>> >>
>> >> Good morning,
>> >>
>> >> i just wanted to check for version 5.10 for debian stretch - but it
>> >> doesn't seem to be available.
>> >>
>> >> https://download.gluster.org/pub/gluster/glusterfs/5/LATEST/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
>> >> -> only version 5.9
>> >>
>> >> https://download.gluster.org/pub/gluster/glusterfs/5/5.10/Debian/stretch/
>> >> -> arm64, but packages are named e.g.
>> >> https://download.gluster.org/pub/gluster/glusterfs/5/5.10/Debian/stretch/arm64/apt/pool/main/g/glusterfs/glusterfs-client_5.10-1_amd64.deb
>> >>
>> >> Did i miss anything?
>> >>
>> >> Regards,
>> >> Hubert
>> >>
>> >> Am Mo., 14. Okt. 2019 um 06:48 Uhr schrieb Hari Gowtham 
>> >> :
>> >> >
>> >> > Hi,
>> >> >
>> >> > The Gluster community is pleased to announce the release of Gluster
>> >> > 5.10 (packages available at [1]).
>> >> >
>> >> > Release notes for the release can be found at [2].
>> >> >
>> >> > Major changes, features and limitations addressed in this release:
>> >> > None
>> >> >
>> >> > Thanks,
>> >> > Gluster community
>> >> >
>> >> > [1] Packages for 5.10:
>> >> > https://download.gluster.org/pub/gluster/glusterfs/5/5.10/
>> >> >
>> >> > [2] Release notes for 5.10:
>> >> > https://docs.gluster.org/en/latest/release-notes/5.10/
>> >> >
>> >> >
>> >> > --
>> >> > Regards,
>> >> > Hari Gowtham.
>> >> >
>> >> > 
>> >> >
>> >> > Community Meeting Calendar:
>> >> >
>> >> > APAC Schedule -
>> >> > Every 2nd and 4th Tuesday at 11:30 AM IST
>> >> > Bridge: https://bluejeans.com/118564314
>> >> >
>> >> > NA/EMEA Schedule -
>> >> > Every 1st and 3rd Tuesday at 01:00 PM EDT
>> >> > Bridge: https://bluejeans.com/118564314
>> >> >
>> >> > Gluster-users mailing list
>> >> > Gluster-users@gluster.org
>> >> > https://lists.gluster.org/mailman/listinfo/gluster-users
>> >> 
>> >>
>> >> Community Meeting Calendar:
>> >>
>> >> APAC Schedule -
>> >> Every 2nd and 4th Tuesday at 11:30 AM IST
>> >> Bridge: https://bluejeans.com/118564314
>> >>
>> >> NA/EMEA Schedule -
>> >> Every 1st and 3rd Tuesday at 01:00 PM EDT
>> >> Bridge: https://bluejeans.com/118564314
>> >>
>> >> Gluster-users mailing list
>> >> Gluster-users@gluster.org
>> >> https://lists.gluster.org/mailman/listinfo/gluster-users


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Announcing Gluster release 5.10

2019-10-21 Thread Hu Bert
Hi Shwetha,

thx, now there are the 5.10 packages. But maybe I should've been more precise:

https://download.gluster.org/pub/gluster/glusterfs/5/LATEST/Debian/stretch/
-> only arm64 available - where is amd64?

For 5.9 there's also no amd64 anymore (but it was there at least back in
August). 5.8 still has the amd64 directory.

And the packages in arm64 are named amd64, e.g.
glusterfs-client_5.10-1_amd64.deb

Is there no difference between arm64 and amd64? And one should simply
use arm64 path/dir instead of amd64?


Best regards,
Hubert

Am Mo., 21. Okt. 2019 um 09:25 Uhr schrieb Shwetha Acharya
:
>
> Hi Hu Bert,
>
> Thanks for informing about the issue.
> Now, you can find correct packages at 
> https://download.gluster.org/pub/gluster/glusterfs/5/LATEST/Debian/stretch/arm64/apt/pool/main/g/glusterfs/
>
> Regards,
>
> Shwetha
>
> On Mon, Oct 21, 2019 at 10:53 AM Hu Bert  wrote:
>>
>> Good morning,
>>
>> i just wanted to check for version 5.10 for debian stretch - but it
>> doesn't seem to be available.
>>
>> https://download.gluster.org/pub/gluster/glusterfs/5/LATEST/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
>> -> only version 5.9
>>
>> https://download.gluster.org/pub/gluster/glusterfs/5/5.10/Debian/stretch/
>> -> arm64, but packages are named e.g.
>> https://download.gluster.org/pub/gluster/glusterfs/5/5.10/Debian/stretch/arm64/apt/pool/main/g/glusterfs/glusterfs-client_5.10-1_amd64.deb
>>
>> Did i miss anything?
>>
>> Regards,
>> Hubert
>>
>> Am Mo., 14. Okt. 2019 um 06:48 Uhr schrieb Hari Gowtham 
>> :
>> >
>> > Hi,
>> >
>> > The Gluster community is pleased to announce the release of Gluster
>> > 5.10 (packages available at [1]).
>> >
>> > Release notes for the release can be found at [2].
>> >
>> > Major changes, features and limitations addressed in this release:
>> > None
>> >
>> > Thanks,
>> > Gluster community
>> >
>> > [1] Packages for 5.10:
>> > https://download.gluster.org/pub/gluster/glusterfs/5/5.10/
>> >
>> > [2] Release notes for 5.10:
>> > https://docs.gluster.org/en/latest/release-notes/5.10/
>> >
>> >
>> > --
>> > Regards,
>> > Hari Gowtham.
>> >
>> > 
>> >
>> > Community Meeting Calendar:
>> >
>> > APAC Schedule -
>> > Every 2nd and 4th Tuesday at 11:30 AM IST
>> > Bridge: https://bluejeans.com/118564314
>> >
>> > NA/EMEA Schedule -
>> > Every 1st and 3rd Tuesday at 01:00 PM EDT
>> > Bridge: https://bluejeans.com/118564314
>> >
>> > Gluster-users mailing list
>> > Gluster-users@gluster.org
>> > https://lists.gluster.org/mailman/listinfo/gluster-users
>> 
>>
>> Community Meeting Calendar:
>>
>> APAC Schedule -
>> Every 2nd and 4th Tuesday at 11:30 AM IST
>> Bridge: https://bluejeans.com/118564314
>>
>> NA/EMEA Schedule -
>> Every 1st and 3rd Tuesday at 01:00 PM EDT
>> Bridge: https://bluejeans.com/118564314
>>
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Announcing Gluster release 5.10

2019-10-20 Thread Hu Bert
Good morning,

i just wanted to check for version 5.10 for debian stretch - but it
doesn't seem to be available.

https://download.gluster.org/pub/gluster/glusterfs/5/LATEST/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
-> only version 5.9

https://download.gluster.org/pub/gluster/glusterfs/5/5.10/Debian/stretch/
-> arm64, but packages are named e.g.
https://download.gluster.org/pub/gluster/glusterfs/5/5.10/Debian/stretch/arm64/apt/pool/main/g/glusterfs/glusterfs-client_5.10-1_amd64.deb

Did i miss anything?

Regards,
Hubert

Am Mo., 14. Okt. 2019 um 06:48 Uhr schrieb Hari Gowtham :
>
> Hi,
>
> The Gluster community is pleased to announce the release of Gluster
> 5.10 (packages available at [1]).
>
> Release notes for the release can be found at [2].
>
> Major changes, features and limitations addressed in this release:
> None
>
> Thanks,
> Gluster community
>
> [1] Packages for 5.10:
> https://download.gluster.org/pub/gluster/glusterfs/5/5.10/
>
> [2] Release notes for 5.10:
> https://docs.gluster.org/en/latest/release-notes/5.10/
>
>
> --
> Regards,
> Hari Gowtham.
>
> 
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/118564314
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/118564314
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD

2019-06-06 Thread Hu Bert
If i remember correctly: in the video they suggested not to make a
RAID 10 too big (i.e. too many (big) disks), because the RAID resync
then could take a long time. They didn't mention a limit; on my 3
servers with 2 RAID 10 (1x4 disks, 1x6 disks), no disk failed so far,
but there were automatic periodic redundancy checks (mdadm checkarray)
which ran for a couple of days, increasing load on the servers and
affecting the responsiveness of glusterfs on the clients. Almost no one even noticed
that mdadm checks were running :-)
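
On debian these checks are started by the mdadm cron job (/etc/cron.d/mdadm);
just as an example, they can also be triggered or cancelled manually:

/usr/share/mdadm/checkarray --all            # start a redundancy check on all arrays
/usr/share/mdadm/checkarray --cancel --all   # stop a running check
cat /proc/mdstat                             # shows check/resync progress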

But if i compare it with our old JBOD setup: after the disk change the
heal took about a month, resulting in really poor performance on the
client side. As we didn't want to experience that period again ->
throw hardware at the problem. Maybe a different setup (10 disks -> 5
RAID 1, building a distribute replicate) would've been even better,
but so far we're happy with the current setup.

Am Do., 6. Juni 2019 um 18:48 Uhr schrieb Eduardo Mayoral :
>
> Your comment actually helps me more than you think, one of the main
> doubts I have is whether I go for JOBD with replica 3 or SW RAID 6 with
> replica2 + arbitrer. Before reading your email I was leaning more
> towards JOBD, as reconstruction of a moderately big RAID 6 with mdadm
> can be painful too. Now I see a reconstruct is going to be painful
> either way...
>
> For the record, the workload I am going to migrate is currently
> 18,314,445 MB and 34,752,784 inodes (which is not exactly the same as
> files, but let's use that for a rough estimate), for an average file
> size of about 539 KB per file.
>
> Thanks a lot for your time and insights!
>
> On 6/6/19 8:53, Hu Bert wrote:
> > Good morning,
> >
> > my comment won't help you directly, but i thought i'd send it anyway...
> >
> > Our first glusterfs setup had 3 servers with 4 disks=bricks (10TB,
> > JBOD) each. Was running fine in the beginning, but then 1 disk failed.
> > The following heal took ~1 month, with bad performance (quite high
> > IO). Shortly after the heal had finished another disk failed -> same
> > problems again. Not funny.
> >
> > For our new system we decided to use 3 servers with 10 disks (10 TB)
> > each, but now the 10 disks in a SW RAID 10 (well, we split the 10
> > disks into 2 SW RAID 10, each of them is a brick, we have 2 gluster
> > volumes). A lot of disk space "wasted", with this type of SW RAID and
> > a replicate 3 setup, but we wanted to avoid the "healing takes a long
> > time with bad performance" problems. Now mdadm takes care of
> > replicating data, glusterfs should always see "good" bricks.
> >
> > And the decision may depend on what kind of data you have. Many small
> > files, like tens of millions? Or not that much, but bigger files? I
> > once watched a video (i think it was this one:
> > https://www.youtube.com/watch?v=61HDVwttNYI). Recommendation there:
> > RAID 6 or 10 for small files, for big files... well, already 2 years
> > "old" ;-)
> >
> > As i said, this won't help you directly. You have to identify what's
> > most important for your scenario; as you said, high performance is not
> > an issue - if this is true even when you have slight performance
> > issues after a disk fail then ok. My experience so far: the bigger and
> > slower the disks are and the more data you have -> healing will hurt
> > -> try to avoid this. If the disks are small and fast (SSDs), healing
> > will be faster -> JBOD is an option.
> >
> >
> > hth,
> > Hubert
> >
> > Am Mi., 5. Juni 2019 um 11:33 Uhr schrieb Eduardo Mayoral 
> > :
> >> Hi,
> >>
> >> I am looking into a new gluster deployment to replace an ancient one.
> >>
> >> For this deployment I will be using some repurposed servers I
> >> already have in stock. The disk specs are 12 * 3 TB SATA disks. No HW
> >> RAID controller. They also have some SSD which would be nice to leverage
> >> as cache or similar to improve performance, since it is already there.
> >> Advice on how to leverage the SSDs would be greatly appreciated.
> >>
> >> One of the design choices I have to make is using 3 nodes for a
> >> replica-3 with JBOD, or using 2 nodes with a replica-2 and using SW RAID
> >> 6 for the disks, maybe adding a 3rd node with a smaller amount of disk
> >> as metadata node for the replica set. I would love to hear advice on the
> >> pros and cons of each setup from the gluster experts.
> >>
> >> The data will be accessed from 4 to 6 systems with native gluster,
> >> not sure if that makes any difference.

Re: [Gluster-users] Advice for setup: SW RAID 6 vs JBOD

2019-06-06 Thread Hu Bert
Good morning,

my comment won't help you directly, but i thought i'd send it anyway...

Our first glusterfs setup had 3 servers with 4 disks=bricks (10TB,
JBOD) each. Was running fine in the beginning, but then 1 disk failed.
The following heal took ~1 month, with bad performance (quite high
IO). Shortly after the heal had finished another disk failed -> same
problems again. Not funny.

For our new system we decided to use 3 servers with 10 disks (10 TB)
each, but now the 10 disks in a SW RAID 10 (well, we split the 10
disks into 2 SW RAID 10, each of them is a brick, we have 2 gluster
volumes). A lot of disk space "wasted", with this type of SW RAID and
a replicate 3 setup, but we wanted to avoid the "healing takes a long
time with bad performance" problems. Now mdadm takes care of
replicating data, glusterfs should always see "good" bricks.
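
As a sketch (hostnames, brick paths and the volume name are only examples):
one RAID-10 array per server, and the three arrays form one replica-3 volume:

gluster volume create myvolume replica 3 \
  gluster1:/gluster/raid10a/myvolume \
  gluster2:/gluster/raid10a/myvolume \
  gluster3:/gluster/raid10a/myvolume
gluster volume start myvolume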

And the decision may depend on what kind of data you have. Many small
files, like tens of millions? Or not that much, but bigger files? I
once watched a video (i think it was this one:
https://www.youtube.com/watch?v=61HDVwttNYI). Recommendation there:
RAID 6 or 10 for small files, for big files... well, already 2 years
"old" ;-)

As i said, this won't help you directly. You have to identify what's
most important for your scenario; as you said, high performance is not
an issue - if this is true even when you have slight performance
issues after a disk fail then ok. My experience so far: the bigger and
slower the disks are and the more data you have -> healing will hurt
-> try to avoid this. If the disks are small and fast (SSDs), healing
will be faster -> JBOD is an option.


hth,
Hubert

Am Mi., 5. Juni 2019 um 11:33 Uhr schrieb Eduardo Mayoral :
>
> Hi,
>
> I am looking into a new gluster deployment to replace an ancient one.
>
> For this deployment I will be using some repurposed servers I
> already have in stock. The disk specs are 12 * 3 TB SATA disks. No HW
> RAID controller. They also have some SSD which would be nice to leverage
> as cache or similar to improve performance, since it is already there.
> Advice on how to leverage the SSDs would be greatly appreciated.
>
> One of the design choices I have to make is using 3 nodes for a
> replica-3 with JBOD, or using 2 nodes with a replica-2 and using SW RAID
> 6 for the disks, maybe adding a 3rd node with a smaller amount of disk
> as metadata node for the replica set. I would love to hear advice on the
> pros and cons of each setup from the gluster experts.
>
> The data will be accessed from 4 to 6 systems with native gluster,
> not sure if that makes any difference.
>
> The amount of data I have to store there is currently 20 TB, with
> moderate growth. iops are quite low so high performance is not an issue.
> The data will fit in any of the two setups.
>
> Thanks in advance for your advice!
>
> --
> Eduardo Mayoral Jimeno
> Systems engineer, platform department. Arsys Internet.
> emayo...@arsys.es - +34 941 620 105 - ext 2153
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster 5.6: Gfid mismatch detected

2019-05-22 Thread Hu Bert
Hi Ravi,

mount path of the volume is /shared/public, so complete paths are
/shared/public/staticmap/120/710/ and
/shared/public/staticmap/120/710/120710351/ .

getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/
getfattr: Removing leading '/' from absolute path names
# file: shared/public/staticmap/120/710/
glusterfs.gfid.string="751233b0-7789-4550-bd95-4dd9c8f57c19"

getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/120710351/
getfattr: Removing leading '/' from absolute path names
# file: shared/public/staticmap/120/710/120710351/
glusterfs.gfid.string="eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace"

So that fits. It somehow took a couple of attempts to resolve this,
and none of the commands seem to have "officially" succeeded:

gluster3 (host with the "fail"):
gluster volume heal workdata split-brain source-brick
gluster1:/gluster/md4/workdata
/shared/public/staticmap/120/710/120710351/
Lookup failed on /shared/public/staticmap/120/710:No such file or directory
Volume heal failed.

gluster1 ("good" host):
gluster volume heal workdata split-brain source-brick
gluster1:/gluster/md4/workdata
/shared/public/staticmap/120/710/120710351/
Lookup failed on /shared/public/staticmap/120/710:No such file or directory
Volume heal failed.

Only in the logs i see:

[2019-05-22 07:42:22.004182] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-workdata-replicate-0: performing metadata selfheal on
eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace
[2019-05-22 07:42:22.008502] I [MSGID: 108026]
[afr-self-heal-common.c:1729:afr_log_selfheal] 0-workdata-replicate-0:
Completed metadata selfheal on eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace.
sources=0 [1]  sinks=2

And via "gluster volume heal workdata statistics heal-count" there are
0 entries left. Files/directories are there. Happened the first time
with this setup, but everything ok now.
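
For the record, the gfid can also be read directly on a brick via the
trusted.gfid xattr - brick path as on gluster1, shown only as an example:

getfattr -d -m . -e hex /gluster/md4/workdata/staticmap/120/710/120710351 | grep trusted.gfid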

Thx for your fast help :-)


Hubert

Am Mi., 22. Mai 2019 um 09:32 Uhr schrieb Ravishankar N
:
>
>
> On 22/05/19 12:39 PM, Hu Bert wrote:
> > Hi @ll,
> >
> > today i updated and rebooted the 3 servers of my replicate 3 setup;
> > after the 3rd one came up again i noticed this error:
> >
> > [2019-05-22 06:41:26.781165] E [MSGID: 108008]
> > [afr-self-heal-common.c:392:afr_gfid_split_brain_source]
> > 0-workdata-replicate-0: Gfid mismatch detected for
> > /120710351>,
> > 82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and
> > eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1.
>
> 120710351 seems to be the entry that is in split-brain. Is
> /staticmap/120/710/120710351 the complete path to that entry? (check if
> gfid:751233b0-7789-4550-bd95-4dd9c8f57c19 corresponds to the gfid of 710).
>
> You can then try "gluster volume heal workdata split-brain source-brick
> gluster1:/gluster/md4/workdata /staticmap/120/710/120710351"
>
> -Ravi
>
> > [2019-05-22 06:41:27.069969] W [MSGID: 108027]
> > [afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0:
> > no read subvols for /staticmap/120/710/120710351
> > [2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk]
> > 0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1
> > (Transport endpoint is not connected)
> >
> > A simple 'gluster volume heal workdata' didn't help; 'gluster volume
> > heal workdata info' says:
> >
> > Brick gluster1:/gluster/md4/workdata
> > /staticmap/120/710
> > /staticmap/120/710/120710351
> > 
> > Status: Connected
> > Number of entries: 3
> >
> > Brick gluster2:/gluster/md4/workdata
> > /staticmap/120/710
> > /staticmap/120/710/120710351
> > 
> > Status: Connected
> > Number of entries: 3
> >
> > Brick gluster3:/gluster/md4/workdata
> > /staticmap/120/710/120710351
> > Status: Connected
> > Number of entries: 1
> >
> > There's a mismatch in one directory; I tried to follow these instructions:
> > https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
> >
> > gluster volume heal workdata split-brain source-brick
> > gluster1:/gluster/md4/workdata
> > gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b
> > Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in
> > split-brain.
> > Volume heal failed.
>
> >
> > Is there any other documentation for gfid mismatch and how to resolve this?
> >
> >
> > Thx,
> > Hubert
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] gluster 5.6: Gfid mismatch detected

2019-05-22 Thread Hu Bert
Hi @ll,

today i updated and rebooted the 3 servers of my replicate 3 setup;
after the 3rd one came up again i noticed this error:

[2019-05-22 06:41:26.781165] E [MSGID: 108008]
[afr-self-heal-common.c:392:afr_gfid_split_brain_source]
0-workdata-replicate-0: Gfid mismatch detected for
<gfid:751233b0-7789-4550-bd95-4dd9c8f57c19>/120710351>,
82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and
eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1.
[2019-05-22 06:41:27.069969] W [MSGID: 108027]
[afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0:
no read subvols for /staticmap/120/710/120710351
[2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk]
0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1
(Transport endpoint is not connected)

A simple 'gluster volume heal workdata' didn't help; 'gluster volume
heal workdata info' says:

Brick gluster1:/gluster/md4/workdata
/staticmap/120/710
/staticmap/120/710/120710351
<gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b>
Status: Connected
Number of entries: 3

Brick gluster2:/gluster/md4/workdata
/staticmap/120/710
/staticmap/120/710/120710351
<gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b>
Status: Connected
Number of entries: 3

Brick gluster3:/gluster/md4/workdata
/staticmap/120/710/120710351
Status: Connected
Number of entries: 1

There's a mismatch in one directory; I tried to follow these instructions:
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/

gluster volume heal workdata split-brain source-brick
gluster1:/gluster/md4/workdata
gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b
Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in
split-brain.
Volume heal failed.

Is there any other documentation for gfid mismatch and how to resolve this?


Thx,
Hubert
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed?

2019-04-29 Thread Hu Bert
Good morning,

back in office... ;-) i reactivated quick-read on both volumes and
watched traffic, which now looks normal. Well, i did umount/mount of
both gluster volumes after installing the 5.5 -> 5.6 upgrade, but it
seems that this wasn't enough? The changes apparently only took effect
after a reboot (kernel update...) of all clients - maybe some processes
were still running?

I'll keep watching network traffic and report if i see that it's
higher than usual.
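
Just as a sketch, the remount that should make such a client-side fix
effective (volume and mount point as used on our clients):

umount /data/repository/shared/public
mount -t glusterfs gluster1:/workdata /data/repository/shared/public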


Best regards,
Hubert

Am Di., 23. Apr. 2019 um 15:34 Uhr schrieb Poornima Gurusiddaiah
:
>
> Hi,
>
> Thank you for the update, sorry for the delay.
>
> I did some more tests, but couldn't see the behaviour of spiked network 
> bandwidth usage when quick-read is on. After upgrading, have you remounted 
> the clients? As in the fix will not be effective until the process is 
> restarted.
> If you have already restarted the client processes, then there must be 
> something related to workload in the live system that is triggering a bug in 
> quick-read. Would need wireshark capture if possible, to debug further.
>
> Regards,
> Poornima
>
> On Tue, Apr 16, 2019 at 6:25 PM Hu Bert  wrote:
>>
>> Hi Poornima,
>>
>> thx for your efforts. I made a couple of tests and the results are the
>> same, so the options are not related. Anyway, i'm not able to
>> reproduce the problem on my testing system, although the volume
>> options are the same.
>>
>> About 1.5 hours ago i set performance.quick-read to on again and
>> watched: load/iowait went up (not bad at the moment, little traffic),
>> but network traffic went up - from <20 MBit/s up to 160 MBit/s. After
>> deactivating quick-read traffic dropped to < 20 MBit/s again.
>>
>> munin graph: https://abload.de/img/network-client4s0kle.png
>>
>> The 2nd peak is from the last test.
>>
>>
>> Thx,
>> Hubert
>>
>> Am Di., 16. Apr. 2019 um 09:43 Uhr schrieb Hu Bert :
>> >
>> > In my first test on my testing setup the traffic was on a normal
>> > level, so i thought i was "safe". But on my live system the network
>> > traffic was a multiple of the traffic one would expect.
>> > performance.quick-read was enabled in both, the only difference in the
>> > volume options between live and testing are:
>> >
>> > performance.read-ahead: testing on, live off
>> > performance.io-cache: testing on, live off
>> >
>> > I ran another test on my testing setup, deactivated both and copied 9
>> > GB of data. Now the traffic went up as well, from before ~9-10 MBit/s
>> > up to 100 MBit/s with both options off. Does performance.quick-read
>> > require one of those options set to 'on'?
>> >
>> > I'll start another test shortly, and activate one of those 2 options,
>> > maybe there's a connection between those 3 options?
>> >
>> >
>> > Best Regards,
>> > Hubert
>> >
>> > Am Di., 16. Apr. 2019 um 08:57 Uhr schrieb Poornima Gurusiddaiah
>> > :
>> > >
>> > > Thank you for reporting this. I had done testing on my local setup and 
>> > > the issue was resolved even with quick-read enabled. Let me test it 
>> > > again.
>> > >
>> > > Regards,
>> > > Poornima
>> > >
>> > > On Mon, Apr 15, 2019 at 12:25 PM Hu Bert  wrote:
>> > >>
>> > >> fyi: after setting performance.quick-read to off network traffic
>> > >> dropped to normal levels, client load/iowait back to normal as well.
>> > >>
>> > >> client: https://abload.de/img/network-client-afterihjqi.png
>> > >> server: https://abload.de/img/network-server-afterwdkrl.png
>> > >>
>> > >> Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert 
>> > >> :
>> > >> >
>> > >> > Good Morning,
>> > >> >
>> > >> > today i updated my replica 3 setup (debian stretch) from version 5.5
>> > >> > to 5.6, as i thought the network traffic bug (#1673058) was fixed and
>> > >> > i could re-activate 'performance.quick-read' again. See release notes:
>> > >> >
>> > >> > https://review.gluster.org/#/c/glusterfs/+/22538/
>> > >> > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795
>> > >> >
>> > >> > Upgrade went fine, and then i was watching iowait and network traffic.
>> > >> > It seems that the network traff

Re: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed?

2019-04-16 Thread Hu Bert
Hi Poornima,

thx for your efforts. I made a couple of tests and the results are the
same, so the options are not related. Anyway, i'm not able to
reproduce the problem on my testing system, although the volume
options are the same.

About 1.5 hours ago i set performance.quick-read to on again and
watched: load/iowait went up (not bad at the moment, little traffic),
but network traffic went up - from <20 MBit/s up to 160 MBit/s. After
deactivating quick-read traffic dropped to < 20 MBit/s again.

munin graph: https://abload.de/img/network-client4s0kle.png

The 2nd peak is from the last test.
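
The toggling itself was just the usual volume option - shown here for the
workdata volume as an example:

gluster volume set workdata performance.quick-read on
# ... watch traffic for a while ...
gluster volume set workdata performance.quick-read off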


Thx,
Hubert

Am Di., 16. Apr. 2019 um 09:43 Uhr schrieb Hu Bert :
>
> In my first test on my testing setup the traffic was on a normal
> level, so i thought i was "safe". But on my live system the network
> traffic was a multiple of the traffic one would expect.
> performance.quick-read was enabled in both, the only difference in the
> volume options between live and testing are:
>
> performance.read-ahead: testing on, live off
> performance.io-cache: testing on, live off
>
> I ran another test on my testing setup, deactivated both and copied 9
> GB of data. Now the traffic went up as well, from before ~9-10 MBit/s
> up to 100 MBit/s with both options off. Does performance.quick-read
> require one of those options set to 'on'?
>
> I'll start another test shortly, and activate one of those 2 options,
> maybe there's a connection between those 3 options?
>
>
> Best Regards,
> Hubert
>
> Am Di., 16. Apr. 2019 um 08:57 Uhr schrieb Poornima Gurusiddaiah
> :
> >
> > Thank you for reporting this. I had done testing on my local setup and the 
> > issue was resolved even with quick-read enabled. Let me test it again.
> >
> > Regards,
> > Poornima
> >
> > On Mon, Apr 15, 2019 at 12:25 PM Hu Bert  wrote:
> >>
> >> fyi: after setting performance.quick-read to off network traffic
> >> dropped to normal levels, client load/iowait back to normal as well.
> >>
> >> client: https://abload.de/img/network-client-afterihjqi.png
> >> server: https://abload.de/img/network-server-afterwdkrl.png
> >>
> >> Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert 
> >> :
> >> >
> >> > Good Morning,
> >> >
> >> > today i updated my replica 3 setup (debian stretch) from version 5.5
> >> > to 5.6, as i thought the network traffic bug (#1673058) was fixed and
> >> > i could re-activate 'performance.quick-read' again. See release notes:
> >> >
> >> > https://review.gluster.org/#/c/glusterfs/+/22538/
> >> > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795
> >> >
> >> > Upgrade went fine, and then i was watching iowait and network traffic.
> >> > It seems that the network traffic went up after upgrade and
> >> > reactivation of performance.quick-read. Here are some graphs:
> >> >
> >> > network client1: https://abload.de/img/network-clientfwj1m.png
> >> > network client2: https://abload.de/img/network-client2trkow.png
> >> > network server: https://abload.de/img/network-serverv3jjr.png
> >> >
> >> > gluster volume info: https://pastebin.com/ZMuJYXRZ
> >> >
> >> > Just wondering if the network traffic bug really got fixed or if this
> >> > is a new problem. I'll wait a couple of minutes and then deactivate
> >> > performance.quick-read again, just to see if network traffic goes down
> >> > to normal levels.
> >> >
> >> >
> >> > Best regards,
> >> > Hubert
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed?

2019-04-16 Thread Hu Bert
In my first test on my testing setup the traffic was on a normal
level, so i thought i was "safe". But on my live system the network
traffic was a multiple of the traffic one would expect.
performance.quick-read was enabled in both, the only difference in the
volume options between live and testing are:

performance.read-ahead: testing on, live off
performance.io-cache: testing on, live off

I ran another test on my testing setup, deactivated both and copied 9
GB of data. Now the traffic went up as well, from before ~9-10 MBit/s
up to 100 MBit/s with both options off. Does performance.quick-read
require one of those options set to 'on'?

I'll start another test shortly, and activate one of those 2 options,
maybe there's a connection between those 3 options?
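
To compare the effective options between testing and live, something like
this (volume name only an example) shows the current values:

gluster volume get workdata all | grep -E 'quick-read|read-ahead|io-cache'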


Best Regards,
Hubert

Am Di., 16. Apr. 2019 um 08:57 Uhr schrieb Poornima Gurusiddaiah
:
>
> Thank you for reporting this. I had done testing on my local setup and the 
> issue was resolved even with quick-read enabled. Let me test it again.
>
> Regards,
> Poornima
>
> On Mon, Apr 15, 2019 at 12:25 PM Hu Bert  wrote:
>>
>> fyi: after setting performance.quick-read to off network traffic
>> dropped to normal levels, client load/iowait back to normal as well.
>>
>> client: https://abload.de/img/network-client-afterihjqi.png
>> server: https://abload.de/img/network-server-afterwdkrl.png
>>
>> Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert :
>> >
>> > Good Morning,
>> >
>> > today i updated my replica 3 setup (debian stretch) from version 5.5
>> > to 5.6, as i thought the network traffic bug (#1673058) was fixed and
>> > i could re-activate 'performance.quick-read' again. See release notes:
>> >
>> > https://review.gluster.org/#/c/glusterfs/+/22538/
>> > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795
>> >
>> > Upgrade went fine, and then i was watching iowait and network traffic.
>> > It seems that the network traffic went up after upgrade and
>> > reactivation of performance.quick-read. Here are some graphs:
>> >
>> > network client1: https://abload.de/img/network-clientfwj1m.png
>> > network client2: https://abload.de/img/network-client2trkow.png
>> > network server: https://abload.de/img/network-serverv3jjr.png
>> >
>> > gluster volume info: https://pastebin.com/ZMuJYXRZ
>> >
>> > Just wondering if the network traffic bug really got fixed or if this
>> > is a new problem. I'll wait a couple of minutes and then deactivate
>> > performance.quick-read again, just to see if network traffic goes down
>> > to normal levels.
>> >
>> >
>> > Best regards,
>> > Hubert
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed?

2019-04-15 Thread Hu Bert
fyi: after setting performance.quick-read to off network traffic
dropped to normal levels, client load/iowait back to normal as well.

client: https://abload.de/img/network-client-afterihjqi.png
server: https://abload.de/img/network-server-afterwdkrl.png

Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert :
>
> Good Morning,
>
> today i updated my replica 3 setup (debian stretch) from version 5.5
> to 5.6, as i thought the network traffic bug (#1673058) was fixed and
> i could re-activate 'performance.quick-read' again. See release notes:
>
> https://review.gluster.org/#/c/glusterfs/+/22538/
> http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795
>
> Upgrade went fine, and then i was watching iowait and network traffic.
> It seems that the network traffic went up after upgrade and
> reactivation of performance.quick-read. Here are some graphs:
>
> network client1: https://abload.de/img/network-clientfwj1m.png
> network client2: https://abload.de/img/network-client2trkow.png
> network server: https://abload.de/img/network-serverv3jjr.png
>
> gluster volume info: https://pastebin.com/ZMuJYXRZ
>
> Just wondering if the network traffic bug really got fixed or if this
> is a new problem. I'll wait a couple of minutes and then deactivate
> performance.quick-read again, just to see if network traffic goes down
> to normal levels.
>
>
> Best regards,
> Hubert
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed?

2019-04-15 Thread Hu Bert
Good Morning,

today i updated my replica 3 setup (debian stretch) from version 5.5
to 5.6, as i thought the network traffic bug (#1673058) was fixed and
i could re-activate 'performance.quick-read' again. See release notes:

https://review.gluster.org/#/c/glusterfs/+/22538/
http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795

Upgrade went fine, and then i was watching iowait and network traffic.
It seems that the network traffic went up after upgrade and
reactivation of performance.quick-read. Here are some graphs:

network client1: https://abload.de/img/network-clientfwj1m.png
network client2: https://abload.de/img/network-client2trkow.png
network server: https://abload.de/img/network-serverv3jjr.png

gluster volume info: https://pastebin.com/ZMuJYXRZ

Just wondering if the network traffic bug really got fixed or if this
is a new problem. I'll wait a couple of minutes and then deactivate
performance.quick-read again, just to see if network traffic goes down
to normal levels.


Best regards,
Hubert
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters

2019-04-01 Thread Hu Bert
Good morning,

it seems like setting performance.quick-read to off (context:
increased network traffic
https://bugzilla.redhat.com/show_bug.cgi?id=1673058) solved the main
problem. See those 2 munin graphs, especially network and iowait on
March 24th and 31st (high traffic days); param was set to off on March
26th.

network: https://abload.de/img/client-internal-netwoh3kh7.png
cpu: https://abload.de/img/client-cpu-iowaitatkfc.png

I'll keep watching this, but hopefully the problems have disappeared.
Awaiting glusterfs v5.6 with the bugfix; then, after re-enabling
quick-read, i'll check again.


Regards,
Hubert

Am Fr., 29. März 2019 um 07:47 Uhr schrieb Hu Bert :
>
> Hi Raghavendra,
>
> i'll try to gather the information you need, hopefully this weekend.
>
> One thing i've done this week: deactivate performance.quick-read
> (https://bugzilla.redhat.com/show_bug.cgi?id=1673058), which
> (according to munin) ended in a massive drop in network traffic and a
> slightly lower iowait. Maybe that has helped already. We'll see.
>
> performance.nl-cache is deactivated due to unreadable
> files/directories; we have a highly concurrent workload. There are
> some nginx backend webservers that check if a requested file exists in
> the glusterfs filesystem; i counted the log entries, this can be up to
> 5 million entries a day; about 2/3 of the files are found in the
> filesystem, they get delivered to the frontend; if not: the nginx's
> send the  request via round robin to 3 backend tomcats, and they have
> to check whether a directory exists or not (and then create it and the
> requested files). So it happens that tomcatA creates a directory and a
> file in it, and within (milli)seconds tomcatB+C create additional
> files in this dir.
>
> Deactivating nl-cache helped to solve this issue, after having
> conversation with Nithya and Ravishankar. Just wanted to explain that.
>
>
> Thx so far,
> Hubert
>
> Am Fr., 29. März 2019 um 06:29 Uhr schrieb Raghavendra Gowdappa
> :
> >
> > +Gluster-users
> >
> > Sorry about the delay. There is nothing suspicious about per thread CPU 
> > utilization of glusterfs process. However looking at the volume profile 
> > attached I see huge number of lookups. I think if we cutdown the number of 
> > lookups probably we'll see improvements in performance. I need following 
> > information:
> >
> > * dump of fuse traffic under heavy load (use --dump-fuse option while 
> > mounting)
> > * client volume profile for the duration of heavy load - 
> > https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/
> > * corresponding brick volume profile
> >
> > Basically I need to find out
> > * whether these lookups are on existing files or non-existent files
> > * whether they are on directories or files
> > * why/whether md-cache or kernel attribute cache or nl-cache will help to 
> > cut down lookups.
> >
> > regards,
> > Raghavendra
> >
> > On Mon, Mar 25, 2019 at 12:13 PM Hu Bert  wrote:
> >>
> >> Hi Raghavendra,
> >>
> >> sorry, this took a while. The last weeks the weather was bad -> less
> >> traffic, but this weekend there was a massive peak. I made 3 profiles
> >> with top, but at first look there's nothing special here.
> >>
> >> I also made a gluster profile (on one of the servers) at a later
> >> moment. Maybe that helps. I also added some munin graphics from 2 of
> >> the clients and 1 graphic of server network, just to show how massive
> >> the problem is.
> >>
> >> Just wondering if the high io wait is related to the high network
> >> traffic bug (https://bugzilla.redhat.com/show_bug.cgi?id=1673058); if
> >> so, i could deactivate performance.quick-read and check if there is
> >> less iowait. If that helps: wonderful - and yearningly awaiting
> >> updated packages (e.g. v5.6). If not: maybe we have to switch from our
> >> normal 10TB hdds (raid10) to SSDs if the problem is based on slow
> >> hardware in the use case of small files (images).
> >>
> >>
> >> Thx,
> >> Hubert
> >>
> >> Am Mo., 4. März 2019 um 16:59 Uhr schrieb Raghavendra Gowdappa
> >> :
> >> >
> >> > Were you seeing high Io-wait when you captured the top output? I guess 
> >> > not as you mentioned the load increases during weekend. Please note that 
> >> > this data has to be captured when you are experiencing problems.
> >> >
> >> > On Mon, Mar 4, 2019 at 8:02 PM Hu Bert  wrote:
> >> >>
> >> >> Hi,

Re: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters

2019-03-29 Thread Hu Bert
Hi Raghavendra,

i'll try to gather the information you need, hopefully this weekend.
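
Roughly what the capture could look like - volume name, mount point and file
paths are only examples:

# brick/volume profile for the duration of the heavy load
gluster volume profile workdata start
# ... wait for the heavy-load window ...
gluster volume profile workdata info > /tmp/workdata-profile.txt
gluster volume profile workdata stop

# fuse traffic dump: mount one client with the --dump-fuse option
glusterfs --volfile-server=gluster1 --volfile-id=/workdata --dump-fuse=/tmp/fuse.dump /mnt/test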

One thing i've done this week: deactivate performance.quick-read
(https://bugzilla.redhat.com/show_bug.cgi?id=1673058), which
(according to munin) ended in a massive drop in network traffic and a
slightly lower iowait. Maybe that has helped already. We'll see.

performance.nl-cache is deactivated due to unreadable
files/directories; we have a highly concurrent workload. There are
some nginx backend webservers that check if a requested file exists in
the glusterfs filesystem; i counted the log entries, this can be up to
5 million entries a day; about 2/3 of the files are found in the
filesystem, they get delivered to the frontend; if not, the nginx
servers send the request via round robin to 3 backend tomcats, and they have
to check whether a directory exists or not (and then create it and the
requested files). So it happens that tomcatA creates a directory and a
file in it, and within (milli)seconds tomcatB+C create additional
files in this dir.

Deactivating nl-cache helped to solve this issue, after a
conversation with Nithya and Ravishankar. Just wanted to explain that.


Thx so far,
Hubert

Am Fr., 29. März 2019 um 06:29 Uhr schrieb Raghavendra Gowdappa
:
>
> +Gluster-users
>
> Sorry about the delay. There is nothing suspicious about per thread CPU 
> utilization of glusterfs process. However looking at the volume profile 
> attached I see huge number of lookups. I think if we cutdown the number of 
> lookups probably we'll see improvements in performance. I need following 
> information:
>
> * dump of fuse traffic under heavy load (use --dump-fuse option while 
> mounting)
> * client volume profile for the duration of heavy load - 
> https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/
> * corresponding brick volume profile
>
> Basically I need to find out
> * whether these lookups are on existing files or non-existent files
> * whether they are on directories or files
> * why/whether md-cache or kernel attribute cache or nl-cache will help to cut 
> down lookups.
>
> regards,
> Raghavendra
>
> On Mon, Mar 25, 2019 at 12:13 PM Hu Bert  wrote:
>>
>> Hi Raghavendra,
>>
>> sorry, this took a while. The last weeks the weather was bad -> less
>> traffic, but this weekend there was a massive peak. I made 3 profiles
>> with top, but at first look there's nothing special here.
>>
>> I also made a gluster profile (on one of the servers) at a later
>> moment. Maybe that helps. I also added some munin graphics from 2 of
>> the clients and 1 graphic of server network, just to show how massive
>> the problem is.
>>
>> Just wondering if the high io wait is related to the high network
>> traffic bug (https://bugzilla.redhat.com/show_bug.cgi?id=1673058); if
>> so, i could deactivate performance.quick-read and check if there is
>> less iowait. If that helps: wonderful - and yearningly awaiting
>> updated packages (e.g. v5.6). If not: maybe we have to switch from our
>> normal 10TB hdds (raid10) to SSDs if the problem is based on slow
>> hardware in the use case of small files (images).
>>
>>
>> Thx,
>> Hubert
>>
>> Am Mo., 4. März 2019 um 16:59 Uhr schrieb Raghavendra Gowdappa
>> :
>> >
>> > Were you seeing high Io-wait when you captured the top output? I guess not 
>> > as you mentioned the load increases during weekend. Please note that this 
>> > data has to be captured when you are experiencing problems.
>> >
>> > On Mon, Mar 4, 2019 at 8:02 PM Hu Bert  wrote:
>> >>
>> >> Hi,
>> >> sending the link directly to  you and not the list, you can distribute
>> >> if necessary. the command ran for about half a minute. Is that enough?
>> >> More? Less?
>> >>
>> >> https://download.outdooractive.com/top.output.tar.gz
>> >>
>> >> Am Mo., 4. März 2019 um 15:21 Uhr schrieb Raghavendra Gowdappa
>> >> :
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Mar 4, 2019 at 7:47 PM Raghavendra Gowdappa 
>> >> >  wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Mon, Mar 4, 2019 at 4:26 PM Hu Bert  wrote:
>> >> >>>
>> >> >>> Hi Raghavendra,
>> >> >>>
>> >> >>> at the moment iowait and cpu consumption is quite low, the main
>> >> >>> problems appear during the weekend (high traffic, especially on
>> >> >>> sunday), so either we have to wait until next sunday or use a t

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-21 Thread Hu Bert
Good morning,

looks like on 2 clients there was an automatic cleanup:

[2019-03-21 05:04:52.857127] I [fuse-bridge.c:5144:fuse_thread_proc]
0-fuse: initating unmount of /data/repository/shared/public
[2019-03-21 05:04:52.857507] W [glusterfsd.c:1500:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4) [0x7fa062cf64a4]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+
0xfd) [0x56223e5b291d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54)
[0x56223e5b2774] ) 0-: received signum (15), shutting down
[2019-03-21 05:04:52.857532] I [fuse-bridge.c:5914:fini] 0-fuse:
Unmounting '/data/repository/shared/public'.
[2019-03-21 05:04:52.857547] I [fuse-bridge.c:5919:fini] 0-fuse:
Closing fuse connection to '/data/repository/shared/public'.

On the 3rd client i unmounted both volumes, killed the 4 processes and
mounted the volumes again. Now no more "dict is NULL" messages. Fine
:-)
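
For completeness, the manual cleanup on that client was essentially this -
mount points and volume names as in the process list quoted below, the PIDs
come from ps:

umount /data/repository/shared/public
umount /data/repository/shared/private
kill <pid>   # once per leftover glusterfs fuse process
mount -t glusterfs gluster1:/workdata /data/repository/shared/public
mount -t glusterfs gluster1:/persistent /data/repository/shared/private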

Best regards,
Hubert

Am Mi., 20. März 2019 um 09:39 Uhr schrieb Hu Bert :
>
> Hi,
>
> i updated our live systems (debian stretch) from 5.3 -> 5.5 this
> morning; update went fine so far :-)
>
> However, on 3 (of 9) clients, the log entries still appear. The
> upgrade steps for all clients were identical:
>
> - install 5.5 (via apt upgrade)
> - umount volumes
> - mount volumes
>
> Interestingly the log entries still refer to version 5.3:
>
> [2019-03-20 08:38:31.880132] W [dict.c:761:dict_ref]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/quick-read.so(+0x6df4)
> [0x7f35f214ddf4]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/io-cache.so(+0xa39d)
> [0x7f35f235f39d]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58)
> [0x7f35f9403a38] ) 11-dict: dict is NULL [Invalid argument]
>
> First i thought there could be old processes running/hanging on these
> 3 clients, but I see that there are 4 processes (for 2 volumes)
> running on all clients:
>
> root 11234  0.0  0.2 1858720 580964 ?  Ssl  Mar11   7:23
> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> --lru-limit=0 --process-name fuse --volfile-server=gluster1
> --volfile-id=/persistent /data/repository/shared/private
> root 11323  0.6  2.5 10061536 6788940 ?Ssl  Mar11  77:42
> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> --lru-limit=0 --process-name fuse --volfile-server=gluster1
> --volfile-id=/workdata /data/repository/shared/public
> root 11789  0.0  0.0 874116 11076 ?Ssl  07:32   0:00
> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> --process-name fuse --volfile-server=gluster1 --volfile-id=/persistent
> /data/repository/shared/private
> root 11881  0.0  0.0 874116 10992 ?Ssl  07:32   0:00
> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> --process-name fuse --volfile-server=gluster1 --volfile-id=/workdata
> /data/repository/shared/public
>
> The first 2 processes are for the "old" mount (with lru-limit=0), the
> last 2 processes are for the "new" mount. But only 3 clients still
> have these entries. Systems are running fine, no problems so far.
> Maybe wrong order of the update? If i look at
> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/ -
> then it would be better to: unmount - upgrade - mount?
>
>
> Best regards,
> Hubert
>
> Am Di., 19. März 2019 um 15:53 Uhr schrieb Artem Russakovskii
> :
> >
> > The flood is indeed fixed for us on 5.5. However, the crashes are not.
> >
> > Sincerely,
> > Artem
> >
> > --
> > Founder, Android Police, APK Mirror, Illogical Robot LLC
> > beerpla.net | +ArtemRussakovskii | @ArtemR
> >
> >
> > On Mon, Mar 18, 2019 at 5:41 AM Hu Bert  wrote:
> >>
> >> Hi Amar,
> >>
> >> if you refer to this bug:
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test
> >> setup i haven't seen those entries, while copying & deleting a few GBs
> >> of data. For a final statement we have to wait until i updated our
> >> live gluster servers - could take place on tuesday or wednesday.
> >>
> >> Maybe other users can do an update to 5.4 as well and report back here.
> >>
> >>
> >> Hubert
> >>
> >>
> >>
> >> Am Mo., 18. März 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan
> >> :
> >> >
> >> > Hi Hu Bert,
> >> >
> >> > Appreciate the feedback. Also are the other boiling issues related to 
> >> > logs fixed now?
> >> >
> >> > -Amar
> >> >
> >> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert  wrote:
> >> >>
> >&

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-20 Thread Hu Bert
Hi,

I updated our live systems (Debian stretch) from 5.3 -> 5.5 this
morning; the update went fine so far :-)

However, on 3 (of 9) clients, the log entries still appear. The
upgrade steps for all clients were identical:

- install 5.5 (via apt upgrade)
- umount volumes
- mount volumes

Interestingly the log entries still refer to version 5.3:

[2019-03-20 08:38:31.880132] W [dict.c:761:dict_ref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/quick-read.so(+0x6df4)
[0x7f35f214ddf4]
-->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/io-cache.so(+0xa39d)
[0x7f35f235f39d]
-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58)
[0x7f35f9403a38] ) 11-dict: dict is NULL [Invalid argument]

At first I thought there could be old processes running/hanging on these
3 clients, but I see that there are 4 glusterfs processes (for the 2
volumes) running on all clients:

root 11234  0.0  0.2 1858720 580964 ?  Ssl  Mar11   7:23
/usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
--lru-limit=0 --process-name fuse --volfile-server=gluster1
--volfile-id=/persistent /data/repository/shared/private
root 11323  0.6  2.5 10061536 6788940 ?Ssl  Mar11  77:42
/usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
--lru-limit=0 --process-name fuse --volfile-server=gluster1
--volfile-id=/workdata /data/repository/shared/public
root 11789  0.0  0.0 874116 11076 ?Ssl  07:32   0:00
/usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
--process-name fuse --volfile-server=gluster1 --volfile-id=/persistent
/data/repository/shared/private
root 11881  0.0  0.0 874116 10992 ?Ssl  07:32   0:00
/usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
--process-name fuse --volfile-server=gluster1 --volfile-id=/workdata
/data/repository/shared/public

The first 2 processes belong to the "old" mount (with lru-limit=0), the
last 2 to the "new" mount. But only 3 clients still show these log
entries. The systems are running fine, no problems so far. Maybe the
update order was wrong? Judging by
https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/ it
would be better to: unmount - upgrade - mount?
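
A minimal sketch of that order on one client, assuming the packages come
in via the gluster.org apt repo and using the volumes/mount points from
the ps output above:

umount /data/repository/shared/private
umount /data/repository/shared/public
apt update && apt upgrade   # brings the client to 5.5
mount -t glusterfs gluster1:/persistent /data/repository/shared/private
mount -t glusterfs gluster1:/workdata /data/repository/shared/public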


Best regards,
Hubert

Am Di., 19. März 2019 um 15:53 Uhr schrieb Artem Russakovskii
:
>
> The flood is indeed fixed for us on 5.5. However, the crashes are not.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police, APK Mirror, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii | @ArtemR
>
>
> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert  wrote:
>>
>> Hi Amar,
>>
>> if you refer to this bug:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test
>> setup i haven't seen those entries, while copying & deleting a few GBs
>> of data. For a final statement we have to wait until i updated our
>> live gluster servers - could take place on tuesday or wednesday.
>>
>> Maybe other users can do an update to 5.4 as well and report back here.
>>
>>
>> Hubert
>>
>>
>>
>> Am Mo., 18. März 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan
>> :
>> >
>> > Hi Hu Bert,
>> >
>> > Appreciate the feedback. Also are the other boiling issues related to logs 
>> > fixed now?
>> >
>> > -Amar
>> >
>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert  wrote:
>> >>
>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2
>> >> volumes done. In 'gluster peer status' the peers stay connected during
>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the
>> >> logs. Looks good :-)
>> >>
>> >> Am Mo., 18. März 2019 um 09:54 Uhr schrieb Hu Bert 
>> >> :
>> >> >
>> >> > Good morning :-)
>> >> >
>> >> > for debian the packages are there:
>> >> > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
>> >> >
>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there
>> >> > are some errors etc. and report back.
>> >> >
>> >> > btw: no release notes for 5.4 and 5.5 so far?
>> >> > https://docs.gluster.org/en/latest/release-notes/ ?
>> >> >
>> >> > Am Fr., 15. März 2019 um 14:28 Uhr schrieb Shyam Ranganathan
>> >> > :
>> >> > >
>> >> > > We created a 5.5 release tag, and it is under packaging now. It should
>> >> > > be packaged and ready for testing early next week and should be 
>> >> > > released
>> >> >

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-18 Thread Hu Bert
Hi Amar,

If you refer to this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1674225 - in the test
setup I haven't seen those entries while copying & deleting a few GBs
of data. For a final statement we have to wait until I have updated our
live gluster servers - that could take place on Tuesday or Wednesday.

Maybe other users can do an update to 5.4 as well and report back here.


Hubert



Am Mo., 18. März 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan
:
>
> Hi Hu Bert,
>
> Appreciate the feedback. Also are the other boiling issues related to logs 
> fixed now?
>
> -Amar
>
> On Mon, Mar 18, 2019 at 3:54 PM Hu Bert  wrote:
>>
>> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2
>> volumes done. In 'gluster peer status' the peers stay connected during
>> the upgrade, no 'peer rejected' messages. No cksum mismatches in the
>> logs. Looks good :-)
>>
>> Am Mo., 18. März 2019 um 09:54 Uhr schrieb Hu Bert :
>> >
>> > Good morning :-)
>> >
>> > for debian the packages are there:
>> > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
>> >
>> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there
>> > are some errors etc. and report back.
>> >
>> > btw: no release notes for 5.4 and 5.5 so far?
>> > https://docs.gluster.org/en/latest/release-notes/ ?
>> >
>> > Am Fr., 15. März 2019 um 14:28 Uhr schrieb Shyam Ranganathan
>> > :
>> > >
>> > > We created a 5.5 release tag, and it is under packaging now. It should
>> > > be packaged and ready for testing early next week and should be released
>> > > close to mid-week next week.
>> > >
>> > > Thanks,
>> > > Shyam
>> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote:
>> > > > Wednesday now with no update :-/
>> > > >
>> > > > Sincerely,
>> > > > Artem
>> > > >
>> > > > --
>> > > > Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>> > > > <http://www.apkmirror.com/>, Illogical Robot LLC
>> > > > beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
>> > > > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>> > > > <http://twitter.com/ArtemR>
>> > > >
>> > > >
>> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii 
>> > > > > > > > <mailto:archon...@gmail.com>> wrote:
>> > > >
>> > > > Hi Amar,
>> > > >
>> > > > Any updates on this? I'm still not seeing it in OpenSUSE build
>> > > > repos. Maybe later today?
>> > > >
>> > > > Thanks.
>> > > >
>> > > > Sincerely,
>> > > > Artem
>> > > >
>> > > > --
>> > > > Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>> > > > <http://www.apkmirror.com/>, Illogical Robot LLC
>> > > > beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
>> > > > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>> > > > <http://twitter.com/ArtemR>
>> > > >
>> > > >
>> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan
>> > > > mailto:atumb...@redhat.com>> wrote:
>> > > >
>> > > > We are talking days. Not weeks. Considering already it is
>> > > > Thursday here. 1 more day for tagging, and packaging. May be ok
>> > > > to expect it on Monday.
>> > > >
>> > > > -Amar
>> > > >
>> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii
>> > > > mailto:archon...@gmail.com>> wrote:
>> > > >
>> > > > Is the next release going to be an imminent hotfix, i.e.
>> > > > something like today/tomorrow, or are we talking weeks?
>> > > >
>> > > > Sincerely,
>> > > > Artem
>> > > >
>> > > > --
>> > > > Founder, Android Police <http://www.androidpolice.com>, APK
>> > > > Mirror <http://www.apkmirror.com/>, Illogical Robo

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-18 Thread Hu Bert
Update: the upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2
volumes is done. In 'gluster peer status' the peers stay connected
during the upgrade, no 'peer rejected' messages. No cksum mismatches in
the logs. Looks good :-)
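
Roughly the checks behind that statement - the volume name is just one
of ours, adjust to your setup:

gluster peer status                             # all peers: Peer in Cluster (Connected)
gluster volume status workdata                  # all bricks online
grep -i cksum /var/log/glusterfs/glusterd.log   # no "Version of Cksums ... differ" entries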

Am Mo., 18. März 2019 um 09:54 Uhr schrieb Hu Bert :
>
> Good morning :-)
>
> for debian the packages are there:
> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
>
> I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there
> are some errors etc. and report back.
>
> btw: no release notes for 5.4 and 5.5 so far?
> https://docs.gluster.org/en/latest/release-notes/ ?
>
> Am Fr., 15. März 2019 um 14:28 Uhr schrieb Shyam Ranganathan
> :
> >
> > We created a 5.5 release tag, and it is under packaging now. It should
> > be packaged and ready for testing early next week and should be released
> > close to mid-week next week.
> >
> > Thanks,
> > Shyam
> > On 3/13/19 12:34 PM, Artem Russakovskii wrote:
> > > Wednesday now with no update :-/
> > >
> > > Sincerely,
> > > Artem
> > >
> > > --
> > > Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> > > <http://www.apkmirror.com/>, Illogical Robot LLC
> > > beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
> > > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> > > <http://twitter.com/ArtemR>
> > >
> > >
> > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii  > > <mailto:archon...@gmail.com>> wrote:
> > >
> > > Hi Amar,
> > >
> > > Any updates on this? I'm still not seeing it in OpenSUSE build
> > > repos. Maybe later today?
> > >
> > > Thanks.
> > >
> > > Sincerely,
> > > Artem
> > >
> > > --
> > > Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> > > <http://www.apkmirror.com/>, Illogical Robot LLC
> > > beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
> > > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> > > <http://twitter.com/ArtemR>
> > >
> > >
> > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan
> > > mailto:atumb...@redhat.com>> wrote:
> > >
> > > We are talking days. Not weeks. Considering already it is
> > > Thursday here. 1 more day for tagging, and packaging. May be ok
> > > to expect it on Monday.
> > >
> > > -Amar
> > >
> > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii
> > > mailto:archon...@gmail.com>> wrote:
> > >
> > > Is the next release going to be an imminent hotfix, i.e.
> > > something like today/tomorrow, or are we talking weeks?
> > >
> > > Sincerely,
> > > Artem
> > >
> > > --
> > > Founder, Android Police <http://www.androidpolice.com>, APK
> > > Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
> > > beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
> > > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> > > <http://twitter.com/ArtemR>
> > >
> > >
> > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii
> > > mailto:archon...@gmail.com>> wrote:
> > >
> > > Ended up downgrading to 5.3 just in case. Peer status
> > > and volume status are OK now.
> > >
> > > zypper install --oldpackage glusterfs-5.3-lp150.100.1
> > > Loading repository data...
> > > Reading installed packages...
> > > Resolving package dependencies...
> > >
> > > Problem: glusterfs-5.3-lp150.100.1.x86_64 requires
> > > libgfapi0 = 5.3, but this requirement cannot be provided
> > >   not installable providers:
> > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs]
> > >  Solution 1: Following actions will be done:
> > >   downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to
> > > libgfapi0-5.3-lp150.100.1.x86_64
> > >   down

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-18 Thread Hu Bert
   Choose from above solutions by number or cancel
> > [1/2/3/c] (c): 1
> > Resolving dependencies...
> > Resolving package dependencies...
> >
> > The following 6 packages are going to be downgraded:
> >   glusterfs libgfapi0 libgfchangelog0 libgfrpc0
> > libgfxdr0 libglusterfs0
> >
> > 6 packages to downgrade.
> >
> > Sincerely,
> > Artem
> >
> > --
> > Founder, Android Police
> > <http://www.androidpolice.com>, APK Mirror
> > <http://www.apkmirror.com/>, Illogical Robot LLC
> > beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
> > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> > <http://twitter.com/ArtemR>
> >
> >
> > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii
> > mailto:archon...@gmail.com>> wrote:
> >
> > Noticed the same when upgrading from 5.3 to 5.4, as
> > mentioned.
> >
> > I'm confused though. Is actual replication affected,
> > because the 5.4 server and the 3x 5.3 servers still
> > show heal info as all 4 connected, and the files
> > seem to be replicating correctly as well.
> >
> > So what's actually affected - just the status
> > command, or leaving 5.4 on one of the nodes is doing
> > some damage to the underlying fs? Is it fixable by
> > tweaking transport.socket.ssl-enabled? Does
> > upgrading all servers to 5.4 resolve it, or should
> > we revert back to 5.3?
> >
> > Sincerely,
> > Artem
> >
> > --
> > Founder, Android Police
> > <http://www.androidpolice.com>, APK Mirror
> > <http://www.apkmirror.com/>, Illogical Robot LLC
> > beerpla.net <http://beerpla.net/> |
> > +ArtemRussakovskii
> > <https://plus.google.com/+ArtemRussakovskii>
> > | @ArtemR <http://twitter.com/ArtemR>
> >
> >
> > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert
> >  > <mailto:revi...@googlemail.com>> wrote:
> >
> > fyi: did a downgrade 5.4 -> 5.3 and it worked.
> > all replicas are up and
> > running. Awaiting updated v5.4.
> >
> > thx :-)
> >
> > Am Di., 5. März 2019 um 09:26 Uhr schrieb Hari
> > Gowtham  > <mailto:hgowt...@redhat.com>>:
> > >
> > > There are plans to revert the patch causing
> > this error and rebuilt 5.4.
> > > This should happen faster. the rebuilt 5.4
> > should be void of this upgrade issue.
> > >
> > > In the meantime, you can use 5.3 for this cluster.
> > > Downgrading to 5.3 will work if it was just
> > one node that was upgrade to 5.4
> > > and the other nodes are still in 5.3.
> > >
> > > On Tue, Mar 5, 2019 at 1:07 PM Hu Bert
> >  > <mailto:revi...@googlemail.com>> wrote:
> > > >
> >     > > Hi Hari,
> > > >
> > > > thx for the hint. Do you know when this will
> > be fixed? Is a downgrade
> > > > 5.4 -> 5.3 a possibility to fix this?
> > > >
> > > > Hubert
> > > >
> > > > Am Di., 5. März 2019 um 08:32 Uhr schrieb
> > Hari Gowtham  >  

Re: [Gluster-users] ganesha-gfapi

2019-03-14 Thread Hu Bert
Btw.: re-adding the list ;-)

There's another bug: https://bugzilla.redhat.com/show_bug.cgi?id=1671603

nfs-ganesha is not mentioned there - it might be good to add some
relevant log entries or stack traces to the bug report.

For me, setting 'performance.parallel-readdir off' solved the issue; a
developer told me that a fix will find its way into a 5.x update.
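
In case it helps: the workaround is a plain volume option, along these
lines (the volume name is just a placeholder):

gluster volume set <volname> performance.parallel-readdir off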



Am Mi., 13. März 2019 um 16:34 Uhr schrieb Valerio Luccio
:
>
> On 3/13/19 11:06 AM, Hu Bert wrote:
>
> > Hi Valerio,
> >
> > is an already known "behaviour" and maybe a bug:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1674225
> >
> >
> > Regards,
> > Hubert
>
> Thanks Hubert,
>
> my nfs-ganesha crashes on a regular basis and i wonder if the two are
> related.
>
> --
> Valerio Luccio (212) 998-8736
> Center for Brain Imaging   4 Washington Place, Room 157
> New York UniversityNew York, NY 10003
>
> "In an open world, who needs windows or gates ?"
>

Re: [Gluster-users] ganesha-gfapi

2019-03-13 Thread Hu Bert
Hi Valerio,

This is an already known "behaviour" and maybe a bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1674225


Regards,
Hubert

Am Mi., 13. März 2019 um 15:43 Uhr schrieb Valerio Luccio
:
>
> Hi all,
>
> I recently mounting my gluster from another server using NFS. I started 
> ganesha and my ganesha-gfapi.log file is filled with the following message:
>
>  W [dict.c:761:dict_ref] 
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7384) 
> [0x7f1c299b2384] 
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xae3e) 
> [0x7f1c29bc3e3e] -->/lib64/libglusterfs.so.0(dict_ref+0x5d) [0x7f1c379092ad] 
> ) 0-dict: dict is NULL [Invalid argument]
>
> Which sometimes is followed by:
>
> E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: 
> Failed to dispatch handler
>
> Has anyone seen this ? What can be done about it ?
>
> Thanks,
>
> --
> Valerio Luccio (212) 998-8736
> Center for Brain Imaging   4 Washington Place, Room 157
> New York UniversityNew York, NY 10003
>
> "In an open world, who needs windows or gates ?"
>

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-05 Thread Hu Bert
FYI: I did a downgrade 5.4 -> 5.3 and it worked. All replicas are up
and running. Awaiting the updated v5.4.
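
Roughly what the downgrade looked like on the Debian stretch node - the
exact version string depends on what the gluster.org repo carries, so
treat 5.3-1 as a placeholder:

apt install --allow-downgrades glusterfs-server=5.3-1 glusterfs-client=5.3-1 glusterfs-common=5.3-1
systemctl restart glusterd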

thx :-)

Am Di., 5. März 2019 um 09:26 Uhr schrieb Hari Gowtham :
>
> There are plans to revert the patch causing this error and rebuilt 5.4.
> This should happen faster. the rebuilt 5.4 should be void of this upgrade 
> issue.
>
> In the meantime, you can use 5.3 for this cluster.
> Downgrading to 5.3 will work if it was just one node that was upgrade to 5.4
> and the other nodes are still in 5.3.
>
> On Tue, Mar 5, 2019 at 1:07 PM Hu Bert  wrote:
> >
> > Hi Hari,
> >
> > thx for the hint. Do you know when this will be fixed? Is a downgrade
> > 5.4 -> 5.3 a possibility to fix this?
> >
> > Hubert
> >
> > Am Di., 5. März 2019 um 08:32 Uhr schrieb Hari Gowtham 
> > :
> > >
> > > Hi,
> > >
> > > This is a known issue we are working on.
> > > As the checksum differs between the updated and non updated node, the
> > > peers are getting rejected.
> > > The bricks aren't coming because of the same issue.
> > >
> > > More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120
> > >
> > > On Tue, Mar 5, 2019 at 12:56 PM Hu Bert  wrote:
> > > >
> > > > Interestingly: gluster volume status misses gluster1, while heal
> > > > statistics show gluster1:
> > > >
> > > > gluster volume status workdata
> > > > Status of volume: workdata
> > > > Gluster process TCP Port  RDMA Port  Online 
> > > >  Pid
> > > > --
> > > > Brick gluster2:/gluster/md4/workdata49153 0  Y  
> > > >  1723
> > > > Brick gluster3:/gluster/md4/workdata49153 0  Y  
> > > >  2068
> > > > Self-heal Daemon on localhost   N/A   N/AY  
> > > >  1732
> > > > Self-heal Daemon on gluster3N/A   N/AY  
> > > >  2077
> > > >
> > > > vs.
> > > >
> > > > gluster volume heal workdata statistics heal-count
> > > > Gathering count of entries to be healed on volume workdata has been 
> > > > successful
> > > >
> > > > Brick gluster1:/gluster/md4/workdata
> > > > Number of entries: 0
> > > >
> > > > Brick gluster2:/gluster/md4/workdata
> > > > Number of entries: 10745
> > > >
> > > > Brick gluster3:/gluster/md4/workdata
> > > > Number of entries: 10744
> > > >
> > > > Am Di., 5. März 2019 um 08:18 Uhr schrieb Hu Bert 
> > > > :
> > > > >
> > > > > Hi Miling,
> > > > >
> > > > > well, there are such entries, but those haven't been a problem during
> > > > > install and the last kernel update+reboot. The entries look like:
> > > > >
> > > > > PUBLIC_IP  gluster2.alpserver.de gluster2
> > > > >
> > > > > 192.168.0.50 gluster1
> > > > > 192.168.0.51 gluster2
> > > > > 192.168.0.52 gluster3
> > > > >
> > > > > 'ping gluster2' resolves to LAN IP; I removed the last entry in the
> > > > > 1st line, did a reboot ... no, didn't help. From
> > > > > /var/log/glusterfs/glusterd.log
> > > > >  on gluster 2:
> > > > >
> > > > > [2019-03-05 07:04:36.188128] E [MSGID: 106010]
> > > > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management:
> > > > > Version of Cksums persistent differ. local cksum = 3950307018, remote
> > > > > cksum = 455409345 on peer gluster1
> > > > > [2019-03-05 07:04:36.188314] I [MSGID: 106493]
> > > > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd:
> > > > > Responded to gluster1 (0), ret: 0, op_ret: -1
> > > > >
> > > > > Interestingly there are no entries in the brick logs of the rejected
> > > > > server. Well, not surprising as no brick process is running. The
> > > > > server gluster1 is still in rejected state.
> > > > >
> > > > > 'gluster volume start workdata force' starts the brick process on
> > > > > gluster1, and some heals are happening on gluster2+3, but via 'gluster
> > > > > volume s

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-04 Thread Hu Bert
Hi Hari,

Thx for the hint. Do you know when this will be fixed? Is a downgrade
5.4 -> 5.3 a possibility to fix this?

Hubert

Am Di., 5. März 2019 um 08:32 Uhr schrieb Hari Gowtham :
>
> Hi,
>
> This is a known issue we are working on.
> As the checksum differs between the updated and non updated node, the
> peers are getting rejected.
> The bricks aren't coming because of the same issue.
>
> More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120
>
> On Tue, Mar 5, 2019 at 12:56 PM Hu Bert  wrote:
> >
> > Interestingly: gluster volume status misses gluster1, while heal
> > statistics show gluster1:
> >
> > gluster volume status workdata
> > Status of volume: workdata
> > Gluster process TCP Port  RDMA Port  Online  Pid
> > --
> > Brick gluster2:/gluster/md4/workdata49153 0  Y   
> > 1723
> > Brick gluster3:/gluster/md4/workdata49153 0  Y   
> > 2068
> > Self-heal Daemon on localhost   N/A   N/AY   
> > 1732
> > Self-heal Daemon on gluster3N/A   N/AY   
> > 2077
> >
> > vs.
> >
> > gluster volume heal workdata statistics heal-count
> > Gathering count of entries to be healed on volume workdata has been 
> > successful
> >
> > Brick gluster1:/gluster/md4/workdata
> > Number of entries: 0
> >
> > Brick gluster2:/gluster/md4/workdata
> > Number of entries: 10745
> >
> > Brick gluster3:/gluster/md4/workdata
> > Number of entries: 10744
> >
> > Am Di., 5. März 2019 um 08:18 Uhr schrieb Hu Bert :
> > >
> > > Hi Miling,
> > >
> > > well, there are such entries, but those haven't been a problem during
> > > install and the last kernel update+reboot. The entries look like:
> > >
> > > PUBLIC_IP  gluster2.alpserver.de gluster2
> > >
> > > 192.168.0.50 gluster1
> > > 192.168.0.51 gluster2
> > > 192.168.0.52 gluster3
> > >
> > > 'ping gluster2' resolves to LAN IP; I removed the last entry in the
> > > 1st line, did a reboot ... no, didn't help. From
> > > /var/log/glusterfs/glusterd.log
> > >  on gluster 2:
> > >
> > > [2019-03-05 07:04:36.188128] E [MSGID: 106010]
> > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management:
> > > Version of Cksums persistent differ. local cksum = 3950307018, remote
> > > cksum = 455409345 on peer gluster1
> > > [2019-03-05 07:04:36.188314] I [MSGID: 106493]
> > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd:
> > > Responded to gluster1 (0), ret: 0, op_ret: -1
> > >
> > > Interestingly there are no entries in the brick logs of the rejected
> > > server. Well, not surprising as no brick process is running. The
> > > server gluster1 is still in rejected state.
> > >
> > > 'gluster volume start workdata force' starts the brick process on
> > > gluster1, and some heals are happening on gluster2+3, but via 'gluster
> > > volume status workdata' the volumes still aren't complete.
> > >
> > > gluster1:
> > > --
> > > Brick gluster1:/gluster/md4/workdata49152 0  Y   
> > > 2523
> > > Self-heal Daemon on localhost   N/A   N/AY   
> > > 2549
> > >
> > > gluster2:
> > > Gluster process TCP Port  RDMA Port  Online  
> > > Pid
> > > --
> > > Brick gluster2:/gluster/md4/workdata49153 0  Y   
> > > 1723
> > > Brick gluster3:/gluster/md4/workdata49153 0  Y   
> > > 2068
> > > Self-heal Daemon on localhost   N/A   N/AY   
> > > 1732
> > > Self-heal Daemon on gluster3N/A   N/AY   
> > > 2077
> > >
> > >
> > > Hubert
> > >
> > > Am Di., 5. März 2019 um 07:58 Uhr schrieb Milind Changire 
> > > :
> > > >
> > > > There are probably DNS entries or /etc/hosts entries with the public IP 
> > > > Addresses that the host names (gluster1, gluster2, gluster3) are 
> 

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-04 Thread Hu Bert
Interestingly, 'gluster volume status' misses gluster1, while the heal
statistics do show gluster1:

gluster volume status workdata
Status of volume: workdata
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick gluster2:/gluster/md4/workdata49153 0  Y   1723
Brick gluster3:/gluster/md4/workdata49153 0  Y   2068
Self-heal Daemon on localhost   N/A   N/AY   1732
Self-heal Daemon on gluster3N/A   N/AY   2077

vs.

gluster volume heal workdata statistics heal-count
Gathering count of entries to be healed on volume workdata has been successful

Brick gluster1:/gluster/md4/workdata
Number of entries: 0

Brick gluster2:/gluster/md4/workdata
Number of entries: 10745

Brick gluster3:/gluster/md4/workdata
Number of entries: 10744

Am Di., 5. März 2019 um 08:18 Uhr schrieb Hu Bert :
>
> Hi Miling,
>
> well, there are such entries, but those haven't been a problem during
> install and the last kernel update+reboot. The entries look like:
>
> PUBLIC_IP  gluster2.alpserver.de gluster2
>
> 192.168.0.50 gluster1
> 192.168.0.51 gluster2
> 192.168.0.52 gluster3
>
> 'ping gluster2' resolves to LAN IP; I removed the last entry in the
> 1st line, did a reboot ... no, didn't help. From
> /var/log/glusterfs/glusterd.log
>  on gluster 2:
>
> [2019-03-05 07:04:36.188128] E [MSGID: 106010]
> [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management:
> Version of Cksums persistent differ. local cksum = 3950307018, remote
> cksum = 455409345 on peer gluster1
> [2019-03-05 07:04:36.188314] I [MSGID: 106493]
> [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to gluster1 (0), ret: 0, op_ret: -1
>
> Interestingly there are no entries in the brick logs of the rejected
> server. Well, not surprising as no brick process is running. The
> server gluster1 is still in rejected state.
>
> 'gluster volume start workdata force' starts the brick process on
> gluster1, and some heals are happening on gluster2+3, but via 'gluster
> volume status workdata' the volumes still aren't complete.
>
> gluster1:
> --
> Brick gluster1:/gluster/md4/workdata49152 0  Y   2523
> Self-heal Daemon on localhost   N/A   N/AY   2549
>
> gluster2:
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick gluster2:/gluster/md4/workdata49153 0  Y   1723
> Brick gluster3:/gluster/md4/workdata49153 0  Y   2068
> Self-heal Daemon on localhost   N/A   N/AY   1732
> Self-heal Daemon on gluster3N/A   N/AY   2077
>
>
> Hubert
>
> Am Di., 5. März 2019 um 07:58 Uhr schrieb Milind Changire 
> :
> >
> > There are probably DNS entries or /etc/hosts entries with the public IP 
> > Addresses that the host names (gluster1, gluster2, gluster3) are getting 
> > resolved to.
> > /etc/resolv.conf would tell which is the default domain searched for the 
> > node names and the DNS servers which respond to the queries.
> >
> >
> > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert  wrote:
> >>
> >> Good morning,
> >>
> >> i have a replicate 3 setup with 2 volumes, running on version 5.3 on
> >> debian stretch. This morning i upgraded one server to version 5.4 and
> >> rebooted the machine; after the restart i noticed that:
> >>
> >> - no brick process is running
> >> - gluster volume status only shows the server itself:
> >> gluster volume status workdata
> >> Status of volume: workdata
> >> Gluster process TCP Port  RDMA Port  Online  
> >> Pid
> >> --
> >> Brick gluster1:/gluster/md4/workdataN/A   N/AN   
> >> N/A
> >> NFS Server on localhost N/A   N/AN   
> >> N/A
> >>
> >> - gluster peer status on the server
> >> gluster peer status
> >> Number of Peers: 2
> >>
> >> Hostname: gluster3
> >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> >> State: Peer Rejected (Connected)
> >>
> >> Hostname: gluster2
> >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
> >> State: Peer Rejected (Connected)

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-04 Thread Hu Bert
Hi Milind,

Well, there are such entries, but those haven't been a problem during
the install and the last kernel update+reboot. The entries look like:
PUBLIC_IP  gluster2.alpserver.de gluster2

192.168.0.50 gluster1
192.168.0.51 gluster2
192.168.0.52 gluster3
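
Just to double-check how a peer name actually resolves on a node,
something along these lines:

getent hosts gluster2   # shows which entry wins in the hosts lookup
ping -c1 gluster2       # shows the IP actually used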

'ping gluster2' resolves to the LAN IP; I removed the last entry in the
1st line and did a reboot ... no, that didn't help. From
/var/log/glusterfs/glusterd.log on gluster2:

[2019-03-05 07:04:36.188128] E [MSGID: 106010]
[glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management:
Version of Cksums persistent differ. local cksum = 3950307018, remote
cksum = 455409345 on peer gluster1
[2019-03-05 07:04:36.188314] I [MSGID: 106493]
[glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to gluster1 (0), ret: 0, op_ret: -1

Interestingly, there are no entries in the brick logs of the rejected
server. Well, not surprising, as no brick process is running. The
server gluster1 is still in the rejected state.

'gluster volume start workdata force' starts the brick process on
gluster1, and some heals are happening on gluster2+3, but according to
'gluster volume status workdata' the volumes still aren't complete.

gluster1:
--
Brick gluster1:/gluster/md4/workdata49152 0  Y   2523
Self-heal Daemon on localhost   N/A   N/AY   2549

gluster2:
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick gluster2:/gluster/md4/workdata49153 0  Y   1723
Brick gluster3:/gluster/md4/workdata49153 0  Y   2068
Self-heal Daemon on localhost   N/A   N/AY   1732
Self-heal Daemon on gluster3N/A   N/AY   2077


Hubert

Am Di., 5. März 2019 um 07:58 Uhr schrieb Milind Changire :
>
> There are probably DNS entries or /etc/hosts entries with the public IP 
> Addresses that the host names (gluster1, gluster2, gluster3) are getting 
> resolved to.
> /etc/resolv.conf would tell which is the default domain searched for the node 
> names and the DNS servers which respond to the queries.
>
>
> On Tue, Mar 5, 2019 at 12:14 PM Hu Bert  wrote:
>>
>> Good morning,
>>
>> i have a replicate 3 setup with 2 volumes, running on version 5.3 on
>> debian stretch. This morning i upgraded one server to version 5.4 and
>> rebooted the machine; after the restart i noticed that:
>>
>> - no brick process is running
>> - gluster volume status only shows the server itself:
>> gluster volume status workdata
>> Status of volume: workdata
>> Gluster process TCP Port  RDMA Port  Online  Pid
>> --
>> Brick gluster1:/gluster/md4/workdataN/A   N/AN   N/A
>> NFS Server on localhost N/A   N/AN   N/A
>>
>> - gluster peer status on the server
>> gluster peer status
>> Number of Peers: 2
>>
>> Hostname: gluster3
>> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
>> State: Peer Rejected (Connected)
>>
>> Hostname: gluster2
>> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
>> State: Peer Rejected (Connected)
>>
>> - gluster peer status on the other 2 servers:
>> gluster peer status
>> Number of Peers: 2
>>
>> Hostname: gluster1
>> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef
>> State: Peer Rejected (Connected)
>>
>> Hostname: gluster3
>> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
>> State: Peer in Cluster (Connected)
>>
>> I noticed that, in the brick logs, i see that the public IP is used
>> instead of the LAN IP. brick logs from one of the volumes:
>>
>> rejected node: https://pastebin.com/qkpj10Sd
>> connected nodes: https://pastebin.com/8SxVVYFV
>>
>> Why is the public IP suddenly used instead of the LAN IP? Killing all
>> gluster processes and rebooting (again) didn't help.
>>
>>
>> Thx,
>> Hubert
>
>
>
> --
> Milind
>

[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-04 Thread Hu Bert
Good morning,

I have a replicate 3 setup with 2 volumes, running on version 5.3 on
Debian stretch. This morning I upgraded one server to version 5.4 and
rebooted the machine; after the restart I noticed that:

- no brick process is running
- gluster volume status only shows the server itself:
gluster volume status workdata
Status of volume: workdata
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick gluster1:/gluster/md4/workdataN/A   N/AN   N/A
NFS Server on localhost N/A   N/AN   N/A

- gluster peer status on the server
gluster peer status
Number of Peers: 2

Hostname: gluster3
Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
State: Peer Rejected (Connected)

Hostname: gluster2
Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
State: Peer Rejected (Connected)

- gluster peer status on the other 2 servers:
gluster peer status
Number of Peers: 2

Hostname: gluster1
Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef
State: Peer Rejected (Connected)

Hostname: gluster3
Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
State: Peer in Cluster (Connected)

I noticed that in the brick logs the public IP is used instead of the
LAN IP. Brick logs from one of the volumes:

rejected node: https://pastebin.com/qkpj10Sd
connected nodes: https://pastebin.com/8SxVVYFV

Why is the public IP suddenly used instead of the LAN IP? Killing all
gluster processes and rebooting (again) didn't help.
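
One generic way to see which addresses the gluster processes are
actually using (nothing gluster-specific assumed beyond grepping for
the process name):

ss -tlnp | grep gluster   # addresses the daemons listen on
ss -tnp  | grep gluster   # established connections and their peer IPs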


Thx,
Hubert


Re: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters

2019-03-04 Thread Hu Bert
Do you mean "gluster volume heal $volname statistics heal-count"? If
yes: 0 for both volumes.

Am Mo., 4. März 2019 um 16:08 Uhr schrieb Amar Tumballi Suryanarayan
:
>
> What does self-heal pending numbers show?
>
> On Mon, Mar 4, 2019 at 7:52 PM Hu Bert  wrote:
>>
>> Hi Alberto,
>>
>> wow, good hint! We switched from old servers with version 4.1.6 to new
>> servers (fresh install) with version 5.3 on february 5th. I saw that
>> there was more network traffic on server side, but didn't watch it on
>> client side - the traffic went up significantly on both sides, from
>> about 20-40 MBit/s up to 200 MBit/s, on server side from about 20-40
>> MBit/s up to 500 MBit/s. Here's a screenshot of the munin graphs:
>>
>> network traffic on high iowait client:
>> https://abload.de/img/client-eth1-traffic76j4i.jpg
>> network traffic on old servers: 
>> https://abload.de/img/oldservers-eth1nejzt.jpg
>> network traffic on new servers: 
>> https://abload.de/img/newservers-eth17ojkf.jpg
>>
>> Don't know if that's related to our iowait problem, maybe only a
>> correlation. But we see the same high network traffic with v5.3.
>>
>>
>> Thx,
>> Hubert
>>
>> Am Mo., 4. März 2019 um 14:57 Uhr schrieb Alberto Bengoa
>> :
>> >
>> > Hello Hubert,
>> >
>> > On Mon, 4 Mar 2019 at 10:56, Hu Bert  wrote:
>> >>
>> >> Hi Raghavendra,
>> >>
>> >> at the moment iowait and cpu consumption is quite low, the main
>> >> problems appear during the weekend (high traffic, especially on
>> >> sunday), so either we have to wait until next sunday or use a time
>> >> machine ;-)
>> >>
>> >
>> > Check if your high IO Wait is not related to high network traffic. We had 
>> > to left 5.3 version due this issue[1]:
>> >
>> > [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1673058
>> >
>> > Cheers,
>> > Alberto
>
>
>
> --
> Amar Tumballi (amarts)

Re: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters

2019-03-04 Thread Hu Bert
Hi Alberto,

Wow, good hint! We switched from old servers with version 4.1.6 to new
servers (fresh install) with version 5.3 on February 5th. I saw that
there was more network traffic on the server side, but didn't watch it
on the client side - the traffic went up significantly on both sides:
on the client side from about 20-40 MBit/s up to 200 MBit/s, on the
server side from about 20-40 MBit/s up to 500 MBit/s. Here are
screenshots of the munin graphs:

network traffic on high iowait client:
https://abload.de/img/client-eth1-traffic76j4i.jpg
network traffic on old servers: https://abload.de/img/oldservers-eth1nejzt.jpg
network traffic on new servers: https://abload.de/img/newservers-eth17ojkf.jpg

I don't know if that's related to our iowait problem; maybe it's only
a correlation. But we see the same high network traffic with v5.3.


Thx,
Hubert

Am Mo., 4. März 2019 um 14:57 Uhr schrieb Alberto Bengoa
:
>
> Hello Hubert,
>
> On Mon, 4 Mar 2019 at 10:56, Hu Bert  wrote:
>>
>> Hi Raghavendra,
>>
>> at the moment iowait and cpu consumption is quite low, the main
>> problems appear during the weekend (high traffic, especially on
>> sunday), so either we have to wait until next sunday or use a time
>> machine ;-)
>>
>
> Check if your high IO Wait is not related to high network traffic. We had to 
> left 5.3 version due this issue[1]:
>
> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1673058
>
> Cheers,
> Alberto
