[Gluster-users] remote operation failed [Transport endpoint is not connected]
Hello, I am stuck with a failure of my gluster 2x replica heal, with messages in glustershd.log such as:

[2018-11-21 05:28:07.813003] E [MSGID: 114031] [client-rpc-fops.c:1646:client3_3_entrylk_cbk] 0-gv1-client-0: remote operation failed [Transport endpoint is not connected]

When the log hits this point on either of the replica nodes, my command "watch gluster volume heal statistics" shows no further progress and the status stays unchanged afterward. I am running glusterfs on top of ZFS, basically as storage for small read-only files. There was a thread here with Shyam Ranganathan and Reiner Keller where the core of the problem was the storage running out of inodes and hitting "no space left" errors, which obviously cannot be my case since I am on top of ZFS. The similarity between us, however, is that we were also previously on 3.10 and, after various issues with that version, upgraded to 3.12 on Ubuntu 16.04 with kernel 4.4.0-116-generic. Has anybody faced the issue above? Can you advise what can be done? It has been over a month with no self-heal process completing effectively...
Here is my gluster cluster info:

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
cluster.self-heal-daemon: enable
cluster.eager-lock: off
client.event-threads: 4
performance.cache-max-file-size: 8
features.scrub: Inactive
features.bitrot: off
network.inode-lru-limit: 5
nfs.disable: true
performance.readdir-ahead: on
server.statedump-path: /tmp
cluster.background-self-heal-count: 32
performance.md-cache-timeout: 30
cluster.readdir-optimize: on
cluster.shd-max-threads: 4
cluster.lookup-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
server.event-threads: 4

Thank you

--
Hamid Safe
www.devopt.net
+989361491768

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
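For anyone hitting the same wall: a quick way to spot which brick the self-heal daemon has lost contact with is to scan the output of "gluster volume heal gv1 info" for disconnected bricks. A minimal sketch, run here against a captured sample file rather than the live command; the output layout and the sample entries are assumptions based on gluster 3.x:

```shell
# Hypothetical sample of `gluster volume heal gv1 info` output, captured
# to a file so the parsing can be demonstrated offline. In production:
#   gluster volume heal gv1 info > /tmp/heal-info.sample
cat > /tmp/heal-info.sample <<'EOF'
Brick IMG-01:/images/storage/brick1
<gfid:0403852c-0000-0000-0000-000000000001>
Status: Connected
Number of entries: 1

Brick IMG-02:/images/storage/brick1
Status: Transport endpoint is not connected
Number of entries: -
EOF

# Print each brick with its pending-entry count, flagging disconnected ones.
awk '/^Brick /{brick=$2}
     /Transport endpoint is not connected/{print brick " DISCONNECTED"}
     /^Number of entries:/{print brick " pending=" $NF}' /tmp/heal-info.sample
```

If a brick shows up as disconnected here while "gluster volume status" says it is online, the shd's own client connection to that brick is the thing to investigate (glustershd.log on both nodes).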
Re: [Gluster-users] self-heal operation takes infinitely long to complete
Hello all, can somebody please respond to this? As of now, if I run "gluster volume heal gv1 info", it prints a seemingly endless list of gfid lines that never finishes... Usually, in a stable scenario, this ended with entry counts and a status, but currently it never completes. Is this a bad sign? Is it a loop? Are there any actions required beyond gluster itself? I appreciate any help...

On 10/21/18 8:05 AM, hsafe wrote:

Hello all in the gluster community, I am in a scenario unmatched in my past year of using glusterfs: a 2-replica set on glusterfs 3.10.12 servers, serving as the storage backend of my application, which saves small images into them. The problem I now face, for the first time, is this: previously, whenever the replicas went out of sync or one server went down, bringing it back up would start the self-heal and eventually we could see the clustered volume back in sync. But now, if I run the volume heal info command, the list of gfids does not finish even after a couple of hours. If I look at the heal log I can see that the process is ongoing, but at a very small scale and speed! My question is: when can I expect it to finish, and how can I speed it up?
Here is a bit of info:

Status of volume: gv1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick IMG-01:/images/storage/brick1         49152     0          Y       4176
Brick IMG-02:/images/storage/brick1         49152     0          Y       4095
Self-heal Daemon on localhost               N/A       N/A        Y       4067
Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146

Task Status of Volume gv1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: gv2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick IMG-01:/data/brick2                   49153     0          Y       4185
Brick IMG-02:/data/brick2                   49153     0          Y       4104
NFS Server on localhost                     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       4067
NFS Server on IMG-01                        N/A       N/A        N       N/A
Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146

Task Status of Volume gv2
------------------------------------------------------------------------------
There are no active volume tasks

gluster> peer status
Number of Peers: 1

Hostname: IMG-01
Uuid: 5faf60fc-7f5c-4c6e-aa3f-802482391c1b
State: Peer in Cluster (Connected)
gluster> exit

root@NAS02:/var/log/glusterfs# gluster volume gv1 info
unrecognized word: gv1 (position 1)
root@NAS02:/var/log/glusterfs# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
server.event-threads: 4
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.lookup-optimize: on
cluster.shd-max-threads: 4
cluster.readdir-optimize: on
performance.md-cache-timeout: 30
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 5
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on
cluster.self-heal-daemon: enable

Please do help me out... Thanks
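One way to tell whether a never-ending "heal info" listing is a loop or just slow progress is to count the pending gfid lines at intervals and compare the counts. A rough sketch; the real source would be the live gluster command, which is stubbed here with a sample file so the counting logic can be shown on its own:

```shell
# In production you would pipe the real command:
#   gluster volume heal gv1 info | grep -c '^<gfid:'
# Here a small captured sample stands in for that output.
cat > /tmp/heal-info.t0 <<'EOF'
Brick IMG-01:/images/storage/brick1
<gfid:11111111-1111-1111-1111-111111111111>
<gfid:22222222-2222-2222-2222-222222222222>
Number of entries: 2
EOF

count=$(grep -c '^<gfid:' /tmp/heal-info.t0)
echo "pending entries: $count"

# Take a second snapshot some minutes later and compare the two counts:
# if the number shrinks, heal is progressing; if it stays flat (or grows
# while clients keep writing), the shd is likely stuck or disconnected.
```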
[Gluster-users] self-heal operation takes infinitely long to complete
Hello all in the gluster community, I am in a scenario unmatched in my past year of using glusterfs: a 2-replica set on glusterfs 3.10.12 servers, serving as the storage backend of my application, which saves small images into them. The problem I now face, for the first time, is this: previously, whenever the replicas went out of sync or one server went down, bringing it back up would start the self-heal and eventually we could see the clustered volume back in sync. But now, if I run the volume heal info command, the list of gfids does not finish even after a couple of hours. If I look at the heal log I can see that the process is ongoing, but at a very small scale and speed! My question is: when can I expect it to finish, and how can I speed it up?

Here is a bit of info:

Status of volume: gv1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick IMG-01:/images/storage/brick1         49152     0          Y       4176
Brick IMG-02:/images/storage/brick1         49152     0          Y       4095
Self-heal Daemon on localhost               N/A       N/A        Y       4067
Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146

Task Status of Volume gv1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: gv2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick IMG-01:/data/brick2                   49153     0          Y       4185
Brick IMG-02:/data/brick2                   49153     0          Y       4104
NFS Server on localhost                     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       4067
NFS Server on IMG-01                        N/A       N/A        N       N/A
Self-heal Daemon on IMG-01                  N/A       N/A        Y       4146

Task Status of Volume gv2
------------------------------------------------------------------------------
There are no active volume tasks

gluster> peer status
Number of Peers: 1

Hostname: IMG-01
Uuid: 5faf60fc-7f5c-4c6e-aa3f-802482391c1b
State: Peer in Cluster (Connected)
gluster> exit

root@NAS02:/var/log/glusterfs# gluster volume gv1 info
unrecognized word: gv1 (position 1)
root@NAS02:/var/log/glusterfs# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
server.event-threads: 4
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.lookup-optimize: on
cluster.shd-max-threads: 4
cluster.readdir-optimize: on
performance.md-cache-timeout: 30
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 5
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on
cluster.self-heal-daemon: enable

Please do help me out... Thanks
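Regarding speeding the heal up: the knobs usually mentioned for AFR volumes are below. This is only a sketch; the option names exist in gluster 3.10+, but the specific values are assumptions to adapt to your hardware, not tested recommendations.

```shell
# Let the self-heal daemon run more parallel heal jobs per replica:
gluster volume set gv1 cluster.shd-max-threads 8

# Allow more files to be healed in the background on client access:
gluster volume set gv1 cluster.background-self-heal-count 64

# For mostly write-once image files, "full" copies whole files instead
# of computing rolling checksums; for many small files this is often
# cheaper than the default "diff" algorithm:
gluster volume set gv1 cluster.data-self-heal-algorithm full
```

Raising shd-max-threads increases disk and network load during heal, so watch the servers while it runs, given that previous heals made them unresponsive.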
[Gluster-users] issue with self-heal
Hello Gluster community, after several hundred GB of data writes (small images, 100 KB to 1 MB) into a 2x replicated glusterfs setup, I am facing an issue with the healing process. Earlier, heal info returned the bricks and nodes and reported no failed heals; but now it gets into a state with the message below:

# gluster volume heal gv1 info healed
Gathering list of heal failed entries on volume gv1 has been unsuccessful on bricks that are down. Please check if all brick processes are running.

Issuing the heal info command gives a long list of gfid entries that takes about an hour to complete. The file data, being images, does not change and is primarily served from an 8x server mount over native glusterfs. Here is some insight into the status of the cluster. How can I effectively run a successful heal on the storage? The last times I tried, it sent the servers south and left them unresponsive.

# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
performance.md-cache-timeout: 128
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 5
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on

Appreciate your help. Thanks
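When the full "heal info" listing takes an hour, "gluster volume heal gv1 statistics heal-count" returns only the per-brick counts and is much cheaper to poll. A sketch of summing those counts, run here against a captured sample file; the exact output layout is an assumption based on gluster 3.x:

```shell
# In production:
#   gluster volume heal gv1 statistics heal-count > /tmp/heal-count.out
# A hypothetical captured sample stands in here:
cat > /tmp/heal-count.out <<'EOF'
Gathering count of entries to be healed on volume gv1 has been successful

Brick IMG-01:/images/storage/brick1
Number of entries: 4

Brick IMG-02:/images/storage/brick1
Number of entries: 7
EOF

# Sum the pending-heal counts across all bricks:
awk '/^Number of entries:/{total+=$NF} END{print "total pending:", total}' /tmp/heal-count.out
```

Polling this total every few minutes shows whether heal is actually draining the queue, without the cost of enumerating every gfid.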
[Gluster-users] deletion of files in gluster directories
Hi all, I have a rather simple question. There are directories containing a lot of small files in a 2x replica set, accessed natively by the clients. Because of the number of files per directory, listing the directory contents from the clients fails. If we move or delete those directories natively, from the servers' view of the bricks, how does glusterfs converge, or "heal" if you can call it that, the directories as emptied or moved? I am running Glusterfs-server and Glusterfs-client version 3.10.12. To add more detail: we learned the hard way that our app is shipping too many small files into directories with daily accumulation, accessed for serving by an nginx. Here is a little more info:

# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
nfs.disable: true
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
server.statedump-path: /tmp
performance.readdir-ahead: on

# gluster volume status
Status of volume: gv1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick IMG-01:/images/storage/brick1         49152     0          Y       3577
Brick IMG-02:/images/storage/brick1         49152     0          Y       21699
Self-heal Daemon on localhost               N/A       N/A        Y       24813
Self-heal Daemon on IMG-01                  N/A       N/A        Y       3560

Task Status of Volume gv1
------------------------------------------------------------------------------
There are no active volume tasks
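Not an authoritative answer, but the usual guidance: never delete files directly on the brick paths; deletions must go through a glusterfs mount so both replicas and the gfid metadata stay consistent (brick-side deletions leave stale entries for the heal to chew on). When listing a huge directory fails, a streaming delete through the mount often still works because it never buffers the whole listing. A sketch, using a throwaway local directory here as a stand-in for the real mount point, which is an assumption:

```shell
# Stand-in for a glusterfs mount path such as /mnt/gv1/images/2018-10-21.
dir=$(mktemp -d)
for i in 1 2 3; do touch "$dir/img-$i.jpg"; done

# Stream-delete the files without building a full directory listing first;
# on a real volume, point this at the FUSE mount, never at the brick path.
find "$dir" -mindepth 1 -type f -delete

ls -A "$dir"   # directory is now empty
rmdir "$dir"
```

Deleting file-by-file like this is slower than an rm -rf of the brick, but it is the path that keeps the replicas converged without triggering heals.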