[Gluster-users] remote operation failed [Transport endpoint is not connected]

2018-11-20 Thread hsafe

Hello,

I am stuck with a failing heal on my gluster 2x replica, with messages 
in glustershd.log such as:


[2018-11-21 05:28:07.813003] E [MSGID: 114031] [client-rpc-fops.c:1646:client3_3_entrylk_cbk] 0-gv1-client-0: remote operation failed [Transport endpoint is not connected]
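
As a first sanity check (and assuming I am reading the CLI right), I 
look at the brick processes and at the clients connected to each brick 
with:

# gluster volume status gv1
# gluster volume status gv1 clients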


When the log hits this point on either of the replica nodes, I can see 
that my command "watch gluster volume heal gv1 statistics" reports no 
further progress, and the status stays unchanged afterward. I am running 
glusterfs on top of ZFS, basically as storage for small read-only files. 
There was a thread here with Shyam Ranganathan and Reiner Keller where 
the core of the problem was the storage running out of inodes and 
hitting "no space left" errors, which obviously cannot be my case, since 
ZFS allocates inodes dynamically. However, the similarity between us is 
that we were previously on 3.10 and, after various issues with that 
version, upgraded to 3.12 on Ubuntu 16.04 with kernel 4.4.0-116-generic.
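
One thing I have considered, assuming it is safe to do here, is forcing 
the volume processes to respawn; as far as I understand, "start ... 
force" restarts any missing brick or self-heal daemon processes without 
touching the ones already running:

# gluster volume start gv1 force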


Has anybody faced the issue above? Can you advise what can be done? It 
has been over a month now with no effective self-heal process 
completing...


Here is my gluster cluster info:

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
cluster.self-heal-daemon: enable
cluster.eager-lock: off
client.event-threads: 4
performance.cache-max-file-size: 8
features.scrub: Inactive
features.bitrot: off
network.inode-lru-limit: 5
nfs.disable: true
performance.readdir-ahead: on
server.statedump-path: /tmp
cluster.background-self-heal-count: 32
performance.md-cache-timeout: 30
cluster.readdir-optimize: on
cluster.shd-max-threads: 4
cluster.lookup-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
server.event-threads: 4
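
For completeness, the effective value of any of these options, including 
defaults I never touched, can be read back with "gluster volume get":

# gluster volume get gv1 cluster.shd-max-threads
# gluster volume get gv1 all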

Thank you


--
Hamid Safe
www.devopt.net
+989361491768

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] selfheal operation takes forever to complete

2018-10-23 Thread hsafe

Hello all,

Can somebody please respond to this? As of now, if I run "gluster volume 
heal gv1 info", it prints a seemingly endless list of GFIDs.


In a stable scenario this used to end with entry counts and a status, 
but currently it never finishes. Is that a bad sign? Is it stuck in a 
loop? Are there any actions required beyond gluster itself?
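
If it helps narrow this down: to watch progress without streaming the 
full GFID list, I assume the heal-count statistics are the cheaper 
check, since they should only print the pending-entry numbers per brick:

# gluster volume heal gv1 statistics heal-count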


Appreciate any help...

On 10/21/18 8:05 AM, hsafe wrote:

Hello all gluster community,

For the past year I have been running glusterfs 3.10.12 servers in a 2x 
replica set as the storage backend for an application that saves small 
images into them, and the scenario I am in now is unlike anything I 
have seen in that time.


Previously, whenever the replicas went out of sync or one server went 
down, bringing it back up would start the self-heal and we could 
eventually see the clustered volume back in sync. But now, if I run the 
volume heal info command, the list of GFIDs does not finish even after 
a couple of hours. If I look at the heal log I can see that the process 
is ongoing, but at a very small scale and speed!


My question is: when can I expect it to finish, and how can I speed it 
up?


Here is a bit of info:

Status of volume: gv1
Gluster process TCP Port  RDMA Port Online  Pid
--
Brick IMG-01:/images/storage/brick1 49152 0 Y   4176
Brick IMG-02:/images/storage/brick1 49152 0 Y   4095
Self-heal Daemon on localhost   N/A   N/A Y   4067
Self-heal Daemon on IMG-01  N/A   N/A Y   4146

Task Status of Volume gv1
--
There are no active volume tasks

Status of volume: gv2
Gluster process TCP Port  RDMA Port Online  Pid
--
Brick IMG-01:/data/brick2   49153 0 Y   4185
Brick IMG-02:/data/brick2   49153 0 Y   4104
NFS Server on localhost N/A   N/A N   N/A
Self-heal Daemon on localhost   N/A   N/A Y   4067
NFS Server on IMG-01    N/A   N/A N   N/A
Self-heal Daemon on IMG-01  N/A   N/A Y   4146

Task Status of Volume gv2
--
There are no active volume tasks



gluster> peer status
Number of Peers: 1

Hostname: IMG-01
Uuid: 5faf60fc-7f5c-4c6e-aa3f-802482391c1b
State: Peer in Cluster (Connected)
gluster> exit
root@NAS02:/var/log/glusterfs# gluster volume gv1 info
unrecognized word: gv1 (position 1)
root@NAS02:/var/log/glusterfs# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
server.event-threads: 4
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.lookup-optimize: on
cluster.shd-max-threads: 4
cluster.readdir-optimize: on
performance.md-cache-timeout: 30
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 5
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on
cluster.self-heal-daemon: enable


Please do help me out... Thanks




--
Hamid Safe
www.devopt.net
+989361491768

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] selfheal operation takes forever to complete

2018-10-20 Thread hsafe

Hello all gluster community,

For the past year I have been running glusterfs 3.10.12 servers in a 2x 
replica set as the storage backend for an application that saves small 
images into them, and the scenario I am in now is unlike anything I have 
seen in that time.


Previously, whenever the replicas went out of sync or one server went 
down, bringing it back up would start the self-heal and we could 
eventually see the clustered volume back in sync. But now, if I run the 
volume heal info command, the list of GFIDs does not finish even after a 
couple of hours. If I look at the heal log I can see that the process is 
ongoing, but at a very small scale and speed!


My question is: when can I expect it to finish, and how can I speed it up?
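
From what I have read (and assuming these options behave the same on 
3.10.12), the main knobs for heal throughput are the self-heal daemon's 
thread count and wait-queue length, e.g.:

# gluster volume set gv1 cluster.shd-max-threads 8
# gluster volume set gv1 cluster.shd-wait-qlength 2048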

Here is a bit of info:

Status of volume: gv1
Gluster process TCP Port  RDMA Port Online  Pid
--
Brick IMG-01:/images/storage/brick1 49152 0 Y   4176
Brick IMG-02:/images/storage/brick1 49152 0 Y   4095
Self-heal Daemon on localhost   N/A   N/A Y   4067
Self-heal Daemon on IMG-01  N/A   N/A Y   4146

Task Status of Volume gv1
--
There are no active volume tasks

Status of volume: gv2
Gluster process TCP Port  RDMA Port Online  Pid
--
Brick IMG-01:/data/brick2   49153 0 Y   4185
Brick IMG-02:/data/brick2   49153 0 Y   4104
NFS Server on localhost N/A   N/A N   N/A
Self-heal Daemon on localhost   N/A   N/A Y   4067
NFS Server on IMG-01    N/A   N/A N   N/A
Self-heal Daemon on IMG-01  N/A   N/A Y   4146

Task Status of Volume gv2
--
There are no active volume tasks



gluster> peer status
Number of Peers: 1

Hostname: IMG-01
Uuid: 5faf60fc-7f5c-4c6e-aa3f-802482391c1b
State: Peer in Cluster (Connected)
gluster> exit
root@NAS02:/var/log/glusterfs# gluster volume gv1 info
unrecognized word: gv1 (position 1)
root@NAS02:/var/log/glusterfs# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
server.event-threads: 4
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.lookup-optimize: on
cluster.shd-max-threads: 4
cluster.readdir-optimize: on
performance.md-cache-timeout: 30
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 5
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on
cluster.self-heal-daemon: enable
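
Side note: the "unrecognized word: gv1" error above was my own typo; the 
volume name goes after the keyword:

# gluster volume info gv1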


Please do help me out... Thanks



--
Hamid Safe
www.devopt.net
+989361491768

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] issue with self-heal

2018-07-13 Thread hsafe

Hello Gluster community,

After several hundred GB of data written (small images, roughly 100k to 
1M each) into a 2x replicated pair of glusterfs servers, I am facing an 
issue with the healing process. Earlier, heal info returned the bricks 
and nodes and confirmed that there were no failed heals; but now it gets 
into a state with the message below:


# gluster volume heal gv1 info healed

Gathering list of heal failed entries on volume gv1 has been 
unsuccessful on bricks that are down. Please check if all brick 
processes are running.
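
The message points at a brick being down, so presumably the first thing 
to verify is the Online column for both bricks:

# gluster volume status gv1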


Issuing the heal info command gives a long list of GFID entries that 
takes about an hour to complete. The file data, being images, never 
changes once written, and it is served primarily from 8 servers that 
mount the volume over native glusterfs.


Here is some insight into the status of the gluster, but how can I 
effectively run a successful heal on the storage? The last times I tried 
to do that, it sent the servers south and left them unresponsive.
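
If it matters, my plan (assuming this is the right tool for it) was to 
kick off a full crawl and then watch the pending counts rather than the 
complete listing:

# gluster volume heal gv1 full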


# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
performance.md-cache-timeout: 128
cluster.background-self-heal-count: 32
server.statedump-path: /tmp
performance.readdir-ahead: on
nfs.disable: true
network.inode-lru-limit: 5
features.bitrot: off
features.scrub: Inactive
performance.cache-max-file-size: 16MB
client.event-threads: 8
cluster.eager-lock: on

Appreciate your help. Thanks

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] deletion of files in gluster directories

2018-07-04 Thread hsafe

Hi all,

I have a rather simplistic question: there are dirs that contain a lot 
of small files in a 2x replica set, accessed natively on the clients. 
Due to the number of files per directory, listing the dir contents from 
the clients fails.


If the dirs are moved or deleted, natively or from the servers' own view 
of them, how does glusterfs converge, or "heal" the dirs if you can call 
it that, so that they show up as emptied or moved?
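
My working assumption, and please correct me if it is wrong, is that the 
move or deletion has to go through a client mount, so that both replicas 
and the .glusterfs metadata on the bricks stay consistent; i.e. 
something like (paths made up):

# mount -t glusterfs IMG-01:/gv1 /mnt/gv1
# rm -rf /mnt/gv1/some/old/dirs

rather than removing anything directly on the bricks themselves.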


I am running Glusterfs-server and Glusterfs-client version 3.10.12.

To add more detail: we learned the hard way that our app is shipping too 
many small files into dirs that accumulate daily, accessed for serving 
by an nginx.
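
What we are looking at on the app side, just as a sketch of the idea 
with made-up names, is fanning files out by a filename prefix so that no 
single dir grows unbounded:

f=4f3a9c1e.jpg
d=${f:0:2}/${f:2:2}            # e.g. 4f/3a
mkdir -p "$d" && mv "$f" "$d/"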


Here is a little more info:

# gluster volume info

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
nfs.disable: true
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
server.statedump-path: /tmp
performance.readdir-ahead: on
# gluster volume status
Status of volume: gv1
Gluster process TCP Port  RDMA Port Online  Pid
--
Brick IMG-01:/images/storage/brick1 49152 0 Y   3577
Brick IMG-02:/images/storage/brick1 49152 0 Y   21699
Self-heal Daemon on localhost   N/A   N/A Y   24813
Self-heal Daemon on IMG-01  N/A   N/A Y   3560

Task Status of Volume gv1
--
There are no active volume tasks


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users