Re: [Gluster-users] Conflicting info on whether replicated bricks both online
On 11/18/2016 08:23 PM, Whit Blauvelt wrote:
> On the one hand:
>
> # gluster volume heal foretee info healed
> Gathering list of healed entries on volume foretee has been unsuccessful on
> bricks that are down. Please check if all brick processes are running.

'info healed' and 'info heal-failed' are deprecated sub-commands. That
message is a bug; there's a patch (http://review.gluster.org/#/c/15724/) in
progress to remove them from the CLI.

> root@bu-4t-a:/mnt/gluster# gluster volume status foretee
> Status of volume: foretee
> Gluster process                        TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick bu-4t-a:/mnt/gluster             49153     0          Y       9807
> Brick bu-4t-b:/mnt/gluster             49152     0          Y       24638
> Self-heal Daemon on localhost          N/A       N/A        Y       2743
> Self-heal Daemon on bu-4t-b            N/A       N/A        Y       12819
>
> Task Status of Volume foretee
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> On the other:
>
> # gluster volume heal foretee info healed
> Gathering list of healed entries on volume foretee has been unsuccessful on
> bricks that are down. Please check if all brick processes are running.
>
> And:
>
> # gluster volume heal foretee info

This is the only command you need to run to monitor pending entries. As to
why they are not getting healed, you would have to look at the glustershd.log
on both nodes. Manually launch heal with `gluster volume heal ` and see what
the shd log spews out.

HTH,
Ravi

> ...
> Status: Connected
> Number of entries: 3141
>
> Both systems have their bricks in /mnt/gluster, and the volume is then
> mounted at /backups. I can write or delete a file in /backups on either
> system, and it appears in /backups on the other and in /mnt/gluster on
> both. So Gluster is working. There have only ever been the two bricks. But
> there are 3141 entries that won't heal, and a suggestion that one of the
> bricks is offline -- when they're both plainly there. This is with
> glusterfs 3.8.5 on Ubuntu 16.04.1. What's my next move?
>
> Thanks,
> Whit

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
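Ravi's "only command you need" produces a per-brick listing; totalling the
pending entries across bricks is a one-line filter. A sketch; the heredoc
replays sample output shaped like this thread's (the second brick's zero
count is illustrative), and the commented line shows live usage against a
working cluster:

```shell
# Sum "Number of entries:" across all bricks reported by heal info.
# Live usage (assumes a reachable cluster with volume "foretee"):
#   gluster volume heal foretee info | awk '/^Number of entries:/ {s += $NF} END {print s}'
TOTAL=$(awk '/^Number of entries:/ {s += $NF} END {print s}' <<'EOF'
Brick bu-4t-a:/mnt/gluster
Status: Connected
Number of entries: 3141

Brick bu-4t-b:/mnt/gluster
Status: Connected
Number of entries: 0
EOF
)
echo "pending heal entries: $TOTAL"
```

Watching that number over time shows whether the self-heal daemon is making
progress or stuck.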
[Gluster-users] How to enable shared_storage?
Hello,

I am trying to enable shared storage for geo-replication, but I am not sure
I am doing it properly. Here is what I do:

# gluster volume set all cluster.enable-shared-storage enable
volume set: success

# mount -t glusterfs 127.0.0.1:gluster_shared_storage /var/run/gluster/shared_storage
ERROR: Mount point does not exist
Please specify a mount point
Usage: man 8 /sbin/mount.glusterfs

Why does the last command show an error?

Sincerely,
Alexandr
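The error text comes from the mount helper: `/sbin/mount.glusterfs` bails
out when the target directory does not exist. A minimal sketch of the fix,
using the path from the post (the mount line is left commented since it
needs a reachable glusterd; the fallback path exists only so the sketch runs
anywhere):

```shell
# mount.glusterfs prints "Mount point does not exist" when the target
# directory is missing, so create it before mounting:
MNT=/var/run/gluster/shared_storage
mkdir -p "$MNT" 2>/dev/null || MNT=$(mktemp -d)   # fallback for this sketch only
echo "mount point ready: $MNT"
# mount -t glusterfs 127.0.0.1:gluster_shared_storage "$MNT"
```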
Re: [Gluster-users] Can use geo-replication with distributed-replicated volumes?
Thank you very much for your help!

Best regards,
Alexandr

On Fri, Nov 18, 2016 at 11:56 AM, Bipin Kunal wrote:
> Unfortunately the upstream doc is not up to date with the failover and
> failback commands.
>
> But you can use the downstream doc:
> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html-single/Administration_Guide/index.html#sect-Disaster_Recovery
>
> These steps should work fine for you.
>
> We will try to update the upstream doc as early as possible.
>
> Thanks,
> Bipin Kunal
>
> On Thu, Nov 17, 2016 at 10:24 PM, Alexandr Porunov wrote:
>> Thank you, I will wait for it.
>>
>> Sincerely,
>> Alexandr
>>
>> On Thu, Nov 17, 2016 at 6:43 PM, Bipin Kunal wrote:
>>> I don't have the URL handy right now. Will send it to you tomorrow;
>>> texting from mobile right now.
>>>
>>> Thanks,
>>> Bipin
>>>
>>> On Nov 17, 2016 9:00 PM, "Alexandr Porunov" wrote:
>>>> Thank you for your help!
>>>>
>>>> Could you please give me a link or some information about failover?
>>>> How do I change a master state to a slave state?
>>>>
>>>> Best regards,
>>>> Alexandr
>>>>
>>>> On Thu, Nov 17, 2016 at 5:07 PM, Bipin Kunal wrote:
>>>>> Please find my comments inline.
>>>>>
>>>>> On Nov 17, 2016 8:30 PM, "Alexandr Porunov" wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I have several questions about geo-replication. Please answer if
>>>>>> you can.
>>>>>>
>>>>>> 1. Can we use geo-replication with distributed-replicated volumes?
>>>>>
>>>>> Yes. You can.
>>>>>
>>>>>> 2. Can we use fewer servers in the slave datacenter than in the
>>>>>> master datacenter? (I.e. if I replicate a distributed-replicated
>>>>>> volume which consists of 10 servers to a slave datacenter with only
>>>>>> 5 servers -- for example, using fewer replicas in the slave
>>>>>> datacenter.)
>>>>>
>>>>> Yes. You are free to do so. It is just recommended to have the slave
>>>>> volume size equal to the master volume.
>>>>>
>>>>>> 3. Is there a possibility to enable failover? I.e. when the master
>>>>>> datacenter dies, can we change our slave to the master?
>>>>>
>>>>> Yes. You can promote the slave when the master dies, and when the
>>>>> master comes back you can fail back to it.
>>>>>
>>>>>> Sincerely,
>>>>>> Alexandr
>>>>>
>>>>> Thanks,
>>>>> Bipin Kunal
Re: [Gluster-users] Glusterfs volume bricks disk is filled up
Hi Team,

Can anyone please have a look at the question below?

Thanks,
Imaad

On Thu, Nov 17, 2016 at 11:00 PM, Imaad Ghouri wrote:
> Hi Team,
> Quick question.
>
> I have a glusterfs cluster of 20 nodes with /share, where I successfully
> store all of the data shared across all the nodes in the cluster. Each
> node also has /data, where glusterfs stores the volume brick data.
>
> My question is about /data. On one of the nodes, node A (out of 20), the
> /data disk filled up and there is no space left on it. How does glusterfs
> behave in this case? I do see "no space left" messages in the glusterfs
> logs on the node where the disk is full. Not sure if I need to worry
> about it -- is it just a warning? And what happens when I try to access
> the /share data and the request goes to node A?
>
> I am using glusterfs version 3.6.
>
> Thanks
>
> Sent from my iPhone

--
Regards,
Imaad Ghouri
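Regardless of how gluster reacts, a full brick is worth catching before
writes start failing with ENOSPC. A hedged sketch of a simple usage check;
the /data brick path and the 90% threshold are illustrative, and the demo
defaults to `/` so it can run anywhere:

```shell
# Warn when a brick filesystem crosses a usage threshold.
# Set BRICK=/data on the affected node; values here are illustrative.
BRICK=${BRICK:-/}
THRESHOLD=90
# df -P gives one POSIX-format line per filesystem; field 5 is "Capacity".
USE=$(df -P "$BRICK" | awk 'NR==2 {sub(/%/, "", $5); print $5}')
if [ "$USE" -ge "$THRESHOLD" ]; then
    echo "WARN: $BRICK is ${USE}% full - free space before writes start failing"
else
    echo "OK: $BRICK is ${USE}% full"
fi
```

Run from cron on each node, this flags the node-A situation before the logs
fill with "no space left" errors.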
Re: [Gluster-users] corruption using gluster and iSCSI with LIO
If it's writing to the root partition then the mount went away. Any clues in
the gluster client log?

On 11/18/2016 08:21 AM, Olivier Lambert wrote:
> After Node 1 is DOWN, LIO on Node 2 (the iSCSI target) is no longer
> writing to the local Gluster mount, but to the root partition, even though
> "df -h" shows the Gluster brick mounted:
>
> /dev/mapper/centos-root  3,1G  3,1G   20K 100% /
> ...
> /dev/xvdb                 61G   61G  956M  99% /bricks/brick1
> localhost:/gv0            61G   61G  956M  99% /mnt
>
> If I unmount it, I still see the "block.img" in /mnt, which is filling up
> the root partition. So it's as if FUSE is messing with the local Gluster
> mount, which could lead to the data corruption at the client level. It
> doesn't make sense to me... What am I missing?
>
> On Fri, Nov 18, 2016 at 5:00 PM, Olivier Lambert wrote:
>> Yes, I did it, and only proceeded once I had the previous heal info
>> result ("Number of entries: 0"). But same result: as soon as the second
>> node is offline (after they were both working/back online), everything
>> is corrupted.
>>
>> To recap:
>>
>> * Node 1 UP, Node 2 UP -> OK
>> * Node 1 UP, Node 2 DOWN -> OK (just a small lag for multipath to see
>>   the path down and change if necessary)
>> * Node 1 UP, Node 2 UP -> OK (after waiting for no entries to be
>>   displayed by the heal command)
>> * Node 1 DOWN, Node 2 UP -> NOT OK (data corruption)
>>
>> On Fri, Nov 18, 2016 at 3:39 PM, David Gossage wrote:
>>> On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert wrote:
>>>> Hi David,
>>>>
>>>> What are the exact commands to be sure it's fine?
>>>>
>>>> Right now I got:
>>>>
>>>> # gluster volume heal gv0 info
>>>> Brick 10.0.0.1:/bricks/brick1/gv0
>>>> Status: Connected
>>>> Number of entries: 0
>>>>
>>>> Brick 10.0.0.2:/bricks/brick1/gv0
>>>> Status: Connected
>>>> Number of entries: 0
>>>>
>>>> Brick 10.0.0.3:/bricks/brick1/gv0
>>>> Status: Connected
>>>> Number of entries: 0
>>>
>>> Did you run this before taking down the 2nd node to see if any heals
>>> were ongoing?
>>>
>>> Also, I see you have sharding enabled. Are your files already being
>>> served sharded as well?
>>>
>>>> Everything is online and working, but this command gives a strange
>>>> output:
>>>>
>>>> # gluster volume heal gv0 info heal-failed
>>>> Gathering list of heal failed entries on volume gv0 has been
>>>> unsuccessful on bricks that are down. Please check if all brick
>>>> processes are running.
>>>>
>>>> Is it normal?
>>>
>>> I don't think that is a valid command anymore; when I run it I get the
>>> same message, and this is in the logs:
>>>
>>> [2016-11-18 14:35:02.260503] I [MSGID: 106533]
>>> [glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume]
>>> 0-management: Received heal vol req for volume GLUSTER1
>>> [2016-11-18 14:35:02.263341] W [MSGID: 106530]
>>> [glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management:
>>> Command not supported. Please use "gluster volume heal GLUSTER1 info"
>>> and logs to find the heal information.
>>> [2016-11-18 14:35:02.263365] E [MSGID: 106301]
>>> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
>>> operation 'Volume Heal' failed on localhost : Command not supported.
>>> Please use "gluster volume heal GLUSTER1 info" and logs to find the
>>> heal information.
>>>
>>>> On Fri, Nov 18, 2016 at 2:51 AM, David Gossage wrote:
>>>>> On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert wrote:
>>>>>> Okay, used the exact same config you provided, adding an arbiter
>>>>>> node (node3).
>>>>>>
>>>>>> After halting node2, the VM continues to work after a small
>>>>>> "lag"/freeze. I restarted node2 and it was back online: OK.
>>>>>>
>>>>>> Then, after waiting a few minutes, halting node1. And **just** at
>>>>>> this moment, the VM is corrupted (segmentation fault, /var/log
>>>>>> folder empty, etc.)
>>>>>
>>>>> Other than waiting a few minutes, did you make sure heals had
>>>>> completed?
>>>>>
>>>>>> dmesg of the VM:
>>>>>>
>>>>>> [ 1645.852905] EXT4-fs error (device xvda1):
>>>>>> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>>>>>> entry in directory: rec_len is smaller than minimal - offset=0(0),
>>>>>> inode=0, rec_len=0, name_len=0
>>>>>> [ 1645.854509] Aborting journal on device xvda1-8.
>>>>>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>>>>>
>>>>>> And got a lot of "comm bash: bad entry in directory" messages
>>>>>> then...
>>>>>>
>>>>>> Here is the current config with all nodes back online:
>>>>>>
>>>>>> # gluster volume info
>>>>>>
>>>>>> Volume Name: gv0
>>>>>> Type: Replicate
>>>>>> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>>>>>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>>>>>> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>>>>>> Options Reconfigured:
>>>>>> nfs.disable: on
>>>>>> performance.readdir-ahead: on
>>>>>> transport.address-family: inet
>>>>>> features.shard: on
>>>>>> features.shard-block-size: 16MB
>>>>>> network.remote-dio: enable
>>>>>> cluster.eager-lock: enable
>>>>>> performance.io-cache: off
>>>>>> performance.read-ahead: off
>>>>>> performance.quick-read: off
>>>>>> performance.stat-prefetch: on
>>>>>> performance.strict-write-ordering: off
>>>>>> cluster.server-quorum-type: server
>>>>>> cluster.quorum-type: auto
>>>>>> cluster.data-self-heal: on
>>>>>>
>>>>>> # gluster volume status
>>>>>> Status of volume: gv0
>>>>>> Gluster process TCP
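The symptom here -- a path that still looks mounted in a stale `df` while
writes actually land on the root filesystem -- can be checked for before
letting LIO write. A basic sketch of a liveness test against /proc/mounts
(`/mnt` is the mount path from the thread; the demo also checks `/` so the
sketch runs anywhere):

```shell
# Succeed only if the given path is currently a mount point according to
# /proc/mounts; a basic guard against writing into the underlying root
# filesystem after a FUSE mount goes away.
is_mounted() {
    awk -v m="$1" '$2 == m {found = 1} END {exit !found}' /proc/mounts
}

is_mounted /    && echo "/ is a mount point"
is_mounted /mnt || echo "/mnt is NOT mounted - keep LIO from writing here"
```

This only catches a mount that has disappeared, not a hung one; a hung FUSE
mount still needs the client log to diagnose.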
Re: [Gluster-users] corruption using gluster and iSCSI with LIO
Okay, got it attached :)

On Fri, Nov 18, 2016 at 11:00 AM, Krutika Dhananjay wrote:
> Assuming you're using FUSE, if your gluster volume is mounted at
> /some/dir, for example, then its corresponding logs will be at
> /var/log/glusterfs/some-dir.log
>
> -Krutika
>
> On Fri, Nov 18, 2016 at 7:13 AM, Olivier Lambert wrote:
>> Attached, bricks log. Where could I find the fuse client log?
>>
>> On Fri, Nov 18, 2016 at 2:22 AM, Krutika Dhananjay wrote:
>>> Could you attach the fuse client and brick logs?
>>>
>>> -Krutika
>>>
>>> On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert wrote:
>>>> Okay, used the exact same config you provided, adding an arbiter node
>>>> (node3).
>>>>
>>>> After halting node2, the VM continues to work after a small
>>>> "lag"/freeze. I restarted node2 and it was back online: OK.
>>>>
>>>> Then, after waiting a few minutes, halting node1. And **just** at this
>>>> moment, the VM is corrupted (segmentation fault, /var/log folder
>>>> empty, etc.)
>>>>
>>>> dmesg of the VM:
>>>>
>>>> [ 1645.852905] EXT4-fs error (device xvda1):
>>>> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>>>> entry in directory: rec_len is smaller than minimal - offset=0(0),
>>>> inode=0, rec_len=0, name_len=0
>>>> [ 1645.854509] Aborting journal on device xvda1-8.
>>>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>>>
>>>> And got a lot of "comm bash: bad entry in directory" messages then...
>>>>
>>>> Here is the current config with all nodes back online:
>>>>
>>>> # gluster volume info
>>>>
>>>> Volume Name: gv0
>>>> Type: Replicate
>>>> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>>>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>>>> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>>>> Options Reconfigured:
>>>> nfs.disable: on
>>>> performance.readdir-ahead: on
>>>> transport.address-family: inet
>>>> features.shard: on
>>>> features.shard-block-size: 16MB
>>>> network.remote-dio: enable
>>>> cluster.eager-lock: enable
>>>> performance.io-cache: off
>>>> performance.read-ahead: off
>>>> performance.quick-read: off
>>>> performance.stat-prefetch: on
>>>> performance.strict-write-ordering: off
>>>> cluster.server-quorum-type: server
>>>> cluster.quorum-type: auto
>>>> cluster.data-self-heal: on
>>>>
>>>> # gluster volume status
>>>> Status of volume: gv0
>>>> Gluster process                      TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick 10.0.0.1:/bricks/brick1/gv0    49152     0          Y       1331
>>>> Brick 10.0.0.2:/bricks/brick1/gv0    49152     0          Y       2274
>>>> Brick 10.0.0.3:/bricks/brick1/gv0    49152     0          Y       2355
>>>> Self-heal Daemon on localhost        N/A       N/A        Y       2300
>>>> Self-heal Daemon on 10.0.0.3         N/A       N/A        Y       10530
>>>> Self-heal Daemon on 10.0.0.2         N/A       N/A        Y       2425
>>>>
>>>> Task Status of Volume gv0
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert wrote:
>>>>> It's planned to have an arbiter soon :) It was just preliminary
>>>>> tests.
>>>>>
>>>>> Thanks for the settings, I'll test this soon and I'll come back to
>>>>> you!
>>>>>
>>>>> On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson wrote:
>>>>>> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>>>>>>> gluster volume info gv0
>>>>>>>
>>>>>>> Volume Name: gv0
>>>>>>> Type: Replicate
>>>>>>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>>>>>>> Status: Started
>>>>>>> Snapshot Count: 0
>>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>>>>>>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>>>>>>> Options Reconfigured:
>>>>>>> nfs.disable: on
>>>>>>> performance.readdir-ahead: on
>>>>>>> transport.address-family: inet
>>>>>>> features.shard: on
>>>>>>> features.shard-block-size: 16MB
>>>>>>
>>>>>> When hosting VMs it's essential to set these options:
>>>>>>
>>>>>> network.remote-dio: enable
>>>>>> cluster.eager-lock: enable
>>>>>> performance.io-cache: off
>>>>>> performance.read-ahead: off
>>>>>> performance.quick-read: off
>>>>>> performance.stat-prefetch: on
>>>>>> performance.strict-write-ordering: off
[Gluster-users] Search Indexer for Files and Folder
Hi,

I have a distributed glusterfs volume with over 77 TB of folders and files.
The gluster is not online 24/7.

To check whether certain files exist, I currently create a simple text
listing while the gluster is online, something like `ls -ahls -R
../glustfs/`. A simple grep over that listing then shows whether a file is
on the gluster or not.

I think there is a better way, so I want to ask whether someone uses some
kind of search indexer. A good solution would run under Debian or CentOS,
with a fast mechanism to create and update the index of files in glusterfs,
perhaps with MySQL or MariaDB as storage. For offline searching, it would be
good to have a website to enter queries and show the results.

Any idea?

Taste
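A rough sketch of the index-then-search idea in plain shell: snapshot the
tree into a tab-separated file while the volume is mounted, then grep the
snapshot offline. The ROOT/INDEX paths and the temp demo tree are
assumptions for illustration; a real run would point ROOT at the glusterfs
mount, and the .tsv could later be loaded into MySQL/MariaDB for the
website front end:

```shell
# Snapshot the tree into a one-line-per-file index (type, size, path)
# while the volume is online, then search the snapshot offline with grep.
ROOT=${ROOT:-$(mktemp -d)}              # point at the glusterfs mount in real use
INDEX=${INDEX:-/tmp/gluster-index.tsv}

mkdir -p "$ROOT/projects"               # demo data so the sketch is self-contained
echo data > "$ROOT/projects/report-2016.txt"

find "$ROOT" -printf '%y\t%s\t%p\n' > "$INDEX"   # -printf is GNU find
grep -c 'report-2016' "$INDEX"
```

Unlike a plain `ls -R` dump, the size column also lets you answer "how big"
questions offline, and the tab-separated layout imports cleanly into a
database table.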
Re: [Gluster-users] corruption using gluster and iSCSI with LIO
Assuming you're using FUSE, if your gluster volume is mounted at /some/dir,
for example, then its corresponding logs will be at
/var/log/glusterfs/some-dir.log

-Krutika

On Fri, Nov 18, 2016 at 7:13 AM, Olivier Lambert wrote:
> Attached, bricks log. Where could I find the fuse client log?
>
> On Fri, Nov 18, 2016 at 2:22 AM, Krutika Dhananjay wrote:
>> Could you attach the fuse client and brick logs?
>>
>> -Krutika
>>
>> On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert wrote:
>>> Okay, used the exact same config you provided, adding an arbiter node
>>> (node3).
>>>
>>> After halting node2, the VM continues to work after a small
>>> "lag"/freeze. I restarted node2 and it was back online: OK.
>>>
>>> Then, after waiting a few minutes, halting node1. And **just** at this
>>> moment, the VM is corrupted (segmentation fault, /var/log folder
>>> empty, etc.)
>>>
>>> dmesg of the VM:
>>>
>>> [ 1645.852905] EXT4-fs error (device xvda1):
>>> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>>> entry in directory: rec_len is smaller than minimal - offset=0(0),
>>> inode=0, rec_len=0, name_len=0
>>> [ 1645.854509] Aborting journal on device xvda1-8.
>>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>>
>>> And got a lot of "comm bash: bad entry in directory" messages then...
>>>
>>> Here is the current config with all nodes back online:
>>>
>>> # gluster volume info
>>>
>>> Volume Name: gv0
>>> Type: Replicate
>>> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>>> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> features.shard: on
>>> features.shard-block-size: 16MB
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> performance.stat-prefetch: on
>>> performance.strict-write-ordering: off
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> cluster.data-self-heal: on
>>>
>>> # gluster volume status
>>> Status of volume: gv0
>>> Gluster process                      TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick 10.0.0.1:/bricks/brick1/gv0    49152     0          Y       1331
>>> Brick 10.0.0.2:/bricks/brick1/gv0    49152     0          Y       2274
>>> Brick 10.0.0.3:/bricks/brick1/gv0    49152     0          Y       2355
>>> Self-heal Daemon on localhost        N/A       N/A        Y       2300
>>> Self-heal Daemon on 10.0.0.3         N/A       N/A        Y       10530
>>> Self-heal Daemon on 10.0.0.2         N/A       N/A        Y       2425
>>>
>>> Task Status of Volume gv0
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert wrote:
>>>> It's planned to have an arbiter soon :) It was just preliminary tests.
>>>>
>>>> Thanks for the settings, I'll test this soon and I'll come back to
>>>> you!
>>>>
>>>> On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson wrote:
>>>>> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>>>>>> gluster volume info gv0
>>>>>>
>>>>>> Volume Name: gv0
>>>>>> Type: Replicate
>>>>>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>>>>>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>>>>>> Options Reconfigured:
>>>>>> nfs.disable: on
>>>>>> performance.readdir-ahead: on
>>>>>> transport.address-family: inet
>>>>>> features.shard: on
>>>>>> features.shard-block-size: 16MB
>>>>>
>>>>> When hosting VMs it's essential to set these options:
>>>>>
>>>>> network.remote-dio: enable
>>>>> cluster.eager-lock: enable
>>>>> performance.io-cache: off
>>>>> performance.read-ahead: off
>>>>> performance.quick-read: off
>>>>> performance.stat-prefetch: on
>>>>> performance.strict-write-ordering: off
>>>>> cluster.server-quorum-type: server
>>>>> cluster.quorum-type: auto
>>>>> cluster.data-self-heal: on
>>>>>
>>>>> Also, with replica two and quorum on (required), your volume will
>>>>> become read-only when one node goes down, to prevent the possibility
>>>>> of split-brain.
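Lindsay's recommended option set can be applied in one pass with `gluster
volume set`. A sketch that just prints the commands so it can run anywhere;
drop the `echo` to execute them against a live volume (`gv0` is the volume
name from this thread):

```shell
# Emit one `gluster volume set` command per recommended VM-hosting option;
# remove the echo to actually apply them to a live volume.
VOL=gv0
CMDS=$(while read -r opt val; do
    echo "gluster volume set $VOL $opt $val"
done <<'EOF'
network.remote-dio enable
cluster.eager-lock enable
performance.io-cache off
performance.read-ahead off
performance.quick-read off
performance.stat-prefetch on
performance.strict-write-ordering off
cluster.server-quorum-type server
cluster.quorum-type auto
cluster.data-self-heal on
EOF
)
printf '%s\n' "$CMDS"
```

Keeping the list in one heredoc makes it easy to diff against `gluster
volume info` output later.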
Re: [Gluster-users] Can use geo-replication with distributed-replicated volumes?
Unfortunately the upstream doc is not up to date with the failover and failback commands, but you can use the downstream doc:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html-single/Administration_Guide/index.html#sect-Disaster_Recovery

These steps should work fine for you. We will try to update the upstream doc as soon as possible.

Thanks,
Bipin Kunal

On Thu, Nov 17, 2016 at 10:24 PM, Alexandr Porunov wrote:
> Thank you. I will wait for it.
>
> Sincerely,
> Alexandr
>
> On Thu, Nov 17, 2016 at 6:43 PM, Bipin Kunal wrote:
>>
>> I don't have the URL handy right now. Will send it tomorrow. Texting from
>> mobile right now.
>>
>> Thanks,
>> Bipin
>>
>> On Nov 17, 2016 9:00 PM, "Alexandr Porunov" wrote:
>>>
>>> Thank you for your help!
>>>
>>> Could you please give me a link or some information about failover? How
>>> do I change a master state to a slave state?
>>>
>>> Best regards,
>>> Alexandr
>>>
>>> On Thu, Nov 17, 2016 at 5:07 PM, Bipin Kunal wrote:
>>>
>>> Please find my comments inline.
>>>
>>> On Nov 17, 2016 8:30 PM, "Alexandr Porunov" wrote:
>>> >
>>> > Hello,
>>> >
>>> > I have several questions about geo-replication. Please answer if you can.
>>> >
>>> > 1. Can we use geo-replication with distributed-replicated volumes?
>>>
>>> Yes, you can.
>>>
>>> > 2. Can we use fewer servers in the slave datacenter than in the master
>>> > datacenter? (I.e. if I replicate a distributed-replicated volume that
>>> > consists of 10 servers to a slave datacenter with only 5 servers, for
>>> > example using fewer replicas in the slave datacenter.)
>>>
>>> Yes, you are free to do so. It is just recommended to have the slave
>>> volume size equal to the master volume.
>>>
>>> > 3. Is there a possibility to enable failover? I.e. when the master
>>> > datacenter dies, can we change our slave to the master?
>>>
>>> Yes. You can promote the slave when the master dies, and when the master
>>> comes back you can fail back to it.
>
> Sincerely,
> Alexandr

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
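For readers following the disaster-recovery discussion above, here is a rough command-level sketch of the failover/failback flow. The host and volume names are hypothetical, and the failback details, especially any recovery-specific session options, should be taken verbatim from the downstream guide linked in this thread rather than from this sketch:

```shell
#!/bin/sh
# Hypothetical names; replace with your own master/slave volumes and hosts.
MASTER_VOL=mastervol
SLAVE=slavehost::slavevol

# Failover: stop the geo-replication session. Once it is stopped, the
# slave volume is an ordinary writable volume, so point clients at it.
gluster volume geo-replication "$MASTER_VOL" "$SLAVE" stop force

# Failback (once the old master is reachable again): the guide prescribes
# creating a reverse session from old slave to old master, applying the
# session options it lists for recovery, letting the sync complete, and
# then swapping the roles back.
gluster volume geo-replication slavevol "masterhost::$MASTER_VOL" create push-pem force
gluster volume geo-replication slavevol "masterhost::$MASTER_VOL" start
```

Monitor the reverse sync with `gluster volume geo-replication slavevol masterhost::mastervol status` before cutting clients back over to the original master.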
Re: [Gluster-users] corruption using gluster and iSCSI with LIO
Hi David,

What are the exact commands to be sure it's fine? Right now I got:

# gluster volume heal gv0 info
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Everything is online and working, but this command gives a strange output:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been unsuccessful on bricks that are down. Please check if all brick processes are running.

Is that normal?

On Fri, Nov 18, 2016 at 2:51 AM, David Gossage wrote:
>
> On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert wrote:
>>
>> Okay, I used the exact same config you provided, adding an arbiter
>> node (node3).
>>
>> After halting node2, the VM continues to work after a small "lag"/freeze.
>> I restarted node2 and it was back online: OK
>>
>> Then, after waiting a few minutes, I halted node1. And **just** at this
>> moment, the VM was corrupted (segmentation fault, /var/log folder empty,
>> etc.)
>
> Other than waiting a few minutes, did you make sure heals had completed?
>
>> dmesg of the VM:
>>
>> [ 1645.852905] EXT4-fs error (device xvda1): htree_dirblock_to_tree:988:
>> inode #19: block 8286: comm bash: bad entry in directory: rec_len is
>> smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
>> [ 1645.854509] Aborting journal on device xvda1-8.
>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>
>> And then got a lot of "comm bash: bad entry in directory" messages...
>>
>> Here is the current config with all nodes back online:
>>
>> # gluster volume info
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> features.shard: on
>> features.shard-block-size: 16MB
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.stat-prefetch: on
>> performance.strict-write-ordering: off
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.data-self-heal: on
>>
>>
>> # gluster volume status
>> Status of volume: gv0
>> Gluster process                            TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.0.1:/bricks/brick1/gv0          49152     0          Y       1331
>> Brick 10.0.0.2:/bricks/brick1/gv0          49152     0          Y       2274
>> Brick 10.0.0.3:/bricks/brick1/gv0          49152     0          Y       2355
>> Self-heal Daemon on localhost              N/A       N/A        Y       2300
>> Self-heal Daemon on 10.0.0.3               N/A       N/A        Y       10530
>> Self-heal Daemon on 10.0.0.2               N/A       N/A        Y       2425
>>
>> Task Status of Volume gv0
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert wrote:
>> > It's planned to have an arbiter soon :) It was just preliminary tests.
>> >
>> > Thanks for the settings, I'll test this soon and I'll come back to you!
>> >
>> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson wrote:
>> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >>>
>> >>> gluster volume info gv0
>> >>>
>> >>> Volume Name: gv0
>> >>> Type: Replicate
>> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >>> Status: Started
>> >>> Snapshot Count: 0
>> >>> Number of Bricks: 1 x 2 = 2
>> >>> Transport-type: tcp
>> >>> Bricks:
>> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >>> Options Reconfigured:
>> >>> nfs.disable: on
>> >>> performance.readdir-ahead: on
>> >>> transport.address-family: inet
>> >>> features.shard: on
>> >>> features.shard-block-size: 16MB
>> >>
>> >>
>> >> When hosting VMs it's essential to set these options:
>> >>
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >> Also, with replica two and quorum on (required), your volume will become
>> >> read-only when one node goes down, to prevent the possibility of
>> >> split-brain
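David's question in this thread, whether heals had completed before the second node was halted, can be checked mechanically before each failover test. A minimal POSIX-shell sketch; the `heals_done` helper name is mine, and the parsing assumes the `gluster volume heal <vol> info` output format shown earlier in this digest:

```shell
#!/bin/sh
# heals_done: read `gluster volume heal <vol> info` output on stdin and
# succeed only if no brick section reports a non-zero entry count.
heals_done() {
    ! grep -qE '^Number of entries: [1-9]'
}

# Typical use before halting the next node:
#   gluster volume heal gv0 info | heals_done && echo "all heals complete"
```

Note that `heal info` only lists entries pending heal; actual heal failures would have to be chased in glustershd.log on each node, as mentioned earlier in this digest.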