Re: [Gluster-users] Conflicting info on whether replicated bricks both online

2016-11-18 Thread Ravishankar N

On 11/18/2016 08:23 PM, Whit Blauvelt wrote:

On the one hand:

   # gluster volume heal foretee info healed
   Gathering list of healed entries on volume foretee has been unsuccessful on
   bricks that are down. Please check if all brick processes are running.


'info healed' and 'info heal-failed' are deprecated sub commands. That 
message is a bug; there's a patch (http://review.gluster.org/#/c/15724/) 
in progress to remove them from the CLI.

   root@bu-4t-a:/mnt/gluster# gluster volume status foretee
   Status of volume: foretee
   Gluster process                             TCP Port  RDMA Port  Online  Pid
   ------------------------------------------------------------------------------
   Brick bu-4t-a:/mnt/gluster                  49153     0          Y       9807
   Brick bu-4t-b:/mnt/gluster                  49152     0          Y       24638
   Self-heal Daemon on localhost               N/A       N/A        Y       2743
   Self-heal Daemon on bu-4t-b                 N/A       N/A        Y       12819

   Task Status of Volume foretee
   ------------------------------------------------------------------------------
   There are no active volume tasks

On the other:

   # gluster volume heal foretee info healed
   Gathering list of healed entries on volume foretee has been unsuccessful on
   bricks that are down. Please check if all brick processes are running.

And:

   # gluster volume heal foretee info

This is the only command you need to run to monitor pending entries. As
to why they are not getting healed, you would have to look at the
glustershd.log on both nodes. Manually launch heal with
`gluster volume heal ` and see what the shd log spews out.
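For example, a minimal sketch assuming the default log location under
/var/log/glusterfs:

   # kick off a heal, then watch what the self-heal daemon reports on each node
   gluster volume heal foretee
   tail -f /var/log/glusterfs/glustershd.log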


HTH,
Ravi

   ...
   
   
   Status: Connected
   Number of entries: 3141

Both systems have their bricks in /mnt/gluster, and the volume then mounted
in /backups. I can write or delete a file in /backups on either system, and
it appears in both /backups on the other, and in /mnt/gluster on both.

So Gluster is working. There have only ever been the two bricks. But there
are 3141 entries that won't heal, and a suggestion that one of the bricks is
offline -- when they're both plainly there.

This is with glusterfs 3.8.5 on Ubuntu 16.04.1.




What's my next move?

Thanks,
Whit
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users




[Gluster-users] How to enable shared_storage?

2016-11-18 Thread Alexandr Porunov
Hello,

I am trying to enable shared storage for Geo-Replication but I am not sure
that I am doing it properly.

Here is what I do:
# gluster volume set all cluster.enable-shared-storage enable
volume set: success

# mount -t glusterfs 127.0.0.1:gluster_shared_storage
/var/run/gluster/shared_storage
ERROR: Mount point does not exist

Please specify a mount point

Usage:

man 8 /sbin/mount.glusterfs


Why does the last command show an error?
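For what it is worth, enabling cluster.enable-shared-storage normally triggers
a hook script that creates the gluster_shared_storage volume and mounts it at
/var/run/gluster/shared_storage on the nodes; the error above just means that
the mount point directory does not exist yet. A minimal sketch, assuming the
default paths from the commands above:

   # create the mount point the hook would normally have set up
   mkdir -p /var/run/gluster/shared_storage
   # then mount the shared-storage volume on it
   mount -t glusterfs 127.0.0.1:gluster_shared_storage /var/run/gluster/shared_storage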

Sincerely,
Alexandr
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Can we use geo-replication with distributed-replicated volumes?

2016-11-18 Thread Alexandr Porunov
Thank you very much for your help!

Best regards,
Alexandr

On Fri, Nov 18, 2016 at 11:56 AM, Bipin Kunal  wrote:

> Unfortunately the upstream doc is not up to date with the failover and
> failback commands.
>
> But you can use the downstream doc:
> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html-single/Administration_Guide/index.html#sect-Disaster_Recovery
>
> These steps should work fine for you.
>
> We will try to update the upstream doc as early as possible.
>
> Thanks,
> Bipin Kunal
>
> On Thu, Nov 17, 2016 at 10:24 PM, Alexandr Porunov
>  wrote:
> > Thank you I will wait for it
> >
> > Sincerely,
> > Alexandr
> >
> > On Thu, Nov 17, 2016 at 6:43 PM, Bipin Kunal  wrote:
> >>
> >> I don't have URL handy right now.  Will send you tomorrow. Texting from
> >> mobile right now.
> >>
> >> Thanks,
> >> Bipin
> >>
> >>
> >> On Nov 17, 2016 9:00 PM, "Alexandr Porunov"  >
> >> wrote:
> >>>
> >>> Thank you for your help!
> >>>
> >>> Could you please give me a link or some information about failover? How
> >>> to change a master state to a slave state?
> >>>
> >>> Best regards,
> >>> Alexandr
> >>>
> >>> On Thu, Nov 17, 2016 at 5:07 PM, Bipin Kunal 
> wrote:
> 
>  Please find my comments inline.
> 
>  On Nov 17, 2016 8:30 PM, "Alexandr Porunov" <
> alexandr.poru...@gmail.com>
>  wrote:
>  >
>  > Hello,
>  >
>  > I have several questions about Geo-replication. Please answer if you
>  > can.
>  >
>  > 1. Can we use geo-replication with distributed-replicated volumes?
>  Yes. You can.
>  > 2. Can we use fewer servers in the slave datacenter than in the master
>  > datacenter? (I.e. if I replicate a distributed-replicated volume that
>  > consists of 10 servers to a slave datacenter with only 5 servers, for
>  > example using fewer replicas in the slave datacenter.)
>  Yes, you are free to do so. It is just recommended that the slave volume
>  size be equal to the master volume's.
>  > 3. Is there a possibility to enable failover? I.e. when the master
>  > datacenter dies, can we change our slave to the master?
>  Yes. You can promote the slave when the master dies, and when the master
>  comes back you can fail back to it.
>  >
>  > Sincerely,
>  > Alexandr
>  >
>  >
>  Thanks,
>  Bipin Kunal
>  ___
>  > Gluster-users mailing list
>  > Gluster-users@gluster.org
>  > http://www.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>>
> >
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Glusterfs volume bricks disk is filled up

2016-11-18 Thread Imaad Ghouri
Hi Team,

Can anyone please have a look at the question below? Thanks

Imaad

On Thu, Nov 17, 2016 at 11:00 PM, Imaad Ghouri 
wrote:

> Hi Team,
> Quick question.
>
> I have a glusterfs cluster setup of 20 nodes that has /share, where I
> store all of the data that is shared successfully across all the nodes in
> the cluster. Each node also has /data, where the glusterfs volume brick
> data gets stored by gluster.
>
> I have a question about /data. On one of the nodes, node A (out of 20),
> the /data disk space got filled up and there is no space left on /data.
> How does glusterfs behave in this case? I do see "no space left" messages
> in the glusterfs logs on the node where the disk is full. Not sure if I
> need to worry about it. Is it just a warning message? What happens when I
> try to access the /share data and the request goes to node A?
>
> I am using glusterfs 3.6 version. Thanks
>
> Sent from my iPhone
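As a hedged aside on the general behaviour: writes that land on the full brick
will fail with "no space left" errors, and DHT can be told to stop placing new
files on bricks that are low on free space via the cluster.min-free-disk volume
option (it does not help files already hashed to the full brick). A minimal
sketch, with the volume name as a placeholder:

   # on each node, check the brick filesystem usage
   df -h /data
   # ask DHT to avoid bricks with less than 10% free space (volume name is a placeholder)
   gluster volume set myvol cluster.min-free-disk 10%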




-- 
Regards,
Imaad Ghouri
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Joe Julian
If it's writing to the root partition then the mount went away. Any 
clues in the gluster client log?
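A quick way to check that, sketched with standard tools (the log file name
assumes a volume mounted at /mnt and the usual /var/log/glusterfs naming):

   # is /mnt still a live glusterfs FUSE mount, or has it fallen back to the root fs?
   mountpoint /mnt
   mount | grep ' /mnt '
   # FUSE client log for a volume mounted at /mnt
   tail -n 100 /var/log/glusterfs/mnt.log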


On 11/18/2016 08:21 AM, Olivier Lambert wrote:

After Node 1 is DOWN, LIO on Node2 (the iSCSI target) is no longer writing
to the local Gluster mount, but to the root partition.

Despite "df -h" showing the Gluster brick mounted:

/dev/mapper/centos-root   3,1G3,1G   20K 100% /
...
/dev/xvdb  61G 61G  956M  99% /bricks/brick1
localhost:/gv0 61G 61G  956M  99% /mnt

If I unmount it, I still see the "block.img" in /mnt, which is filling
up the root space. So it's like FUSE is messing with the local Gluster
mount, which could lead to data corruption at the client level.

It doesn't make sense to me... What am I missing?

On Fri, Nov 18, 2016 at 5:00 PM, Olivier Lambert
 wrote:

Yes, I only did it once heal info showed the earlier result ("Number
of entries: 0"). But the result is the same: as soon as the second node is
offline (after they were both working/back online), everything is
corrupted.

To recap:

* Node 1 UP Node 2 UP -> OK
* Node 1 UP Node 2 DOWN -> OK (just a small lag for multipath to see
the path down and change if necessary)
* Node 1 UP Node 2 UP -> OK (and waiting to have no entries displayed
in heal command)
* Node 1 DOWN Node 2 UP -> NOT OK (data corruption)

On Fri, Nov 18, 2016 at 3:39 PM, David Gossage
 wrote:

On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert 
wrote:

Hi David,

What are the exact commands to be sure it's fine?

Right now I got:

# gluster volume heal gv0 info
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0



Did you run this before taking down 2nd node to see if any heals were
ongoing?

Also I see you have sharding enabled.  Are your files being served sharded
already as well?


Everything is online and working, but this command gives a strange output:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.

Is it normal?


I don't think that is a valid command anymore, as when I run it I get the same
message, and this is in the logs:
  [2016-11-18 14:35:02.260503] I [MSGID: 106533]
[glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume GLUSTER1
[2016-11-18 14:35:02.263341] W [MSGID: 106530]
[glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
find the heal information.
[2016-11-18 14:35:02.263365] E [MSGID: 106301]
[glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
operation 'Volume Heal' failed on localhost : Command not supported. Please
use "gluster volume heal GLUSTER1 info" and logs to find the heal
information.


On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
 wrote:

On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert

wrote:

Okay, I used the exact same config you provided, and added an arbiter
node (node3).

After halting node2, the VM continues to work after a small "lag"/freeze.
I restarted node2 and it was back online: OK

Then, after waiting a few minutes, I halted node1. And **just** at this
moment, the VM was corrupted (segmentation fault, /var/log folder empty,
etc.)


Other than waiting a few minutes did you make sure heals had completed?


dmesg of the VM:

[ 1645.852905] EXT4-fs error (device xvda1):
htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
entry in directory: rec_len is smaller than minimal - offset=0(0),
inode=0, rec_len=0, name_len=0
[ 1645.854509] Aborting journal on device xvda1-8.
[ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only

And got a lot of " comm bash: bad entry in directory" messages then...

Here is the current config with all Node back online:

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/bricks/brick1/gv0
Brick2: 10.0.0.2:/bricks/brick1/gv0
Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.shard: on
features.shard-block-size: 16MB
network.remote-dio: enable
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.stat-prefetch: on
performance.strict-write-ordering: off
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.data-self-heal: on


# gluster volume status
Status of volume: gv0
Gluster process TCP 

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Olivier Lambert
After Node 1 is DOWN, LIO on Node2 (the iSCSI target) is no longer writing
to the local Gluster mount, but to the root partition.

Despite "df -h" showing the Gluster brick mounted:

/dev/mapper/centos-root   3,1G3,1G   20K 100% /
...
/dev/xvdb  61G 61G  956M  99% /bricks/brick1
localhost:/gv0 61G 61G  956M  99% /mnt

If I unmount it, I still see the "block.img" in /mnt, which is filling
up the root space. So it's like FUSE is messing with the local Gluster
mount, which could lead to data corruption at the client level.

It doesn't make sense to me... What am I missing?

On Fri, Nov 18, 2016 at 5:00 PM, Olivier Lambert
 wrote:
> Yes, I did it only if I have the previous result of heal info ("number
> of entries: 0"). But same result, as soon as the second Node is
> offline (after they were both working/back online), everything is
> corrupted.
>
> To recap:
>
> * Node 1 UP Node 2 UP -> OK
> * Node 1 UP Node 2 DOWN -> OK (just a small lag for multipath to see
> the path down and change if necessary)
> * Node 1 UP Node 2 UP -> OK (and waiting to have no entries displayed
> in heal command)
> * Node 1 DOWN Node 2 UP -> NOT OK (data corruption)
>
> On Fri, Nov 18, 2016 at 3:39 PM, David Gossage
>  wrote:
>> On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert 
>> wrote:
>>>
>>> Hi David,
>>>
>>> What are the exact commands to be sure it's fine?
>>>
>>> Right now I got:
>>>
>>> # gluster volume heal gv0 info
>>> Brick 10.0.0.1:/bricks/brick1/gv0
>>> Status: Connected
>>> Number of entries: 0
>>>
>>> Brick 10.0.0.2:/bricks/brick1/gv0
>>> Status: Connected
>>> Number of entries: 0
>>>
>>> Brick 10.0.0.3:/bricks/brick1/gv0
>>> Status: Connected
>>> Number of entries: 0
>>>
>>>
>> Did you run this before taking down 2nd node to see if any heals were
>> ongoing?
>>
>> Also I see you have sharding enabled.  Are your files being served sharded
>> already as well?
>>
>>>
>>> Everything is online and working, but this command gives a strange output:
>>>
>>> # gluster volume heal gv0 info heal-failed
>>> Gathering list of heal failed entries on volume gv0 has been
>>> unsuccessful on bricks that are down. Please check if all brick
>>> processes are running.
>>>
>>> Is it normal?
>>
>>
>> I don't think that is a valid command anymore, as when I run it I get the
>> same message, and this is in the logs:
>>  [2016-11-18 14:35:02.260503] I [MSGID: 106533]
>> [glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
>> Received heal vol req for volume GLUSTER1
>> [2016-11-18 14:35:02.263341] W [MSGID: 106530]
>> [glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
>> not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
>> find the heal information.
>> [2016-11-18 14:35:02.263365] E [MSGID: 106301]
>> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
>> operation 'Volume Heal' failed on localhost : Command not supported. Please
>> use "gluster volume heal GLUSTER1 info" and logs to find the heal
>> information.
>>
>>>
>>> On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
>>>  wrote:
>>> >
>>> > On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
>>> > 
>>> > wrote:
>>> >>
>>> >> Okay, used the exact same config you provided, and adding an arbiter
>>> >> node (node3)
>>> >>
>>> >> After halting node2, VM continues to work after a small "lag"/freeze.
>>> >> I restarted node2 and it was back online: OK
>>> >>
>>> >> Then, after waiting few minutes, halting node1. And **just** at this
>>> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>>> >> etc.)
>>> >>
>>> > Other than waiting a few minutes did you make sure heals had completed?
>>> >
>>> >>
>>> >> dmesg of the VM:
>>> >>
>>> >> [ 1645.852905] EXT4-fs error (device xvda1):
>>> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>>> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
>>> >> inode=0, rec_len=0, name_len=0
>>> >> [ 1645.854509] Aborting journal on device xvda1-8.
>>> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>> >>
>>> >> And got a lot of " comm bash: bad entry in directory" messages then...
>>> >>
>>> >> Here is the current config with all Node back online:
>>> >>
>>> >> # gluster volume info
>>> >>
>>> >> Volume Name: gv0
>>> >> Type: Replicate
>>> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>>> >> Status: Started
>>> >> Snapshot Count: 0
>>> >> Number of Bricks: 1 x (2 + 1) = 3
>>> >> Transport-type: tcp
>>> >> Bricks:
>>> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
>>> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
>>> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>>> >> Options Reconfigured:
>>> >> nfs.disable: on
>>> >> performance.readdir-ahead: on
>>> >> transport.address-family: inet
>>> >> features.shard: on
>>> >> features.shard-block-size: 

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Olivier Lambert
Yes, I only did it once heal info showed the earlier result ("Number
of entries: 0"). But the result is the same: as soon as the second node is
offline (after they were both working/back online), everything is
corrupted.

To recap:

* Node 1 UP Node 2 UP -> OK
* Node 1 UP Node 2 DOWN -> OK (just a small lag for multipath to see
the path down and change if necessary)
* Node 1 UP Node 2 UP -> OK (and waiting to have no entries displayed
in heal command)
* Node 1 DOWN Node 2 UP -> NOT OK (data corruption)
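A minimal way to check, between the steps above, that all heals have actually
finished, using only the heal command already shown in this thread:

   # repeat on either node until every brick reports "Number of entries: 0"
   gluster volume heal gv0 info
   # or keep an eye on it
   watch -n 10 gluster volume heal gv0 info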

On Fri, Nov 18, 2016 at 3:39 PM, David Gossage
 wrote:
> On Fri, Nov 18, 2016 at 3:49 AM, Olivier Lambert 
> wrote:
>>
>> Hi David,
>>
>> What are the exact commands to be sure it's fine?
>>
>> Right now I got:
>>
>> # gluster volume heal gv0 info
>> Brick 10.0.0.1:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.2:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 10.0.0.3:/bricks/brick1/gv0
>> Status: Connected
>> Number of entries: 0
>>
>>
> Did you run this before taking down 2nd node to see if any heals were
> ongoing?
>
> Also I see you have sharding enabled.  Are your files being served sharded
> already as well?
>
>>
>> Everything is online and working, but this command gives a strange output:
>>
>> # gluster volume heal gv0 info heal-failed
>> Gathering list of heal failed entries on volume gv0 has been
>> unsuccessful on bricks that are down. Please check if all brick
>> processes are running.
>>
>> Is it normal?
>
>
> I don't think that is a valid command anymore, as when I run it I get the
> same message, and this is in the logs:
>  [2016-11-18 14:35:02.260503] I [MSGID: 106533]
> [glusterd-volume-ops.c:878:__glusterd_handle_cli_heal_volume] 0-management:
> Received heal vol req for volume GLUSTER1
> [2016-11-18 14:35:02.263341] W [MSGID: 106530]
> [glusterd-volume-ops.c:1882:glusterd_handle_heal_cmd] 0-management: Command
> not supported. Please use "gluster volume heal GLUSTER1 info" and logs to
> find the heal information.
> [2016-11-18 14:35:02.263365] E [MSGID: 106301]
> [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Heal' failed on localhost : Command not supported. Please
> use "gluster volume heal GLUSTER1 info" and logs to find the heal
> information.
>
>>
>> On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
>>  wrote:
>> >
>> > On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert
>> > 
>> > wrote:
>> >>
>> >> Okay, used the exact same config you provided, and adding an arbiter
>> >> node (node3)
>> >>
>> >> After halting node2, VM continues to work after a small "lag"/freeze.
>> >> I restarted node2 and it was back online: OK
>> >>
>> >> Then, after waiting few minutes, halting node1. And **just** at this
>> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>> >> etc.)
>> >>
>> > Other than waiting a few minutes did you make sure heals had completed?
>> >
>> >>
>> >> dmesg of the VM:
>> >>
>> >> [ 1645.852905] EXT4-fs error (device xvda1):
>> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> >> inode=0, rec_len=0, name_len=0
>> >> [ 1645.854509] Aborting journal on device xvda1-8.
>> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>> >>
>> >> And got a lot of " comm bash: bad entry in directory" messages then...
>> >>
>> >> Here is the current config with all Node back online:
>> >>
>> >> # gluster volume info
>> >>
>> >> Volume Name: gv0
>> >> Type: Replicate
>> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> >> Status: Started
>> >> Snapshot Count: 0
>> >> Number of Bricks: 1 x (2 + 1) = 3
>> >> Transport-type: tcp
>> >> Bricks:
>> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> >> Options Reconfigured:
>> >> nfs.disable: on
>> >> performance.readdir-ahead: on
>> >> transport.address-family: inet
>> >> features.shard: on
>> >> features.shard-block-size: 16MB
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >>
>> >> # gluster volume status
>> >> Status of volume: gv0
>> >> Gluster process TCP Port  RDMA Port  Online
>> >> Pid
>> >>
>> >>
>> >> --
>> >> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>> >> 1331
>> >> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>> >> 2274
>> >> Brick 10.0.0.3:/bricks/brick1/gv0   49152 

[Gluster-users] Conflicting info on whether replicated bricks both online

2016-11-18 Thread Whit Blauvelt
On the one hand:

  # gluster volume heal foretee info healed
  Gathering list of healed entries on volume foretee has been unsuccessful on
  bricks that are down. Please check if all brick processes are running.
  root@bu-4t-a:/mnt/gluster# gluster volume status foretee
  Status of volume: foretee
  Gluster process TCP Port  RDMA Port  Online  Pid
  ------------------------------------------------------------------------------
  Brick bu-4t-a:/mnt/gluster  49153 0  Y   9807 
  Brick bu-4t-b:/mnt/gluster  49152 0  Y   24638
  Self-heal Daemon on localhost   N/A   N/AY   2743 
  Self-heal Daemon on bu-4t-b N/A   N/AY   12819
   
  Task Status of Volume foretee
  ------------------------------------------------------------------------------
  There are no active volume tasks

On the other:

  # gluster volume heal foretee info healed
  Gathering list of healed entries on volume foretee has been unsuccessful on
  bricks that are down. Please check if all brick processes are running.

And:

  # gluster volume heal foretee info
  ...
   
   
  Status: Connected
  Number of entries: 3141

Both systems have their bricks in /mnt/gluster, and the volume then mounted
in /backups. I can write or delete a file in /backups on either system, and
it appears in both /backups on the other, and in /mnt/gluster on both. 

So Gluster is working. There have only ever been the two bricks. But there
are 3141 entries that won't heal, and a suggestion that one of the bricks is
offline -- when they're both plainly there.

This is with glusterfs 3.8.5 on Ubuntu 16.04.1.

What's my next move?

Thanks, 
Whit
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Olivier Lambert
Okay, got it attached :)

On Fri, Nov 18, 2016 at 11:00 AM, Krutika Dhananjay  wrote:
> Assuming you're using FUSE, if your gluster volume is mounted at /some/dir,
> for example,
> then its corresponding logs will be at /var/log/glusterfs/some-dir.log
>
> -Krutika
>
> On Fri, Nov 18, 2016 at 7:13 AM, Olivier Lambert 
> wrote:
>>
>> Attached, bricks log. Where could I find the fuse client log?
>>
>> On Fri, Nov 18, 2016 at 2:22 AM, Krutika Dhananjay 
>> wrote:
>> > Could you attach the fuse client and brick logs?
>> >
>> > -Krutika
>> >
>> > On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert
>> > 
>> > wrote:
>> >>
>> >> Okay, used the exact same config you provided, and adding an arbiter
>> >> node (node3)
>> >>
>> >> After halting node2, VM continues to work after a small "lag"/freeze.
>> >> I restarted node2 and it was back online: OK
>> >>
>> >> Then, after waiting few minutes, halting node1. And **just** at this
>> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
>> >> etc.)
>> >>
>> >> dmesg of the VM:
>> >>
>> >> [ 1645.852905] EXT4-fs error (device xvda1):
>> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> >> inode=0, rec_len=0, name_len=0
>> >> [ 1645.854509] Aborting journal on device xvda1-8.
>> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>> >>
>> >> And got a lot of " comm bash: bad entry in directory" messages then...
>> >>
>> >> Here is the current config with all Node back online:
>> >>
>> >> # gluster volume info
>> >>
>> >> Volume Name: gv0
>> >> Type: Replicate
>> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> >> Status: Started
>> >> Snapshot Count: 0
>> >> Number of Bricks: 1 x (2 + 1) = 3
>> >> Transport-type: tcp
>> >> Bricks:
>> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> >> Options Reconfigured:
>> >> nfs.disable: on
>> >> performance.readdir-ahead: on
>> >> transport.address-family: inet
>> >> features.shard: on
>> >> features.shard-block-size: 16MB
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >>
>> >> # gluster volume status
>> >> Status of volume: gv0
>> >> Gluster process TCP Port  RDMA Port  Online
>> >> Pid
>> >>
>> >>
>> >> --
>> >> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>> >> 1331
>> >> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>> >> 2274
>> >> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
>> >> 2355
>> >> Self-heal Daemon on localhost   N/A   N/AY
>> >> 2300
>> >> Self-heal Daemon on 10.0.0.3N/A   N/AY
>> >> 10530
>> >> Self-heal Daemon on 10.0.0.2N/A   N/AY
>> >> 2425
>> >>
>> >> Task Status of Volume gv0
>> >>
>> >>
>> >> --
>> >> There are no active volume tasks
>> >>
>> >>
>> >>
>> >> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>> >>  wrote:
>> >> > It's planned to have an arbiter soon :) It was just preliminary
>> >> > tests.
>> >> >
>> >> > Thanks for the settings, I'll test this soon and I'll come back to
>> >> > you!
>> >> >
>> >> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> >> >  wrote:
>> >> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >> >>>
>> >> >>> gluster volume info gv0
>> >> >>>
>> >> >>> Volume Name: gv0
>> >> >>> Type: Replicate
>> >> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >> >>> Status: Started
>> >> >>> Snapshot Count: 0
>> >> >>> Number of Bricks: 1 x 2 = 2
>> >> >>> Transport-type: tcp
>> >> >>> Bricks:
>> >> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >> >>> Options Reconfigured:
>> >> >>> nfs.disable: on
>> >> >>> performance.readdir-ahead: on
>> >> >>> transport.address-family: inet
>> >> >>> features.shard: on
>> >> >>> features.shard-block-size: 16MB
>> >> >>
>> >> >>
>> >> >>
>> >> >> When hosting VMs it's essential to set these options:
>> >> >>
>> >> >> network.remote-dio: enable
>> >> >> cluster.eager-lock: enable
>> >> >> performance.io-cache: off
>> >> >> performance.read-ahead: off
>> >> >> performance.quick-read: off
>> >> >> performance.stat-prefetch: on
>> >> >> performance.strict-write-ordering: off
>> >> >> 

[Gluster-users] Search Indexer for Files and Folder

2016-11-18 Thread kontakt

Hi,
I have a distributed glusterfs with over 77 TB of folders and files. The
gluster is not online 24/7. To check whether certain files exist I create,
while the gluster is online, a simple text file with something like
ls -ahls -R ../glustfs/. A simple search with grep then shows whether a file
is on the gluster or not. I think there is a better way, and want to ask if
someone uses some kind of search indexer... A good solution would run under
Debian or CentOS with a fast create-and-update mechanism for the files in
glusterfs, perhaps with MySQL or MariaDB storage. For searching offline, it
would be good to have a website to run searches and show the results.
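A minimal sketch of the approach described above, assuming the volume is
mounted at ../glustfs/ as in the ls example (index path and search term are
placeholders):

   # rebuild the index whenever the gluster is online
   find ../glustfs/ -printf '%y %s %TY-%Tm-%Td %p\n' > /var/tmp/glusterfs-index.txt
   # search the index offline
   grep -i 'somefile' /var/tmp/glusterfs-index.txt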


Any Idea?
Taste
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Krutika Dhananjay
Assuming you're using FUSE, if your gluster volume is mounted at /some/dir,
for example,
then its corresponding logs will be at /var/log/glusterfs/some-dir.log
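For example, assuming a volume mounted at /mnt/gluster, the path separators
in the mount point become dashes in the log file name:

   # FUSE client log for a volume mounted at /mnt/gluster
   less /var/log/glusterfs/mnt-gluster.log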

-Krutika

On Fri, Nov 18, 2016 at 7:13 AM, Olivier Lambert 
wrote:

> Attached, bricks log. Where could I find the fuse client log?
>
> On Fri, Nov 18, 2016 at 2:22 AM, Krutika Dhananjay 
> wrote:
> > Could you attach the fuse client and brick logs?
> >
> > -Krutika
> >
> > On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert <
> lambert.oliv...@gmail.com>
> > wrote:
> >>
> >> Okay, used the exact same config you provided, and adding an arbiter
> >> node (node3)
> >>
> >> After halting node2, VM continues to work after a small "lag"/freeze.
> >> I restarted node2 and it was back online: OK
> >>
> >> Then, after waiting few minutes, halting node1. And **just** at this
> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
> >> etc.)
> >>
> >> dmesg of the VM:
> >>
> >> [ 1645.852905] EXT4-fs error (device xvda1):
> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
> >> inode=0, rec_len=0, name_len=0
> >> [ 1645.854509] Aborting journal on device xvda1-8.
> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
> >>
> >> And got a lot of " comm bash: bad entry in directory" messages then...
> >>
> >> Here is the current config with all Node back online:
> >>
> >> # gluster volume info
> >>
> >> Volume Name: gv0
> >> Type: Replicate
> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> >> Status: Started
> >> Snapshot Count: 0
> >> Number of Bricks: 1 x (2 + 1) = 3
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> >> Options Reconfigured:
> >> nfs.disable: on
> >> performance.readdir-ahead: on
> >> transport.address-family: inet
> >> features.shard: on
> >> features.shard-block-size: 16MB
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >>
> >> # gluster volume status
> >> Status of volume: gv0
> >> Gluster process TCP Port  RDMA Port  Online
> >> Pid
> >>
> >> 
> --
> >> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
> >> 1331
> >> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
> >> 2274
> >> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
> >> 2355
> >> Self-heal Daemon on localhost   N/A   N/AY
> >> 2300
> >> Self-heal Daemon on 10.0.0.3N/A   N/AY
> >> 10530
> >> Self-heal Daemon on 10.0.0.2N/A   N/AY
> >> 2425
> >>
> >> Task Status of Volume gv0
> >>
> >> 
> --
> >> There are no active volume tasks
> >>
> >>
> >>
> >> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
> >>  wrote:
> >> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >> >
> >> > Thanks for the settings, I'll test this soon and I'll come back to
> you!
> >> >
> >> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> >> >  wrote:
> >> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >> >>>
> >> >>> gluster volume info gv0
> >> >>>
> >> >>> Volume Name: gv0
> >> >>> Type: Replicate
> >> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
> >> >>> Status: Started
> >> >>> Snapshot Count: 0
> >> >>> Number of Bricks: 1 x 2 = 2
> >> >>> Transport-type: tcp
> >> >>> Bricks:
> >> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >> >>> Options Reconfigured:
> >> >>> nfs.disable: on
> >> >>> performance.readdir-ahead: on
> >> >>> transport.address-family: inet
> >> >>> features.shard: on
> >> >>> features.shard-block-size: 16MB
> >> >>
> >> >>
> >> >>
> >> >> When hosting VMs it's essential to set these options:
> >> >>
> >> >> network.remote-dio: enable
> >> >> cluster.eager-lock: enable
> >> >> performance.io-cache: off
> >> >> performance.read-ahead: off
> >> >> performance.quick-read: off
> >> >> performance.stat-prefetch: on
> >> >> performance.strict-write-ordering: off
> >> >> cluster.server-quorum-type: server
> >> >> cluster.quorum-type: auto
> >> >> cluster.data-self-heal: on
> >> >>
> >> >> Also with replica two and quorum on (required) your volume will
> become
> >> >> read-only when one node goes down to prevent the possibility of
> >> >> split-brain
> >> >> - 

Re: [Gluster-users] Can we use geo-replication with distributed-replicated volumes?

2016-11-18 Thread Bipin Kunal
Unfortunately the upstream doc is not up to date with the failover and
failback commands.

But you can use the downstream doc:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html-single/Administration_Guide/index.html#sect-Disaster_Recovery

These steps should work fine for you.

We will try to update the upstream doc as early as possible.
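For reference, the failover/failback procedure in that guide builds on the
standard geo-replication session commands; a minimal sketch, with the master
volume, slave host and slave volume names as placeholders:

   # create and start a geo-replication session (names are placeholders)
   gluster volume geo-replication mastervol slavehost::slavevol create push-pem
   gluster volume geo-replication mastervol slavehost::slavevol start
   # monitor it
   gluster volume geo-replication mastervol slavehost::slavevol status
   # stop the session (e.g. before promoting the slave)
   gluster volume geo-replication mastervol slavehost::slavevol stop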

Thanks,
Bipin Kunal

On Thu, Nov 17, 2016 at 10:24 PM, Alexandr Porunov
 wrote:
> Thank you I will wait for it
>
> Sincerely,
> Alexandr
>
> On Thu, Nov 17, 2016 at 6:43 PM, Bipin Kunal  wrote:
>>
>> I don't have URL handy right now.  Will send you tomorrow. Texting from
>> mobile right now.
>>
>> Thanks,
>> Bipin
>>
>>
>> On Nov 17, 2016 9:00 PM, "Alexandr Porunov" 
>> wrote:
>>>
>>> Thank you for your help!
>>>
>>> Could you please give me a link or some information about failover? How
>>> to change a master state to a slave state?
>>>
>>> Best regards,
>>> Alexandr
>>>
>>> On Thu, Nov 17, 2016 at 5:07 PM, Bipin Kunal  wrote:

 Please find my comments inline.

 On Nov 17, 2016 8:30 PM, "Alexandr Porunov" 
 wrote:
 >
 > Hello,
 >
 > I have several questions about Geo-replication. Please answer if you
 > can.
 >
 > 1. Can we use geo-replication with distributed-replicated volumes?
 Yes. You can.
 > 2. Can we use fewer servers in the slave datacenter than in the master
 > datacenter? (I.e. if I replicate a distributed-replicated volume that
 > consists of 10 servers to a slave datacenter with only 5 servers, for
 > example using fewer replicas in the slave datacenter.)
 Yes, you are free to do so. It is just recommended that the slave volume
 size be equal to the master volume's.
 > 3. Is there a possibility to enable failover? I.e. when the master
 > datacenter dies, can we change our slave to the master?
 Yes. You can promote the slave when the master dies, and when the master
 comes back you can fail back to it.
 >
 > Sincerely,
 > Alexandr
 >
 >
 Thanks,
 Bipin Kunal
 ___
 > Gluster-users mailing list
 > Gluster-users@gluster.org
 > http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Olivier Lambert
Hi David,

What are the exact commands to be sure it's fine?

Right now I got:

# gluster volume heal gv0 info
Brick 10.0.0.1:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.2:/bricks/brick1/gv0
Status: Connected
Number of entries: 0

Brick 10.0.0.3:/bricks/brick1/gv0
Status: Connected
Number of entries: 0


Everything is online and working, but this command gives a strange output:

# gluster volume heal gv0 info heal-failed
Gathering list of heal failed entries on volume gv0 has been
unsuccessful on bricks that are down. Please check if all brick
processes are running.

Is it normal?

On Fri, Nov 18, 2016 at 2:51 AM, David Gossage
 wrote:
>
> On Thu, Nov 17, 2016 at 6:42 PM, Olivier Lambert 
> wrote:
>>
>> Okay, I used the exact same config you provided, and added an arbiter
>> node (node3).
>>
>> After halting node2, the VM continues to work after a small "lag"/freeze.
>> I restarted node2 and it was back online: OK
>>
>> Then, after waiting a few minutes, I halted node1. And **just** at this
>> moment, the VM was corrupted (segmentation fault, /var/log folder empty,
>> etc.)
>>
> Other than waiting a few minutes did you make sure heals had completed?
>
>>
>> dmesg of the VM:
>>
>> [ 1645.852905] EXT4-fs error (device xvda1):
>> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
>> entry in directory: rec_len is smaller than minimal - offset=0(0),
>> inode=0, rec_len=0, name_len=0
>> [ 1645.854509] Aborting journal on device xvda1-8.
>> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>>
>> And got a lot of " comm bash: bad entry in directory" messages then...
>>
>> Here is the current config with all Node back online:
>>
>> # gluster volume info
>>
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> features.shard: on
>> features.shard-block-size: 16MB
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> performance.stat-prefetch: on
>> performance.strict-write-ordering: off
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.data-self-heal: on
>>
>>
>> # gluster volume status
>> Status of volume: gv0
>> Gluster process TCP Port  RDMA Port  Online
>> Pid
>>
>> --
>> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>> 1331
>> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>> 2274
>> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
>> 2355
>> Self-heal Daemon on localhost   N/A   N/AY
>> 2300
>> Self-heal Daemon on 10.0.0.3N/A   N/AY
>> 10530
>> Self-heal Daemon on 10.0.0.2N/A   N/AY
>> 2425
>>
>> Task Status of Volume gv0
>>
>> --
>> There are no active volume tasks
>>
>>
>>
>> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>>  wrote:
>> > It's planned to have an arbiter soon :) It was just preliminary tests.
>> >
>> > Thanks for the settings, I'll test this soon and I'll come back to you!
>> >
>> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
>> >  wrote:
>> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
>> >>>
>> >>> gluster volume info gv0
>> >>>
>> >>> Volume Name: gv0
>> >>> Type: Replicate
>> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
>> >>> Status: Started
>> >>> Snapshot Count: 0
>> >>> Number of Bricks: 1 x 2 = 2
>> >>> Transport-type: tcp
>> >>> Bricks:
>> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
>> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
>> >>> Options Reconfigured:
>> >>> nfs.disable: on
>> >>> performance.readdir-ahead: on
>> >>> transport.address-family: inet
>> >>> features.shard: on
>> >>> features.shard-block-size: 16MB
>> >>
>> >>
>> >>
> >> >> When hosting VMs it's essential to set these options:
>> >>
>> >> network.remote-dio: enable
>> >> cluster.eager-lock: enable
>> >> performance.io-cache: off
>> >> performance.read-ahead: off
>> >> performance.quick-read: off
>> >> performance.stat-prefetch: on
>> >> performance.strict-write-ordering: off
>> >> cluster.server-quorum-type: server
>> >> cluster.quorum-type: auto
>> >> cluster.data-self-heal: on
>> >>
>> >> Also with replica two and quorum on (required) your volume will become
>> >> read-only when one node goes down to prevent the possibility of