Re: [Gluster-users] gluster and qcow2 images

2019-06-28 Thread Alex K
Hi

On Fri, Jun 28, 2019, 17:54 Marcus Schopen  wrote:

> Hi,
>
> does anyone have experience with Gluster in KVM environments? I would
> like to keep the qcow2 images of one KVM host in sync with a second KVM
> host. Unfortunately, shared storage is not available to me, only the
> two KVM hosts. In principle, it would be sufficient for me - in case of
> a failure of the first KVM host - to start the guests on the second
> host by hand, without first restoring the images from the nightly
> backup. The question is: is GlusterFS a sensible solution here, or
> would other approaches, e.g. DRBD, be better? I have read contradictory
> statements about this; many advise against using Gluster for qcow2
> images, while others report no problems at all.
>
Red Hat uses Gluster in its RHEV product (oVirt is the open-source equivalent),
so Gluster can certainly be used here with good results. For decent performance
you will want a 10G network dedicated to the Gluster storage traffic, and
sharding enabled on the shared volume. Two-node setups are prone to split-brain
issues, which can cause headaches; I have been running such setups for years
and have hit a few splits, all of which I was able to recover from. You need
some fencing solution in place to minimize such issues. I would expect higher
performance from DRBD, though I am not aware of any GUI solution that
simplifies its management.
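
For what it's worth, a minimal sketch of such a setup (the host names, brick
paths, volume name and shard size below are only placeholders, not tested
values for your environment):

  # two data nodes plus a small third box acting as arbiter, to avoid pure 2-node split brain
  gluster volume create vmstore replica 3 arbiter 1 \
      kvm1:/bricks/vmstore/brick kvm2:/bricks/vmstore/brick arb1:/bricks/vmstore/brick
  # sharding splits large qcow2 files into pieces, so heals only touch the changed shards
  gluster volume set vmstore features.shard on
  gluster volume set vmstore features.shard-block-size 64MB
  gluster volume start vmstore

The arbiter brick stores only metadata, so it can live on any third machine,
and it greatly reduces the split-brain risk of a plain two-node volume.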

>
> Cheers
> Marcus
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Geo-Replication Changelog Error - is a directory

2019-06-28 Thread Matthew Benstead
Hello,

I'm having some issues successfully establishing a geo-replication
session between a 7-server distribute cluster as the primary volume and
a 2-server distribute cluster as the secondary volume. Both are running
the same version of Gluster on CentOS 7: glusterfs-5.3-2.el7.x86_64.

I was able to set up the replication keys, user, groups, etc. and
establish the session, but it goes faulty shortly after starting.
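
For reference, the session was established with the usual sequence, roughly
like this (the "geoaccount" user below is only a placeholder for the geo-rep
account; the secondary is the pcic-backup cluster mentioned further down):

  gluster system:: execute gsec_create
  gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup create push-pem
  gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup start
  gluster volume geo-replication storage geoaccount@10.0.231.81::pcic-backup status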

The error from the gsyncd.log is:

Changelog register failed error=[Errno 21] Is a directory

We made an attempt about 2 years ago to configure geo-replication but
abandoned it. Now, with a new cluster, I wanted to get it set up, but it
looks like changelogs have been accumulating since then:

[root@gluster07 .glusterfs]# ls -lh changelogs > /var/tmp/changelogs.txt
[root@gluster07 ~]# head /var/tmp/changelogs.txt
total 11G
-rw-r--r--. 1 root root  130 Jun 27 13:48 CHANGELOG
-rw-r--r--. 1 root root 2.6K Jun 19  2017 CHANGELOG.1497891971
-rw-r--r--. 1 root root  470 Jun 19  2017 CHANGELOG.1497892055
-rw-r--r--. 1 root root  186 Jun 19  2017 CHANGELOG.1497892195
-rw-r--r--. 1 root root  458 Jun 19  2017 CHANGELOG.1497892308
-rw-r--r--. 1 root root  188 Jun 19  2017 CHANGELOG.1497892491
-rw-r--r--. 1 root root  862 Jun 19  2017 CHANGELOG.1497892828
-rw-r--r--. 1 root root  11K Jun 19  2017 CHANGELOG.1497892927
-rw-r--r--. 1 root root 4.4K Jun 19  2017 CHANGELOG.1497892941
[root@gluster07 ~]# tail /var/tmp/changelogs.txt
-rw-r--r--. 1 root root  130 Jun 27 13:47 CHANGELOG.1561668463
-rw-r--r--. 1 root root  130 Jun 27 13:47 CHANGELOG.1561668477
-rw-r--r--. 1 root root  130 Jun 27 13:48 CHANGELOG.1561668491
-rw-r--r--. 1 root root  130 Jun 27 13:48 CHANGELOG.1561668506
-rw-r--r--. 1 root root  130 Jun 27 13:48 CHANGELOG.1561668521
-rw-r--r--. 1 root root  130 Jun 27 13:48 CHANGELOG.1561668536
-rw-r--r--. 1 root root  130 Jun 27 13:49 CHANGELOG.1561668550
-rw-r--r--. 1 root root  130 Jun 27 13:49 CHANGELOG.1561668565
drw---. 2 root root   10 Jun 19  2017 csnap
drw---. 2 root root   37 Jun 19  2017 htime

Could this be related?

When deleting the replication session I made sure to try the
'delete reset-sync-time' option, but it failed with:

gsyncd failed to delete session info for storage and
10.0.231.81::pcic-backup peers
geo-replication command failed
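
For reference, the delete was attempted roughly like this (stopping the
session first; the peer is named the same way the error above reports it):

  gluster volume geo-replication storage 10.0.231.81::pcic-backup stop force
  gluster volume geo-replication storage 10.0.231.81::pcic-backup delete reset-sync-time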

Here is the volume info:

[root@gluster07 ~]# gluster volume info storage

Volume Name: storage
Type: Distribute
Volume ID: 6f95525a-94d7-4174-bac4-e1a18fe010a2
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.0.231.50:/mnt/raid6-storage/storage
Brick2: 10.0.231.51:/mnt/raid6-storage/storage
Brick3: 10.0.231.52:/mnt/raid6-storage/storage
Brick4: 10.0.231.53:/mnt/raid6-storage/storage
Brick5: 10.0.231.54:/mnt/raid6-storage/storage
Brick6: 10.0.231.55:/mnt/raid6-storage/storage
Brick7: 10.0.231.56:/mnt/raid6-storage/storage
Options Reconfigured:
features.quota-deem-statfs: on
features.read-only: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
transport.address-family: inet


Any ideas?

Thanks,
 -Matthew
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] gluster and qcow2 images

2019-06-28 Thread Marcus Schopen
Hi,

does anyone have experience with Gluster in KVM environments? I would
like to keep the qcow2 images of one KVM host in sync with a second KVM
host. Unfortunately, shared storage is not available to me, only the
two KVM hosts. In principle, it would be sufficient for me - in case of
a failure of the first KVM host - to start the guests on the second
host by hand, without first restoring the images from the nightly
backup. The question is: is GlusterFS a sensible solution here, or
would other approaches, e.g. DRBD, be better? I have read contradictory
statements about this; many advise against using Gluster for qcow2
images, while others report no problems at all.

Cheers
Marcus


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Removing subvolume from dist/rep volume

2019-06-28 Thread Dave Sherohman
On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
> 
>1. Check the remove-brick status for any failures.  If there are any,
>check the rebalance log file for errors.
>2. Even if there are no failures, check the removed bricks to see if any
>files have not been migrated. If there are any, please check that they are
>valid files on the brick and copy them to the volume from the brick to the
>mount point.

Well, looks like I hit one of those edge cases.  Probably because of
some issues around a reboot last September which left a handful of files
in a state where self-heal identified them as needing to be healed but
was unable to actually heal them.  (Check the list archives for
"Kicking a stuck heal", posted on Sept 4, if you want more details.)

So I'm getting 9 failures on the arbiter (merlin), 8 on one data brick
(gandalf), and 3 on the other (saruman).  Looking in
/var/log/gluster/palantir-rebalance.log, I see those numbers of

migrate file failed: /.shard/291e9749-2d1b-47af-ad53-3a09ad4e64c6.229: failed to lock file on palantir-replicate-1 [Stale file handle]

errors.

Also, merlin has four errors, and gandalf has one, of the form:

Gfid mismatch detected for /0f500288-ff62-4f0b-9574-53f510b4159f.2898>, 9f00c0fe-58c3-457e-a2e6-f6a006d1cfc6 on palantir-client-7 and 08bb7cdc-172b-4c21-916a-2a244c095a3e on palantir-client-1.

There are no gfid mismatches recorded on saruman.  All of the gfid
mismatches are for  and (on
saruman) appear to correspond to 0-byte files (e.g.,
.shard/0f500288-ff62-4f0b-9574-53f510b4159f.2898, in the case of the
gfid mismatch quoted above).

For both types of errors, all affected files are in .shard/ and have
UUID-style names, so I have no idea which actual files they belong to.
File sizes are generally either 0 bytes or 4M (exactly), although one of
them has a size slightly larger than 3M.  So I'm assuming they're chunks
of larger files (which would be almost all the files on the volume -
it's primarily holding disk image files for kvm servers).

Web searches generally seem to consider gfid mismatches to be a form of
split-brain, but `gluster volume heal palantir info split-brain` shows
"Number of entries in split-brain: 0" for all bricks, including those
bricks which are reporting gfid mismatches.


Given all that, how do I proceed with cleaning up the stale handle
issues?  I would guess that this will involve somehow converting the
shard filename to a "real" filename, then shutting down the
corresponding VM and maybe doing some additional cleanup.
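
My working assumption is that the prefix of a shard's name is the GFID of
the base file, so something like this on a brick should map a shard back to
its original path (untested sketch, using one of the GFIDs listed above):

  # .glusterfs keeps a hard link to every regular file, named by its GFID
  ls -i /var/local/brick0/data/.glusterfs/0f/50/0f500288-ff62-4f0b-9574-53f510b4159f
  # find the real file on the brick with that same inode number
  find /var/local/brick0/data -samefile \
      /var/local/brick0/data/.glusterfs/0f/50/0f500288-ff62-4f0b-9574-53f510b4159f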

And then there's the gfid mismatches.  Since they're for 0-byte files,
is it safe to just ignore them on the assumption that they only hold
metadata?  Or do I need to do some kind of split-brain resolution on
them (even though gluster says no files are in split-brain)?


Finally, a listing of /var/local/brick0/data/.shard on saruman, in case
any of the information it contains (like file sizes/permissions) might
provide clues to resolving the errors:

--- cut here ---
root@saruman:/var/local/brick0/data/.shard# ls -l
total 63996
-rw-rw 2 root libvirt-qemu   0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2864
-rw-rw 2 root libvirt-qemu   0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2868
-rw-rw 2 root libvirt-qemu   0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2879
-rw-rw 2 root libvirt-qemu   0 Sep 17  2018 0f500288-ff62-4f0b-9574-53f510b4159f.2898
-rw--- 2 root libvirt-qemu 4194304 May 17 14:42 291e9749-2d1b-47af-ad53-3a09ad4e64c6.229
-rw--- 2 root libvirt-qemu 4194304 Jun 24 09:10 291e9749-2d1b-47af-ad53-3a09ad4e64c6.925
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 12:54 2df12cb0-6cf4-44ae-8b0a-4a554791187e.266
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 16:30 2df12cb0-6cf4-44ae-8b0a-4a554791187e.820
-rw-r--r-- 2 root libvirt-qemu 4194304 Jun 17 20:22 323186b1-6296-4cbe-8275-b940cc9d65cf.27466
-rw-r--r-- 2 root libvirt-qemu 4194304 Jun 27 05:01 323186b1-6296-4cbe-8275-b940cc9d65cf.32575
-rw-r--r-- 2 root libvirt-qemu 3145728 Jun 11 13:23 323186b1-6296-4cbe-8275-b940cc9d65cf.3448
-T 2 root libvirt-qemu   0 Jun 28 14:26 4cd094f4-0344-4660-98b0-83249d5bd659.22998
-rw--- 2 root libvirt-qemu 4194304 Mar 13  2018 6cdd2e5c-f49e-492b-8039-239e71577836.1302
-T 2 root libvirt-qemu   0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.47131
-T 2 root libvirt-qemu   0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.52615
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 08:56 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.100
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 11:29 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.106
-rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 28 02:35 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.137
-rw-rw-r-- 2 root libvirt-qemu 4194304 Nov  4  2018 

Re: [Gluster-users] Removing subvolume from dist/rep volume

2019-06-28 Thread Dave Sherohman
OK, I'm just careless.  Forgot to include "start" after the list of
bricks...
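
So presumably the missing piece is just appending "start", e.g. something
along the lines of:

  gluster volume remove-brick palantir saruman:/var/local/brick0/data \
      gandalf:/var/local/brick0/data merlin:/var/local/arbiter1/data start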

On Fri, Jun 28, 2019 at 04:03:40AM -0500, Dave Sherohman wrote:
> On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> > On Tue, 25 Jun 2019 at 15:26, Dave Sherohman  wrote:
> > > My objective is to remove nodes B and C entirely.
> > >
> > > First up is to pull their bricks from the volume:
> > >
> > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
> > > (wait for data to be migrated)
> > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
> > >
> > >
> > There are some edge cases that may prevent a file from being migrated
> > during a remove-brick. Please do the following after this:
> > 
> >1. Check the remove-brick status for any failures.  If there are any,
> >check the rebalance log file for errors.
> >2. Even if there are no failures, check the removed bricks to see if any
> >files have not been migrated. If there are any, please check that they 
> > are
> >valid files on the brick and copy them to the volume from the brick to 
> > the
> >mount point.
> > 
> > The rest of the steps look good.
> 
> Apparently, they weren't quite right.  I tried it and it just gives me
> the usage notes in return.  Transcript of the commands and output is below.
> 
> Any insight on how I got the syntax wrong?
> 
> --- cut here ---
> root@merlin:/# gluster volume status
> Status of volume: palantir
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick saruman:/var/local/brick0/data49153 0  Y   17995
> Brick gandalf:/var/local/brick0/data49153 0  Y   9415 
> Brick merlin:/var/local/arbiter1/data   49170 0  Y   35034
> Brick azathoth:/var/local/brick0/data   49153 0  Y   25312
> Brick yog-sothoth:/var/local/brick0/data49152 0  Y   10671
> Brick merlin:/var/local/arbiter2/data   49171 0  Y   35043
> Brick cthulhu:/var/local/brick0/data49153 0  Y   21925
> Brick mordiggian:/var/local/brick0/data 49152 0  Y   12368
> Brick merlin:/var/local/arbiter3/data   49172 0  Y   35050
> Self-heal Daemon on localhost   N/A   N/AY   1209 
> Self-heal Daemon on saruman.lub.lu.se   N/A   N/AY   23253
> Self-heal Daemon on gandalf.lub.lu.se   N/A   N/AY   9542 
> Self-heal Daemon on mordiggian.lub.lu.seN/A   N/AY   11016
> Self-heal Daemon on yog-sothoth.lub.lu.se   N/A   N/AY   8126 
> Self-heal Daemon on cthulhu.lub.lu.se   N/A   N/AY   30998
> Self-heal Daemon on azathoth.lub.lu.se  N/A   N/AY   34399
>  
> Task Status of Volume palantir
> --
> Task : Rebalance   
> ID   : e58bc091-5809-4364-af83-2b89bc5c7106
> Status   : completed   
>  
> root@merlin:/# gluster volume remove-brick palantir 
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data 
> merlin:/var/local/arbiter1/data
> 
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>
> 
> 
> root@merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1 
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data 
> merlin:/var/local/arbiter1/data
> 
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>
> 
> 
> root@merlin:/# gluster volume remove-brick palantir replica 3 
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data 
> merlin:/var/local/arbiter1/data
> 
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>
> 
> --- cut here ---
> 
> -- 
> Dave Sherohman


-- 
Dave Sherohman
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Removing subvolume from dist/rep volume

2019-06-28 Thread Nithya Balachandran
On Fri, 28 Jun 2019 at 14:34, Dave Sherohman  wrote:

> On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> > On Tue, 25 Jun 2019 at 15:26, Dave Sherohman  wrote:
> > > My objective is to remove nodes B and C entirely.
> > >
> > > First up is to pull their bricks from the volume:
> > >
> > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
> > > (wait for data to be migrated)
> > > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
> > >
> > >
> > There are some edge cases that may prevent a file from being migrated
> > during a remove-brick. Please do the following after this:
> >
> >1. Check the remove-brick status for any failures.  If there are any,
> >check the rebalance log file for errors.
> >2. Even if there are no failures, check the removed bricks to see if
> any
> >files have not been migrated. If there are any, please check that
> they are
> >valid files on the brick and copy them to the volume from the brick
> to the
> >mount point.
> >
> > The rest of the steps look good.
>
> Apparently, they weren't quite right.  I tried it and it just gives me
> the usage notes in return.  Transcript of the commands and output is below.
>
> Any insight on how I got the syntax wrong?
>
> --- cut here ---
> root@merlin:/# gluster volume status
> Status of volume: palantir
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick saruman:/var/local/brick0/data49153 0  Y   17995
> Brick gandalf:/var/local/brick0/data49153 0  Y   9415
> Brick merlin:/var/local/arbiter1/data   49170 0  Y   35034
> Brick azathoth:/var/local/brick0/data   49153 0  Y   25312
> Brick yog-sothoth:/var/local/brick0/data49152 0  Y   10671
> Brick merlin:/var/local/arbiter2/data   49171 0  Y   35043
> Brick cthulhu:/var/local/brick0/data49153 0  Y   21925
> Brick mordiggian:/var/local/brick0/data 49152 0  Y   12368
> Brick merlin:/var/local/arbiter3/data   49172 0  Y   35050
> Self-heal Daemon on localhost   N/A   N/AY   1209
> Self-heal Daemon on saruman.lub.lu.se   N/A   N/AY   23253
> Self-heal Daemon on gandalf.lub.lu.se   N/A   N/AY   9542
> Self-heal Daemon on mordiggian.lub.lu.seN/A   N/AY   11016
> Self-heal Daemon on yog-sothoth.lub.lu.se   N/A   N/AY   8126
> Self-heal Daemon on cthulhu.lub.lu.se   N/A   N/AY   30998
> Self-heal Daemon on azathoth.lub.lu.se  N/A   N/AY   34399
>
> Task Status of Volume palantir
>
> --
> Task : Rebalance
> ID   : e58bc091-5809-4364-af83-2b89bc5c7106
> Status   : completed
>
> root@merlin:/# gluster volume remove-brick palantir
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
> merlin:/var/local/arbiter1/data
>
>

You had it right in the first email:

 gluster volume remove-brick palantir replica 3 arbiter 1
saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
merlin:/var/local/arbiter1/data *start*


Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>
> 
>
> root@merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
> merlin:/var/local/arbiter1/data
>
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>
> 
>
> root@merlin:/# gluster volume remove-brick palantir replica 3
> saruman:/var/local/brick0/data gandalf:/var/local/brick0/data
> merlin:/var/local/arbiter1/data
>
> Usage:
> volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>
> 
> --- cut here ---
>
> --
> Dave Sherohman
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Removing subvolume from dist/rep volume

2019-06-28 Thread Dave Sherohman
On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> On Tue, 25 Jun 2019 at 15:26, Dave Sherohman  wrote:
> > My objective is to remove nodes B and C entirely.
> >
> > First up is to pull their bricks from the volume:
> >
> > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 start
> > (wait for data to be migrated)
> > # gluster volume remove-brick myvol B:/data C:/data A:/arb1 commit
> >
> >
> There are some edge cases that may prevent a file from being migrated
> during a remove-brick. Please do the following after this:
> 
>1. Check the remove-brick status for any failures.  If there are any,
>check the rebalance log file for errors.
>2. Even if there are no failures, check the removed bricks to see if any
>files have not been migrated. If there are any, please check that they are
>valid files on the brick and copy them to the volume from the brick to the
>mount point.
> 
> The rest of the steps look good.

Apparently, they weren't quite right.  I tried it and it just gives me
the usage notes in return.  Transcript of the commands and output is below.

Any insight on how I got the syntax wrong?

--- cut here ---
root@merlin:/# gluster volume status
Status of volume: palantir
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick saruman:/var/local/brick0/data49153 0  Y   17995
Brick gandalf:/var/local/brick0/data49153 0  Y   9415 
Brick merlin:/var/local/arbiter1/data   49170 0  Y   35034
Brick azathoth:/var/local/brick0/data   49153 0  Y   25312
Brick yog-sothoth:/var/local/brick0/data49152 0  Y   10671
Brick merlin:/var/local/arbiter2/data   49171 0  Y   35043
Brick cthulhu:/var/local/brick0/data49153 0  Y   21925
Brick mordiggian:/var/local/brick0/data 49152 0  Y   12368
Brick merlin:/var/local/arbiter3/data   49172 0  Y   35050
Self-heal Daemon on localhost   N/A   N/AY   1209 
Self-heal Daemon on saruman.lub.lu.se   N/A   N/AY   23253
Self-heal Daemon on gandalf.lub.lu.se   N/A   N/AY   9542 
Self-heal Daemon on mordiggian.lub.lu.seN/A   N/AY   11016
Self-heal Daemon on yog-sothoth.lub.lu.se   N/A   N/AY   8126 
Self-heal Daemon on cthulhu.lub.lu.se   N/A   N/AY   30998
Self-heal Daemon on azathoth.lub.lu.se  N/A   N/AY   34399
 
Task Status of Volume palantir
--
Task : Rebalance   
ID   : e58bc091-5809-4364-af83-2b89bc5c7106
Status   : completed   
 
root@merlin:/# gluster volume remove-brick palantir 
saruman:/var/local/brick0/data gandalf:/var/local/brick0/data 
merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>


root@merlin:/# gluster volume remove-brick palantir replica 3 arbiter 1 
saruman:/var/local/brick0/data gandalf:/var/local/brick0/data 
merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>


root@merlin:/# gluster volume remove-brick palantir replica 3 
saruman:/var/local/brick0/data gandalf:/var/local/brick0/data 
merlin:/var/local/arbiter1/data

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

--- cut here ---

-- 
Dave Sherohman
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users