[Gluster-users] active-active georeplication?

2017-10-23 Thread atris adam
hi everybody,

Have glusterfs released a feature named active-active georeplication? If
yes, in which version it is released? If no, is it planned to have this
feature?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] problems running a vol over IPoIB, and qemu off it?

2017-10-23 Thread Mohammed Rafi K C
The backtrace you have provided here suggests that the issue could be
with the Mellanox driver, though the question is still valid for users of IPoIB.
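
If it helps to narrow that down, a quick way to capture the driver and
firmware details is something like the following (assuming the IPoIB
interface rides on an mlx4 HCA and infiniband-diags is installed):

  # PCI device and loaded mlx4 modules
  lspci | grep -i mellanox
  lsmod | grep mlx4

  # HCA firmware and driver module versions
  ibstat
  modinfo mlx4_core | grep -i version

  # earlier mlx4/ipoib warnings and the memory fragmentation state,
  # since the quoted trace is a page allocation failure
  dmesg | grep -iE "mlx4|ipoib|page allocation"
  cat /proc/buddyinfo

Posting that output along with the exact kernel and driver versions would
make it easier to match this against known mlx4 issues.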


Regards

Rafi KC


On 10/23/2017 09:29 PM, lejeczek wrote:
> hi people
>
> I wonder if anybody experiences any problems with volumes in replica
> mode that run across IPoIB links, where libvirt stores qcow images on
> such a volume?
>
> I wonder if the developers could confirm it should just work, and then
> I should blame the hardware/InfiniBand.
>
> I have a direct IPoIB link between two hosts, a gluster replica volume,
> and libvirt stores disk images there.
>
> I start a guest on hostA and I get the trace below on hostB (which is
> the IB subnet manager):
>
> [Mon Oct 23 16:43:32 2017] Workqueue: ipoib_wq ipoib_cm_tx_start
> [ib_ipoib]
> [Mon Oct 23 16:43:32 2017]  8010 553c90b1
> 880c1c6eb818 816a3db1
> [Mon Oct 23 16:43:32 2017]  880c1c6eb8a8 81188810
>  88042ffdb000
> [Mon Oct 23 16:43:32 2017]  0004 8010
> 880c1c6eb8a8 553c90b1
> [Mon Oct 23 16:43:32 2017] Call Trace:
> [Mon Oct 23 16:43:32 2017]  [] dump_stack+0x19/0x1b
> [Mon Oct 23 16:43:32 2017]  []
> warn_alloc_failed+0x110/0x180
> [Mon Oct 23 16:43:32 2017]  []
> __alloc_pages_slowpath+0x6b6/0x724
> [Mon Oct 23 16:43:32 2017]  []
> __alloc_pages_nodemask+0x405/0x420
> [Mon Oct 23 16:43:32 2017]  []
> dma_generic_alloc_coherent+0x8f/0x140
> [Mon Oct 23 16:43:32 2017]  []
> gart_alloc_coherent+0x2d/0x40
> [Mon Oct 23 16:43:32 2017]  []
> mlx4_buf_direct_alloc.isra.6+0xd3/0x1a0 [mlx4_core]
> [Mon Oct 23 16:43:32 2017]  []
> mlx4_buf_alloc+0x1cb/0x240 [mlx4_core]
> [Mon Oct 23 16:43:32 2017]  []
> create_qp_common.isra.31+0x62e/0x10d0 [mlx4_ib]
> [Mon Oct 23 16:43:32 2017]  []
> mlx4_ib_create_qp+0x14e/0x480 [mlx4_ib]
> [Mon Oct 23 16:43:32 2017]  [] ?
> ipoib_cm_tx_init+0x5c/0x400 [ib_ipoib]
> [Mon Oct 23 16:43:32 2017]  []
> ib_create_qp+0x7a/0x2f0 [ib_core]
> [Mon Oct 23 16:43:32 2017]  []
> ipoib_cm_tx_init+0x103/0x400 [ib_ipoib]
> [Mon Oct 23 16:43:32 2017]  []
> ipoib_cm_tx_start+0x268/0x3f0 [ib_ipoib]
> [Mon Oct 23 16:43:32 2017]  []
> process_one_work+0x17a/0x440
> [Mon Oct 23 16:43:32 2017]  []
> worker_thread+0x126/0x3c0
> [Mon Oct 23 16:43:32 2017]  [] ?
> manage_workers.isra.24+0x2a0/0x2a0
> [Mon Oct 23 16:43:32 2017]  [] kthread+0xcf/0xe0
> [Mon Oct 23 16:43:32 2017]  [] ?
> insert_kthread_work+0x40/0x40
> [Mon Oct 23 16:43:32 2017]  [] ret_from_fork+0x58/0x90
> [Mon Oct 23 16:43:32 2017]  [] ?
> insert_kthread_work+0x40/0x40
> [Mon Oct 23 16:43:32 2017] Mem-Info:
> [Mon Oct 23 16:43:32 2017] active_anon:2389656 inactive_anon:17792
> isolated_anon:0
>  active_file:14294829 inactive_file:14609973 isolated_file:0
>  unevictable:24185 dirty:11846 writeback:9907 unstable:0
>  slab_reclaimable:1024309 slab_unreclaimable:127961
>  mapped:74895 shmem:28096 pagetables:30088 bounce:0
>  free:142329 free_pcp:249 free_cma:0
> [Mon Oct 23 16:43:32 2017] Node 0 DMA free:15320kB min:24kB low:28kB
> high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB
> dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
> slab_unreclaimable:64kB kernel_stack:0kB pagetables:0kB unstable:0kB
> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? yes
>
>
> To clarify - other volumes which use that IPoIB link do not seem to
> cause that, or any other problem.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] trying to add a 3rd peer

2017-10-23 Thread Ludwig Gamache
All,

I am trying to add a third peer to my gluster install. The first 2 nodes
have been running for many months and have gluster 3.10.3-1.

I recently installed the 3rd node with gluster 3.10.6-1. I was able to start
the gluster daemon on it. Afterwards, I tried to add the peer from one of the 2
previous servers (gluster peer probe IPADDRESS).

That first peer started the communication with the 3rd peer. At that point,
the peer status was messed up. Server 1 saw both other servers as connected.
Server 2 only saw server 1 as connected and did not have server 3 as a
peer. Server 3 only had server 1 as a peer and saw it as disconnected.

I also found errors in the gluster logs of server 3 showing that DNS resolution could not be done:

[2017-10-24 00:15:20.090462] E
[name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS
resolution failed on host HOST3.DOMAIN.lan

I rebooted node 3 and now gluster does not even restart on that node. It
keeps giving name resolution problems. The 2 other nodes are active.

However, I can ping the 3 servers (from one another) using their DNS
names.

Any idea about what to look at?
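
Not knowing the exact setup, a minimal set of checks on server 3 would be
to confirm that glusterd resolves the peer names the same way ping does,
and to compare the stored peer definitions across the nodes (hostnames
below are placeholders):

  # resolve the names through the same NSS path glusterd uses
  getent hosts HOST3.DOMAIN.lan
  getent hosts SERVER1.DOMAIN.lan SERVER2.DOMAIN.lan

  # daemon state and each node's view of the cluster
  systemctl status glusterd
  gluster peer status

  # the stored peer definitions (UUID + hostname) should be
  # consistent across all three nodes
  cat /var/lib/glusterd/peers/*

A mismatch between /etc/hosts, DNS and the hostnames recorded under
/var/lib/glusterd/peers/ can produce exactly this kind of resolution
failure when glusterd starts.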
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-23 Thread Jim Kinney
I'm not so lucky. ALL of mine show 2 links and none have the attr data
that supplies the path to the original.
I have the inode from stat. Looking now to dig out the path/filename
from xfs_db on the specific inodes individually.
Is the hash of the filename or of the path/filename, and if so, relative
to where?
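For the entries that do show 2 links, a rough alternative to xfs_db is to
let find match on the inode of the gfid file; the sketch below assumes the
brick root and gfid file name are filled in:

  # run from the brick root; the gfid file is the hardlink under .glusterfs
  cd /path/to/brick
  stat .glusterfs/00/00/GFID-FILE          # link count and inode number

  # any other name on the brick sharing that inode is the real path
  find . -samefile .glusterfs/00/00/GFID-FILE -not -path "./.glusterfs/*"

  # equivalent, using the inode number reported by stat
  find . -inum INODE_NUMBER -not -path "./.glusterfs/*"

On a large brick each of these is a full filesystem walk, so it is only
practical for a handful of GFIDs at a time.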
On Mon, 2017-10-23 at 18:54 +, Matt Waymack wrote:
> In my case I was able to delete the hard links in the .glusterfs
> folders of the bricks and it seems to have done the trick, thanks!
>
> From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> Sent: Monday, October 23, 2017 1:52 AM
> To: Jim Kinney ; Matt Waymack
> Cc: gluster-users
> Subject: Re: [Gluster-users] gfid entries in volume heal info that do
> not heal
>
> Hi Jim & Matt,
> Can you also check for the link count in the stat output of those
> hardlink entries in the .glusterfs folder on the bricks.
> If the link count is 1 on all the bricks for those entries, then they
> are orphaned entries and you can delete those hardlinks.
> To be on the safer side have a backup before deleting any of the
> entries.
>
> Regards,
> Karthik
>
> On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney wrote:
> >
> > I've been following this particular thread as I have a similar
> > issue (RAID6 array failed out with 3 dead drives at once while a 12
> > TB load was being copied into one mounted space - what a mess)
> >
> > I have >700K GFID entries that have no path data:
> >
> > Example:
> >
> > getfattr -d -e hex -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-
> > ff2a51c10421
> > # file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
> > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c616265
> > 6c65645f743a733000
> > trusted.bit-rot.version=0x020059b1b316000270e7
> > trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421
> >
> > [root@bmidata1 brick]# getfattr -d -n trusted.glusterfs.pathinfo -e
> > hex -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
> > .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421:
> > trusted.glusterfs.pathinfo: No such attribute
> >
> > I had to totally rebuild the dead RAID array and did a copy from
> > the live one before activating gluster on the rebuilt system. I
> > accidentally copied over the .glusterfs folder from the working
> > side (replica 2 only for now - adding arbiter node as soon as I can
> > get this one cleaned up).
> >
> > I've run the methods from
> > "http://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/"
> > with no results using random GFIDs. A full systemic run using the
> > script from method 3 crashes with "too many nested links" error (or
> > something similar).
> >
> > When I run gluster volume heal volname info, I get 700K+ GFIDs. Oh.
> > gluster 3.8.4 on Centos 7.3
> >
> > Should I just remove the contents of the .glusterfs folder on both
> > and restart gluster and run a ls/stat on every file?
> >
> > When I run a heal, it no longer has a decreasing number of files to
> > heal so that's an improvement over the last 2-3 weeks :-)
> >
> > On Tue, 2017-10-17 at 14:34 +, Matt Waymack wrote:
> > > Attached is the heal log for the volume as well as the shd log.
> > > > > Run these commands on all the bricks of the replica pair to
> > > > > get the attrs set on the backend.
> > >
> > > [root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m .
> > > /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> > > getfattr: Removing leading '/' from absolute path names
> > > # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-
> > > ad6a15d811a2
> > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162
> > > 656c65645f743a733000
> > > trusted.afr.dirty=0x
> > > trusted.afr.gv0-client-2=0x0001
> > > trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
> > > trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d
> > > 346463622d393630322d3839356136396461363131662f435f564f4c2d6230303
> > > 12d693637342d63642d63772e6d6435
> > >
> > > [root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m .
> > > /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> > > getfattr: Removing leading '/' from absolute path names
> > > # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-
> > > ad6a15d811a2
> > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162
> > > 656c65645f743a733000
> > > trusted.afr.dirty=0x

Re: [Gluster-users] [Gluster-devel] Announcing Glusterfs release 3.12.2 (Long Term Maintenance)

2017-10-23 Thread Niels de Vos
On Mon, Oct 23, 2017 at 02:12:53PM -0400, Alastair Neil wrote:
> Any idea when these packages will be in the CentOS mirrors? There is no
> sign of them on download.gluster.org.

We're waiting for someone other than me to test the new packages at
least a little. Installing the packages and running something on top of
a Gluster volume is already sufficient; just describe a bit what was
tested. Once a confirmation is sent that it works for someone, we can
mark the packages for releasing to the mirrors.

Getting the (unsigned) RPMs is easy, run this on your test environment:

  # yum --enablerepo=centos-gluster312-test update glusterfs

This does not restart the brick processes, so I/O is not affected by
the installation. Make sure to restart the processes (or just reboot)
and do whatever validation you deem sufficient.
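
For anyone willing to give that confirmation, a minimal validation pass
could look like the sketch below (volume name and mount point are
placeholders, and stopping/starting the volume is only acceptable on a
test setup):

  # confirm the updated version is installed
  rpm -q glusterfs-server
  gluster --version

  # restart the management daemon and the brick processes
  systemctl restart glusterd
  gluster volume stop testvol        # answer y at the prompt
  gluster volume start testvol

  # smoke test: mount the volume and push some I/O through it
  mkdir -p /mnt/test
  mount -t glusterfs localhost:/testvol /mnt/test
  dd if=/dev/zero of=/mnt/test/upgrade-check bs=1M count=100
  ls -l /mnt/test && rm -f /mnt/test/upgrade-check
  umount /mnt/test

  # all bricks and self-heal daemons should show as online
  gluster volume status testvol

A short note to the list with the package versions and what was exercised
should then be enough to get the packages pushed to the mirrors.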

Thanks,
Niels


> 
> On 13 October 2017 at 08:45, Jiffin Tony Thottan 
> wrote:
> 
> > The Gluster community is pleased to announce the release of Gluster 3.12.2
> > (packages available at [1,2,3]).
> >
> > Release notes for the release can be found at [4].
> >
> > We still carry the following major issues that are reported in the
> > release notes:
> >
> > 1.) - Expanding a gluster volume that is sharded may cause file corruption
> >
> > Sharded volumes are typically used for VM images; if such volumes are
> > expanded or possibly contracted (i.e. add/remove bricks and rebalance),
> > there are reports of VM images getting corrupted.
> >
> > The last known cause for corruption (Bug #1465123) has a fix with this
> > release. As further testing is still in progress, the issue is retained as
> > a major issue.
> >
> > Status of this bug can be tracked here, #1465123
> >
> >
> > 2.) Gluster volume restarts fail if the sub-directory export feature is
> > in use. Status of this issue can be tracked here, #1501315
> >
> > 3.) Mounting a gluster snapshot will fail when attempting a FUSE based
> > mount of the snapshot. So for current users, it is recommended to only
> > access snapshots via the ".snaps" directory on a mounted gluster volume.
> > Status of this issue can be tracked here, #1501378
> >
> > Thanks,
> >  Gluster community
> >
> >
> > [1] https://download.gluster.org/pub/gluster/glusterfs/3.12/3.12.2/
> > 
> > [2] https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.12
> > 
> > [3] https://build.opensuse.org/project/subprojects/home:glusterfs
> >
> > [4] Release notes:
> > https://gluster.readthedocs.io/en/latest/release-notes/3.12.2/
> >
> > ___
> > Gluster-devel mailing list
> > gluster-de...@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-devel
> >

> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel



signature.asc
Description: PGP signature
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-23 Thread Matt Waymack
In my case I was able to delete the hard links in the .glusterfs folders of the 
bricks and it seems to have done the trick, thanks!

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
Sent: Monday, October 23, 2017 1:52 AM
To: Jim Kinney ; Matt Waymack 
Cc: gluster-users 
Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal

Hi Jim & Matt,
Can you also check for the link count in the stat output of those hardlink 
entries in the .glusterfs folder on the bricks.
If the link count is 1 on all the bricks for those entries, then they are 
orphaned entries and you can delete those hardlinks.
To be on the safer side have a backup before deleting any of the entries.
Regards,
Karthik

On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney wrote:
I've been following this particular thread as I have a similar issue (RAID6 
array failed out with 3 dead drives at once while a 12 TB load was being copied 
into one mounted space - what a mess)

I have >700K GFID entries that have no path data:
Example:
getfattr -d -e hex -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
# file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x020059b1b316000270e7
trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421

[root@bmidata1 brick]# getfattr -d -n 
trusted.glusterfs.pathinfo -e hex -m . 
.glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
.glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421: 
trusted.glusterfs.pathinfo: No such attribute

I had to totally rebuild the dead RAID array and did a copy from the live one 
before activating gluster on the rebuilt system. I accidentally copied over the 
.glusterfs folder from the working side
(replica 2 only for now - adding arbiter node as soon as I can get this one 
cleaned up).

I've run the methods from
"http://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/" with no
results using random GFIDs. A full systemic run using the script from method 3 
crashes with "too many nested links" error (or something similar).

When I run gluster volume heal volname info, I get 700K+ GFIDs. Oh. gluster 
3.8.4 on Centos 7.3

Should I just remove the contents of the .glusterfs folder on both and restart 
gluster and run a ls/stat on every file?


When I run a heal, it no longer has a decreasing number of files to heal so 
that's an improvement over the last 2-3 weeks :-)

On Tue, 2017-10-17 at 14:34 +, Matt Waymack wrote:

Attached is the heal log for the volume as well as the shd log.

Run these commands on all the bricks of the replica pair to get the attrs set
on the backend.

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m .
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m .
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m .
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2: No
such file or directory

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m .
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names
# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-11=0x0001
trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3


Re: [Gluster-users] [Gluster-devel] Announcing Glusterfs release 3.12.2 (Long Term Maintenance)

2017-10-23 Thread Alastair Neil
Any idea when these packages will be in the CentOS mirrors? There is no
sign of them on download.gluster.org.

On 13 October 2017 at 08:45, Jiffin Tony Thottan 
wrote:

> The Gluster community is pleased to announce the release of Gluster 3.12.2
> (packages available at [1,2,3]).
>
> Release notes for the release can be found at [4].
>
> We still carry the following major issues that are reported in the
> release notes:
>
> 1.) - Expanding a gluster volume that is sharded may cause file corruption
>
> Sharded volumes are typically used for VM images; if such volumes are
> expanded or possibly contracted (i.e. add/remove bricks and rebalance),
> there are reports of VM images getting corrupted.
>
> The last known cause for corruption (Bug #1465123) has a fix with this
> release. As further testing is still in progress, the issue is retained as
> a major issue.
>
> Status of this bug can be tracked here, #1465123
>
>
> 2.) Gluster volume restarts fail if the sub-directory export feature is
> in use. Status of this issue can be tracked here, #1501315
>
> 3.) Mounting a gluster snapshot will fail when attempting a FUSE based
> mount of the snapshot. So for current users, it is recommended to only
> access snapshots via the ".snaps" directory on a mounted gluster volume.
> Status of this issue can be tracked here, #1501378
>
> Thanks,
>  Gluster community
>
>
> [1] https://download.gluster.org/pub/gluster/glusterfs/3.12/3.12.2/
> 
> [2] https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.12
> 
> [3] https://build.opensuse.org/project/subprojects/home:glusterfs
>
> [4] Release notes:
> https://gluster.readthedocs.io/en/latest/release-notes/3.12.2/
>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] problems running a vol over IPoIB, and qemu off it?

2017-10-23 Thread lejeczek

hi people

I wonder if anybody experiences any problems with volumes in
replica mode that run across IPoIB links, where libvirt stores
qcow images on such a volume?

I wonder if the developers could confirm it should just work,
and then I should blame the hardware/InfiniBand.

I have a direct IPoIB link between two hosts, a gluster replica
volume, and libvirt stores disk images there.

I start a guest on hostA and I get the trace below on hostB
(which is the IB subnet manager):


[Mon Oct 23 16:43:32 2017] Workqueue: ipoib_wq 
ipoib_cm_tx_start [ib_ipoib]
[Mon Oct 23 16:43:32 2017]  8010 
553c90b1 880c1c6eb818 816a3db1
[Mon Oct 23 16:43:32 2017]  880c1c6eb8a8 
81188810  88042ffdb000
[Mon Oct 23 16:43:32 2017]  0004 
8010 880c1c6eb8a8 553c90b1

[Mon Oct 23 16:43:32 2017] Call Trace:
[Mon Oct 23 16:43:32 2017]  [] 
dump_stack+0x19/0x1b
[Mon Oct 23 16:43:32 2017]  [] 
warn_alloc_failed+0x110/0x180
[Mon Oct 23 16:43:32 2017]  [] 
__alloc_pages_slowpath+0x6b6/0x724
[Mon Oct 23 16:43:32 2017]  [] 
__alloc_pages_nodemask+0x405/0x420
[Mon Oct 23 16:43:32 2017]  [] 
dma_generic_alloc_coherent+0x8f/0x140
[Mon Oct 23 16:43:32 2017]  [] 
gart_alloc_coherent+0x2d/0x40
[Mon Oct 23 16:43:32 2017]  [] 
mlx4_buf_direct_alloc.isra.6+0xd3/0x1a0 [mlx4_core]
[Mon Oct 23 16:43:32 2017]  [] 
mlx4_buf_alloc+0x1cb/0x240 [mlx4_core]
[Mon Oct 23 16:43:32 2017]  [] 
create_qp_common.isra.31+0x62e/0x10d0 [mlx4_ib]
[Mon Oct 23 16:43:32 2017]  [] 
mlx4_ib_create_qp+0x14e/0x480 [mlx4_ib]
[Mon Oct 23 16:43:32 2017]  [] ? 
ipoib_cm_tx_init+0x5c/0x400 [ib_ipoib]
[Mon Oct 23 16:43:32 2017]  [] 
ib_create_qp+0x7a/0x2f0 [ib_core]
[Mon Oct 23 16:43:32 2017]  [] 
ipoib_cm_tx_init+0x103/0x400 [ib_ipoib]
[Mon Oct 23 16:43:32 2017]  [] 
ipoib_cm_tx_start+0x268/0x3f0 [ib_ipoib]
[Mon Oct 23 16:43:32 2017]  [] 
process_one_work+0x17a/0x440
[Mon Oct 23 16:43:32 2017]  [] 
worker_thread+0x126/0x3c0
[Mon Oct 23 16:43:32 2017]  [] ? 
manage_workers.isra.24+0x2a0/0x2a0
[Mon Oct 23 16:43:32 2017]  [] 
kthread+0xcf/0xe0
[Mon Oct 23 16:43:32 2017]  [] ? 
insert_kthread_work+0x40/0x40
[Mon Oct 23 16:43:32 2017]  [] 
ret_from_fork+0x58/0x90
[Mon Oct 23 16:43:32 2017]  [] ? 
insert_kthread_work+0x40/0x40

[Mon Oct 23 16:43:32 2017] Mem-Info:
[Mon Oct 23 16:43:32 2017] active_anon:2389656 
inactive_anon:17792 isolated_anon:0

 active_file:14294829 inactive_file:14609973 isolated_file:0
 unevictable:24185 dirty:11846 writeback:9907 unstable:0
 slab_reclaimable:1024309 slab_unreclaimable:127961
 mapped:74895 shmem:28096 pagetables:30088 bounce:0
 free:142329 free_pcp:249 free_cma:0
[Mon Oct 23 16:43:32 2017] Node 0 DMA free:15320kB min:24kB 
low:28kB high:36kB active_anon:0kB inactive_anon:0kB 
active_file:0kB inactive_file:0kB unevictable:0kB 
isolated(anon):0kB isolated(file):0kB present:15984kB 
managed:15900kB mlocked:0kB dirty:0kB writeback:0kB 
mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:64kB kernel_stack:0kB pagetables:0kB 
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB 
free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes



To clarify - other volumes which use that IPoIB link do not
seem to cause that, or any other problem.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-23 Thread Karthik Subrahmanya
Hi Jim & Matt,

Can you also check for the link count in the stat output of those hardlink
entries in the .glusterfs folder on the bricks.
If the link count is 1 on all the bricks for those entries, then they are
orphaned entries and you can delete those hardlinks.
To be on the safer side have a backup before deleting any of the entries.
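
A rough way to list candidates on one brick (the brick root below is a
placeholder; repeat on every brick of the replica and keep a backup, as
noted above):

  cd /path/to/brick/.glusterfs

  # only the two-hex-digit gfid directories hold these hardlinks;
  # regular files there with a link count of 1 are the potential orphans
  find ./[0-9a-f][0-9a-f] -type f -links 1

  # double-check a specific entry before touching it
  stat ./00/00/GFID-FILE

Only entries whose link count is 1 on every brick of the replica should be
treated as orphans; anything with 2 or more links still has a real file
behind it.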

Regards,
Karthik

On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney  wrote:

> I've been following this particular thread as I have a similar issue
> (RAID6 array failed out with 3 dead drives at once while a 12 TB load was
> being copied into one mounted space - what a mess)
>
> I have >700K GFID entries that have no path data:
> Example:
> getfattr -d -e hex -m . .glusterfs/00/00/a5ef-
> 5af7-401b-84b5-ff2a51c10421
> # file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
> security.selinux=0x73797374656d5f753a6f626a6563
> 745f723a756e6c6162656c65645f743a733000
> trusted.bit-rot.version=0x020059b1b316000270e7
> trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421
>
> [root@bmidata1 brick]# getfattr -d -n trusted.glusterfs.pathinfo -e hex
> -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
> .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421:
> trusted.glusterfs.pathinfo: No such attribute
>
> I had to totally rebuild the dead RAID array and did a copy from the live
> one before activating gluster on the rebuilt system. I accidentally copied
> over the .glusterfs folder from the working side
> (replica 2 only for now - adding arbiter node as soon as I can get this
> one cleaned up).
>
> I've run the methods from "http://docs.gluster.org/en/
> latest/Troubleshooting/gfid-to-path/" with no results using random GFIDs.
> A full systemic run using the script from method 3 crashes with "too many
> nested links" error (or something similar).
>
> When I run gluster volume heal volname info, I get 700K+ GFIDs. Oh.
> gluster 3.8.4 on Centos 7.3
>
> Should I just remove the contents of the .glusterfs folder on both and
> restart gluster and run a ls/stat on every file?
>
>
> When I run a heal, it no longer has a decreasing number of files to heal
> so that's an improvement over the last 2-3 weeks :-)
>
> On Tue, 2017-10-17 at 14:34 +, Matt Waymack wrote:
>
> Attached is the heal log for the volume as well as the shd log.
>
>
>
> Run these commands on all the bricks of the replica pair to get the attrs set 
> on the backend.
>
>
>
> [root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . 
> /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> getfattr: Removing leading '/' from absolute path names
> # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.gv0-client-2=0x0001
> trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
> trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435
>
> [root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . 
> /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> getfattr: Removing leading '/' from absolute path names
> # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.gv0-client-2=0x0001
> trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
> trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435
>
> [root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . 
> /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> getfattr: /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2: 
> No such file or directory
>
>
> [root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . 
> /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
> getfattr: Removing leading '/' from absolute path names
> # file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.gv0-client-11=0x0001
> trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
> trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d3866373063656334616136662f435f564f4c2d623030332d69313331342d63642d636d2d63722e6d6435
>
> [root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . 
> /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
> getfattr: Removing leading '/' from absolute path names
> # file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
>