Re: [Gluster-devel] [Gluster-users] One client can effectively hang entire gluster array

2016-08-22 Thread Glomski, Patrick
Not a bad idea for a workaround, but that would require significant
investment with our current setup. All of our compute nodes are stateless /
have no disks. All storage is network storage. It probably still wouldn't be
feasible even if we added disks, because some simulations produce terabytes
of data. We would need some kind of periodic check-and-sync mechanism.

I still owe the gluster devs a test of that patch.

On Fri, Aug 19, 2016 at 3:22 PM, Steve Dainard <sdain...@spd1.com> wrote:

> As a potential solution on the compute node side, can you have users copy
> relevant data from the gluster volume to a local disk (i.e. $TMPDIR), operate
> on that disk, write output files to that disk, and then write the results
> back to persistent storage once the job is complete?
>
> There are lots of factors to consider, but this is how we operate in a
> small compute environment trying to avoid over-loading gluster storage
> nodes.
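
For anyone weighing the same workaround, a minimal sketch of the stage-in /
stage-out pattern Steve describes, assuming a Slurm-style scheduler that
provides a local $TMPDIR; the job variable, solver name, and paths are all
hypothetical:

    #!/bin/bash
    # Stage input to local scratch, compute there, and write results back
    # to the gluster volume exactly once at the end of the job.
    set -e
    JOBDIR=/mnt/homegfs/jobs/$SLURM_JOB_ID      # persistent gluster storage (hypothetical)
    SCRATCH=${TMPDIR:-/tmp}/job.$SLURM_JOB_ID   # local disk, not gluster

    mkdir -p "$SCRATCH"
    cp -a "$JOBDIR/input" "$SCRATCH/"           # stage in once
    cd "$SCRATCH"
    ./solver --input input --output results     # heavy I/O stays on local disk
    cp -a results "$JOBDIR/"                    # stage out once
    rm -rf "$SCRATCH"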
>
> On Fri, Jul 8, 2016 at 6:29 AM, Glomski, Patrick <
> patrick.glom...@corvidtec.com> wrote:
>
>> Hello, users and devs.
>>
>> TL;DR: One gluster client can essentially cause denial of service /
>> availability loss to entire gluster array. There's no way to stop it and
>> almost no way to find the bad client. Probably all (at least 3.6 and 3.7)
>> versions are affected.
>>
>> We have two large replicate gluster arrays (3.6.6 and 3.7.11) that are
>> used in a high-performance computing environment. Two file access cases
>> cause severe issues with glusterfs: Some of our scientific codes write
>> hundreds of files (~400-500) simultaneously (one file or more per processor
>> core, so lots of small or large writes) and others read thousands of files
>> (2000-3000) simultaneously to grab metadata from each file (lots of small
>> reads).
>>
>> In either of these situations, one glusterfsd process on whatever peer
>> the client is currently talking to will skyrocket to *nproc* cpu usage
>> (800%, 1600%) and the storage cluster is essentially useless; all other
>> clients will eventually try to read or write data to the overloaded peer
>> and, when that happens, their connection will hang. Heals between peers
>> hang because the load on the peer is around 1.5x the number of cores or
>> more. This occurs in either gluster 3.6 or 3.7, is very repeatable, and
>> happens much too frequently.
>>
>> Even worse, there seems to be no definitive way to diagnose which client
>> is causing the issues. Getting 'volume status <volname> clients' doesn't help
>> because it reports the total number of bytes read/written by each client.
>> (a) The metadata in question is tiny compared to the multi-gigabyte output
>> files being dealt with and (b) the byte-count is cumulative for the clients
>> and the compute nodes are always up with the filesystems mounted, so the
>> byte transfer counts are astronomical. The best solution I've come up with
>> is to blackhole-route traffic from clients one at a time (effectively push
>> the traffic over to the other peer), wait a few minutes for all of the
>> backlogged traffic to dissipate (if it's going to), see if the load on
>> glusterfsd drops, and repeat until I find the client causing the issue. I
>> would *love* any ideas on a better way to find rogue clients.
>>
>> More importantly, though, there must be some mechanism enforced to stop one
>> user from being able to render the entire filesystem unavailable for all
>> other users. In the worst case, I would even prefer a gluster volume option
>> that simply disconnects clients making over some threshold of file open
>> requests. That's WAY preferable to a complete availability loss reminiscent
>> of a DDoS attack...
>>
>> Apologies for the essay and looking forward to any help you can provide.
>>
>> Thanks,
>> Patrick
>>
>>
>
>
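
An aside for anyone else hunting rogue clients: the cumulative per-client
counters from 'gluster volume status <volname> clients' (complained about
above) can still be made useful by sampling them twice and diffing, so the
astronomical stale totals drop out and only actively-moving clients show up.
A crude sketch, using the homegfs volume from this thread and an arbitrary
interval:

    gluster volume status homegfs clients > /tmp/clients.a
    sleep 60
    gluster volume status homegfs clients > /tmp/clients.b
    # lines that changed are clients that transferred data in the interval
    diff /tmp/clients.a /tmp/clients.b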

Re: [Gluster-devel] One client can effectively hang entire gluster array

2016-07-12 Thread Glomski, Patrick
Hello, Jeff.

Thanks for responding so quickly. I'm not familiar with the codebase, so if
you don't mind me asking, how much would that list reordering slow things
down for, say, a queue of 1500 client machines? That is, roughly how long
would the client list have to be before it significantly affects latency?

I only ask because we have quite a few clients and you explicitly call out
that the queue reordering method used may have problems for lots of clients.

Thanks again,

Patrick


On Tue, Jul 12, 2016 at 11:18 AM, Jeff Darcy wrote:

> > > * We might be able to tweak io-threads (which already runs on the
> > > bricks and already has a global queue) to schedule requests in a
> > > fairer way across clients. Right now it executes them in the
> > > same order that they were read from the network.
> >
> > This sounds like an easier fix. We can make io-threads factor in another
> > input, i.e., the client through which the request came in (essentially
> > frame->root->client) before scheduling. That should make the problem at
> > least bearable, if not fix it outright. As to what algorithm to use, I
> > think we can consider the leaky-bucket implementation from bit-rot, or
> > dmclock. I've not really thought deeper about the algorithm part. If the
> > approach sounds ok, we can discuss more about algos.
>
> I've created a patch to address the most basic part of this, in the
> simplest
> way I could think of.
>
> http://review.gluster.org/#/c/14904/
>
> It's still running through basic tests, so I don't even know if it's really
> correct yet, but it should give an idea of the conceptual direction.
>

[Gluster-devel] One client can effectively hang entire gluster array

2016-07-08 Thread Glomski, Patrick
Hello, users and devs.

TL;DR: One gluster client can essentially cause denial of service /
availability loss to entire gluster array. There's no way to stop it and
almost no way to find the bad client. Probably all (at least 3.6 and 3.7)
versions are affected.

We have two large replicate gluster arrays (3.6.6 and 3.7.11) that are used
in a high-performance computing environment. Two file access cases cause
severe issues with glusterfs: Some of our scientific codes write hundreds
of files (~400-500) simultaneously (one file or more per processor core, so
lots of small or large writes) and others read thousands of files
(2000-3000) simultaneously to grab metadata from each file (lots of small
reads).

In either of these situations, one glusterfsd process on whatever peer the
client is currently talking to will skyrocket to *nproc* cpu usage (800%,
1600%) and the storage cluster is essentially useless; all other clients
will eventually try to read or write data to the overloaded peer and, when
that happens, their connection will hang. Heals between peers hang because
the load on the peer is around 1.5x the number of cores or more. This
occurs in either gluster 3.6 or 3.7, is very repeatable, and happens much
too frequently.

Even worse, there seems to be no definitive way to diagnose which client is
causing the issues. Getting 'volume status <volname> clients' doesn't help because
it reports the total number of bytes read/written by each client. (a) The
metadata in question is tiny compared to the multi-gigabyte output files
being dealt with and (b) the byte-count is cumulative for the clients and
the compute nodes are always up with the filesystems mounted, so the byte
transfer counts are astronomical. The best solution I've come up with is to
blackhole-route traffic from clients one at a time (effectively push the
traffic over to the other peer), wait a few minutes for all of the
backlogged traffic to dissipate (if it's going to), see if the load on
glusterfsd drops, and repeat until I find the client causing the issue. I
would *love* any ideas on a better way to find rogue clients.
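
Spelled out, that blackhole procedure is roughly the following; the client IP
is hypothetical, and this is run on the overloaded peer:

    ip route add blackhole 10.0.0.42/32   # silently drop one client's traffic
    sleep 300                             # let backlogged traffic dissipate
    top -b -n 1 | grep glusterfsd         # did the load drop?
    ip route del blackhole 10.0.0.42/32   # restore the client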

More importantly, though, there must be some mechanism enforced to stop one
user from being able to render the entire filesystem unavailable for all
other users. In the worst case, I would even prefer a gluster volume option
that simply disconnects clients making over some threshold of file open
requests. That's WAY preferable to a complete availability loss reminiscent
of a DDoS attack...

Apologies for the essay and looking forward to any help you can provide.

Thanks,
Patrick

Re: [Gluster-devel] [Gluster-users] gluster 3.7.9 permission denied and mv errors

2016-05-03 Thread Glomski, Patrick
Attaching a text file with the same content that is easier to read.

Patrick

On Tue, May 3, 2016 at 4:59 PM, Glomski, Patrick <
patrick.glom...@corvidtec.com> wrote:

> Raghavendra,
>
> Last night the backup had four of these errors and only one of the
> 'retried moves' succeeded. The only one to succeed in moving the file the
> second time had target files on a different gluster peer (gfs01bkp). Not
> sure if that is significant.
>
> Note that I cannot stat the target file over the FUSE mount for any of
> these, but it exists on the bricks. Running an 'ls' on the directory
> containing the file (via FUSE) does not fix the issue. Source and target
> xattrs are appended for all bricks on all machines in the distributed
> volume.
>
> Let me know if there's any other information it would be useful to gather,
> as this issue seems to recur frequently.
>
> Thanks,
> Patrick
>
> # Move failures
>>
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/056-1/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/090-1/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/090-1/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/057-2/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/057-2/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/54/data_collected4' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4': File exists
>>
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/056-1/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/090-1/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/090-1/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/057-2/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/057-2/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/54/data_collected4' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4': File exists
>>
>>
>> 
>> retry: renaming ./homegfs/hpc_shared/motorsports/056-1/data_collected3 ->
>> ../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3
>>
>> source xattrs
>>   gfs01bkp
>> getfattr:
>> /data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>> getfattr:
>> /data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>>
>>   gfs02bkp
>> getfattr:
>> /data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>> # file:
>> data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3
>>   trusted.bit-rot.version=0x0200570308980001d157
>>   trusted.gfid=0xe07abd8ae861442ebc0df8b20719af30
>>   trusted.pgfid.1776adb6-2925-49d3-9cca-8a04c29f4c05=0x0001
>>
>> getfattr: Removing leading '/' from absolute path names
>> getfattr:
>> /data/brick03bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>> getfattr:
>> /data/brick04bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>> getfattr:
>> /data/brick05bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>>
>> target xattrs
>>   gfs01bkp
>>getfattr:
>> /data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>>getfattr:
>> /data/brick02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>>
>>   gfs02bkp
>> # file:
>> data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3
>>   trusted.bit-rot.version=0x0200569bb8d20003ed00
>>   trusted.gfid=0xaefffbd0676649cd95eb6dfc874d7a59
>>   trusted.pgfid.f7c5eff3-f474-433b-b10e-480f835

Re: [Gluster-devel] [Gluster-users] gluster 3.7.9 permission denied and mv errors

2016-04-29 Thread Glomski, Patrick
Raghavendra,

This error is occurring in a shell script moving files between directories
on a FUSE mount when overwriting an old file with a newer file (it's a
backup script, moving an incremental backup of a file into a 'rolling full
backup' directory).

As a temporary workaround, we parse the output of this shell script for
move errors and handle the errors as they happen. Simply re-moving the
files fails, so we stat the destination (to see if we can learn anything
about the type of file that causes this behavior), delete the destination,
and try the move again (success!). Typical output is as follows:

/bin/mv: cannot move
`./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4'
> to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/
> Raven/p11/149/data_collected4': File exists
> /bin/mv: cannot move 
> `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4'
> to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/
> Raven/p11/149/data_collected4': File exists
>   File: `../bkp00/./homegfs/hpc_shared/motorsports/gmics/
> Raven/p11/149/data_collected4'
>   Size: 1714        Blocks: 4          IO Block: 131072 regular file
> Device: 13h/19d Inode: 11051758947722304158  Links: 1
> Access: (0660/-rw-rw----)  Uid: (  628/pkeistler)   Gid: ( 2020/   gmirl)
> Access: 2016-01-20 17:20:45.000000000 -0500
> Modify: 2015-11-06 15:20:41.000000000 -0500
> Change: 2016-01-27 03:35:00.434712146 -0500
> retry: renaming 
> ./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4
> -> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/
> 149/data_collected4
>
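
In script form, that workaround amounts to something like the following (the
path is just one taken from the output above):

    src=./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4
    dst=../bkp00/$src
    if ! /bin/mv -f "$src" "$dst"; then
        stat "$dst"        # record what the phantom destination looks like
        rm -f "$dst"       # clear it
        echo "retry: renaming $src -> $dst"
        /bin/mv -f "$src" "$dst"
    fi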

Not sure if that description rings any bells as to what the problem might
be, but if not, I added some code to print out the 'getfattr' output for the
source and destination files on all of the bricks (before we delete the
destination) and will post it to this thread the next time we hit the issue.
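
Roughly what that added code does, assuming the gfsbackup brick layout shown
elsewhere in this thread (it has to be run on each brick server, and the
file path relative to the brick root is hypothetical):

    f=homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4
    for b in /data/brick0*bkp/gfsbackup/bkp01; do
        echo "== $b =="
        getfattr -d -m . -e hex "$b/$f"          2>&1   # source xattrs
        getfattr -d -m . -e hex "$b/../bkp00/$f" 2>&1   # destination xattrs
    done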

Thanks,
Patrick


On Fri, Apr 29, 2016 at 8:15 AM, Raghavendra G wrote:

>
>
> On Wed, Apr 13, 2016 at 10:00 PM, David F. Robinson <
> david.robin...@corvidtec.com> wrote:
>
>> I am running into two problems (possibly related?).
>>
>> 1) Every once in a while, when I do a 'rm -rf DIRNAME', it comes back
>> with an error:
>> rm: cannot remove `DIRNAME` : Directory not empty
>>
>> If I try the 'rm -rf' again after the error, it deletes the
>> directory.  The issue is that I have scripts that clean up directories, and
>> they are failing unless I go through the deletes a 2nd time.
>>
>
> What kind of mount are you using? Is it a FUSE or NFS mount? Recently we
> saw a similar issue on NFS clients on RHEL6 where rm -rf used to fail with
> ENOTEMPTY in some specific cases.
>
>
>>
>> 2) I have different scripts to move a large numbers of files (5-25k) from
>> one directory to another.  Sometimes I receive an error:
>> /bin/mv: cannot move `xyz` to `../bkp00/xyz`: File exists
>>
>
> Does ./bkp00/xyz exist on backend? If yes, what is the value of gfid xattr
> (key: "trusted.gfid") for "xyz" and "./bkp00/xyz" on backend bricks (I need
> gfid from all the bricks) when this issue happens?
>
>
>> The move is done using '/bin/mv -f', so it should overwrite the file
>> if it exists.  I have tested this with hundreds of files, and it works as
>> expected.  However, every few days the script that moves the files will
>> have problems with 1 or 2 files during the move.  This is one move problem
>> out of roughly 10,000 files that are being moved and I cannot figure out
>> any reason for the intermittent problem.
>>
>> Setup details for my gluster configuration shown below.
>>
>> [root@gfs01bkp logs]# gluster volume info
>>
>> Volume Name: gfsbackup
>> Type: Distribute
>> Volume ID: e78d5123-d9bc-4d88-9c73-61d28abf0b41
>> Status: Started
>> Number of Bricks: 7
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/gfsbackup
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/gfsbackup
>> Brick3: gfsib02bkp.corvidtec.com:/data/brick01bkp/gfsbackup
>> Brick4: gfsib02bkp.corvidtec.com:/data/brick02bkp/gfsbackup
>> Brick5: gfsib02bkp.corvidtec.com:/data/brick03bkp/gfsbackup
>> Brick6: gfsib02bkp.corvidtec.com:/data/brick04bkp/gfsbackup
>> Brick7: gfsib02bkp.corvidtec.com:/data/brick05bkp/gfsbackup
>> Options Reconfigured:
>> nfs.disable: off
>> server.allow-insecure: on
>> storage.owner-gid: 100
>> server.manage-gids: on
>> cluster.lookup-optimize: on
>> server.event-threads: 8
>> client.event-threads: 8
>> changelog.changelog: off
>> storage.build-pgfid: on
>> performance.readdir-ahead: on
>> diagnostics.brick-log-level: WARNING
>> diagnostics.client-log-level: WARNING
>> cluster.rebal-throttle: aggressive
>> performance.cache-size: 1024MB
>> performance.write-behind-window-size: 10MB
>>
>>
>> [root@gfs01bkp logs]# rpm -qa | grep gluster
>> glusterfs-server-3.7.9-1.el6.x86_64
>> glusterfs-debuginfo-3.7.9-1.el6.x86_64
>> glusterfs-api-3.7.9-1.el6.x86_64
>> 

[Gluster-devel] Gluster + Infiniband + 3.x kernel -> hard crash?

2016-04-06 Thread Glomski, Patrick
We run gluster 3.7 in a distributed replicated setup. Infiniband (tcp)
links the gluster peers together and clients use the ethernet interface.

This setup is stable running CentOS 6.x and using the most recent
infiniband drivers provided by Mellanox. Uptime was 170 days when we took
it down to wipe the systems and update to CentOS 7.

When the exact same setup is loaded onto a CentOS 7 machine (minor setup
differences, but basically the same; setup is handled by ansible), the
peers will (seemingly randomly) experience a hard crash and need to be
power-cycled. There is no output on the screen and nothing in the logs.
After rebooting, the peer reconnects, heals whatever files it missed, and
everything is happy again. Maximum uptime for any given peer is 20 days.
Thanks to the replication, clients maintain connectivity, but from a system
administration perspective it's driving me crazy!

We run other storage servers with the same infiniband and CentOS7 setup
except that they use NFS instead of gluster. NFS shares are served through
infiniband to some machines and ethernet to others.

Is it possible that gluster's (and only gluster's) use of the infiniband
kernel module to send tcp packets to its peers on a 3 kernel is causing the
system to have a hard crash? Pretty specific problem and it doesn't make
much sense to me, but that's sure where the evidence seems to point.

Anyone running CentOS 7 gluster arrays with infiniband out there to confirm
that it works fine for them? Gluster devs care to chime in with a better
theory? I'd love for this random crashing to stop.

Thanks,
Patrick

[Gluster-devel] gluster-nagios-common missing dependencies?

2016-03-04 Thread Glomski, Patrick
Just an FYI, I installed gluster-nagios-common on a CentOS7 machine
(started with minimal server, so there wasn't much there) to monitor the
bricks in a gluster array. python-ethtool seems to be missing as a
dependency in the rpm spec:

import glusternagios.glustercli as gcli
>   File "/usr/lib64/python2.7/site-packages/glusternagios/glustercli.py",
> line 21, in <module>
> import ethtool
>
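
Until the spec is fixed, installing the package manually works around it:

    yum install -y python-ethtool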

I was using the latest 1.1 version:

http://download.gluster.org/pub/gluster/glusterfs-nagios/1.1.0/EPEL.repo/gluster-nagios-110-epel.repo

Regards,
Patrick

Re: [Gluster-devel] EPEL repo link for 7

2016-02-10 Thread Glomski, Patrick
Updating LATEST from 3.7.6 to point at 3.7.8 would also be appreciated.

On Wed, Feb 10, 2016 at 10:18 AM, Glomski, Patrick <
patrick.glom...@corvidtec.com> wrote:

> The LATEST/ directory used by the glusterfs-epel repo file offered on
> download.gluster.org references the $releasever, but doesn't have
> directories for epel-7.1 or 7.2.
>
> baseurl=
> http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/EPEL.repo/epel-$releasever/$basearch/
>
> I have to manually modify the .repo files to point to the epel-7 folder
> every time I install a new system. Anyone mind adding "epel-7.1" and
> "epel-7.2" so I just pull down the .repo and use it?
>
> Thanks,
> Patrick
>

[Gluster-devel] EPEL repo link for 7

2016-02-10 Thread Glomski, Patrick
The LATEST/ directory used by the glusterfs-epel repo file offered on
download.gluster.org references the $releasever, but doesn't have
directories for epel-7.1 or 7.2.

baseurl=
http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/EPEL.repo/epel-$releasever/$basearch/

I have to manually modify the .repo files to point to the epel-7 folder
every time I install a new system. Anyone mind adding "epel-7.1" and
"epel-7.2" so I just pull down the .repo and use it?

Thanks,
Patrick

[Gluster-devel] Expired SSL certs - review.gluster.org

2016-02-04 Thread Glomski, Patrick
FYI,

The wildcard certificate for *.gluster.org used by review.gluster.org
expired on 2015/09/25. I figure if one goes through the trouble of
configuring SSL on the site, one may as well keep the certs valid...
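
For reference, the expiry is easy to check from any client:

    echo | openssl s_client -connect review.gluster.org:443 2>/dev/null \
        | openssl x509 -noout -subject -dates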

Patrick

==
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
08:9e:45:06:63:aa:98:b0:2f:6b:39:7d:aa:bc:38:28
Signature Algorithm: sha1WithRSAEncryption
Issuer: C=US, O=DigiCert Inc, OU=www.digicert.com, CN=DigiCert High
Assurance CA-3
Validity
Not Before: Sep  3 00:00:00 2013 GMT
***Not After : Sep 10 12:00:00 2015 GMT***
Subject: C=US, ST=North Carolina, L=Raleigh, O=Red Hat Inc., CN=*.
gluster.org
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:c4:a1:76:b9:12:92:04:4b:62:ef:ac:c8:70:9b:
06:8f:00:85:b4:a8:ef:87:70:a5:8b:eb:6b:56:c8:
33:d0:37:54:92:92:46:da:a2:ee:c2:ac:48:b1:98:
48:20:91:b2:e7:a6:dd:60:a6:02:77:58:3c:e2:11:
e7:4f:2b:ae:4f:65:6c:37:f7:45:bf:bf:31:98:5e:
ea:17:96:9a:6d:95:9a:eb:09:b1:cf:89:ca:ba:bc:
70:0a:26:c3:a4:a4:ce:0e:33:d0:fd:6f:2e:c7:27:
b6:2e:e8:48:2f:e1:a1:99:2a:0b:c2:ae:98:e9:f8:
d6:fd:c2:52:0f:85:de:9d:25:28:af:02:7e:db:dd:
e7:68:8b:5e:68:75:f2:05:1e:47:99:5d:9f:60:e7:
6a:3d:d8:ea:b8:af:8a:f2:1d:2e:00:ad:26:75:ca:
82:e6:45:9d:cc:25:98:24:3e:ef:50:fb:57:af:ac:
a0:95:6f:ff:ff:6e:ad:ce:e3:9b:72:db:61:25:bd:
20:4a:ad:33:aa:e6:4d:ab:1b:c8:80:1c:42:21:60:
d0:cc:ce:22:39:f8:93:24:e9:83:2d:bb:ec:bf:15:
45:37:55:f2:27:26:0f:c6:8d:7f:e3:5b:86:0f:c5:
73:74:af:c2:07:8a:bc:df:1f:3d:d5:72:ff:22:e7:
d8:1d
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Authority Key Identifier:

keyid:50:EA:73:89:DB:29:FB:10:8F:9E:E5:01:20:D4:DE:79:99:48:83:F7

X509v3 Subject Key Identifier:
B4:16:D1:76:1F:AA:C6:8B:BD:9A:45:B4:AC:14:FD:0B:F4:3D:3F:9E
X509v3 Subject Alternative Name:
DNS:*.gluster.org, DNS:gluster.org
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 CRL Distribution Points:

Full Name:
  URI:http://crl3.digicert.com/ca3-g28.crl

Full Name:
  URI:http://crl4.digicert.com/ca3-g28.crl

X509v3 Certificate Policies:
Policy: 2.16.840.1.114412.1.1
  CPS: https://www.digicert.com/CPS

Authority Information Access:
OCSP - URI:http://ocsp.digicert.com
CA Issuers - URI:
http://cacerts.digicert.com/DigiCertHighAssuranceCA-3.crt

X509v3 Basic Constraints: critical
CA:FALSE
Signature Algorithm: sha1WithRSAEncryption
 77:11:b2:5f:e6:77:4b:a6:a5:4a:4f:35:c4:95:d6:d1:72:29:
 9b:6c:f1:2b:f6:0e:ec:63:43:9e:5d:25:19:4b:ab:6b:a0:be:
 86:14:cc:54:bc:be:41:f1:23:26:8e:d7:32:1b:69:59:f0:dd:
 36:8a:3b:b2:81:b4:3d:90:07:6c:31:4c:4f:dc:f4:67:d3:d6:
 49:d9:f5:7c:ab:0b:fc:58:bf:5f:df:fd:22:53:de:1d:7f:9a:
 95:f7:c8:8b:b3:ed:e9:fa:0a:76:22:7e:c5:c2:ba:34:4f:9b:
 75:1a:3c:c0:7c:ad:b3:d6:65:f0:5e:cc:5b:1e:ca:15:80:21:
 c7:af:26:bf:2e:a6:03:6a:95:28:a2:8b:84:33:86:7d:61:35:
 9b:86:30:7c:c8:c3:08:44:0a:6b:82:d1:dc:4e:bc:df:2d:d3:
 b5:9c:59:76:5a:68:8e:cd:46:a8:9c:6f:1e:2c:a4:f4:1d:fc:
 43:e1:cf:dc:1e:54:42:cc:01:d3:d5:ec:45:63:b6:c2:12:55:
 fd:87:3c:cc:36:de:de:47:21:46:7b:14:be:cf:13:95:0e:df:
 15:6f:4f:22:e4:47:48:d0:1a:9f:95:6e:d0:39:2b:92:e2:e5:
 d8:2e:a6:35:59:87:cc:fa:9e:c6:2f:19:3c:36:7b:1f:5b:e9:
 c7:1f:8a:9e

Re: [Gluster-devel] [Gluster-users] heal hanging

2016-01-21 Thread Glomski, Patrick
Hello, Pranith. The typical behavior is that the %cpu on a glusterfsd
process jumps to the number of processor cores available (800% or 1200%,
depending on the pair of nodes involved) and the load average on the
machine goes very high (~20). The volume's heal statistics output shows
that it is crawling one of the bricks and trying to heal, but this crawl
hangs and never seems to finish.

The number of files in the xattrop directory varies over time, so I ran the
'wc -l' you requested periodically for some time, and then started
including a datestamped list of the files that were in the xattrop
directory on each brick to see which were persistent. All bricks had files
in the xattrop folder, so all results are attached.
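
The sampling loop was essentially the following; brick paths are from this
setup and the interval is arbitrary:

    while sleep 300; do
        date
        for d in /data/brick0*/homegfs/.glusterfs/indices/xattrop; do
            printf '%s: %s entries\n' "$d" "$(ls "$d" | wc -l)"
            ls "$d"
        done
    done >> /root/xattrop-samples.log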

Please let me know if there is anything else I can provide.

Patrick


On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> hey,
>Which process is consuming so much cpu? I went through the logs you
> gave me. I see that the following files are in gfid mismatch state:
>
> <066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,
> <1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,
> ,
>
> Could you give me the output of "ls <brick>/.glusterfs/indices/xattrop | wc -l"
> on all the bricks which are acting this way? This will tell us the
> number of pending self-heals on the system.
>
> Pranith
>
>
> On 01/20/2016 09:26 PM, David Robinson wrote:
>
> resending with parsed logs...
>
>
>
>
>
> I am having issues with 3.6.6 where the load will spike up to 800% for one
> of the glusterfsd processes and the users can no longer access the system.
> If I reboot the node, the heal will finish normally after a few minutes and
> the system will be responsive, but a few hours later the issue will start
> again.  It looks like it is hanging in a heal and spinning up the load on
> one of the bricks.  The heal gets stuck and says it is crawling and never
> returns.  After a few minutes of the heal saying it is crawling, the load
> spikes up and the mounts become unresponsive.
>
> Any suggestions on how to fix this?  It has us stopped cold as the user
> can no longer access the systems when the load spikes... Logs attached.
>
> System setup info is:
>
> [root@gfs01a ~]# gluster volume info homegfs
>
> Volume Name: homegfs
> Type: Distributed-Replicate
> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> Options Reconfigured:
> performance.io-thread-count: 32
> performance.cache-size: 128MB
> performance.write-behind-window-size: 128MB
> server.allow-insecure: on
> network.ping-timeout: 42
> storage.owner-gid: 100
> geo-replication.indexing: off
> geo-replication.ignore-pid-check: on
> changelog.changelog: off
> changelog.fsync-interval: 3
> changelog.rollover-time: 15
> server.manage-gids: on
> diagnostics.client-log-level: WARNING
>
> [root@gfs01a ~]# rpm -qa | grep gluster
> gluster-nagios-common-0.1.1-0.el6.noarch
> glusterfs-fuse-3.6.6-1.el6.x86_64
> glusterfs-debuginfo-3.6.6-1.el6.x86_64
> glusterfs-libs-3.6.6-1.el6.x86_64
> glusterfs-geo-replication-3.6.6-1.el6.x86_64
> glusterfs-api-3.6.6-1.el6.x86_64
> glusterfs-devel-3.6.6-1.el6.x86_64
> glusterfs-api-devel-3.6.6-1.el6.x86_64
> glusterfs-3.6.6-1.el6.x86_64
> glusterfs-cli-3.6.6-1.el6.x86_64
> glusterfs-rdma-3.6.6-1.el6.x86_64
> samba-vfs-glusterfs-4.1.11-2.el6.x86_64
> glusterfs-server-3.6.6-1.el6.x86_64
> glusterfs-extra-xlators-3.6.6-1.el6.x86_64
>
>
>
>
>
>
>
>
>


gfs01a_data_brick01a_homegfs
Description: Binary data


gfs01a_data_brick02a_homegfs
Description: Binary data


gfs01b_data_brick01b_homegfs
Description: Binary data


gfs01b_data_brick02b_homegfs
Description: Binary data


gfs02a_data_brick01a_homegfs
Description: Binary data


gfs02a_data_brick02a_homegfs
Description: Binary data


gfs02b_data_brick01b_homegfs
Description: Binary data


gfs02b_data_brick02b_homegfs
Description: Binary data

Re: [Gluster-devel] [Gluster-users] heal hanging

2016-01-21 Thread Glomski, Patrick
I should mention that the problem is not currently occurring and there are
no heals (output appended). By restarting the gluster services, we can stop
the crawl, which lowers the load for a while. Subsequent crawls seem to
finish properly. For what it's worth, files/folders that show up in the
'volume heal info' output during a hung crawl don't seem to be anything out of
the ordinary.

Over the past four days, the typical time before the problem recurs after
suppressing it in this manner is an hour. Last night when we reached out to
you was the last time it happened and the load has been low since (a
relief).  David believes that recursively listing the files (ls -alR or
similar) from a client mount can force the issue to happen, but obviously
I'd rather not unless we have some precise thing we're looking for. Let me
know if you'd like me to attempt to drive the system unstable like that and
what I should look for. As it's a production system, I'd rather not leave
it in this state for long.

[root@gfs01a xattrop]# gluster volume heal homegfs info
Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/
Number of entries: 0

Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/
Number of entries: 0

Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/
Number of entries: 0

Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/
Number of entries: 0

Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/
Number of entries: 0

Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/
Number of entries: 0

Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/
Number of entries: 0

Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/
Number of entries: 0




On Thu, Jan 21, 2016 at 10:40 AM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On 01/21/2016 08:25 PM, Glomski, Patrick wrote:
>
> Hello, Pranith. The typical behavior is that the %cpu on a glusterfsd
> process jumps to the number of processor cores available (800% or 1200%,
> depending on the pair of nodes involved) and the load average on the
> machine goes very high (~20). The volume's heal statistics output shows
> that it is crawling one of the bricks and trying to heal, but this crawl
> hangs and never seems to finish.
>
>
> The number of files in the xattrop directory varies over time, so I ran a
> wc -l as you requested periodically for some time and then started
> including a datestamped list of the files that were in the xattrop
> directory on each brick to see which were persistent. All bricks had files
> in the xattrop folder, so all results are attached.
>
> Thanks, this info is helpful. I don't see a lot of files. Could you give
> the output of "gluster volume heal <volname> info"? Is there any directory in
> there which is LARGE?
>
> Pranith
>
>
> Please let me know if there is anything else I can provide.
>
> Patrick
>
>
> On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> hey,
>>Which process is consuming so much cpu? I went through the logs
>> you gave me. I see that the following files are in gfid mismatch state:
>>
>> <066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,
>> <1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,
>> ,
>>
>> Could you give me the output of "ls <brick>/.glusterfs/indices/xattrop | wc -l"
>> on all the bricks which are acting this way? This will tell us the
>> number of pending self-heals on the system.
>>
>> Pranith
>>
>>
>> On 01/20/2016 09:26 PM, David Robinson wrote:
>>
>> resending with parsed logs...
>>
>>
>>
>>
>>
>> I am having issues with 3.6.6 where the load will spike up to 800% for
>> one of the glusterfsd processes and the users can no longer access the
>> system.  If I reboot the node, the heal will finish normally after a few
>> minutes and the system will be responsive, but a few hours later the issue
>> will start again.  It looks like it is hanging in a heal and spinning up the
>> load on one of the bricks.  The heal gets stuck and says it is crawling and
>> never returns.  After a few minutes of the heal saying it is crawling, the
>> load spikes up and the mounts become unresponsive.
>>
>> Any suggestions on how to fix this?  It has us stopped cold as the user
>> can no longer access the systems when the load spikes... Logs attached.
>>
>> System setup info is:
>>
>> [root@gfs01a ~]# gluster volume info homegfs
>>
>> Volume Name: homegfs
>> Type: Distributed-Replicate
>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>> Status: Started
>> Number of Bricks: 4 x 2 = 8
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homeg

Re: [Gluster-devel] [Gluster-users] heal hanging

2016-01-21 Thread Glomski, Patrick
Unfortunately, all samba mounts to the gluster volume through the gfapi vfs
plugin have been disabled for the last 6 hours or so, and the frequency of
%cpu spikes has increased. We had switched to sharing a fuse mount through
samba, but I just disabled that as well. There are no samba shares of this
volume now. The spikes now happen every thirty minutes or so. We've resorted
to just rebooting the machine with high load for the present.

On Thu, Jan 21, 2016 at 8:49 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On 01/22/2016 07:13 AM, Glomski, Patrick wrote:
>
> We use the samba glusterfs virtual filesystem (the current version
> provided on download.gluster.org), but no windows clients connecting
> directly.
>
>
> Hmm.. Is there a way to disable using this and check if the CPU% still
> increases? What getxattr of "glusterfs.get_real_filename <name>" does is
> to scan the entire directory looking for strcasecmp(<entry-name>,
> <name>). If anything matches then it will return the
> <real-name>. But the problem is the scan is costly. So I wonder if
> this is the reason for the CPU spikes.
>
> Pranith
>
>
> On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> Do you have any windows clients? I see a lot of getxattr calls for
>> "glusterfs.get_real_filename" which lead to full readdirs of the
>> directories on the brick.
>>
>> Pranith
>>
>> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
>>
>> Pranith, could this kind of behavior be self-inflicted by us deleting
>> files directly from the bricks? We have done that in the past to clean up
>> an issue where gluster wouldn't allow us to delete from the mount.
>>
>> If so, is it feasible to clean them up by running a search on the
>> .glusterfs directories directly and removing files with a reference count
>> of 1 that are non-zero size (or directly checking the xattrs to be sure
>> that it's not a DHT link).
>>
>> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2
>> -exec rm -f "{}" \;
>>
>> Is there anything I'm inherently missing with that approach that will
>> further corrupt the system?
>>
>>
>> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick <
>> patrick.glom...@corvidtec.com> wrote:
>>
>>> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been
>>> running on one of the bricks on gfs02b for 25 min or so and users cannot
>>> access the volume.
>>>
>>> I re-listed the xattrop directories as well as a 'top' entry and heal
>>> statistics. Then I restarted the gluster services on gfs02a.
>>>
>>> === top ===
>>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+
>>> COMMAND
>>>  8969 root  20   0 2815m 204m 3588 S 1181.0  0.6 591:06.93
>>> glusterfsd
>>>
>>> === xattrop ===
>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-41f19453-91e4-437c-afa9-3b25614de210
>>> xattrop-9b815879-2f4d-402b-867c-a6d65087788c
>>>
>>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-70131855-3cfb-49af-abce-9d23f57fb393
>>> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6
>>>
>>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>>> e6e47ed9-309b-42a7-8c44-28c29b9a20f8
>>> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125
>>> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
>>> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0
>>>
>>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
>>> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413
>>>
>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531
>>>
>>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70
>>>
>>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>>> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8
>>> c9ce22ed-6d8b-471b-a111-b39e57f0b512
>>> 94fa1d60-45ad-4341-b69c-315936b51e8d
>>> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7
>>>
>>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d
>>>
>>>
>>> === heal stats ===
>>>
>>> homegfs [b0-gfsib01a] : Starting time of crawl   : Thu Jan 21
>>> 12:36:45 2016
>>> homegfs [b0-gfsib01a

Re: [Gluster-devel] [Gluster-users] heal hanging

2016-01-21 Thread Glomski, Patrick
The Samba version is 4.1.17, the build you guys maintain at
download.gluster.org. The vfs plugin comes packaged with it.

http://download.gluster.org/pub/gluster/glusterfs/samba/EPEL.repo/epel-6/x86_64/

# smbd --version
Version 4.1.17

# rpm -qa | grep samba-vfs-glusterfs
samba-vfs-glusterfs-4.1.17-4.el6rhs.x86_64

Let us know if there's anything else we can provide,

Patrick


On Thu, Jan 21, 2016 at 10:07 PM, Raghavendra Talur <rta...@redhat.com>
wrote:

>
> On Jan 22, 2016 7:27 AM, "Pranith Kumar Karampuri" <pkara...@redhat.com>
> wrote:
> >
> >
> >
> > On 01/22/2016 07:19 AM, Pranith Kumar Karampuri wrote:
> >>
> >>
> >>
> >> On 01/22/2016 07:13 AM, Glomski, Patrick wrote:
> >>>
> >>> We use the samba glusterfs virtual filesystem (the current version
> provided on download.gluster.org), but no windows clients connecting
> directly.
> >>
> >>
> >> Hmm.. Is there a way to disable using this and check if the CPU% still
> increases? What getxattr of "glusterfs.get_real_filename <name>" does is
> to scan the entire directory looking for strcasecmp(<entry-name>,
> <name>). If anything matches then it will return the
> <real-name>. But the problem is the scan is costly. So I wonder if
> this is the reason for the CPU spikes.
> >
> > +Raghavendra Talur, +Poornima
> >
> > Raghavendra, Poornima,
> > When are these getxattrs triggered? Did you guys see any
> brick CPU spikes before? I initially thought it could be because of big
> directory heals. But this is happening even when no self-heals are
> required. So I had to move away from that theory.
>
> These getxattrs are triggered when an SMB client performs a path-based
> operation. It follows that some client must have been connected.
>
> The last fix to go in that code for 3.6 was
> http://review.gluster.org/#/c/10403/.
>
> I am not able to determine which release of 3.6 it made into. Will update.
>
> Also, we would need the version of Samba installed, including the vfs
> plugin package.
>
> There is a for loop of strcmp involved here which does take a lot of CPU.
> It should be for short bursts though and is expected and harmless.
>
> >
> > Pranith
> >
> >>
> >> Pranith
> >>>
> >>>
> >>> On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
> >>>>
> >>>> Do you have any windows clients? I see a lot of getxattr calls for
> "glusterfs.get_real_filename" which lead to full readdirs of the
> directories on the brick.
> >>>>
> >>>> Pranith
> >>>>
> >>>> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
> >>>>>
> >>>>> Pranith, could this kind of behavior be self-inflicted by us
> deleting files directly from the bricks? We have done that in the past to
> clean up an issue where gluster wouldn't allow us to delete from the mount.
> >>>>>
> >>>>> If so, is it feasible to clean them up by running a search on the
> .glusterfs directories directly and removing files with a reference count
> of 1 that are non-zero size (or directly checking the xattrs to be sure
> that it's not a DHT link).
> >>>>>
> >>>>> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2
> -exec rm -f "{}" \;
> >>>>>
> >>>>> Is there anything I'm inherently missing with that approach that
> will further corrupt the system?
> >>>>>
> >>>>>
> >>>>> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick <
> patrick.glom...@corvidtec.com> wrote:
> >>>>>>
> >>>>>> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has
> been running on one of the bricks on gfs02b for 25 min or so and users
> cannot access the volume.
> >>>>>>
> >>>>>> I re-listed the xattrop directories as well as a 'top' entry and
> heal statistics. Then I restarted the gluster services on gfs02a.
> >>>>>>
> >>>>>> === top ===
> >>>>>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+
> COMMAND
> >>>>>>  8969 root  20   0 2815m 204m 3588 S 1181.0  0.6 591:06.93
> glusterfsd
> >>>>>>
> >>>>>> === xattrop ===
> >>>>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
> >>>>>> xattrop-41f19453-91e4-437c-afa9-

Re: [Gluster-devel] [Gluster-users] heal hanging

2016-01-21 Thread Glomski, Patrick
We use the samba glusterfs virtual filesystem (the current version provided
on download.gluster.org), but no windows clients connecting directly.

On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> Do you have any windows clients? I see a lot of getxattr calls for
> "glusterfs.get_real_filename" which lead to full readdirs of the
> directories on the brick.
>
> Pranith
>
> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
>
> Pranith, could this kind of behavior be self-inflicted by us deleting
> files directly from the bricks? We have done that in the past to clean up
> an issue where gluster wouldn't allow us to delete from the mount.
>
> If so, is it feasible to clean them up by running a search on the
> .glusterfs directories directly and removing files with a reference count
> of 1 that are non-zero size (or directly checking the xattrs to be sure
> that it's not a DHT link).
>
> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec
> rm -f "{}" \;
>
> Is there anything I'm inherently missing with that approach that will
> further corrupt the system?
>
>
> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick <
> patrick.glom...@corvidtec.com> wrote:
>
>> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been
>> running on one of the bricks on gfs02b for 25 min or so and users cannot
>> access the volume.
>>
>> I re-listed the xattrop directories as well as a 'top' entry and heal
>> statistics. Then I restarted the gluster services on gfs02a.
>>
>> === top ===
>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+
>> COMMAND
>>  8969 root  20   0 2815m 204m 3588 S 1181.0  0.6 591:06.93
>> glusterfsd
>>
>> === xattrop ===
>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>> xattrop-41f19453-91e4-437c-afa9-3b25614de210
>> xattrop-9b815879-2f4d-402b-867c-a6d65087788c
>>
>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>> xattrop-70131855-3cfb-49af-abce-9d23f57fb393
>> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6
>>
>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>> e6e47ed9-309b-42a7-8c44-28c29b9a20f8
>> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125
>> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
>> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0
>>
>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
>> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413
>>
>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531
>>
>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70
>>
>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8  c9ce22ed-6d8b-471b-a111-b39e57f0b512
>> 94fa1d60-45ad-4341-b69c-315936b51e8d
>> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7
>>
>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d
>>
>>
>> === heal stats ===
>>
>> homegfs [b0-gfsib01a] : Starting time of crawl   : Thu Jan 21
>> 12:36:45 2016
>> homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21
>> 12:36:45 2016
>> homegfs [b0-gfsib01a] : Type of crawl: INDEX
>> homegfs [b0-gfsib01a] : No. of entries healed: 0
>> homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
>> homegfs [b0-gfsib01a] : No. of heal failed entries   : 0
>>
>> homegfs [b1-gfsib01b] : Starting time of crawl   : Thu Jan 21
>> 12:36:19 2016
>> homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21
>> 12:36:19 2016
>> homegfs [b1-gfsib01b] : Type of crawl: INDEX
>> homegfs [b1-gfsib01b] : No. of entries healed: 0
>> homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
>> homegfs [b1-gfsib01b] : No. of heal failed entries   : 1
>>
>> homegfs [b2-gfsib01a] : Starting time of crawl   : Thu Jan 21
>> 12:36:48 2016
>> homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21
>> 12:36:48 2016
>> homegfs [b2-gfsib01a] : Type of crawl: INDEX
>> homegfs [b2-gfsib01a] : No. of entries healed: 0
>> homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
>> homegfs [b2-gfsib01a] : No. of heal failed entries   : 0
>>
>> homegfs [b3-gfsib01b] : Starting time of crawl   : Thu Jan 21
>> 12:36:47 2016
>> homegfs [b3-gfsib

Re: [Gluster-devel] [Gluster-users] heal hanging

2016-01-21 Thread Glomski, Patrick
Pranith, could this kind of behavior be self-inflicted by us deleting files
directly from the bricks? We have done that in the past to clean up an
issue where gluster wouldn't allow us to delete from the mount.

If so, is it feasible to clean them up by running a search on the
.glusterfs directories directly and removing files with a reference count
of 1 that are non-zero size (or directly checking the xattrs to be sure
that it's not a DHT link).

find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec
rm -f "{}" \;

Is there anything I'm inherently missing with that approach that will
further corrupt the system?
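
For the cautious, an inspect-first variant of that find: dump each
candidate's xattrs so DHT link files (which carry the
trusted.glusterfs.dht.linkto xattr) can be excluded before anything is
removed. The output file name is arbitrary:

    find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 \
        -exec getfattr -d -m . -e hex {} \; > /tmp/orphan-candidates.txt
    grep -c '^# file:' /tmp/orphan-candidates.txt   # number of candidates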


On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick <
patrick.glom...@corvidtec.com> wrote:

> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been
> running on one of the bricks on gfs02b for 25 min or so and users cannot
> access the volume.
>
> I re-listed the xattrop directories as well as a 'top' entry and heal
> statistics. Then I restarted the gluster services on gfs02a.
>
> === top ===
> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+
> COMMAND
>  8969 root  20   0 2815m 204m 3588 S 1181.0  0.6 591:06.93
> glusterfsd
>
> === xattrop ===
> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
> xattrop-41f19453-91e4-437c-afa9-3b25614de210
> xattrop-9b815879-2f4d-402b-867c-a6d65087788c
>
> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
> xattrop-70131855-3cfb-49af-abce-9d23f57fb393
> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6
>
> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
> e6e47ed9-309b-42a7-8c44-28c29b9a20f8
> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125
> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0
>
> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413
>
> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531
>
> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70
>
> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8  c9ce22ed-6d8b-471b-a111-b39e57f0b512
> 94fa1d60-45ad-4341-b69c-315936b51e8d
> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7
>
> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d
>
>
> === heal stats ===
>
> homegfs [b0-gfsib01a] : Starting time of crawl   : Thu Jan 21 12:36:45
> 2016
> homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:45
> 2016
> homegfs [b0-gfsib01a] : Type of crawl: INDEX
> homegfs [b0-gfsib01a] : No. of entries healed: 0
> homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
> homegfs [b0-gfsib01a] : No. of heal failed entries   : 0
>
> homegfs [b1-gfsib01b] : Starting time of crawl   : Thu Jan 21 12:36:19
> 2016
> homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:19
> 2016
> homegfs [b1-gfsib01b] : Type of crawl: INDEX
> homegfs [b1-gfsib01b] : No. of entries healed: 0
> homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
> homegfs [b1-gfsib01b] : No. of heal failed entries   : 1
>
> homegfs [b2-gfsib01a] : Starting time of crawl   : Thu Jan 21 12:36:48
> 2016
> homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:48
> 2016
> homegfs [b2-gfsib01a] : Type of crawl: INDEX
> homegfs [b2-gfsib01a] : No. of entries healed: 0
> homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
> homegfs [b2-gfsib01a] : No. of heal failed entries   : 0
>
> homegfs [b3-gfsib01b] : Starting time of crawl   : Thu Jan 21 12:36:47
> 2016
> homegfs [b3-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:47
> 2016
> homegfs [b3-gfsib01b] : Type of crawl: INDEX
> homegfs [b3-gfsib01b] : No. of entries healed: 0
> homegfs [b3-gfsib01b] : No. of entries in split-brain: 0
> homegfs [b3-gfsib01b] : No. of heal failed entries   : 0
>
> homegfs [b4-gfsib02a] : Starting time of crawl   : Thu Jan 21 12:36:06
> 2016
> homegfs [b4-gfsib02a] : Ending time of crawl : Thu Jan 21 12:36:06
> 2016
> homegfs [b4-gfsib02a] : Type of crawl: INDEX
> homegfs [b4-gfsib02a] : No. of entries healed: 0
> homegfs [b4-gfsib02a] : No. of entries in split-brain: 0
> homegfs [b4-gfsib02a] : No. of heal failed entries   : 0
>
> homegfs [b5-gfsib02b] : Starting time of crawl   : Thu Jan 21 12:13:40
> 2016
> homegfs [b5-gfsib02b] :*** Crawl is in
> progr

[Gluster-devel] glusterfsd crash due to page allocation failure

2015-12-21 Thread Glomski, Patrick
Hello,

We've recently upgraded from gluster 3.6.6 to 3.7.6 and have started
encountering dmesg page allocation errors (stack trace is appended).

It appears that glusterfsd now sometimes fills up the cache completely and
crashes with a page allocation failure. I *believe* it mainly happens when
copying lots of new data to the system, running a 'find', or similar. Hosts
are all Scientific Linux 6.6 and these errors occur consistently on two
separate gluster pools.

Has anyone else seen this issue and are there any known fixes for it via
sysctl kernel parameters or other means?

Please let me know of any other diagnostic information that would help.
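
One sysctl sometimes suggested for high-order (here order:5) atomic
allocation failures is raising vm.min_free_kbytes so the kernel keeps a
larger contiguous reserve; whether it actually helps this workload is
exactly the open question, so treat it as an experiment:

    sysctl vm.min_free_kbytes                  # current reserve
    sysctl -w vm.min_free_kbytes=262144        # try a 256 MiB reserve (value in kB)
    echo 'vm.min_free_kbytes = 262144' >> /etc/sysctl.conf   # persist if it helps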

Thanks,
Patrick


[1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20
> [1458118.134701] Pid: 6010, comm: glusterfsd Not tainted
> 2.6.32-573.3.1.el6.x86_64 #1
> [1458118.134702] Call Trace:
> [1458118.134714]  [] ? __alloc_pages_nodemask+0x7dc/0x950
> [1458118.134728]  [] ? mlx4_ib_post_send+0x680/0x1f90
> [mlx4_ib]
> [1458118.134733]  [] ? kmem_getpages+0x62/0x170
> [1458118.134735]  [] ? fallback_alloc+0x1ba/0x270
> [1458118.134736]  [] ? cache_grow+0x2cf/0x320
> [1458118.134738]  [] ? cache_alloc_node+0x99/0x160
> [1458118.134743]  [] ? pskb_expand_head+0x62/0x280
> [1458118.134744]  [] ? __kmalloc+0x199/0x230
> [1458118.134746]  [] ? pskb_expand_head+0x62/0x280
> [1458118.134748]  [] ? __pskb_pull_tail+0x2aa/0x360
> [1458118.134751]  [] ? harmonize_features+0x29/0x70
> [1458118.134753]  [] ? dev_hard_start_xmit+0x1c4/0x490
> [1458118.134758]  [] ? sch_direct_xmit+0x15a/0x1c0
> [1458118.134759]  [] ? dev_queue_xmit+0x228/0x320
> [1458118.134762]  [] ? neigh_connected_output+0xbd/0x100
> [1458118.134766]  [] ? ip_finish_output+0x287/0x360
> [1458118.134767]  [] ? ip_output+0xb8/0xc0
> [1458118.134769]  [] ? __ip_local_out+0x9f/0xb0
> [1458118.134770]  [] ? ip_local_out+0x25/0x30
> [1458118.134772]  [] ? ip_queue_xmit+0x190/0x420
> [1458118.134773]  [] ? __alloc_pages_nodemask+0x129/0x950
> [1458118.134776]  [] ? tcp_transmit_skb+0x4b4/0x8b0
> [1458118.134778]  [] ? tcp_write_xmit+0x1da/0xa90
> [1458118.134779]  [] ? __kmalloc_node+0x4d/0x60
> [1458118.134780]  [] ? tcp_push_one+0x30/0x40
> [1458118.134782]  [] ? tcp_sendmsg+0x9cc/0xa20
> [1458118.134786]  [] ? sock_aio_write+0x19b/0x1c0
> [1458118.134788]  [] ? sock_aio_write+0x0/0x1c0
> [1458118.134791]  [] ? do_sync_readv_writev+0xfb/0x140
> [1458118.134797]  [] ? autoremove_wake_function+0x0/0x40
> [1458118.134801]  [] ? selinux_file_permission+0xbf/0x150
> [1458118.134804]  [] ? security_file_permission+0x16/0x20
> [1458118.134806]  [] ? do_readv_writev+0xd6/0x1f0
> [1458118.134807]  [] ? vfs_writev+0x46/0x60
> [1458118.134809]  [] ? sys_writev+0x51/0xd0
> [1458118.134812]  [] ? __audit_syscall_exit+0x25e/0x290
> [1458118.134816]  [] ? system_call_fastpath+0x16/0x1b
>

[Gluster-devel] Gluster v3.7.3 REMOVEXATTR clogs volume logs

2015-08-13 Thread Glomski, Patrick
I am currently testing gluster v3.7.3 on Scientific Linux 7.1 and a newly
created gluster volume. After transferring some files to the volume over
the fuse mount, the volume log is flooded with 2.5GB of errors like the
following:

[2015-08-13 15:54:36.921622] W [fuse-bridge.c:1230:fuse_err_cbk]
0-glusterfs-fuse: 361669: REMOVEXATTR() /path/to/file = -1 (No data
available)

There are several (fixed) redhat bugs relating to similar errors:
https://bugzilla.redhat.com/show_bug.cgi?id=1245966
https://bugzilla.redhat.com/show_bug.cgi?id=1188064
https://bugzilla.redhat.com/show_bug.cgi?id=1192832

- Is anyone else running 3.7 seeing similar errors?
- Is there something wrong with my configuration?
- If it's a problem, what other information do you need to diagnose?
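
For scale, the flood is easy to quantify from the FUSE client log; the log
file name follows the mount point, so the path below assumes the volume is
mounted at /mnt/testbrick:

    grep -c REMOVEXATTR /var/log/glusterfs/mnt-testbrick.log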

gluster volume info:

Volume Name: testbrick
Type: Distribute
Volume ID: 91b0d825-5e39-4b17-a505-174b47849b40
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gfstest:/data/brick01/testbrick
Options Reconfigured:
performance.readdir-ahead: on

Thanks,
Patrick


Re: [Gluster-devel] v3.6.3 doesn't respect default ACLs?

2015-07-27 Thread Glomski, Patrick
I built a patched version of 3.6.4 and the problem does seem to be fixed on
a test server/client when I mounted with those flags (acl, resolve-gids,
and gid-timeout). Since it was a test system, I can't say anything
meaningful about the performance hit seen without the gid-timeout option.
Thank you for implementing it so quickly, though!
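
For the archives, the mount invocation on the test client looked roughly like
this (server and volume names are placeholders; gid-timeout is in seconds):

mount -t glusterfs -o acl,resolve-gids,gid-timeout=300 gfstest:/testvol /mnt/testvol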

Is there any chance of getting this fix incorporated in the upcoming 3.6.5
release?

Patrick


On Thu, Jul 23, 2015 at 6:27 PM, Niels de Vos <nde...@redhat.com> wrote:

 On Tue, Jul 21, 2015 at 10:30:04PM +0200, Niels de Vos wrote:
  On Wed, Jul 08, 2015 at 03:20:41PM -0400, Glomski, Patrick wrote:
   Gluster devs,
  
   I'm running gluster v3.6.3 (both server and client side). Since my
   application requires more than 32 groups, I don't mount with ACLs on
 the
   client. If I mount with ACLs between the bricks and set a default ACL
 on
   the server, I think I'm right in stating that the server should respect
   that ACL whenever a new file or folder is made.
 
  I would expect that the ACL gets inherited on the brick. When a new
  file is created without the default ACL, things seem to be wrong. You
  mention that creating the file directly on the brick has the correct
  ACL, so there must be some Gluster component interfering.
 
  You reminded me on IRC about this email, and that helped a lot. It's very
  easy to get distracted when trying to investigate things from the
  mailing lists.
 
  I had a brief look, and I think we could reach a solution. An ugly patch
  for initial testing is ready. Well... it compiles. I'll try to run some
  basic tests tomorrow and see if it improves things and does not crash
  immediately.
 
  The change can be found here:
http://review.gluster.org/11732
 
  It basically adds a resolve-gids mount option for the FUSE client.
  This causes the fuse daemon to call getgrouplist() and retrieve all the
  groups for the UID that accesses the mountpoint. Without this option,
  the behavior is not changed, and /proc/$PID/status is used to get up to
  32 groups (the $PID is the process that accesses the mountpoint).
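
  The difference is easy to see from a shell: the kernel exposes at most 32
  supplementary groups in /proc/$PID/status, while getgrouplist() consults
  the user database and returns the full list. Roughly:

  $ grep '^Groups:' /proc/$$/status   # capped at 32 entries
  $ id -G "$USER"                     # full list, as getgrouplist() sees it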
 
  You probably also want to mount with gid-timeout=N, where N is the number
  of seconds the group cache stays valid. In the current master branch this
  is set to 300 seconds (like the sssd default), but if the groups of a user
  rarely change, this value can be increased. Previous versions had a lower
  timeout, which could cause the groups to be resolved on almost every
  network packet that arrives (a HUGE performance impact).
 
  When using this option, you may also need to enable server.manage-gids.
  This option allows using more than ~93 groups on the bricks. The network
  packets can only contain ~93 groups; when server.manage-gids is enabled,
  the groups are not sent in the network packets but are resolved on the
  bricks with getgrouplist().
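
  Enabling it is a single volume option, along these lines (volume name is a
  placeholder):

  # gluster volume set <VOLNAME> server.manage-gids on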

 The patch linked above had been tested, corrected and updated. The
 change works for me on a test-system.

 A backport that you should be able to include in a package for 3.6 can
 be found here: http://termbin.com/f3cj
 Let me know if you are not familiar with rebuilding patched packages,
 and I can build a test-version for you tomorrow.

 On glusterfs-3.6, you will want to pass a gid-timeout mount option too.
 The option enables caching of the resolved groups that the uid belongs
 to; if caching is not enabled (or expires quickly), you will probably
 notice a performance hit. Newer versions of GlusterFS set the timeout to
 300 seconds (like the default timeout sssd uses).

 Please test and let me know if this fixes your use case.

 Thanks,
 Niels


 
  Cheers,
  Niels
 
[Gluster-devel] v3.6.3 doesn't respect default ACLs?

2015-07-08 Thread Glomski, Patrick
Gluster devs,

I'm running gluster v3.6.3 (both server and client side). Since my
application requires more than 32 groups, I don't mount with ACLs on the
client. If I mount with ACLs between the bricks and set a default ACL on
the server, I think I'm right in stating that the server should respect
that ACL whenever a new file or folder is made. Maybe an example is in
order:

We first set up a test directory with setgid bit so that our new
subdirectories inherit the group.
[root@gfs01a hpc_shared]# mkdir test; cd test; chown pglomski.users .;
chmod 2770 .; getfacl .
# file: .
# owner: pglomski
# group: users
# flags: -s-
user::rwx
group::rwx
other::---

New subdirectories share the group, but the umask leads to them being group
read-only.
[root@gfs01a test]# mkdir a; getfacl a
# file: a
# owner: root
# group: users
# flags: -s-
user::rwx
group::r-x
other::r-x

Setting default ACLs on the server allows group write to new directories
made on the server.
[root@gfs01a test]# setfacl -m d:g::rwX ./; mkdir b; getfacl b
# file: b
# owner: root
# group: users
# flags: -s-
user::rwx
group::rwx
other::---
default:user::rwx
default:group::rwx
default:other::---

The respect for ACLs is (correctly) shared across bricks.
[root@gfs02a test]# getfacl b
# file: b
# owner: root
# group: users
# flags: -s-
user::rwx
group::rwx
other::---
default:user::rwx
default:group::rwx
default:other::---

[root@gfs02a test]# mkdir c; getfacl c
# file: c
# owner: root
# group: users
# flags: -s-
user::rwx
group::rwx
other::---
default:user::rwx
default:group::rwx
default:other::---

However, when folders are created client-side, the default ACLs appear on
the server, but don't seem to be correctly applied.
[root@client test]# mkdir d; getfacl d
# file: d
# owner: root
# group: users
# flags: -s-
user::rwx
group::r-x
other::---

[root@gfs01a test]# getfacl d
# file: d
# owner: root
# group: users
# flags: -s-
user::rwx
group::r-x
other::---
default:user::rwx
default:group::rwx
default:other::---

As no groups or users were specified, I shouldn't need to specify a mask
for the ACL and, indeed, specifying a mask doesn't help.
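
For reference, the explicit default mask I tried was along these lines, with
no change in the ACL applied to new client-side directories:

[root@gfs01a test]# setfacl -m d:m::rwx .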

If it helps diagnose the problem, the volume options are as follows:
Options Reconfigured:
performance.io-thread-count: 32
performance.cache-size: 128MB
performance.write-behind-window-size: 128MB
server.allow-insecure: on
network.ping-timeout: 10
storage.owner-gid: 100
geo-replication.indexing: off
geo-replication.ignore-pid-check: on
changelog.changelog: on
changelog.fsync-interval: 3
changelog.rollover-time: 15
server.manage-gids: on

This approach to server-side ACLs worked properly with previous versions of
gluster. Can anyone assess the situation for me, confirm/deny that
something changed, and possibly suggest how I can achieve inherited groups
with write permission for new subdirectories in an environment where users
belong to more than 32 groups?

Thanks for your time,

Patrick
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel