[Gluster-users] hanging httpd processes.

2017-03-31 Thread Mohit Agrawal
Hi,

As per the attached glusterdump/stackdump, this appears to be a known issue
(https://bugzilla.redhat.com/show_bug.cgi?id=1372211) that is already
fixed by the patch (https://review.gluster.org/#/c/15380/).

The issue happens in this scenario.
Assume a file is opened with fd1 and fd2.
1. Some WRITE ops to fd1 got an error and were added back to the 'todo'
   queue because of that error.
2. fd2 is closed, and a FLUSH op is sent to write-behind.
3. The FLUSH cannot be unwound because it is not a legal waiter for those
   failed writes (as __wb_request_waiting_on() determines), and those failed
   WRITEs also cannot be completed while fd1 is still open. fd2 is stuck in
   the close syscall.

The statedump also shows that the flush op's fd is not the same as the
write op's fd. Kindly upgrade the packages to 3.10.1 and share the result.



Thanks
Mohit Agrawal
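
For anyone following along, statedumps like the one analysed here can usually
be generated roughly as sketched below; the volume name is the one seen in
this thread's dump and the commands are only a sketch, not a definitive recipe.

    # Brick (server-side) statedumps, written under /var/run/gluster by default:
    gluster volume statedump edocs-production

    # FUSE client statedump: send SIGUSR1 to the glusterfs client process
    # on the web server (the mount process is named "glusterfs"):
    kill -USR1 "$(pgrep -x glusterfs | head -n 1)"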


On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi wrote:

> Hi Alvin,
>
> Thanks for the dump output. It helped a bit.
>
> For now, I recommend turning off the open-behind and read-ahead performance
> translators to get rid of this situation, as I noticed hung FLUSH
> operations from these translators.
>
Looks like I gave the wrong advice by looking at the below snippet:

[global.callpool.stack.61]
> stack=0x7f6c6f628f04
> uid=48
> gid=48
> pid=11077
> unique=10048797
> lk-owner=a73ae5bdb5fcd0d2
> op=FLUSH
> type=1
> cnt=5
>
> [global.callpool.stack.61.frame.1]
> frame=0x7f6c6f793d88
> ref_count=0
> translator=edocs-production-write-behind
> complete=0
> parent=edocs-production-read-ahead
> wind_from=ra_flush
> wind_to=FIRST_CHILD (this)->fops->flush
> unwind_to=ra_flush_cbk
>
> [global.callpool.stack.61.frame.2]
> frame=0x7f6c6f796c90
> ref_count=1
> translator=edocs-production-read-ahead
> complete=0
> parent=edocs-production-open-behind
> wind_from=default_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=default_flush_cbk
>
> [global.callpool.stack.61.frame.3]
> frame=0x7f6c6f79b724
> ref_count=1
> translator=edocs-production-open-behind
> complete=0
> parent=edocs-production
> wind_from=io_stats_flush
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=io_stats_flush_cbk
>
> [global.callpool.stack.61.frame.4]
> frame=0x7f6c6f79b474
> ref_count=1
> translator=edocs-production
> complete=0
> parent=fuse
> wind_from=fuse_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=fuse_err_cbk
>
> [global.callpool.stack.61.frame.5]
> frame=0x7f6c6f796684
> ref_count=1
> translator=fuse
> complete=0
>
Most probably, the issue is with write-behind's flush. So please turn off
write-behind and test. If you don't see any hung httpd processes after that,
please let us know.

-Amar
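
For reference, a minimal sketch of the suggested change, using the volume
name that appears in the statedump above (adjust if yours differs):

    # Disable the write-behind performance translator on the volume:
    gluster volume set edocs-production performance.write-behind off

    # Confirm the option took effect:
    gluster volume info edocs-production | grep -i write-behind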


> -Amar
>
> On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr wrote:
>
>> We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and on
>> the clients 3.7.11-2 on Centos 6.8
>>
>> We are seeing httpd processes hang in fuse_request_send or sync_page.
>>
>> These calls are from PHP 5.3.3-48 scripts
>>
>> I am attaching a tgz file that contains the process dump from glusterfsd
>> and the hung pids along with the offending pid's stacks from
>> /proc/{pid}/stack.
>>
>> This has been a low level annoyance for a while but it has become a much
>> bigger issue because the number of hung processes went from a few a week to
>> a few hundred a day.
>>
>> --
>> Alvin Starr   ||   voice: (905)513-7688
>> Netvel Inc.   ||   Cell:  (416)806-0133
>> alvin at netvel.net   ||
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Amar Tumballi (amarts)
>


--
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] hanging httpd processes.

2017-03-31 Thread Mohit Agrawal
Hi,

As you have mentioned in the thread, the package versions are different on
the clients and the servers.
We would recommend upgrading both servers and clients to rhs-3.10.1.
If it is not possible to upgrade both, then in that case only the clients
need to be upgraded.

Thanks
Mohit Agrawal
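
A quick, rough way to confirm the installed versions on each host (assuming
RPM-based systems, as used in this thread):

    # On each server and client:
    rpm -q glusterfs glusterfs-fuse glusterfs-server
    glusterfs --version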

On Fri, Mar 31, 2017 at 2:27 PM, Mohit Agrawal  wrote:

> Hi,
>
> As per the attached glusterdump/stackdump, this appears to be a known issue
> (https://bugzilla.redhat.com/show_bug.cgi?id=1372211) that is already
> fixed by the patch (https://review.gluster.org/#/c/15380/).
>
> The issue happens in this scenario.
> Assume a file is opened with fd1 and fd2.
> 1. Some WRITE ops to fd1 got an error and were added back to the 'todo'
>    queue because of that error.
> 2. fd2 is closed, and a FLUSH op is sent to write-behind.
> 3. The FLUSH cannot be unwound because it is not a legal waiter for those
>    failed writes (as __wb_request_waiting_on() determines), and those failed
>    WRITEs also cannot be completed while fd1 is still open. fd2 is stuck in
>    the close syscall.
>
> The statedump also shows that the flush op's fd is not the same as the
> write op's fd. Kindly upgrade the packages to 3.10.1 and share the result.
>
>
>
> Thanks
> Mohit Agrawal
>
>
> On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi wrote:
>
> > Hi Alvin,
> >
> > Thanks for the dump output. It helped a bit.
> >
> > For now, I recommend turning off the open-behind and read-ahead performance
> > translators to get rid of this situation, as I noticed hung FLUSH
> > operations from these translators.
> >
> Looks like I gave the wrong advice by looking at the below snippet:
>
> [global.callpool.stack.61]
> > stack=0x7f6c6f628f04
> > uid=48
> > gid=48
> > pid=11077
> > unique=10048797
> > lk-owner=a73ae5bdb5fcd0d2
> > op=FLUSH
> > type=1
> > cnt=5
> >
> > [global.callpool.stack.61.frame.1]
> > frame=0x7f6c6f793d88
> > ref_count=0
> > translator=edocs-production-write-behind
> > complete=0
> > parent=edocs-production-read-ahead
> > wind_from=ra_flush
> > wind_to=FIRST_CHILD (this)->fops->flush
> > unwind_to=ra_flush_cbk
> >
> > [global.callpool.stack.61.frame.2]
> > frame=0x7f6c6f796c90
> > ref_count=1
> > translator=edocs-production-read-ahead
> > complete=0
> > parent=edocs-production-open-behind
> > wind_from=default_flush_resume
> > wind_to=FIRST_CHILD(this)->fops->flush
> > unwind_to=default_flush_cbk
> >
> > [global.callpool.stack.61.frame.3]
> > frame=0x7f6c6f79b724
> > ref_count=1
> > translator=edocs-production-open-behind
> > complete=0
> > parent=edocs-production
> > wind_from=io_stats_flush
> > wind_to=FIRST_CHILD(this)->fops->flush
> > unwind_to=io_stats_flush_cbk
> >
> > [global.callpool.stack.61.frame.4]
> > frame=0x7f6c6f79b474
> > ref_count=1
> > translator=edocs-production
> > complete=0
> > parent=fuse
> > wind_from=fuse_flush_resume
> > wind_to=FIRST_CHILD(this)->fops->flush
> > unwind_to=fuse_err_cbk
> >
> > [global.callpool.stack.61.frame.5]
> > frame=0x7f6c6f796684
> > ref_count=1
> > translator=fuse
> > complete=0
> >
> Most probably, the issue is with write-behind's flush. So please turn off
> write-behind and test. If you don't see any hung httpd processes after that,
> please let us know.
>
> -Amar
>
>
> > -Amar
> >
> > On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr wrote:
> >
> >> We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and on
> >> the clients 3.7.11-2 on Centos 6.8
> >>
> >> We are seeing httpd processes hang in fuse_request_send or sync_page.
> >>
> >> These calls are from PHP 5.3.3-48 scripts
> >>
> >> I am attaching a tgz file that contains the process dump from glusterfsd
> >> and the hung pids along with the offending pid's stacks from
> >> /proc/{pid}/stack.
> >>
> >> This has been a low level annoyance for a while but it has become a much
> >> bigger issue because the number of hung processes went from a few a week to
> >> a few hundred a day.
> >>
> >> --
> >> Alvin Starr   ||   voice: (905)513-7688
> >> Netvel Inc.   ||   Cell:  (416)806-0133
> >> alvin at netvel.net   ||
> >>
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> > --
> > Amar Tumballi (amarts)
> >
>
>
> --
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] hanging httpd processes.

2017-03-31 Thread Alvin Starr
Since things are in production, upgrades need to be scheduled, so it may take
a while before I can get everything up to 3.10.


I have set the write-behind option to off on the servers, but do I need to
restart the servers, or can I just get away with unmounting and remounting
the clients?


Thank you for taking a look at this for us.


On 03/31/2017 05:54 AM, Mohit Agrawal wrote:

Hi,

As you have mentioned in the thread, the package versions are different on
the clients and the servers.

We would recommend upgrading both servers and clients to rhs-3.10.1.
If it is not possible to upgrade both, then in that case only the clients
need to be upgraded.


Thanks
Mohit Agrawal

On Fri, Mar 31, 2017 at 2:27 PM, Mohit Agrawal wrote:


Hi,

As per the attached glusterdump/stackdump, this appears to be a known
issue (https://bugzilla.redhat.com/show_bug.cgi?id=1372211) that is
already fixed by the patch (https://review.gluster.org/#/c/15380/).

The issue happens in this scenario.
Assume a file is opened with fd1 and fd2.
1. Some WRITE ops to fd1 got an error and were added back to the 'todo'
   queue because of that error.
2. fd2 is closed, and a FLUSH op is sent to write-behind.
3. The FLUSH cannot be unwound because it is not a legal waiter for those
   failed writes (as __wb_request_waiting_on() determines), and those
   failed WRITEs also cannot be completed while fd1 is still open. fd2 is
   stuck in the close syscall.

The statedump also shows that the flush op's fd is not the same as the
write op's fd. Kindly upgrade the packages to 3.10.1 and share the result.

Thanks
Mohit Agrawal

On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi wrote:

> Hi Alvin,
>
> Thanks for the dump output. It helped a bit.
>
> For now, I recommend turning off the open-behind and read-ahead performance
> translators to get rid of this situation, as I noticed hung FLUSH
> operations from these translators.

Looks like I gave the wrong advice by looking at the below snippet:

[global.callpool.stack.61]
> stack=0x7f6c6f628f04
> uid=48
> gid=48
> pid=11077
> unique=10048797
> lk-owner=a73ae5bdb5fcd0d2
> op=FLUSH
> type=1
> cnt=5
>
> [global.callpool.stack.61.frame.1]
> frame=0x7f6c6f793d88
> ref_count=0
> translator=edocs-production-write-behind
> complete=0
> parent=edocs-production-read-ahead
> wind_from=ra_flush
> wind_to=FIRST_CHILD (this)->fops->flush
> unwind_to=ra_flush_cbk
>
> [global.callpool.stack.61.frame.2]
> frame=0x7f6c6f796c90
> ref_count=1
> translator=edocs-production-read-ahead
> complete=0
> parent=edocs-production-open-behind
> wind_from=default_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=default_flush_cbk
>
> [global.callpool.stack.61.frame.3]
> frame=0x7f6c6f79b724
> ref_count=1
> translator=edocs-production-open-behind
> complete=0
> parent=edocs-production
> wind_from=io_stats_flush
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=io_stats_flush_cbk
>
> [global.callpool.stack.61.frame.4]
> frame=0x7f6c6f79b474
> ref_count=1
> translator=edocs-production
> complete=0
> parent=fuse
> wind_from=fuse_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=fuse_err_cbk
>
> [global.callpool.stack.61.frame.5]
> frame=0x7f6c6f796684
> ref_count=1
> translator=fuse
> complete=0
Most probably, the issue is with write-behind's flush. So please turn off
write-behind and test. If you don't see any hung httpd processes after that,
please let us know.

-Amar


> -Amar
>
> On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr wrote:
>
>> We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and on
>> the clients 3.7.11-2 on Centos 6.8
>>
>> We are seeing httpd processes hang in fuse_request_send or sync_page.
>>
>> These calls are from PHP 5.3.3-48 scripts
>>
>> I am attaching a tgz file that contains the process dump from glusterfsd
>> and the hung pids along with the offending pid's stacks from
>> /proc/{pid}/stack.
>>
>> This has been a low level annoyance for a while but it has become a much
>> bigger issue because the number of hung processes went from a few a week to
>> a few hundred a day.
>>
>> --
>> Alvin Starr   ||   voice: (905)513-7688
>> Netvel Inc.   ||   Cell:  (416)806-0133
>> alvin at netvel.net   ||
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Node count constraints with EC?

2017-03-31 Thread Terry McGuire
Ok, I think I see my error.  The rule, near as I can tell, is actually 
total/parity>2.  So, 4+2 gives 6/2=3, and therefore is ok.  Eventually I’ll get 
all this straight!

Terry
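
For what it's worth, a sketch of creating such an 8+3 dispersed volume; the
volume name, hosts and brick paths below are placeholders only, not anything
from this thread:

    # 8 data + 3 redundancy bricks = 11 bricks total; hostnames/paths are
    # hypothetical:
    gluster volume create myvol disperse-data 8 redundancy 3 \
        host{1..11}:/bricks/brick1/myvol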


> On Mar 30, 2017, at 23:13, Ashish Pandey  wrote:
> 
> Terry,
> 
> It is  (data/parity)>=2. You can very well create 4+2 or 8+4 volume.
> Are you seeing any error message that you can not create 4+2 config? (4 = 
> data brick and 2 = redundancy brick count)
> 
> Ashish
> 
> From: "Terry McGuire" 
> To: gluster-users@gluster.org
> Sent: Friday, March 31, 2017 3:34:35 AM
> Subject: Re: [Gluster-users] Node count constraints with EC?
> 
> Thanks Ashish, Cedric, for your comments.
> 
> I’m no longer concerned about my choice of 4 nodes to start, but, I realize 
> that there’s an issue with my subvolume config options.  Turns out only my 
> 8+3 choice is permitted, as the 4+2 and 8+4 options violate the data/parity>2 
> rule.  So, 8+3 it is, as 8+2 isn’t quite enough redundancy for me.
> 
> Regards,
> Terry
> 
> 
> On Mar 30, 2017, at 02:14, yipik...@gmail.com  
> wrote:
> 
> On 30/03/2017 08:35, Ashish Pandey wrote:
> Good point Cedric!!
> The only thing is that, I would prefer to say "bricks" instead of "nodes" in 
> your statement.
> 
> "starting with 4 bricks (3+1) can only evolve by adding 4 bricks (3+1)" 
> Oh right, thanks for correcting me !
> 
> Cheers
> 
> 
> From: "Cedric Lemarchand"  
> To: "Terry McGuire"  
> Cc: gluster-users@gluster.org 
> Sent: Thursday, March 30, 2017 11:57:27 AM
> Subject: Re: [Gluster-users] Node count constraints with EC?
> 
> 
> > On 29 Mar 2017 at 20:29, Terry McGuire wrote:
> > 
> > I was thinking I’d spread these over 4 nodes, and add single nodes over 
> > time, with subvolumes rearranged over new nodes to maintain protection from 
> > whole node failures.
> 
> Also keep in mind that dispersed cluster can only be expanded by the number 
> of initial nodes, eg starting with 4 nodes 3+1 can only evolve by adding 4 
> nodes 3+1, you cannot change the default policy 3+1 to 4+1. So the 
> granularity of the evolution of the cluster is fixed at the beginning. 
> 
> Cheers
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://lists.gluster.org/mailman/listinfo/gluster-users 
> 
> 
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] hanging httpd processes.

2017-03-31 Thread Yong Zhang
Thanks Amar, I'll consider your recommendations. But why is performance totally
different on the two nodes? Will data be written to both nodes at the same time?


From: Amar Tumballi
Sent: Friday, March 31, 2017 3:14 PM
To: Alvin Starr
Cc: gluster-users@gluster.org List
Subject: Re: [Gluster-users] hanging httpd processes.



On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi wrote:
Hi Alvin,

Thanks for the dump output. It helped a bit.

For now, I recommend turning off the open-behind and read-ahead performance
translators to get rid of this situation, as I noticed hung FLUSH
operations from these translators.

Looks like I gave the wrong advice by looking at the below snippet:

[global.callpool.stack.61]
stack=0x7f6c6f628f04
uid=48
gid=48
pid=11077
unique=10048797
lk-owner=a73ae5bdb5fcd0d2
op=FLUSH
type=1
cnt=5

[global.callpool.stack.61.frame.1]
frame=0x7f6c6f793d88
ref_count=0
translator=edocs-production-write-behind
complete=0
parent=edocs-production-read-ahead
wind_from=ra_flush
wind_to=FIRST_CHILD (this)->fops->flush
unwind_to=ra_flush_cbk

[global.callpool.stack.61.frame.2]
frame=0x7f6c6f796c90
ref_count=1
translator=edocs-production-read-ahead
complete=0
parent=edocs-production-open-behind
wind_from=default_flush_resume
wind_to=FIRST_CHILD(this)->fops->flush
unwind_to=default_flush_cbk

[global.callpool.stack.61.frame.3]
frame=0x7f6c6f79b724
ref_count=1
translator=edocs-production-open-behind
complete=0
parent=edocs-production
wind_from=io_stats_flush
wind_to=FIRST_CHILD(this)->fops->flush
unwind_to=io_stats_flush_cbk

[global.callpool.stack.61.frame.4]
frame=0x7f6c6f79b474
ref_count=1
translator=edocs-production
complete=0
parent=fuse
wind_from=fuse_flush_resume
wind_to=FIRST_CHILD(this)->fops->flush
unwind_to=fuse_err_cbk

[global.callpool.stack.61.frame.5]
frame=0x7f6c6f796684
ref_count=1
translator=fuse
complete=0

Most probably, the issue is with write-behind's flush. So please turn off
write-behind and test. If you don't see any hung httpd processes after that,
please let us know.

-Amar


-Amar

On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr wrote:
We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and on the 
clients 3.7.11-2 on Centos 6.8

We are seeing httpd processes hang in fuse_request_send or sync_page.

These calls are from PHP 5.3.3-48 scripts

I am attaching  a tgz file that contains the process dump from glusterfsd and 
the hung pids along with the offending pid's stacks from /proc/{pid}/stack.

This has been a low level annoyance for a while but it has become a much bigger 
issue because the number of hung processes went from a few a week to a few 
hundred a day.


--
Alvin Starr   ||   voice: (905)513-7688
Netvel Inc.   ||   Cell:  (416)806-0133
al...@netvel.net  ||


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users



--
Amar Tumballi (amarts)



--
Amar Tumballi (amarts)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] TLS support

2017-03-31 Thread Joseph Lorenzini
Hi Yong,

Gluster uses the OpenSSL library, which supports SSL 3.0 and TLS versions
1.0, 1.1 and 1.2. I actually don't know whether it's dynamically linked against
the openssl library, nor what version of the openssl lib gluster has been tested
with. That is important information which is currently undocumented.

But in regard to your specific question, it would support SSL (which no
one should use anymore) and all versions of TLS (everyone should be using
at least 1.1).

Thanks,
Joe
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] TLS support

2017-03-31 Thread Niels de Vos
On Fri, Mar 31, 2017 at 07:01:14AM -0500, Joseph Lorenzini wrote:
> Hi Yong,
> 
> Gluster uses the openssl library, which supports SSL 3.0 and TLS versions
> 1.0,1.1,1.2. I actually don't know if its dynamically linked against the
> openssl library nor what version of the openssl lib gluster has been tested
> with. That is important info to know that is currently undocumented.

It is dynamically linked and the version that is used is the openssl
version that is provided by the distribution where the different
glusterfs packages are built.

Niels
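
A rough way to verify this on a given installation is sketched below; the
path is the usual RPM layout and the TLS code sits in the socket transport,
if I recall correctly, so adjust as needed for your distribution:

    # Which libssl/libcrypto the gluster socket transport links against:
    ldd /usr/lib64/glusterfs/*/rpc-transport/socket.so | grep -Ei 'ssl|crypto'

    # The distribution's openssl package version:
    rpm -q openssl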


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] TLS support

2017-03-31 Thread Darren Zhang
So how can I know which SSL/TLS protocol version is currently in use between
server and client? (gluster 3.10.0 on Ubuntu 16.04)


Yong Zhang



On 2017-03-31 20:56, Niels de Vos wrote:

On Fri, Mar 31, 2017 at 07:01:14AM -0500, Joseph Lorenzini wrote:
> Hi Yong,
>
> Gluster uses the openssl library, which supports SSL 3.0 and TLS versions
> 1.0,1.1,1.2. I actually don't know if its dynamically linked against the
> openssl library nor what version of the openssl lib gluster has been tested
> with. That is important info to know that is currently undocumented.

It is dynamically linked and the version that is used is the openssl
version that is provided by the distribution where the different
glusterfs packages are built.

Niels
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Terrible Gluster rebuild performance.

2017-03-31 Thread Ernie Dunbar

  
  
We currently have a Gluster array of three baremetal servers in a Replicate
1x3 configuration. This single brick has about 1.1TB of data and is
configured for 3.7 TB of total space. This array is mostly hosting mail in
Maildir format, although we'd like it to also host some Proxmox VMs - the
problem with doing that is that the performance of the Gluster array is so
slow that booting VMs from Gluster makes Proxmox time out! We've instead
started experimenting with using Gluster's NFS server to host the VMs,
which is much faster, but there are obvious issues with stability. We're
not really hosting anything important yet, this is still an experiment.
Except for all our mail, of course.

The e-mail performance isn't spectacularly fast, but mostly bearable at the
moment.

The real meat of this post, however, is "What do we do about this?" I
figured that I had built a slow RAID configuration (disk utilization was
very high), so I took down one of the Gluster nodes and rebuilt it as a
RAID 0 array. This meant starting again with a completely empty disk, but
after rebuilding the node and starting the volume heal, it absolutely
slaughtered performance. Our mail server had gotten so slow as to make
webmail unusable. The process to heal the volume takes days to move 1.1 TB
of data, and we couldn't just let it run with performance that bad, so I
stopped the Gluster daemon during the day and only ran it at night. It took
two whole weeks to completely heal the volume in this fashion, even when
allowing the heal to run over the weekend for two days straight.

So what happens when we add more Gluster nodes to this array? Or if we
wanted to upgrade the hardware in the array in any way? Or if I wanted to
make any other changes to the array? It seems that, first, Gluster's promise
of high availability is "things will keep working, but they'll be so slow in
the meantime that nobody wants to use the services built on top of it", and
the same is true when you have to take a node offline for an extended period
of time and you have to heal the array again.

This is a serious issue with the performance of heal operations.
What can I do to fix it?

  

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Gluster Monthly Newsletter, March 2017

2017-03-31 Thread Amye Scavarda
Gluster Monthly Newsletter, March 2017

-- 

3.10 Release: If you didn’t already see this, we’ve released Gluster 3.10.
Further details on the blog.

https://blog.gluster.org/2017/02/announcing-gluster-3-10/

Our weekly community meeting has changed: we'll be meeting every other week
instead of weekly, moving the time to 15:00 UTC, and our agenda is at:
https://bit.ly/gluster-community-meetings

We hope this means that more people can join us. Kaushal outlines the
changes on the mailing list:
http://lists.gluster.org/pipermail/gluster-devel/2017-January/051918.html

New meetup!

Seattle Storage Meetup has its first meeting, April 13!

Upcoming Talks:

Red Hat Summit -

Container-Native Storage for Modern Applications with OpenShift and Red Hat
Gluster Storage

Architecting and Performance-Tuning Efficient Gluster Storage Pools

Noteworthy threads:

Gluster-users:

Gluster RPC Internals - Lecture #1 - recording - Milind Changire

http://lists.gluster.org/pipermail/gluster-users/2017-March/030136.html

Shyam announces release 3.11 : Scope, schedule and feature tracking

http://lists.gluster.org/pipermail/gluster-users/2017-March/030251.html

Vijay announces new demos in Community Meeting

http://lists.gluster.org/pipermail/gluster-users/2017-March/030264.html

Prasanna Kalever posts about Elasticsearch with gluster-block

http://lists.gluster.org/pipermail/gluster-users/2017-March/030302.html

Raghavendra Talur has a proposal to deprecate replace-brick for "distribute
only" volumes

http://lists.gluster.org/pipermail/gluster-users/2017-March/030304.html

Deepak Naidu asks about Secured mount in GlusterFS using keys

http://lists.gluster.org/pipermail/gluster-users/2017-March/030312.html

Ramesh Nachimuthu has a question for gluster-users: How do you oVirt?

http://lists.gluster.org/pipermail/gluster-users/2017-March/030366.html

Joe Julian announces a Seattle Storage meetup

http://lists.gluster.org/pipermail/gluster-users/2017-March/030398.html

Gluster-devel:

Shyam posts about Back porting guidelines: Change-ID consistency across
branches

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052216.html

Niels de Vos asks about a pluggable interface for erasure coding?

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052223.html

Niels de Vos has a proposal on Reducing maintenance burden and moving fuse
support to an external project

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052238.html

Nigel Babu starts a conversation on defining a good build

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052245.html

Ben Werthmann announces gogfapi improvements

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052274.html

Saravanakumar Arumugam posts about Gluster Volume as object storage with S3
interface

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052263.html

Vijay posts about Maintainers 2.0 proposal

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052321.html

George Lian posts: nodeid changed due to write-behind option changed online
will lead to unexpected umount by kernel

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052372.html

Sriram posts a proposal for Gluster volume snapshot - Plugin architecture
proposal

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052385.html

Mark Ferrell posts improvements for Gluster volume snapshot

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052396.html

Sonal Arora has a script to identify ref leaks

http://lists.gluster.org/pipermail/gluster-devel/2017-March/052468.html

Gluster-infra:

Nigel Babu posts about RPM build failures post-mortem

http://lists.gluster.org/pipermail/gluster-infra/2017-March/003300.html

Nigel Babu posts about Servers in UTC now (mostly)

http://lists.gluster.org/pipermail/gluster-infra/2017-March/003368.html

Gluster Top 5 Contributors in the last 30 days:

Krutika Dhananjay, Michael Scherer, Kaleb S. Keithley, Nigel Babu, Xavier
Hernandez

Upcoming CFPs:

OpenSource Summit Los Angeles -
http://events.linuxfoundation.org/events/open-source-summit-north-america/program/cfp
 - May 6


-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Deletion of old CHANGELOG files in .glusterfs/changelogs

2017-03-31 Thread mabi
Hi,

I have been using geo-replication for over a year now on my 3.7.20 GlusterFS
volumes and noticed that the CHANGELOG. files in the .glusterfs/changelogs
directory of a brick never get deleted. I have, for example, over 120k files in
one of these directories and it is growing constantly.

So my question: does GlusterFS have any mechanism to automatically delete old,
already-processed CHANGELOG files? If not, is it safe to delete them manually?
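
For context, a rough way to count what has accumulated; the brick path below
is purely hypothetical, so adjust it to your own brick:

    find /data/brick1/.glusterfs/changelogs -type f -name 'CHANGELOG.*' | wc -l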

Regards,
Mabi
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] TLS support

2017-03-31 Thread Joseph Lorenzini
Try connecting OpenSSL's s_client to one of the volume's brick ports. Note you
can control the allowed SSL/TLS versions by setting a gluster volume option.

Joe
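
A rough sketch of such a probe; the host and port are placeholders (the real
brick port shows up in 'gluster volume status'), and the volume option referred
to is, if I recall the name correctly, ssl.cipher-list:

    # Probe a brick port with a specific protocol version:
    openssl s_client -connect server-1:49152 -tls1_2 </dev/null

    # If your local openssl build still supports SSLv3, this should be
    # refused by a sanely configured brick:
    openssl s_client -connect server-1:49152 -ssl3 </dev/null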

On Fri, Mar 31, 2017 at 8:33 AM Darren Zhang  wrote:

So how can I know the default ssl protocol currently using between server
and client? (gluster3.10.0 on ubuntu16.04)


Yong Zhang


On 2017-03-31 20:56, Niels de Vos wrote:

On Fri, Mar 31, 2017 at 07:01:14AM -0500, Joseph Lorenzini wrote:
> Hi Yong,
>
> Gluster uses the openssl library, which supports SSL 3.0 and TLS versions
> 1.0,1.1,1.2. I actually don't know if its dynamically linked against the
> openssl library nor what version of the openssl lib gluster has been
tested
> with. That is important info to know that is currently undocumented.

It is dynamically linked and the version that is used is the openssl
version that is provided by the distribution where the different
glusterfs packages are built.

Niels
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Node count constraints with EC?

2017-03-31 Thread Gandalf Corvotempesta
How can I ensure that each parity brick is stored on a different server?

On 30 Mar 2017 at 6:50 AM, "Ashish Pandey" wrote:

> Hi Terry,
>
> There is no constraint on the number of nodes for erasure coded volumes.
> However, there are some suggestions to keep in mind.
>
> If you have a 4+2 configuration, that means you can lose at most 2 bricks
> at a time without losing your volume for IO.
> These bricks may fail because of a node crash or node disconnection. That is
> why it is always good to have all 6 bricks on 6 different nodes. If you
> have 3 bricks on one node and this node goes down, then you
> will lose the volume and it will be inaccessible.
> So just keep in mind that you should not lose more than the redundancy
> count of bricks even if any one node goes down.
>
> 
> Ashish
>
>
> --
> From: "Terry McGuire"
> To: gluster-users@gluster.org
> Sent: Wednesday, March 29, 2017 11:59:32 PM
> Subject: [Gluster-users] Node count constraints with EC?
>
> Hello list.  Newbie question:  I’m building a low-performance/low-cost
> storage service with a starting size of about 500TB, and want to use
> Gluster with erasure coding.  I’m considering subvolumes of maybe 4+2, or
> 8+3 or 4.  I was thinking I’d spread these over 4 nodes, and add single
> nodes over time, with subvolumes rearranged over new nodes to maintain
> protection from whole node failures.
>
> However, reading through some RedHat-provided documentation, they seem to
> suggest that node counts should be a multiple of 3, 6 or 12, depending on
> subvolume config.  Is this actually a requirement, or is it only a
> suggestion for best performance or something?
>
> Can anyone comment on node count constraints with erasure coded subvolumes?
>
> Thanks in advance for anyone’s reply,
> Terry
>
> _
> Terry McGuire
> Information Services and Technology (IST)
> University of Alberta
> Edmonton, Alberta, Canada  T6G 2H1
> Phone:  780-492-9422
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Node count constraints with EC?

2017-03-31 Thread Ashish Pandey


While creating the volume, just provide bricks which are hosted on different
servers.

gluster v create <volname> redundancy 2 server-1:/brick1 server-2:/brick2
server-3:/brick3 server-4:/brick4 server-5:/brick5 server-6:/brick6

At present you cannot differentiate between data bricks and parity bricks.
That is, in the above command you cannot say which bricks out of brick1 to
brick6 will be the parity bricks.

- Original Message -

From: "Gandalf Corvotempesta"  
To: "Ashish Pandey"  
Cc: gluster-users@gluster.org 
Sent: Friday, March 31, 2017 12:19:58 PM 
Subject: Re: [Gluster-users] Node count constraints with EC? 

How can I ensure that each parity brick is stored on a different server?

On 30 Mar 2017 at 6:50 AM, "Ashish Pandey" < aspan...@redhat.com > wrote:



Hi Terry, 

There is no constraint on the number of nodes for erasure coded volumes.
However, there are some suggestions to keep in mind.

If you have a 4+2 configuration, that means you can lose at most 2 bricks at a
time without losing your volume for IO.
These bricks may fail because of a node crash or node disconnection. That is why
it is always good to have all 6 bricks on 6 different nodes. If you have 3
bricks on one node and this node goes down, then you
will lose the volume and it will be inaccessible.
So just keep in mind that you should not lose more than the redundancy count of
bricks even if any one node goes down.

 
Ashish 



From: "Terry McGuire" < tmcgu...@ualberta.ca > 
To: gluster-users@gluster.org 
Sent: Wednesday, March 29, 2017 11:59:32 PM 
Subject: [Gluster-users] Node count constraints with EC? 

Hello list. Newbie question: I’m building a low-performance/low-cost storage 
service with a starting size of about 500TB, and want to use Gluster with 
erasure coding. I’m considering subvolumes of maybe 4+2, or 8+3 or 4. I was 
thinking I’d spread these over 4 nodes, and add single nodes over time, with 
subvolumes rearranged over new nodes to maintain protection from whole node 
failures. 

However, reading through some RedHat-provided documentation, they seem to 
suggest that node counts should be a multiple of 3, 6 or 12, depending on 
subvolume config. Is this actually a requirement, or is it only a suggestion 
for best performance or something? 

Can anyone comment on node count constraints with erasure coded subvolumes? 

Thanks in advance for anyone’s reply, 
Terry 

_ 
Terry McGuire 
Information Services and Technology (IST) 
University of Alberta 
Edmonton, Alberta, Canada T6G 2H1 
Phone: 780-492-9422 


___ 
Gluster-users mailing list 
Gluster-users@gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-users 


___ 
Gluster-users mailing list 
Gluster-users@gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-users 




___ 
Gluster-users mailing list 
Gluster-users@gluster.org 
http://lists.gluster.org/mailman/listinfo/gluster-users 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] hanging httpd processes.

2017-03-31 Thread Amar Tumballi
Hi Alvin,

Thanks for the dump output. It helped a bit.

For now, I recommend turning off the open-behind and read-ahead performance
translators to get rid of this situation, as I noticed hung FLUSH
operations from these translators.

-Amar
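
For reference, a minimal sketch of the two suggested changes; the volume name
is the one seen in the attached statedump and may differ in your setup:

    gluster volume set edocs-production performance.open-behind off
    gluster volume set edocs-production performance.read-ahead off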

On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr  wrote:

> We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and on
> the clients 3.7.11-2 on Centos 6.8
>
> We are seeing httpd processes hang in fuse_request_send or sync_page.
>
> These calls are from PHP 5.3.3-48 scripts
>
> I am attaching  a tgz file that contains the process dump from glusterfsd
> and the hung pids along with the offending pid's stacks from
> /proc/{pid}/stack.
>
> This has been a low level annoyance for a while but it has become a much
> bigger issue because the number of hung processes went from a few a week to
> a few hundred a day.
>
>
> --
> Alvin Starr   ||   voice: (905)513-7688
> Netvel Inc.   ||   Cell:  (416)806-0133
> al...@netvel.net  ||
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Amar Tumballi (amarts)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] adding arbiter

2017-03-31 Thread Gambit15
As I understand it, only new files will be sharded, but simply renaming or
moving them may be enough in that case.

I'm interested in the arbiter/sharding bug you've mentioned. Could you
provide any more details or a link?

Cheers,
 D

On 30 March 2017 at 20:25, Laura Bailey  wrote:

> I can't answer all of these, but I think the only way to shard existing
> files is to create a new volume with sharding enabled and copy the files
> over into it.
>
> Cheers,
> Laura B
>
>
> On Friday, March 31, 2017, Alessandro Briosi  wrote:
>
>> Hi I need some advice.
>>
>> I'm currently on 3.8.10 and would like to know the following:
>>
>> 1. If I add an arbiter to an existing volume should I also run a
>> rebalance?
>> 2. If I had sharding enabled would adding the arbiter trigger the
>> corruption bug?
>> 3. What's the procedure to enable sharding on an existing volume so that
>> it shards already existing files?
>> 4. Suppose I have sharding disabled, then add an arbiter brick, then
>> enable sharding and execute the procedure for point 3, would this still
>> trigger the corruption bug?
>>
>> Thanks,
>> Alessandro
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
> --
> Laura Bailey
> Senior Technical Writer
> Customer Content Services BNE
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] hanging httpd processes.

2017-03-31 Thread Amar Tumballi
On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi  wrote:

> Hi Alvin,
>
> Thanks for the dump output. It helped a bit.
>
> For now, I recommend turning off the open-behind and read-ahead performance
> translators to get rid of this situation, as I noticed hung FLUSH
> operations from these translators.
>

Looks like I gave the wrong advice by looking at the below snippet:

[global.callpool.stack.61]
> stack=0x7f6c6f628f04
> uid=48
> gid=48
> pid=11077
> unique=10048797
> lk-owner=a73ae5bdb5fcd0d2
> op=FLUSH
> type=1
> cnt=5
>
> [global.callpool.stack.61.frame.1]
> frame=0x7f6c6f793d88
> ref_count=0
> translator=edocs-production-write-behind
> complete=0
> parent=edocs-production-read-ahead
> wind_from=ra_flush
> wind_to=FIRST_CHILD (this)->fops->flush
> unwind_to=ra_flush_cbk
>
> [global.callpool.stack.61.frame.2]
> frame=0x7f6c6f796c90
> ref_count=1
> translator=edocs-production-read-ahead
> complete=0
> parent=edocs-production-open-behind
> wind_from=default_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=default_flush_cbk
>
> [global.callpool.stack.61.frame.3]
> frame=0x7f6c6f79b724
> ref_count=1
> translator=edocs-production-open-behind
> complete=0
> parent=edocs-production
> wind_from=io_stats_flush
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=io_stats_flush_cbk
>
> [global.callpool.stack.61.frame.4]
> frame=0x7f6c6f79b474
> ref_count=1
> translator=edocs-production
> complete=0
> parent=fuse
> wind_from=fuse_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=fuse_err_cbk
>
> [global.callpool.stack.61.frame.5]
> frame=0x7f6c6f796684
> ref_count=1
> translator=fuse
> complete=0
>

Most probably, the issue is with write-behind's flush. So please turn off
write-behind and test. If you don't see any hung httpd processes after that,
please let us know.

-Amar


> -Amar
>
> On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr  wrote:
>
>> We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and on
>> the clients 3.7.11-2 on Centos 6.8
>>
>> We are seeing httpd processes hang in fuse_request_send or sync_page.
>>
>> These calls are from PHP 5.3.3-48 scripts
>>
>> I am attaching  a tgz file that contains the process dump from glusterfsd
>> and the hung pids along with the offending pid's stacks from
>> /proc/{pid}/stack.
>>
>> This has been a low level annoyance for a while but it has become a much
>> bigger issue because the number of hung processes went from a few a week to
>> a few hundred a day.
>>
>>
>> --
>> Alvin Starr   ||   voice: (905)513-7688
>> Netvel Inc.   ||   Cell:  (416)806-0133
>> al...@netvel.net  ||
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Amar Tumballi (amarts)
>



-- 
Amar Tumballi (amarts)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users