[Gluster-users] Gluster Monthly Newsletter, June 2017

2017-06-30 Thread Amye Scavarda
Gluster Monthly Newsletter, June 2017
Important happenings for Gluster for June:

---
Gluster Summit 2017!
Gluster Summit 2017 will be held in Prague, Czech Republic on October
27 and 28th.
More details at:
https://www.gluster.org/events/summit2017

---
Our weekly community meeting has changed: we'll be meeting every other
week instead of weekly, moving the time to 15:00 UTC, and our agenda
is at: https://bit.ly/gluster-community-meetings
---
3.11 Retrospective
Thanks to all who gave feedback! From the notes we received, you’d like to see
us do better release testing, and you want to see a continued focus on
small-file performance, gluster swift, and monitoring. You like that we’re
putting our release plans publicly on GitHub, and we’re planning on continuing
that! You’d also like to see the way a feature gets proposed streamlined, and
we’ll look into that in future releases. We’ll be doing retrospectives for
every major release; watch for the next one for 3.12.

---
Noteworthy threads from the mailing lists:

Vijay Bellur asks - Who's using OpenStack Cinder & Gluster?
http://lists.gluster.org/pipermail/gluster-users/2017-June/031370.html
Nithya B asks for Gluster Documentation Feedback
http://lists.gluster.org/pipermail/gluster-users/2017-June/031498.html
Raghavendra Talur introduces minister
http://lists.gluster.org/pipermail/gluster-users/2017-June/031589.html
Shyam announces Release 3.12: Scope and calendar!
http://lists.gluster.org/pipermail/gluster-devel/2017-June/052953.html
Krutika Dhananjay outlines performance experiments with io-stats translator
http://lists.gluster.org/pipermail/gluster-devel/2017-June/052993.html
Amar requests [Need Feedback] Monitoring
http://lists.gluster.org/pipermail/gluster-devel/2017-June/053048.html
Nigel proposes Regression Voting Changes
http://lists.gluster.org/pipermail/gluster-devel/2017-June/053080.html
Amar explains a new 'experimental' branch created for validating your ideas
http://lists.gluster.org/pipermail/gluster-devel/2017-June/053092.html
Raghavendra Talur starts a conversation on brick multiplexing and
memory consumption
http://lists.gluster.org/pipermail/gluster-devel/2017-June/053101.html
Pranith Kumar Karampuri suggests changes to backport information while
porting patches
http://lists.gluster.org/pipermail/gluster-devel/2017-June/053140.html
Kotresh Hiremath Ravishankar comments on adding xxhash to gluster code base
http://lists.gluster.org/pipermail/gluster-devel/2017-June/053173.html
Amar posts updates on Maintainers 2.0
http://lists.gluster.org/pipermail/gluster-devel/2017-June/053192.html
Jeff Darcy notes a need for a new coding standard
http://lists.gluster.org/pipermail/gluster-devel/2017-June/053204.html

Gluster Top 5 Contributors in the last 30 days:
Kaleb Keithley, Michael Scherer, Nigel Babu, Prashanth Pai, Nithya Balachandran

Upcoming CFPs:
Open Source Summit Europe –
http://events.linuxfoundation.org/events/open-source-summit-europe/program/cfp
July 8
Gluster Summit 2017 - https://goo.gl/forms/IUacgG5JjpuMTWe52 - July 31

-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Multi petabyte gluster

2017-06-30 Thread Alastair Neil
I can ask our other engineer, but I don't have those figures.

-Alastair


On 30 June 2017 at 13:52, Serkan Çoban  wrote:

> Did you test healing by increasing disperse.shd-max-threads?
> What is your heal times per brick now?
>
> On Fri, Jun 30, 2017 at 8:01 PM, Alastair Neil 
> wrote:
> > We are using 3.10 and have a 7 PB cluster.  We decided against 16+3 as
> the
> > rebuild time are bottlenecked by matrix operations which scale as the
> square
> > of the number of data stripes.  There are some savings because of larger
> > data chunks but we ended up using 8+3 and heal times are about half
> compared
> > to 16+3.
> >
> > -Alastair
> >
> > On 30 June 2017 at 02:22, Serkan Çoban  wrote:
> >>
> >> >Thanks for the reply. We will mainly use this for archival - near-cold
> >> > storage.
> >> Archival usage is good for EC
> >>
> >> >Anything, from your experience, to keep in mind while planning large
> >> > installations?
> >> I am using 3.7.11 and only problem is slow rebuild time when a disk
> >> fails. It takes 8 days to heal a 8TB disk.(This might be related with
> >> my EC configuration 16+4)
> >> 3.9+ versions has some improvements about this but I cannot test them
> >> yet...
> >>
> >> On Thu, Jun 29, 2017 at 2:49 PM, jkiebzak  wrote:
> >> > Thanks for the reply. We will mainly use this for archival - near-cold
> >> > storage.
> >> >
> >> >
> >> > Anything, from your experience, to keep in mind while planning large
> >> > installations?
> >> >
> >> >
> >> > Sent from my Verizon, Samsung Galaxy smartphone
> >> >
> >> >  Original message 
> >> > From: Serkan Çoban 
> >> > Date: 6/29/17 4:39 AM (GMT-05:00)
> >> > To: Jason Kiebzak 
> >> > Cc: Gluster Users 
> >> > Subject: Re: [Gluster-users] Multi petabyte gluster
> >> >
> >> > I am currently using 10PB single volume without problems. 40PB is on
> >> > the way. EC is working fine.
> >> > You need to plan ahead with large installations like this. Do complete
> >> > workload tests and make sure your use case is suitable for EC.
> >> >
> >> >
> >> > On Wed, Jun 28, 2017 at 11:18 PM, Jason Kiebzak 
> >> > wrote:
> >> >> Has anyone scaled to a multi petabyte gluster setup? How well does
> >> >> erasure
> >> >> code do with such a large setup?
> >> >>
> >> >> Thanks
> >> >>
> >> >> ___
> >> >> Gluster-users mailing list
> >> >> Gluster-users@gluster.org
> >> >> http://lists.gluster.org/mailman/listinfo/gluster-users
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> >
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Registration and CfP open for Gluster Summit 2017

2017-06-30 Thread Amye Scavarda
I'm delighted to announce that registration and the call for proposals
for Gluster Summit 2017 in Prague, CZ is open.

We're changing it up a bit this year, anyone can register, and if
you'd like to apply for travel funding, please indicate this on the
registration form. Don't worry, you'll get a copy of your responses.
I'll be following up with the folks who indicate they're applying for
travel funding separately. Visa letters will also be a separate
application.

Registration:
https://goo.gl/forms/pry9xjnXz434urOo2

Call for Proposals:
Your submission should be a proposal for a 25 minute talk, a 5 minute
lightning talk, or a collaborative birds of a feather conversation.
Submitters should consider audience takeaways as well as how this
proposal will move the Gluster project forward.
We'll be accepting proposals until July 31st.
Accepted proposals will be notified the week of July 31st.
https://goo.gl/forms/1oM0nJgp1h0Zs3Z92

-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] How to deal with FAILURES count in geo rep

2017-06-30 Thread mabi
Hello,
I have a replica 2 volume with a remote slave node for geo-replication (GlusterFS
3.8.11 on Debian 8) and saw for the first time a non-zero number in the
FAILURES column when running:
gluster volume geo-replication myvolume remotehost:remotevol status detail
Right now the number under the FAILURES column is 32 and I have a few questions
regarding how to deal with that:
- first, what does 32 mean? Is it the number of files which failed to be geo-
replicated onto the slave node?
- how can I find out which files failed to replicate?
- how can I make gluster geo-rep re-try replicating these files?
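For the second question, a hedged place to look (assuming the default log
location on the master nodes; the exact directory layout varies a bit between
versions):

  # the geo-replication worker logs usually record the entries that failed
  ls /var/log/glusterfs/geo-replication/myvolume/
  grep -iE 'error|failed' /var/log/glusterfs/geo-replication/myvolume/*.log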
Best regards,
Mabi
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Multi petabyte gluster

2017-06-30 Thread Serkan Çoban
Did you test healing by increasing disperse.shd-max-threads?
What are your heal times per brick now?
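For reference, a minimal sketch of the commands involved (the volume name is a
placeholder, and disperse.shd-max-threads assumes a release that ships the
option, i.e. 3.9 or later):

  # pending heal entries, listed per brick
  gluster volume heal myvol info

  # allow more parallel self-heals per disperse subvolume
  gluster volume set myvol disperse.shd-max-threads 4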

On Fri, Jun 30, 2017 at 8:01 PM, Alastair Neil  wrote:
> We are using 3.10 and have a 7 PB cluster.  We decided against 16+3 as the
> rebuild time are bottlenecked by matrix operations which scale as the square
> of the number of data stripes.  There are some savings because of larger
> data chunks but we ended up using 8+3 and heal times are about half compared
> to 16+3.
>
> -Alastair
>
> On 30 June 2017 at 02:22, Serkan Çoban  wrote:
>>
>> >Thanks for the reply. We will mainly use this for archival - near-cold
>> > storage.
>> Archival usage is good for EC
>>
>> >Anything, from your experience, to keep in mind while planning large
>> > installations?
>> I am using 3.7.11 and only problem is slow rebuild time when a disk
>> fails. It takes 8 days to heal a 8TB disk.(This might be related with
>> my EC configuration 16+4)
>> 3.9+ versions has some improvements about this but I cannot test them
>> yet...
>>
>> On Thu, Jun 29, 2017 at 2:49 PM, jkiebzak  wrote:
>> > Thanks for the reply. We will mainly use this for archival - near-cold
>> > storage.
>> >
>> >
>> > Anything, from your experience, to keep in mind while planning large
>> > installations?
>> >
>> >
>> > Sent from my Verizon, Samsung Galaxy smartphone
>> >
>> >  Original message 
>> > From: Serkan Çoban 
>> > Date: 6/29/17 4:39 AM (GMT-05:00)
>> > To: Jason Kiebzak 
>> > Cc: Gluster Users 
>> > Subject: Re: [Gluster-users] Multi petabyte gluster
>> >
>> > I am currently using 10PB single volume without problems. 40PB is on
>> > the way. EC is working fine.
>> > You need to plan ahead with large installations like this. Do complete
>> > workload tests and make sure your use case is suitable for EC.
>> >
>> >
>> > On Wed, Jun 28, 2017 at 11:18 PM, Jason Kiebzak 
>> > wrote:
>> >> Has anyone scaled to a multi petabyte gluster setup? How well does
>> >> erasure
>> >> code do with such a large setup?
>> >>
>> >> Thanks
>> >>
>> >> ___
>> >> Gluster-users mailing list
>> >> Gluster-users@gluster.org
>> >> http://lists.gluster.org/mailman/listinfo/gluster-users
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-06-30 Thread Gandalf Corvotempesta
On 30 June 2017 at 3:51 PM,  wrote:

Note: I also noticed that you said “order”. Do you mean when we create via
volume set we have to make an order for bricks? I thought gluster handles
(and  do the math) itself.

Yes, you have to specify the exact order.
Gluster is not flexible in this way and doesn't help you at all.
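To make the ordering concrete, a hedged sketch (host names and paths here are
illustrative, not from the original post): with replica 2, each consecutive pair
of bricks on the create command line becomes one replica set, so the servers
have to be interleaved:

  gluster volume create testvol replica 2 \
      server1:/bricks/brick1 server2:/bricks/brick1 \
      server1:/bricks/brick2 server2:/bricks/brick2 \
      ...

  # listing brick1 and brick2 of the same server next to each other would
  # instead put both copies of a replica set on one machine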
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Multi petabyte gluster

2017-06-30 Thread Alastair Neil
We are using 3.10 and have a 7 PB cluster.  We decided against 16+3 as the
rebuild times are bottlenecked by matrix operations which scale as the
square of the number of data stripes.  There are some savings because of
larger data chunks, but we ended up using 8+3 and heal times are about half
compared to 16+3.
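(Rough arithmetic, taking the quadratic claim at face value: going from 8 to 16
data stripes multiplies the per-stripe matrix work by (16/8)^2 = 4, while each
stripe covers twice as much data, so the net cost per byte healed is roughly 2x,
which lines up with the roughly halved heal times reported above.)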

-Alastair

On 30 June 2017 at 02:22, Serkan Çoban  wrote:

> >Thanks for the reply. We will mainly use this for archival - near-cold
> storage.
> Archival usage is good for EC
>
> >Anything, from your experience, to keep in mind while planning large
> installations?
> I am using 3.7.11 and only problem is slow rebuild time when a disk
> fails. It takes 8 days to heal a 8TB disk.(This might be related with
> my EC configuration 16+4)
> 3.9+ versions has some improvements about this but I cannot test them
> yet...
>
> On Thu, Jun 29, 2017 at 2:49 PM, jkiebzak  wrote:
> > Thanks for the reply. We will mainly use this for archival - near-cold
> > storage.
> >
> >
> > Anything, from your experience, to keep in mind while planning large
> > installations?
> >
> >
> > Sent from my Verizon, Samsung Galaxy smartphone
> >
> >  Original message 
> > From: Serkan Çoban 
> > Date: 6/29/17 4:39 AM (GMT-05:00)
> > To: Jason Kiebzak 
> > Cc: Gluster Users 
> > Subject: Re: [Gluster-users] Multi petabyte gluster
> >
> > I am currently using 10PB single volume without problems. 40PB is on
> > the way. EC is working fine.
> > You need to plan ahead with large installations like this. Do complete
> > workload tests and make sure your use case is suitable for EC.
> >
> >
> > On Wed, Jun 28, 2017 at 11:18 PM, Jason Kiebzak 
> wrote:
> >> Has anyone scaled to a multi petabyte gluster setup? How well does
> erasure
> >> code do with such a large setup?
> >>
> >> Thanks
> >>
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://lists.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Slow write times to gluster disk

2017-06-30 Thread Pat Haley


Hi,

I was wondering if there were any additional tests we could perform to 
help debug the group write-permissions issue?


Thanks

Pat


On 06/27/2017 12:29 PM, Pat Haley wrote:


Hi Soumya,

One example, we have a common working directory dri_fleat in the 
gluster volume


drwxrwsr-x 22 root dri_fleat 4.0K May  1 15:14 dri_fleat

my user (phaley) does not own that directory but is a member of the 
group  dri_fleat and should have write permissions.  When I go to the 
nfs-mounted version and try to use the touch command I get the following


ibfdr-compute-0-4(dri_fleat)% touch dum
touch: cannot touch `dum': Permission denied

One of the sub-directories under dri_fleat is "test" which phaley owns

drwxrwsr-x  2 phaley   dri_fleat 4.0K May  1 15:16 test

Under this directory (mounted via nfs) user phaley can write

ibfdr-compute-0-4(test)% touch dum
ibfdr-compute-0-4(test)%

I have put the packet captures in

http://mseas.mit.edu/download/phaley/GlusterUsers/TestNFSmount/

capture_nfsfail.pcap   has the results from the failed touch experiment
capture_nfssucceed.pcap  has the results from the successful touch 
experiment


The command I used for these was

tcpdump -i ib0 -nnSs 0 host 172.16.1.119 -w /root/capture_nfstest.pcap

The brick log files are also in the above link.  If I read them 
correctly, they both show odd timestamps.  Specifically, I see entries from 
around 2017-06-27 14:02:37.404865 even though the system time was 
2017-06-27 12:00:00.


One final item: another reply to my post had a link about possible 
problems that could arise from users belonging to too many groups.  We 
have seen the above problem even with a user belonging to only 4 groups.
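For anyone reproducing this, a minimal sketch of the checks involved (the mount
points are placeholders; phaley and dri_fleat come from the example above):

  # confirm the group membership the server should see
  id phaley

  # compare ownership and mode bits on the FUSE mount and the NFS mount
  stat -c '%A %U %G' /fuse-mount/dri_fleat
  stat -c '%A %U %G' /nfs-mount/dri_fleat

  # the failing operation itself
  su - phaley -c 'touch /nfs-mount/dri_fleat/dum'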


Let me know what additional information I can provide.

Thanks

Pat


On 06/27/2017 02:45 AM, Soumya Koduri wrote:



On 06/27/2017 10:17 AM, Pranith Kumar Karampuri wrote:

The only problem with using gluster mounted via NFS is that it does not
respect the group write permissions which we need.

We have an exercise coming up in a couple of weeks.  It seems to me
that in order to improve our write times before then, it would be good
to solve the group write permissions for gluster mounted via NFS now.
We can then revisit gluster mounted via FUSE afterwards.

What information would you need to help us force gluster mounted via 
NFS

to respect the group write permissions?


Is this owning group or one of the auxiliary groups whose write 
permissions are not considered? AFAIK, there are no special 
permission checks done by gNFS server when compared to gluster native 
client.


Could you please provide simple steps to reproduce the issue and 
collect pkt trace and nfs/brick logs as well.


Thanks,
Soumya




--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley  Email:  pha...@mit.edu
Center for Ocean Engineering   Phone:  (617) 253-6824
Dept. of Mechanical EngineeringFax:(617) 253-8125
MIT, Room 5-213http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Arbiter node as VM

2017-06-30 Thread mabi
Thanks for the hints.
Now I have added the arbiter to my replica 2 volume using the add-brick command, 
and it is in the healing process in order to copy all the metadata files onto 
my arbiter node.
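For readers following along, the add-brick invocation in question looks roughly
like this (host name and brick path are placeholders; syntax as documented for
converting a replica 2 volume to arbiter):

  gluster volume add-brick myvolume replica 3 arbiter 1 arbiterhost:/data/myvolume-arbiter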
On one of my replica nodes in the brick log file for that particular volume I 
notice a lot of the following warning message during ongoing healing:
[2017-06-30 14:04:42.050120] W [MSGID: 101088] 
[common-utils.c:3894:gf_backtrace_save] 0-myvolume-index: Failed to save the 
backtrace.
Does anyone have an idea what this is about? The only hint here is the word 
"index", which to me means it has something to do with indexing. But is this 
warning normal, and is there anything I can do about it?
Regards,
M.

>  Original Message 
> Subject: Re: [Gluster-users] Arbiter node as VM
> Local Time: June 29, 2017 11:55 PM
> UTC Time: June 29, 2017 9:55 PM
> From: dougti+glus...@gmail.com
> To: mabi 
> Gluster Users 
>
> As long as the VM isn't hosted on one of the two Gluster nodes, that's 
> perfectly fine. One of my smaller clusters uses the same setup.
> As for your other questions, as long as it supports Unix file permissions, 
> Gluster doesn't care what filesystem you use. Mix & match as you wish. Just 
> try to keep matching Gluster versions across your nodes.
>
> On 29 June 2017 at 16:10, mabi  wrote:
>
>> Hello,
>>
>> I have a replica 2 GlusterFS 3.8.11 cluster on 2 Debian 8 physical servers 
>> using ZFS as filesystem. Now in order to avoid a split-brain situation I 
>> would like to add a third node as arbiter.
>> Regarding the arbiter node I have a few questions:
>> - can the arbiter node be a virtual machine? (I am planning to use Xen as 
>> hypervisor)
>> - can I use ext4 as file system on my arbiter? or does it need to be ZFS as 
>> the two other nodes?
>> - or should I use XFS with LVM thin provisioning here, as mentioned in the documentation below?
>> - is it OK that my arbiter runs Debian 9 (Linux kernel v4) and my other two 
>> nodes run Debian 8 (kernel v3)?
>> - what about thin provisioning of my volume on the arbiter node 
>> (https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/)
>>  is this required? on my two other nodes I do not use any thin provisioning 
>> neither LVM but simply ZFS.
>> Thanks in advance for your input.
>> Best regards,
>> Mabi
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-06-30 Thread gencer
I already tried 512MB, but I re-tried it again now and the results are the same, 
both without tuning:

 

Stripe 2 replica 2: dd performs ~250 MB/s but shard gives 77 MB/s.

 

I attached two logs (shard and stripe logs)

 

Note: I also noticed that you said “order”. Do you mean that when we create the 
volume we have to specify an order for the bricks? I thought gluster handles that 
(and does the math) itself.

 

Gencer

 

From: Krutika Dhananjay [mailto:kdhan...@redhat.com] 
Sent: Friday, June 30, 2017 3:50 PM
To: gen...@gencgiyen.com
Cc: gluster-user 
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS

 

Just noticed that the way you have configured your brick order during 
volume-create makes both replicas of every set reside on the same machine.

That apart, do you see any difference if you change shard-block-size to 512MB? 
Could you try that?

If it doesn't help, could you share the volume-profile output for both the 
tests (separate)?

Here's what you do:

1. Start profile before starting your test - it could be dd or it could be file 
download.

# gluster volume profile  start

2. Run your test - again either dd or file-download.

3. Once the test has completed, run `gluster volume profile  info` and 
redirect its output to a tmp file.

4. Stop profile

# gluster volume profile  stop

And attach the volume-profile output file that you saved at a temporary 
location in step 3.

-Krutika

 

On Fri, Jun 30, 2017 at 5:33 PM,  > wrote:

Hi Krutika,

 

Sure, here is volume info:

 

root@sr-09-loc-50-14-18:/# gluster volume info testvol

 

Volume Name: testvol

Type: Distributed-Replicate

Volume ID: 30426017-59d5-4091-b6bc-279a905b704a

Status: Started

Snapshot Count: 0

Number of Bricks: 10 x 2 = 20

Transport-type: tcp

Bricks:

Brick1: sr-09-loc-50-14-18:/bricks/brick1

Brick2: sr-09-loc-50-14-18:/bricks/brick2

Brick3: sr-09-loc-50-14-18:/bricks/brick3

Brick4: sr-09-loc-50-14-18:/bricks/brick4

Brick5: sr-09-loc-50-14-18:/bricks/brick5

Brick6: sr-09-loc-50-14-18:/bricks/brick6

Brick7: sr-09-loc-50-14-18:/bricks/brick7

Brick8: sr-09-loc-50-14-18:/bricks/brick8

Brick9: sr-09-loc-50-14-18:/bricks/brick9

Brick10: sr-09-loc-50-14-18:/bricks/brick10

Brick11: sr-10-loc-50-14-18:/bricks/brick1

Brick12: sr-10-loc-50-14-18:/bricks/brick2

Brick13: sr-10-loc-50-14-18:/bricks/brick3

Brick14: sr-10-loc-50-14-18:/bricks/brick4

Brick15: sr-10-loc-50-14-18:/bricks/brick5

Brick16: sr-10-loc-50-14-18:/bricks/brick6

Brick17: sr-10-loc-50-14-18:/bricks/brick7

Brick18: sr-10-loc-50-14-18:/bricks/brick8

Brick19: sr-10-loc-50-14-18:/bricks/brick9

Brick20: sr-10-loc-50-14-18:/bricks/brick10

Options Reconfigured:

features.shard-block-size: 32MB

features.shard: on

transport.address-family: inet

nfs.disable: on

 

-Gencer.

 

From: Krutika Dhananjay [mailto:kdhan...@redhat.com] 
Sent: Friday, June 30, 2017 2:50 PM
To: gen...@gencgiyen.com
Cc: gluster-user
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS

 

Could you please provide the volume-info output?

-Krutika

 

On Fri, Jun 30, 2017 at 4:23 PM,  > wrote:

Hi,

 

I have an 2 nodes with 20 bricks in total (10+10).

 

First test: 

 

2 Nodes with Distributed – Striped – Replicated (2 x 2)

10GbE Speed between nodes

 

“dd” performance: 400mb/s and higher

Downloading a large file from internet and directly to the gluster: 250-300mb/s

 

Now same test without Stripe but with sharding. This results are same when I 
set shard size 4MB or 32MB. (Again 2x Replica here)

 

Dd performance: 70mb/s

Download directly to the gluster performance : 60mb/s

 

Now, If we do this test twice at the same time (two dd or two doewnload at the 
same time) it goes below 25/mb each or slower.

 

I thought sharding is at least equal or a little slower (maybe?) but these 
results are terribly slow.

 

I tried tuning (cache, window-size etc..). Nothing helps.

 

GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are “xfs” and 4TB 
each.

 

Is there any tweak/tuning out there to make it fast?

 

Or is this an expected behavior? If its, It is unacceptable. So slow. I cannot 
use this on production as it is terribly slow. 

 

The reason behind I use shard instead of stripe is i would like to eleminate 
files that bigger than brick size.

 

Thanks,

Gencer.


___
Gluster-users mailing list
Gluster-users@gluster.org  
http://lists.gluster.org/mailman/listinfo/gluster-users

 

 



shard.log
Description: Binary data


stripe.log
Description: Binary data
___
Gluster-users mailing list
Gluster-users@gluster.org

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-06-30 Thread Krutika Dhananjay
Just noticed that the way you have configured your brick order during
volume-create makes both replicas of every set reside on the same machine.

That apart, do you see any difference if you change shard-block-size to
512MB? Could you try that?
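For anyone trying this, a hedged sketch of the change (volume name is a
placeholder; as far as I understand, the new size only applies to files created
after the option is set, so the test should be re-run with fresh files):

  gluster volume set testvol features.shard-block-size 512MB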

If it doesn't help, could you share the volume-profile output for both the
tests (separate)?

Here's what you do:
1. Start profile before starting your test - it could be dd or it could be
file download.
# gluster volume profile  start

2. Run your test - again either dd or file-download.

3. Once the test has completed, run `gluster volume profile  info` and
redirect its output to a tmp file.

4. Stop profile
# gluster volume profile  stop

And attach the volume-profile output file that you saved at a temporary
location in step 3.

-Krutika


On Fri, Jun 30, 2017 at 5:33 PM,  wrote:

> Hi Krutika,
>
>
>
> Sure, here is volume info:
>
>
>
> root@sr-09-loc-50-14-18:/# gluster volume info testvol
>
>
>
> Volume Name: testvol
>
> Type: Distributed-Replicate
>
> Volume ID: 30426017-59d5-4091-b6bc-279a905b704a
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 10 x 2 = 20
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: sr-09-loc-50-14-18:/bricks/brick1
>
> Brick2: sr-09-loc-50-14-18:/bricks/brick2
>
> Brick3: sr-09-loc-50-14-18:/bricks/brick3
>
> Brick4: sr-09-loc-50-14-18:/bricks/brick4
>
> Brick5: sr-09-loc-50-14-18:/bricks/brick5
>
> Brick6: sr-09-loc-50-14-18:/bricks/brick6
>
> Brick7: sr-09-loc-50-14-18:/bricks/brick7
>
> Brick8: sr-09-loc-50-14-18:/bricks/brick8
>
> Brick9: sr-09-loc-50-14-18:/bricks/brick9
>
> Brick10: sr-09-loc-50-14-18:/bricks/brick10
>
> Brick11: sr-10-loc-50-14-18:/bricks/brick1
>
> Brick12: sr-10-loc-50-14-18:/bricks/brick2
>
> Brick13: sr-10-loc-50-14-18:/bricks/brick3
>
> Brick14: sr-10-loc-50-14-18:/bricks/brick4
>
> Brick15: sr-10-loc-50-14-18:/bricks/brick5
>
> Brick16: sr-10-loc-50-14-18:/bricks/brick6
>
> Brick17: sr-10-loc-50-14-18:/bricks/brick7
>
> Brick18: sr-10-loc-50-14-18:/bricks/brick8
>
> Brick19: sr-10-loc-50-14-18:/bricks/brick9
>
> Brick20: sr-10-loc-50-14-18:/bricks/brick10
>
> Options Reconfigured:
>
> features.shard-block-size: 32MB
>
> features.shard: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
>
>
> -Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhan...@redhat.com]
> *Sent:* Friday, June 30, 2017 2:50 PM
> *To:* gen...@gencgiyen.com
> *Cc:* gluster-user 
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Could you please provide the volume-info output?
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 4:23 PM,  wrote:
>
> Hi,
>
>
>
> I have an 2 nodes with 20 bricks in total (10+10).
>
>
>
> First test:
>
>
>
> 2 Nodes with Distributed – Striped – Replicated (2 x 2)
>
> 10GbE Speed between nodes
>
>
>
> “dd” performance: 400mb/s and higher
>
> Downloading a large file from internet and directly to the gluster:
> 250-300mb/s
>
>
>
> Now same test without Stripe but with sharding. This results are same when
> I set shard size 4MB or 32MB. (Again 2x Replica here)
>
>
>
> Dd performance: 70mb/s
>
> Download directly to the gluster performance : 60mb/s
>
>
>
> Now, If we do this test twice at the same time (two dd or two doewnload at
> the same time) it goes below 25/mb each or slower.
>
>
>
> I thought sharding is at least equal or a little slower (maybe?) but these
> results are terribly slow.
>
>
>
> I tried tuning (cache, window-size etc..). Nothing helps.
>
>
>
> GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are “xfs” and
> 4TB each.
>
>
>
> Is there any tweak/tuning out there to make it fast?
>
>
>
> Or is this an expected behavior? If its, It is unacceptable. So slow. I
> cannot use this on production as it is terribly slow.
>
>
>
> The reason behind I use shard instead of stripe is i would like to
> eleminate files that bigger than brick size.
>
>
>
> Thanks,
>
> Gencer.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] setting gfid on .trashcan/... failed - total outage

2017-06-30 Thread Anoop C S
On Thu, 2017-06-29 at 17:13 +0200, Dietmar Putz wrote:
> Hello Anoop,
> 
> thank you for your reply
> 
> answers inside...
> 
> best regards
> 
> Dietmar
> 
> 
> On 29.06.2017 10:48, Anoop C S wrote:
> > On Wed, 2017-06-28 at 14:42 +0200, Dietmar Putz wrote:
> > > Hello,
> > > 
> > > recently we had two times a partial gluster outage followed by a total
> > > outage of all four nodes. Looking into the gluster mailing list i found
> > > a very similar case in
> > > http://lists.gluster.org/pipermail/gluster-users/2016-June/027124.html
> > 
> > If you are talking about a crash happening on bricks, were you able to find 
> > any backtraces from
> > any
> > of the brick logs?
> 
> yes, the crash happened on the bricks.
> i followed the hints in the mentioned similar case but unfortunately i 
> did not found any backtrace from any of the brick logs.

Usually a backtrace will be written to the logs just before the brick dies in the 
case of a SIGSEGV.
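A hedged way to check for one (the log file name follows the brick path, and
the exact marker strings can vary between versions):

  # the crash footer, if any, sits in the brick log around the outage time
  grep -B2 -A30 'signal received' /var/log/glusterfs/bricks/brick1-mvol1.log
  grep -A10 'pending frames' /var/log/glusterfs/bricks/brick1-mvol1.log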

> 
> > 
> > > but i'm not sure if this issue is fixed...
> > > 
> > > even this outage happened on glusterfs 3.7.18 which gets no more updates
> > > since ~.20 i would kindly ask if this issue is known to be fixed in 3.8
> > > resp. 3.10... ?
> > > unfortunately i did not found corresponding informations in the release
> > > notes...
> > > 
> > > best regards
> > > Dietmar
> > > 
> > > 
> > > the partial outage started as shown below, the very first entries
> > > occurred in the brick-logs :
> > > 
> > > gl-master-04, brick1-mvol1.log :
> > > 
> > > [2017-06-23 16:35:11.373471] E [MSGID: 113020]
> > > [posix.c:2839:posix_create] 0-mvol1-posix: setting gfid on
> > > /brick1/mvol1/.trashcan//2290/uploads/170221_Sendung_Lieberum_01_AT.mp4_2017-06-23_163511
> > > failed
> > > [2017-06-23 16:35:11.392540] E [posix.c:3188:_fill_writev_xdata]
> > > (-->/usr/lib/x86_64-linux-
> > > gnu/glusterfs/3.7.18/xlator/features/trash.so(trash_truncate_readv_cbk+0x1ab)
> > > [0x7f4f8c2aaa0b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/
> > > storage/posix.so(posix_writev+0x1ff) [0x7f4f8caec62f]
> > > -->/usr/lib/x86_64-linux-
> > > gnu/glusterfs/3.7.18/xlator/storage/posix.so(_fill_writev_xdata+0x1c6)
> > > [0x7f4f8caec406] ) 0-mvol1-posix: fd: 0x7f4ef434225c inode:
> > > 0x7f4ef430bd6c gfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
> > > ...
> > > 
> > > 
> > > gl-master-04 : etc-glusterfs-glusterd.vol.log
> > > 
> > > [2017-06-23 16:35:18.872346] W [rpcsvc.c:270:rpcsvc_program_actor]
> > > 0-rpc-service: RPC program not available (req 1298437 330) for
> > > 10.0.1.203:65533
> > > [2017-06-23 16:35:18.872421] E
> > > [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
> > > to complete successfully
> > > 
> > > gl-master-04 : glustershd.log
> > > 
> > > [2017-06-23 16:35:42.536840] E [MSGID: 108006]
> > > [afr-common.c:4323:afr_notify] 0-mvol1-replicate-1: All subvolumes are
> > > down. Going offline until atleast one of them comes back up.
> > > [2017-06-23 16:35:51.702413] E [socket.c:2292:socket_connect_finish]
> > > 0-mvol1-client-3: connection to 10.0.1.156:49152 failed (Connection 
> > > refused)
> > > 
> > > 
> > > 
> > > gl-master-03, brick1-movl1.log :
> > > 
> > > [2017-06-23 16:35:11.399769] E [MSGID: 113020]
> > > [posix.c:2839:posix_create] 0-mvol1-posix: setting gfid on
> > > /brick1/mvol1/.trashcan//2290/uploads/170221_Sendung_Lieberum_01_AT.mp4_2017-06-23_163511
> > > failed
> > > [2017-06-23 16:35:11.418559] E [posix.c:3188:_fill_writev_xdata]
> > > (-->/usr/lib/x86_64-linux-
> > > gnu/glusterfs/3.7.18/xlator/features/trash.so(trash_truncate_readv_cbk+0x1ab)
> > > [0x7ff517087a0b] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/
> > > storage/posix.so(posix_writev+0x1ff) [0x7ff5178c962f]
> > > -->/usr/lib/x86_64-linux-
> > > gnu/glusterfs/3.7.18/xlator/storage/posix.so(_fill_writev_xdata+0x1c6)
> > > [0x7ff5178c9406] ) 0-mvol1-posix: fd: 0x7ff4c814a43c inode:
> > > 0x7ff4c82e1b5cgfid:-0
> > > 000--- [Invalid argument]
> > > ...
> > > 
> > > 
> > > gl-master-03 : etc-glusterfs-glusterd.vol.log
> > > 
> > > [2017-06-23 16:35:19.879140] W [rpcsvc.c:270:rpcsvc_program_actor]
> > > 0-rpc-service: RPC program not available (req 1298437 330) for
> > > 10.0.1.203:65530
> > > [2017-06-23 16:35:19.879201] E
> > > [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
> > > to complete successfully
> > > [2017-06-23 16:35:19.879300] W [rpcsvc.c:270:rpcsvc_program_actor]
> > > 0-rpc-service: RPC program not available (req 1298437 330) for
> > > 10.0.1.203:65530
> > > [2017-06-23 16:35:19.879314] E
> > > [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
> > > to complete successfully
> > > [2017-06-23 16:35:19.879845] W [rpcsvc.c:270:rpcsvc_program_actor]
> > > 0-rpc-service: RPC program not available (req 1298437 330) for
> > > 10.0.1.203:65530
> > > [2017-06-23 16:35:19.879859] E
> > > [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
> > > to 

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-06-30 Thread gencer
Hi Krutika,

 

Sure, here is volume info:

 

root@sr-09-loc-50-14-18:/# gluster volume info testvol

 

Volume Name: testvol

Type: Distributed-Replicate

Volume ID: 30426017-59d5-4091-b6bc-279a905b704a

Status: Started

Snapshot Count: 0

Number of Bricks: 10 x 2 = 20

Transport-type: tcp

Bricks:

Brick1: sr-09-loc-50-14-18:/bricks/brick1

Brick2: sr-09-loc-50-14-18:/bricks/brick2

Brick3: sr-09-loc-50-14-18:/bricks/brick3

Brick4: sr-09-loc-50-14-18:/bricks/brick4

Brick5: sr-09-loc-50-14-18:/bricks/brick5

Brick6: sr-09-loc-50-14-18:/bricks/brick6

Brick7: sr-09-loc-50-14-18:/bricks/brick7

Brick8: sr-09-loc-50-14-18:/bricks/brick8

Brick9: sr-09-loc-50-14-18:/bricks/brick9

Brick10: sr-09-loc-50-14-18:/bricks/brick10

Brick11: sr-10-loc-50-14-18:/bricks/brick1

Brick12: sr-10-loc-50-14-18:/bricks/brick2

Brick13: sr-10-loc-50-14-18:/bricks/brick3

Brick14: sr-10-loc-50-14-18:/bricks/brick4

Brick15: sr-10-loc-50-14-18:/bricks/brick5

Brick16: sr-10-loc-50-14-18:/bricks/brick6

Brick17: sr-10-loc-50-14-18:/bricks/brick7

Brick18: sr-10-loc-50-14-18:/bricks/brick8

Brick19: sr-10-loc-50-14-18:/bricks/brick9

Brick20: sr-10-loc-50-14-18:/bricks/brick10

Options Reconfigured:

features.shard-block-size: 32MB

features.shard: on

transport.address-family: inet

nfs.disable: on

 

-Gencer.

 

From: Krutika Dhananjay [mailto:kdhan...@redhat.com] 
Sent: Friday, June 30, 2017 2:50 PM
To: gen...@gencgiyen.com
Cc: gluster-user 
Subject: Re: [Gluster-users] Very slow performance on Sharded GlusterFS

 

Could you please provide the volume-info output?

-Krutika

 

On Fri, Jun 30, 2017 at 4:23 PM,  > wrote:

Hi,

 

I have an 2 nodes with 20 bricks in total (10+10).

 

First test: 

 

2 Nodes with Distributed – Striped – Replicated (2 x 2)

10GbE Speed between nodes

 

“dd” performance: 400mb/s and higher

Downloading a large file from internet and directly to the gluster: 250-300mb/s

 

Now same test without Stripe but with sharding. This results are same when I 
set shard size 4MB or 32MB. (Again 2x Replica here)

 

Dd performance: 70mb/s

Download directly to the gluster performance : 60mb/s

 

Now, If we do this test twice at the same time (two dd or two doewnload at the 
same time) it goes below 25/mb each or slower.

 

I thought sharding is at least equal or a little slower (maybe?) but these 
results are terribly slow.

 

I tried tuning (cache, window-size etc..). Nothing helps.

 

GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are “xfs” and 4TB 
each.

 

Is there any tweak/tuning out there to make it fast?

 

Or is this an expected behavior? If its, It is unacceptable. So slow. I cannot 
use this on production as it is terribly slow. 

 

The reason behind I use shard instead of stripe is i would like to eleminate 
files that bigger than brick size.

 

Thanks,

Gencer.


___
Gluster-users mailing list
Gluster-users@gluster.org  
http://lists.gluster.org/mailman/listinfo/gluster-users

 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-06-30 Thread Krutika Dhananjay
Could you please provide the volume-info output?

-Krutika

On Fri, Jun 30, 2017 at 4:23 PM,  wrote:

> Hi,
>
>
>
> I have an 2 nodes with 20 bricks in total (10+10).
>
>
>
> First test:
>
>
>
> 2 Nodes with Distributed – Striped – Replicated (2 x 2)
>
> 10GbE Speed between nodes
>
>
>
> “dd” performance: 400mb/s and higher
>
> Downloading a large file from internet and directly to the gluster:
> 250-300mb/s
>
>
>
> Now same test without Stripe but with sharding. This results are same when
> I set shard size 4MB or 32MB. (Again 2x Replica here)
>
>
>
> Dd performance: 70mb/s
>
> Download directly to the gluster performance : 60mb/s
>
>
>
> Now, If we do this test twice at the same time (two dd or two doewnload at
> the same time) it goes below 25/mb each or slower.
>
>
>
> I thought sharding is at least equal or a little slower (maybe?) but these
> results are terribly slow.
>
>
>
> I tried tuning (cache, window-size etc..). Nothing helps.
>
>
>
> GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are “xfs” and
> 4TB each.
>
>
>
> Is there any tweak/tuning out there to make it fast?
>
>
>
> Or is this an expected behavior? If its, It is unacceptable. So slow. I
> cannot use this on production as it is terribly slow.
>
>
>
> The reason behind I use shard instead of stripe is i would like to
> eleminate files that bigger than brick size.
>
>
>
> Thanks,
>
> Gencer.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Some bricks are offline after restart, how to bring them online gracefully?

2017-06-30 Thread Atin Mukherjee
On Fri, Jun 30, 2017 at 1:31 AM, Jan  wrote:

> Hi all,
>
> Gluster and Ganesha are amazing. Thank you for this great work!
>
> I’m struggling with one issue and I think that you might be able to help
> me.
>
> I spent some time by playing with Gluster and Ganesha and after I gain
> some experience I decided that I should go into production but I’m still
> struggling with one issue.
>
> I have 3x node CentOS 7.3 with the most current Gluster and Ganesha from
> centos-gluster310 repository (3.10.2-1.el7) with replicated bricks.
>
> Servers have a lot of resources and they run in a subnet on a stable
> network.
>
> I didn’t have any issues when I tested a single brick. But now I’d like to
> setup 17 replicated bricks and I realized that when I restart one of nodes
> then the result looks like this:
>
> sudo gluster volume status | grep ' N '
>
> Brick glunode0:/st/brick3/dir  N/A   N/AN   N/A
> Brick glunode1:/st/brick2/dir  N/A   N/AN   N/A
>
> Some bricks just don’t go online. Sometimes it’s one brick, sometimes three,
> and it’s not the same brick – it’s a random issue.
>
> I checked log on affected servers and this is an example:
>
> sudo tail /var/log/glusterfs/bricks/st-brick3-0.log
>
> [2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs:
> readv on 10.2.44.23:24007 failed (No data available)
> [2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No data
> available)
> [2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: Exhausted all volfile servers
> [2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f3158032dc5]
> -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5]
> -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f31596cbdfb] )
> 0-:received signum (15), shutting down
> [2017-06-29 17:59:50.397107] E [socket.c:3203:socket_connect] 0-glusterfs:
> connection attempt on 10.2.44.23:24007 failed, (Network is unreachable)
>

This happens when the connect() syscall fails with the ENETUNREACH errno, as per
the following code:

if (ign_enoent) {
        ret = connect_loop (priv->sock,
                            SA (&this->peerinfo.sockaddr),
                            this->peerinfo.sockaddr_len);
} else {
        ret = connect (priv->sock,
                       SA (&this->peerinfo.sockaddr),
                       this->peerinfo.sockaddr_len);
}

if (ret == -1 && errno == ENOENT && ign_enoent) {
        gf_log (this->name, GF_LOG_WARNING,
                "Ignore failed connection attempt on %s, (%s) ",
                this->peerinfo.identifier, strerror (errno));

        /* connect failed with some other error than EINPROGRESS
           so, getsockopt (... SO_ERROR ...), will not catch any
           errors and return them to us, we need to remember this
           state, and take actions in socket_event_handler
           appropriately */
        /* TBD: What about ENOENT, we will do getsockopt there
           as well, so how is that exempt from such a problem? */
        priv->connect_failed = 1;
        this->connect_failed = _gf_true;

        goto handler;
}

if (ret == -1 && ((errno != EINPROGRESS) && (errno != ENOENT))) {
        /* For unix path based sockets, the socket path is
         * cryptic (md5sum of path) and may not be useful for
         * the user in debugging so log it in DEBUG
         */
        gf_log (this->name, ((sa_family == AF_UNIX) ?   <= this is the log which gets generated
                GF_LOG_DEBUG : GF_LOG_ERROR),
                "connection attempt on %s failed, (%s)",
                this->peerinfo.identifier, strerror (errno));

IMO, this can only happen if there is an intermittent n/w failure?

@Raghavendra G/ Mohit - do you have any other opinion?

[2017-06-29 17:59:50.397138] I [socket.c:3507:socket_submit_request]
0-glusterfs: not connected (priv->connected = 0)

> [2017-06-29 17:59:50.397162] W [rpc-clnt.c:1693:rpc_clnt_submit]
> 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: Gluster
> Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)
>
> I think that important message is “Network is unreachable”.
>
> Question
> 1. Could you please tell me, is that normal when you have many bricks?
> Networks is definitely stable and other servers use it without problem and
> all servers run on a same pair of switches. My assumption is that in the
> same time many bricks try to connect and that doesn’t work.
>
> 2. Is 

[Gluster-users] Very slow performance on Sharded GlusterFS

2017-06-30 Thread gencer
Hi,

 

I have an 2 nodes with 20 bricks in total (10+10).

 

First test: 

 

2 Nodes with Distributed - Striped - Replicated (2 x 2)

10GbE Speed between nodes

 

"dd" performance: 400mb/s and higher

Downloading a large file from internet and directly to the gluster:
250-300mb/s
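(The exact dd invocation isn't shown here; a typical large sequential-write test
of this sort might look like the following, with the mount point as a
placeholder. conv=fsync makes dd wait for the data to reach the volume so the
page cache doesn't inflate the rate.)

  dd if=/dev/zero of=/mnt/glusterfs/testfile bs=1M count=10240 conv=fsync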

 

Now same test without Stripe but with sharding. This results are same when I
set shard size 4MB or 32MB. (Again 2x Replica here)

 

Dd performance: 70mb/s

Download directly to the gluster performance : 60mb/s

 

Now, If we do this test twice at the same time (two dd or two doewnload at
the same time) it goes below 25/mb each or slower.

 

I thought sharding is at least equal or a little slower (maybe?) but these
results are terribly slow.

 

I tried tuning (cache, window-size etc..). Nothing helps.

 

GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are "xfs" and 4TB
each.

 

Is there any tweak/tuning out there to make it fast?

 

Or is this an expected behavior? If its, It is unacceptable. So slow. I
cannot use this on production as it is terribly slow. 

 

The reason behind I use shard instead of stripe is i would like to eleminate
files that bigger than brick size.

 

Thanks,

Gencer.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster failure due to "0-management: Lock not released for "

2017-06-30 Thread Atin Mukherjee
On Thu, 29 Jun 2017 at 22:51, Victor Nomura  wrote:

> Thanks for the reply.  What would be the best course of action?  The data
> on the volume isn’t important right now but I’m worried when our setup goes
> to production we don’t have the same situation and really need to recover
> our Gluster setup.
>
>
>
> I’m assuming that to redo it I would delete everything in the /var/lib/glusterd
> directory on each of the nodes and recreate the volume again, essentially
> starting over.  If I leave the mount points the same and keep the
> data intact, will the files still be there and accessible afterwards? (I
> don’t delete the data on the bricks.)
>

I don't think there is anything wrong in the gluster stack. If you cross-check
the n/w layer and make sure it's up all the time, then restarting glusterd on
all the nodes should resolve the stale locks.
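A hedged sketch of that restart sequence (the management daemon's service name
differs by distribution, and restarting glusterd itself normally leaves the
brick processes and client I/O untouched, though it is worth confirming that on
your distribution):

  # on each node, one at a time
  sudo systemctl restart glusterd           # RHEL/CentOS packages
  sudo systemctl restart glusterfs-server   # Debian/Ubuntu packages

  # then confirm the stale lock is gone
  gluster volume status all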


>
> Regards,
>
>
>
> Victor Nomura
>
>
>
> *From:* Atin Mukherjee [mailto:amukh...@redhat.com]
> *Sent:* June-27-17 12:29 AM
>
>
> *To:* Victor Nomura
> *Cc:* gluster-users
>
> *Subject:* Re: [Gluster-users] Gluster failure due to "0-management: Lock
> not released for "
>
>
>
> I had looked at the logs shared by Victor privately and it seems to be
> there is a N/W glitch in the cluster which is causing the glusterd to lose
> its connection with other peers and as a side effect to this, lot of rpc
> requests are getting bailed out resulting glusterd to end up into a stale
> lock and hence you see that some of the commands failed with "another
> transaction is in progress or locking failed."
>
> Some examples of the symptom highlighted:
>
> [2017-06-21 23:02:03.826858] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21
> 22:52:02.719068. timeout = 600 for 192.168.150.53:24007
> [2017-06-21 23:02:03.826888] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21
> 22:52:02.716782. timeout = 600 for 192.168.150.52:24007
> [2017-06-21 23:02:53.836936] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent =
> 2017-06-21 22:52:47.909169. timeout = 600 for 192.168.150.53:24007
> [2017-06-21 23:02:53.836991] E [MSGID: 106116]
> [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Locking
> failed on gfsnode3. Please check log file for details.
> [2017-06-21 23:02:53.837016] E [rpc-clnt.c:200:call_bail] 0-management:
> bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent =
> 2017-06-21 22:52:47.909175. timeout = 600 for 192.168.150.52:24007
>
> I'd like you to request to first look at the N/W layer and rectify the
> problems.
>
>
>
>
>
>
> On Thu, Jun 22, 2017 at 9:30 PM, Atin Mukherjee 
> wrote:
>
> Could you attach glusterd.log and cmd_history.log files from all the nodes?
>
>
>
> On Wed, Jun 21, 2017 at 11:40 PM, Victor Nomura  wrote:
>
> Hi All,
>
>
>
> I’m fairly new to Gluster (3.10.3) and got it going for a couple of months
> now but suddenly after a power failure in our building it all came crashing
> down.  No client is able to connect after powering back the 3 nodes I have
> setup.
>
>
>
> Looking at the logs, it looks like there’s some sort of “Lock” placed on
> the volume which prevents all the clients from connecting to the Gluster
> endpoint.
>
>
>
> I can’t even do a #gluster volume status all command IF more than 1 node
> is powered up.  I have to shutdown node2-3 and then I am able to issue the
> command on node1 to see volume status.  When all nodes are powered up and
> I check the peer status, it says that all peers are connected.  Trying to
> connect to the Gluster volume from all clients says gluster endpoint is not
> available and times out. There are no network issues and each node can
> ping each other and there are no firewalls or any other device between the
> nodes and clients.
>
>
>
> Please help if you think you know how to fix this.  I have a feeling it’s
> this “lock” that’s not “released” due to the whole setup losing power all
> of a sudden.  I’ve tried restarting all the nodes, restarting
> glusterfs-server etc. I’m out of ideas.
>
>
>
> Thanks in advance!
>
>
>
> Victor
>
>
>
> Volume Name: teravolume
>
> Type: Distributed-Replicate
>
> Volume ID: 85af74d0-f1bc-4b0d-8901-4dea6e4efae5
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 3 x 2 = 6
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: gfsnode1:/media/brick1
>
> Brick2: gfsnode2:/media/brick1
>
> Brick3: gfsnode3:/media/brick1
>
> Brick4: gfsnode1:/media/brick2
>
> Brick5: gfsnode2:/media/brick2
>
> Brick6: gfsnode3:/media/brick2
>
> Options Reconfigured:
>
> nfs.disable: on
>
>
>
>
>
> [2017-06-21 16:02:52.376709] W [MSGID: 106118]
> [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not
> released for teravolume
>
> [2017-06-21 16:03:03.429032] I [MSGID: 106163]
> 

Re: [Gluster-users] Some bricks are offline after restart, how to bring them online gracefully?

2017-06-30 Thread Hari Gowtham
Hi,

Jan, by multiple times I meant whether you were able to do the whole
setup multiple times and face the same issue.
So that we have a consistent reproducer to work on.

As grepping shows that the process doesn't exist, the bug I mentioned
doesn't hold good.
This seems to be another issue, unrelated to the bug I mentioned (I have
added the link now).

When you say it happens too often, that means there is a way to reproduce it.
Please do let us know the steps you performed to check, but this
shouldn't happen if you try again.

You won't have this issue often, and as Mani mentioned, do not write a
script to force-start the bricks.
If this issue persists with a proper reproducer, we will take a look at it.

Sorry, forgot to provide the link for the fix:
patch : https://review.gluster.org/#/c/17101/

If you find a reproducer do file a bug at
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS


On Fri, Jun 30, 2017 at 3:33 PM, Manikandan Selvaganesh
 wrote:
> Hi Jan,
>
> It is not recommended that you automate the script for 'volume start force'.
> Bricks do not go offline just like that. There will be some genuine issue
> which triggers this. Could you please attach the entire glusterd.logs and
> the brick logs around the time so that someone would be able to look?
>
> Just to make sure, please check if you have any network outage(using iperf
> or some standard tool).
>
> @Hari, i think you forgot to provide the bug link, please provide so that
> Jan
> or someone can check if it is related.
>
>
> --
> Thanks & Regards,
> Manikandan Selvaganesan.
> (@Manikandan Selvaganesh on Web)
>
> On Fri, Jun 30, 2017 at 3:19 PM, Jan  wrote:
>>
>> Hi Hari,
>>
>> thank you for your support!
>>
>> Did I try to check offline bricks multiple times?
>> Yes – I gave it enough time (at least 20 minutes) to recover but it stayed
>> offline.
>>
>> Version?
>> All nodes are 100% equal – I tried fresh installation several times during
>> my testing, Every time it is CentOS Minimal install with all updates and
>> without any additional software:
>>
>> uname -r
>> 3.10.0-514.21.2.el7.x86_64
>>
>> yum list installed | egrep 'gluster|ganesha'
>> centos-release-gluster310.noarch 1.0-1.el7.centos @extras
>> glusterfs.x86_64 3.10.2-1.el7
>> @centos-gluster310
>> glusterfs-api.x86_64 3.10.2-1.el7
>> @centos-gluster310
>> glusterfs-cli.x86_64 3.10.2-1.el7
>> @centos-gluster310
>> glusterfs-client-xlators.x86_64  3.10.2-1.el7
>> @centos-gluster310
>> glusterfs-fuse.x86_643.10.2-1.el7
>> @centos-gluster310
>> glusterfs-ganesha.x86_64 3.10.2-1.el7
>> @centos-gluster310
>> glusterfs-libs.x86_643.10.2-1.el7
>> @centos-gluster310
>> glusterfs-server.x86_64  3.10.2-1.el7
>> @centos-gluster310
>> libntirpc.x86_64 1.4.3-1.el7
>> @centos-gluster310
>> nfs-ganesha.x86_64   2.4.5-1.el7
>> @centos-gluster310
>> nfs-ganesha-gluster.x86_64   2.4.5-1.el7
>> @centos-gluster310
>> userspace-rcu.x86_64 0.7.16-3.el7
>> @centos-gluster310
>>
>> Grepping for the brick process?
>> I’ve just tried it again. Process doesn’t exist when brick is offline.
>>
>> Force start command?
>> sudo gluster volume start MyVolume force
>>
>> That works! Thank you.
>>
>> If I have this issue too often then I can create simple script that greps
>> all bricks on the local server and force start when it’s offline. I can
>> schedule such script once after for example 5 minutes after boot.
>>
>> But I’m not sure if it’s good idea to automate it. I’d be worried that I
>> can force it up even when the node doesn’t “see” other nodes and cause split
>> brain issue.
>>
>> Thank you!
>>
>> Kind regards,
>> Jan
>>
>>
>> On Fri, Jun 30, 2017 at 8:01 AM, Hari Gowtham  wrote:
>>>
>>> Hi Jan,
>>>
>>> comments inline.
>>>
>>> On Fri, Jun 30, 2017 at 1:31 AM, Jan  wrote:
>>> > Hi all,
>>> >
>>> > Gluster and Ganesha are amazing. Thank you for this great work!
>>> >
>>> > I’m struggling with one issue and I think that you might be able to
>>> > help me.
>>> >
>>> > I spent some time by playing with Gluster and Ganesha and after I gain
>>> > some
>>> > experience I decided that I should go into production but I’m still
>>> > struggling with one issue.
>>> >
>>> > I have 3x node CentOS 7.3 with the most current Gluster and Ganesha
>>> > from
>>> > centos-gluster310 repository (3.10.2-1.el7) with replicated bricks.
>>> >
>>> > Servers have a lot of resources and they run in a subnet on a stable
>>> > network.
>>> >
>>> > I didn’t have any issues when I tested a single brick. But now I’d like
>>> > to
>>> > setup 17 replicated bricks and I realized that when I restart one of
>>> > nodes
>>> > then the result looks like this:
>>> >
>>> > sudo gluster volume status | grep ' N '
>>> >
>>> > Brick glunode0:/st/brick3/dir  N/A   N/AN   

Re: [Gluster-users] Some bricks are offline after restart, how to bring them online gracefully?

2017-06-30 Thread Manikandan Selvaganesh
Hi Jan,

It is not recommended that you automate the script for 'volume start
force'.
Bricks do not go offline just like that. There will be some genuine issue
which triggers this. Could you please attach the entire glusterd.logs and
the brick logs around the time so that someone would be able to look?

Just to make sure, please check whether you have any network outage (using iperf
or some standard tool).
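A minimal sketch of such a check (host name is a placeholder; assumes iperf3 is
installed on both nodes):

  # on one node
  iperf3 -s

  # on another node, test throughput to it for a while
  iperf3 -c glunode0 -t 60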

@Hari, i think you forgot to provide the bug link, please provide so that
Jan
or someone can check if it is related.


--
Thanks & Regards,
Manikandan Selvaganesan.
(@Manikandan Selvaganesh on Web)

On Fri, Jun 30, 2017 at 3:19 PM, Jan  wrote:

> Hi Hari,
>
> thank you for your support!
>
> Did I try to check offline bricks multiple times?
> Yes – I gave it enough time (at least 20 minutes) to recover but it stayed
> offline.
>
> Version?
> All nodes are 100% equal – I tried fresh installation several times during
> my testing, Every time it is CentOS Minimal install with all updates and
> without any additional software:
>
> uname -r
> 3.10.0-514.21.2.el7.x86_64
>
> yum list installed | egrep 'gluster|ganesha'
> centos-release-gluster310.noarch 1.0-1.el7.centos @extras
>
> glusterfs.x86_64 3.10.2-1.el7
> @centos-gluster310
> glusterfs-api.x86_64 3.10.2-1.el7
> @centos-gluster310
> glusterfs-cli.x86_64 3.10.2-1.el7
> @centos-gluster310
> glusterfs-client-xlators.x86_64  3.10.2-1.el7
> @centos-gluster310
> glusterfs-fuse.x86_643.10.2-1.el7
> @centos-gluster310
> glusterfs-ganesha.x86_64 3.10.2-1.el7
> @centos-gluster310
> glusterfs-libs.x86_643.10.2-1.el7
> @centos-gluster310
> glusterfs-server.x86_64  3.10.2-1.el7
> @centos-gluster310
> libntirpc.x86_64 1.4.3-1.el7
>  @centos-gluster310
> nfs-ganesha.x86_64   2.4.5-1.el7
>  @centos-gluster310
> nfs-ganesha-gluster.x86_64   2.4.5-1.el7
>  @centos-gluster310
> userspace-rcu.x86_64 0.7.16-3.el7
> @centos-gluster310
>
> Grepping for the brick process?
> I’ve just tried it again. Process doesn’t exist when brick is offline.
>
> Force start command?
> sudo gluster volume start MyVolume force
>
> That works! Thank you.
>
> If I have this issue too often then I can create a simple script that greps
> all bricks on the local server and force-starts any that are offline. I can
> schedule such a script to run once, for example 5 minutes after boot.
>
> But I’m not sure if it’s a good idea to automate it. I’d be worried that I
> could force a brick up even when the node doesn’t “see” the other nodes and
> cause a split-brain issue.
>
> Thank you!
>
> Kind regards,
> Jan
>
>
> On Fri, Jun 30, 2017 at 8:01 AM, Hari Gowtham  wrote:
>
>> Hi Jan,
>>
>> comments inline.
>>
>> On Fri, Jun 30, 2017 at 1:31 AM, Jan  wrote:
>> > Hi all,
>> >
>> > Gluster and Ganesha are amazing. Thank you for this great work!
>> >
>> > I’m struggling with one issue and I think that you might be able to
>> help me.
>> >
>> > I spent some time playing with Gluster and Ganesha, and after I gained
>> > some experience I decided to go into production, but I’m still
>> > struggling with one issue.
>> >
>> > I have 3x node CentOS 7.3 with the most current Gluster and Ganesha from
>> > centos-gluster310 repository (3.10.2-1.el7) with replicated bricks.
>> >
>> > Servers have a lot of resources and they run in a subnet on a stable
>> > network.
>> >
>> > I didn’t have any issues when I tested a single brick. But now I’d like
>> > to set up 17 replicated bricks and I realized that when I restart one of
>> > the nodes, the result looks like this:
>> >
>> > sudo gluster volume status | grep ' N '
>> >
>> > Brick glunode0:/st/brick3/dir  N/A   N/A   N   N/A
>> > Brick glunode1:/st/brick2/dir  N/A   N/A   N   N/A
>> >
>>
>> did you try it multiple times?
>>
>> > Some bricks just don’t come online. Sometimes it’s one brick, sometimes
>> > three, and it’s not the same brick – it’s a random issue.
>> >
>> > I checked log on affected servers and this is an example:
>> >
>> > sudo tail /var/log/glusterfs/bricks/st-brick3-0.log
>> >
>> > [2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs:
>> > readv on 10.2.44.23:24007 failed (No data available)
>> > [2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_rpc_notify]
>> > 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No data
>> > available)
>> > [2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_rpc_notify]
>> > 0-glusterfsd-mgmt: Exhausted all volfile servers
>> > [2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_and_exit]
>> > (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f3158032dc5]
>> > -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5]
>> > -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f31596cbdfb] )
>> > 0-:received 

Re: [Gluster-users] Some bricks are offline after restart, how to bring them online gracefully?

2017-06-30 Thread Jan
Hi Hari,

thank you for your support!

Did I try to check offline bricks multiple times?
Yes – I gave it enough time (at least 20 minutes) to recover but it stayed
offline.

Version?
All nodes are 100% equal – I tried a fresh installation several times during
my testing. Every time it is a CentOS Minimal install with all updates and
without any additional software:

uname -r
3.10.0-514.21.2.el7.x86_64

yum list installed | egrep 'gluster|ganesha'
centos-release-gluster310.noarch 1.0-1.el7.centos @extras

glusterfs.x86_64 3.10.2-1.el7
@centos-gluster310
glusterfs-api.x86_64 3.10.2-1.el7
@centos-gluster310
glusterfs-cli.x86_64 3.10.2-1.el7
@centos-gluster310
glusterfs-client-xlators.x86_64  3.10.2-1.el7
@centos-gluster310
glusterfs-fuse.x86_64 3.10.2-1.el7
@centos-gluster310
glusterfs-ganesha.x86_64 3.10.2-1.el7
@centos-gluster310
glusterfs-libs.x86_64 3.10.2-1.el7
@centos-gluster310
glusterfs-server.x86_64  3.10.2-1.el7
@centos-gluster310
libntirpc.x86_64 1.4.3-1.el7
 @centos-gluster310
nfs-ganesha.x86_64   2.4.5-1.el7
 @centos-gluster310
nfs-ganesha-gluster.x86_64   2.4.5-1.el7
 @centos-gluster310
userspace-rcu.x86_64 0.7.16-3.el7
@centos-gluster310

Grepping for the brick process?
I’ve just tried it again. Process doesn’t exist when brick is offline.

Force start command?
sudo gluster volume start MyVolume force

That works! Thank you.

If I have this issue too often then I can create a simple script that greps
all bricks on the local server and force-starts any that are offline. I can
schedule such a script to run once, for example 5 minutes after boot.
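
Just to illustrate what I have in mind, a rough sketch could look like the
following (the awk column positions are an assumption and would need checking
against the real status output, and it assumes the brick entries contain this
host's hostname):

#!/bin/bash
# Rough sketch only: force-start MyVolume if any of this host's bricks
# is reported offline (Online column = N) by 'gluster volume status'.
VOL=MyVolume
HOST=$(hostname)

# Brick lines for this host whose second-to-last column (Online) is N.
offline=$(gluster volume status "$VOL" | awk -v h="$HOST" \
    '$1 == "Brick" && index($2, h) > 0 && $(NF-1) == "N"')

if [ -n "$offline" ]; then
    echo "Offline local bricks detected:"
    echo "$offline"
    gluster volume start "$VOL" force
fi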

But I’m not sure if it’s a good idea to automate it. I’d be worried that I
could force a brick up even when the node doesn’t “see” the other nodes and
cause a split-brain issue.

Thank you!

Kind regards,
Jan


On Fri, Jun 30, 2017 at 8:01 AM, Hari Gowtham  wrote:

> Hi Jan,
>
> comments inline.
>
> On Fri, Jun 30, 2017 at 1:31 AM, Jan  wrote:
> > Hi all,
> >
> > Gluster and Ganesha are amazing. Thank you for this great work!
> >
> > I’m struggling with one issue and I think that you might be able to help
> me.
> >
> > I spent some time playing with Gluster and Ganesha, and after I gained
> > some experience I decided to go into production, but I’m still
> > struggling with one issue.
> >
> > I have 3x node CentOS 7.3 with the most current Gluster and Ganesha from
> > centos-gluster310 repository (3.10.2-1.el7) with replicated bricks.
> >
> > Servers have a lot of resources and they run in a subnet on a stable
> > network.
> >
> > I didn’t have any issues when I tested a single brick. But now I’d like
> > to set up 17 replicated bricks and I realized that when I restart one of
> > the nodes, the result looks like this:
> >
> > sudo gluster volume status | grep ' N '
> >
> > Brick glunode0:/st/brick3/dir  N/A   N/A   N   N/A
> > Brick glunode1:/st/brick2/dir  N/A   N/A   N   N/A
> >
>
> did you try it multiple times?
>
> > Some bricks just don’t come online. Sometimes it’s one brick, sometimes
> > three, and it’s not the same brick – it’s a random issue.
> >
> > I checked log on affected servers and this is an example:
> >
> > sudo tail /var/log/glusterfs/bricks/st-brick3-0.log
> >
> > [2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs:
> > readv on 10.2.44.23:24007 failed (No data available)
> > [2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_rpc_notify]
> > 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No data
> > available)
> > [2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_rpc_notify]
> > 0-glusterfsd-mgmt: Exhausted all volfile servers
> > [2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_and_exit]
> > (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f3158032dc5]
> > -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5]
> > -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f31596cbdfb] )
> > 0-:received signum (15), shutting down
> > [2017-06-29 17:59:50.397107] E [socket.c:3203:socket_connect]
> 0-glusterfs:
> > connection attempt on 10.2.44.23:24007 failed, (Network is unreachable)
> > [2017-06-29 17:59:50.397138] I [socket.c:3507:socket_submit_request]
> > 0-glusterfs: not connected (priv->connected = 0)
> > [2017-06-29 17:59:50.397162] W [rpc-clnt.c:1693:rpc_clnt_submit]
> > 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: Gluster
> > Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)
> >
> > I think the important message is “Network is unreachable”.
> >
> > Question
> > 1. Could you please tell me, is that normal when you have many bricks?
> > The network is definitely stable, other servers use it without problems, and
> > all servers run on the same pair of switches. My assumption is that in the
> > same time 

Re: [Gluster-users] How to shutdown a node properly ?

2017-06-30 Thread Ravishankar N

On 06/30/2017 12:53 PM, Gandalf Corvotempesta wrote:
Yes, but why does killing gluster notify all clients while a graceful
shutdown doesn't?
I think this is a bug; if I'm shutting down a server, it's obvious
that all clients should stop connecting to it.


Oh, it is a bug (or a known issue ;-) ) alright, but I do not know at what
layer (kernel/TCP-IP or user-space/gluster) the fix needs to be done.
Maybe someone who is familiar with the TCP layer and connection timeouts
can pitch in.

-Ravi


On 30 Jun 2017 3:24 AM, "Ravishankar N" wrote:


On 06/30/2017 12:40 AM, Renaud Fortier wrote:


On my nodes, when I use the systemd script to kill gluster
(service glusterfs-server stop) only glusterd is killed. Then I
guess the shutdown doesn’t kill everything!



Killing glusterd does not kill other gluster processes.

When you shut down a node, everything obviously gets killed, but the
client does not get notified immediately that the brick went down,
leading it to wait for the 42-second ping-timeout, after which
it assumes the brick is down. When you kill the brick manually
before shutdown, the client immediately receives the notification
and you don't see the hang. See Xavi's description in Bug 1054694.

So if it is a planned shutdown or reboot, it is better to kill the
gluster processes before shutting the node down. BTW, you can use

https://github.com/gluster/glusterfs/blob/master/extras/stop-all-gluster-processes.sh


which automatically checks for pending heals etc before killing
the gluster processes.

-Ravi



*From:* Gandalf Corvotempesta
[mailto:gandalf.corvotempe...@gmail.com]
*Sent:* 29 June 2017 13:41
*To:* Ravishankar N
*Cc:* gluster-users@gluster.org; Renaud Fortier
*Subject:* Re: [Gluster-users] How to shutdown a node properly ?

Init.d/system.d script doesn't kill gluster automatically on
reboot/shutdown?

On 29 Jun 2017 5:16 PM, "Ravishankar N" wrote:

On 06/29/2017 08:31 PM, Renaud Fortier wrote:

Hi,

Every time I shut down a node, I lose access (from clients)
to the volumes for 42 seconds (network.ping-timeout). Is
there a special way to shut down a node to keep access
to the volumes without interruption? Currently, I use
the ‘shutdown’ or ‘reboot’ command.

`killall glusterfs glusterfsd glusterd` before issuing
shutdown or reboot. If it is a replica or EC volume, ensure
that there are no pending heals before bringing down a node.
i.e. `gluster volume heal volname info` should show 0 entries.
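
For example, a manual pre-shutdown sequence on the node would roughly be
(volname is just a placeholder for your volume name):

# 1. Confirm there are no pending heals (should list 0 entries per brick)
gluster volume heal volname info

# 2. Kill the gluster processes on this node only
killall glusterfs glusterfsd glusterd

# 3. Now it is safe to reboot or shut down the node
reboot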


My setup is :

-4 gluster 3.10.3 nodes on debian 8 (jessie)

-3 volumes Distributed-Replicate 2 X 2 = 4

Thank you

Renaud

___

Gluster-users mailing list

Gluster-users@gluster.org 

http://lists.gluster.org/mailman/listinfo/gluster-users


___ Gluster-users
mailing list Gluster-users@gluster.org

http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to shutdown a node properly ?

2017-06-30 Thread Gandalf Corvotempesta
Yes, but why does killing gluster notify all clients while a graceful shutdown
doesn't?
I think this is a bug; if I'm shutting down a server, it's obvious that all
clients should stop connecting to it.

On 30 Jun 2017 3:24 AM, "Ravishankar N" wrote:

> On 06/30/2017 12:40 AM, Renaud Fortier wrote:
>
> On my nodes, when I use the systemd script to kill gluster (service
> glusterfs-server stop) only glusterd is killed. Then I guess the shutdown
> doesn’t kill everything!
>
>
> Killing glusterd does not kill other gluster processes.
>
> When you shut down a node, everything obviously gets killed, but the client
> does not get notified immediately that the brick went down, leading it
> to wait for the 42-second ping-timeout, after which it assumes the brick is
> down. When you kill the brick manually before shutdown, the client
> immediately receives the notification and you don't see the hang. See Xavi's
> description in Bug 1054694.
>
> So if it is a planned shutdown or reboot, it is better to kill the gluster
> processes before shutting the node down. BTW, you can use
> https://github.com/gluster/glusterfs/blob/master/extras/
> stop-all-gluster-processes.sh which automatically checks for pending
> heals etc before killing the gluster processes.
>
> -Ravi
>
>
>
>
> *From:* Gandalf Corvotempesta [mailto:gandalf.corvotempe...@gmail.com]
> *Sent:* 29 June 2017 13:41
> *To:* Ravishankar N
> *Cc:* gluster-users@gluster.org; Renaud Fortier
> *Subject:* Re: [Gluster-users] How to shutdown a node properly ?
>
>
>
> Init.d/system.d script doesn't kill gluster automatically on
> reboot/shutdown?
>
>
>
> On 29 Jun 2017 5:16 PM, "Ravishankar N" wrote:
>
> On 06/29/2017 08:31 PM, Renaud Fortier wrote:
>
> Hi,
>
> Every time I shut down a node, I lose access (from clients) to the volumes
> for 42 seconds (network.ping-timeout). Is there a special way to shut down a
> node to keep access to the volumes without interruption? Currently, I
> use the ‘shutdown’ or ‘reboot’ command.
>
> `killall glusterfs glusterfsd glusterd` before issuing shutdown or
> reboot. If it is a replica or EC volume, ensure that there are no pending
> heals before bringing down a node. i.e. `gluster volume heal volname info`
> should show 0 entries.
>
>
>
>
> My setup is :
>
> -4 gluster 3.10.3 nodes on debian 8 (jessie)
>
> -3 volumes Distributed-Replicate 2 X 2 = 4
>
>
>
> Thank you
>
> Renaud
>
>
>
> ___
>
> Gluster-users mailing list
>
> Gluster-users@gluster.org
>
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Some bricks are offline after restart, how to bring them online gracefully?

2017-06-30 Thread Hari Gowtham
Hi Jan,

comments inline.

On Fri, Jun 30, 2017 at 1:31 AM, Jan  wrote:
> Hi all,
>
> Gluster and Ganesha are amazing. Thank you for this great work!
>
> I’m struggling with one issue and I think that you might be able to help me.
>
> I spent some time playing with Gluster and Ganesha, and after I gained some
> experience I decided to go into production, but I’m still
> struggling with one issue.
>
> I have 3x node CentOS 7.3 with the most current Gluster and Ganesha from
> centos-gluster310 repository (3.10.2-1.el7) with replicated bricks.
>
> Servers have a lot of resources and they run in a subnet on a stable
> network.
>
> I didn’t have any issues when I tested a single brick. But now I’d like to
> set up 17 replicated bricks and I realized that when I restart one of the
> nodes, the result looks like this:
>
> sudo gluster volume status | grep ' N '
>
> Brick glunode0:/st/brick3/dir  N/A   N/A   N   N/A
> Brick glunode1:/st/brick2/dir  N/A   N/A   N   N/A
>

did you try it multiple times?

> Some bricks just don’t come online. Sometimes it’s one brick, sometimes three,
> and it’s not the same brick – it’s a random issue.
>
> I checked log on affected servers and this is an example:
>
> sudo tail /var/log/glusterfs/bricks/st-brick3-0.log
>
> [2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs:
> readv on 10.2.44.23:24007 failed (No data available)
> [2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No data
> available)
> [2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: Exhausted all volfile servers
> [2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f3158032dc5]
> -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5]
> -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f31596cbdfb] )
> 0-:received signum (15), shutting down
> [2017-06-29 17:59:50.397107] E [socket.c:3203:socket_connect] 0-glusterfs:
> connection attempt on 10.2.44.23:24007 failed, (Network is unreachable)
> [2017-06-29 17:59:50.397138] I [socket.c:3507:socket_submit_request]
> 0-glusterfs: not connected (priv->connected = 0)
> [2017-06-29 17:59:50.397162] W [rpc-clnt.c:1693:rpc_clnt_submit]
> 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: Gluster
> Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)
>
> I think the important message is “Network is unreachable”.
>
> Question
> 1. Could you please tell me, is that normal when you have many bricks?
> The network is definitely stable, other servers use it without problems, and
> all servers run on the same pair of switches. My assumption is that many
> bricks try to connect at the same time and that doesn’t work.

No, it shouldn't happen just because there are multiple bricks.
There was a bug related to this [1].
To verify whether that was the issue I need to know a few things:
1) Are all the nodes on the same version?
2) Did you check for the brick process using the ps command?
We need to verify whether the brick is still up but just not connected to glusterd.


>
> 2. Is there an option to configure a brick to enable some kind of
> autoreconnect or add some timeout?
> gluster volume set brick123 option456 abc ??
If the brick process is not seen in the output of ps aux | grep glusterfsd,
the way to start the brick is to use the volume start force command.
If the brick is not started there is no point configuring it, and a brick
cannot be started with a configure command.

>
> 3. What is the recommended way to fix an offline brick on the affected server? I
> don’t want to use “gluster volume stop/start” since the affected bricks are
> online on the other servers and there is no reason to turn them off completely.
gluster volume start force will not bring down the bricks that are
already up and
running.
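
For example, with the MyVolume volume from your earlier mail, a quick check and
recovery would look like this:

# Check whether a brick process for this volume is running on the node
# (the [g] keeps grep from matching itself)
ps aux | grep '[g]lusterfsd' | grep MyVolume

# If no brick process shows up, force-start the volume; bricks that are
# already up and running are left untouched.
gluster volume start MyVolume force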

>
> Thank you,
> Jan
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Regards,
Hari Gowtham.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Multi petabyte gluster

2017-06-30 Thread Serkan Çoban
>Thanks for the reply. We will mainly use this for archival - near-cold storage.
Archival usage is a good fit for EC.

>Anything, from your experience, to keep in mind while planning large 
>installations?
I am using 3.7.11 and the only problem is slow rebuild time when a disk
fails. It takes 8 days to heal an 8TB disk (this might be related to
my EC configuration, 16+4).
3.9+ versions have some improvements in this area but I cannot test them yet...
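
If you are on 3.9 or newer, the disperse heal parallelism should be tunable
with something like the following (volume name and values are only
illustrative, and I have not tested this myself):

# Illustrative only: raise self-heal daemon parallelism for a disperse volume
gluster volume set myvol disperse.shd-max-threads 4
gluster volume set myvol disperse.shd-wait-qlength 2048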

On Thu, Jun 29, 2017 at 2:49 PM, jkiebzak  wrote:
> Thanks for the reply. We will mainly use this for archival - near-cold
> storage.
>
>
> Anything, from your experience, to keep in mind while planning large
> installations?
>
>
> Sent from my Verizon, Samsung Galaxy smartphone
>
>  Original message 
> From: Serkan Çoban 
> Date: 6/29/17 4:39 AM (GMT-05:00)
> To: Jason Kiebzak 
> Cc: Gluster Users 
> Subject: Re: [Gluster-users] Multi petabyte gluster
>
> I am currently using 10PB single volume without problems. 40PB is on
> the way. EC is working fine.
> You need to plan ahead with large installations like this. Do complete
> workload tests and make sure your use case is suitable for EC.
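
For reference, the 16+4 layout I mentioned above would be created with
something like this (server and brick names are purely illustrative):

# Illustrative only: a 16+4 disperse (EC) volume spread over 20 servers
gluster volume create bigvol disperse-data 16 redundancy 4 \
    server{1..20}:/bricks/brick1/data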
>
>
> On Wed, Jun 28, 2017 at 11:18 PM, Jason Kiebzak  wrote:
>> Has anyone scaled to a multi petabyte gluster setup? How well does erasure
>> code do with such a large setup?
>>
>> Thanks
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users