[Gluster-users] Why does nodeid==1 need to be checked and dealt with specially in "fuse-bridge.c"?

2017-03-15 Thread Zhitao Li
Hello, everyone,


I have been trying to optimize "ls" performance for GlusterFS recently. My
volume is a disperse volume (48 bricks with redundancy 16), and I mount it with
FUSE. I create 1 little files in the mount point and then run the "ls" command;
in my cluster it takes about 3 seconds.

I have a question about the fuse_getattr function in "fuse-bridge.c". Why do
we need to check whether nodeid is equal to 1, which means it is the mount
point? It is hard for me to understand what this check is for.

(In my case, I find that fuse_getattr takes nearly half of the total time of
"ls", which is why I want to know what the check means.)


I tried disabling the special check and then tested "ls" again. It works
normally and gives about a 2x speedup (roughly 1.3 s without the check). The
reason is that, in my case, the cost of "lookup" is much higher than that of
"stat"; without the special check, getattr goes into "stat" instead of
"lookup".
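
For reference, a rough way to see where "ls" spends its time is the volume
profiler. This is only a sketch; "testvol" and the mount path are placeholders
for the real volume name and mount point, and the exact FOP breakdown depends
on the Gluster version:

    # enable per-brick FOP statistics, run the test, then inspect the numbers
    gluster volume profile testvol start
    time ls /mnt/testvol > /dev/null
    gluster volume profile testvol info   # per-FOP latency, e.g. LOOKUP vs STAT
    gluster volume profile testvol stop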


Could you tell me the meaning of the special check for "nodeid == 1"?

I would appreciate it if anyone could give some tips. Thank you!

Best regards,
Zhitao Li


Re: [Gluster-users] advice needed on configuring large gluster cluster

2017-03-15 Thread Serkan Çoban
Please find my comments inline.

> Hi
>
> we have a new gluster cluster we are planning on deploying. We will have 24
> nodes, each with a JBOD of 39 x 8 TB drives and 6 x 900 GB SSDs, and FDR IB.
>
> We will not be using all of this as one volume , but I thought initially of
> using a distributed disperse volume.
>
> Never having attempted anything on this scale I have a couple of questions
> regarding EC and distributed disperse volumes.
>
> Does a distributed dispersed volume have to start life as distributed
> dispersed, or can I  take a disperse volume and make it distributed by
> adding bricks?
Yes, you can start with one subvolume and increase the number of subvolumes
later. But plan carefully: if you start with an m+n EC configuration, you can
only expand it by adding another m+n subvolume.
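
For example (the volume name, hostnames and brick paths below are just
placeholders), growing an existing 8+2 distributed-disperse volume means
adding one full 8+2 set of bricks at a time:

    # add exactly one more 8+2 subvolume (10 bricks) to an existing volume
    gluster volume add-brick myvol \
        server11:/bricks/b1 server12:/bricks/b1 server13:/bricks/b1 \
        server14:/bricks/b1 server15:/bricks/b1 server16:/bricks/b1 \
        server17:/bricks/b1 server18:/bricks/b1 server19:/bricks/b1 \
        server20:/bricks/b1
    # optionally spread existing data onto the new subvolume
    gluster volume rebalance myvol start
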
>
> Does an EC scheme of 24+4 seem reasonable?  One requirement we will have is
> the need to tolerate two nodes down at once, as the nodes share a chassis.
> I assume that  distributed disperse volumes can be expanded in a similar
> fashion to distributed replicate volumes by adding additional disperse brick
> sets?
In an m+n configuration it is recommended that m be a power of two, so you
could do 16+4 or 8+2. A higher m makes healing slower, but the parallel
self-heal for EC volumes in 3.9+ will help. An 8+2 configuration with one
brick from every node will tolerate the loss of two nodes.
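
As a sketch (the volume name, node names and brick paths are made up), an 8+2
set built from one brick on each of ten nodes looks like this:

    # 8 data + 2 redundancy bricks, one per node, so any two nodes can fail
    gluster volume create myvol disperse-data 8 redundancy 2 \
        node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1 node4:/bricks/b1 \
        node5:/bricks/b1 node6:/bricks/b1 node7:/bricks/b1 node8:/bricks/b1 \
        node9:/bricks/b1 node10:/bricks/b1
    gluster volume start myvol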

>
> I would also like to consider adding a hot-tier using the SSDs,  I confess I
> have not done much reading on tiering, but am hoping I can use a different
> volume form for the hot tier. Can I create a disperse, or a distributed
> replicated? If I am smoking rainbows then I can consider setting up an
> SSD-only distributed disperse volume.
EC performance is quite good for our workload; I did not try any tier in
front of it. Test your workload without a tier first, and if that works, then
KISS (keep it simple).
>
> I'd also appreciate any feedback on likely performance issues and tuning
> tips?
You can find kernel performance tuning tips here:
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Linux%20Kernel%20Tuning/
You may also change client.event-threads, server.event-threads and
heal-related parameters, but do not forget to test your workload before and
after changing those values.
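
As a starting point only (the volume name is a placeholder and the values are
workload-dependent, so benchmark rather than copy them):

    gluster volume set myvol client.event-threads 4
    gluster volume set myvol server.event-threads 4
    # parallel self-heal for disperse volumes (3.9+)
    gluster volume set myvol disperse.shd-max-threads 4
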
>
> Many Thanks
>
> -Alastair
>
>
>


Re: [Gluster-users] [Gluster-devel] Demo in community meetings

2017-03-15 Thread Vijay Bellur
On Wed, Mar 15, 2017 at 12:33 PM, Vijay Bellur  wrote:

> On 03/14/2017 07:10 AM, Prasanna Kalever wrote:
>
>> Thanks for the opportunity.
>>
>> I will be happy to stream a demo on 'howto gluster-block' tomorrow.
>>
>> --
>> Prasanna
>>
>> On Mon, Mar 13, 2017 at 8:45 AM, Vijay Bellur  wrote:
>>
>>> Hi All,
>>>
>>> In the last meeting of maintainers, we discussed about reserving 15-30
>>> minutes in the community meeting for demoing new functionalities on
>>> anything
>>> related to Gluster. If you are working on something new or possess
>>> specialized knowledge of some intricate functionality, then this slot
>>> would
>>> be a great opportunity for sharing that with the community and obtaining
>>> real time feedback from seasoned Gluster folks in the meeting.
>>>
>>> Given that the slot is for 15-30 minutes, we would be able to accommodate
>>> 1-2 demos per meeting. This demo will happen over bluejeans and the URL
>>> would be available in the agenda for the meeting. If you are interested
>>> in
>>> kickstarting the demo series this week, please respond on this thread and
>>> let us know.
>>>
>>>
>
> Thank you Prasanna for your presentation and demo of gluster-block!
>
> Recording of the session can be found at [1].
>
>
Looks like the earlier link requires a login. Please use the new URL [2]
for accessing the session.

Thanks!
Vijay

[2] https://goo.gl/41mV3c

Re: [Gluster-users] [Gluster-devel] Demo in community meetings

2017-03-15 Thread Vijay Bellur

On 03/14/2017 07:10 AM, Prasanna Kalever wrote:

Thanks for the opportunity.

I will be happy to stream a demo on 'howto gluster-block' tomorrow.

--
Prasanna

On Mon, Mar 13, 2017 at 8:45 AM, Vijay Bellur  wrote:

Hi All,

In the last meeting of maintainers, we discussed about reserving 15-30
minutes in the community meeting for demoing new functionalities on anything
related to Gluster. If you are working on something new or possess
specialized knowledge of some intricate functionality, then this slot would
be a great opportunity for sharing that with the community and obtaining
real time feedback from seasoned Gluster folks in the meeting.

Given that the slot is for 15-30 minutes, we would be able to accommodate
1-2 demos per meeting. This demo will happen over bluejeans and the URL
would be available in the agenda for the meeting. If you are interested in
kickstarting the demo series this week, please respond on this thread and
let us know.




Thank you Prasanna for your presentation and demo of gluster-block!

Recording of the session can be found at [1].

Regards,
Vijay

[1] https://goo.gl/Gbhrkd



[Gluster-users] performance.parallel-readdir option, and trusted.glusterfs.dht.linkto - permission denied

2017-03-15 Thread Andrzej Rzadkowolski
I have noticed strange behavior, but let me start from the beginning.
We have a 1.8 PB repository spanning 11 nodes based on Ubuntu 16.04, currently
running GlusterFS 3.10. The volume was never rebalanced, and probably never
will be, so there are many linkto files across the bricks. Gluster serves data
over the read-only built-in NFSv3 server, and all data is owned by the same
user, which is not root.
I reproduced this issue in a smaller environment, and it seems that accessing
a file which is a linkto file results in "cannot access: IO error". It occurs
after switching on performance.parallel-readdir, and not when accessing as
root or the owner; any other user gets "cannot access: IO error".
Debug logs below.

http://pastebin.com/M6R7bppA
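
For anyone trying to reproduce this, a linkto file can be identified directly
on a brick; the brick path below is just a placeholder:

    # DHT linkto files are 0-byte entries with only the sticky bit set (---------T)
    ls -l /bricks/brick1/path/to/file
    getfattr -n trusted.glusterfs.dht.linkto -e text /bricks/brick1/path/to/file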

BR


--
Andrzej Rzadkowolski

[Gluster-users] advice needed on configuring large gluster cluster

2017-03-15 Thread Alastair Neil
Hi

we have a new gluster cluster we are planning on deploying. We will have
24 nodes, each with a JBOD of 39 x 8 TB drives and 6 x 900 GB SSDs, and FDR IB.

We will not be using all of this as one volume , but I thought initially of
using a distributed disperse volume.

Never having attempted anything on this scale I have a couple of questions
regarding EC and distributed disperse volumes.

Does a distributed dispersed volume have to start life as distributed
dispersed, or can I  take a disperse volume and make it distributed by
adding bricks?

Does an EC scheme of 24+4 seem reasonable?  One requirement we will have is
the need to tolerate two nodes down at once, as the nodes share a chassis.
I assume that  distributed disperse volumes can be expanded in a similar
fashion to distributed replicate volumes by adding additional disperse
brick sets?

I would also like to consider adding a hot tier using the SSDs. I confess
I have not done much reading on tiering, but am hoping I can use a
different volume form for the hot tier. Can I create a disperse, or a
distributed replicated? If I am smoking rainbows then I can consider
setting up an SSD-only distributed disperse volume.

I'd also appreciate any feedback on likely performance issues and tuning
tips?

Many Thanks

-Alastair

Re: [Gluster-users] Disperse mkdir fails

2017-03-15 Thread Xavier Hernandez

Hi Ram,

On 14/03/17 16:48, Ankireddypalle Reddy wrote:

Xavi,
   Thanks for checking this. We have an external metadata server which keeps
track of every file that gets written to the volume and can validate the file
contents. We will use this capability to verify the data. Once the data is
verified, will the following sequence of steps be sufficient to restore the
volume?

1) Rebalance the volume.


Probably this won't succeed if you are already having problems creating 
directories. First you should make sure that the volume is healthy.
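
A quick way to check this (the volume name is a placeholder):

    gluster volume status myvol
    gluster volume heal myvol info   # entries still pending heal

If heal info keeps reporting entries that never drain, fix those first.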



2) After rebalance is complete, stop ingesting more data to the volume.
3) Let the pending heals complete.


These two steps would be useful before the rebalance. After letting
self-heal repair everything it can, the remaining damaged entries should be
healed by hand. The exact procedure depends on the nature of the problems.



4) Stop the volume
5) For any heals that fail because of mismatching version/dirty extended
attributes on the directories, set these attributes to a matching value on all
the nodes.


It depends. Making the version/dirty attributes match doesn't solve the
underlying problem that caused them to get out of sync. For example, for a
directory entry you should also make sure that all subdirectories exist on all
bricks and have the same attributes. If any directory is missing, you need to
create it along with its attributes and contents, recursively.


For each disperse set, you also need to make sure that all directories 
and files match (for files only attributes, not file contents).
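
As an illustration only (hostnames and the brick directory path are
placeholders), one way to compare a directory across the bricks of a disperse
set is:

    for h in glusterfs1 glusterfs2 glusterfs3; do
        echo "== $h =="
        ssh "$h" 'ls /disk1/brick/path/to/dir | sort | md5sum'
        ssh "$h" 'getfattr -d -m trusted.ec -e hex /disk1/brick/path/to/dir'
    done

Only once the directory trees really match should the EC attributes be aligned
by hand, with something like:

    setfattr -n trusted.ec.version -v 0x<value-from-the-healthy-bricks> \
        /disk1/brick/path/to/dir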


If self-heal is unable to fix a problem, most probably it's more complex 
than simply fixing version/dirty, so be cautious.


Xavi



Thanks and Regards,
Ram

-Original Message-
From: Xavier Hernandez [mailto:xhernan...@datalab.es]
Sent: Tuesday, March 14, 2017 5:28 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-de...@gluster.org); 
gluster-users@gluster.org
Subject: Re: [Gluster-users] Disperse mkdir fails

Hi Ram,

On 13/03/17 15:02, Ankireddypalle Reddy wrote:

Xavi,
   The CV_MAGNETIC directory on a single brick has 155683 entries. There are
60 bricks in the volume altogether. I can provide the output if you still need
it.


The problem is that not all bricks have the same number of entries:

glusterfs1:disk1 155674
glusterfs2:disk1 155675
glusterfs3:disk1 155718

glusterfs1:disk2 155688
glusterfs2:disk2 155687
glusterfs3:disk2 155730

glusterfs1:disk3 155675
glusterfs2:disk3 155674
glusterfs3:disk3 155717

glusterfs1:disk4 155684
glusterfs2:disk4 155683
glusterfs3:disk4 155726

glusterfs1:disk5 155698
glusterfs2:disk5 155695
glusterfs3:disk5 155738

glusterfs1:disk6 155668
glusterfs2:disk6 155667
glusterfs3:disk6 155710

glusterfs1:disk7 155687
glusterfs2:disk7 155689
glusterfs3:disk7 155732

glusterfs1:disk8 155673
glusterfs2:disk8 155675
glusterfs3:disk8 155718

glusterfs4:disk1 149097
glusterfs5:disk1 149097
glusterfs6:disk1 149098

glusterfs4:disk2 149097
glusterfs5:disk2 149097
glusterfs6:disk2 149098

glusterfs4:disk3 149097
glusterfs5:disk3 149097
glusterfs6:disk3 149098

glusterfs4:disk4 149097
glusterfs5:disk4 149097
glusterfs6:disk4 149098

glusterfs4:disk5 149097
glusterfs5:disk5 149097
glusterfs6:disk5 149098

glusterfs4:disk6 149097
glusterfs5:disk6 149097
glusterfs6:disk6 149098

glusterfs4:disk7 149097
glusterfs5:disk7 149097
glusterfs6:disk7 149098

glusterfs4:disk8 149097
glusterfs5:disk8 149097
glusterfs6:disk8 149098

A small difference could be explained by concurrent operations while this
data was being retrieved, but some bricks are way out of sync.
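
For reference, counts like these can be collected with something along these
lines (the brick mount paths are placeholders):

    for h in glusterfs1 glusterfs2 glusterfs3 glusterfs4 glusterfs5 glusterfs6; do
        for n in 1 2 3 4 5 6 7 8; do
            printf '%s:disk%s ' "$h" "$n"
            ssh "$h" "ls /disk$n/brick/CV_MAGNETIC | wc -l"
        done
    done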

trusted.ec.dirty and trusted.ec.version also show many discrepancies:

glusterfs1:disk1 trusted.ec.dirty=0x0ba4
glusterfs2:disk1 trusted.ec.dirty=0x0bb8
glusterfs3:disk1 trusted.ec.dirty=0x0016
glusterfs1:disk1 trusted.ec.version=0x00084db400084e11
glusterfs2:disk1 trusted.ec.version=0x00084e0700084e0c
glusterfs3:disk1 trusted.ec.version=0x0008426a00084e11

glusterfs1:disk2 trusted.ec.dirty=0x0ba5
glusterfs2:disk2 trusted.ec.dirty=0x0bb6
glusterfs3:disk2 trusted.ec.dirty=0x0017
glusterfs1:disk2 trusted.ec.version=0x0005ccb70005cd0a
glusterfs2:disk2 trusted.ec.version=0x0005cd05cd05
glusterfs3:disk2 trusted.ec.version=0x0005c1660005cd0a

glusterfs1:disk3 trusted.ec.dirty=0x0ba5
glusterfs2:disk3 trusted.ec.dirty=0x0bb5
glusterfs3:disk3 trusted.ec.dirty=0x0016
glusterfs1:disk3 trusted.ec.version=0x0005d0cb0005d123
glusterfs2:disk3 trusted.ec.version=0x0005d1190005d11e
glusterfs3:disk3 trusted.ec.version=0x0005c57f0005d123