[Gluster-devel] review request - Change the way client uuid is built

2016-09-20 Thread Raghavendra Gowdappa
Hi all,

[1] might have implications across different components in the stack. Your 
reviews are requested.



rpc : Change the way client uuid is built

Problem:
Today the main users of client uuid are protocol layers, locks, leases.
Protocol layers require each client uuid to be unique, even across
connects and disconnects. Locks and leases on the server side also use
the same client uuid, which changes across graph switches and across
file migrations. This makes graph switches and file migration
tedious for locks and leases.
As of today, lock migration across a graph switch is client driven,
i.e. when a graph switches, the client reassociates all the locks (which
were associated with the old graph's client uuid) with the new graph's
client uuid. This means a flood of fops to get and set locks for each fd.
File migration across bricks becomes even more difficult, as the
client uuid for the same client is different on the other brick.

The exact same set of issues exists for leases as well.

Hence the solution:
Make the migration of locks and leases during graph switch and migration,
server driven instead of client driven. This can be achieved by changing
the format of client uuid.

Client uuid currently:
%s(ctx uuid)-%s(protocol client name)-%d(graph id)%s(setvolume count/reconnect 
count)

Proposed Client uuid:
"CTX_ID:%s-GRAPH_ID:%d-PID:%d-HOST:%s-PC_NAME:%s-RECON_NO:%s"
-  CTX_ID: This will be constant per client.
-  GRAPH_ID, PID, HOST, PC_NAME (protocol client name), RECON_NO (setvolume count)
   remain the same as before.

With this, the first part of the client uuid, CTX_ID+GRAPH_ID, remains
constant across file migration, thus making the migration easier.

Locks and leases store only the first part, CTX_ID+GRAPH_ID, as their
client identification. This means that when the new graph connects,
the locks and leases xlators walk through their databases
and update the client id to carry the new GRAPH_ID. Thus the graph switch
is made server driven and saves a lot of network traffic.
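
To make the proposed layout concrete, here is a minimal sketch (hypothetical
helper, not the code under review) of how a uuid in this format could be
assembled; the field values below are placeholders:

#include <stdio.h>
#include <unistd.h>

/* Sketch of the proposed client uuid layout. CTX_ID stays constant for
 * the lifetime of the client context, so the CTX_ID+GRAPH_ID prefix
 * survives reconnects and can serve locks/leases as a stable identity. */
static int
build_client_uid(char *buf, size_t len, const char *ctx_id, int graph_id,
                 const char *host, const char *pc_name, const char *recon_no)
{
        return snprintf(buf, len,
                        "CTX_ID:%s-GRAPH_ID:%d-PID:%d-HOST:%s"
                        "-PC_NAME:%s-RECON_NO:%s",
                        ctx_id, graph_id, (int)getpid(), host, pc_name,
                        recon_no);
}

int
main(void)
{
        char uid[256];

        build_client_uid(uid, sizeof(uid),
                         "6fd2f9c6-0000-0000-0000-000000000000", 0,
                         "client-host", "testvol-client-0", "0");
        printf("%s\n", uid);
        return 0;
}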

Change-Id: Ia81d57a9693207cd325d7b26aee4593fcbd6482c
BUG: 1369028
Signed-off-by: Poornima G 
Signed-off-by: Susant Palai 



[1] http://review.gluster.org/#/c/13901/10/

regards,
Raghavendra
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] relative ordering of writes to same file from two different fds

2016-09-20 Thread Raghavendra Gowdappa
Hi all,

This mail is to figure out the expected behavior of writes to the same file from two 
different fds. As Ryan notes in one of the comments,



I think it's not safe. In this case:
1. P1 writes to F1 using FD1.
2. After P1's write finishes, P2 writes to the same place using FD2.
Since the two writes no longer conflict with each other, the order in which they are sent to 
the underlying fs is not determined, so the final data may be P1's or P2's.
These semantics are not the same as Linux buffered io. Linux buffered io makes 
the second write cover the first one, which is to say the final data is P2's.
You can see this in Linux NFS (as we are all network filesystems): 
in fs/nfs/file.c:nfs_write_begin(), nfs will flush an 'incompatible' request first 
before another write begins. The way 2 requests are determined to be 'incompatible' 
is that they are from 2 different open fds.
I think write-behind behaviour should stay the same as the Linux page cache.



However, my understanding is that filesystems need not maintain the relative 
order of writes (as received from vfs/kernel) on two different fds. Also, if 
we have to maintain the order it might come with increased latency, because 
"newer" writes would have to wait on "older" ones. This wait can fill up the 
write-behind buffer and eventually result in a full write-behind cache, 
leaving us unable to "write-back" newer writes.

* What does POSIX say about it?
* How do other filesystems behave in this scenario?
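
For concreteness, a minimal sketch of the scenario under discussion
(hypothetical test; mount path and data are placeholders; a single thread,
two fds on the same file, writes to the same offset):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        char buf[4] = {0};
        int fd1 = open("/mnt/glusterfs/testfile", O_CREAT | O_WRONLY, 0644);
        int fd2 = open("/mnt/glusterfs/testfile", O_WRONLY);

        pwrite(fd1, "AAA", 3, 0);   /* P1 writes through fd1                */
        pwrite(fd2, "BBB", 3, 0);   /* P2 writes after P1's write returns   */

        close(fd1);
        close(fd2);

        /* With Linux page-cache semantics the file holds "BBB". The open
         * question above is whether write-behind may still flush the fd1
         * write after the fd2 write, leaving "AAA" on the bricks. */
        int fd3 = open("/mnt/glusterfs/testfile", O_RDONLY);
        pread(fd3, buf, 3, 0);
        printf("final content: %s\n", buf);
        close(fd3);
        return 0;
}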


Also, the current write-behind implementation has the concept of "generation 
numbers". To quote from the comment in the code:



uint64_t gen;   /* Liability generation number. Represents
                   the current 'state' of liability. Every
                   new addition to the liability list bumps
                   the generation number.

                   a newly arrived request is only required
                   to perform causal checks against the entries
                   in the liability list which were present
                   at the time of its addition. the generation
                   number at the time of its addition is stored
                   in the request and used during checks.

                   the liability list can grow while the request
                   waits in the todo list waiting for its
                   dependent operations to complete. however
                   it is not of the request's concern to depend
                   itself on those new entries which arrived
                   after it arrived (i.e, those that have a
                   liability generation higher than itself)
                */
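
A minimal sketch (hypothetical structures, not the actual write-behind code)
of how the generation number limits the causal checks a new request performs:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Simplified stand-in for a write-behind request. */
struct wb_req {
        uint64_t gen;          /* generation recorded when the request arrived */
        off_t offset;
        size_t size;
        struct wb_req *next;   /* liability list linkage */
};

bool
overlaps(struct wb_req *a, struct wb_req *b)
{
        return a->offset < b->offset + (off_t)b->size &&
               b->offset < a->offset + (off_t)a->size;
}

/* A new request only depends on liability entries that were already
 * present when it arrived, i.e. entries whose generation is not higher
 * than the generation stored in the request itself. */
bool
must_wait(struct wb_req *new_req, struct wb_req *liability_head)
{
        struct wb_req *e;

        for (e = liability_head; e; e = e->next) {
                if (e->gen > new_req->gen)
                        continue;      /* arrived later: not our concern */
                if (overlaps(new_req, e))
                        return true;   /* causal dependency found */
        }
        return false;
}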


So, if a single thread is doing writes on two different fds, generation numbers 
are sufficient to enforce the relative ordering. If 

Re: [Gluster-devel] Introducing Tendrl

2016-09-20 Thread Ric Wheeler

On 09/21/2016 07:03 AM, Gerard Braad wrote:

On Wed, Sep 21, 2016 at 11:58 AM, Dan Mick  wrote:

>Is it Tendrl or Tendryl?  (or the actual word, which would be 'tendril'
>and thus unambiguous and memorable)?

So, I am not the only person being confused about this


Sorry for injecting confusion around the spelling - I think it is just "tendrl":

https://github.com/Tendrl/tendrl

Regards,
Ric


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Introducing Tendrl

2016-09-20 Thread Gerard Braad
On Wed, Sep 21, 2016 at 11:58 AM, Dan Mick  wrote:
> Is it Tendrl or Tendryl?  (or the actual word, which would be 'tendril'
> and thus unambiguous and memorable)?

So, I am not the only person being confused about this

-- 

   Gerard Braad | http://gbraad.nl
   [ Doing Open Source Matters ]
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Heketi] Mailing list

2016-09-20 Thread Luis Pabón
You are completely correct Jeff.  We will move to a Google Group email list.
I have updated Heketi site with the new information:

https://github.com/heketi/heketi#community

We will update gluster-devel when we continue working together, for example,
on iSCSI and similar projects.

Thanks all,

- Luis

- Original Message -
From: "Jeff Darcy" 
To: "Luis Pabón" 
Cc: "gluster-devel" 
Sent: Tuesday, September 20, 2016 4:17:09 PM
Subject: Re: [Gluster-devel] [Heketi] Mailing list

> Hi gluster-devel,
>   At the Heketi project, we wanted to get better communication with the
> GlusterFS community.  We are a young project and didn't have our own
> mailing list, so we asked if we could also be part of the gluster-devel mailing
> list.  The plan is to send Heketi-specific emails to gluster-devel using the
> subject tag '[Heketi]'.  This is what is done in OpenStack, where they
> all share the same mailing list, and use the subject line tag for
> separate projects.
>   I consider this a pilot, nothing is set in stone, but I wanted to ask
> your opinion in the matter.

Personally, I'd rather see Heketi get its own mailing list(s) forthwith.
While it's fine for things that affect both projects to be crossposted,
putting general (potentially non-Gluster-related) Heketi traffic on a
Gluster mailing list has the following effects.

 * Gluster developers who have some interest in Heketi will have to
   "manually filter" which Heketi messages are actually relevant.

 * Gluster developers who have *no* interest in Heketi (yes, they
   exist) will have to set up more automatic filters.

 * Non-Gluster developers who want to follow Heketi will have to
   join a Gluster mailing list which has lots of stuff they couldn't
   care less about.

 * Searching for Heketi-related email gets weird, with lots of false
   positives on "Gluster" just because it's on our list.

 * Heketi developers might feel constrained in what they can say about
   Gluster, as compared to what they might say on a Heketi-specific
   list (even if public).

IMO the best place for any project XYZ to have its discussions is on
XYZ's own mailing list(s).
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Heketi] Mailing list

2016-09-20 Thread Jeff Darcy
> Hi gluster-devel,
>   At the Heketi project, we wanted to get better communication with the
> GlusterFS community.  We are a young project and didn't have our own
> mailing list, so we asked if we could also be part of the gluster-devel mailing
> list.  The plan is to send Heketi-specific emails to gluster-devel using the
> subject tag '[Heketi]'.  This is what is done in OpenStack, where they
> all share the same mailing list, and use the subject line tag for
> separate projects.
>   I consider this a pilot, nothing is set in stone, but I wanted to ask
> your opinion in the matter.

Personally, I'd rather see Heketi get its own mailing list(s) forthwith.
While it's fine for things that affect both projects to be crossposted,
putting general (potentially non-Gluster-related) Heketi traffic on a
Gluster mailing list has the following effects.

 * Gluster developers who have some interest in Heketi will have to
   "manually filter" which Heketi messages are actually relevant.

 * Gluster developers who have *no* interest in Heketi (yes, they
   exist) will have to set up more automatic filters.

 * Non-Gluster developers who want to follow Heketi will have to
   join a Gluster mailing list which has lots of stuff they couldn't
   care less about.

 * Searching for Heketi-related email gets weird, with lots of false
   positives on "Gluster" just because it's on our list.

 * Heketi developers might feel constrained in what they can say about
   Gluster, as compared to what they might say on a Heketi-specific
   list (even if public).

IMO the best place for any project XYZ to have its discussions is on
XYZ's own mailing list(s).
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Introducing Tendrl

2016-09-20 Thread Ric Wheeler

On 09/20/2016 08:09 PM, Joe Julian wrote:

Does this compare to ViPR?


I am not a ViPR expert, you would have to poke John Mark Walker for that :)

My assumption is that they might want to use these modules (from tendryl down to 
the ceph/gluster bits) to add support for ceph and gluster.


Regards,

Ric



On September 20, 2016 9:52:54 AM PDT, Ric Wheeler  wrote:

On 09/20/2016 10:23 AM, Gerard Braad wrote:

Hi Mrugesh, On Tue, Sep 20, 2016 at 3:10 PM, Mrugesh Karnik
 wrote:

I'd like to introduce the Tendrl project. Tendrl aims to build a
management interface for Ceph. We've pushed some documentation to the 


On Tue, Sep 20, 2016 at 3:15 PM, Mrugesh Karnik 
wrote:

I'd like to introduce the Tendrl project. Tendrl aims to build a
management interface for Gluster. We've pushed some documentation to 


It might help to introduce Tendrl as the "Universal Storage Manager'"
with a possibility to either manage Ceph and/or Gluster. I understand
you want specific feedback, but a clear definition of the tool would
be helpful.



(Apologies for reposting my response - gmail injected html into what I 
thought
was a text reply and it bounced from ceph-devel.)

Hi Gerard,

I see the goal differently.

It is better to think of tendryl as one component of a whole management
application stack. At the bottom, we will have ceph specific components
(ceph-mgr) and gluster specific components (glusterd), as well as other local
storage/file system components like libstoragemgt and so on.

Tendryl is the next layer up from that, but it itself is meant to be consumed
by presentation layers. For a stand alone thing that we hope to use at Red Hat,
there will be a universal storage manager stack with everything I mentioned
above in it, as well as the GUI code.

Other projects will hopefully find this useful enough and plug some or all of
the components into other management stacks.

From my point of view, the job is to try to provide as many re-usable
components as possible that will be generically interesting to a wide variety
of applications. It is definitely not about trying to make all storage stacks
look the same and force artificial new names/concepts/etc on the users. Of
course, any one application will tend to have a similar "skin" for UX elements
to try and make it consistent for users.

If we do it right, people passionate about Ceph but who don't care about
Gluster will be able to avoid getting tied up in something outside their
interest. Same goes the other way around for Gluster developers who don't care
or know about Ceph. Over time, this might extend to other storage types like
Samba or NFS Ganesha clusters, etc.

Regards,

Ric



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] [Heketi] Mailing list

2016-09-20 Thread Luis Pabón
Hi gluster-devel,
  At the Heketi project, we wanted to get better communication with the
GlusterFS community.  We are a young project and didn't have our own
mailing list, so we asked if we could also be part of the gluster-devel mailing
list.  The plan is to send Heketi-specific emails to gluster-devel using the
subject tag '[Heketi]'.  This is what is done in OpenStack, where they
all share the same mailing list, and use the subject line tag for
separate projects.
  I consider this a pilot, nothing is set in stone, but I wanted to ask
your opinion in the matter.

Regards,

- Luis
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Introducing Tendrl

2016-09-20 Thread Joe Julian
Does this compare to ViPR? 

On September 20, 2016 9:52:54 AM PDT, Ric Wheeler  wrote:
>On 09/20/2016 10:23 AM, Gerard Braad wrote:
>> Hi Mrugesh,
>>
>> On Tue, Sep 20, 2016 at 3:10 PM, Mrugesh Karnik 
>wrote:
>>> I'd like to introduce the Tendrl project. Tendrl aims to build a
>>> management interface for Ceph. We've pushed some documentation to
>the
>> On Tue, Sep 20, 2016 at 3:15 PM, Mrugesh Karnik 
>wrote:
>>> I'd like to introduce the Tendrl project. Tendrl aims to build a
>>> management interface for Gluster. We've pushed some documentation to
>> It might help to introduce Tendrl as the "Universal Storage Manager'"
>> with a possibility to either manage Ceph and/or Gluster.
>> I understand you want specific feedback, but a clear definition of
>the
>> tool would be helpful.
>>
>
>(Apologies for reposting my response - gmail injected html into what I
>thought 
>was a text reply and it bounced from ceph-devel.)
>
>Hi Gerard,
>
>I see the goal differently.
>
>It is better to think of tendryl as one component of a whole management
>
>application stack. At the bottom, we will have ceph specific components
>
>(ceph-mgr) and gluster specific components (glusterd), as well as other
>local 
>storage/file system components like libstoragemgt and so on.
>
>Tendryl is the next layer up from that, but it itself is meant to be
>consumed by 
>presentation layers. For a stand alone thing that we hope to use at Red
>Hat, 
>there will be a universal storage manager stack with everything I
>mentioned 
>above in it, as well as the GUI code.
>
>Other projects will hopefully find this useful enough and plug some or
>all of 
>the components into other management stacks.
>
>From my point of view, the job is to try to provide as much as possible
>
>re-usable components that will be generically interesting to a wide
>variety of 
>applications. It is definitely not about trying to make all storage
>stacks look 
>the same and force artificial new names/concepts/etc on the users. Of
>course, 
>any one application will tend to have a similar "skin" for UX elements
>to try 
>and make it consistent for users.
>
>If we do it right, people passionate about Ceph but who don't care
>about Gluster 
>will be able to avoid getting tied up in something outside their
>interest. 
>Same goes the other way around for Gluster developers who don't care
>or know 
>about Ceph. Over time, this might extend to other storage types like
>Samba or 
>NFS Ganesha clusters, etc.
>
>Regards,
>
>Ric
>
>
>
>
>___
>Gluster-devel mailing list
>Gluster-devel@gluster.org
>http://www.gluster.org/mailman/listinfo/gluster-devel

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Introducing Tendrl

2016-09-20 Thread Ric Wheeler

On 09/20/2016 10:23 AM, Gerard Braad wrote:

Hi Mrugesh,

On Tue, Sep 20, 2016 at 3:10 PM, Mrugesh Karnik  wrote:

I'd like to introduce the Tendrl project. Tendrl aims to build a
management interface for Ceph. We've pushed some documentation to the

On Tue, Sep 20, 2016 at 3:15 PM, Mrugesh Karnik  wrote:

I'd like to introduce the Tendrl project. Tendrl aims to build a
management interface for Gluster. We've pushed some documentation to

It might help to introduce Tendrl as the "Universal Storage Manager'"
with a possibility to either manage Ceph and/or Gluster.
I understand you want specific feedback, but a clear definition of the
tool would be helpful.



(Apologies for reposting my response - gmail injected html into what I thought 
was a text reply and it bounced from ceph-devel.)


Hi Gerard,

I see the goal differently.

It is better to think of tendryl as one component of a whole management 
application stack. At the bottom, we will have ceph specific components 
(ceph-mgr) and gluster specific components (glusterd), as well as other local 
storage/file system components like libstoragemgt and so on.


Tendryl is the next layer up from that, but it itself is meant to be consumed by 
presentation layers. For a stand alone thing that we hope to use at Red Hat, 
there will be a universal storage manager stack with everything I mentioned 
above in it, as well as the GUI code.


Other projects will hopefully find this useful enough and plug some or all of 
the components into other management stacks.


From my point of view, the job is to try to provide as many re-usable 
components as possible that will be generically interesting to a wide variety of 
applications. It is definitely not about trying to make all storage stacks look 
the same and force artificial new names/concepts/etc on the users. Of course, 
any one application will tend to have a similar "skin" for UX elements to try 
and make it consistent for users.


If we do it right, people passionate about Ceph but who don't care about Gluster 
will be able to avoid getting tied up in something outside their interest. 
Same goes the other way around for Gluster developers who don't care or know 
about Ceph. Over time, this might extend to other storage types like Samba or 
NFS Ganesha clusters, etc.


Regards,

Ric




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster and FreeBSD

2016-09-20 Thread Nigel Babu
> > We did get some regular contributions to have Gluster function on
> > FreeBSD, but they seem to be more sporadic now. If nobody steps up, I
> > would suggest to keep compiling on FreeBSD, but nothing more. Maybe at a
> > later time someone shows more interest.
> >
> > NetBSD on the other hand already runs some of the regression tests. And
> > it seems to hit valid problems in the code that we for whatever lucky
> > reason do not hit (yet?) on Linux. I see some value in the NetBSD
> > environment, and if the infra team with help from Manu can keep it
> > up-to-date it would be good to have it running.
>
> +1 to this approach regarding BSDs.

I'm ok with this approach. Can we then do the following?
* Document the issues when deploying FreeBSD.
* Remove any references to us officially supporting FreeBSD (are there any?)
* Ask for contributions from the FreeBSD community, especially if someone wants
  to take over maintainership of the port.

--
nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Heketi] How pushing to heketi happens - especially about squashing

2016-09-20 Thread Michael Adam


- Original Message -
> Hi Michael,
>   We have a new mailing list, it is gluster-devel with [Heketi] in the
> subject.

Hi Luis,

I know, which is what triggered that other mail. :-) It is not really a new
mailing-list but a [Heketi] tag to be used in an old mailing list,
and it was also said to use it for 'gluster related' discussions
in heketi and not for general heketi dev discussions, which seems kind of soft
and unclear to me.
I was suggesting a mailing list for *all* heketi development related
discussions. This stuff here is not necessarily interesting for the broader
gluster development community. But since you pulled the mail over from the
internal list, to gluster-devel, I'm going to reply here. :-)

> On the concept of github. It is always interesting to compare what we
> know and how we are used to something to something new. In Github,
> we do not need to let Github squash at all.  I was doing that as a 'pilot'.

Yeah, my request was to drop that because it creates bad commit messages
and may not be what the author intended. It fiddles with the author's patches,
which a UI should never do imho. That's just scary.

> The real method is for patches to be added to a PR, and if too many
> patches are added, for the author to squash them, and send a new one.
> This is documented in the Development Guide in the Wiki.

Yeah, I am questioning that, because I think it is flawed:

1) "too many patches ==> squash" is imho the wrong decision point.
   A PR can have as many commits as the author wants, as long as
   each commit is logical and complete. I even generally encourage
   more atomic patches. So imho that rule 'many patches ==> squash'
   creates the wrong incentive for squashing.

2) After the squashing, the author should imho not create a new
   PR but update the existing one. It has all the context and history
   for the patchset.

> The author should also note that their first patch/commit sent as
> a PR is the information used as the PR.  Lots of PRs are being sent
> with almost no information, and I have let this happen because most
> people are still ramping up.

Note that the commit message is NOT the PR title. That only happens
if you use github to squash... So if we don't use github to squash,
the PR title and initial description are less important for the final result!

> There is no reason why commit messages cannot be as detailed as
> those from Gerrit. 

No idea what gerrit has to do with commit messages.
Commit messages always come from the author, unless you
let some software fiddle with it. ;-)

> Here is an example:
> https://github.com/heketi/heketi/pull/393 .

I am a big fan of long commit messages, myself guilty of commits
with a message much longer than the actual patch.

And the PR title is by default only the message of the first
commit in the series, I think. So one should add content describing
the proposed patchset when creating the PR. Full agreement here.

But I think we should not over-estimate the title/description of
the PR. E.g. here are two examples of PRs that had a separate
title/description and the actual more detailed info was in the
commit messages of the patches:

https://github.com/heketi/heketi/pull/477
https://github.com/heketi/heketi/pull/499

Those were mangled together by a github-squash-push.

> The process to update changes is to update the forked
> branch, and not to amend the same change.  Amending makes it impossible
> to determine the changes from patch to patch, and makes it extremely hard
> on reviewers (me).

Hmm. My request is exactly to do amends. It is just standard and good (imho)
and even necessary git workflow. See below for more details on that under
comments to point #3.

> Here are my thoughts on your questions below:
> 
> 1) The review should not squash the author's commits unless
>the author explicitly requests or approves that.
> [lpabon] Absolutely.  The pilot, although it worked well technically,
> it confuses those who come from other source control systems.

Sorry it was getting kinda jet-laggy late so my words were not the best...
I wanted to say:

The reviewer should not squash (or otherwise change) the author's
commits unless explicitly requested or approved, irrespective of
the tool used for doing the changes.

> 2) We should avoid using github to merge because this creates
>bad commit messages.
> [lpabon] I'm not sure what you mean by this, but I would not
> "avoid" github in any way.  That is like saying "avoid Gerrit".

I wanted to say: "In particular, and especially, we should not
use github for doing squashes. It creates bad commit messages."
And yeah, I do explicitly mean "Avoid that aspect of github."
Like "stay away from any feature in github that changes the
original commits". (Not sure if there is more lurking. ;-)

> 3) (As a consequence of the above,) If we push delta-patches
>to update PRs, that can usually not be the final push, but
>needs a final iteration of force-pushing an amended 

Re: [Gluster-devel] Gluster and FreeBSD

2016-09-20 Thread Vijay Bellur
On Tue, Sep 20, 2016 at 12:31 AM, Niels de Vos  wrote:
> On Tue, Sep 20, 2016 at 09:16:54AM +0530, Nigel Babu wrote:
>> On Fri, Sep 09, 2016 at 06:07:41PM +0530, Nithya Balachandran wrote:
>> > Hi,
>> >
>> > I recently debugged a problem [1] where linkfiles were not created properly
>> > on a gluster volume created using bricks running UFS. Whenever a linkfile was
>> > created, the sticky bit was not set on it, causing the same file to be
>> > listed twice.
>> >
>> > From  https://www.freebsd.org/cgi/man.cgi?query=chmod=2
>> >
>> > The FreeBSD VM system totally ignores the sticky bit (S_ISVTX) for
>> > executables. On UFS-based file systems (FFS, LFS) the sticky bit may only
>> > be set upon directories.
>> >
>> > Based on this I do not think we can support UFS bricks for gluster volumes.
>> > However, I have not worked with FreeBSD so I would like folks who have to
>> > let me know if this is correct or if there is something I am missing.
>> >
>> > I was able to force the sticky bit on a file using a testfile attached to
>> > [1] but it is not straightforward and I am reluctant to propose this.
>> >
>> > Thanks,
>> > Nithya
>> >
>> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1176011
>>
>> Giving this thread a signal boost. We should think about this if we're going 
>> to
>> continue to support *BSD.
>>
>> Emmanuel, I know you work on NetBSD, but do you have thoughts to add here?
>
> We did get some regular contributions to have Gluster function on
> FreeBSD, but they seem to be more sporadic now. If nobody steps up, I
> would suggest to keep compiling on FreeBSD, but nothing more. Maybe at a
> later time someone shows more interest.
>
> NetBSD on the other hand already runs some of the regression tests. And
> it seems to hit valid problems in the code that we for whatever lucky
> reason do not hit (yet?) on Linux. I see some value in the NetBSD
> environment, and if the infra team with help from Manu can keep it
> up-to-date it would be good to have it running.

+1 to this approach regarding BSDs.

-Vijay
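
For reference, the sticky-bit behaviour described in the quoted report can be
checked with a small test like the sketch below (hypothetical brick path; this
is not the script attached to the bug). DHT linkfiles are zero-byte files that
carry the sticky bit, so a brick filesystem that refuses or drops S_ISVTX on
regular files breaks them:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
        const char *path = "/bricks/brick1/.sticky-test";  /* placeholder */
        struct stat st;
        int fd = open(path, O_CREAT | O_WRONLY, 0644);

        close(fd);

        /* DHT creates linkfiles as 0-byte files with only the sticky bit
         * set (mode ---------T, i.e. 01000). */
        if (chmod(path, S_ISVTX) != 0)
                perror("chmod");

        stat(path, &st);
        if (st.st_mode & S_ISVTX)
                printf("sticky bit preserved: linkfiles should work\n");
        else
                printf("sticky bit lost: linkfiles will not work on this brick\n");

        unlink(path);
        return 0;
}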
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Heketi] Block store related API design discussion

2016-09-20 Thread Luis Pabón
Awesome, Thanks guys.

- Luis

- Original Message -
From: "Pranith Kumar Karampuri" 
To: "Niels de Vos" 
Cc: "Luis Pabón" , "gluster-devel" 
, "Stephen Watt" , "Ramakrishna 
Yekulla" , "Humble Chirammal" 
Sent: Tuesday, September 20, 2016 5:53:30 AM
Subject: Re: [Gluster-devel] [Heketi] Block store related API design discussion

On Mon, Sep 19, 2016 at 9:22 PM, Niels de Vos  wrote:

> On Mon, Sep 19, 2016 at 10:31:11AM -0400, Luis Pabón wrote:
> > Using qemu is interesting, but the I/O should be using the IO path of
> QEMU block API.  If not,
> > TCMU would not know how to work with QEMU dynamic QCOW2 files.
> >
> > Now, if TCMU already has this, then that would be great!
>
> It has a qcow2 header, maybe you guys are lucky!
>   https://github.com/open-iscsi/tcmu-runner/blob/master/qcow2.h


Sent the earlier mail before seeing this mail :-). So yes, what we discussed
is to see whether this qemu code in tcmu can internally use gfapi for doing
the operations; that is something we are trying to find out.
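
For reference, a minimal sketch of what a gfapi-based access path for such a
backing file could look like (volume name, server and file path are
placeholders; this is not the tcmu-runner code, and the header path may vary
with the installation):

#include <fcntl.h>
#include <stdio.h>
#include <glusterfs/api/glfs.h>   /* libgfapi */

int
main(void)
{
        char buf[4096];
        glfs_t *fs = glfs_new("blockvol");

        glfs_set_volfile_server(fs, "tcp", "gluster-host", 24007);
        if (glfs_init(fs) != 0) {
                perror("glfs_init");
                return 1;
        }

        glfs_fd_t *fd = glfs_open(fs, "/block-store/lun0", O_RDWR);
        if (!fd) {
                perror("glfs_open");
                glfs_fini(fs);
                return 1;
        }

        /* Read one 4K block at offset 0, the way an iSCSI handler would
         * service a READ request against the backing file. */
        glfs_pread(fd, buf, sizeof(buf), 0, 0);

        glfs_close(fd);
        glfs_fini(fs);
        return 0;
}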


>
>
> Niels
>
> >
> > - Luis
> >
> > - Original Message -
> > From: "Prasanna Kalever" 
> > To: "Niels de Vos" 
> > Cc: "Luis Pabón" , "Stephen Watt" ,
> "gluster-devel" , "Ramakrishna Yekulla" <
> rre...@redhat.com>, "Humble Chirammal" 
> > Sent: Monday, September 19, 2016 7:13:36 AM
> > Subject: Re: [Gluster-devel] [Heketi] Block store related API design
> discussion
> >
> > On Mon, Sep 19, 2016 at 4:09 PM, Niels de Vos  wrote:
> > >
> > > On Mon, Sep 19, 2016 at 03:34:29PM +0530, Prasanna Kalever wrote:
> > > > On Mon, Sep 19, 2016 at 10:13 AM, Niels de Vos 
> wrote:
> > > > > On Tue, Sep 13, 2016 at 12:06:00PM -0400, Luis Pabón wrote:
> > > > >> Very good points.  Thanks Prasanna for putting this together.  I
> agree with
> > > > >> your comments in that Heketi is the high level abstraction API
> and it should have
> > > > >> an API similar of what is described by Prasanna.
> > > > >>
> > > > >> I definitely do not think any File Api should be available in
> Heketi,
> > > > >> because that is an implementation of the Block API.  The Heketi
> API should
> > > > >> be similar to something like OpenStack Cinder.
> > > > >>
> > > > >> I think that the actual management of the Volumes used for Block
> storage
> > > > >> and the files in them should be all managed by Heketi.  How they
> are
> > > > >> actually created is still to be determined, but we could have
> Heketi
> > > > >> create them, or have helper programs do that.
> > > > >
> > > > > Maybe a tool like qemu-img? If whatever iscsi service understand
> the
> > > > > format (at the very least 'raw'), you could get functionality like
> > > > > snapshots pretty simple.
> > > >
> > > > Niels,
> > > >
> > > > This is brilliant, and a subset of the idea falls in line with my
> > > > thoughts; the only concern is about building dependencies of qemu with
> > > > Heketi.
> > > > But it comes with the advantage of an easy and cool snapshots solution.
> > >
> > > And well tested as I understand that oVirt is moving to use qemu-img as
> > > well. Other tools are able to use the qcow2 format, maybe the iscsi
> > > service that gets used does so too.
> > >
> > > Has there already been a decision on what Heketi will configure as
> iscsi
> > > service? I am aware of the tgt [1] and LIO/TCMU [2] projects.
> >
> > Niels,
> >
> > yes we will be using TCMU (Kernel Module) and TCMU-runner (user space
> > service) to expose a file in a Gluster volume as an iSCSI target.
> > more at [1], [2] & [3]
> >
> > [1] https://pkalever.wordpress.com/2016/06/23/gluster-
> solution-for-non-shared-persistent-storage-in-docker-container/
> > [2] https://pkalever.wordpress.com/2016/06/29/non-shared-
> persistent-gluster-storage-with-kubernetes/
> > [3] https://pkalever.wordpress.com/2016/08/16/read-write-
> once-persistent-storage-for-openshift-origin-using-gluster/
> >
> > --
> > Prasanna
> >
> > >
> > > Niels
> > >
> > > 1. http://stgt.sourceforge.net/
> > > 2. https://github.com/open-iscsi/tcmu-runner
> > >http://blog.gluster.org/2016/04/using-lio-with-gluster/
> > >
> > > >
> > > > --
> > > > Prasanna
> > > >
> > > > >
> > > > > Niels
> > > > >
> > > > >
> > > > >> We also need to document the exact workflow to enable a file in
> > > > >> a Gluster volume to be exposed as a block device.  This will help
> > > > >> determine where the creation of the file could take place.
> > > > >>
> > > > >> We can capture our decisions from these discussions in the
> > > > >> following page:
> > > > >>
> > > > >> https://github.com/heketi/heketi/wiki/Proposed-Changes
> > > > >>
> > > > >> - Luis
> > > > >>
> > > > >>
> > > > >> - Original Message -
> > > > >> From: "Humble Chirammal" 

[Gluster-devel] [Heketi] How pushing to heketi happens - especially about squashing

2016-09-20 Thread Luis Pabón
Hi Michael,
  We have a new mailing list, it is gluster-devel with [Heketi] in the
subject.  I probably will add this to the communications wiki page.

On the concept of github. It is always interesting to compare what we
know and how we are used to something to something new. In Github,
we do not need to let Github squash at all.  I was doing that as a 'pilot'.
The real method is for patches to be added to a PR, and if too many
patches are added, for the author to squash them, and send a new one.
This is documented in the Development Guide in the Wiki.

The author should also note that their first patch/commit sent as
a PR is the information used as the PR.  Lots of PRs are being sent
with almost no information, and I have let this happen because most
people are still ramping up.

There is no reason why commit messages cannot be as detailed as
those from Gerrit.  Here is an example: 
https://github.com/heketi/heketi/pull/393 .

The process to update changes is to update the forked
branch, and not to amend the same change.  Amending makes it impossible
to determine the changes from patch to patch, and makes it extremely hard
on reviewers (me).

Here are my thoughts on your questions below:

1) The review should not squash the author's commits unless
   the author explicitly requests or approves that.
[lpabon] Absolutely.  The pilot, although it worked well technically,
it confuses those who come from other source control systems.

2) We should avoid using github to merge because this creates
   bad commit messages.
[lpabon] I'm not sure what you mean by this, but I would not
"avoid" github in any way.  That is like saying "avoid Gerrit".

3) (As a consequence of the above,) If we push delta-patches
   to update PRs, that can usually not be the final push, but
   needs a final iteration of force-pushing an amended patchset.
[lpabon] Do not amend patches.

NOTE on amended patches.  If I notice another one, I will *not* merge
the change.  Sorry to be a pain about that, but it makes it almost
impossible to review.  This is not Gerrit, this is Github, it
is something new, but in my opinion, it is more natural git workflow.

- Luis

- Original Message -
From: "Michael Adam" 
To:"Luis Pabón" 
Sent: Tuesday, September 20, 2016 4:50:01 AM
Subject: [RFC] [upstream] How pushing to heketi happens - especially about 
squashing

Hi all, hi Luis,

Since we do not have a real upstream ML yet (see my other mail), I want to
use this list now to discuss the way patches are merged into
heketi upstream.

[ tl;dr ? --> look for "summing up" at the bottom... ;-) ]

This is after a few weeks of working on the projects with you all
especially with Luis, and seeing how he does the project. And there
have been a few surprises on both ends.

While I still don't fully like or trust the github UI, it is
for instance better than gerrit (But as José says: "That bar
is really low..." ;-). One point where it is better is that it can
deal with patchsets, i.e. multiple patches submitted as one PR.

But github has the feature of squashing the patches instead
of merging the patches as they are. This can be useful
or remotely correct in one situation, but I think generally it
should be avoided for reasons detailed below.


So in this mail, I am sharing a few observations from the past
few weeks, and a few concerns or problems I am having. I think
it is important with the growing team to clearly formulate
how both reviewers and patch submitters expect the process to work.


At least when I propose a patchset, I propose it exactly the way
I send it. Coming from Samba and Gluster development, for me as a
contributor and as a reviewer, the content of the commits, i.e.
the actual diffs as well as the layout into patches and the commit
messages are 'sacred' in the sense that this is what the patch
submitter proposed and signed-off on for pushing. Hence the reviewer
should imho not change the general layout of patches (e.g. by squashing
them) without consulting the author. Here are two examples where
pull request with two patches were squashed with the heketi method:

https://github.com/heketi/heketi/commit/bbc513ef214c5ec81b6cdb0a3a024944c9fe12ba
https://github.com/heketi/heketi/commit/bccab2ee8f70f6862d9bfee3a8cbdf6e47b5a8bf

You see what github does: it prints the title of the PR as main commit
message and creates a bullet list of the original commit messages.
Hence, it really creates pretty bad commits (A commit called
"Two minor patches (#499)" - really??)... :-)
This is not how these were intended by the authors; it is the actual result
of how the commits look in git after they have been merged.
(Btw, I don't look at the git log / code in github: it is difficult to see
the relevant things there. I look at it in a local git checkout in the shell.
This is the "everlasting", "sacred" content.)

So this tells me that:

1) Patches should not be squashed without consulting the author,
   

Re: [Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-20 Thread Jeff Darcy
> If I understood brick-multiplexing correctly, add-brick/remove-brick
> add/remove graphs right? I don't think the graph-cleanup is in good
> shape, i.e. it could lead to memory leaks etc. Did you get a chance
> to think about it?

I haven't tried to address memory leaks specifically, but most of my
work has been fixing bugs that have been latent for ages but weren't biting
us for one reason or another.  For example:

 * Clients weren't reconnecting properly if setvolume failed (as opposed
   to the connection itself failing).

 * FUSE wasn't updating to use a new graph in the proper place, causing
   requests to be sent down to the old graph (where they'd get stuck).

Those might sound simple, but each required hours of debugging to arrive
at a diagnosis and fix - and there are several more.  When I'm done
fixing these kinds of preexisting functional problems, I'll look into
preexisting memory leaks.  Thanks for the heads-up.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Minutes: Gluster Community Bug Triage meeting (Today)

2016-09-20 Thread Hari Gowtham
Hi all,

Today's triage meeting has been postponed to next week.

Meeting ended Tue Sep 20 12:06:01 2016 UTC. Information about MeetBot at 
http://wiki.debian.org/MeetBot .
Minutes: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-09-20/gluster_community_bug_triage_meeting.2016-09-20-12.00.html
Minutes (text): 
https://meetbot.fedoraproject.org/gluster-meeting/2016-09-20/gluster_community_bug_triage_meeting.2016-09-20-12.00.txt
Log: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-09-20/gluster_community_bug_triage_meeting.2016-09-20-12.00.log.html

- Forwarded Message -
> From: "Hari Gowtham" 
> To: "gluster-devel" 
> Sent: Tuesday, September 20, 2016 3:13:19 PM
> Subject: [Gluster-devel] REMINDER: Gluster Community Bug Triage meeting 
> (Today)
> 
> Hi all,
> 
> The weekly Gluster bug triage is about to take place in two hours
> 
> Meeting details:
> - location: #gluster-meeting on Freenode IRC
> ( https://webchat.freenode.net/?channels=gluster-meeting )
> - date: every Tuesday
> - time: 12:00 UTC
> (in your terminal, run: date -d "12:00 UTC")
> - agenda: https://public.pad.fsfe.org/p/gluster-bug-triage
> 
> Currently the following items are listed:
> * Roll Call
> * Status of last weeks action items
> * Group Triage
> * Open Floor
> 
> Appreciate your participation.
> 
> --
> Regards,
> Hari.
> 
> 

-- 
Regards, 
Hari. 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-20 Thread Jeff Darcy
> That's weird, since the only purpose of the mem-pool was precisely to
> improve performance of allocation of objects that are frequently
> allocated/released.

Very true, and I've long been an advocate of this approach.
Unfortunately, for this to work our allocator has to be more efficient
than the system's, and it's not - especially wrt locking.  Overhead is
high and contention is even higher, heavily outweighing any advantage.
Unless/until we put in the work to make mem-pools perform better at high
thread counts, avoiding them seems like the practical choice.
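
To make the kind of change concrete, a hedged sketch (hypothetical names, not
the actual dict.c diff) of swapping a per-object mem-pool allocation for plain
calloc/free:

#include <stdlib.h>

/* Hypothetical stand-in for the real data_pair_t. */
typedef struct data_pair { struct data_pair *next; } data_pair_t;

#ifdef USE_MEM_POOL
/* mem_get0()/mem_put() serialize on the pool's lock, which is where the
 * contention shows up at high thread counts (needs the real headers). */
#define PAIR_ALLOC(pool)   mem_get0(pool)
#define PAIR_FREE(p)       mem_put(p)
#else
/* Plain calloc/free pushes the problem down to the system allocator,
 * which copes with many threads far better in these tests. */
#define PAIR_ALLOC(pool)   calloc(1, sizeof(data_pair_t))
#define PAIR_FREE(p)       free(p)
#endif

int
main(void)
{
        data_pair_t *pair = PAIR_ALLOC(NULL);   /* pool argument unused here */

        PAIR_FREE(pair);
        return 0;
}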

> * Consider http://review.gluster.org/15036/. With all communications
> going through the same socket, the problem this patch tries to solve
> could become worse.

I'll look into this.  Thanks!

> * We should consider the possibility of implementing a global thread
> pool, which would replace io-threads, epoll threads and maybe others.
> Synctasks should also rely on this thread pool. This has the benefit
> of better controlling the total number of threads. Otherwise when we
> have more threads than processor cores, we waste resources
> unnecessarily and we won't get a real gain. Even worse, it could start
> to degrade due to contention.

Also a good idea, though perhaps too hard/complex to tackle in the short
term.  I did take a stab at making io-threads use a single global set of
queues instead of per instance, to address a similar concern.  To make a
long story short, it didn't seem to make things any better for this
test.  I still think it's a good idea, though.

> * There are *too many* mutexes in the code.

Hear, hear.

> We should drastically reduce its use. Sometimes by using better
> structures that do not require blocking at all or even introducing RCU
> and/or rwlocks. One case that I've always had doubts about is dict_t. Why
> does it need locks ? One xlator should not modify a dict_t once it
> has been passed to another xlator, and if we assume that a dict can
> only be modified by a single xlator at a time, it's very unlikely that
> it needs to modify it from multiple threads.

I think in general you're right about dicts, but I also think it would
be interesting to disable dict locking and see what breaks.  I'll bet
there's something *somewhere* that tries to access dicts concurrently.
Callbacks for children of a cluster translator using the "fan out"
pattern seem particularly suspect.  What worries me is the classic
problem with race conditions; it's easy to have something that *appears*
to work when things aren't running in parallel enough to hit tiny timing
windows, but it's a lot harder to be *sure* you're safe even when they
do.  I think I'd lean toward a more conservative approach of finding the
particularly egregious high-contention cases, examining those particular
code paths carefully, and changing them to use a lock-free dict variant
or alternative.
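
A hedged sketch of the experiment described above (hypothetical wrapper
macros, not the actual dict code): make the per-dict lock easy to compile out,
so that concurrent access shows up as breakage instead of staying hidden
behind the lock:

#include <pthread.h>

typedef struct dict {
        pthread_mutex_t lock;
        /* ... hash table members elided ... */
} dict_t;

#ifdef DICT_LOCKING_DISABLED
#define DICT_LOCK(d)    do { } while (0)
#define DICT_UNLOCK(d)  do { } while (0)
#else
#define DICT_LOCK(d)    pthread_mutex_lock(&(d)->lock)
#define DICT_UNLOCK(d)  pthread_mutex_unlock(&(d)->lock)
#endif

static void
dict_set_example(dict_t *d)
{
        DICT_LOCK(d);
        /* modify the hash table / data pairs here */
        DICT_UNLOCK(d);
}

int
main(void)
{
        dict_t d;

        pthread_mutex_init(&d.lock, NULL);
        dict_set_example(&d);
        pthread_mutex_destroy(&d.lock);
        return 0;
}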
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Introducing Tendrl

2016-09-20 Thread Ric Wheeler
Tendryl is a component that can be used as part of what would be a new
management tool, but the project scope is not that of the whole stack that
would form a universal storage manager.

Regards,

Ric

On Sep 20, 2016 10:23, "Gerard Braad"  wrote:

> Hi Mrugesh,
>
> On Tue, Sep 20, 2016 at 3:10 PM, Mrugesh Karnik 
> wrote:
> > I'd like to introduce the Tendrl project. Tendrl aims to build a
> > management interface for Ceph. We've pushed some documentation to the
>
> On Tue, Sep 20, 2016 at 3:15 PM, Mrugesh Karnik 
> wrote:
> > I'd like to introduce the Tendrl project. Tendrl aims to build a
> > management interface for Gluster. We've pushed some documentation to
>
> It might help to introduce Tendrl as the "Universal Storage Manager'"
> with a possibility to either manage Ceph and/or Gluster.
> I understand you want specific feedback, but a clear definition of the
> tool would be helpful.
>
>
> regards,
>
>
> Gerard
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Heketi] Block store related API design discussion

2016-09-20 Thread Pranith Kumar Karampuri
On Mon, Sep 19, 2016 at 9:22 PM, Niels de Vos  wrote:

> On Mon, Sep 19, 2016 at 10:31:11AM -0400, Luis Pabón wrote:
> > Using qemu is interesting, but the I/O should be using the IO path of
> QEMU block API.  If not,
> > TCMU would not know how to work with QEMU dynamic QCOW2 files.
> >
> > Now, if TCMU already has this, then that would be great!
>
> It has a qcow2 header, maybe you guys are lucky!
>   https://github.com/open-iscsi/tcmu-runner/blob/master/qcow2.h


Sent the earlier mail before seeing this mail :-). So yes, what we discussed
is to see whether this qemu code in tcmu can internally use gfapi for doing
the operations; that is something we are trying to find out.


>
>
> Niels
>
> >
> > - Luis
> >
> > - Original Message -
> > From: "Prasanna Kalever" 
> > To: "Niels de Vos" 
> > Cc: "Luis Pabón" , "Stephen Watt" ,
> "gluster-devel" , "Ramakrishna Yekulla" <
> rre...@redhat.com>, "Humble Chirammal" 
> > Sent: Monday, September 19, 2016 7:13:36 AM
> > Subject: Re: [Gluster-devel] [Heketi] Block store related API design
> discussion
> >
> > On Mon, Sep 19, 2016 at 4:09 PM, Niels de Vos  wrote:
> > >
> > > On Mon, Sep 19, 2016 at 03:34:29PM +0530, Prasanna Kalever wrote:
> > > > On Mon, Sep 19, 2016 at 10:13 AM, Niels de Vos 
> wrote:
> > > > > On Tue, Sep 13, 2016 at 12:06:00PM -0400, Luis Pabón wrote:
> > > > >> Very good points.  Thanks Prasanna for putting this together.  I
> agree with
> > > > >> your comments in that Heketi is the high level abstraction API
> and it should have
> > > > >> an API similar of what is described by Prasanna.
> > > > >>
> > > > >> I definitely do not think any File Api should be available in
> Heketi,
> > > > >> because that is an implementation of the Block API.  The Heketi
> API should
> > > > >> be similar to something like OpenStack Cinder.
> > > > >>
> > > > >> I think that the actual management of the Volumes used for Block
> storage
> > > > >> and the files in them should be all managed by Heketi.  How they
> are
> > > > >> actually created is still to be determined, but we could have
> Heketi
> > > > >> create them, or have helper programs do that.
> > > > >
> > > > > Maybe a tool like qemu-img? If whatever iscsi service understand
> the
> > > > > format (at the very least 'raw'), you could get functionality like
> > > > > snapshots pretty simple.
> > > >
> > > > Niels,
> > > >
> > > > This is brilliant, and a subset of the idea falls in line with my
> > > > thoughts; the only concern is about building dependencies of qemu with
> > > > Heketi.
> > > > But it comes with the advantage of an easy and cool snapshots solution.
> > >
> > > And well tested as I understand that oVirt is moving to use qemu-img as
> > > well. Other tools are able to use the qcow2 format, maybe the iscsi
> > > service that gets used does so too.
> > >
> > > Has there already been a decision on what Heketi will configure as
> iscsi
> > > service? I am aware of the tgt [1] and LIO/TCMU [2] projects.
> >
> > Niels,
> >
> > yes we will be using TCMU (Kernel Module) and TCMU-runner (user space
> > service) to expose a file in a Gluster volume as an iSCSI target.
> > more at [1], [2] & [3]
> >
> > [1] https://pkalever.wordpress.com/2016/06/23/gluster-
> solution-for-non-shared-persistent-storage-in-docker-container/
> > [2] https://pkalever.wordpress.com/2016/06/29/non-shared-
> persistent-gluster-storage-with-kubernetes/
> > [3] https://pkalever.wordpress.com/2016/08/16/read-write-
> once-persistent-storage-for-openshift-origin-using-gluster/
> >
> > --
> > Prasanna
> >
> > >
> > > Niels
> > >
> > > 1. http://stgt.sourceforge.net/
> > > 2. https://github.com/open-iscsi/tcmu-runner
> > >http://blog.gluster.org/2016/04/using-lio-with-gluster/
> > >
> > > >
> > > > --
> > > > Prasanna
> > > >
> > > > >
> > > > > Niels
> > > > >
> > > > >
> > > > >> We also need to document the exact workflow to enable a file in
> > > > >> a Gluster volume to be exposed as a block device.  This will help
> > > > >> determine where the creation of the file could take place.
> > > > >>
> > > > >> We can capture our decisions from these discussions in the
> > > > >> following page:
> > > > >>
> > > > >> https://github.com/heketi/heketi/wiki/Proposed-Changes
> > > > >>
> > > > >> - Luis
> > > > >>
> > > > >>
> > > > >> - Original Message -
> > > > >> From: "Humble Chirammal" 
> > > > >> To: "Raghavendra Talur" 
> > > > >> Cc: "Prasanna Kalever" , "gluster-devel" <
> gluster-devel@gluster.org>, "Stephen Watt" , "Luis
> Pabon" , "Michael Adam" ,
> "Ramakrishna Yekulla" , "Mohamed Ashiq Liyazudeen" <
> mliya...@redhat.com>
> > > > >> Sent: Tuesday, September 13, 2016 2:23:39 AM
> > > > >> Subject: Re: [Gluster-devel] 

Re: [Gluster-devel] [Heketi] Block store related API design discussion

2016-09-20 Thread Pranith Kumar Karampuri
On Mon, Sep 19, 2016 at 10:13 AM, Niels de Vos  wrote:

> On Tue, Sep 13, 2016 at 12:06:00PM -0400, Luis Pabón wrote:
> > Very good points.  Thanks Prasanna for putting this together.  I agree
> with
> > your comments in that Heketi is the high level abstraction API and it
> should have
> > an API similar of what is described by Prasanna.
> >
> > I definitely do not think any File Api should be available in Heketi,
> > because that is an implementation of the Block API.  The Heketi API
> should
> > be similar to something like OpenStack Cinder.
> >
> > I think that the actual management of the Volumes used for Block storage
> > and the files in them should be all managed by Heketi.  How they are
> > actually created is still to be determined, but we could have Heketi
> > create them, or have helper programs do that.
>
> Maybe a tool like qemu-img? If whatever iscsi service understand the
> format (at the very least 'raw'), you could get functionality like
> snapshots pretty simple.
>

Prasanna, Poornima and I just discussed this. Prasanna is doing this
experiment to see if we can use qcow from tcmu-runner to get this piece
working. If yes, we definitely will get snapshots for free :-). Prasanna
will confirm it based on his experiments.


>
> Niels
>
>
> > We also need to document the exact workflow to enable a file in
> > a Gluster volume to be exposed as a block device.  This will help
> > determine where the creation of the file could take place.
> >
> > We can capture our decisions from these discussions in the
> > following page:
> >
> > https://github.com/heketi/heketi/wiki/Proposed-Changes
> >
> > - Luis
> >
> >
> > - Original Message -
> > From: "Humble Chirammal" 
> > To: "Raghavendra Talur" 
> > Cc: "Prasanna Kalever" , "gluster-devel" <
> gluster-devel@gluster.org>, "Stephen Watt" , "Luis
> Pabon" , "Michael Adam" ,
> "Ramakrishna Yekulla" , "Mohamed Ashiq Liyazudeen" <
> mliya...@redhat.com>
> > Sent: Tuesday, September 13, 2016 2:23:39 AM
> > Subject: Re: [Gluster-devel] [Heketi] Block store related API design
> discussion
> >
> >
> >
> >
> >
> > - Original Message -
> > | From: "Raghavendra Talur" 
> > | To: "Prasanna Kalever" 
> > | Cc: "gluster-devel" , "Stephen Watt" <
> sw...@redhat.com>, "Luis Pabon" ,
> > | "Michael Adam" , "Humble Chirammal" <
> hchir...@redhat.com>, "Ramakrishna Yekulla"
> > | , "Mohamed Ashiq Liyazudeen" 
> > | Sent: Tuesday, September 13, 2016 11:08:44 AM
> > | Subject: Re: [Gluster-devel] [Heketi] Block store related API design
> discussion
> > |
> > | On Mon, Sep 12, 2016 at 11:30 PM, Prasanna Kalever <
> pkale...@redhat.com>
> > | wrote:
> > |
> > | > Hi all,
> > | >
> > | > This mail is open for discussion on gluster block store integration
> with
> > | > heketi and its REST API interface design constraints.
> > | >
> > | >
> > | >  ___ Volume Request ...
> > | > |
> > | > |
> > | > PVC claim -> Heketi --->|
> > | > |
> > | > |
> > | > |
> > | > |
> > | > |__ BlockCreate
> > | > |   |
> > | > |   |__ BlockInfo
> > | > |   |
> > | > |___ Block Request (APIS)-> |__ BlockResize
> > | > |
> > | > |__ BlockList
> > | > |
> > | > |__ BlockDelete
> > | >
> > | > Heketi will have block API and volume API, when user submit a
> Persistent
> > | > volume claim, Kubernetes provisioner based on the storage class(from
> PVC)
> > | > talks to heketi for storage, heketi in turn calls block or volume
> API's
> > | > based on request.
> > | >
> > |
> > | This is probably wrong. It won't be Heketi calling block or volume
> APIs. It
> > | would be Kubernetes calling block or volume API *of* Heketi.
> > |
> > |
> > | > With my limited understanding, heketi currently creates clusters from
> > | > provided nodes, creates volumes and hands them over to the user.
> > | > For block related API's, it has to deal with files right ?
> > | >
> > | > Here is how block API's look like in short-
> > | > Create: heketi has to create file in the volume and export it as a
> iscsi
> > | > target device and hand it over to user.
> > | > Info: show block stores information across all the clusters,
> 

[Gluster-devel] REMINDER: Gluster Community Bug Triage meeting (Today)

2016-09-20 Thread Hari Gowtham
Hi all,

The weekly Gluster bug triage is about to take place in two hours

Meeting details:
- location: #gluster-meeting on Freenode IRC
( https://webchat.freenode.net/?channels=gluster-meeting )
- date: every Tuesday
- time: 12:00 UTC  
(in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last weeks action items
* Group Triage
* Open Floor

Appreciate your participation.

-- 
Regards, 
Hari. 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-20 Thread Pranith Kumar Karampuri
Jeff,
If I understood brick-multiplexing correctly,
add-brick/remove-brick add/remove graphs right? I don't think the
graph-cleanup is in good shape, i.e. it could lead to memory leaks etc. Did
you get a chance to think about it?

On Mon, Sep 19, 2016 at 6:56 PM, Jeff Darcy  wrote:

> I have brick multiplexing[1] functional to the point that it passes all
> basic AFR, EC, and quota tests.  There are still some issues with tiering,
> and I wouldn't consider snapshots functional at all, but it seemed like a
> good point to see how well it works.  I ran some *very simple* tests with
> 20 volumes, each 2x distribute on top of 2x replicate.
>
> First, the good news: it worked!  Getting 80 bricks to come up in the same
> process, and then run I/O correctly across all of those, is pretty cool.
> Also, memory consumption is *way* down.  RSS size went from 1.1GB before
> (total across 80 processes) to about 400MB (one process) with
> multiplexing.  Each process seems to consume approximately 8MB globally
> plus 5MB per brick, so (8+5)*80 = 1040 vs. 8+(5*80) = 408.  Just
> considering the amount of memory, this means we could support about three
> times as many bricks as before.  When memory *contention* is considered,
> the difference is likely to be even greater.
>
> Bad news: some of our code doesn't scale very well in terms of CPU use.
> To test performance I ran a test which would create 20,000 files across all
> 20 volumes, then write and delete them, all using 100 client threads.  This
> is similar to what smallfile does, but deliberately constructed to use a
> minimum of disk space - at any given, only one file per thread (maximum)
> actually has 4KB worth of data in it.  This allows me to run it against
> SSDs or even ramdisks even with high brick counts, to factor out slow disks
> in a study of CPU/memory issues.  Here are some results and observations.
>
> * On my first run, the multiplexed version of the test took 77% longer to
> run than the non-multiplexed version (5:42 vs. 3:13).  And that was after
> I'd done some hacking to use 16 epoll threads.  There's something a bit
> broken about trying to set that option normally, so that the value you set
> doesn't actually make it to the place that tries to spawn the threads.
> Bumping this up further to 32 threads didn't seem to help.
>
> * A little profiling showed me that we're spending almost all of our time
> in pthread_spin_lock.  I disabled the code to use spinlocks instead of
> regular mutexes, which immediately improved performance and also reduced
> CPU time by almost 50%.
>
> * The next round of profiling showed that a lot of the locking is in
> mem-pool code, and a lot of that in turn is from dictionary code.  Changing
> the dict code to use malloc/free instead of mem_get/mem_put gave another
> noticeable boost.
>
> At this point run time was down to 4:50, which is 20% better than where I
> started but still far short of non-multiplexed performance.  I can drive
> that down still further by converting more things to use malloc/free.
> There seems to be a significant opportunity here to improve performance -
> even without multiplexing - by taking a more careful look at our
> memory-management strategies:
>
> * Tune the mem-pool implementation to scale better with hundreds of
> threads.
>
> * Use mem-pools more selectively, or even abandon them altogether.
>
> * Try a different memory allocator such as jemalloc.
>
> I'd certainly appreciate some help/collaboration in studying these options
> further.  It's a great opportunity to make a large impact on overall
> performance without a lot of code or specialized knowledge.  Even so,
> however, I don't think memory management is our only internal scalability
> problem.  There must be something else limiting parallelism, and quite
> severely at that.  My first guess is io-threads, so I'll be looking into
> that first, but if anybody else has any ideas please let me know.  There's
> no *good* reason why running many bricks in one process should be slower
> than running them in separate processes.  If it remains slower, then the
> limit on the number of bricks and volumes we can support will remain
> unreasonably low.  Also, the problems I'm seeing here probably don't *only*
> affect multiplexing.  Excessive memory/CPU use and poor parallelism are
> issues that we kind of need to address anyway, so if anybody has any ideas
> please let me know.
>
>
>
> [1] http://review.gluster.org/#/c/14763/
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-20 Thread Xavier Hernandez



On 19/09/16 15:26, Jeff Darcy wrote:

I have brick multiplexing[1] functional to the point that it passes all basic 
AFR, EC, and quota tests.  There are still some issues with tiering, and I 
wouldn't consider snapshots functional at all, but it seemed like a good point 
to see how well it works.  I ran some *very simple* tests with 20 volumes, each 
2x distribute on top of 2x replicate.

First, the good news: it worked!  Getting 80 bricks to come up in the same 
process, and then run I/O correctly across all of those, is pretty cool.  Also, 
memory consumption is *way* down.  RSS size went from 1.1GB before (total 
across 80 processes) to about 400MB (one process) with multiplexing.  Each 
process seems to consume approximately 8MB globally plus 5MB per brick, so 
(8+5)*80 = 1040 vs. 8+(5*80) = 408.  Just considering the amount of memory, 
this means we could support about three times as many bricks as before.  When 
memory *contention* is considered, the difference is likely to be even greater.

Bad news: some of our code doesn't scale very well in terms of CPU use.  To 
test performance I ran a test which would create 20,000 files across all 20 
volumes, then write and delete them, all using 100 client threads.  This is 
similar to what smallfile does, but deliberately constructed to use a minimum 
of disk space - at any given time, only one file per thread (maximum) actually has 
4KB worth of data in it.  This allows me to run it against SSDs or even 
ramdisks even with high brick counts, to factor out slow disks in a study of 
CPU/memory issues.  Here are some results and observations.

* On my first run, the multiplexed version of the test took 77% longer to run 
than the non-multiplexed version (5:42 vs. 3:13).  And that was after I'd done 
some hacking to use 16 epoll threads.  There's something a bit broken about 
trying to set that option normally, so that the value you set doesn't actually 
make it to the place that tries to spawn the threads.  Bumping this up further 
to 32 threads didn't seem to help.

* A little profiling showed me that we're spending almost all of our time in 
pthread_spin_lock.  I disabled the code to use spinlocks instead of regular 
mutexes, which immediately improved performance and also reduced CPU time by 
almost 50%.
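
For illustration, the kind of compile-time switch being toggled here can be
sketched as below (hypothetical macros, not the actual GlusterFS locking code):
spinlocks busy-wait while contended, so once many threads in a single process
fight over the same lock, sleeping mutexes tend to behave better.

    /* lock-sketch.c -- illustrative only, not the GlusterFS locking macros.
     * Build with -DUSE_SPINLOCKS to get the spinning variant. */
    #include <pthread.h>

    #ifdef USE_SPINLOCKS
    typedef pthread_spinlock_t gen_lock_t;
    #define GEN_LOCK_INIT(l) pthread_spin_init((l), PTHREAD_PROCESS_PRIVATE)
    #define GEN_LOCK(l)      pthread_spin_lock(l)   /* busy-waits on contention */
    #define GEN_UNLOCK(l)    pthread_spin_unlock(l)
    #else
    typedef pthread_mutex_t gen_lock_t;
    #define GEN_LOCK_INIT(l) pthread_mutex_init((l), NULL)
    #define GEN_LOCK(l)      pthread_mutex_lock(l)  /* sleeps on contention */
    #define GEN_UNLOCK(l)    pthread_mutex_unlock(l)
    #endif

    static gen_lock_t counter_lock;
    static long       counter;

    void counter_init(void)
    {
        GEN_LOCK_INIT(&counter_lock);
    }

    void counter_increment(void)
    {
        GEN_LOCK(&counter_lock);
        counter++;
        GEN_UNLOCK(&counter_lock);
    }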

* The next round of profiling showed that a lot of the locking is in mem-pool 
code, and a lot of that in turn is from dictionary code.  Changing the dict 
code to use malloc/free instead of mem_get/mem_put gave another noticeable 
boost.


That's weird, since the only purpose of the mem-pool was precisely to 
improve the performance of allocating objects that are frequently 
allocated and released.
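
A possible explanation, sketched below with toy code (this is not the GlusterFS
mem-pool implementation): a pool that hands out fixed-size objects from a shared
free list has to serialize every get/put on one lock, so with ~100 client
threads the lock itself becomes the hot spot, while a modern malloc/free mostly
allocates from per-thread or per-arena caches without touching a global lock.

    /* toy-pool.c -- illustrative only, not the GlusterFS mem-pool code.
     * Every get/put from every thread funnels through pool->lock, which
     * is exactly the kind of serialization a profile shows as time spent
     * in the locking primitives. */
    #include <pthread.h>
    #include <stdlib.h>

    struct pool_item {
        struct pool_item *next;
    };

    struct toy_pool {
        pthread_mutex_t   lock;      /* single shared lock: the bottleneck */
        struct pool_item *free_list;
        size_t            obj_size;  /* must be >= sizeof(struct pool_item) */
    };

    void *toy_pool_get(struct toy_pool *pool)
    {
        struct pool_item *item;

        pthread_mutex_lock(&pool->lock);
        item = pool->free_list;
        if (item)
            pool->free_list = item->next;
        pthread_mutex_unlock(&pool->lock);

        /* Fall back to the system allocator when the pool is empty. */
        return item ? (void *)item : malloc(pool->obj_size);
    }

    void toy_pool_put(struct toy_pool *pool, void *ptr)
    {
        struct pool_item *item = ptr;

        pthread_mutex_lock(&pool->lock);
        item->next = pool->free_list;
        pool->free_list = item;
        pthread_mutex_unlock(&pool->lock);
    }

Replacing the get/put pair with plain malloc()/free(), as the dict change above
does, trades object reuse for the allocator's per-thread caching, which is why
it relieves the contention.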




At this point run time was down to 4:50, which is 20% better than where I 
started but still far short of non-multiplexed performance.  I can drive that 
down still further by converting more things to use malloc/free.  There seems 
to be a significant opportunity here to improve performance - even without 
multiplexing - by taking a more careful look at our memory-management 
strategies:

* Tune the mem-pool implementation to scale better with hundreds of threads.

* Use mem-pools more selectively, or even abandon them altogether.

* Try a different memory allocator such as jemalloc.
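
As a concrete (and purely hypothetical) sketch of the first option, a small
per-thread cache in front of the shared pool lets most get/put pairs complete
without ever taking the global lock; none of the names below exist in the
GlusterFS tree. The third option can usually be tried without code changes at
all, e.g. by preloading jemalloc.

    /* illustrative only -- hypothetical names, not GlusterFS code */
    #include <stdlib.h>

    #define CACHE_DEPTH 16
    #define OBJ_SIZE    128

    /* Shared, locked fallback; stands in for a real pool (or the toy pool
     * sketched earlier).  Plain malloc/free keeps the example self-contained. */
    static void *shared_pool_get(void) { return malloc(OBJ_SIZE); }
    static void  shared_pool_put(void *p) { free(p); }

    /* Per-thread stash of recently freed objects: in the steady state the
     * shared pool (and its lock) is touched only when the stash runs dry
     * or overflows, i.e. roughly once per CACHE_DEPTH operations. */
    static __thread void  *tl_cache[CACHE_DEPTH];
    static __thread size_t tl_count;

    void *cached_get(void)
    {
        if (tl_count > 0)
            return tl_cache[--tl_count];   /* fast path: no lock taken */
        return shared_pool_get();
    }

    void cached_put(void *ptr)
    {
        if (tl_count < CACHE_DEPTH)
            tl_cache[tl_count++] = ptr;    /* fast path: no lock taken */
        else
            shared_pool_put(ptr);
    }

(A real implementation would also drain the per-thread stash at thread exit.)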

I'd certainly appreciate some help/collaboration in studying these options 
further.  It's a great opportunity to make a large impact on overall 
performance without a lot of code or specialized knowledge.  Even so, however, 
I don't think memory management is our only internal scalability problem.  
There must be something else limiting parallelism, and quite severely at that.  
My first guess is io-threads, so I'll be looking into that first, but if 
anybody else has any ideas please let me know.  There's no *good* reason why 
running many bricks in one process should be slower than running them in 
separate processes.  If it remains slower, then the limit on the number of 
bricks and volumes we can support will remain unreasonably low.  Also, the 
problems I'm seeing here probably don't *only* affect multiplexing.  Excessive 
memory/CPU use and poor parallelism are issues that we kind of need to address 
anyway, so if anybody has any ideas please let me know.


You have done a really good job :)

Some points I would look into:

* Consider http://review.gluster.org/15036/. With all communications 
going through the same socket, the problem this patch tries to solve 
could become worse.


* We should consider the possibility of implementing a global thread 
pool, which would replace io-threads, epoll threads and maybe others. 
Synctasks should also rely on this thread pool. This has the benefit of 
better controlling the total number of threads. Otherwise when we have 
more threads than processor cores, we waste resources unnecessarily and 
we won't get a real gain. Even worse, it could start to degrade due to 
contention.
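
A minimal sketch of such a pool is below (illustrative only; none of these names
exist in the GlusterFS tree): a fixed set of workers pulls tasks from a single
queue, so io-threads, epoll callbacks and synctasks could all become submitters
without each growing its own set of threads.

    /* global-pool.c -- illustrative only, not an existing GlusterFS API.
     * One bounded set of workers serving a single task queue; the shutdown
     * path is omitted for brevity. */
    #include <pthread.h>
    #include <stdlib.h>

    struct task {
        void        (*fn)(void *arg);
        void         *arg;
        struct task  *next;
    };

    struct thread_pool {
        pthread_mutex_t  lock;
        pthread_cond_t   cond;
        struct task     *head, *tail;
        int              shutting_down;
    };

    static void *worker(void *data)
    {
        struct thread_pool *pool = data;

        for (;;) {
            pthread_mutex_lock(&pool->lock);
            while (!pool->head && !pool->shutting_down)
                pthread_cond_wait(&pool->cond, &pool->lock);
            if (!pool->head && pool->shutting_down) {
                pthread_mutex_unlock(&pool->lock);
                return NULL;
            }
            struct task *t = pool->head;
            pool->head = t->next;
            if (!pool->head)
                pool->tail = NULL;
            pthread_mutex_unlock(&pool->lock);

            t->fn(t->arg);   /* run the work outside the lock */
            free(t);
        }
    }

    int pool_submit(struct thread_pool *pool, void (*fn)(void *), void *arg)
    {
        struct task *t = calloc(1, sizeof(*t));
        if (!t)
            return -1;
        t->fn  = fn;
        t->arg = arg;

        pthread_mutex_lock(&pool->lock);
        if (pool->tail)
            pool->tail->next = t;
        else
            pool->head = t;
        pool->tail = t;
        pthread_cond_signal(&pool->cond);
        pthread_mutex_unlock(&pool->lock);
        return 0;
    }

    int pool_start(struct thread_pool *pool, int nthreads, pthread_t *tids)
    {
        pthread_mutex_init(&pool->lock, NULL);
        pthread_cond_init(&pool->cond, NULL);
        pool->head = pool->tail = NULL;
        pool->shutting_down = 0;
        for (int i = 0; i < nthreads; i++)
            if (pthread_create(&tids[i], NULL, worker, pool) != 0)
                return -1;
        return 0;
    }

Sizing nthreads to roughly the number of processor cores keeps the total thread
count bounded no matter how many subsystems submit work.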


* There are *too many* mutexes in the code. We should 

Re: [Gluster-devel] Jenkins Jobs on Gerrit

2016-09-20 Thread Nigel Babu
On Mon, Sep 12, 2016 at 08:04:37AM +0200, Niels de Vos wrote:
> Ah, ok, so the repository in GitHub will not be a mirror of the one in
> Gerrit that contains the JJB files? Do you plan to have the new
> repository (in Gerrit) also push to a repository on GitHub?

I do not at this point intend to push this repo to Github. Part of this is
trying to use Gerrit as the single source of truth for our infra-related repos.
This is a commitment I made during the Gerrit upgrade.

Part of that gap will be covered by cgit, so that you can browse the repo
on the web without cloning it.

> It seems you posted an incomplete URL in the 1st email, the project in
> Gerrit would be http://review.gluster.org/#/admin/projects/build-jobs

Indeed, thank you for pointing to the right one!


--
nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster and FreeBSD

2016-09-20 Thread Nigel Babu
On Tue, Sep 20, 2016 at 07:51:56AM +, Emmanuel Dreyfus wrote:
>
> An attempt to clarify some apparent confusion: Despite their very similar
> names, *BSD are not different distributions of the same software like
> Linux distributions are. NetBSD and FreeBSD are distinct operating systems,
> with their own kernels and userlands that diverged from a common ancestor
> 23 years ago.
>
> This is why you should not take FreeBSD behaviors for granted on NetBSD,
> and vice-versa.

Noted. I was only making sure it didn't affect FreeBSD.

--
nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster and FreeBSD

2016-09-20 Thread Emmanuel Dreyfus
On Tue, Sep 20, 2016 at 09:16:54AM +0530, Nigel Babu wrote:
> Giving this thread a signal boost. We should think about this if we're going 
> to
> continue to support *BSD.

An attempt to clarify some apparent confusion: Despite their very similar
names, *BSD are not different distributions of the same software like 
Linux distributions are. NetBSD and FreeBSD are distinct operating systems,
with their own kernels and userlands that diverged from a common ancestor
23 years ago.

This is why you should not take FreeBSD behaviors for granted on NetBSD, 
and vice-versa. 

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Introducing Tendrl

2016-09-20 Thread Gerard Braad
Hi Mrugesh,

On Tue, Sep 20, 2016 at 3:10 PM, Mrugesh Karnik  wrote:
> I'd like to introduce the Tendrl project. Tendrl aims to build a
> management interface for Ceph. We've pushed some documentation to the

On Tue, Sep 20, 2016 at 3:15 PM, Mrugesh Karnik  wrote:
> I'd like to introduce the Tendrl project. Tendrl aims to build a
> management interface for Gluster. We've pushed some documentation to

It might help to introduce Tendrl as the "Universal Storage Manager",
with the possibility of managing Ceph and/or Gluster.
I understand you want specific feedback, but a clear definition of the
tool would be helpful.


regards,


Gerard
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Introducing Tendrl

2016-09-20 Thread Mrugesh Karnik
Hi all,

I'd like to introduce the Tendrl project. Tendrl aims to build a
management interface for Gluster. We've pushed some documentation to
the documentation repository[1]. The documentation should provide an
understanding of the architecture and the components therein. This is
still a work in progress. So please feel free to ask questions and
make suggestions via the mailing list[2] and Github Issues[3]. There's
an IRC channel[4] as well.

Thanks.

[1] https://github.com/Tendrl/documentation
[2] https://www.redhat.com/mailman/listinfo/tendrl-devel
[3] https://github.com/Tendrl/documentation/issues
[4] #tendrl-devel on Freenode

-- 
Mrugesh
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel