1. Better protection for split-brain over time.
2. Policy based split-brain resolution.
3. Provide better availability with client quorum and replica 2.
I would add the following:
(4) Quorum enforcement - any kind - on by default.
(5) Fix the problem of volumes losing quorum because
Constantly filtering requests to use either N or N+1 bricks is going to be
complicated and hard to debug. Every data-structure allocation or loop
based on replica count will have to be examined, and many will have to be
modified. That's a *lot* of places. This also overlaps
One of the things holding up our data classification efforts (which include
tiering but also other stuff as well) has been the extension of the same
conceptual model from the I/O path to the configuration subsystem and
ultimately to the user experience. How does an administrator define a
One of my tasks for 3.6 is to update/improve the SSL code. Long ago, I
had decided that part of the next major update to SSL should include
switching from OpenSSL to PolarSSL. Why? Two reasons.
(1) The OpenSSL API is awful, and poorly documented to boot. We have to
go through some rather
I think the main question regards CentOS support, with further questions
about Debian/Ubuntu support.
I believe CentOS would leverage the EPEL support. PolarSSL is already
packaged for Debian (Wheezy) and Ubuntu (Trusty) so we should be set.
If we have to ship PolarSSL packages with our
My only concern is its 'pure' GPLv2+ license: is that compatible
with our 'GPLv2 or LGPLv3+' license?
The answer that matters, as always, is that only a real lawyer can say.
My own uninformed guess is that we would be considered a derivative of
them (instead of vice versa) and thus we'd
The only thing that I find that may be an issue for some use cases is
https://polarssl.org/kb/generic/is-polarssl-fips-certified
Not meaning to sound flippant, but if we ever did seek FIPS
certification I suspect that our choice of SSL library would be the
least of our worries.
In addition, I would recommend considering using --xml, picking out
the field you want with a quick XML parser, and then going with that
directly. More stable than watching for a specific return code.
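As a sketch of that --xml idea: pipe the CLI's XML output through a real parser (python3 here) and pull out one field. The sample document and the volStatus element are invented for illustration; the real CLI's XML layout should be checked before relying on specific element names.

```shell
# Invented sample of CLI XML output; real gluster --xml output differs.
xml='<cliOutput><opRet>0</opRet><volStatus>Started</volStatus></cliOutput>'
# Extract one field with a proper XML parser instead of scraping text
# or return codes.
status=$(printf '%s' "$xml" | python3 -c '
import sys, xml.etree.ElementTree as ET
root = ET.parse(sys.stdin).getroot()
print(root.findtext("volStatus"))
')
echo "$status"
```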
Parsing XML to get one bit of information doesn't seem like a great
idea.
I see that most of the tests are doing umount and these may fail
sometimes because of EBUSY etc. I am wondering if we should change all
of them to umount -l.
Let me know if you foresee any problems.
I think I'd try umount -f first. Using -l too much can cause an
accumulation of zombie
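A fallback sequence along those lines might look like the sketch below: try a normal umount a few times, then -f, and only reach for -l as a last resort. This is an untested illustration, not a drop-in for the regression harness.

```shell
# Try a clean umount first, escalate to -f, use -l only as a last resort.
safe_umount() {
    local mnt=$1
    for i in 1 2 3; do
        umount "$mnt" 2>/dev/null && return 0
        sleep 1
    done
    umount -f "$mnt" 2>/dev/null && return 0
    umount -l "$mnt"
}
```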
Can't thank you enough for this :-)
+100
Justin has done a lot of hard, tedious work whipping this infrastructure into
better shape, and has significantly improved the project as a result. Such
efforts deserve to be recognized. Justin, I owe you a beer.
I have several patches queued up for 3.6, which have all passed
regression tests. Unfortunately, they're all in areas where our
resources are pretty thin, so getting the required +1 reviews
is proving to be a challenge. The patches are as follows:
* For heterogeneous bricks [1]
that doesn't eliminate some of these cases in favor of
tiering and nothing else, that would be great.
- Original Message -
From: Jeff Darcy jda...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Sent: Friday, May 23, 2014 3:30:39 PM
Subject: [Gluster-devel] Data classification
Am I right in understanding that the value for media-type is not
interpreted beyond the scope of matching rules? That is to say, we
don't need/have any notion of media-types that type check internally
for forming (sub)volumes using the rules specified.
Exactly. To us it's just an opaque ID.
It's possible to express your example using lists if their entries are allowed
to overlap. I see that you wanted a way to express a matrix (overlapping
rules) with gluster's tree-like syntax as backdrop.
A polytree (a DAG whose underlying undirected graph has no cycles) may be a
better term than matrix, i.e. when there are
If I understand correctly the proposed data-classification
architecture, each server will have a number of bricks that will be
dynamically modified as needed: as more data-classifying conditions
are defined, a new layer of translators will be added (a new DHT or
AFR, or something else) and
Justin asked me, as the group's official Grumpy Old Man, to send a note
reminding people about the importance of reviewing patches early. Here
it is. As I see it, we've historically had two problems with reviews.
(1) Patches that don't get reviewed at all.
(2) Patches that have to be re-worked
There is an improved implementation of trash in gerrit, and it can help get
more traction with more reviews, rebases etc.
I see nine patches for this, all failing verification and all but one
inactive since March 10. Given our review rate, is this likely to
converge in only a week?
* RDMA
Maybe we should 2>/dev/null ?
I don't like that approach because it might mask real errors as
well. If we're stuck with it then c'est la vie, but wherever
possible I recommend status=none instead.
___
Gluster-devel mailing list
Does the BSD version also support status=none?
It seems to be msgfmt=quiet
We can define a dd shell function that does /bin/dd status=none or
/bin/dd msgfmt=quiet depending on the system
That definitely looks like the way to go. Thanks!
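The wrapper described above might look like this sketch. The probe for GNU dd (grepping --help for "status=") is an assumption about how to tell the two implementations apart, not an established test.

```shell
# Pick whichever "quiet" flag the local dd understands:
# GNU coreutils uses status=none, NetBSD uses msgfmt=quiet.
if /bin/dd --help 2>&1 | grep -q 'status='; then
    dd() { /bin/dd status=none "$@"; }
else
    dd() { /bin/dd msgfmt=quiet "$@"; }
fi
```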
When submitting patches where there is an/some obvious person(s) to blame,
is it OK/desirable to request them as Code-Reviewers in gerrit?
Inviting reviewers with clear interest or knowledge in a piece of code is
not only OK but recommended. I think blame might not be a good idea,
though.
One possible solution is to convert directories into files managed by
storage/posix (some changes will also be required in dht and afr
probably). We will have full control about the format of this file,
so we'll be able to use the directory offset that we want to avoid
interferences with
Isn't some of this covered by crm/corosync/pacemaker/heartbeat?
Sorta, kinda, mostly no. Those implement virtual synchrony, which is
closely related to consensus but not quite the same even in a formal CS
sense. In practice, using them is *very* different. Two jobs ago, I
inherited a design
As part of the first phase, we aim to delegate the distributed configuration
store. We are exploring consul [1] as a replacement for the existing
distributed configuration store (sum total of /var/lib/glusterd/* across all
nodes). Consul provides a distributed configuration store which is
Two characteristics of a language (tool chain) are important to me,
especially
when you spend a good part of your time debugging failures/bugs.
- Analysing core files.
- Ability to reason about space consumption. This becomes important in
the case of garbage collected languages.
I
Is there any reason not to consider zookeeper?
I did bring up that idea a while ago. I'm no Java fan myself, but still
I was surprised by the vehemence of the reactions. To put it politely,
many seemed to consider the dependency on Java unacceptable for both
resource and security reasons.
Hi guys, I wanted to share my experiences with Go. I have been using it
for the past few months and I have to say I am very impressed. Instead
of writing a massive email I created a blog entry:
http://goo.gl/g9abOi
Fantastic. Thanks, Luis!
Yes. I came across Salt recently, for unified storage management covering
both Gluster and Ceph, which is still in the planning phase. I can think of
a complete set of infra requirements to solve, from glusterd to unified
management. Calamari, the Ceph management tool, already uses Salt. It would
Has anyone looked into whether LogCabin can provide the consistent small
storage based on RAFT for Gluster?
https://github.com/logcabin/logcabin
I have no experience with using it so I cannot say if it is good or suitable.
I do know the following project uses it and it's just not as
I'm curious about the current status of the New Style Replication (NSR)
feature.
I try to evaluate a log-based recovery by using the latest NSR code.
Please tell me if there are any known issues.
I'm currently trying to revive the NSR code, which was last worked on at
the end of April. At
Awesome, so now we can finally have proper add-nodes-as-you-go
setups without having to rebalance etc. Sweet!
I hate to be Mr. Negativity, but as the author of that patch I think it
behooves me to point out that things aren't quite that good yet. The
patch doesn't remove the need for
In the last few days, I've run into a couple of misunderstandings about
the status of some projects I've worked on. At first I set out to
correct those misunderstandings, but then I realized it wouldn't be a
bad idea to keep posting status to this list periodically. Can't hurt,
anyway. This is
I might be misunderstanding some of this, but I thought that the way
of doing SSL (as you say, two years ago), and the mechanism, the
generation and transferring of certificates, and so on, was going to
be updated with a new mechanism in 3.6...?
SSL was updated in a couple of ways for 3.6:
I urge you guys to notify others before making basic style
changes like this.
Yes, all style changes - including the one being enforced by the
original version of checkpatch.pl - should be submitted for review.
+1 to existing Linux kernel style. Moreover, it's a style which is used
heavily in the existing code base. I don't see any advantage in changing the
style now.
It's not a change. It's already common in our code, if not actually the *most*
common style.
% find .
More recently, a *completely separate* approach to
multi-threading - multi-threaded epoll - has been getting some
attention. Here's what I see as the pros and cons of this new approach.
You forgot:
CON: epoll is Linux specific and code using it is not easily portable.
Excellent point.
Without taking sides: the last grep is including else without either { or }.
That's true. I stand corrected.
We should try comparing performance of multi-thread-epoll to
own-thread, shouldn't be hard to hack own-thread into non-SSL-socket
case.
Own-thread has always been available on non-SSL sockets, from the day it
was first implemented as part of HekaFS.
HOWEVER, if own-thread implies a thread
Recently there has been some controversy about the coding style being
enforced by checkpatch.pl since some weeks ago. Instead of pointing
fingers, I thought it might be useful to get some objective information
that can help us make the situation better. Accordingly, I ran
checkpatch.pl over our
- New changes coming in should adhere to that.
- Old changes if they are there and let them be.
Maybe the first change we should make is to prevent
the script from flagging errors in the surrounding
context (i.e. code which was already there).
OK, scratch that part. It already seems
I was wondering, is there a way (or a parameter to pass to cluster/dht)
to change the distribution algorithm to only take into account the
filename and not the preceding filesystem path?
i.e. when a file is at /mount/gluster/directory/filename.ext,
to only hash on "filename.ext"?
At the moment, our smoke tests in Jenkins only run on a
replicated volume. Extending that out to other volume types
should (in theory :) help catch other simple gotchas.
Xavi has put together a patch for doing just this, which I'd
like to apply and get us running:
We have been thinking of many approaches to address some of Glusterd's
correctness (during failures and at scale) and scalability concerns. A
recent email thread on Glusterd-2.0 was along these lines. While that
discussion is still valid, we have been considering dogfooding as a
viable option
(E) The user must specify an explicit option to see the status of
secondary volumes. Without this option, secondary volumes are hidden
and status for their constituent bricks will be shown as though they
were (directly) part of the corresponding primary volume.
IIUC, secondary volumes
As I read this I assume this is to ease administration, and not to ease
the code complexity mentioned above, right?
The code complexity needs to be eased, but I would assume that is a by
product of this change.
Correct. The goal is an easy-to-understand way for *users* to create
and
With the upcoming data compliance features in GlusterFS, a common
infrastructure[1] to support various mechanisms such as tiering, bitrot
detection etc. would prove to be helpful. Such an infrastructure extends
the current changelog design (keeping NSR in mind) and removes
constraints that
I would like to propose refactoring of the code managing
various daemons in glusterd. Unlike other high(er) level proposals
about feature design, this one is at the implementation
level. Please go through the details of the proposal below
and share your thoughts/suggestions on the approach.
I consistently get these when running regression tests on my own
machines, because glfsheal has a direct dependency on afr.so which is
nowhere in our library path. One way to fix this would be to set
LD_LIBRARY_PATH from within run-tests.sh (like we presumably already do
in our Jenkins tests or
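A hypothetical fragment for run-tests.sh along those lines: prepend the build tree's library directory so glfsheal can find afr.so without a system-wide install. The path is an assumed layout, not the project's real one.

```shell
# Assumed in-tree install prefix; adjust to the actual build layout.
BUILDDIR=${BUILDDIR:-/build/install}
# Prepend, preserving any pre-existing LD_LIBRARY_PATH.
export LD_LIBRARY_PATH="$BUILDDIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```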
1. The glfsheal makefile also links it to other .SOs (libglusterfs, libgfapi
etc). Is that OK to do?
As far as I can tell, yes. Those end up in /lib64 where they are found
without any special configuration on our part. Translators are different; we
install those in a different location that
On Fri, Dec 12, 2014 at 12:18:01AM -0500, Jeff Darcy wrote:
As far as I can tell, yes. Those end up in /lib64 where they are found
without any special configuration on our part. Translators are different;
we
install those in a different location that the system doesn't already know
This is right on some systems, but not universal. This is why libtool
links afr.so while warning it is not portable. You can have a hint
that modules may not be plain libraries here:
$ libtool link --help|grep module
-module build a library that can dlopened
So you're using DSO
Back to the point, it seems like we should use ld -l for some
things (e.g. libglusterfs) and dlopen for others (e.g. xlators),
but never cross the streams or just add a .so to the list of
objects. Does that sound right?
You should use libtool link everywhere, and with -module when it
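For illustration, an automake rule in this spirit (target and file names are hypothetical) marks a translator as a dlopen-able module:

```make
# Hypothetical Makefile.am fragment: -module tells libtool to build a
# dlopen-able plugin rather than a regular versioned shared library.
xlator_LTLIBRARIES = afr.la
afr_la_SOURCES = afr.c
afr_la_LDFLAGS = -module -avoid-version
```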
Here is a proposal: we know that at the end of conservative merge, we
should end up with the situation where directory ctime/mtime is the
ctime of the most recently added children.
Won't the directory mtime change as the result of a rename or unlink?
Neither of those would be reflected in the
For BitRot[1] there's a need to track objects (files) for inactivity
for a certain period of time (after release()). I was considering
using timer wheel[2] to track object expiry as it's proven to be
scalable and used by the linux kernel. This could even be beneficial
as GlusterFS timer
(D) Secondary volumes may not be started and stopped by the user.
Instead, a secondary volume is automatically started or stopped along
with its primary.
Wouldn't it help in some cases to have secondary volumes running while
primary is not running? Some form of maintenance activity.
The birthday paradox says that with a 44-bit hash we're more likely than
not to start seeing collisions somewhere around 2^22 directory entries.
That 16-million-entry directory would have a lot of collisions.
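For reference, the 2^22 figure is the standard birthday bound: with a hash space of size $N = 2^{44}$, the expected number of colliding pairs among $n$ entries is roughly

```latex
\binom{n}{2} \cdot \frac{1}{N} \approx \frac{n^2}{2N}
```

which reaches 1 near $n = \sqrt{2N} \approx 2^{22.5}$, so collisions become likely in the low millions of entries, consistent with the estimate above.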
This is really the key point. The risks of the bit-stealing approach
have been
An alternative would be to convert directories into regular files from
the brick point of view.
The benefits of this would be:
* d_off would be controlled by gluster, so all bricks would have the
same d_off and order. No need to use any d_off mapping or transformation.
I don't think a
On Mon, Dec 22, 2014 at 09:30:29AM -0500, Jeff Darcy wrote:
By contrast, the failure mode for the map-caching approach - a simple
failure in readdir - is relatively benign. Such failures are also
likely to be less common, even if we adopt the *unprecedented*
requirement that the cache
Using GFID does not work for d_off. The GFID represents an inode, and a
d_off represents a directory entry. Therefore using GFID as an alternative
to d_off breaks down when you have hardlinks for the same inode in a single
directory.
Good point. So what *can* we do locally on a brick to
Ahead of Wednesday's community meeting, I'd like to get as much 4.0
status together as possible. Developers, please check out the
sub-projects on this page:
http://www.gluster.org/community/documentation/index.php/Planning40
If you're involved in any of those, please update the status section
Since the goto idiom that Gluster uses generates the same code — which
is what matters — I gave up my short-lived battle for not using it.
One handy rule of thumb is based on whether a goto is forward or back,
in (to a loop or other control construct) or out. Forward and out are
generally OK -
Then there will be two things
1) Update this page
http://www.gluster.org/community/documentation/index.php/CompilingRPMS#Common_Steps
i.e. add yum install sqlite3-devel
The *real* canonical way to list dependencies is with BuildRequires lines in
glusterfs.spec.in (in your source tree).
1) Do we have to specify each host's address rewriting in your example - why
not something like this?
# gluster network add client-net 1.2.3.0/24
glusterd could then use a discovery process as I described earlier to
determine for each server what its IP address is on that subnet and
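A hypothetical sketch of that discovery step: given a subnet in CIDR form, ask the kernel which local source address it would use to reach it. `ip route get` reports the chosen `src` address; the function name and approach are illustrative, not glusterd's actual mechanism.

```shell
# Report this host's address on a given subnet (Linux iproute2).
addr_on_subnet() {
    subnet=$1    # e.g. 1.2.3.0/24
    # Strip the prefix length and ask the kernel for the source address
    # it would pick to reach that network.
    ip route get "${subnet%/*}" 2>/dev/null |
        sed -n 's/.*src \([0-9.]*\).*/\1/p'
}
```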
I could probably chip in too. I've run tons of my own science
experiments on Rackspace instead of our own hardware, because that makes
my results more reproducible by others. If we can enable more people to
do likewise, that benefits everyone.
P.S. Hi Jesse. Small world, huh?
Jeff, ditto for ssl-authz.t?
I was able to reproduce the failure on slave30. This is the same race
that was fixed by http://review.gluster.org/9483 so I'll submit the
backport for that.
P.S. I'm done with slave30 unless/until something else comes up.
Looks like we have four volunteers:
* Ben Turner (primary GlusterFS perf tuning guy)
* Jeff Darcy (greybeard GlusterFS developer and scalability expert)
* Josh Boon (experienced GlusterFS guy - Ubuntu focused)
* Nico Schottelius (newer GlusterFS guy - familiar with Ubuntu/CentOS
and since we trust the client CA
Can you elaborate on that? Are you using a non-zero certificate depth? If
so, why? Even without any change in ssl-allow behavior, trusting arbitrary
clients as CAs seems like a security hole big enough to drive a container
ship through.
Even for
As I think more, maintaining a list of CNs as part of ssl.auth-allow isn't
going to be easy from Manila side, unless Manila assumes that the first CN
in the list is the server, hence it should never touch that. What if the
admin pre-set the list of CNs, then Manila won't have a way to know which
I get frequent failures in SSL test on NetBSD, like here:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/823/console
I saw there was some SSL change, Does it ring a bell for someone?
From the log:
GERRIT_REFSPEC=refs/changes/42/3842/17
...
HEAD is now at 14d69de
One important example of the kind of concurrency-related issues is:
When a glusterd comes back up (or reconnects to the cluster), it receives
information about the current view of cluster (crudely, the list of volumes
and the list of peers) as seen by a peer, from all available peers.
So far the winner seems to be Thursday@11, with Friday@11 close behind.
Every other time except for Friday@12 has at least one person who can't
make it. Even though it's 6am my time, I'm going to propose Thursday@11
starting on February 26. Last call for objections...
Thursday at 11:00 UTC
This is a proposal for a lightweight version of multi-network support,
somewhat limited in functionality but implementable quickly because it
seems unlikely that anyone will be able to spend much time on a full
version. Here's the corresponding feature page:
The inaugural GlusterFS 4.0 meeting on Friday was a great success.
Thanks to all who attended. Minutes are here:
http://meetbot.fedoraproject.org/gluster-meeting/2015-02-06/glusterfs_4.0.2015-02-06-12.05.html
One action item was to figure out when IRC meetings (#gluster-meeting on
Freenode)
This is *tomorrow* at 12:00 UTC (approximately 15.5 hours from now) in
#gluster-meeting on Freenode. See you all there!
Perhaps it's not obvious to the broader community, but a bunch of people
have put a bunch of work into various projects under the 4.0 banner.
While debugging YASRTF (Yet Another Spurious Regression Test Failure), I
came across the following construct:
bash_function () {
command_1
[ $? -ne 0 ] && return 1
command_2
return $?
}
TEST bash_function
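A minimal rewrite of the construct above: `|| return 1` carries the same intent more directly, and a function already returns the status of its last command, so the trailing `return $?` is redundant. The command_1/command_2 bodies here are stand-ins for whatever the real test runs.

```shell
# Stand-ins for the quoted snippet's commands.
command_1() { true; }
command_2() { false; }

bash_function () {
    command_1 || return 1   # bail out if the first step fails
    command_2               # its status becomes the function's status
}
```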
And the winner is ... Dan Lambright!
http://review.gluster.org/#/c/1
Congrats, Dan.
What I am saying is that if you have a slogan idea for Gluster, I want
to hear it. You can reply on list or send it to me directly. I will
collect all the proposals (yours and the ones that Red Hat comes up
with) and circle back around for community discussion in about a month
or so.
Is it always ok to consider xdata to be valid, even if op_ret < 0?
I would say yes, simply because it's useful. For example, xdata might
contain extra information explaining the error, or suggesting a next
step. In NSR, when a non-coordinator receives a request it returns
EREMOTE. It would be
/9970/ (Kotresh HR)
extras: Fix stop-all-gluster-processes.sh script
http://review.gluster.org/#/c/10075/ (Jeff Darcy)
socket: use OpenSSL multi-threading interfaces
this one nuked a CR+1 (from Kaleb) as well as V+1
In the absence of any other obvious way to fix this up, I'll
start
It was improperly clearing previously-set V+1 flags, even on success. That is
counterproductive in the most literal sense of the word.
I've done the first one. I'll leave the others for you, so you
embed the skill :)
Done. Thanks! I also canceled the now-superfluous jobs. Maybe
in my Copious Spare Time(tm) I'll write a script to do this more
easily for other obviously-spurious regression results.
http://review.gluster.org/#/c/9970/ (Kotresh HR)
extras: Fix stop-all-gluster-processes.sh script
These are the NetBSD regression failures for which we got fixes merged
recently. Doesn't it just need to be rebased?
Quite possibly. I wasn't looking at patch contents all that
When: February 26, 11:00 UTC (06:00 EST, etc.)
Where: #gluster-meeting on Freenode
Topics:
* action items from last meeting[1]
* general subproject status[2]
* ??? - reply on this thread to add something
* planning for next two weeks
[1]
Reminder: this is ~13.5 hours from now.
When: February 26, 11:00 UTC (06:00 EST, etc.)
Where: #gluster-meeting on Freenode
Topics:
* action items from last meeting[1]
* general subproject status[2]
* ??? - reply on this thread to add something
*
Just concluded. Logs are here:
http://meetbot.fedoraproject.org/gluster-meeting/2015-02-26/gluster-meeting.2015-02-26-11.05.log.html
Meeting summary
Roll call (jdarcy, 11:05:46)
Current status (jdarcy, 11:06:33)
New business (and plans) (jdarcy, 11:22:14)
ACTION: jdarcy to start discussion
I found one issue: local is not allocated using GF_CALLOC with a
mem-type.
This is a patch which *might* fix it.
It does. The memory corruption disappeared and the test can complete.
Interesting. I suspect this means that we *are* in the case where the
previous comment came from.
Apologies for breaking the build. I am out of office. Please revert
review #9492.
I'm not sure that it was so much one patch breaking the build as
some sort of in-flight collision merge weirdness. In any case,
don't worry about it. Stuff happens. The important thing is to
get the
As many of you have undoubtedly noticed, we're now in a situation where
*all* regression builds are now failing, with something like this:
-
cc1: warnings being treated as errors
I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND
like other xlators which also use mem_get for local (such as AFR). I am
suspecting crypt not doing mem_put might be the reason for the bug
mentioned.
My understanding was that mem_put should be called automatically
Got this backtrace in gNFS in one of the regression runs:
I saw something pretty similar yesterday. In the one I
looked at, I saw this:
_mnt3_auth_check_host_in_netgroup
auth_params->expfile = XXX
...called...
exp_file_get_dir
file = YYY
The structure at YYY was invalid, and
The same problems that affect mainline are affecting release-3.7 too. We
need to get over this soon.
I think it's time to start skipping (or even deleting) some tests. For
example, volume-snapshot-clone.t alone is responsible for a huge number
of spurious failures. It's for a feature that
As we know, we have a patch from Manu which re-triggers a given failed
test. The idea was to reduce the burden of re-triggering the regression,
but I've been noticing it is failing in 2nd attempt as well and I've
seen this happening multiple times for patch [1]. I am not sure whether
I am
Just noticed something a bit weird on the regression tests
for CentOS 6.x:
[13:28:44] ./tests/features/weighted-rebalance.t
... ok 23 s
[13:46:50] ./tests/geo-rep/georep-rsync-changelog.t
I just had to clean up a couple of these - 7327 and 7331. Fortunately,
they both seem to have gone on their merry way instead of dying. Both
were in the pre-mount stage of their setup, but did have mounts active
and gsyncd processes running (in one case multiple of them). I suspect
that this is
Any plan to backport to release-3.7? volume-snapshot.t is quite
harmful there too.
http://review.gluster.org/#/c/10351/
I propose that we don't drop test units but provide an ack to patches
that have known regression failures.
IIRC maintainers have had permission to issue such overrides since a
community meeting some months ago, but such overrides have remained
rare. What should we do to ensure that currently
IMO, if we start skipping test cases we may not be able to
uncover bugs. Thoughts?
The purpose of a regression test is to catch unanticipated
problems related to new changes. Once a problem is already
known, a series of tradeoffs must be evaluated.
* What new information is gained by
Are you meaning deleting the tests temporarily (to let other stuff
pass without being held up by it), or permanently?
Kind of both. I think the tests should be deleted so they don't
interfere with other work, but that means we'll have features without
tests. Lack of a test should be