Re: [Gluster-devel] Regression tests time

2018-01-25 Thread Jeff Darcy
On Wed, Jan 24, 2018, at 9:37 AM, Xavi Hernandez wrote: > That happens when we use arbitrary delays. If we use an explicit > check, it will work on all systems. You're arguing against a position not taken. I'm not expressing opposition to explicit checks. I'm just saying they don't come for
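
A minimal sketch of the difference (start_op and verify_result are hypothetical placeholders, and wait_for stands in for the test suite's EXPECT_WITHIN-style checks):

    # Fragile: assumes 5 seconds is always enough, which fails on slow VMs.
    start_op
    sleep 5
    verify_result

    # Explicit check: poll for the condition, bounded by a timeout.
    wait_for() {                     # wait_for <timeout-seconds> <command...>
        local timeout=$1; shift
        local deadline=$((SECONDS + timeout))
        while ! "$@"; do
            [ "$SECONDS" -ge "$deadline" ] && return 1
            sleep 1
        done
    }

    start_op
    wait_for 60 verify_result        # returns as soon as the check passes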

Re: [Gluster-devel] Regression tests time

2018-01-24 Thread Jeff Darcy
On Tue, Jan 23, 2018, at 12:58 PM, Xavi Hernandez wrote: > I've made some experiments [1] with the time that centos regression > takes to complete. After some changes the time taken to run a full > regression has dropped between 2.5 and 3.5 hours (depending on the run > time of 2 tests, see

[Gluster-devel] Revert of 56e5fdae (SSL change) - why?

2018-01-07 Thread Jeff Darcy
There's no explanation, or reference to one, in the commit message. In the comments, there's a claim that seems a bit exaggerated. > This is causing almost all the regressions to fail. durability-off.t is the > most affected test. This patch was merged on December 13. Regressions have passed

Re: [Gluster-devel] RFC: FUSE kernel features to be adopted by GlusterFS

2017-11-09 Thread Jeff Darcy
> So - nothing inherent to libfuse and nothing that would be relevant as > of today. > > But then, let me put it like this: what reason could we have to *not* go > with xglfs in 4.0? It's true that the deliverables are present in libfuse > and > libgfapi and it's just a thin glue. As such it

[Gluster-devel] New coding standard

2017-10-16 Thread Jeff Darcy
As of a few minutes ago, there's a new coding standard. You can view it here. https://github.com/gluster/glusterfs/blob/master/doc/developer-guide/coding-standard.md (or equivalent in your own updated source tree) In some ways it's more liberal than before - e.g. declarations are no longer

[Gluster-devel] Time weirdness

2017-10-12 Thread Jeff Darcy
Is it just me, or is there an inconsistency in how times are being shown on r.g.o? AFAICT on *searches* the times are being shown in raw UTC (e.g. 7:08pm for one of mine) but *specific patches* are showing adjusted times (e.g. 12:08PDT for the same event on the same patch).

Re: [Gluster-devel] Proposed Protocol changes for 4.0: Need feedback.

2017-09-01 Thread Jeff Darcy
How about getting some of the things currently in xdata into the protocol instead? There are literally hundreds of instances of 'dict_set.*xdata' but here are some that stand out as likely candidates. * "gfid-req" (everywhere) * GLUSTERFS_WRITE_IS_APPEND (AFR, arbiter, posix) *

Re: [Gluster-devel] Regression logging on release-3.8-fb branch

2017-07-18 Thread Jeff Darcy
On Mon, Jul 17, 2017, at 11:25 PM, Jeff Darcy wrote: > Since the crashes are in brick daemons, I strongly suspect that the crashes > are actually from https://review.gluster.org/#/c/17750/ (which does touch > server code) rather than 17771 (which does not). I'll try to

Re: [Gluster-devel] Regression logging on release-3.8-fb branch

2017-07-17 Thread Jeff Darcy
On Mon, Jul 17, 2017, at 10:53 PM, Nigel Babu wrote: > >> On Mon, Jul 17, 2017, at 02:00 PM, Jeff Darcy wrote: >>> On Mon, Jul 17, 2017, at 09:28 AM, Nigel Babu wrote: >>>> It appears that something changed in https://review.gluster.org/#/c/17771. >>>>

Re: [Gluster-devel] Regression logging on release-3.8-fb branch

2017-07-17 Thread Jeff Darcy
On Mon, Jul 17, 2017, at 02:00 PM, Jeff Darcy wrote: > On Mon, Jul 17, 2017, at 09:28 AM, Nigel Babu wrote: >> It appears that something changed in https://review.gluster.org/#/c/17771. >> The tarball with the logs for that patch was 32G. That doesn't sound right. >>

Re: [Gluster-devel] Regression logging on release-3.8-fb branch

2017-07-17 Thread Jeff Darcy
On Mon, Jul 17, 2017, at 09:28 AM, Nigel Babu wrote: > It appears that something changed in https://review.gluster.org/#/c/17771. > The tarball with the logs for that patch was 32G. That doesn't sound right. > Is something writing to log excessively after this patch landed? I don't see

Re: [Gluster-devel] crash in tests/bugs/core/bug-1432542-mpx-restart-crash.t

2017-07-13 Thread Jeff Darcy
On Thu, Jul 13, 2017, at 01:10 PM, Pranith Kumar Karampuri wrote: > I just observed that > https://build.gluster.org/job/centos6-regression/5433/consoleFull failed > because of this .t failure. Maybe we need to upgrade the version of gdb on these machines, because it didn't seem able to get
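
For reference, a non-interactive way to pull a full backtrace out of a core file, assuming gdb is new enough to understand the binary (the paths are placeholders):

    gdb --batch \
        -ex "thread apply all bt full" \
        /build/install/sbin/glusterfsd /path/to/core > backtrace.txt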

[Gluster-devel] New coding standard

2017-06-29 Thread Jeff Darcy
Let's open the *big* can of worms. :) We have a coding standard in our source tree. It's incredibly outdated. It's missing many things we actually do enforce, either through that commit script we inappropriately borrowed from a very different project or via code reviews. It does contain many

Re: [Gluster-devel] brick multiplexing and memory consumption

2017-06-20 Thread Jeff Darcy
On Tue, Jun 20, 2017, at 03:38 PM, Raghavendra Talur wrote: > Each process takes 795MB of virtual memory and resident memory is > 10MB each. Wow, that's even better than I thought. I was seeing about a 3x difference per brick (plus the fixed cost of a brick process) during development. Your

Re: [Gluster-devel] brick multiplexing and memory consumption

2017-06-20 Thread Jeff Darcy
On Tue, Jun 20, 2017, at 08:45 AM, Raghavendra Talur wrote: > Here is the data I gathered while debugging the considerable increase > in memory consumption by brick process when brick multiplexing is on. > > before adding 14th brick to it: 3163 MB before > glusterfs_graph_init is

Re: [Gluster-devel] Gerrit submit type (was tests/bugs/gfapi/bug-1447266/bug-1447266.t)

2017-05-16 Thread Jeff Darcy
On Tue, May 16, 2017, at 07:06 AM, Raghavendra Talur wrote: > > Submit > > type only comes into play after the decision has already been made to > > enable/allow submission. > > Not true. I have looked into this last year when I sent out the > mail[1] asking fast-forward to be the submit type.

Re: [Gluster-devel] Gerrit submit type (was tests/bugs/gfapi/bug-1447266/bug-1447266.t)

2017-05-15 Thread Jeff Darcy
On Sun, May 14, 2017, at 11:39 PM, Nigel Babu wrote: > We use the "cherry-pick" submit type for glusterfs on Gerrit[1]. In the > past, > Poornima has pointed this out as well. I believe there was no interest in > changing the submit type[2] because the other submit types do not add > metadata to

Re: [Gluster-devel] tests/bugs/gfapi/bug-1447266/bug-1447266.t

2017-05-14 Thread Jeff Darcy
On Sat, May 13, 2017, at 12:24 PM, Atin Mukherjee wrote: > We'd need https://review.gluster.org/#/c/17177/ to be merged before > this test starts working. Even though https://review.gluster.org/17177 > was dependent on https://review.gluster.org/17216 , gerrit didn't > disallow this patch to be

Re: [Gluster-devel] lock_revocation.t is hanging in regression tests

2017-05-10 Thread Jeff Darcy
On Wed, May 10, 2017, at 06:30 AM, Raghavendra G wrote: > marking it bad won't help as even bad tests are run by build system > (and they might hang). This is news to me. Did something change to make it this way? If so, we should change it back. There's no point in having a way to mark tests

[Gluster-devel] lock_revocation.t is hanging in regression tests

2017-05-09 Thread Jeff Darcy
After seeing this hang with multiplexing enabled yesterday, I saw it hang a test for a completely unrelated patch (https://review.gluster.org/17200) without multiplexing. Does anyone object to disabling this test while it's debugged?

[Gluster-devel] Multiplexing test report

2017-05-08 Thread Jeff Darcy
The following tests failed this week with multiplexing enabled. ./tests/bugs/quota/bug-1035576.t (Wstat: 0 Tests: 20 Failed: 2) 00:14:45 not ok 18 , LINENUM:44 00:14:45 FAILED COMMAND: [ 0001 != 0001 ]

[Gluster-devel] Tests that fail with multiplexing turned on

2017-05-01 Thread Jeff Darcy
Since the vast majority of our tests run without multiplexing, I'm going to start running regular runs of all tests with multiplexing turned on. You can see the patch here: https://review.gluster.org/#/c/17145/ There are currently two tests that fail with multiplexing. Note that these are all

Re: [Gluster-devel] [RFC] Reducing maintenance burden and moving fuse support to an external project

2017-03-04 Thread Jeff Darcy
> At the moment we have three top-level interfaces to maintain in > Gluster, these are FUSE, Gluster/NFS and gfapi. If any work is needed > to support new options, FOPs or other functionalities, we mostly have > to do the work 3x. Often one of the interfaces gets forgotten. I think the picture's

Re: [Gluster-devel] IMP: Release 3.10: RC1 Pending bugs (Need fixes by 21st Feb)

2017-02-21 Thread Jeff Darcy
> > 2) Bug 1421590 - Bricks take up new ports upon volume restart after > > add-brick op with brick mux enabled > > - Status: *Atin/Samikshan/Jeff*, any update on this? > > - Can we document this as a known issue? What would be the way to > > get volume to use the older ports (a

Re: [Gluster-devel] 3.10 bugs

2017-02-17 Thread Jeff Darcy
> For bugs reported by users that do not test builds from the master > branch, we ideally follow these steps [0]: Please note that this is gluster-devel, not gluster-users. Everyone on -devel should be aware of the relationship between release branches and master. > - read the description of

[Gluster-devel] 3.10 bugs

2017-02-17 Thread Jeff Darcy
Please don't file bugs directly against 3.10 if they're also present in master. Since the fix has to go into master first anyway, the bug should be filed against that. Subsequently, the developer is responsible for cloning both the bug and the patch for 3.10 if a backport is needed.
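
A rough sketch of that backport flow, assuming a standard Gerrit setup (the commit SHA and bug IDs are placeholders):

    git fetch origin
    git checkout -b backport-3.10 origin/release-3.10
    git cherry-pick -x <master-commit-sha>     # -x records the original commit
    # edit the commit message to reference the cloned 3.10 bug, then:
    git push origin HEAD:refs/for/release-3.10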

Re: [Gluster-devel] Release 3.10 testing update: glusterfs-fuse RPM now depends on gfapi?

2017-02-16 Thread Jeff Darcy
> On an upgrade test, when upgrading clients, glusterfs-fuse RPM now > needed libgfapi, this is due to gf_attach being packaged as a part of > glusterfs-fuse. > > Jeff, we do not need to package gf_attach, right? This is a test tool, > if I am correct. It's *primarily* a test tool. It could

[Gluster-devel] Mixing style and other changes in a patch

2017-02-16 Thread Jeff Darcy
In the last few days, I've seen both of these kinds of review comments (not necessarily on my own patches or from the same reviewers). (a) "Please fix the style in the entire function where you changed one line." (b) "This style change should be in a separate patch." It's clearly not helpful

Re: [Gluster-devel] Logging in a multi-brick daemon

2017-02-16 Thread Jeff Darcy
> Debugging will involve getting far more/bigger files from customers > unless we have a script (?) to grep out only those messages pertaining > to the volume in question. IIUC, this would just be grepping for the > volname and then determining which brick each message pertains to > based on the
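
A hedged sketch of such a script: filter a shared multi-brick log down to one volume, assuming messages mention either the volume name or its brick paths (log path and volume name are placeholders):

    VOL=myvol
    LOG=/var/log/glusterfs/bricks/multiplexed-brick.log   # hypothetical path
    grep -E "(${VOL}|/bricks/${VOL})" "$LOG" > "${VOL}-only.log"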

Re: [Gluster-devel] Logging in a multi-brick daemon

2017-02-16 Thread Jeff Darcy
> What about the log levels? Each volume can configure different log > levels. Will you carve > out a separate process in case log levels are changed for a volume? I don't think we need to go that far, but you do raise a good point. Log levels would need to be fetched from a brick-specific

Re: [Gluster-devel] Logging in a multi-brick daemon

2017-02-16 Thread Jeff Darcy
> As for file descriptor count/memory usage, I think we should be okay > as it is not any worse than that in the non-multiplexed approach we > have today. I don't think that's true. Each logging context allocates a certain amount of memory. Let's call that X. With N bricks in separate

[Gluster-devel] Logging in a multi-brick daemon

2017-02-15 Thread Jeff Darcy
One of the issues that has come up with multiplexing is that all of the bricks in a process end up sharing a single log file. The reaction from both of the people who have mentioned this is that we should find a way to give each brick its own log even when they're in the same process, and make

Re: [Gluster-devel] Attackers hitting vulnerable HDFS installations

2017-02-10 Thread Jeff Darcy
> It is true the default glusterfs installation is too open. A simple > solution would be to introduce an access control, either by > IP whitelist, or better by shared secret. > > The obvious problem is that it breaks updates. At least peers > know each other and could agree on automatically creating

[Gluster-devel] Acknowledgements for brick multiplexing

2017-02-06 Thread Jeff Darcy
As many of you have probably noticed, brick multiplexing turned out to be a pretty massive effort. The patches might have my name on them, but it was really a team effort. I'd like to publicly thank some of the others who played a hand in getting it done. * Everyone who reviewed the

Re: [Gluster-devel] Postmortem for RPM build failures

2017-02-02 Thread Jeff Darcy
Thank you, Nigel. Post mortems like this can be uncomfortable, but they're how we as a team learn and improve. The good example is appreciated, as is all your hard work.

Re: [Gluster-devel] tests/bitrot/bug-1373520.t is failing multiple times

2017-01-27 Thread Jeff Darcy
> Few of the failure links: > > https://build.gluster.org/job/centos6-regression/2934/console > https://build.gluster.org/job/centos6-regression/2911/console Looks familiar. Fix (probably) here: https://review.gluster.org/#/c/14763/72/tests/bitrot/bug-1373520.t

[Gluster-devel] Multiplexing status, January 24

2017-01-24 Thread Jeff Darcy
Coming down to the wire here. Here's the latest. https://review.gluster.org/#/c/14763/ With the latest patchset (61!) I've addressed most of the review comments. I've also put a dent in the last functional area, which is full support for snapshots. Previously, snapshot bricks were

Re: [Gluster-devel] Priority based ping packet for 3.10

2017-01-19 Thread Jeff Darcy
> The more relevant question would be with TCP_KEEPALIVE and TCP_USER_TIMEOUT > on sockets, do we really need ping-pong framework in Clients? We might need > that in transport/rdma setups, but my question is concentrating on > transport/socket. In other words would like to hear why do we need

Re: [Gluster-devel] Request for Comments: Tests Clean Up Plan

2017-01-11 Thread Jeff Darcy
> To avoid a DHT problem ;) it may be better to take this sorted list and > assign tests in a cyclic fashion so that all chunks relatively take the > same amount of time to complete, than it being skewed due to the hash? I assume you don't really mean cyclic, because that would *guarantee* a

Re: [Gluster-devel] Request for Comments: Tests Clean Up Plan

2017-01-10 Thread Jeff Darcy
With regard to assigning files to chunks, I suggest we start by using an algorithm similar to that we use in DHT. hash=$(cat $filename | md5sum) # convert from hex to decimal? chunk=$((hash % number_of_chunks)) if [ x"$chunk" = x"$my_chunk_id" ]; then bash $filename # ...and so
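
A runnable version of that sketch (number_of_chunks and my_chunk_id are placeholders the harness would set); md5sum output is hex, so the first 32 bits are converted before taking the modulo:

    number_of_chunks=10
    my_chunk_id=3

    for filename in $(find tests -name '*.t' | sort); do
        hex=$(md5sum "$filename" | cut -c1-8)     # first 32 bits of the digest
        chunk=$(( 0x$hex % number_of_chunks ))    # hex -> decimal, then modulo
        if [ "$chunk" -eq "$my_chunk_id" ]; then
            bash "$filename"
        fi
    done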

[Gluster-devel] Tests with timing assumptions

2016-12-23 Thread Jeff Darcy
Just for "fun" I ran an experiment where I added a three-second sleep to the daemon startup code, to see which tests would fail because of timing assumptions. A three-second delay is something that could happen *at any time*, especially on a virtualized system (such as our test systems are)

Re: [Gluster-devel] Ability to skip regression jobs?

2016-12-20 Thread Jeff Darcy
> One way to reduce load is to figure out a way to award +1s to parent > patches when a dependent patch passes regression. It is recommended > that a change be split into as many smaller sets as possible. > Currently, if we send 4 patches that need to be serially merged, all 4 > have to pass

[Gluster-devel] Splitting patches, was Re: Ability to skip regression jobs?

2016-12-20 Thread Jeff Darcy
> One way to reduce load is to figure out a way to award +1s to parent > patches when a dependent patch passes regression. It is recommended > that a change be split into as many smaller sets as possible. > Currently, if we send 4 patches that need to be serially merged, all 4 > have to pass

Re: [Gluster-devel] static analysis updated

2016-12-19 Thread Jeff Darcy
> [xlators/experimental/fdl/src/dump-tmpl.c] -> > [xlators/experimental/fdl/src/dump-tmpl.c]: (error) syntax error > [xlators/experimental/fdl/src/recon-tmpl.c] -> > [xlators/experimental/fdl/src/recon-tmpl.c]: (error) syntax error > [xlators/experimental/jbr-client/src/fop-template.c] -> >

Re: [Gluster-devel] static analysis updated

2016-12-19 Thread Jeff Darcy
> Thank you Kaleb. Shall we start somewhere in terms of automation? > > The cppcheck results look the shortest[1]. If we can commit to fixing all of > them in the next 1 month, I can kick off a non-voting smoke job. We'll make > it > vote after 1 month. I guess the cli and experimental xlator (and
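
A possible shape for such a non-voting job, skipping the experimental xlators whose generated template sources cppcheck cannot parse (the directory list is illustrative):

    cppcheck --quiet --error-exitcode=1 \
             -i xlators/experimental \
             cli libglusterfs xlators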

[Gluster-devel] Multiplexing status, November 22

2016-11-23 Thread Jeff Darcy
The end of the long dark tunnel might be in sight. I've been working on snapshots, and have only two tests that fail. One's for XML output, so I'm not even sure I care. Performance issues are resolved to my satisfaction for now, though I'm sure there are still plenty of opportunities for

[Gluster-devel] Dead translators

2016-11-17 Thread Jeff Darcy
As the first part of the general cleanup and technical-debt-reduction process, I'd like to start nuking some of the unused translators. If any of the following are still useful and not broken, please speak up. They'll always be in our git history, but there seems to be little reason to keep

Re: [Gluster-devel] Community Meetings - Feedback on new meeting format

2016-11-17 Thread Jeff Darcy
> This has resulted in several good changes, > a. Meetings are now livelier with more people speaking up and > making themselves heard. > b. Each topic in the open floor gets a lot more time for discussion. > c. Developers are sending out weekly updates of the work they are doing, > and linking

[Gluster-devel] More multiplexing results

2016-11-03 Thread Jeff Darcy
I know y'all are probably getting tired of these updates, but developing out in the open and all that. Executive summary: the combination of disabling memory pools and using jemalloc makes multiplexing shine. You can skip forward to ***RESULTS*** if you're not interested in my tired

[Gluster-devel] Multiplexing status, 02 November 2016

2016-11-02 Thread Jeff Darcy
Still chasing performance/scalability issues. Two main findings: (a) The mem-pool patch[1] *is* strictly necessary to avoid serious performance degradation at higher brick counts with multiplexing. For example, on my normal test system there was little benefit at 80 bricks, but on the system

Re: [Gluster-devel] Custom Transport layers

2016-10-31 Thread Jeff Darcy
Another thought that just occurred to me: security. There's no broadcast/unicast equivalent of TLS, so you're not going to have that protection. Maybe it doesn't matter in some kinds of deployments, but in others it would matter very much. Also, a similarly-secure broadcast/multicast

Re: [Gluster-devel] Custom Transport layers

2016-10-29 Thread Jeff Darcy
> Hmmm, I never considered that side of things. I guess I had a somewhat > naive vision of packets floating through the ethernet visible to all > interfaces, but switched based networks are basically a star topology. > Are you saying the switch would likely be the choke point here? Not

Re: [Gluster-devel] Custom Transport layers

2016-10-28 Thread Jeff Darcy
> Is it possible to write custom transport layers for gluster? Data > transfer, not the management protocols. Pointers to the existing code > and/or docs :) would be helpful. Is it *possible*? Yes. Is it easy or well documented? Definitely no. The two transports we have - TCP/UNIX-domain

Re: [Gluster-devel] GlusterFS-3.9.0 - Delete or disable experimental features

2016-10-28 Thread Jeff Darcy
> We need your opinions on this as these are your components. > > 3.9 is still building and shipping experimental features. The packages > currently being built include these. We shouldn't be doing this. > > I have 2 changes under review [1] & [2], which disable and delete > these respectively. >

Re: [Gluster-devel] Multiplexing status, October 26

2016-10-26 Thread Jeff Darcy
Here are some of the numbers. Note that these are *without* multiplexing, which is where these changes are really most beneficial, because I wanted to measure the effect of this patch on its own. Also, perf-test.sh is a truly awful test. For one thing it's entirely single-threaded, which

[Gluster-devel] Multiplexing status, October 26

2016-10-26 Thread Jeff Darcy
Ahead of the community meeting (barely), here's where things are. I've been in meetings all week, so progress has been a bit slower than usual. When I've been able to do direct work at all, it has mostly been on testing and mostly for the prerequisite io-threads patch

[Gluster-devel] Memory-pool experiments

2016-10-13 Thread Jeff Darcy
I spent yesterday following up on one of the items from the Berlin summit: memory usage and allocation-related performance. Besides being a general concern, this is a very specific worry with multiplexing because some of our code in this area scales very poorly to the hundreds of threads per

Re: [Gluster-devel] [Gluster-infra] Migration complete

2016-09-26 Thread Jeff Darcy
> Michael and I are happy to announce that the migration is now complete. Thank you both for all of your hard work. :)

Re: [Gluster-devel] relative ordering of writes to same file from two different fds

2016-09-26 Thread Jeff Darcy
> Being compatible with the linux page cache will always be better practice, > because a lot of local applications already rely on its semantics. I don't think users even *know* how the page cache behaves. I don't think even its developers do, in the sense of being able to define

Re: [Gluster-devel] relative ordering of writes to same file from two different fds

2016-09-23 Thread Jeff Darcy
> > write-behind: implement causal ordering and other cleanup > > > Rules of causal ordering implemented: > > > - If request A arrives after the acknowledgement (to the app, > > > i.e., STACK_UNWIND) of another request B, then request B is > > > said to have 'caused' request A. > > > With

Re: [Gluster-devel] Fixing setfsuid/gid problems in posix xlator

2016-09-23 Thread Jeff Darcy
> Jiffin found an interesting problem in posix xlator where we have never been > using setfsuid/gid ( http://review.gluster.org/#/c/15545/ ). What I am > seeing after this is regressions: if the files are created as a non-root > user then the file creation fails because that user doesn't have

Re: [Gluster-devel] relative ordering of writes to same file from two different fds

2016-09-22 Thread Jeff Darcy
> I don't understand the Jeff snippet above - if they are > non-overlapping writes to dfferent offsets, this would never happen. The question is not whether it *would* happen, but whether it would be *allowed* to happen, and my point is that POSIX is often a poor guide. Sometimes it's

Re: [Gluster-devel] relative ordering of writes to same file from two different fds

2016-09-21 Thread Jeff Darcy
> However, my understanding is that filesystems need not maintain the relative > order of writes (as it received from vfs/kernel) on two different fds. Also, > if we have to maintain the order it might come with increased latency. The > increased latency can be because of having "newer" writes to

Re: [Gluster-devel] [Heketi] Mailing list

2016-09-20 Thread Jeff Darcy
> Hi gluster-devel, > At the Heketi project, we wanted to get better communication with the > GlusterFS community. We are a young project and didn't have our own > mailing list, so we asked if we could also be part of the gluster-devel mailing > list. The plan is to send Heketi-specific emails to

Re: [Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-20 Thread Jeff Darcy
> If I understood brick-multiplexing correctly, add-brick/remove-brick > add/remove graphs right? I don't think the graph-cleanup is in good > shape, i.e. it could lead to memory leaks etc. Did you get a chance > to think about it? I haven't tried to address memory leaks specifically, but most of

Re: [Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-20 Thread Jeff Darcy
> That's weird, since the only purpose of the mem-pool was precisely to > improve performance of allocation of objects that are frequently > allocated/released. Very true, and I've long been an advocate of this approach. Unfortunately, for this to work our allocator has to be more efficient than

Re: [Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-19 Thread Jeff Darcy
> I wonder if we are spending more time in io-threads. Does setting > idle-time in io-threads to 1 help with anything? > It might be useful to add instrumentation subsequently to dump > statistics (number of fops serviced, time spent in servicing) per > thread. Having the same visibility for our

Re: [Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-19 Thread Jeff Darcy
FWIW, I did some further experiments. Disabling mem-pool entirely (in favor of plain malloc/free) brought run time down to 3:35, vs. 2:57 for the exact same thing without multiplexing. Somehow we're still not managing contention very well at this kind of thread count, but the clues and

Re: [Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-19 Thread Jeff Darcy
> I would like to collaborate in investigating the memory-management, and > also bringing multiplexing to snapshots. For starters, will be going > through your patch(1400+ lines of change, that's one big ass patch :p) That's nothing. I've seen 7000-line patches go in, without even any evidence

[Gluster-devel] Multiplexing - good news, bad news, and a plea for help

2016-09-19 Thread Jeff Darcy
I have brick multiplexing[1] functional to the point that it passes all basic AFR, EC, and quota tests. There are still some issues with tiering, and I wouldn't consider snapshots functional at all, but it seemed like a good point to see how well it works. I ran some *very simple* tests with

Re: [Gluster-devel] Libunwind

2016-09-12 Thread Jeff Darcy
I looked into this a bit over the weekend. Unfortunately, while libunwind *does* get function names for calls in shared libraries, it *doesn't* seem to handle calls through function pointers very well. Imagine that original_func in your main program finds shared_lib_func via dlsym and calls

[Gluster-devel] Libunwind

2016-09-08 Thread Jeff Darcy
In a few places in our code (e.g. gf_log_callingfn) we use the "backtrace" and "backtrace_symbols" functions from libc to log stack traces. Unfortunately, these functions don't seem very smart about dynamically loaded libraries - such as translators, where most of our code lives. They give us

Re: [Gluster-devel] File snapshot design propsals

2016-09-08 Thread Jeff Darcy
> 1) Doing file snapshot using shards: (This is suggested by shyam, tried to > keep the text as is) > If a block for such a file is written to with a higher version then the brick > xlators can perform a block copy and then change the new block to the new > version, and let the older version be as

Re: [Gluster-devel] Is r.g.o down?

2016-09-01 Thread Jeff Darcy
> Not able to access it for the past 20 minutes. Looks down to me as well, and isup.me seems to agree.

[Gluster-devel] Multiplexing status, August 31

2016-08-31 Thread Jeff Darcy
(This is as much for my own reference as anything, but there seems to be a decent chance that others might be interested) The good news is that almost all of the tests in tests/basic, except for those related to problematic features (see below) pass now. Two of the AFR tests fail sporadically

Re: [Gluster-devel] [Gluster-users] CFP for Gluster Developer Summit

2016-08-22 Thread Jeff Darcy
Two proposals, both pretty developer-focused. (1) Gluster: The Ugly Parts Like any code base its size and age, Gluster has accumulated its share of dead, redundant, or simply inelegant code. This code makes us more vulnerable to bugs, and slows our entire development process for any feature.

[Gluster-devel] Brick multiplexing status

2016-08-19 Thread Jeff Darcy
For those who are interested, here's the current development status. The good news is that the current patch[1] works well enough for almost all of the basic tests and 22/32 of the basic/afr tests to run successfully. The exceptions have to do with specific features rather than base

Re: [Gluster-devel] Gluster Developer Summit Program Committee

2016-08-16 Thread Jeff Darcy
> As we get closer to the CfP wrapping up (August 31, per > http://www.gluster.org/pipermail/gluster-users/2016-August/028002.html ) - > we'll be looking for 3-4 people for the program committee to help arrange > the schedule. > Go ahead and respond here if you're interested, and I'll work to

Re: [Gluster-devel] NetBSD Regression Failures for 2 weeks

2016-08-09 Thread Jeff Darcy
> > *96* of *247* regressions failed > > That is huge. Agreed. I think there's an experiment we should do, which I've discussed with a couple of others: redefine EXPECT_WITHIN on NetBSD to double or triple the time given, and see if it makes a difference. Why? Because NetBSD (or perhaps the
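
One way the experiment could be wired up, assuming EXPECT_WITHIN takes the timeout as its first argument (a sketch, not a tested change):

    if [ "$(uname -s)" = "NetBSD" ]; then
        # copy the existing function under a new name, then wrap it with 3x the timeout
        eval "original_$(declare -f EXPECT_WITHIN)"
        EXPECT_WITHIN() {
            local timeout=$1; shift
            original_EXPECT_WITHIN $((timeout * 3)) "$@"
        }
    fi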

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-25 Thread Jeff Darcy
> As far as I know, there's no explicit guarantee on the order in which > fini is called, so we cannot rely on it to do cleanup because ec needs > that all its underlying xlators be fully functional to finish the cleanup. What kind of cleanup are we talking about here? We already need to handle

Re: [Gluster-devel] Reducing regression runs (hopefully)

2016-07-25 Thread Jeff Darcy
> My vision in this regard is something like this: > * A patchset gets Verified +1. > * A meta job is kicked off which determines regression jobs to run. > If the patch only touches GFAPI, we kick off the GFAPI regression tests. If > it touches multiple modules, we kick off the tests for these
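
A minimal sketch of that meta job, assuming a hypothetical trigger_job helper and treating anything outside api/ as needing the full run:

    changed=$(git diff --name-only HEAD~1)

    if echo "$changed" | grep -qv '^api/'; then
        trigger_job full-regression        # patch touches more than gfapi
    else
        trigger_job gfapi-regression       # patch only touches api/
    fi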

Re: [Gluster-devel] Reducing regression runs (hopefully)

2016-07-25 Thread Jeff Darcy
> I have a few proposals to reduce this turnaround time: > > 1. We do not clear the Verified tag. This means if you want to re-run > regressions you have to manually trigger it. If your patch is rebased on > top > of another patch, you may have to retrigger failing regressions manually.

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-22 Thread Jeff Darcy
> Based on what I saw in code, this seems to get the job done. Comments > welcome: > http://review.gluster.org/14988 Good thinking. Thanks, Pranith!

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-22 Thread Jeff Darcy
> > Excluding /build/install for no obvious reason at all > > This looks like it was done to remove the /build/install components from the > df -h outputs. Changing the path to /data/build/install broke this as it did > not strip the "/data" from the paths. > It did work when I changed the sed to

[Gluster-devel] Notifications (was Re: GF_PARENT_DOWN on SIGKILL)

2016-07-22 Thread Jeff Darcy
> I don't think we need any list traversal because notify sends it down > the graph. Good point. I think we need to change that, BTW. Relying on translators to propagate notifications has proven very fragile, as many of those events are overloaded to mean very different things to different

Re: [Gluster-devel] 3.7 regressions on NetBSD

2016-07-22 Thread Jeff Darcy
> I attempted to get us more space on NetBSD by creating a new partition called > /data and putting /build as a symlink to /data/build. This has caused > problems > with tests/basic/quota.t. It's marked as bad for master, but not for > release-3.7. This is possibly because we have a hard-coded

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Jeff Darcy
> Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So xavi > and I were wondering why cleanup_and_exit() is not sending GF_PARENT_DOWN > event. OK, then that grinding sound you hear is my brain shifting gears. ;) It seems that cleanup_and_exit will call xlator.fini in some

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Jeff Darcy
> Does anyone know why GF_PARENT_DOWN is not triggered on SIGKILL? It will give > a chance for xlators to do any cleanup they need to do. For example ec can > complete the delayed xattrops. Nothing is triggered on SIGKILL. SIGKILL is explicitly defined to terminate a process *immediately*.
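
A small illustration of the difference (no Gluster involved): a TERM handler gets a chance to clean up, while KILL terminates the process before any handler can run:

    worker() { trap 'echo "cleanup ran"; exit 0' TERM; sleep 300 & wait; }

    worker & pid=$!; sleep 1
    kill -TERM "$pid"; wait "$pid"   # prints "cleanup ran"

    worker & pid=$!; sleep 1
    kill -KILL "$pid"; wait "$pid"   # prints nothing; no cleanup is possible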

Re: [Gluster-devel] NetBSD machine job and machine changes

2016-07-20 Thread Jeff Darcy
> As you may know, the NetBSD machines have been having infra failures for > a while now. A lot of the failures were around disk space issues. Emmanuel > has > pointed out that our NetBSD machines have about 24 GB of unpartitioned space. > I've taken a few machines and partitioned that into f and

Re: [Gluster-devel] Progress on brick multiplexing

2016-07-15 Thread Jeff Darcy
> Just went through the commit message. I think similar to attaching if we also > have detaching, then we can simulate killing of bricks in afr using this > approach may be? Yes, that's pretty much the plan. Some work to add the new RPC and handler, a bit more to make the test libraries use it

[Gluster-devel] Progress on brick multiplexing

2016-07-15 Thread Jeff Darcy
For those who don't know, "brick multiplexing" is a term some of us have been using to mean running multiple brick "stacks" inside a single process with a single protocol/server instance. Discussion from a month or so ago is here:

Re: [Gluster-devel] Reducing merge conflicts

2016-07-14 Thread Jeff Darcy
> I absolutely hate what '-1' means though, it says 'I would prefer you > didn't submit this'. Somebody who doesn't know what he/she is doing still > goes ahead and sends his/her first patch and we say 'I would prefer you > didn't submit this'. It is like the tool is working against more >

Re: [Gluster-devel] Reducing merge conflicts

2016-07-14 Thread Jeff Darcy
> The feedback I got is, "it is not motivating to review patches that are > already merged by maintainer." I can totally understand that. I've been pretty active reviewing lately, and it's an *awfully* demotivating grind. On the other hand, it's also pretty demotivating to see one's own hard work

Re: [Gluster-devel] One client can effectively hang entire gluster array

2016-07-12 Thread Jeff Darcy
> > * We might be able to tweak io-threads (which already runs on the > > bricks and already has a global queue) to schedule requests in a > > fairer way across clients. Right now it executes them in the > > same order that they were read from the network. > > This sounds to be an easier fix. We

Re: [Gluster-devel] One client can effectively hang entire gluster array

2016-07-08 Thread Jeff Darcy
> In either of these situations, one glusterfsd process on whatever peer the > client is currently talking to will skyrocket to *nproc* cpu usage (800%, > 1600%) and the storage cluster is essentially useless; all other clients > will eventually try to read or write data to the overloaded peer

Re: [Gluster-devel] Reducing merge conflicts

2016-07-08 Thread Jeff Darcy
(combining replies to multiple people) Pranith: > I agree about encouraging specific kind of review. At the same time we need > to make reviewing, helping users in the community as important as sending > patches in the eyes of everyone. It is very important to know these > statistics to move in

Re: [Gluster-devel] Reducing merge conflicts

2016-07-07 Thread Jeff Darcy
> What gets measured gets managed. Exactly. Reviewing is part of everyone's job, but reviews aren't tracked in any way that matters. Contrast that with the *enormous* pressure most of us are under to get our own patches in, and it's pretty predictable what will happen. We need to change that
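
One low-tech way to start measuring it, assuming merged commits carry Reviewed-by trailers as Gerrit typically adds on submit: count them per reviewer over a window, run from a glusterfs checkout:

    git log --since="3 months ago" \
        | grep -E '^ +Reviewed-by:' | sort | uniq -c | sort -rn | head -20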

[Gluster-devel] Reducing merge conflicts

2016-07-07 Thread Jeff Darcy
I'm sure a lot of you are pretty frustrated with how long it can take to get even a trivial patch through our Gerrit/Jenkins pipeline. I know I am. Slow tests, spurious failures, and bikeshedding over style issues are all contributing factors. I'm not here to talk about those today. What I

[Gluster-devel] Securing GlusterD management

2016-07-06 Thread Jeff Darcy
As some of you might already have noticed, GlusterD has been notably insecure ever since it was written. Unlike our I/O path, which does check access control on each request, anyone who can craft a CLI RPC request and send it to GlusterD's well known TCP port can do anything that the CLI
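
Until real authentication exists, exposure can at least be narrowed at the network layer; a hedged example with placeholder peer addresses, restricting GlusterD's well-known management port (24007) to the trusted pool:

    iptables -A INPUT -p tcp --dport 24007 -s 192.0.2.10 -j ACCEPT
    iptables -A INPUT -p tcp --dport 24007 -s 192.0.2.11 -j ACCEPT
    iptables -A INPUT -p tcp --dport 24007 -j DROP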

Re: [Gluster-devel] Wrong assumptions about disperse

2016-06-17 Thread Jeff Darcy
> The math used by disperse, if tested alone outside gluster, is much > faster than it seems. AFAIK the real problem of EC is the communications > layer. It adds a lot of latency and having to communicate simultaneously > and coordinate 6 or more bricks has a big impact. Thanks for posting this,
