Hello all. I'm new to the list but not to gluster.
We are using gluster to service NFS boot on a top500 cluster. It is a
Distributed-Replicate volume 3x9.
We are having a problem when one server in a subvolume goes down, we get
random missing files and split-brain errors in the nfs.log file.
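For anyone hitting the same symptoms, the first thing worth checking is
whether gluster itself sees split-brain or pending heals (the volume
name here is ours; substitute your own):

    gluster volume heal cm_shared info summary
    gluster volume heal cm_shared info split-brain

In our testing the nfs.log errors show up even when these look clean,
which is part of what makes the problem confusing.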
Thank you for replying!
> Okay so 0-cm_shared-replicate-1 means these 3 bricks:
>
> Brick4: 172.23.0.6:/data/brick_cm_shared
> Brick5: 172.23.0.7:/data/brick_cm_shared
> Brick6: 172.23.0.8:/data/brick_cm_shared
The above is correct.
> Were there any pending self-heals for this volume? Is it
> for glusterd, ctdb and vdo - as I need to
> 'put' dependencies for each of those.
>
> Now, I'm no longer using ctdb & NFS Ganesha (as my version of ctdb cannot use
> hostnames and my environment is a little bit crazy), but I can still provide
> hints on how I did it.
>
> Be
So, I have a solution I have written about in the past that is based on
gluster with CTDB for IP failover and a level of redundancy.
It's been working fine except for a few quirks I need to work out on
giant clusters when I get access.
I have a 3x9 gluster volume; the servers are also NFS servers, using gluster
I thought I'd share an update in case it helps others. Your ideas
inspired me to try a different approach.
We support 4 main distros (and 2 variants of some). We try not to
provide our own versions of distro-supported packages like CTDB where
possible. So a
On Tue, Nov 05, 2019 at 05:05:08AM +0200, Strahil wrote:
> Sure,
>
> Here is what was the setup :
Thank you! You're very kind to send me this. I will verify it with my
setup soon. Hoping to rid myself of these dep problems. Thank you !!!
Erik
questions on some rebalance errors, which I will send
in a separate email.
Erik
On Wed, Jan 29, 2020 at 06:20:34PM -0600, Erik Jacobson wrote:
> We are using gluster 4.1.6. We are using gluster NFS (not ganesha).
>
> Distributed/replicated with subvolume size 3 (6 total servers, 2
My question: Are the errors and anomalies below something I need to
investigate? Or should I not be worried?
I installed gluster 7.2 on a test cluster to run some tests, preparing
to see if we gain confidence to put this on the 5,120 node
supercomputer instead of gluster 4.1.6.
I started with
While it's still early, our testing is showing this issue fixed in
glusterfs 7.2 (we were at 4.1.6).
Closing the loop in case people search for this.
Erik
On Sun, Jan 26, 2020 at 12:04:00PM -0600, Erik Jacobson wrote:
> > One last reply to myself.
>
> One of the test cases my
> The gluster NFS log has this entry:
> [2020-01-25 19:07:33.085806] E [MSGID: 109040]
> [dht-helper.c:1388:dht_migration_complete_check_task] 0-cm_shared-dht:
> 19bd72f0-6863-4f1d-80dc-a426db8670b8: failed to lookup the file on
> cm_shared-dht [Stale file handle]
> [2020-01-25 19:07:33.085848]
> yes I know but I already tried that and failed at implementing it.
> I'm now even suspecting gluster to have some kind of bug.
>
> Could you show me how to do it correctly? Which services goes into after?
> Do have example unit files for mounting gluster volumes?
I have had some struggles
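For the archives, the sort of unit I would start from is a native mount
unit ordered after glusterd. This is a minimal sketch only, assuming a
fuse mount of a volume gv0 from localhost; ctdb and vdo would need their
own After= entries:

    # /etc/systemd/system/mnt-gv0.mount
    # (systemd requires the unit name to match the mount point)
    [Unit]
    Description=Gluster fuse mount of gv0
    After=network-online.target glusterd.service
    Wants=network-online.target

    [Mount]
    What=localhost:/gv0
    Where=/mnt/gv0
    Type=glusterfs
    Options=defaults,_netdev

    [Install]
    WantedBy=multi-user.target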
> One last reply to myself.
One of the test cases my test scripts triggered turned out to actually
be due to my NFS RW mount options.
OLD RW NFS mount options:
"rw,noatime,nocto,actimeo=3600,lookupcache=all,nolock,tcp,vers=3"
NEW options that work better:
"rw,noatime,nolock,tcp,vers=3"
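In fstab form (server address and mount point are placeholders, not our
real config) the working entry looks like:

    10.0.0.1:/cm_shared  /mnt/shared  nfs  rw,noatime,nolock,tcp,vers=3  0 0

The change is dropping nocto, actimeo=3600, and lookupcache=all, which
had relaxed close-to-open consistency and cached attributes and lookups
for up to an hour.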
I had
We are using gluster 4.1.6. We are using gluster NFS (not ganesha).
Distributed/replicated with subvolume size 3 (6 total servers, 2
subvols).
The NFS clients use this for their root filesystem.
When I add 3 more gluster servers to add one more subvolume to the
storage volumes (so now subvolume
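(For concreteness, the expansion is a standard add-brick plus rebalance;
the addresses and brick paths below are placeholders:)

    gluster volume add-brick cm_shared replica 3 \
        172.23.0.9:/data/brick_cm_shared \
        172.23.0.10:/data/brick_cm_shared \
        172.23.0.11:/data/brick_cm_shared
    gluster volume rebalance cm_shared start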
> looking through the last couple of week on this mailing list and reflecting
> our own experiences, I have to ask: what is the status of GlusterFS? So many
> people here reporting bugs and no solutions are in sight. GlusterFS clusters
> break left and right, reboots of a node have become a
0x5c701)
[0x7fa4fb1b8701] ) 0-cm_shared-replicate-0: Resetting event gen for
f2d7abf0-5444-48d6-863d-4b128502daf9
Thanks,
-Scott
On 4/8/20 8:31 AM, Erik Jacobson wrote:
> Hi team -
>
> We got an update to try more stuff from the community.
>
> I feel like I've been "given
/7.2/xlator/cluster/afr.so 0x5c701
afr_inode_event_gen_reset
afr-common.c:755
Thanks
-Scott
On Thu, Apr 09, 2020 at 11:38:04AM +0530, Ravishankar N wrote:
>
> On 08/04/20 9:55 pm, Erik Jacobson wrote:
> > 9439138:[2020-04-08 15:48:44.737590] E
> > [afr-common.c:754:afr_ino
I wanted to share some positive news with the group here.
Summary: Using sharding and squashfs image files instead of expanded
directory trees for RO NFS OS images has led to impressive boot times
on 2k-node diskless clusters using 12 servers for gluster+tftp+etc+etc.
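In case it helps, the shape of the setup is roughly this (a sketch; the
volume name matches ours but the shard size, paths, and squashfs options
are illustrative, not our exact recipe):

    gluster volume set cm_shared features.shard on
    gluster volume set cm_shared features.shard-block-size 64MB
    mksquashfs /scratch/rhel8-tree rhel8.squashfs -comp xz
    cp rhel8.squashfs /mnt/cm_shared/images/

As I understand it, the win is that clients pull pages out of one large
image file, whose shards are spread across the subvolumes, instead of
hammering the servers with metadata lookups on millions of small files.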
Details:
As you may have
<__FUNCTION__.20442> "afr_readdir_cbk",
unwind_to = 0x7fe63bb5dfbb "rda_fill_fd_cbk"}
On 4/15/20 8:14 AM, Erik Jacobson wrote:
> Scott - I was going to start with gluster74 since that is what he
> started at but it applies well to gluster72 so I'll start there.
>
strace: Process 30580 detached
> On 16/04/20 8:04 pm, Erik Jacobson wrote:
> > Quick update just on how this got set.
> >
> > gluster volume set cm_shared performance.parallel-readdir on
> >
> > Is something we did turn on, thinking it might make our NFS s
end-volume
volume cm_shared-utime
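(The generated gnfs volfile is where you can confirm which xlators
actually ended up in the graph, like the cm_shared-utime stanza above;
the path below assumes a default install:)

    grep -E 'volume |type ' /var/lib/glusterd/nfs/nfs-server.vol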
On Thu, Apr 16, 2020 at 06:58:01PM +0530, Ravishankar N wrote:
>
> On 16/04/20 6:54 pm, Erik Jacobson wrote:
> > > The patch by itself is only making changes specific to AFR, so it should
> > > not
> > > affect other translator
> The patch by itself is only making changes specific to AFR, so it should not
> affect other translators. But I wonder how readdir-ahead is enabled in your
> gnfs stack. All performance xlators are turned off in gnfs except
> write-behind and AFAIK, there is no way to enable them via the CLI. Did
a system to try it on at this time.
THANK YOU!
I may have access to the 57 node test system if there is something you'd
like me to try with regards to why glusterfs74 is unstable in this
situation. Just let me know.
Erik
On Thu, Apr 16, 2020 at 12:03:33PM -0500, Erik Jacobson wrote:
> So in my test r
fav_child_policy
$4 = AFR_FAV_CHILD_NONE
I am not sure what this signifies though. It appears to be a read
transaction with no event generation and no favorite child policy.
Feel free to ask for clarification in case my thought process went awry
somewhere.
Thanks,
-Scott
On Thu, Apr 02, 2020 at 02
> to get to the glitch you found with the 7.4 version, as with
> every higher version, we expect more stability!
>
> True, maybe we should start a separate thread...
>
> Regards,
> Ravi
>
> Regards,
> Amar
>
> On Fri, Apr 17, 2020 at 2:46 AM Erik Jacobs
THANK YOU for the hints. Very happy to have the help.
I'll reply to a couple things then dig in:
On Tue, Mar 31, 2020 at 03:27:59PM +0530, Ravishankar N wrote:
> From your reply in the other thread, I'm assuming that the file/gfid in
> question is not in genuine split-brain or needing heal. i.e.
suggested next steps?
>
> On 01/04/20 8:57 am, Erik Jacobson wrote:
> > Here are some back traces. They make my head hurt. Maybe you can suggest
> > something else to try next? In the morning I'll try to unwind this
> > myself too in the source code but I suspect
First, it's possible our analysis is off somewhere. I never get to your
print message. I put a debug statement at the start of the function so I
know we get there (just to verify my print statements were taking
effect).
I put a print statement at the if (call_count == 0) { call there, right
> (XID: 1fdba2bc,
READLINK: NFS: 5(I/O error), POSIX: 5(Input/output error)) target: (null)
I am missing something. I will see if Scott and I can work together
tomorrow. Happy for any more ideas, Thank you!!
On Sun, Apr 05, 2020 at 06:49:56PM -0500, Erik Jacobson wrote:
> First, it's possib
Hello all,
I am getting split-brain errors in the gnfs nfs.log when 1 gluster
server is down in a 3-brick/3-node gluster volume. It only happens under
intense load.
I reported this a few months ago but didn't have a repeatable test case.
Since then, we got reports from the field and I was able
I will do that and
continue digging.
Any suggestions would be greatly appreciated as I think I'm starting to
tip over here on this one.
On Mon, Mar 30, 2020 at 04:04:39PM -0500, Erik Jacobson wrote:
> > Sadly I am not a developer, so I can't answer your questions.
>
> I'm not
Thank you for replying!! Responses below...
I have attached the volume def (I meant to before).
I have attached a couple logs from one of the leaders.
> That's odd.
> As far as I know, the clients are accessing one of the gluster nodes
> that serves as NFS server and then syncs data across
Thank you so much for replying --
> > [2020-03-29 03:42:52.295532] E [MSGID: 108008]
> > [afr-read-txn.c:312:afr_read_txn_refresh_done] 0-cm_shared-replicate-0:
> > Failing ACCESS on gfid 8eed77d3-b4fa-4beb-a0e7-e46c2b71ffe1: split-brain
> > observed. [Input/output error]
> Since you say
> Hi Erik,
> Sadly I didn't have the time to take a look in your logs, but I would like to
> ask you whether you have statistics of the network bandwidth usage.
> Could it be possible that the gNFS server is starved for bandwidth and fails
> to reach all bricks leading to 'split-brain' errors
> Sadly I am not a developer, so I can't answer your questions.
I'm not a FS or network developer either. I think there is a joke about
playing one on TV, but maybe it's Netflix now.
Enabling certain debug options made too much information for me to watch
personally (but an expert could
very far away
from typical.
Erik
> Hmm, afr_inode_refresh_done() is called with error=0 and by the time we
> reach afr_txn_refresh_done(), it becomes 5(i.e. EIO).
> So afr_inode_refresh_done() is changing it to 5. Maybe you can put
> breakpoints/ log messages in afr_inode_refresh_done() at the places where
> error is getting
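(For anyone following along, the instrumentation being suggested is just
a log line at each place afr_inode_refresh_done() assigns error; a
debugging sketch only, not committed code. 'local' is the afr_local_t
from frame->local in that function:)

    /* debugging sketch: print each spot where 'error' changes */
    gf_msg (this->name, GF_LOG_INFO, 0, AFR_MSG_SPLIT_BRAIN,
            "refresh done: error=%d gfid=%s",
            error, uuid_utoa (local->inode->gfid));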
If you would like us to also run a test with gluster74 so that you can
say that's tested, we can run that test. I can do a special build.
THANK YOU!!
>
>
> -Ravi
>
>
> On 15/04/20 2:05 pm, Ravishankar N wrote:
>
>
> On 10/04/20 2:06 am, Erik Jacobson wrote:
>
> Once again thanks for
ne moves.
>
> If you would like us to also run a test with gluster74 so that you can
> say that's tested, we can run that test. I can do a special build.
>
> THANK YOU!!
>
> >
> >
> > -Ravi
> >
> >
> > On 15/04/20 2:05 pm
b03dc in gf_async (
> cbk=0x7fe640da8910 , xl=,
> async=0x7fe60c1738c8) at
> ../../../../libglusterfs/src/glusterfs/async.h:189
> #10 socket_event_poll_in (notify_handled=true, this=0x7fe63c066780)
> at socket.c:2642
> #11 socket_event_handler (fd=fd@entry=19, idx=idx@entry=10, gen=gen@entry=1,
data=data@entry=0x7fe63c066780, poll_in=<optimized out>,
poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000')
at socket.c:3040
#12 0x7fe647c84a5b in event_dispatch_epoll_handler (event=0x7fe617ffe014,
event_pool=0x563f5a98c750) at event-epoll.c:650
#13 event_dispatch
It is inconvenient for us to use MTU 9K for our gluster servers for
various reasons. We typically have bonded 10G interfaces.
We use distribute/replicate and gluster NFS for compute nodes.
My understanding is the negative to using 1500 MTU is just less
efficient use of the network. Are there
> On the other side, allowing jumbo frames and changing MTU on even hundreds of
> nodes is extremely simple,
>
> you can just test it. I don't see "bunch of extra work" here, just use ssh
> and some scripting or something like ansible...
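(To be fair, the quoted suggestion really is a one-liner if Ansible is
already in place; the interface name is a placeholder:)

    ansible all -m command -a 'ip link set dev bond0 mtu 9000'

plus whatever the distro needs to make it persistent.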
Our issue is we decided to simplify the configuration in our
Thank you !!!
We are going to try to run some experiments as well in the coming weeks.
Assuming I don't get re-routed, which often happens, I'll share if we
notice anything in our work load.
On Wed, May 06, 2020 at 07:41:56PM +0400, Dmitry Melekhov wrote:
>
> 06.05.2020 19:15, Erik Ja
> It is very hard to compare them because they are structurally very different.
> For example, GlusterFS performance will depend *a lot* on the underlying file
> system performance. Ceph eliminated that factor by using Bluestore.
> Ceph is very well performing for VM storage, since it's block
We never ran tests with Ceph mostly due to time constraints in
engineering. We also liked that, at least when I started as a novice,
gluster seemed easier to set up. We use the solution in automated
setup scripts for maintaining very large clusters. Simplicity in
automated setup is critical here
I agree with this assessment for the most part. I'll just add that,
during development of Gluster based solutions, we had internal use of
Redhat Gluster. This was over a year and a half ago when we started.
For my perhaps non-mainstream use cases, I found the latest versions of
gluster 7 actually
> For NVMe/SSD - raid controller is pointless , so JBOD makes most sense.
I am game for an education lesson here. We're still using spinning drives
with big RAID caches but we keep discussing SSD in the context of RAID. I
have read that for many real-world workloads, RAID0 makes no sense with
modern
Hello all. Thanks again for gluster. We're having a strange problem
getting virtual machines started that are hosted on a gluster volume.
One of the ways we use gluster now is to make a HA-ish cluster head
node. A virtual machine runs in the shared storage and is backed up by 3
physical servers
> > Shortly after the sharded volume is made, there are some fuse mount
> > messages. I'm not 100% sure if this was just before or during the
> > big qemu-img command to make the 5T image
> > (qemu-img create -f raw -o preallocation=falloc
> > /adminvm/images/adminvm.img 5T)
> Any reason to have a
> Are you sure that there are no heals pending at the time of the power up
I was watching heals when the problem was persisting and it was all
clear. This was a great suggestion though.
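(Concretely, what I was watching was the heal counters; the volume name
here is an assumption based on our mount point:)

    watch -n 5 'gluster volume heal adminvm info summary'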
> I checked my oVirt-based gluster and the only difference is:
> cluster.granular-entry-heal: enable
> The
I updated to 7.9, rebooted everything, and it started working.
I will have QE try to break it again and report back. I couldn't break
it but they're better at breaking things (which is hard to imagine :)
On Fri, Jan 29, 2021 at 01:11:50PM -0600, Erik Jacobson wrote:
> Thank you.
>
If anyone thinks I should try something
else I'm happy to re-build it!!! We are @ 7.2 plus afr-event-gen-changes
patch.
I will keep a better eye on the fuse log to tie an error to the problem
starting.
THANKS AGAIN for responding and let me know if you have any more
clues!
Erik
>
> On Tue, Jan
Thanks for all the kind responses,
Erik
>
> On Wed, Jan 27, 2021 at 5:28 PM Erik Jacobson wrote:
>
> > > Shortly after the sharded volume is made, there are some fuse mount
> > > messages. I'm not 100% sure if this was just before or during the
> > > big qemu-img command to m
state.
So something gets into a bad state and stays that way, but we don't know
how to cause it to happen at will. I will continue to try to reproduce
this as it's causing some huge problems in the field.
On Tue, Jan 26, 2021 at 07:40:19AM -0600, Erik Jacobson wrote:
> Thank you so m
Hello team -
First, I wish to state that I know we are supposed to move to Ganesha.
We had a lot of trouble with Ganesha in the past with our workload and
we still owe trying the very latest version and working with the
community. Some of our use cases are complicated and require very large
We think this fixed it. While there is random chance in there, we can't
repeat it in 7.9. So I'll close this thread out for now.
We'll ask for help again if needed. Thanks for all the kind responses,
Erik
On Fri, Jan 29, 2021 at 02:20:56PM -0600, Erik Jacobson wrote:
> I updated to
> I still have to grasp the "leader node" concept.
> Weren't gluster nodes "peers"? Or by "leader" you mean that it's
> mentioned in the fstab entry like
> /l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0
> while the peer list includes l1,l2,l3 and a bunch of other nodes?
Right, it's a list of 24
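(As an aside, the quoted fstab line is the fuse multi-server idea; the
documented spelling of it, with placeholder hosts, is:)

    l1:/gv0  /mnt/gv0  glusterfs  defaults,_netdev,backup-volfile-servers=l2:l3  0 0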
> - Gluster sizing
> * We typically state compute nodes per leader but this is not for
> gluster per-se. Squashfs image objects are very efficient and
> probably would be fine for 2k nodes per leader. Leader nodes provide
> other services including console logs, system logs, and
A while back I was asked to make a blog or something similar to discuss
the use cases of the team I work on (HPCM cluster management) at HPE.
If you are not interested in reading about what I'm up to, just delete
this and move on.
I really don't have a public blogging mechanism so I'll just
files on
nfs) method uses heavy caching; I believe the max was 8G.
I don't have a recipe, they've just always been beefy enough for
gluster. Sorry I don't have a more scientific answer.
On Mon, Mar 22, 2021 at 02:24:17PM +0100, Diego Zuccato wrote:
> Il 19/03/2021 16:03, Erik Jacobson ha scri
> > The stuff I work on doesn't use containers much (unlike a different
> > system also at HPE).
> By "pods" I meant "glusterd instance", a server hosting a collection of
> bricks.
Oh ok. The term is overloaded in my world.
> > I don't have a recipe, they've just always been beefy enough for
> >
simultaneously.
>
> Thank you for sharing your thoughts.
>
> Sincerely,
>
> Ewen Chan
>
On Tue, Sep 21, 2021 at 04:18:10PM +, Strahil Nikolov wrote:
> As far as I know a fix was introduced recently, so even missing to run the
> script won't be so critical - you can run it afterwards.
> I would use Ansible to roll out such updates on a set of nodes - this will
> prevent human
Hello all! I hope you are well.
We are starting a new software release cycle and I am trying to find a
way to upgrade customers from our build of gluster 7.9 to our build of
gluster 9.3
When we deploy gluster, we forcibly remove all references to any host
names and use only IP addresses. This is
[2021-09-20 15:50:41.731542 +0000]
So I will dig in to the code some here.
On Mon, Sep 20, 2021 at 10:59:30AM -0500, Erik Jacobson wrote:
> Hello all! I hope you are well.
>
> We are starting a new software release cycle and I am trying to find a
> way to upgrade customers from our build of gl
family = AF_INET;
/* TODO: gf_resolve is a blocking call. kick in some
non blocking dns techniques */
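For context, that TODO is the heart of it: the resolve is a plain
blocking lookup. In spirit it does something like this simplified sketch
(illustration only, not the actual gluster code):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netdb.h>
    #include <string.h>

    /* resolve a hostname to an IPv4 address, blocking until DNS
     * answers or times out -- the same hard-coded AF_INET as above */
    static int
    resolve_ipv4 (const char *host, struct in_addr *out)
    {
            struct addrinfo hints, *res = NULL;

            memset (&hints, 0, sizeof (hints));
            hints.ai_family = AF_INET;

            if (getaddrinfo (host, NULL, &hints, &res) != 0)
                    return -1;

            *out = ((struct sockaddr_in *) res->ai_addr)->sin_addr;
            freeaddrinfo (res);
            return 0;
    }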
On Mon, Sep 20, 2021 at 11:35:35AM -0500, Erik Jacobson wrote:
> I missed the other important log snip:
>
> The message "E [MSGID: 101075] [common-utils.c:520:gf_resolv
> Best Regards,
> Strahil Nikolov
>
>
> On Tue, Sep 21, 2021 at 0:46, Erik Jacobson
> wrote:
> I pretended I'm a low-level C programmer with network and filesystem
> experience for a few hours.
>
> I'm not sure what the right solution is but what was happening was
> It will work around the problem till it's solved.
>
> For RH you can check https://access.redhat.com/solutions/8709 (use RH dev
> subscription to read it, or ping me directly and I will try to summarize it
> for
> your OS version).
>
>
> Best Regards,
> Strahil Nikolov