[Gluster-users] split-brain errors under heavy load when one brick down

2019-09-16 Thread Erik Jacobson
Hello all. I'm new to the list but not to gluster. We are using gluster to service NFS boot on a top500 cluster. It is a 3x9 Distributed-Replicate volume. We are having a problem: when one server in a subvolume goes down, we get random missing files and split-brain errors in the nfs.log file. We
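For anyone hitting the same symptom, a quick way to tell genuine split-brain from merely pending heals (not part of the original post; the volume name cm_shared is taken from later messages in the thread):

    gluster volume heal cm_shared info
    gluster volume heal cm_shared info split-brain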

Re: [Gluster-users] split-brain errors under heavy load when one brick down

2019-09-18 Thread Erik Jacobson
Thank you for replying!
> Okay so 0-cm_shared-replicate-1 means these 3 bricks:
>
> Brick4: 172.23.0.6:/data/brick_cm_shared
> Brick5: 172.23.0.7:/data/brick_cm_shared
> Brick6: 172.23.0.8:/data/brick_cm_shared
The above is correct.
> Were there any pending self-heals for this volume? Is it

Re: [Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-04 Thread Erik Jacobson
for glusterd, ctdb and vdo - as I need to > 'put' dependencies for each of those. > > Now, I'm no longer using ctdb & NFS Ganesha (as my version of ctdb cannot use > hostnames and my environment is a little bit crazy), but I can still provide > hints on how I did it. > > Be

[Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-03 Thread Erik Jacobson
So, I have a solution I have written about in the past that is based on gluster with CTDB for IP failover and a level of redundancy. It's been working fine except for a few quirks I need to work out on giant clusters when I get access. I have a 3x9 gluster volume; each server is also an NFS server, using gluster
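A note on the hook-script angle: gluster ships a CTDB helper hook of this kind at /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh. A minimal custom post-start hook for a bind-mount setup could look roughly like the sketch below; the script name, volume name, and mount points are illustrative assumptions, not details taken from this thread.

    #!/bin/bash
    # Illustrative sketch only -- not the poster's actual script.
    # Installed as e.g. /var/lib/glusterd/hooks/1/start/post/S31ctdb-bindmount.sh
    # glusterd invokes start/post hooks with --volname=<volume>.
    VOL=""
    for arg in "$@"; do
        case "$arg" in
            --volname=*) VOL="${arg#--volname=}" ;;
        esac
    done
    # Only act when the (assumed) shared volume comes up.
    [ "$VOL" = "ctdb_shared" ] || exit 0
    mkdir -p /gluster/ctdb
    mount -t glusterfs localhost:/ctdb_shared /gluster/ctdb
    mount --bind /gluster/ctdb /var/lib/ctdb/shared   # bind-mount target is an assumption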

Re: [Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-09 Thread Erik Jacobson
> Here is what was the setup : I thought I'd share an update in case it helps others. Your ideas inspired me to try a different approach. We support 4 main distros (and 2 variants of some). We try not to provide our own versions of distro-supported packages like CTDB where possible. So a

Re: [Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-05 Thread Erik Jacobson
On Tue, Nov 05, 2019 at 05:05:08AM +0200, Strahil wrote: > Sure, > > Here is what was the setup : Thank you! You're very kind to send me this. I will verify it with my setup soon. Hoping to rid myself of these dependency problems. Thank you!!! Erik

Re: [Gluster-users] NFS clients show missing files while gluster volume rebalanced

2020-02-10 Thread Erik Jacobson
questions on some rebalance errors, which I will send in a separate email. Erik On Wed, Jan 29, 2020 at 06:20:34PM -0600, Erik Jacobson wrote: > We are using gluster 4.1.6. We are using gluster NFS (not ganesha). > > Distributed/replicated with subvolume size 3 (6 total servers, 2

[Gluster-users] question on rebalance errors gluster 7.2 (adding to distributed/replicated)

2020-02-10 Thread Erik Jacobson
My question: Are the errors and anomalies below something I need to investigate, or should I not be worried? I installed gluster 7.2 on a test cluster to run some tests, preparing to see if we gain confidence to put this on the 5,120-node supercomputer instead of gluster 4.1.6. I started with
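For context, the rebalance state and its per-node failure counts can be checked while it runs; a generic sketch, assuming the volume is named cm_shared as in the other threads:

    gluster volume rebalance cm_shared status
    # per-file errors also land in /var/log/glusterfs/cm_shared-rebalance.log on each server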

Re: [Gluster-users] gluster NFS hang observed mounting or umounting at scale

2020-02-13 Thread Erik Jacobson
While it's still early, our testing is showing this issue fixed in glusterfs 7.2 (we were at 4.1.6). Closing the loop in case people search for this. Erik On Sun, Jan 26, 2020 at 12:04:00PM -0600, Erik Jacobson wrote: > > One last reply to myself. > > One of the test cases my

Re: [Gluster-users] gluster NFS hang observed mounting or umounting at scale

2020-01-25 Thread Erik Jacobson
> The gluster NFS log has this entry:
> [2020-01-25 19:07:33.085806] E [MSGID: 109040] [dht-helper.c:1388:dht_migration_complete_check_task] 0-cm_shared-dht: 19bd72f0-6863-4f1d-80dc-a426db8670b8: failed to lookup the file on cm_shared-dht [Stale file handle]
> [2020-01-25 19:07:33.085848]

Re: [Gluster-users] No possible to mount a gluster volume via /etc/fstab?

2020-01-25 Thread Erik Jacobson
> yes I know but I already tried that and failed at implementing it. > I'm now even suspecting gluster to have some kind of bug. > > Could you show me how to do it correctly? Which services go into After=? > Do you have example unit files for mounting gluster volumes? I have had some struggles
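Not from this reply, but a generic fstab sketch that usually sidesteps the ordering problem; host and volume names are placeholders, and the backup-volfile-servers spelling varies slightly between gluster releases (older ones use backupvolfile-server):

    server1:/gv0  /mnt/gv0  glusterfs  defaults,_netdev,x-systemd.automount,backup-volfile-servers=server2:server3  0 0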

Re: [Gluster-users] gluster NFS hang observed mounting or umounting at scale

2020-01-25 Thread Erik Jacobson

Re: [Gluster-users] gluster NFS hang observed mounting or umounting at scale

2020-01-26 Thread Erik Jacobson
> One last reply to myself.
One of the test cases my test scripts triggered turned out to actually be due to my NFS RW mount options.
OLD RW NFS mount options: "rw,noatime,nocto,actimeo=3600,lookupcache=all,nolock,tcp,vers=3"
NEW options that work better: "rw,noatime,nolock,tcp,vers=3"
I had
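Written out as a plain fstab-style line, the working client mount from this message would look like the following; the server and mount-point names are placeholders:

    gnfs-server:/cm_shared  /mnt/cm_shared  nfs  rw,noatime,nolock,tcp,vers=3  0 0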

[Gluster-users] NFS clients show missing files while gluster volume rebalanced

2020-01-29 Thread Erik Jacobson
We are using gluster 4.1.6. We are using gluster NFS (not ganesha). Distributed/replicated with subvolume size 3 (6 total servers, 2 subvols). The NFS clients use this for their root filesystem. When I add 3 more gluster servers to add one more subvolume to the storage volumes (so now subvolume

Re: [Gluster-users] GlusterFS problems & alternatives

2020-02-11 Thread Erik Jacobson
> looking through the last couple of weeks on this mailing list and reflecting > our own experiences, I have to ask: what is the status of GlusterFS? So many > people here are reporting bugs and no solutions are in sight. GlusterFS clusters > break left and right, reboots of a node have become a

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-08 Thread Erik Jacobson
0x5c701) [0x7fa4fb1b8701] ) 0-cm_shared-replicate-0: Resetting event gen for f2d7abf0-5444-48d6-863d-4b128502daf9 Thanks, -Scott On 4/8/20 8:31 AM, Erik Jacobson wrote: > Hi team - > > We got an update to try more stuff from the community. > > I feel like I've been "given

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-09 Thread Erik Jacobson
/7.2/xlator/cluster/ afr.so 0x5c701 afr_inode_event_gen_reset afr-common.c:755 Thanks -Scott On Thu, Apr 09, 2020 at 11:38:04AM +0530, Ravishankar N wrote: > > On 08/04/20 9:55 pm, Erik Jacobson wrote: > > 9439138:[2020-04-08 15:48:44.737590] E > > [afr-common.c:754:afr_ino

[Gluster-users] Impressive boot times for big clusters: NFS, Image Objects, and Sharding

2020-04-08 Thread Erik Jacobson
I wanted to share some positive news with the group here. Summary: Using sharding and squashfs image files instead of expanded directory trees for RO NFS OS images has led to impressive boot times on 2k-node diskless clusters using 12 servers for gluster+tftp+etc+etc. Details: As you may have
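The sharding mentioned here is a per-volume option; a minimal sketch of enabling it, assuming the image volume is the cm_shared volume from the other threads and using the upstream default block size (the post does not state the values actually used):

    gluster volume set cm_shared features.shard on
    gluster volume set cm_shared features.shard-block-size 64MB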

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-15 Thread Erik Jacobson
<__FUNCTION__.20442> "afr_readdir_cbk", unwind_to = 0x7fe63bb5dfbb "rda_fill_fd_cbk"} On 4/15/20 8:14 AM, Erik Jacobson wrote: > Scott - I was going to start with gluster74 since that is what he > started at but it applies well to gluster72 so I'll start there.

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-16 Thread Erik Jacobson
hed strace: Process 30580 detached > On 16/04/20 8:04 pm, Erik Jacobson wrote: > > Quick update just on how this got set. > > > > gluster volume set cm_shared performance.parallel-readdir on > > > > Is something we did turn on, thinking it might make our NFS s

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-16 Thread Erik Jacobson
-volume volume cm_shared-utime On Thu, Apr 16, 2020 at 06:58:01PM +0530, Ravishankar N wrote: > > On 16/04/20 6:54 pm, Erik Jacobson wrote: > > > The patch by itself is only making changes specific to AFR, so it should > > > not > > > affect other translator

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-16 Thread Erik Jacobson
> The patch by itself is only making changes specific to AFR, so it should not > affect other translators. But I wonder how readdir-ahead is enabled in your > gnfs stack. All performance xlators are turned off in gnfs except > write-behind and AFAIK, there is no way to enable them via the CLI. Did
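A quick way to see what the CLI thinks the option is set to, and to return it to its default, is shown below (volume name from the thread):

    gluster volume get cm_shared performance.parallel-readdir
    gluster volume reset cm_shared performance.parallel-readdir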

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-16 Thread Erik Jacobson
a system to try it on at this time. THANK YOU! I may have access to the 57-node test system if there is something you'd like me to try with regard to why glusterfs74 is unstable in this situation. Just let me know. Erik On Thu, Apr 16, 2020 at 12:03:33PM -0500, Erik Jacobson wrote: > So in my test r

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-07 Thread Erik Jacobson
v_child_policy $4 = AFR_FAV_CHILD_NONE I am not sure what this signifies though.  It appears to be a read transaction with no event generation and no favorite child policy. Feel free to ask for clarification in case my thought process went awry somewhere. Thanks, -Scott On Thu, Apr 02, 2020 at 02

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-17 Thread Erik Jacobson
to get to the glitch you found with the 7.4 version, as > with > every higher version, we expect more stability! > > True, maybe we should start a separate thread... > > Regards, > Ravi > > Regards, > Amar > > On Fri, Apr 17, 2020 at 2:46 AM Erik Jacobs

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-03-31 Thread Erik Jacobson
THANK YOU for the hints. Very happy to have the help. I'll reply to a couple things then dig in: On Tue, Mar 31, 2020 at 03:27:59PM +0530, Ravishankar N wrote: > From your reply in the other thread, I'm assuming that the file/gfid in > question is not in genuine split-brain or needing heal. i.e.

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-04 Thread Erik Jacobson
suggested next steps? > > On 01/04/20 8:57 am, Erik Jacobson wrote: > > Here are some back traces. They make my head hurt. Maybe you can suggest > > something else to try next? In the morning I'll try to unwind this > > myself too in the source code but I suspect

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-05 Thread Erik Jacobson
First, it's possible our analysis is off somewhere. I never get to your print message. I put a debug statement at the start of the function so I know we get there (just to verify my print statements were taking effect). I put a print statement for the if (call_count == 0) { call there, right

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-05 Thread Erik Jacobson
> (XID: 1fdba2bc, READLINK: NFS: 5(I/O error), POSIX: 5(Input/output error)) target: (null) I am missing something. I will see if Scott and I can work together tomorrow. Happy for any more ideas, Thank you!! On Sun, Apr 05, 2020 at 06:49:56PM -0500, Erik Jacobson wrote: > First, it's possib

Re: [Gluster-users] Cann't mount NFS,please help!

2020-04-01 Thread Erik Jacobson

[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-03-28 Thread Erik Jacobson
Hello all, I am getting split-brain errors in the gnfs nfs.log when 1 gluster server is down in a 3-brick/3-node gluster volume. It only happens under intense load. I reported this a few months ago but didn't have a repeatable test case. Since then, we got reports from the field and I was able

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-03-31 Thread Erik Jacobson
t I will do that and continue digging. Any suggestions would be greatly appreciated as I think I'm starting to tip over here on this one. On Mon, Mar 30, 2020 at 04:04:39PM -0500, Erik Jacobson wrote: > > Sadly I am not a developer, so I can't answer your questions. > > I'm not

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-03-29 Thread Erik Jacobson
Thank you for replying!! Responses below... I have attached the volume def (meant to before). I have attached a couple of logs from one of the leaders. > That's odd. > As far as I know, the clients are accessing one of the gluster nodes > that serves as NFS server and then syncs data across

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-03-30 Thread Erik Jacobson
Thank you so much for replying --
> > [2020-03-29 03:42:52.295532] E [MSGID: 108008] [afr-read-txn.c:312:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: Failing ACCESS on gfid 8eed77d3-b4fa-4beb-a0e7-e46c2b71ffe1: split-brain observed. [Input/output error]
> Since you say

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-03-30 Thread Erik Jacobson
> Hi Erik, > Sadly I didn't have the time to take a look at your logs, but I would like to > ask you whether you have statistics of the network bandwidth usage. > Could it be possible that the gNFS server is starved for bandwidth and fails > to reach all bricks, leading to 'split-brain' errors

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-03-30 Thread Erik Jacobson
> Sadly I am not a developer, so I can't answer your questions. I'm not a FS or network developer either. I think there is a joke about playing one on TV but maybe it's Netflix now. Enabling certain debug options generated too much information for me to watch personally (but an expert could

Re: [Gluster-users] Reply: Re: Cann't mount NFS,please help!

2020-04-01 Thread Erik Jacobson
very far away from typical. Erik

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-02 Thread Erik Jacobson
> Hmm, afr_inode_refresh_done() is called with error=0 and by the time we > reach afr_txn_refresh_done(), it becomes 5 (i.e. EIO). > So afr_inode_refresh_done() is changing it to 5. Maybe you can put > breakpoints/log messages in afr_inode_refresh_done() at the places where > error is getting
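One way to act on that suggestion without rebuilding is to attach gdb to the running gNFS process and break in the function of interest; a rough sketch, assuming debuginfo is installed and the pgrep pattern matches exactly one glusterfs NFS process:

    gdb -p "$(pgrep -f 'glusterfs.*nfs')" \
        -ex 'break afr_inode_refresh_done' \
        -ex 'continue'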

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-15 Thread Erik Jacobson
a test with gluster74 so that you can say that's tested, we can run that test. I can do a special build. THANK YOU!! > > > -Ravi > > > On 15/04/20 2:05 pm, Ravishankar N wrote: > > > On 10/04/20 2:06 am, Erik Jacobson wrote: > > Once again thanks for

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-15 Thread Erik Jacobson
ne moves. > > If you would like us to also run a test with gluster74 so that you can > say that's tested, we can run that test. I can do a special build. > > THANK YOU!! > > > > > > > -Ravi > > > > > > On 15/04/20 2:05 pm

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-15 Thread Erik Jacobson
b03dc in gf_async ( > cbk=0x7fe640da8910 , xl=, > async=0x7fe60c1738c8) at > ../../../../libglusterfs/src/glusterfs/async.h:189 > #10 socket_event_poll_in (notify_handled=true, this=0x7fe63c066780) > at socket.c:2642 > #11 socket_event_handler (fd=fd@entry=19, idx=idx@e

Re: [Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request

2020-04-15 Thread Erik Jacobson
dx=idx@entry=10, gen=gen@entry=1, data=data@entry=0x7fe63c066780, poll_in=, poll_out=, poll_err=0, event_thread_died=0 '\000') at socket.c:3040 #12 0x7fe647c84a5b in event_dispatch_epoll_handler (event=0x7fe617ffe014, event_pool=0x563f5a98c750) at event-epoll.c:650 #13 event_dispatch

[Gluster-users] MTU 9000 question

2020-05-06 Thread Erik Jacobson
It is inconvenient for us to use a 9K MTU on our gluster servers for various reasons. We typically have bonded 10G interfaces. We use distribute/replicate and gluster NFS for compute nodes. My understanding is that the only downside of using a 1500 MTU is less efficient use of the network. Are there
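For anyone weighing the same trade-off, switching a bonded interface to jumbo frames and verifying the path end to end is short; the interface name and peer address below are placeholders:

    ip link set dev bond0 mtu 9000
    ping -M do -s 8972 172.23.0.6   # 9000 minus 28 bytes of IP+ICMP headers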

Re: [Gluster-users] MTU 9000 question

2020-05-06 Thread Erik Jacobson
> On the other side, allowing jumbo frames and changing the MTU on even hundreds of > nodes is extremely simple, > > you can just test it. I don't see a "bunch of extra work" here, just use ssh > and some scripting or something like ansible... Our issue is we decided to simplify the configuration in our

Re: [Gluster-users] MTU 9000 question

2020-05-06 Thread Erik Jacobson
Thank you!!! We are going to try to run some experiments as well in the coming weeks. Assuming I don't get re-routed, which often happens, I'll share if we notice anything in our workload. On Wed, May 06, 2020 at 07:41:56PM +0400, Dmitry Melekhov wrote: > > 06.05.2020 19:15, Erik Ja

Re: [Gluster-users] State of Gluster project

2020-06-17 Thread Erik Jacobson
> It is very hard to compare them because they are structurally very different. > For example, GlusterFS performance will depend *a lot* on the underlying file > system performance. Ceph eliminated that factor by using Bluestore. > Ceph performs very well for VM storage, since it's block

Re: [Gluster-users] State of Gluster project

2020-06-17 Thread Erik Jacobson
We never ran tests with Ceph mostly due to time constraints in engineering. We also liked that, at least when I started as a novice, gluster seemed easier to set up. We use the solution in automated setup scripts for maintaining very large clusters. Simplicity in automated setup is critical here

Re: [Gluster-users] State of Gluster project

2020-06-21 Thread Erik Jacobson
I agree with this assessment for the most part. I'll just add that, during development of Gluster-based solutions, we had internal use of Red Hat Gluster. This was over a year and a half ago when we started. For my perhaps non-mainstream use cases, I found the latest versions of gluster 7 actually

Re: [Gluster-users] State of Gluster project

2020-06-22 Thread Erik Jacobson
> For NVMe/SSD - raid controller is pointless, so JBOD makes most sense. I am game for an education lesson here. We're still using spinning drives with big RAID caches but we keep discussing SSD in the context of RAID. I have read that for many real-world workloads, RAID0 makes no sense with modern

[Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-25 Thread Erik Jacobson
Hello all. Thanks again for gluster. We're having a strange problem getting virtual machines started that are hosted on a gluster volume. One of the ways we use gluster now is to make an HA-ish cluster head node. A virtual machine runs in the shared storage and is backed by 3 physical servers
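For VM-image workloads like this, a common starting point is the predefined "virt" option group, which applies a set of VM-friendly settings (including sharding in recent releases) in one step; a sketch, with the volume name adminvm being an assumption based on the paths later in the thread:

    gluster volume set adminvm group virt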

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-27 Thread Erik Jacobson
> > Shortly after the sharded volume is made, there are some fuse mount messages. I'm not 100% sure if this was just before or during the big qemu-img command to make the 5T image (qemu-img create -f raw -o preallocation=falloc /adminvm/images/adminvm.img 5T)
> Any reason to have a

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-27 Thread Erik Jacobson
> Are you sure that there are no heals pending at the time of the power up? I was watching heals when the problem was persisting and it was all clear. This was a great suggestion though. > I checked my oVirt-based gluster and the only difference is: > cluster.granular-entry-heal: enable > The

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-29 Thread Erik Jacobson
I updated to 7.9, rebooted everything, and it started working. I will have QE try to break it again and report back. I couldn't break it but they're better at breaking things (which is hard to imagine :) On Fri, Jan 29, 2021 at 01:11:50PM -0600, Erik Jacobson wrote: > Thank you. >

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-26 Thread Erik Jacobson
ks I should try something else I'm happy to re-build it!!! We are @ 7.2 plus afr-event-gen-changes patch. I will keep a better eye on the fuse log to tie an error to the problem starting. THANKS AGAIN for responding and let me know if you have any more clues! Erik > > On Tue, Jan

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-27 Thread Erik Jacobson
nses, Erik > > On Wed, Jan 27, 2021 at 5:28 PM Erik Jacobson wrote: > > > > Shortly after the sharded volume is made, there are some fuse mount > > > messages. I'm not 100% sure if this was just before or during the > > > big qemu-img command to m

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-01-26 Thread Erik Jacobson
state. So something gets into a bad state and stays that way, but we don't know how to cause it to happen at will. I will continue to try to reproduce this as it's causing some huge problems in the field. On Tue, Jan 26, 2021 at 07:40:19AM -0600, Erik Jacobson wrote: > Thank you so m

[Gluster-users] gnfs exports netmask handling can incorrectly deny access to clients

2021-01-30 Thread Erik Jacobson
Hello team - First, I wish to state that I know we are supposed to move to Ganesha. We had a lot of trouble with Ganesha in the past with our workload, and we still need to try the very latest version and work with the community. Some of our use cases are complicated and require very large

Re: [Gluster-users] qemu raw image file - qemu and grub2 can't find boot content from VM

2021-02-01 Thread Erik Jacobson
We think this fixed it. While there is random chance in there, we can't repeat it in 7.9. So I'll close this thread out for now. We'll ask for help again if needed. Thanks for all the kind responses, Erik On Fri, Jan 29, 2021 at 02:20:56PM -0600, Erik Jacobson wrote: > I updated to

Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-23 Thread Erik Jacobson
> I still have to grasp the "leader node" concept. > Weren't gluster nodes "peers"? Or by "leader" you mean that it's > mentioned in the fstab entry like > /l1,l2,l3:gv0 /mnt/gv0 glusterfs defaults 0 0 > while the peer list includes l1,l2,l3 and a bunch of other nodes? Right, it's a list of 24

Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-19 Thread Erik Jacobson
> - Gluster sizing > * We typically state compute nodes per leader but this is not for > gluster per se. Squashfs image objects are very efficient and > probably would be fine for 2k nodes per leader. Leader nodes provide > other services including console logs, system logs, and

[Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-19 Thread Erik Jacobson
A while back I was asked to make a blog or something similar to discuss the use cases of the team I work on (HPCM cluster management) at HPE. If you are not interested in reading about what I'm up to, just delete this and move on. I really don't have a public blogging mechanism so I'll just

Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-22 Thread Erik Jacobson
files on nfs) method use heavy caching; I believe the max was 8G. I don't have a recipe, they've just always been beefy enough for gluster. Sorry I don't have a more scientific answer. On Mon, Mar 22, 2021 at 02:24:17PM +0100, Diego Zuccato wrote: > Il 19/03/2021 16:03, Erik Jacobson ha scri

Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-22 Thread Erik Jacobson
> > The stuff I work on doesn't use containers much (unlike a different > > system also at HPE). > By "pods" I meant "glusterd instance", a server hosting a collection of > bricks. Oh ok. The term is overloaded in my world. > > I don't have a recipe, they've just always been beefy enough for > >

Re: [Gluster-users] Gluster usage scenarios in HPC cluster management

2021-03-19 Thread Erik Jacobson
simultaneously. > > Thank you for sharing your thoughts. > > Sincerely, > > Ewen Chan

Re: [Gluster-users] gluster forcing IPV6 on our IPV4 servers, glusterd fails (was gluster update question regarding new DNS resolution requirement)

2021-09-21 Thread Erik Jacobson
On Tue, Sep 21, 2021 at 04:18:10PM +, Strahil Nikolov wrote: > As far as I know a fix was introduced recently, so even missing to run the > script won't be so critical - you can run it afterwards. > I would use Ansible to roll out such updates on a set of nodes - this will > prevent human

[Gluster-users] gluster update question regarding new DNS resolution requirement

2021-09-20 Thread Erik Jacobson
Hello all! I hope you are well. We are starting a new software release cycle and I am trying to find a way to upgrade customers from our build of gluster 7.9 to our build of gluster 9.3. When we deploy gluster, we forcibly remove all references to any host names and use only IP addresses. This is

Re: [Gluster-users] gluster update question regarding new DNS resolution requirement

2021-09-20 Thread Erik Jacobson
9-20 15:50:41.731542 +] So I will dig into the code some here. On Mon, Sep 20, 2021 at 10:59:30AM -0500, Erik Jacobson wrote: > Hello all! I hope you are well. > > We are starting a new software release cycle and I am trying to find a > way to upgrade customers from our build of gl

[Gluster-users] gluster forcing IPV6 on our IPV4 servers, glusterd fails (was gluster update question regarding new DNS resolution requirement)

2021-09-20 Thread Erik Jacobson
family = AF_INET; /* TODO: gf_resolve is a blocking call. kick in some non blocking dns techniques */ On Mon, Sep 20, 2021 at 11:35:35AM -0500, Erik Jacobson wrote: > I missed the other important log snip: > > The message "E [MSGID: 101075] [common-utils.c:520:gf_resolv
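Aside from the code-level fix being discussed, the usual configuration-level way to pin glusterd to IPv4 is the address-family option in glusterd's own volfile; a sketch, assuming the stock file location:

    # in /etc/glusterfs/glusterd.vol, inside the existing "volume management" block:
    option transport.address-family inet
    # then restart glusterd, e.g.:
    systemctl restart glusterd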

Re: [Gluster-users] gluster forcing IPV6 on our IPV4 servers, glusterd fails (was gluster update question regarding new DNS resolution requirement)

2021-09-21 Thread Erik Jacobson
rds, > Strahil Nikolov > > > On Tue, Sep 21, 2021 at 0:46, Erik Jacobson > wrote: > I pretended I'm a low-level C programmer with network and filesystem > experience for a few hours. > > I'm not sure what the right solution is but what was happening was

Re: [Gluster-users] gluster update question regarding new DNS resolution requirement

2021-09-21 Thread Erik Jacobson
t will work around the problem till it's solved. > > For RH you can check https://access.redhat.com/solutions/8709 (use a RH dev > subscription to read it, or ping me directly and I will try to summarize it for > your OS version). > > > Best Regards, > Strahil Nikolov