On 12/31 11:44, Jeremie Courreges-Anglas wrote: > On Sat, Dec 28 2019, Jeremie Courreges-Anglas <[email protected]> wrote: > > On Sat, Dec 28 2019, Jeremie Courreges-Anglas <[email protected]> wrote: > > > > [...] > > > >> I removed test_fiber.rb and am now running the tests again, trying to > >> sort out why I'm hitting resource exhaustion... > > > > Build restarted with DEBUG=-g. > > Back to this test_fiber.rb problem. Looks like sparc register windows > handling has been fragile for some time in ruby land: > > https://redmine.ruby-lang.org/issues/5244 > > The source code mentions a discussion with David Miller: > > https://marc.info/?l=linux-sparc&m=131914569320660&w=2 > > I have tried to use "ta 0x03" instead of "flushw [....%o7]" in > coroutine_flush_register_windows(), it didn't help. > > I couldn't find a core dump for test_fiber.rb, pointers for proper > debugging welcome. Instead, here's a simple reproducer.
After over 9 months of slacking and hoping someone else would fix it, I finally took the time to look into this. Both jca@ and kn@ offered sparc64 access, and that was instrumental in debugging this issue. Thanks kn@ for setting up my account and going out of his way so I can get access from my IPv4-only network. Between Ruby 2.6 and 2.7, the default coroutine implementation for Ruby changed. Before Ruby 2.7, the coroutine implementation was pretty much the same since Ruby 1.9. In Ruby 2.7, the design is similar and equally horrific. Fibers still switch by copying the entire stack to the heap and calling setjmp, and then overwriting the entire stack with the stack of the fiber you are switching to (which was previously copied to the heap), then calling longjmp after overwriting the stack. However, there were some implementation differences between the implementations causing the new implementation to not work. I was able to modify the new implementation enough so that it does work correctly on sparc64, passing test_fiber.rb, as long as the specific files related to fibers are compiled without optimization. With that, ruby was able to pass the bootstrap tests, but it quickly failed when getting to test-all with a SIGABRT. From the core dump, I was able to tell that was an issue I already fixed upstream earlier this year. With that patch also backported, now test-all runs for much longer on sparc64 before also dying with a SIGABRT. Unfortunately, no core dump for that one, so I'm not sure what might be causing it. Maybe it was over the core dump limit and I need some ulimit -c or login.conf coredumpsize magic? Anyway, I think this is probably enough of an improvement to make Ruby 2.7 usable on sparc64. However, since I'm way outside my area of expertise, I'd like to get an OK on this from someone who uses sparc64 and ideally other non-amd64 non-i386 arches. Here's a link to the fiber (copy coroutine) fix: https://github.com/jeremyevans/ruby/commit/7008ca3838d33bd2d594251fc8c7f4ce52dde72d Briefly, the changes made: 1) Don't flush register windows when saving the stack until right before copying the stack to the heap. This fixes local variables not being correct after the stack is restored. 2) Add a char[128] buffer to a couple functions. This is necessary (Ruby crashes without it), though I'm not sure why. 3) Make sure we copy the entire current stack frame before copying the stack to the heap by calling a function and returning the address of the local variable inside of it (this causes a compiler warning). This is necessary to ensure we don't have a partial copy of the current stack frame. The Ruby fiber maintainer has briefly reviewed these changes and is OK with them, it is likely they will be upstreamed if they don't cause problems on other arches. It would also be great to get some feedback on this on arches besides amd64, i386, and sparc64, if only to make sure that it doesn't make things worse. I would like to switch the default Ruby version in ports from 2.6 to 2.7 before the end of the year, but I want to be sure that won't break any of the arches we currently build packages for. Thanks, Jeremy Index: Makefile =================================================================== RCS file: /home/cvs/ports/lang/ruby/2.7/Makefile,v retrieving revision 1.5 diff -u -p -u -p -r1.5 Makefile --- Makefile 5 Oct 2020 17:13:13 -0000 1.5 +++ Makefile 21 Oct 2020 17:41:15 -0000 @@ -6,6 +6,13 @@ SHARED_LIBS = ruby27 0.0 NEXTVER = 2.8 PKGSPEC-main ?= ruby->=2.7.0,<${NEXTVER} +REVISION-main = 0 + +MASTER_SITES0 = https://github.com/jeremyevans/ruby/commit/ +PATCHFILES = ruby-2.7-copy-coro-fix{7008ca3838d33bd2d594251fc8c7f4ce52dde72d}.diff:0 \ + ruby-2.7-ffi-closure-fix{040cfc89b9a110cd0fb2abdcd35e8215b4a71f60}.diff:0 +PATCH_DIST_STRIP = -p1 + PSEUDO_FLAVORS= no_ri_docs bootstrap # Do not build the RI docs on slow arches .if ${MACHINE_ARCH:Malpha} || ${MACHINE_ARCH:Marm} || ${MACHINE_ARCH:Mhppa} Index: distinfo =================================================================== RCS file: /home/cvs/ports/lang/ruby/2.7/distinfo,v retrieving revision 1.3 diff -u -p -u -p -r1.3 distinfo --- distinfo 5 Oct 2020 17:13:13 -0000 1.3 +++ distinfo 21 Oct 2020 17:41:47 -0000 @@ -1,2 +1,6 @@ +SHA256 (ruby-2.7-copy-coro-fix.diff) = aZxV06Be48hCTvOY/vqWQJlPRepB/l+2K0GZIfMS9Sg= +SHA256 (ruby-2.7-ffi-closure-fix.diff) = sFczOlxQZTGSDkXpvB0chFg81V4gWJYQavTgt3tCq5s= SHA256 (ruby-2.7.2.tar.gz) = blcG0NTuTh4viD2512hYa00GVn3r6jU8eW7EXoMhw9Q= +SIZE (ruby-2.7-copy-coro-fix.diff) = 2498 +SIZE (ruby-2.7-ffi-closure-fix.diff) = 410 SIZE (ruby-2.7.2.tar.gz) = 16836767 Index: patches/patch-common_mk =================================================================== RCS file: /home/cvs/ports/lang/ruby/2.7/patches/patch-common_mk,v retrieving revision 1.1.1.1 diff -u -p -u -p -r1.1.1.1 patch-common_mk --- patches/patch-common_mk 2 Jan 2020 21:19:57 -0000 1.1.1.1 +++ patches/patch-common_mk 21 Oct 2020 17:02:22 -0000 @@ -4,6 +4,9 @@ Enable verbose mode when building. Don't regenerate rdoc documentation during install. +Force building coroutine code without optimization on sparc64 +to avoid crashes. + Index: common.mk --- common.mk.orig +++ common.mk @@ -34,3 +37,20 @@ Index: common.mk pre-install-doc:: install-prereq do-install-doc: $(PROGRAM) pre-install-doc $(INSTRUBY) --make="$(MAKE)" $(INSTRUBY_ARGS) --install=rdoc $(INSTALL_DOC_OPTS) +@@ -1522,6 +1522,16 @@ update-man-date: PHONY + -e 'BEGIN{@vcs=VCS.detect(ARGV.shift)}' \ + -e '$$_.sub!(/^(\.Dd ).*/){[email protected](ARGF.path).strftime("%B %d, %Y")}' \ + "$(srcdir)" "$(srcdir)"/man/*.1 ++ ++.if "sparc64" == ${MACHINE} ++cont.o: cont.c ++ @$(ECHO) compiling $< ++ $(Q) $(CC) $(CFLAGS) $(XCFLAGS) $(CPPFLAGS) -O0 $(COUTFLAG)$@ -c $< ++ ++coroutine/copy/Context.o: coroutine/copy/Context.c ++ @$(ECHO) compiling $< ++ $(Q) $(CC) $(CFLAGS) $(XCFLAGS) $(CPPFLAGS) -O0 $(COUTFLAG)$@ -c $< ++.endif + + HELP_EXTRA_TASKS = "" +
