[Chicken-hackers] [PATCH] Change setjmp/longjmp to _setjmp/_longjmp to avoid overhead and fix signal masking bug
Hi hackers,

While looking over a system call trace while debugging another issue, I noticed a LOT of system calls to __sigprocmask14 (on NetBSD). It turns out these are generated by setjmp() and longjmp(), so each minor GC makes two superfluous system calls. According to POSIX, it is actually /unspecified/ whether those functions affect the signal mask:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html

This setting of the signal mask is probably unintended, and it may cause bugs when the mask is set, since the GC might reset any masked signals. The following program prints "signaling" and "GOT SIGINT" with Chicken master on NetBSD (using csi):

  (use posix)

  (set-signal-handler! signal/int (lambda (sig) (print "GOT SIGINT")))
  (signal-mask! signal/int)

  (let lp ((i 0))
    (if (= i 1000)
        (begin (print "signaling")
               (process-signal (current-process-id) signal/int)
               (lp 0))
        (lp (add1 i))))

With the attached patch it only prints "signaling", which is the same behavior this program shows on Linux with both versions.

I also made a small test, but I wasn't sure whether the GC is guaranteed to be triggered, so it might not be a very valid test (the GC isn't triggered when you compile it, but for me it is in csi). This test is attached. If it's okay, feel free to add it to the testsuite.

So, to solve this, it would be more correct to use sigsetjmp/siglongjmp, which state explicitly whether the mask should be saved. However, according to the GNULib manual
http://www.gnu.org/software/gnulib/manual/gnulib.html#sigsetjmp
this function isn't supported by MingW, MSVC and Minix. That's why the attached patch uses _setjmp and _longjmp. Those are in POSIX, and pretty widely supported (but they're slated to be *possibly* removed in future POSIX versions). According to the aforementioned GNULib manual, _setjmp/_longjmp are the fastest way to save and restore the registers without the signal mask.
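For reference, the difference between saving and not saving the mask can be sketched in C with the POSIX sigsetjmp/siglongjmp pair mentioned above. This is only an illustration, not code from Chicken: the helper names sigint_blocked and mask_survives_jump are made up here, and SIGINT is blocked after the jump point has been recorded, much as the Scheme program above masks signal/int while the runtime keeps jumping.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if SIGINT is currently blocked, else 0. */
static int sigint_blocked(void)
{
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);   /* NULL set: query only */
    return sigismember(&cur, SIGINT);
}

/*
 * Hypothetical helper: record a jump point, block SIGINT, jump back, and
 * report whether SIGINT is still blocked afterwards.
 *   savemask != 0: sigsetjmp saves the mask and siglongjmp restores it.
 *   savemask == 0: the mask is left untouched by the jump.
 */
static int mask_survives_jump(int savemask)
{
    sigjmp_buf env;
    sigset_t block, old;
    int result;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    /* Start from a known state: SIGINT unblocked, caller's mask kept in old. */
    sigprocmask(SIG_UNBLOCK, &block, &old);

    if (sigsetjmp(env, savemask) == 0) {
        /* Comparable to (signal-mask! signal/int) in the Scheme program. */
        sigprocmask(SIG_BLOCK, &block, NULL);
        siglongjmp(env, 1);
    }

    result = sigint_blocked();
    sigprocmask(SIG_SETMASK, &old, NULL); /* put the caller's mask back */
    return result;
}
```

Calling mask_survives_jump(0) should report 1 (SIGINT is still blocked after the jump), while mask_survives_jump(1) should report 0, because siglongjmp put the earlier mask back. A setjmp/longjmp pair that saves the mask behaves like the second case, which is how a masked signal can end up unmasked again.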
I've also attached two outputs from the resurrected Chicken benchmarks by Mario:
https://github.com/mario-goulart/chicken-benchmarks

These clearly show about a 5% speed improvement across the board (you can see a clear comparison by using the compare.scm program from that repo). I compiled Chicken with itself before running this benchmark. This may have an effect because, according to Mario, the compilation time is included in the measurements.

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music.
  -- Donald Knuth

From c21fd1ae69cdae80c563753d56c66480eb5fe2a3 Mon Sep 17 00:00:00 2001
From: Peter Bex <peter@xs4all.nl>
Date: Wed, 13 Jun 2012 22:54:26 +0200
Subject: [PATCH] Use _setjmp/_longjmp instead of setjmp/longjmp to prevent
 unnecessary system call overhead for saving/restoring the signal mask on
 systems where this is done

---
 chicken.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/chicken.h b/chicken.h
index 837a51c..31960b0 100644
--- a/chicken.h
+++ b/chicken.h
@@ -874,8 +874,8 @@ DECL_C_PROC_p0 (128, 1,0,0,0,0,0,0,0)
 # define C_gettimeofday             gettimeofday
 # define C_gmtime                   gmtime
 # define C_localtime                localtime
-# define C_setjmp                   setjmp
-# define C_longjmp                  longjmp
+# define C_setjmp                   _setjmp
+# define C_longjmp                  _longjmp
 # define C_alloca                   alloca
 # define C_strerror                 strerror
 # define C_isalpha                  isalpha
-- 
1.7.9.1

((repetitions . 10) (installation-prefix . /home/sjamaan/chicken-test) (csc-options . ) (results (triangl . 10.951) (travinit . 0.907) (traverse . 3.569) (takr . 7.649) (takl . 2.825) (tak . 4.519) (sort1 . 41.279) (slatex . 10.483) (scheme . 0.724) (sboyer . 410.743) (puzzle . 0.682) (psyntax . 22.867) (nucleic2 . 50.316) (nqueens . 0.77) (nfa . 33.036) (nestedloop . 45.192) (nboyer . 177.593) (nbody .
49.134) (mazefun . 42.822) (maze . 2.124) (lattice . 118.835) (kanren . 62.096) (hanoi . 6.023) (graphs . 15.114) (fread . 17.907) (fprint . 2.769) (fibc . 18.013) (fib . 2.034) (fft . 0.501) (earley . 0.62) (dynamic . 2.404) (div-rec . 1.031) (div-iter . 0.339) (destructive . 1.219) (deriv . 10.01) (dderiv . 9.12) (ctak . 2.708) (cpstak . 6.139) (conform . 1.853) (browse . 1.85) (boyer . 1.446) (binarytrees . 1.224) (0 . 0))) ((repetitions . 10) (installation-prefix . /home/sjamaan/chicken-master) (csc-options . ) (results (triangl . 10.715) (travinit . 0.911) (traverse . 3.59) (takr . 7.663) (takl . 2.733) (tak . 4.562) (sort1 . 42.108) (slatex .
Re: [Chicken-hackers] [PATCH] Change setjmp/longjmp to _setjmp/_longjmp to avoid overhead and fix signal masking bug
Hi,

On Wed, 13 Jun 2012 23:22:02 +0200 Peter Bex <peter@xs4all.nl> wrote:

> I've also attached two outputs from the resurrected Chicken benchmarks
> by Mario: https://github.com/mario-goulart/chicken-benchmarks
>
> These clearly show about a 5% speed improvement across the board.
> (you can see a clear comparison by using the compare.scm program from
> that repo). I compiled Chicken with itself before running this
> benchmark. This may have an effect because according to Mario the
> compilation time is included in the measurements.

Just a small clarification: the timings in the log files don't take into account the compilation time. Those numbers are just for the execution time.

The message at the end of the execution of run.scm (Total run time: ...) does take compilation time into account (sorry, Peter -- I thought you were talking about that).

So, the normalized results you see when you feed compare.scm the log files consider only execution time of the benchmarked programs.

Sorry for the confusion.

Best wishes.
Mario
-- 
http://parenteses.org/mario

___
Chicken-hackers mailing list
Chicken-hackers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
Re: [Chicken-hackers] [PATCH] Change setjmp/longjmp to _setjmp/_longjmp to avoid overhead and fix signal masking bug
Hi Peter,

On Wed, 13 Jun 2012 23:22:02 +0200 Peter Bex <peter@xs4all.nl> wrote:

> I've also attached two outputs from the resurrected Chicken benchmarks
> by Mario: https://github.com/mario-goulart/chicken-benchmarks
>
> These clearly show about a 5% speed improvement across the board.
> (you can see a clear comparison by using the compare.scm program from
> that repo). I compiled Chicken with itself before running this
> benchmark. This may have an effect because according to Mario the
> compilation time is included in the measurements.

Unfortunately I could not observe any improvement on Linux x86. The performance is actually slightly worse with your patch (using master, c48a109d668f3186bb4a213940c0b0b81a1ad03d).

I ran the benchmarks with no csc options and with -O3. Here are the results ([2] is the chicken with the _setjmp/_longjmp patch):

+---[1]:
|- installation-prefix: /home/mario/local/chicken-2012-06-13
|- csc-options:
|- repetitions: 10

+---[2]:
|- installation-prefix: /home/mario/local/chicken-2012-06-13-longjmp
|- csc-options:
|- repetitions: 10

Displaying normalized results (larger numbers indicate better results)

Programs       [1]    [2]
0              1.00   1.00
binarytrees    1.03   1.00
boyer          1.01   1.00
browse         1.00   1.00
conform        1.00   1.00
cpstak         1.01   1.00
ctak           1.00   1.00
dderiv         1.00   1.00
deriv          1.00   1.00
destructive    1.02   1.00
div-iter       1.00   1.00
div-rec        1.00   1.00
dynamic        1.00   1.00
earley         1.06   1.00
fft            1.03   1.00
fib            1.03   1.00
fibc           1.00   1.00
fprint         1.01   1.00
fread          1.01   1.00
graphs         1.00   1.00
hanoi          1.00   1.00
kanren         1.00   1.00
lattice        1.04   1.00
maze           1.01   1.00
mazefun        1.00   1.00
nbody          1.00   1.00
nboyer         1.01   1.00
nestedloop     1.01   1.00
nfa            1.03   1.00
nqueens        1.00   1.00
nucleic2       1.03   1.00
paraffins      1.00   1.00
peval          1.00   1.00
psyntax        1.00   1.00
puzzle         1.00   1.00
sboyer         1.01   1.00
scheme         1.05   1.00
slatex         1.05   1.00
sort1          1.00   1.00
tak            1.03   1.00
takl           1.01   1.00
takr           1.03   1.00
traverse       1.00   1.00
travinit       1.03   1.00
triangl        1.03   1.00

+---[1]:
|- installation-prefix: /home/mario/local/chicken-2012-06-13
|- csc-options: -O3
|- repetitions: 10

+---[2]:
|- installation-prefix: /home/mario/local/chicken-2012-06-13-longjmp
|- csc-options: -O3
|- repetitions: 10

Displaying normalized results (larger numbers indicate better results)

Programs       [1]    [2]
0              1.00   1.00
binarytrees    1.00   1.00
boyer          1.02   1.00
browse         1.00   1.00
conform        1.01   1.00
cpstak         1.00   1.01
ctak           1.00   1.00
dderiv         1.01   1.00
deriv          1.00   1.00
destructive    1.04   1.00
div-iter       1.03   1.00
div-rec        1.00   1.01
dynamic        1.00   1.00
earley         1.03   1.00
fft            1.04   1.00
fib            1.00   1.00
fibc           1.00   1.00
fprint         1.00   1.01
fread          1.01   1.00
graphs         1.00   1.00
hanoi          1.00   1.00
kanren         1.00   1.00
lattice        1.00   1.00
maze           1.00   1.01
mazefun        1.00   1.00
nbody          1.03   1.00
nboyer         1.02   1.00
nestedloop     1.02   1.00
nfa            1.01   1.00
nqueens        1.05   1.00
nucleic2       1.00   1.00
paraffins      1.00   1.00
peval          1.00   1.00
psyntax        1.00   1.00
puzzle         1.02   1.00
sboyer         1.00   1.00
scheme         1.09   1.00
Re: [Chicken-hackers] [PATCH] Change setjmp/longjmp to _setjmp/_longjmp to avoid overhead and fix signal masking bug
On Jun 13, 2012, at 7:28 PM, Mario Domenech Goulart wrote:

> Unfortunately I could not observe any improvement on Linux x86. The
> performance is actually slightly worse with your patch (using master,
> c48a109d668f3186bb4a213940c0b0b81a1ad03d).
>
> I run the benchmarks with no csc options and with -O3. Here are the
> results ([2] is the chicken with the _setjmp/_longjmp patch):

There won't be any improvement on Linux because _setjmp == setjmp; Linux doesn't save the signal mask on setjmp() unless the obscure __FAVOR_BSD is #defined.

The performance regression you observed could just be statistical noise as well -- but sometimes gcc will inline known calls, and it might do that for setjmp and not _setjmp, even though setjmp is just a macro alias for _setjmp. The only way to be sure is to look at the assembly output.

Other than figuring that out, it would be a good idea to test on mingw and OS X (I was going to do this). However, testing on other platforms like cygwin or Solaris (or more obscure ones?) is problematic. It is not really a question of whether _setjmp works but whether every platform supports _setjmp.

I don't know if this is something to throw in before the release, if one is coming soon, unless we are going to test every supported platform before release. Anyone else?

Jim
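Whether a given platform's plain setjmp/longjmp touch the signal mask can also be checked at run time rather than read off the headers. The following is a rough probe, assuming a POSIX system that declares _setjmp/_longjmp; the DEFINE_MASK_PROBE macro and the two generated function names are invented for this sketch and appear nowhere in Chicken.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical probe generator. Each generated function records a jump
 * point with SIGINT unblocked, blocks SIGINT, jumps back, and returns 1
 * if the jump silently restored the old mask (SIGINT unblocked again),
 * or 0 if the mask was left alone. setjmp has to be invoked directly,
 * so a macro is used instead of a function pointer.
 */
#define DEFINE_MASK_PROBE(name, SETJMP, LONGJMP)                  \
    static int name(void)                                         \
    {                                                             \
        jmp_buf env;                                              \
        sigset_t block, old, cur;                                 \
        int restored;                                             \
                                                                  \
        sigemptyset(&block);                                      \
        sigaddset(&block, SIGINT);                                \
        sigprocmask(SIG_UNBLOCK, &block, &old);                   \
        if (SETJMP(env) == 0) {                                   \
            sigprocmask(SIG_BLOCK, &block, NULL);                 \
            LONGJMP(env, 1);                                      \
        }                                                         \
        sigprocmask(SIG_BLOCK, NULL, &cur); /* query only */      \
        restored = !sigismember(&cur, SIGINT);                    \
        sigprocmask(SIG_SETMASK, &old, NULL);                     \
        return restored;                                          \
    }

/* Plain variants: POSIX leaves the mask behaviour unspecified. */
DEFINE_MASK_PROBE(setjmp_restores_mask, setjmp, longjmp)

/* Underscore variants: POSIX says these shall not manipulate the mask. */
DEFINE_MASK_PROBE(underscore_setjmp_restores_mask, _setjmp, _longjmp)
```

On a Linux/glibc system both probes would be expected to return 0, in line with the explanation above; on a system whose setjmp saves the mask, setjmp_restores_mask() would return 1 while the underscore variant still returns 0.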
Re: [Chicken-hackers] [PATCH] Change setjmp/longjmp to _setjmp/_longjmp to avoid overhead and fix signal masking bug
Jim Ursetto scripsit:

> Other than figuring that out, it would be a good idea to test on mingw
> and OS X (I was going to do this). However testing on other platforms
> like cygwin or Solaris (or more obscure?) is problematic. It is not
> really a question of whether _setjmp works but if every platform
> supports _setjmp.

On Cygwin, _setjmp and _longjmp are supported. I can test, but probably not until next week.

On Solaris, _setjmp and _longjmp don't manipulate the signal mask, whereas setjmp and longjmp do, according to the man pages. I can't test on Solaris.

-- 
John Cowan    co...@ccil.org    http://www.ccil.org/~cowan
Humpty Dump Dublin squeaks through his norse
Humpty Dump Dublin hath a horrible vorse
But for all his kinks English / And his irismanx brogues
Humpty Dump Dublin's grandada of all rogues.  --Cousin James
Re: [Chicken-hackers] [PATCH] Change setjmp/longjmp to _setjmp/_longjmp to avoid overhead and fix signal masking bug
On Jun 13, 2012, at 8:51 PM, John Cowan wrote:

> On Cygwin, _setjmp and _longjmp are supported. I can test, but
> probably not until next week.

Cool, thanks. We can probably assume that if it compiles, it works fine.

> On Solaris, _setjmp and _longjmp don't manipulate the signal mask,
> whereas setjmp and longjmp do, according to the man pages. I can't
> test on Solaris.

Peter mentioned to me that _setjmp is supported on Solaris but not on very old Solaris (like, 2.5).

The question is whether there are any supported systems that don't support _setjmp at all. We believe that, if a system does support _setjmp, it will do the right thing.

Jim
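If a platform without _setjmp did turn up, one conceivable way to handle it would be a compile-time fallback. The sketch below is an assumption, not something the patch does: HAVE_UNDERSCORE_SETJMP is a hypothetical macro that a build or configure step would have to define, and jump_roundtrip is a made-up helper that only exercises whichever pair was selected. The C_setjmp/C_longjmp names merely mirror the ones in the patch.

```c
#include <setjmp.h>

/*
 * HAVE_UNDERSCORE_SETJMP is hypothetical: it stands for "this platform
 * declares _setjmp and _longjmp". Without it, fall back to the plain
 * functions, which work everywhere but may save the signal mask.
 */
#if defined(HAVE_UNDERSCORE_SETJMP)
# define C_setjmp   _setjmp
# define C_longjmp  _longjmp
#else
# define C_setjmp   setjmp
# define C_longjmp  longjmp
#endif

/*
 * Made-up helper: jump once through the selected pair and return 1 if
 * control came back through the jump, 0 otherwise.
 */
static int jump_roundtrip(void)
{
    jmp_buf env;
    volatile int stage = 0;   /* volatile: modified between setjmp and longjmp */

    if (C_setjmp(env) == 0) {
        stage = 1;
        C_longjmp(env, 1);
    }
    return stage;
}
```

With this arrangement, a system lacking the underscore variants would still build and would simply keep the current behaviour, including any signal-mask side effects of its setjmp.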
Re: [Chicken-hackers] [PATCH] Change setjmp/longjmp to _setjmp/_longjmp to avoid overhead and fix signal masking bug
Jim Ursetto scripsit:

> Peter mentioned to me that _setjmp is supported on Solaris but not on
> very old Solaris (like, 2.5).

I can't believe that anyone really cares about a 17-year-old release. Backward compat is fine, but there need to be limits. Does current Chicken build with gcc 2.7? It's just about as old.

-- 
Henry S. Thompson said, / Syntactic, structural,    John Cowan
Value constraints we / Express on the fly.          co...@ccil.org
Simon St. Laurent: Your / Incomprehensible          http://www.ccil.org/~cowan
Abracadabralike / schemas must die!