Hi Sam,

On 02. 05. 20 14:26, Sam Tobin-Hochstadt wrote:
> I successfully reproduced this on the first try, which is good. Here's
> my debugging advice (I'm also looking at it):
> 
> 1. To use a binary with debugging symbols, use
> `racket/src/build/racket/racket3m` from the checkout of the Racket
> repository that you built.
> 2. When running racket in GDB, there are lots of segfaults because of
> the GC; you'll want to use `handle SIGSEGV nostop noprint`
> 3. It may not work for this situation because of parallelism, but if
> you can reproduce the bug using `rr` [1] it will be almost infinitely
> easier to find and fix.

Thanks for the hints, and thanks also for opening the GitHub issue for
this. I'll try to post my results (if any) there.

> 
> I'm also curious about your experience with Racket CS and futures.
> It's unlikely to have the _same_ bugs, but it would be good to find
> the ones there are. :)

This is going to be a really hard one. With all the tricks I have learned
during the past weeks, I get almost 400 frames per second in my experiment
using 3m and unsafe operations. Without unsafe operations it drops to
300 fps. Without unsafe operations, and with the de-optimized flip
function shown in the example plus set-argb-pixels, I am at about 50 fps
(presumably a completely "safe" version that does not rely on my own
bounds and type checking).
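
For illustration only, the difference between the safe and unsafe variants
mentioned above can be sketched as follows. This is not the code from the
experiment: the function names store-pixel/safe! and store-pixel/unsafe! are
hypothetical, and the pixel layout (alpha, red, green, blue) is assumed from
the example quoted below.

```racket
#lang racket/base
;; Hypothetical sketch: store one packed 24-bit RGB fixnum into a 4-byte
;; ARGB slot, once with checked and once with unchecked operations.
(require racket/fixnum
         racket/unsafe/ops)

;; Safe variant: every call checks argument types and index bounds.
(define (store-pixel/safe! pixels i c)
  (define base (fx* i 4))
  (bytes-set! pixels base #xff)                                  ; alpha
  (bytes-set! pixels (fx+ base 1) (fxand (fxrshift c 16) #xff))  ; red
  (bytes-set! pixels (fx+ base 2) (fxand (fxrshift c 8) #xff))   ; green
  (bytes-set! pixels (fx+ base 3) (fxand c #xff)))               ; blue

;; Unsafe variant: no checks at all. The caller must guarantee that i is
;; within range and that c is a fixnum; a bad index corrupts memory.
(define (store-pixel/unsafe! pixels i c)
  (define base (unsafe-fx* i 4))
  (unsafe-bytes-set! pixels base #xff)
  (unsafe-bytes-set! pixels (unsafe-fx+ base 1)
                     (unsafe-fxand (unsafe-fxrshift c 16) #xff))
  (unsafe-bytes-set! pixels (unsafe-fx+ base 2)
                     (unsafe-fxand (unsafe-fxrshift c 8) #xff))
  (unsafe-bytes-set! pixels (unsafe-fx+ base 3)
                     (unsafe-fxand c #xff)))

;; Both variants write the same bytes for valid input.
(define safe-buf (make-bytes 8 0))
(define unsafe-buf (make-bytes 8 0))
(store-pixel/safe! safe-buf 1 #x112233)
(store-pixel/unsafe! unsafe-buf 1 #x112233)
```

For valid input both variants produce identical bytes. The unsafe one only
skips the checks, which is where the speed difference comes from and also why
an off-by-one index can end in a SIGSEGV instead of a Racket exception.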

With CS, I have not been able to quickly get anything other than the
de-optimized version with set-argb-pixels working, and I am at about 5 fps.
Also, the thread scheduling is "interesting" at best. I am postponing the
work on that; I assume it may take another few weeks to understand how to
properly use all the fixnum/flonum-related operations with CS.


Thanks again!
Dominik


> 
> [1] https://rr-project.org
> 
> On Sat, May 2, 2020 at 7:56 AM Dominik Pantůček
> <dominik.pantu...@trustica.cz> wrote:
>>
>> Hello fellow Racketeers,
>>
>> during my research into how Racket can be used as a generic software
>> rendering platform, I've hit some limits of Racket's (native) thread
>> handling. Once I started getting SIGSEGVs, I strongly suspected I was
>> doing too many unsafe operations - and to be honest, that was true.
>> There was one off-by-one memory access :).
>>
>> But that was easy to resolve - I just switched to safe/contracted
>> versions of everything and found and fixed the bug. But I still got
>> occasional SIGSEGVs. So I dug even deeper than before (during the last
>> two months I've read most of the JIT inlining code) and noticed that the
>> crashes disappear when I refrain from calling bytes-set! in parallel
>> using futures.
>>
>> So I started creating a minimal crashing example (MCE). At first, I
>> failed miserably. Just filling a byte array over and over again, I was
>> unable to reproduce the crash. But then I realized that in my
>> application, threads come into play and that they might be the cause.
>> And suddenly, creating an MCE was really easy:
>>
>> Create a new eventspace using parameterize/make-eventspace, put the
>> actual code in an application thread (thread ...) and make the main
>> thread wait for this application thread using thread-wait. Before
>> starting the application thread, I create a simple window, a bitmap and
>> a canvas that I keep redrawing using refresh-now after each iteration.
>> The funny thing is, it now keeps crashing even without actually
>> modifying the bitmap in question. All I need to do is to mess with some
>> byte array in 8 threads. Sometimes it takes a minute on my computer
>> before it crashes, sometimes it needs more, but it eventually crashes
>> pretty consistently.
>>
>> And it is just 60 lines of code:
>>
>> #lang racket/gui
>>
>> (require racket/future racket/fixnum racket/cmdline)
>>
>> (define width 800)
>> (define height 600)
>>
>> (define framebuffer (make-fxvector (* width height)))
>> (define pixels (make-bytes (* width height 4)))
>>
>> (define max-depth 0)
>>
>> (command-line
>>  #:once-each
>>  (("-d" "--depth") d "Futures binary partitioning depth"
>>   (set! max-depth (string->number d))))
>>
>> (file-stream-buffer-mode (current-output-port) 'none)
>>
>> (parameterize ((current-eventspace (make-eventspace)))
>>   (define win (new frame%
>>                    (label "test")
>>                    (width width)
>>                    (height height)))
>>   (define bmp (make-bitmap width height))
>>   (define canvas (new canvas%
>>                       (parent win)
>>                       (paint-callback
>>                        (λ (c dc)
>>                          (send dc draw-bitmap bmp 0 0)))
>>                       ))
>>
>>   (define (single-run)
>>     (define (do-bflip start end (depth 0))
>>       (cond ((fx< depth max-depth)
>>              (define cnt (fx- end start))
>>              (define cnt2 (fxrshift cnt 1))
>>              (define mid (fx+ start cnt2))
>>              (let ((f (future
>>                        (λ ()
>>                          (do-bflip start mid (fx+ depth 1))))))
>>                (do-bflip mid end (fx+ depth 1))
>>                (touch f)))
>>             (else
>>              (for ((i (in-range start end)))
>>                (define c (fxvector-ref framebuffer i))
>>                (bytes-set! pixels (+ (* i 4) 0) #xff)
>>                (bytes-set! pixels (+ (* i 4) 1)
>>                            (fxand (fxrshift c 16) #xff))
>>                (bytes-set! pixels (+ (* i 4) 2) (fxand (fxrshift c 8) #xff))
>>                (bytes-set! pixels (+ (* i 4) 3) (fxand c #xff))))))
>>     (do-bflip 0 (* width height))
>>     (send canvas refresh-now))
>>   (send win show #t)
>>
>>   (define appthread
>>     (thread
>>      (λ ()
>>        (let loop ()
>>          (single-run)
>>          (loop)))))
>>   (thread-wait appthread))
>>
>> Note: the code is deliberately de-optimized to highlight the problem.
>> Not even mentioning CPU cache coherence here....
>>
>> Running this from the command line, I can adjust the number of threads.
>> Running with 8 threads:
>>
>> $ time racket crash.rkt -d 3
>> SIGSEGV MAPERR si_code 1 fault on addr (nil)
>> Aborted (core dumped)
>>
>> real    1m18,162s
>> user    7m11,936s
>> sys     0m3,832s
>> $ time racket crash.rkt -d 3
>> SIGSEGV MAPERR si_code 1 fault on addr (nil)
>> Aborted (core dumped)
>>
>> real    3m44,005s
>> user    20m10,920s
>> sys     0m11,702s
>> $ time racket crash.rkt -d 3
>> SIGSEGV MAPERR si_code 1 fault on addr (nil)
>> Aborted (core dumped)
>>
>> real    2m1,650s
>> user    10m58,392s
>> sys     0m6,445s
>> $ time racket crash.rkt -d 3
>> SIGSEGV MAPERR si_code 1 fault on addr (nil)
>> Aborted (core dumped)
>>
>> real    8m8,666s
>> user    45m52,359s
>> sys     0m25,184s
>> $
>>
>> With 4 threads it didn't crash even after quite some time:
>>
>> $ time racket crash.rkt -d 2
>> ^Cuser break
>>   context...:
>>    "crash.rkt": [running body]
>>    temp35_0
>>    for-loop
>>    run-module-instance!
>>    perform-require!
>>
>> real    20m18,706s
>> user    61m38,546s
>> sys     0m22,719s
>> $
>>
>>
>> I'll re-run the 4-thread test overnight.
>>
>> What would be the best approach to debugging this issue? I assume I'll
>> load the racket binary in gdb and see the stack traces at the moment of
>> the crash, but that won't reveal the source of the problem (judging
>> based on my previous experience of debugging heavily multi-threaded
>> applications). Also I probably need a build with debugging symbols,
>> which is my plan for this afternoon.
>>
>> I am running this on:
>>
>> model name      : Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
>>
>> HT is enabled.
>>
>> Although this is just a side project, my work (that is the paid-for
>> work) relies heavily on futures and GUI, so I would really like to nail
>> down and fix this problem.
>>
>> Any suggestions are welcome.
>>
>>
>> Cheers,
>> Dominik
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "Racket Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to racket-users+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/racket-users/9e49fa26-5234-17eb-7dad-09df8a84b147%40trustica.cz.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/9fa03ba1-37ab-899a-462b-9d01b54ff832%40trustica.cz.
