Hi Dex,

On 02. 05. 20 14:10, Dexter Lagan wrote:
> Hello,
> 
>   I’ve been getting inconsistent results as well. A while ago I made a
> benchmark based on a parallel spectral norm computation. The benchmark
> works fine on Windows on most systems and uses all cores, but crashes
> randomly on other systems. I haven’t been able to figure out why. On
> Linux it doesn’t seem to use more than one core. I’d be interested to
> know if this is related. Here’s the benchmark code :
> 
> https://github.com/DexterLagan/benchmark

Beware that (processor-count) returns the number of hardware threads
(HT cores), not physical cores, so your v1.3 is actually requesting
twice as many threads as there are HTs. At least on Linux this is the
case (I checked just now).
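
As a rough illustration of sizing the work from (processor-count), here
is a minimal sketch. It is not taken from the benchmark: worker-count
and partition-depth are hypothetical helper names, and the assumption of
2 hardware threads per physical core is mine (as far as I know Racket
does not report physical cores directly).

```racket
#lang racket/base
(require racket/future)

;; Hypothetical helper: how many parallel workers to start.
;; ASSUMPTION: threads-per-core defaults to 2 (typical Intel HT); adjust
;; it for machines where that does not hold.
(define (worker-count [hw-threads (processor-count)] [threads-per-core 2])
  (max 1 (quotient hw-threads threads-per-core)))

;; Hypothetical helper: largest binary partitioning depth d such that
;; 2^d futures do not exceed the given number of workers (workers >= 1).
(define (partition-depth workers)
  (sub1 (integer-length workers)))

(printf "hardware threads: ~a, workers: ~a, depth: ~a\n"
        (processor-count)
        (worker-count)
        (partition-depth (worker-count)))
```

With 8 hardware threads this would suggest 4 workers, that is a depth
of 2 in terms of the -d option of crash.rkt.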

Interesting idea... 16 threads:

$ time racket crash.rkt -d 4
SIGSEGV MAPERR si_code 1 fault on addr (nil)
Aborted (core dumped)

real    6m37,579s
user    32m55,192s
sys     0m35,124s

So that is consistent with what I see.

Have you tried using future-visualizer[1] to check why it uses only a
single CPU thread? Last summer I spent quite some time with it, and it
helped me find the futures usage patterns that actually allow the
speculative computation to run in parallel. Usually, if your code is too
deep and keeps allocating "something" in each frame, it goes back to the
runtime thread for each allocation.
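
To make the allocation point concrete, here is a minimal sketch of the
kind of pattern I mean. It is an illustration only, not code from either
program: sum-range and parallel-sum are hypothetical names, and whether
a given loop really stays out of the runtime thread depends on the
Racket version and VM, so it is worth checking in the visualizer.

```racket
#lang racket/base
(require racket/future racket/fixnum)

;; Sum a slice of an fxvector using only fixnum operations in a tail
;; loop. The intent is that the loop body should not need to allocate.
(define (sum-range vec start end)
  (let loop ([i start] [acc 0])
    (if (fx< i end)
        (loop (fx+ i 1) (fx+ acc (fxvector-ref vec i)))
        acc)))

;; Split the vector in half: the left half is summed in a future, the
;; right half in the current thread, then the two results are combined.
(define (parallel-sum vec)
  (define n (fxvector-length vec))
  (define mid (fxrshift n 1))
  (define f (future (λ () (sum-range vec 0 mid))))
  (define right (sum-range vec mid n))
  (fx+ (touch f) right))

(define data (make-fxvector 1000000 1))
(parallel-sum data)

;; To inspect where futures block or synchronize, the same call can be
;; wrapped in the visualizer (this opens a GUI window):
;;   (require future-visualizer)
;;   (visualize-futures (parallel-sum data))
```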


Cheers,
Dominik

[1] https://docs.racket-lang.org/future-visualizer/index.html

> 
> Dex
> 
>> On May 2, 2020, at 1:56 PM, Dominik Pantůček
>> <dominik.pantu...@trustica.cz> wrote:
>>
>> Hello fellow Racketeers,
>>
>> during my research into how Racket can be used as a generic software
>> rendering platform, I've hit some limits of Racket's (native) thread
>> handling. Once I started getting SIGSEGVs, I strongly suspected I was
>> doing too many unsafe operations - and to be honest, that was true.
>> There was one off-by-one memory access :).
>>
>> But that was easy to resolve - I just switched to safe/contracted
>> versions of everything and found and fixed the bug. But I still got
>> an occasional SIGSEGV. So I dug even deeper than before (during the
>> last two months I've read most of the JIT inlining code) and noticed
>> that the crashes disappear when I refrain from calling bytes-set! in
>> parallel using futures.
>>
>> So I started creating a minimal crashing example (MCE). At first, I
>> failed miserably. Just filling a byte array over and over again, I was
>> unable to reproduce the crash. But then I realized that in my
>> application threads come into play, and that might be the cause. And
>> suddenly, creating an MCE was really easy:
>>
>> Create a new eventspace using parameterize/make-eventspace, put the
>> actual code in an application thread (thread ...) and make the main
>> thread wait for this application thread using thread-wait. Before
>> starting the application thread, I create a simple window, a bitmap
>> and a canvas that I keep redrawing using refresh-now after each
>> iteration. Funny thing is, now it keeps crashing even without actually
>> modifying the bitmap in question. All I need to do is to mess with
>> some byte array in 8 threads. Sometimes it takes a minute on my
>> computer before it crashes, sometimes it needs more, but it eventually
>> crashes pretty consistently.
>>
>> And it is just 60 lines of code:
>>
>> #lang racket/gui
>>
>> (require racket/future racket/fixnum racket/cmdline)
>>
>> (define width 800)
>> (define height 600)
>>
>> (define framebuffer (make-fxvector (* width height)))
>> (define pixels (make-bytes (* width height 4)))
>>
>> (define max-depth 0)
>>
>> (command-line
>>  #:once-each
>>  (("-d" "--depth") d "Futures binary partitioning depth"
>>   (set! max-depth (string->number d))))
>>
>> (file-stream-buffer-mode (current-output-port) 'none)
>>
>> (parameterize ((current-eventspace (make-eventspace)))
>>  (define win (new frame%
>>                   (label "test")
>>                   (width width)
>>                   (height height)))
>>  (define bmp (make-bitmap width height))
>>  (define canvas (new canvas%
>>                      (parent win)
>>                      (paint-callback
>>                       (λ (c dc)
>>                         (send dc draw-bitmap bmp 0 0)))
>>                      ))
>>
>>  (define (single-run)
>>    (define (do-bflip start end (depth 0))
>>      (cond ((fx< depth max-depth)
>>             (define cnt (fx- end start))
>>             (define cnt2 (fxrshift cnt 1))
>>             (define mid (fx+ start cnt2))
>>             (let ((f (future
>>                       (λ ()
>>                         (do-bflip start mid (fx+ depth 1))))))
>>               (do-bflip mid end (fx+ depth 1))
>>               (touch f)))
>>            (else
>>             (for ((i (in-range start end)))
>>               (define c (fxvector-ref framebuffer i))
>>               (bytes-set! pixels (+ (* i 4) 0) #xff)
>>               (bytes-set! pixels (+ (* i 4) 1)
>>                           (fxand (fxrshift c 16) #xff))
>>               (bytes-set! pixels (+ (* i 4) 2)
>>                           (fxand (fxrshift c 8) #xff))
>>               (bytes-set! pixels (+ (* i 4) 3) (fxand c #xff))))))
>>    (do-bflip 0 (* width height))
>>    (send canvas refresh-now))
>>  (send win show #t)
>>
>>  (define appthread
>>    (thread
>>     (λ ()
>>       (let loop ()
>>         (single-run)
>>         (loop)))))
>>  (thread-wait appthread))
>>
>> Note: the code is deliberately de-optimized to highlight the problem.
>> Not even mentioning CPU cache coherence here....
>>
>> Running this from the command line, I can adjust the number of
>> threads (-d N gives 2^N futures). Running with 8 threads:
>>
>> $ time racket crash.rkt -d 3
>> SIGSEGV MAPERR si_code 1 fault on addr (nil)
>> Aborted (core dumped)
>>
>> real    1m18,162s
>> user    7m11,936s
>> sys    0m3,832s
>> $ time racket crash.rkt -d 3
>> SIGSEGV MAPERR si_code 1 fault on addr (nil)
>> Aborted (core dumped)
>>
>> real    3m44,005s
>> user    20m10,920s
>> sys    0m11,702s
>> $ time racket crash.rkt -d 3
>> SIGSEGV MAPERR si_code 1 fault on addr (nil)
>> Aborted (core dumped)
>>
>> real    2m1,650s
>> user    10m58,392s
>> sys    0m6,445s
>> $ time racket crash.rkt -d 3
>> SIGSEGV MAPERR si_code 1 fault on addr (nil)
>> Aborted (core dumped)
>>
>> real    8m8,666s
>> user    45m52,359s
>> sys    0m25,184s
>> $
>>
>> With 4 threads it didn't crash even after quite some time:
>>
>> $ time racket crash.rkt -d 2
>> ^Cuser break
>>  context...:
>>   "crash.rkt": [running body]
>>   temp35_0
>>   for-loop
>>   run-module-instance!
>>   perform-require!
>>
>> real    20m18,706s
>> user    61m38,546s
>> sys    0m22,719s
>> $
>>
>>
>> I'll re-run the 4-thread test overnight.
>>
>> What would be the best approach to debugging this issue? I assume I'll
>> load the racket binary in gdb and see the stack traces at the moment of
>> the crash, but that won't reveal the source of the problem (judging
>> based on my previous experience of debugging heavily multi-threaded
>> applications). Also I probably need a build with debugging symbols,
>> which is my plan for this afternoon.
>>
>> I am running this on:
>>
>> model name    : Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
>>
>> HT is enabled.
>>
>> Although this is just a side project, my work (that is the paid-for
>> work) relies heavily on futures and GUI, so I would really like to nail
>> down and fix this problem.
>>
>> Any suggestions are welcome.
>>
>>
>> Cheers,
>> Dominik
>>
>> -- 
>> You received this message because you are subscribed to the Google
>> Groups "Racket Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>> an email to racket-users+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/racket-users/9e49fa26-5234-17eb-7dad-09df8a84b147%40trustica.cz.
> 

