I'm not sure this is the problem that you're seeing, but I see a problem with the example. It boils down to the fact that futures do not provide concurrency.
That may sound like a surprising claim, because the whole point of futures is to run multiple things at a time. But futures merely offer best-effort parallelism; they do not provide any guarantee of concurrency.

As a consequence, trying to treat an fsemaphore as a lock can go wrong. If a future manages to take an fsemaphore lock, but the future is not demanded by the main thread --- or in a chain of future demands that are demanded by the main thread --- then nothing obliges the future to continue running; it can hold the lock forever.

(I put the blame on fsemaphores. Adding fsemaphores to the future system was something like adding mutation to a purely functional language. The addition makes certain things possible, but it also breaks local reasoning that the original design was supposed to enable.)

In your example program, I see

  (define workers (do-start-workers))
  (displayln "started")
  (for ((i 10000))
    (mfqueue-enqueue! mfq 1))

where `do-start-workers` creates a chain of futures, but there's no `touch` on the root future while the loop calls `mfqueue-enqueue!`. Therefore, the loop can block on an fsemaphore because some future has taken the lock but stopped running for whatever reason.

In this case, adding `(thread (lambda () (touch workers)))` after the "started" line and before the loop might fix the example. In other words, you can use the `thread` concurrency construct in combination with the `future` parallelism construct to ensure progress.

I think this will work because all futures in the program end up in a linear dependency chain. If there were a tree of dependencies, then I think you'd need a `thread` for each `future` to make sure that every future has an active demand.

If you're seeing a deadlock at the `(touch workers)`, though, my explanation doesn't cover what you're seeing. I haven't managed to trigger the deadlock myself.

At Sat, 23 May 2020 18:51:23 +0200, Dominik Pantůček wrote:
> Hello again with futures!
>
> I started working on futures-based workers and got quickly stuck with a
> dead-lock I think does not originate in my code (although it is two
> semaphores, 8 futures, so I'll refrain from strong opinions here).
>
> I implemented a very simple futures-friendly queue using mutable pairs
> and created a minimal-deadlocking-example[1]. I am running racket 3m
> 7.7.0.4 which includes fixes for the futures-related bugs I discovered
> recently.
>
> Sometimes the code just runs fine and shows the numbers of worker
> iterations performed in different futures (as traced by the 'fid'
> argument). But sometimes it locks in a state where there is one last
> number in the queue (0 - zero) and yet the fsemaphore-count for the
> count fsemaphore returns 0. Which means the semaphore was decremented
> twice somewhere. The code is really VERY simple and I do not see a
> race-condition within the code, that would allow any code path to
> decrement the fsema-count fsemaphore twice once the worker future
> receives 0.
>
> I am able to reproduce the behavior with racket3m running under gdb and
> get the stack traces for all the threads pretty consistently. The
> deadlock is apparently at:
>
>   2    Thread 0x7ffff7fca700 (LWP 46368) "mfqueue.rkt"
> futex_wait_cancelable (private=<optimized out>, expected=0,
> futex_word=0x5555559d8e78) at ../sysdeps/nptl/futex-internal.h:183
>
> But that is just where the issue is showing up. The real question is how
> the counter gets decremented twice (given that fsemaphores should be
> futures-safe).
>
> Any hints would be VERY appreciated!
>
>
> Cheers,
> Dominik
>
> [1] http://pasterack.org/pastes/28883
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-users/5dcf1260-e8bf-d719-adab-5a0fd9378075%40trustica.cz.