[racket-users] Re: Do futures actually work? What are they useful for?

mrmyers . random . suffix Mon, 11 Sep 2017 15:39:38 -0700

As far as I'm aware, futures usually shouldn't improve performance outside 
of networking or hardware-latency type situations. The main goal of futures 
is just time-sharing, not improving performance. It doesn't genuinely do 
things in parallel, it just interleaves the execution of several things at 
once.


To actually have multiple things happening at the same time, you would need 
to use places, since those take place on separate OS level threads.

On Monday, September 11, 2017 at 4:46:18 PM UTC-4, Alexis King wrote:
>
> Yesterday, a user asked a question on Stack Overflow[1] about attempting 
> to parallelize a solution to the n-queens problem. The program in 
> question is short, so I can reproduce it here in its entirety: 
>
>   #lang racket 
>
>   ; following returns true if queens are on diagonals:   
>   (define (check-diagonals bd) 
>     (for*/or ([r1 (in-range (length bd))] 
>               [r2 (in-range (add1 r1) (length bd))]) 
>       (= (abs (- r1 r2)) 
>          (abs (- (list-ref bd r1) 
>                  (list-ref bd r2)))))) 
>
>   ; set board size: 
>   (define N 8) 
>   ; start searching: 
>   (for ([brd (in-permutations (range N))]) 
>     (unless (check-diagonals brd) 
>       (displayln brd))) 
>
> I make no comment on the efficiency of the algorithm (or even its 
> correctness), as I didn’t write it. Either way, it does seem like the 
> sort of thing that could be parallelized. The above program runs very 
> quickly, but if N is increased to something larger, like 11, it takes 
> an appreciable amount of time: 
>
>   $ time racket n-queens.rkt 
>   [program output elided] 
>          14.44 real        13.92 user         0.32 sys 
>
> Fortunately, this seems like a textbook use case for fine-grained 
> parallel evaluation of subexpressions. Theoretically, it seems like it 
> should be possible to use for/async from racket/future to parallelize 
> the top-level for loop. However, making that change does not improve 
> performance at all; in fact, it significantly reduces it. Merely running 
> this version with N=9 takes ~6.5 seconds; with N=10, it takes ~55. 
>
> This is, however, not too surprising. Running the code with the futures 
> visualizer (using N=7) indicates that displayln is not legal from within 
> a future, preventing any parallel execution from ever actually taking 
> place. Presumably, we can fix this by creating futures that just compute 
> the results, then print them serially: 
>
>   (define results-futures 
>     (for/list ([brd (in-permutations (range N))]) 
>       (future (λ () (cons (check-diagonals brd) brd))))) 
>
>   (for ([result-future (in-list results-futures)]) 
>     (let ([result (touch result-future)]) 
>       (unless (car result) 
>         (displayln (cdr result))))) 
>
> With this change, we get a small speedup over the naïve attempt with 
> for/async, but we’re still far slower than the serial version. Now, with 
> N=9, it takes ~4.58 seconds, and with N=10, it takes ~44. 
>
> Taking a look at the futures visualizer for this program (again, with 
> N=7), there are now no blocks, but there are some syncs (on 
> jit_on_demand and allocate memory). However, after some time spent 
> jitting, execution seems to get going, and it actually runs a lot of 
> futures in parallel[2]! After a little bit of that, however, the 
> parallel execution seems to run out of steam, and things seem to start 
> running relatively serially again[3]. 
>
> I wasn’t really sure what was going on here, but I thought maybe it was 
> because some of the futures are just too *small*. It seems possible that 
> the overhead of scheduling thousands of futures (or, in the case of 
> N=10, millions) was far outweighing the actual runtime of the work being 
> done in the futures. Fortunately, this seems like something that could 
> be solved by just grouping the work into chunks, something that’s 
> relatively doable using in-slice: 
>
>   (define results-futures 
>     (for/list ([brds (in-slice 10000 (in-permutations (range N)))]) 
>       (future 
>        (λ () 
>          (for/list ([brd (in-list brds)]) 
>            (cons (check-diagonals brd) brd)))))) 
>
>   (for* ([results-future (in-list results-futures)] 
>          [result (in-list (touch results-future))]) 
>     (unless (car result) 
>       (displayln (cdr result)))) 
>
> It seems that my suspicion was correct, because that change helps a 
> whole lot. Now, running the parallel version of the program only takes 
> ~3.9 seconds for N=10, a speedup of more than a factor of 10 over the 
> previous version using futures. However, unfortunately, that’s still 
> *slower* than the completely serial version, which only takes ~1.4 
> seconds. Increasing N to 11 makes the parallel version take ~44 seconds, 
> and if the chunk size provided to in-slice is increased to 100,000, it 
> takes even longer, ~55 seconds. 
>
> Taking a look at the future visualizer for that version of the program, 
> with N=11 and a chunk size of 1,000,000, I see that there seem to be 
> some periods of extended parallelism, but the futures get blocked a lot 
> on memory allocation[4]. This makes sense, since now each future is 
> handling much more work, but it means the futures are synchronizing 
> constantly, likely leading to the significant performance overhead I’m 
> seeing. 
>
> At this point, I’m not sure there’s much else I know how to tweak to 
> improve future performance. I tried cutting down allocation by using 
> vectors instead of lists and specialized fixnum operations, but for 
> whatever reason, that seemed to completely destroy parallelism. I 
> thought that maybe that was because the futures were never starting up 
> in parallel, so I replaced future with would-be-future, but the results 
> were confusing to me, and I didn’t really understand what they meant. 
>
> Either way, this is a little troubling, since this task seems extremely 
> contrived, so I would hope it would be something that futures could 
> easily cope with. I think I’ve heard mixed thoughts on whether or not 
> futures have “panned out” — my understanding is that they seem promising 
> but really been able to be applied practically, in contrast with places, 
> which have been used to good effect for the parallel build process, for 
> example. Is there something that I can do to get a real performance 
> boost with futures on this problem, or is it a lost cause? 
>
> Furthermore, if this problem really is a lost cause, what are futures 
> actually useful for? The documentation would imply that they are useful 
> for lots of number crunching using machine integers or flonums, but 
> that’s a pretty specific domain. I would hope that they would be able 
> to work with simple list-based operations as well, since operations that 
> are truly exclusively numeric are relatively uncommon in my experience. 
>
> Thanks, 
> Alexis 
>
> [1]: https://stackoverflow.com/q/46144576/465378 
> [2]: http://i.imgur.com/zdXt4Ei.png 
> [3]: http://i.imgur.com/e6yPlJ8.png 
> [4]: http://i.imgur.com/hzPDsy9.png 
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[racket-users] Re: Do futures actually work? What are they useful for?

Reply via email to