[racket-users] Re: Need help with parallelizing a procedure

Zelphir Kaltstahl Sat, 12 Aug 2017 16:18:06 -0700

On Saturday, August 12, 2017 at 12:21:19 PM UTC+2, Zelphir Kaltstahl wrote:
> I want to parallelize a procedure which looks like this:
> 
> ~~~
> (define (gini-index subsets label-column-index)
>   (for/sum ([subset (in-list subsets)])
>     (for/sum ([label (in-list (list 0 1))])
>       (calc-proportion subset
>                        label
>                        label-column-index))))
> ~~~
> 
> I tried some variations of using places without success and then I found: 
> https://rosettacode.org/wiki/Parallel_calculations#Racket
> 
> Where the code is:
> 
> 
> ~~~
> #lang racket
> (require math)
> (provide main)
>  
> (define (smallest-factor n)
>   (list (first (first (factorize n))) n))
>  
> (define numbers 
>   '(112272537195293 112582718962171 112272537095293
>     115280098190773 115797840077099 1099726829285419))
>  
> (define (main)
>   ; create as many instances of Racket as
>   ; there are numbers:
>   (define ps 
>     (for/list ([_ numbers])
>       (place ch
>              (place-channel-put 
>               ch
>               (smallest-factor
>                (place-channel-get ch))))))
>   ; send the numbers to the instances:
>   (map place-channel-put ps numbers)
>   ; get the results and find the maximum:
>   (argmax first (map place-channel-get ps)))
> ~~~
> 
> So inside the list places are created and it seems that the whole definition 
> of what they are supposed to do is wrapped in that (place ...) expression. I 
> tried to do the same for my example:
> 
> ~~~
> (define (gini-index subsets label-column-index)
>   (for*/list ([subset (in-list subsets)]
>               [label (in-list (list 0 1))])
>     (place pch
>            (place-channel-put pch (list subset label label-column-index))
>            (let ([data (place-channel-get pch)])
>              (calc-proportion (first data)
>                               (second data)
>                               (third data))))))
> ~~~
> 
> The `subset` inside `(place-channel-put pch (list subset label 
> label-column-index))` gets underlined and the error is:
> 
> subset: identifier used out of context
> 
> (I) In the example from Rosetta Code it is all easy, since only one number 
> is passed and it does not need a name or anything, but in my example I am 
> not sure how to do it.
> 
> (II) A second thing I tried was using a place more than once (put, get, then 
> put and get on the channel again), but it did not work: my program simply 
> did nothing anymore, with no CPU load, but also did not finish. It was 
> probably waiting for an answer from the place and never getting one. Is it in 
> general not possible to use a place more than once?
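
Regarding (I): as far as I understand, the body of a `(place ...)` form is lifted out to module level, so it can only refer to module-level bindings, not to local variables such as `subset`. That would explain "identifier used out of context". A minimal sketch of the alternative, where the data is sent in from the creating side, could look like this. `sum-column` and the sample data are hypothetical stand-ins, not part of my program:

```racket
#lang racket
(provide main)

;; Hypothetical module-level helper; a place body may refer to it because
;; the new place instantiates this module again.
(define (sum-column rows column-index)
  (for/sum ([row (in-list rows)])
    (vector-ref row column-index)))

(define (main)
  ;; Made-up sample data: two subsets, each a list of vectors.
  (define subsets (list (list (vector 1 10) (vector 2 20))
                        (list (vector 3 5))))
  (define places
    (for/list ([subset (in-list subsets)])
      ;; The body only uses `ch` and module-level names, never `subset`.
      (define p
        (place ch
          (let ([data (place-channel-get ch)])
            (place-channel-put ch (sum-column (first data) (second data))))))
      ;; The local value travels to the place as a message instead.
      (place-channel-put p (list subset 1))
      p))
  (map place-channel-get places))
```

Note that the places are created inside `main` and not at the module top level, since each new place instantiates the enclosing module again.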


Meanwhile I found a way to use places, which is the following:

~~~
(define (gini-index subsets label-column-index)
  ;; (displayln "1 calculating gini index")
  #|
  Takes:
  - a list of place descriptors
  - subsets (should always be two in this implementation)
  - labels (should always be (list 0 1))
  Returns:
  - a list of place descriptors
  |#
  (define (iter-subsets subsets labels place-descriptors)  ; call with empty
    (cond [(empty? subsets) place-descriptors]
          [else (iter-subsets (rest subsets)
                              labels
                              (cons (iter-labels (first subsets) labels empty)
                                    place-descriptors))]))

  (define (iter-labels subset labels place-descriptors)
    (cond [(empty? labels) place-descriptors]
          [else (let ([a-place (dynamic-place "decision-tree-places.rkt"
                                              'place-calc-proportion-main)])
                  (place-channel-put a-place
                                     (list subset
                                           (first labels)
                                           label-column-index))
                  (iter-labels subset
                               (rest labels)
                               (cons a-place place-descriptors)))]))

  (let ([places (flatten (iter-subsets subsets (list 0 1) empty))])
    (let ([result (for/sum ([a-place (in-list places)])
                    (place-channel-get a-place))])
      (display "result: ") (displayln result)
      result)))
~~~

I could not find a more elegant solution for creating the places and keeping a 
handle on them for calling place-channel-get on them.

However, my tests confirm exactly what you said: It's waaaay slower than even 
the single core implementation. The overhead seems to be huge for starting 
another Racket instance and all that goes with that.

I wondered about futures too, because I read (most of ;) the parallelism guide. 
I assumed they would not work out, because the data are potentially large 
lists of vectors and not only floats. Some months ago I ran the futures 
example which shows that they fail when allocating large integers. This, 
together with the lists of vectors, made me discard the idea of using futures 
for this.

Do you think using futures would work if the data is:

1) a list of vectors of numbers (floats or (usually not so huge) exact 
integers) and
2) 2 exact integer numbers

?
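
To make the question concrete, here is a minimal sketch of what a futures version could look like for that kind of data. `count-label` is a hypothetical stand-in for `calc-proportion` and only counts matching rows. Whether the futures actually run in parallel depends on every operation in the body being future-safe, which I have not verified; the result is the same either way, and the `future-visualizer` library could show where a future blocks.

```racket
#lang racket

;; Hypothetical stand-in for calc-proportion: counts the rows of `subset`
;; whose label column equals `label`.
(define (count-label subset label label-column-index)
  (for/sum ([row (in-list subset)])
    (if (= (vector-ref row label-column-index) label) 1 0)))

;; One future per (subset, label) pair; `touch` waits for each result.
(define (count-labels-parallel subsets label-column-index)
  (define futures
    (for*/list ([subset (in-list subsets)]
                [label (in-list '(0 1))])
      (future (lambda () (count-label subset label label-column-index)))))
  (map touch futures))
```

Unlike with places, the lambda given to `future` can close over local variables such as `subset`, so no messages need to be sent.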

Can you show me some example code for using places multiple times?
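
A minimal sketch of what I have in mind with "multiple times", assuming the place body loops instead of handling a single message. One possible reason my earlier attempt hung is that the body did one get and one put and then ended, so the second `place-channel-get` waited forever, but that is a guess. `count-label` and the sample data are hypothetical:

```racket
#lang racket
(provide main)

;; Hypothetical stand-in for calc-proportion.
(define (count-label subset label label-column-index)
  (for/sum ([row (in-list subset)])
    (if (= (vector-ref row label-column-index) label) 1 0)))

;; The worker loops, so it can answer any number of requests.
(define (start-worker)
  (place ch
    (let loop ()
      (define job (place-channel-get ch))   ; (list subset label index)
      (place-channel-put ch (apply count-label job))
      (loop))))

(define (main)
  (define worker (start-worker))
  ;; Made-up sample data: one subset with the label in column 1.
  (define subset (list (vector 1.5 0) (vector 2.5 1) (vector 3.5 1)))
  (define counts
    (for/list ([label (in-list '(0 1))])
      ;; Send one job and wait for its answer, reusing the same place.
      (place-channel-put/get worker (list subset label 1))))
  (place-kill worker)   ; the loop never ends on its own
  counts)
```

This pays the startup cost of the place only once, but every request still copies the subset into the place, so it may well remain slower than the single-core version.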

I'd like to compare different implementations (creating places, futures, 
creating places only once, single core, others which might or might not exist) 
to conclude what is best to use here.
