On 10/5/2018 10:32 AM, Matthew Flatt wrote:
> At Fri, 5 Oct 2018 15:36:04 +0200, Paulo Matos wrote:
> > Again, I am really surprised that you mention that places are not
> > separate processes. Documentation does say they are separate racket
> > virtual machines, how is this accomplished if not by using separate
> > processes?
>
> Each place is an OS thread within the Racket process. The virtual
> machine is essentially instantiated once in each thread, where things
> that look like global variables at the C level are actually
> thread-local variables to make them place-specific. Still, there is
> some sharing among the threads.
>
> > My workers are really doing Z3 style work - number crushing and lots of
> > searching. No IO (writing to disk) or communication so I would expect
> > them to really max out all CPUs.
>
> My best guess is that it's memory-allocation bottlenecks, probably at
> the point of using mmap() and mprotect(). Maybe things don't scale well
> beyond the 4-core machines that I use.
>
> On my machines, the enclosed program can max out CPU use with system
> time being a small fraction. It scales ok from 1 to 4 places (i.e.,
> real time increased only some). The machine's core are hyperthreaded,
> and the example maxes out CPU utilization at 8 --- but it takes twice
> as long in real time, so the hardware threads don't help much in this
> case. Running two processes with 4 places takes about the same real
> time as running one process with 8 places, as does 2 processes with 2
> places.
>
> Do you see similar effects, or does this little example stop scaling
> before the number of processes matches the number of cores?

As Matthew said, this may be a case where multiple processes are better.

One thing that likely is vastly different between your two systems is the memory architecture.  On Paulo's many-core machine, each group of [probably] 6 CPUs will have its own physical bank of memory which is close to it and which it uses preferentially.  Access to a different bank may be very costly.  Paulo's machine may be spending a much greater percentage of time moving data between VM instances that are located in different memory regions ... something Matthew can't see on his quad-core.

Paulo, you might take a look at how memory is being allocated [not sure what tools you have for this] and see what happens if you restrict the process to running on various groups of CPUs.  It may be that some banks of your memory are "closer" than others.
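
If it is useful, here is a rough sketch of that experiment as a small Python script. It is only an illustration, under some assumptions: it is Linux-only (it relies on os.sched_setaffinity), the helper names cpu_groups and time_on_cpus are made up for this example, and the group size of 6 is just my guess from above. It also assumes that consecutive CPU ids sit next to the same memory bank, which may not be true on your machine, so check the real topology before trusting the numbers.

```python
import os
import subprocess
import sys
import time


def cpu_groups(cpus, size):
    """Split CPU ids into consecutive groups of at most `size` ids.

    Assumption: consecutive ids share a memory bank. That is not
    guaranteed; the real layout depends on the machine.
    """
    cpus = sorted(cpus)
    return [cpus[i:i + size] for i in range(0, len(cpus), size)]


def time_on_cpus(cmd, cpus):
    """Run `cmd` restricted to `cpus`; return wall-clock seconds."""
    def pin():
        # Runs in the child just before exec. Linux-only call.
        os.sched_setaffinity(0, cpus)

    start = time.perf_counter()
    subprocess.run(cmd, check=True, preexec_fn=pin)
    return time.perf_counter() - start


def main(argv):
    cmd = argv[1:]  # the program to measure, given on the command line
    available = os.sched_getaffinity(0)
    for group in cpu_groups(available, 6):  # 6 is a guess, adjust it
        elapsed = time_on_cpus(cmd, group)
        print(f"CPUs {group}: {elapsed:.2f}s")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

You would run it with the command to measure as its arguments, for example the racket invocation of your worker program. If some groups are consistently slower than others for the same work, that would point toward the memory layout rather than Racket itself.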

Hope this helps,
