It's difficult to be sure from your description, but it sounds like the problem may just be the usual one of scaling parallelism when communication is involved.
The red (kernel time) is probably synchronization. It might be synchronization due to the communication you have between places, it might be synchronization on Racket's internal data structures, or it might be that the OS has to synchronize actions from multiple places within the same process (e.g., multiple places are allocating and calling OS functions like mmap and mprotect, which the OS has to synchronize within a process). We've tried to minimize sharing among places, and it's important that they can GC independently, but there are still various forms of sharing to manage internally.

In contrast, running separate processes for Z3 should scale well, especially if the Z3 task is compute-intensive with minimal I/O --- a best-case scenario for the OS.

A parallel `raco setup` runs into similar issues. In recent development builds, you might experiment with passing `--processes` to `raco setup` to have it use separate processes instead of places within a single OS process, but I think you'll still find that it tops out well below your machine's compute capacity. Partly, dependencies constrain parallelism. Partly, the processes have to communicate more and there's a lot of I/O.

At Fri, 5 Oct 2018 11:43:36 +0200, "'Paulo Matos' via Racket Users" wrote:
> All,
>
> A quick update on this problem, which is in my critical path.
> I just noticed, in an attempt to reproduce it, that the same happens
> during the package-setup part of the Racket compilation procedure.
>
> I am running `make CPUS=24 in-place` on a 36-CPU machine and I see that
> not only does the racket process status sometimes go from 'R' to 'D'
> (which also happens in my case), but the CPUs are never really working
> at 100%, with a lot of the work being done at kernel level.
>
> Has anyone ever noticed this?
>
> On 01/10/2018 11:13, 'Paulo Matos' via Racket Users wrote:
> >
> > Hi,
> >
> > I am not sure this is an issue with places or what it could be, but my
> > devops-fu is poor and I am not even sure how to debug something like
> > this, so maybe someone with more knowledge than me on this might chime
> > in to hint at a possible debug method.
> >
> > I was running some benchmarks and noticed something odd for the first
> > time (although it doesn't mean it was ok before, just that this is the
> > first time I am actually analysing this issue).
> >
> > My program (the master) will create N places (the workers) and each
> > place will start by issuing a Rosette call which will trigger a call to
> > the Z3 SMT solver. So, N instances of Z3 will run, and after each is
> > done it will run pure Racket code that implements a graph search
> > algorithm. These N worker places are actually in a sync call waiting
> > for messages from the master, and the work is being done by a thread
> > on the worker place. The master is either waiting for the timeout to
> > arrive or for a solution to be sent from a worker.
> >
> > The interesting thing is that when the Z3 instances are running I get
> > all my 16 CPUs (on a dedicated machine) working at 100%. When the
> > Racket code is running the search, they are all hovering around
> > 60%-80%, with a huge portion of it in the kernel (red bars in htop).
> >
> > Since the Z3 calls come before the threads inside the places are
> > started and we get to the sync call, is it possible something bad is
> > happening in the sync call that uses the kernel so much? Take a look
> > at htop during Z3 and during the search - screenshots attached.
> >
> > Are there any suggestions on what the problem might be or how I could
> > start to understand why the kernel is so active?
> >
> > Kind regards,
> >
>
> --
> Paulo Matos
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
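For what it's worth, the separate-process approach described above can be sketched directly with Racket's `subprocess`, bypassing places entirely for the solver phase. Everything specific here is an assumption for illustration --- the `run-z3` and `run-z3-batch` names, the `"z3"` executable lookup, and the idea that each query lives in its own SMT-LIB file are not taken from Paulo's actual program:

```racket
#lang racket

;; Hypothetical sketch: one z3 OS process per query, no places involved.
;; Each solver runs in its own address space, so there is nothing for
;; the places runtime (or the kernel, within one process) to synchronize.
(define (run-z3 query-file)
  (define z3 (or (find-executable-path "z3")
                 (error 'run-z3 "z3 not found in PATH")))
  ;; #f #f #f => create fresh pipes for stdout, stdin, and stderr
  (define-values (sp z3-stdout z3-stdin z3-stderr)
    (subprocess #f #f #f z3 query-file))
  (close-output-port z3-stdin)   ; nothing to send on stdin
  (list sp z3-stdout z3-stderr))

;; Launch all solvers first, then collect results, so the z3 processes
;; run concurrently and the OS schedules them independently.
(define (run-z3-batch query-files)
  (for/list ([started (in-list (map run-z3 query-files))])
    (match-define (list sp out err) started)
    ;; Drain stdout before waiting, to avoid blocking the child on a
    ;; full pipe buffer; port->string reads until the process closes it.
    (define result (port->string out))
    (subprocess-wait sp)
    (close-input-port out)
    (close-input-port err)
    result))
```

The master/worker message-passing structure can stay in places; only the compute-heavy solver calls move out-of-process. The `--processes` flag to `raco setup` mentioned above applies the same idea to package setup.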