Re: [racket-users] Places code not using all the CPU
On 05/10/2018 19:23, Matthew Flatt wrote:
> We should certainly update the documentation with information about the
> limits of parallelism via places.

Added PR: https://github.com/racket/racket/pull/2304

-- 
Paulo Matos

-- 
You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [racket-users] Places code not using all the CPU
On 08/10/2018 22:12, Philip McGrath wrote:
> This is much closer to the metal than where I usually spend my time,
> but, if it turns out that multiple OS processes is better than OS
> threads in this case, Distributed Places might provide an easier path to
> move to multiple processes than using `subprocess` directly:
> http://docs.racket-lang.org/distributed-places/index.html

Sam mentioned trying that yesterday, but I developed the loci library before I got around to trying them. Looking at the API, I can only say that at the moment my library is certainly easier to use on localhost. Once I get to implementing remote loci, I will look into distributed places and try to improve on them.

-- 
Paulo Matos
Re: [racket-users] Places code not using all the CPU
Hi all,

Apologies for the delay in sending this email, but I have been trying to implement and test an alternative and wanted to be sure it works before sending this off.

So, as Matthew suggested, this problem has to do with memory allocation. The --no-alloc option in Matthew's suggested snippet does not show the delay I usually see in the thread CPU usage, although thread creation is still quite slow past around 20 places.

I started developing loci [1] yesterday to solve this problem, and I got it to a point where I can prove that subprocesses solve the problem I am seeing. No point attaching a screenshot of htop with all bars at 100%... that's what happens. Also, process creation is almost instantaneous, and there's no delay compared to threads.

In the evening, after I had almost everything sorted, Sam suggested on Slack that I try distributed places and use them locally. I haven't tried this and cannot say whether it works better or worse, but it certainly seems harder to use than loci, as my library uses the same API as places.

Part of the development was pretty quick because I noticed Matthew had been playing with this before: https://github.com/racket/racket/blob/master/pkgs/racket-benchmarks/tests/racket/benchmarks/places/place-processes.rkt (it might be worth noting that the code doesn't work with current Racket).

I will be adding contracts, tests, and documentation throughout the week and then replace places with loci in my system so I can dog-food the library. The next step is to add remote loci, at which point I will compare with distributed places and possibly improve on them.

If anyone has comments, suggestions, or complaints about the library, please let me know, but keep in mind it's barely a day old.

Paulo Matos

1: https://github.com/LinkiTools/racket-loci
   https://pkgd.racket-lang.org/pkgn/search?q=loci

On 05/10/2018 19:23, Matthew Flatt wrote:
> At Fri, 5 Oct 2018 17:55:47 +0200, Paulo Matos wrote:
>> Matthew, Sam, do you understand why this is happening?
>
> I still think it's probably allocation, and probably specifically
> contention on the process's page table. Do you see different behavior with
> a non-allocating variant (via `--no-alloc` below)?
>
> We should certainly update the documentation with information about the
> limits of parallelism via places.
>
> #lang racket
>
> (define (go n alloc?)
>   (place/context p
>     (let ([v (vector (if alloc? 0.0 0))]
>           [inc (if alloc? 1.0 1)])
>       (let loop ([i 30])
>         (unless (zero? i)
>           (vector-set! v 0 (+ (vector-ref v 0) inc))
>           (loop (sub1 i))))
>       (printf "Place ~a done~n" n)
>       n)))
>
> (module+ main
>   (define alloc? #t)
>   (define cores
>     (command-line
>      #:once-each
>      [("--no-alloc") "Non-allocating variant" (set! alloc? #f)]
>      #:args (cores)
>      (string->number cores)))
>
>   (time
>    (map place-wait
>         (for/list ([i (in-range cores)])
>           (printf "Starting core ~a~n" i)
>           (go i alloc?)))))

-- 
Paulo Matos
Re: [racket-users] Places code not using all the CPU
I just confirmed that this is due to memory allocation locking in the kernel. If your places do no allocation, then all is fine.

Paulo Matos

On 08/10/2018 21:39, James Platt wrote:
> I wonder if this has anything to do with mitigations for Spectre, Meltdown, or
> the other speculative execution vulnerabilities that have been identified
> recently. I understand that some or all of the patches affect the
> performance of multi-CPU processing in general.
>
> James
Re: [racket-users] Places code not using all the CPU
This is much closer to the metal than where I usually spend my time, but, if it turns out that multiple OS processes is better than OS threads in this case, Distributed Places might provide an easier path to move to multiple processes than using `subprocess` directly: http://docs.racket-lang.org/distributed-places/index.html

On Mon, Oct 8, 2018 at 7:39 PM James Platt wrote:
> I wonder if this has anything to do with mitigations for Spectre, Meltdown
> or the other speculative execution vulnerabilities that have been
> identified recently. I understand that some or all of the patches affect
> the performance of multi-CPU processing in general.
>
> James
Re: [racket-users] Places code not using all the CPU
I wonder if this has anything to do with mitigations for Spectre, Meltdown, or the other speculative execution vulnerabilities that have been identified recently. I understand that some or all of the patches affect the performance of multi-CPU processing in general.

James
Re: [racket-users] Places code not using all the CPU
On 10/5/2018 10:32 AM, Matthew Flatt wrote:
> At Fri, 5 Oct 2018 15:36:04 +0200, Paulo Matos wrote:
>> Again, I am really surprised that you mention that places are not
>> separate processes. Documentation does say they are separate racket
>> virtual machines; how is this accomplished if not by using separate
>> processes?
>
> Each place is an OS thread within the Racket process. The virtual
> machine is essentially instantiated once in each thread, where things
> that look like global variables at the C level are actually
> thread-local variables to make them place-specific. Still, there is
> some sharing among the threads.
>
>> My workers are really doing Z3-style work - number crunching and lots of
>> searching. No IO (writing to disk) or communication, so I would expect
>> them to really max out all CPUs.
>
> My best guess is that it's memory-allocation bottlenecks, probably at
> the point of using mmap() and mprotect(). Maybe things don't scale well
> beyond the 4-core machines that I use.
>
> On my machines, the enclosed program can max out CPU use with system
> time being a small fraction. It scales ok from 1 to 4 places (i.e.,
> real time increased only some). The machine's cores are hyperthreaded,
> and the example maxes out CPU utilization at 8 --- but it takes twice
> as long in real time, so the hardware threads don't help much in this
> case. Running two processes with 4 places takes about the same real
> time as running one process with 8 places, as does 2 processes with 2
> places.
>
> Do you see similar effects, or does this little example stop scaling
> before the number of processes matches the number of cores?

As Matthew said, this may be a case where multiple processes are better.

One thing that is likely vastly different between your two systems is the memory architecture. On Paulo's many-core machine, each group of [probably] 6 CPUs will have its own physical bank of memory which is close to it and which it uses preferentially. Access to a different bank may be very costly. Paulo's machine may be spending a much greater percentage of time moving data between VM instances that are located in different memory regions ... something Matthew can't see on his quad-core.

Paulo, you might take a look at how memory is being allocated [not sure what tools you have for this] and see what happens if you restrict the process to running on various groups of CPUs. It may be that some banks of your memory are "closer" than others.

Hope this helps,
George
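George's suggestion of restricting the process to particular groups of CPUs can be tried from the shell. This is only a sketch: `numactl` must be installed, the node and CPU numbers are machine-specific, and `p.rkt 8` stands in for the actual workload from earlier in the thread.

```shell
# Show the NUMA topology: which CPUs belong to which memory node.
numactl --hardware

# Run the benchmark pinned to the CPUs and memory of NUMA node 0,
# so all allocation stays in the "close" bank for those CPUs.
numactl --cpunodebind=0 --membind=0 racket p.rkt 8

# Alternatively, pin to an explicit CPU list with taskset
# (CPU numbering varies; check lscpu or numactl --hardware first).
taskset -c 0-7 racket p.rkt 8
```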
Re: [racket-users] Places code not using all the CPU
> if not I will have to redesign my system to use 'subprocess'

Expanding on this, for students on the list...

Having many worker host processes is not necessarily a bad thing. It can be more programmer work, but it simplifies the parallelism in a way (e.g., "let the Linux kernel worry about it" :), and it potentially gives you better isolation and resilience for some kinds of defects (in native code used via FFI, in Racket code, and even in the suspiciously sturdy Racket VM/backend).

If appropriate for your application, you can also consider a worker pool with a health metric: sometimes reusing workers to avoid process startup times, sometimes retiring them, perhaps benching workers for an induced big GC if that makes sense compared to retiring/unpooling, and maybe quarantining workers for debugging/dumps while keeping the system running. You can also spread your workers across multiple hosts, not just CPUs/cores. You can even use the worker pool to introduce new changes to a running system (being very rapid, or as an additional mechanism beyond normal testing for production), and do A/B performance/correctness testing of changes, and change rollback.

If the data to be communicated to/from a worker is relatively small and won't be a bottleneck, you can simply push it through the stdin and stdout of each process; otherwise, you can get judicious/clever with the many available host OS mechanisms.

(Students: Being able to get our hands dirty and engineer systems beyond a framework, when necessary, is one of the reasons we get CS/SE/EE/CE degrees and broad experience, rather than only collect a binder full of Certified Currently-Popular JS Framework Technician certs. Those oppressive student loans, and/or years of self-guided open source experience, might not be in vain. :)
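The stdin/stdout approach described above can be sketched in Racket. This is a minimal illustration, not a real worker pool: the worker here is a hypothetical inline `racket -e` program that doubles a number, standing in for an actual worker module, and it spawns one fresh OS process per job rather than reusing pooled workers.

```racket
#lang racket
;; Spawn a worker OS process and exchange one s-expression over its
;; stdin/stdout. The inline "-e" worker that doubles its input is a
;; hypothetical stand-in for a real worker program.
(define (run-worker job)
  (define-values (proc from-worker to-worker err)
    (subprocess #f #f 'stdout            ; pipes for stdout/stdin; stderr merged
                (find-executable-path "racket")
                "-e" "(write (* 2 (read)))"))
  (write job to-worker)                  ; send the job on the worker's stdin
  (close-output-port to-worker)          ; signal end of input
  (define result (read from-worker))     ; read the reply from its stdout
  (subprocess-wait proc)                 ; reap the worker process
  result)

(module+ main
  ;; Run a handful of jobs, one worker process per job.
  (for ([n (in-range 4)])
    (printf "worker(~a) => ~a~n" n (run-worker n))))
```

A real pool would keep the pipes open and loop jobs through long-lived workers instead of paying process startup per job.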
Re: [racket-users] Places code not using all the CPU
At Fri, 5 Oct 2018 17:55:47 +0200, Paulo Matos wrote:
> Matthew, Sam, do you understand why this is happening?

I still think it's probably allocation, and probably specifically contention on the process's page table. Do you see different behavior with a non-allocating variant (via `--no-alloc` below)?

We should certainly update the documentation with information about the limits of parallelism via places.

#lang racket

(define (go n alloc?)
  (place/context p
    (let ([v (vector (if alloc? 0.0 0))]
          [inc (if alloc? 1.0 1)])
      (let loop ([i 30])
        (unless (zero? i)
          (vector-set! v 0 (+ (vector-ref v 0) inc))
          (loop (sub1 i))))
      (printf "Place ~a done~n" n)
      n)))

(module+ main
  (define alloc? #t)
  (define cores
    (command-line
     #:once-each
     [("--no-alloc") "Non-allocating variant" (set! alloc? #f)]
     #:args (cores)
     (string->number cores)))

  (time
   (map place-wait
        (for/list ([i (in-range cores)])
          (printf "Starting core ~a~n" i)
          (go i alloc?)))))
Re: [racket-users] Places code not using all the CPU
I was trying to create a much more elaborate example when Matthew sent his tiny one, which is enough to show the problem. I started a 64-core machine on AWS to show the issue. I see a massive degradation as the number of places increases. I use this slightly modified code:

#lang racket

(define (go n)
  (place/context p
    (let ([v (vector 0.0)])
      (let loop ([i 30])
        (unless (zero? i)
          (vector-set! v 0 (+ (vector-ref v 0) 1.0))
          (loop (sub1 i))))
      (printf "Place ~a done~n" n)
      n)))

(module+ main
  (define cores
    (command-line
     #:args (cores)
     (string->number cores)))

  (time
   (map place-wait
        (for/list ([i (in-range cores)])
          (printf "Starting core ~a~n" i)
          (go i)))))

Here are the results in the video (might take a few minutes until it is live): https://youtu.be/cDe_KF6nmJM

The guide says about places: "The place form creates a place, which is effectively a new Racket instance that can run in parallel to other places, including the initial place." I think this is misleading at the moment. If this behaviour can be 'fixed', then great; if not, I will have to redesign my system to use 'subprocess' to start another racket process, and a footnote should be added to places in the documentation to alert users to this behaviour.

Matthew, Sam, do you understand why this is happening?

On 05/10/2018 16:51, Sam Tobin-Hochstadt wrote:
> I tried this same program on my desktop, which also has 4 (i7-4770)
> cores with hyperthreading. Here's what I see:
>
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 1
> N: 1, cpu: 5808/5808.0, real: 5804
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 2
> N: 2, cpu: 12057/6028.5, real: 6063
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 3
> N: 3, cpu: 23377/7792., real: 7914
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 4
> N: 4, cpu: 41155/10288.75, real: 10357
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 6
> N: 6, cpu: 89932/14988., real: 15687
> [samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 8
> N: 8, cpu: 165152/20644.0, real: 21104
>
> Real time goes up about 80% from 1-4 places, and then doubles again
> from 4 to 8. System time for 8 places is also about 10x what it is for
> 2 places, but only gets up to 2 seconds.
>
> On Fri, Oct 5, 2018 at 10:32 AM Matthew Flatt wrote:
>>
>> At Fri, 5 Oct 2018 15:36:04 +0200, Paulo Matos wrote:
>>> Again, I am really surprised that you mention that places are not
>>> separate processes. Documentation does say they are separate racket
>>> virtual machines; how is this accomplished if not by using separate
>>> processes?
>>
>> Each place is an OS thread within the Racket process. The virtual
>> machine is essentially instantiated once in each thread, where things
>> that look like global variables at the C level are actually
>> thread-local variables to make them place-specific. Still, there is
>> some sharing among the threads.
>>
>>> My workers are really doing Z3-style work - number crunching and lots of
>>> searching. No IO (writing to disk) or communication, so I would expect
>>> them to really max out all CPUs.
>>
>> My best guess is that it's memory-allocation bottlenecks, probably at
>> the point of using mmap() and mprotect(). Maybe things don't scale well
>> beyond the 4-core machines that I use.
>>
>> On my machines, the enclosed program can max out CPU use with system
>> time being a small fraction. It scales ok from 1 to 4 places (i.e.,
>> real time increased only some). The machine's cores are hyperthreaded,
>> and the example maxes out CPU utilization at 8 --- but it takes twice
>> as long in real time, so the hardware threads don't help much in this
>> case. Running two processes with 4 places takes about the same real
>> time as running one process with 8 places, as does 2 processes with 2
>> places.
>>
>> Do you see similar effects, or does this little example stop scaling
>> before the number of processes matches the number of cores?

-- 
Paulo Matos
Re: [racket-users] Places code not using all the CPU
I tried this same program on my desktop, which also has 4 (i7-4770) cores with hyperthreading. Here's what I see:

[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 1
N: 1, cpu: 5808/5808.0, real: 5804
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 2
N: 2, cpu: 12057/6028.5, real: 6063
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 3
N: 3, cpu: 23377/7792., real: 7914
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 4
N: 4, cpu: 41155/10288.75, real: 10357
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 6
N: 6, cpu: 89932/14988., real: 15687
[samth@huor:~/work/grant_parallel_compilers/nsf_submissions (master) plt] time r ~/Downloads/p.rkt 8
N: 8, cpu: 165152/20644.0, real: 21104

Real time goes up about 80% from 1-4 places, and then doubles again from 4 to 8. System time for 8 places is also about 10x what it is for 2 places, but only gets up to 2 seconds.

On Fri, Oct 5, 2018 at 10:32 AM Matthew Flatt wrote:
>
> At Fri, 5 Oct 2018 15:36:04 +0200, Paulo Matos wrote:
>> Again, I am really surprised that you mention that places are not
>> separate processes. Documentation does say they are separate racket
>> virtual machines; how is this accomplished if not by using separate
>> processes?
>
> Each place is an OS thread within the Racket process. The virtual
> machine is essentially instantiated once in each thread, where things
> that look like global variables at the C level are actually
> thread-local variables to make them place-specific. Still, there is
> some sharing among the threads.
>
>> My workers are really doing Z3-style work - number crunching and lots of
>> searching. No IO (writing to disk) or communication, so I would expect
>> them to really max out all CPUs.
>
> My best guess is that it's memory-allocation bottlenecks, probably at
> the point of using mmap() and mprotect(). Maybe things don't scale well
> beyond the 4-core machines that I use.
>
> On my machines, the enclosed program can max out CPU use with system
> time being a small fraction. It scales ok from 1 to 4 places (i.e.,
> real time increased only some). The machine's cores are hyperthreaded,
> and the example maxes out CPU utilization at 8 --- but it takes twice
> as long in real time, so the hardware threads don't help much in this
> case. Running two processes with 4 places takes about the same real
> time as running one process with 8 places, as does 2 processes with 2
> places.
>
> Do you see similar effects, or does this little example stop scaling
> before the number of processes matches the number of cores?
Re: [racket-users] Places code not using all the CPU
At Fri, 5 Oct 2018 15:36:04 +0200, Paulo Matos wrote:
> Again, I am really surprised that you mention that places are not
> separate processes. Documentation does say they are separate racket
> virtual machines; how is this accomplished if not by using separate
> processes?

Each place is an OS thread within the Racket process. The virtual machine is essentially instantiated once in each thread, where things that look like global variables at the C level are actually thread-local variables to make them place-specific. Still, there is some sharing among the threads.

> My workers are really doing Z3-style work - number crunching and lots of
> searching. No IO (writing to disk) or communication, so I would expect
> them to really max out all CPUs.

My best guess is that it's memory-allocation bottlenecks, probably at the point of using mmap() and mprotect(). Maybe things don't scale well beyond the 4-core machines that I use.

On my machines, the enclosed program can max out CPU use with system time being a small fraction. It scales ok from 1 to 4 places (i.e., real time increased only some). The machine's cores are hyperthreaded, and the example maxes out CPU utilization at 8 --- but it takes twice as long in real time, so the hardware threads don't help much in this case. Running two processes with 4 places takes about the same real time as running one process with 8 places, as does 2 processes with 2 places.

Do you see similar effects, or does this little example stop scaling before the number of processes matches the number of cores?

[attachment: p.rkt]
Re: [racket-users] Places code not using all the CPU
On 05/10/2018 14:15, Matthew Flatt wrote:
> It's difficult to be sure from your description, but it sounds like the
> problem may just be the usual one of scaling parallelism when
> communication is involved.

Matthew, thanks for the reply. The interesting thing here is that there is no communication between places _most of the time_. It works as a star topology where every worker communicates only with the master, and the master with all workers. This communication is relatively rare - a message is sent every few minutes.

> Red is probably synchronization. It might be synchronization due to the
> communication you have between places, it might be synchronization on
> Racket's internal data structures, or it might be that the OS has to
> synchronize actions from multiple places within the same process (e.g.,
> multiple places are allocating and calling OS functions like mmap and
> mprotect, which the OS has to synchronize within a process). We've
> tried to minimize sharing among places, and it's important that they
> can GC independently, but there are still various forms of sharing to
> manage internally. In contrast, running separate processes for Z3
> should scale well, especially if the Z3 task is compute-intensive with
> minimal I/O --- a best-case scenario for the OS.

So, here you have pointed out something that's surprising to me: "OS has to synchronize actions from multiple places within the same process (e.g., multiple places are allocating and calling OS functions like mmap and mprotect, which the OS has to synchronize within a process)." I thought each place was its own process, similar to issuing a call to racket itself on the body of the place. Now it seems places are all in the same process... in which case they'll probably even share mutexes, although these low-level details are a bit foggy in my mind.

> A parallel `raco setup` runs into similar issues. In recent development
> builds, you might experiment with passing `--processes` to `raco setup`
> to have it use separate processes instead of places within a single OS
> process, but I think you'll still find that it tops out well below your
> machine's compute capacity. Partly, dependencies constrain parallelism.
> Partly, the processes have to communicate more and there's a lot of
> I/O.

Again, I am really surprised that you mention that places are not separate processes. Documentation does say they are separate racket virtual machines; how is this accomplished if not by using separate processes?

My workers are really doing Z3-style work - number crunching and lots of searching. No IO (writing to disk) or communication, so I would expect them to really max out all CPUs.

-- 
Paulo Matos
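The `--processes` experiment Matthew mentions can be invoked as below. A sketch only: per the message above, the flag existed in recent development builds of Racket at the time of writing, and the worker count of 8 is an arbitrary example.

```shell
# Default parallel setup: workers are places (OS threads) inside one
# Racket process.
raco setup -j 8

# Variant suggested above: separate OS processes instead of places,
# for comparing kernel-level contention between the two modes.
raco setup -j 8 --processes
```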
Re: [racket-users] Places code not using all the CPU
It's difficult to be sure from your description, but it sounds like the problem may just be the usual one of scaling parallelism when communication is involved.

Red is probably synchronization. It might be synchronization due to the communication you have between places, it might be synchronization on Racket's internal data structures, or it might be that the OS has to synchronize actions from multiple places within the same process (e.g., multiple places are allocating and calling OS functions like mmap and mprotect, which the OS has to synchronize within a process). We've tried to minimize sharing among places, and it's important that they can GC independently, but there are still various forms of sharing to manage internally. In contrast, running separate processes for Z3 should scale well, especially if the Z3 task is compute-intensive with minimal I/O --- a best-case scenario for the OS.

A parallel `raco setup` runs into similar issues. In recent development builds, you might experiment with passing `--processes` to `raco setup` to have it use separate processes instead of places within a single OS process, but I think you'll still find that it tops out well below your machine's compute capacity. Partly, dependencies constrain parallelism. Partly, the processes have to communicate more and there's a lot of I/O.

At Fri, 5 Oct 2018 11:43:36 +0200, "'Paulo Matos' via Racket Users" wrote:
> All,
>
> A quick update on this problem which is in my critical path.
> I just noticed, in an attempt to reproduce it, that during the package
> setup part of the racket compilation procedure the same happens.
>
> I am running `make CPUS=24 in-place` on a 36-CPU machine and I see that
> not only do the racket processes sometimes go from status 'R' to 'D' (which
> also happens in my case), the CPUs are never really working at 100%, with
> a lot of the work being done at kernel level.
>
> Has anyone ever noticed this?
>
> On 01/10/2018 11:13, 'Paulo Matos' via Racket Users wrote:
>> Hi,
>>
>> I am not sure this is an issue with places or what it could be, but my
>> devops-fu is poor and I am not even sure how to debug something like
>> this, so maybe someone with more knowledge than me on this might chime in
>> to hint at a possible debug method.
>>
>> I was running some benchmarks and noticed something odd for the first
>> time (although it doesn't mean it was ok before, just that this is the
>> first time I am actually analysing this issue).
>>
>> My program (the master) will create N places (the workers), and each
>> place will start by issuing a rosette call which will trigger a call to
>> the z3 smt solver. So, N instances of Z3 will run, and after that is done
>> it will run pure racket code that implements a graph search algorithm.
>> These N worker places are actually in a sync call waiting for messages
>> from the master, and the work is being done by a thread on the worker
>> place. The master is either waiting for the timeout to arrive or for a
>> solution to be sent from a worker.
>>
>> The interesting thing is that when the Z3 instances are running, I get
>> all my 16 CPUs (on a dedicated machine) working at 100%. When the racket
>> code is running the search, they are all holding off at around 60%-80%,
>> with a huge portion of it in the kernel (red bars in htop).
>>
>> Since the Z3 calls come before the threads inside the places are started
>> and we get to the sync call, is it possible something bad is happening
>> in the sync call that uses the kernel so much? Take a look at htop
>> during Z3 and during the search - screenshots attached.
>>
>> Are there any suggestions on what the problem might be or how I could
>> start to understand why the kernel is so active?
>>
>> Kind regards,
Re: [racket-users] Places code not using all the CPU
All,

A quick update on this problem which is in my critical path. I just noticed, in an attempt to reproduce it, that during the package setup part of the racket compilation procedure the same happens.

I am running `make CPUS=24 in-place` on a 36-CPU machine and I see that not only do the racket processes sometimes go from status 'R' to 'D' (which also happens in my case), the CPUs are never really working at 100%, with a lot of the work being done at kernel level.

Has anyone ever noticed this?

On 01/10/2018 11:13, 'Paulo Matos' via Racket Users wrote:
> Hi,
>
> I am not sure this is an issue with places or what it could be, but my
> devops-fu is poor and I am not even sure how to debug something like
> this, so maybe someone with more knowledge than me on this might chime in
> to hint at a possible debug method.
>
> I was running some benchmarks and noticed something odd for the first
> time (although it doesn't mean it was ok before, just that this is the
> first time I am actually analysing this issue).
>
> My program (the master) will create N places (the workers), and each
> place will start by issuing a rosette call which will trigger a call to
> the z3 smt solver. So, N instances of Z3 will run, and after that is done
> it will run pure racket code that implements a graph search algorithm.
> These N worker places are actually in a sync call waiting for messages
> from the master, and the work is being done by a thread on the worker
> place. The master is either waiting for the timeout to arrive or for a
> solution to be sent from a worker.
>
> The interesting thing is that when the Z3 instances are running, I get
> all my 16 CPUs (on a dedicated machine) working at 100%. When the racket
> code is running the search, they are all holding off at around 60%-80%,
> with a huge portion of it in the kernel (red bars in htop).
>
> Since the Z3 calls come before the threads inside the places are started
> and we get to the sync call, is it possible something bad is happening
> in the sync call that uses the kernel so much? Take a look at htop
> during Z3 and during the search - screenshots attached.
>
> Are there any suggestions on what the problem might be or how I could
> start to understand why the kernel is so active?
>
> Kind regards,

-- 
Paulo Matos
Re: [racket-users] Places code not using all the CPU
I attach yet another example where this behaviour is much more noticeable. This is on a 64-core dedicated machine on Amazon AWS.