Thanks: I've pushed the synchronization code to the handin repo. And yes, you're right that `thread` (which is what I think the handin server uses) doesn't make use of multiple cores. Places sound like the right construct to use here, but I can already predict one problem: 2htdp/universe depends on racket/gui, so it can't be run except in the main place.

My best guess at a solution would be to provide an alternative version of 2htdp/universe that doesn't depend on racket/gui (and so doesn't actually open any windows or do any GUI stuff, but instead either raises errors or simply simulates the tick handler only) and then set up a namespace for evaluating student programs in another place that has that dummy version of 2htdp/universe. (I think the dummy version of 2htdp/universe would be useful even without different places for running tests on student code, so the server doesn't have windows popping up anyway!)

That is a more significant change, however, and would probably take the form of a change to the handin server that sets up the separate places. I'd be happy to provide advice if you want to look into this change. I don't think it will be too hard, but it will require actual work. Or I could put it on a list of things I want to work on (that sadly doesn't seem to ever shrink...).

I imagine it would be possible to do this in the checker itself if you wanted to experiment there. (That wouldn't help with the 2htdp/universe dependency problem, however.)

Providing an alternate version of 2htdp/universe is also in the category of not-too-hard, but requiring-work. Mostly, I guess, refactoring the library to move dependencies around and then building a simple layer on top of the refactored code that avoids the racket/gui dependency. (Or maybe it's already factored well!)

Robby

On Tue, Dec 1, 2015 at 6:55 AM, Paolo Giarrusso <p.giarru...@gmail.com> wrote:
> Hi!
> After a new deadline, I got good news and bad news.
>
> # Good news
>
> I think *this* bug is fixed. Evidence: instead of crashing at the
> first reboot under load, the server survived 10-20 automated
> reboots with the students submitting en masse without ever showing
> the bug. So not only does the patch make sense, but it seems to be
> for the same bug.
>
> # Bad news
>
> The setup still didn't scale, though this wasn't as bad, and part of
> it was due to my setup. One student compared it to new releases from
> Blizzard. Meanwhile, our beefy server isn't even remotely sweating O_O.
>
> So I'd like to understand Racket threads and the handin server, to
> plan accordingly:
>
> - Does the whole handin server actually run on *one* processor,
>   because of Racket multithreading?
> - What's your largest deployment with active checkers?
> - Do you actually use this for HtDP courses, or only for advanced
>   classes (as a number of signs suggest)?
>
> 1. Under load, requesting the home page takes more than 20 seconds, so
>    my watchdog script restarts the server. We have a watchdog script
>    because when we didn't, the server just hung sometimes, so for our
>    previous lecture (smaller, only ~100 students instead of 500, and no
>    checkers) this watchdog script did wonders.
> 2. Here's the watchdog:
>    curl --max-time 20 -s https://handin-ps.informatik.uni-tuebingen.de:7979/ > /dev/null || {
>      docker restart handin-server-production; }
>    That's even running every minute :-(
>
> Usually that request takes 20 ms, so (naive me thought) how on Earth
> could this balloon to 20 seconds?
> Now that I know of Racket threads, I understand: that includes both
> the web server and the checkers, together with an unspecified number
> of big-bang instances from students. For extra fun, one student
> called animate with big-bang as step function: essentially, a sweet
> HtDP fork bomb.
>
> I don't expect a patch for this; I'm just trying to understand things
> and contemplating workarounds, beyond a more lenient watchdog (or
> disabling it altogether and acting by hand), which I guess won't be
> enough.
>
> Cheers,
> Paolo
>
> On 29 November 2015 at 16:12, Robby Findler <ro...@eecs.northwestern.edu>
> wrote:
>>
>> On Sunday, November 29, 2015, Paolo Giarrusso <p.giarru...@gmail.com> wrote:
>>>
>>> On Friday, November 27, 2015 at 3:44:20 AM UTC+1, Robby Findler wrote:
>>> > Yes, I think you're right. I originally wrote that because I was
>>> > thinking that this code might be involved in evaluating the user's
>>> > submission, but I am not pretty sure I was wrong about that.
>>>
>>> "not pretty sure"?
>>
>> Sorry. No "not".
>>
>>>
>>> AFAICS, `auto-reload-value` is used to extract the `checker` binding from
>>> the various checker.rkt, but the lock will not be held while running
>>> `checker`. (Luckily we're not using hooks, I haven't studied that code).
>>
>> Yes, that's also what I noticed and why I sent a second diff. Or did I miss
>> another place?
>
> Was just rechecking because of the above confusion. We agree.
>
> --
> Paolo G. Giarrusso - Ph.D. Student, Tübingen University
> http://ps.informatik.uni-tuebingen.de/team/giarrusso/
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
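
On the "more lenient watchdog" mentioned in the quoted message: one possible shape, as a minimal sketch only. It restarts the container after several consecutive failed probes instead of after the first one. The URL, the 20-second timeout and the container name are taken from the watchdog quoted above; STATE_FILE, MAX_FAILURES, the function names and the "run" argument are hypothetical and not part of anyone's actual setup.

```shell
#!/bin/sh
# Lenient watchdog sketch: restart only after MAX_FAILURES consecutive
# failed probes. STATE_FILE, MAX_FAILURES and all function names are
# assumptions made for illustration.

URL="${URL:-https://handin-ps.informatik.uni-tuebingen.de:7979/}"
CONTAINER="${CONTAINER:-handin-server-production}"
STATE_FILE="${STATE_FILE:-/tmp/handin-watchdog.failures}"
MAX_FAILURES="${MAX_FAILURES:-3}"

# Succeeds if the home page answers within 20 seconds.
probe() {
    curl --max-time 20 -s "$URL" > /dev/null
}

# Same restart command as the original watchdog.
restart_server() {
    docker restart "$CONTAINER"
}

# One watchdog run: a successful probe resets the failure counter; a
# failed probe increments it, and the restart happens only once the
# counter reaches MAX_FAILURES (the counter is then reset).
watchdog_step() {
    if probe; then
        echo 0 > "$STATE_FILE"
        return 0
    fi
    failures=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
    failures=$((failures + 1))
    if [ "$failures" -ge "$MAX_FAILURES" ]; then
        echo 0 > "$STATE_FILE"
        restart_server
    else
        echo "$failures" > "$STATE_FILE"
    fi
}

# Run one step only when invoked with "run", so the file can be sourced.
if [ "${1:-}" = "run" ]; then
    watchdog_step
fi
```

Invoked every minute as `sh watchdog.sh run` (the file name is also hypothetical), this would tolerate up to two slow minutes before restarting. As the quoted message already guesses, that only avoids restart loops under load; it does nothing about the server running on one core.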