Hi!
After a new deadline, I got good news and bad news.

# Good news

I think *this* bug is fixed. Evidence: instead of crashing at the
first reboot under load, the server survived 10-20 automated
reboots with the students submitting en masse without ever showing
the bug. So not only does the patch make sense, it also seems to
address the same bug.

# Bad news

The setup still didn't scale, though this time it wasn't as bad, and
part of the problem was due to my setup. One student compared it to new
releases from Blizzard, while our beefy server isn't even remotely
sweating O_O.

So I'd like to understand Racket threads and the handin server, to
plan accordingly:

- Does the whole handin server actually run on *one* processor,
because of Racket multithreading?
- What's your largest deployment with active checkers?
- Do you actually use this for HtDP courses, or only for advanced
classes (as a number of signs suggest)?

1. Under load, requesting the home page takes more than 20 seconds, so
my watchdog script restarts the server. We have a watchdog script
because without it the server sometimes just hung; for our
previous lecture (smaller, only ~100 students instead of 500, and no
checkers) this watchdog script did wonders.
2. Here's the watchdog:

    curl --max-time 20 -s \
      https://handin-ps.informatik.uni-tuebingen.de:7979/ > /dev/null || {
      docker restart handin-server-production; }

That's even running every minute :-(

Usually that request takes 20 ms, so (naive me thought) how on Earth
could this balloon to 20 seconds?
Now that I know of Racket threads, I understand: the load includes both
the web server and the checkers, together with an unspecified number
of big-bang instances from students. For extra fun, one student
called animate with big-bang as step function: essentially, a sweet
HtDP fork bomb.

I don't expect a patch for this; I'm just trying to understand things
and contemplating workarounds beyond a more lenient watchdog (or
disabling it altogether and acting by hand), which I guess won't be
enough.
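
A minimal sketch of what a "more lenient watchdog" might look like: it restarts the container only after several consecutive failed probes, rather than after a single slow response. The 60-second timeout, the `MAX_FAILURES` threshold, the `STATE_FILE` path, and the function names are all illustrative assumptions, not something already in use; only the URL and container name come from the one-liner above.

```shell
#!/bin/sh
# Hypothetical lenient watchdog: restart only after MAX_FAILURES
# consecutive failed probes. All tunables below are assumptions.

URL="https://handin-ps.informatik.uni-tuebingen.de:7979/"
CONTAINER="handin-server-production"
STATE_FILE="${STATE_FILE:-/tmp/handin-watchdog.failures}"  # assumed location
MAX_FAILURES="${MAX_FAILURES:-3}"                          # assumed threshold

# Succeeds if the home page answers within the (assumed) 60 s budget.
probe() {
    curl --max-time 60 -s "$URL" > /dev/null
}

restart_server() {
    docker restart "$CONTAINER"
}

# One watchdog tick: reset the counter on success, otherwise count the
# failure and restart once the threshold is reached.
watchdog_step() {
    if probe; then
        echo 0 > "$STATE_FILE"
        return 0
    fi
    failures=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
    failures=${failures:-0}
    failures=$((failures + 1))
    if [ "$failures" -ge "$MAX_FAILURES" ]; then
        restart_server
        failures=0
    fi
    echo "$failures" > "$STATE_FILE"
}

# Run a single tick only when invoked as: watchdog.sh run
if [ "${1:-}" = "run" ]; then
    watchdog_step
fi
```

Invoked every minute with the `run` argument, this would tolerate a couple of slow minutes under submission load before restarting. Whether that is lenient enough depends on how long the load spikes actually last.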

Cheers,
Paolo

On 29 November 2015 at 16:12, Robby Findler <ro...@eecs.northwestern.edu> wrote:
>
>
> On Sunday, November 29, 2015, Paolo Giarrusso <p.giarru...@gmail.com> wrote:
>>
>> On Friday, November 27, 2015 at 3:44:20 AM UTC+1, Robby Findler wrote:
>> > Yes, I think you're right. I originally wrote that because I was
>> > thinking that this code might be involved in evaluating the user's
>> > submission, but I am not pretty sure I was wrong about that.
>>
>> "not pretty sure"?
>
>
> Sorry. No "not".
>
>
>>
>>
>> AFAICS, `auto-reload-value` is used to extract the `checker` binding from
>> the various checker.rkt files, but the lock will not be held while running
>> `checker`. (Luckily we're not using hooks; I haven't studied that code.)
>
>
> Yes that's also what I noticed and why I sent a second diff. Or did I miss
> another place?

Was just rechecking because of the above confusion. We agree.

-- 
Paolo G. Giarrusso - Ph.D. Student, Tübingen University
http://ps.informatik.uni-tuebingen.de/team/giarrusso/
