I was complaining last weekend about the performance of the handin server under 
load, and I figured I should actually measure its behavior. I've observed 
something somewhat surprising, and I have some ideas, but I thought I'd ask 
here before doing something silly.

Context: About a week ago, my students had to write a store-passing interpreter 
for a small language. My solution is about 900 lines. I'm running the handin 
server on a VPS from Linode. It has 8 cores, but that's pretty much irrelevant 
because I'm currently running only a single-threaded handin checker. It also 
has 8 GB of RAM.

I wrote a small program (attached) to stress-test the server. It loads a file 
into a text% and then submits it to the handin server at one-second intervals.
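
The shape of the tester is roughly the following; `submit-one` here is a 
hypothetical stand-in for the actual handin-client call in the attached file, 
which blocks until the server replies:

#lang racket
;; Rough shape of the stress tester. `submit-one` is a placeholder
;; for the real handin-client submission call in the attached file.
(define (submit-one contents)
  (error 'submit-one "replace with the actual handin-client call"))

(define (stress n contents)
  ;; launch n submissions one second apart, each in its own thread
  (define workers
    (for/list ([i (in-range n)])
      (sleep 1)
      (thread
       (lambda ()
         (define start (current-inexact-milliseconds))
         (define result
           (with-handlers ([exn:fail? (lambda (e) (list 'exn (exn-message e)))])
             (list 'success (submit-one contents))))
         ;; print tag, elapsed milliseconds, and the server's messages
         (printf "~s\n" (list (car result)
                              (- (current-inexact-milliseconds) start)
                              (cadr result)))))))
  (for-each thread-wait workers))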

Checking one submission takes about 30 seconds. Here’s a representative result:

'(success
  31387.169189453125
  ((message "checking submission username(s)")
   (message "reading submission")
   (message "creating text file")
   (message "checking submission")
   (message "executing your code")
   (message "running tests")
   (message-final "done testing. 0 tests failed.")
   (success)))

That is, it took 31.387 seconds from submission to completion, and produced the 
sequence of messages shown, ending in success.

Watching `top`, it looks like memory usage of the racket process rises to about 
450 MB and stays there for the duration of the check.

With two submissions, things scale pretty nicely:

'(success
  40799.35205078125
  ((message "checking submission username(s)")
   (message "reading submission")
   (message "creating text file")
   (message "checking submission")
   (message "executing your code")
   (message "running tests")
   (message-final "done testing. 0 tests failed.")
   (success)))
'(success
  70203.70190429688
  ((message "checking submission username(s)")
   (message "reading submission")
   (message "creating text file")
   (message "checking submission")
   (message "executing your code")
   (message "running tests")
   (message-final "done testing. 0 tests failed.")
   (success)))

Memory usage tops out at 450-500 MB, as before.

However, things get bad with four simultaneous submissions:

'(success
  60706.62109375
  ((message "checking submission username(s)")
   (message "reading submission")
   (message "creating text file")
   (message "checking submission")
   (message "executing your code")
   (message "running tests")
   (message-final "done testing. 0 tests failed.")
   (success)))
'(success
  112289.29296875
  ((message "checking submission username(s)")
   (message "reading submission")
   (message "creating text file")
   (message "checking submission")
   (message "executing your code")
   (message "running tests")
   (message-final "done testing. 0 tests failed.")
   (success)))
'(exn
  124927.43603515625
  "upload error: handin terminated due to time limit (program doesn't terminate?)")
'(success
  147680.46704101562
  ((message "checking submission username(s)")
   (message "reading submission")
   (message "creating text file")
   (message "checking submission")
   (message "executing your code")
   (message "running tests")
   (message-final "done testing. 0 tests failed.")
   (success)))

The first one finishes at 60 seconds, the second at 112 seconds, the third 
times out (the timeout is set to 120 seconds), and the fourth finishes at about 
148 seconds (it escapes the timeout because the timeout counter is reset when 
the tests start running). This result appears to be typical.

The interesting thing to note: evaluating these four submissions concurrently 
took about 150 seconds and still lost one of them to the timeout, while at 
roughly 30 seconds apiece the handin engine could have successfully finished 
five submissions in that time had they been run sequentially.

For this load, then, it looks like it would make more sense to simply put 
submissions in a queue, and handle them one at a time.
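
Concretely, I'm imagining something like this inside the server: a single 
worker thread draining a queue of jobs, with each handin session thread 
blocking until its check has run. This is just a sketch; `run-check` stands in 
for the existing per-submission checking code.

#lang racket
(require racket/async-channel)

;; One worker drains an unbounded queue, so at most one submission
;; is ever being checked at a time.
(define jobs (make-async-channel))

(define worker
  (thread
   (lambda ()
     (let loop ()
       ((async-channel-get jobs)) ; run the next queued job to completion
       (loop)))))

;; Called from each handin session thread: enqueue the check, then
;; block until the worker has run it, re-raising any checker error.
(define (check-in-queue run-check)
  (define done (make-channel))
  (async-channel-put
   jobs
   (lambda ()
     (channel-put done
                  (with-handlers ([exn? (lambda (e) e)]) ; keep the worker alive
                    (run-check)))))
  (define result (channel-get done))
  (if (exn? result) (raise result) result))

One wrinkle: the 120-second timeout would have to start when a job is dequeued 
rather than when it arrives, or submissions waiting in the queue would time out 
before they ever ran.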

This has the obvious drawback that students whose submissions fail fast (no 
test coverage, lines longer than 120 characters, written in the wrong language) 
will probably take much (much) longer to find out, but honestly, I don't mind 
penalizing those who failed to check things on their own machines before 
submitting.

So, before I plunge into implementation-land: any thoughts about this? Anything 
obviously wrong with this idea?

***

As an aside, another easy addition would be to keep a list of host/port 
combinations and have the handin client choose one at random to submit to. The 
obvious problem with this would be merging the handin results from the several 
servers after the fact.
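
The client-side half is trivial; something like this, with hosts and ports 
made up:

;; hypothetical server list; the client picks one uniformly at random
(define servers
  '(("handin-a.example.edu" . 7979)
    ("handin-b.example.edu" . 7979)))

(define (pick-server)
  (list-ref servers (random (length servers))))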

***

Oh! One other thing: it kills my students when their handins run out of memory 
or time and they get no info on which test cases passed, and neither do I. It 
seems to me it would be fairly straightforward to use a logger to split the 
test-case output into a different stream and write it out separately. Any 
thoughts about this?
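
Something like the following is what I have in mind. It assumes the checker 
reports each test case through a dedicated logger; a receiver thread flushes 
those messages to a file as they arrive, so whatever ran before the limit was 
hit is preserved. The thread would have to live outside the custodian that 
gets shut down on a timeout, so it isn't killed along with the submission.

#lang racket
;; Sketch: route per-test-case results through a logger so they
;; survive the submission being killed for time or memory.
(define test-logger (make-logger 'test-cases))
(define receiver (make-log-receiver test-logger 'info))

(define writer
  (thread
   (lambda ()
     (call-with-output-file "test-output.log"
       #:exists 'append
       (lambda (out)
         (let loop ()
           ;; log events are vectors: #(level message data topic)
           (define evt (sync receiver))
           (fprintf out "~a\n" (vector-ref evt 1))
           (flush-output out) ; flush each line so a later kill loses nothing
           (loop)))))))

;; in the checker, once per test case:
;; (log-message test-logger 'info "test 7: passed" #f)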



John



Attachment: server-stress-test.rkt
