I was complaining about the performance of the handin server under load last weekend, and I figured I should try to actually measure the behavior of the server under load. I’ve observed something somewhat surprising, and I have some ideas, though I thought I’d ask here before doing something silly.
Context: About a week ago, my students had to write a store-passing interpreter for a small language. My solution is about 900 lines. I’m running the handin server on a VPS from Linode. It has 8 cores, but that’s pretty much irrelevant because I’m currently running only a single-threaded handin checker. It also has 8 gigabytes of RAM.

I wrote a small program (attached) to test the server under load. It loads a file into a text% and then submits it to the handin server at intervals of 1 second. Checking one submission takes about 30 seconds. Here’s a representative result:

'(success 31387.169189453125 ((message "checking submission username(s)") (message "reading submission") (message "creating text file") (message "checking submission") (message "executing your code") (message "running tests") (message-final "done testing. 0 tests failed.") (success)))

That is, it took 31.387 seconds from submission to finish, and produced the given sequence of messages and successes. Watching `top`, the memory usage of the racket process rises to about 450M and stays there for the duration of the checking.

With two simultaneous submissions, things scale pretty nicely:

'(success 40799.35205078125 ((message "checking submission username(s)") (message "reading submission") (message "creating text file") (message "checking submission") (message "executing your code") (message "running tests") (message-final "done testing. 0 tests failed.") (success)))

'(success 70203.70190429688 ((message "checking submission username(s)") (message "reading submission") (message "creating text file") (message "checking submission") (message "executing your code") (message "running tests") (message-final "done testing. 0 tests failed.") (success)))

Memory usage tops out as before, at 450-500M.
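Schematically, the tester is a timing harness of roughly the following shape. This is just a sketch, not the attached code; `submit-once` is a hypothetical stand-in for the call that actually performs one submission to the handin server.

```racket
#lang racket
;; Sketch of the stress-test harness: start one submission per second,
;; time each round trip in inexact milliseconds, and collect a result
;; of the form (success <ms> <outcome>) or (exn <ms> <message>) per
;; submission.  `submit-once` is a hypothetical stand-in for the real
;; submission call (not shown here).
(define (stress-test submit-once n)
  (define results (make-channel))
  (for ([i (in-range n)])
    (thread
     (lambda ()
       (define start (current-inexact-milliseconds))
       (define outcome
         (with-handlers ([exn:fail?
                          (lambda (e) (list 'exn (exn-message e)))])
           (list 'success (submit-once))))
       (channel-put results
                    (list* (car outcome)
                           (- (current-inexact-milliseconds) start)
                           (cdr outcome)))))
    (sleep 1))
  ;; wait for all n results (order of completion, not submission)
  (for/list ([_ (in-range n)]) (channel-get results)))
```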
However, things get bad with 4 simultaneous submissions:

'(success 60706.62109375 ((message "checking submission username(s)") (message "reading submission") (message "creating text file") (message "checking submission") (message "executing your code") (message "running tests") (message-final "done testing. 0 tests failed.") (success)))

'(success 112289.29296875 ((message "checking submission username(s)") (message "reading submission") (message "creating text file") (message "checking submission") (message "executing your code") (message "running tests") (message-final "done testing. 0 tests failed.") (success)))

'(exn 124927.43603515625 "upload error: handin terminated due to time limit (program doesn't terminate?)")

'(success 147680.46704101562 ((message "checking submission username(s)") (message "reading submission") (message "creating text file") (message "checking submission") (message "executing your code") (message "running tests") (message-final "done testing. 0 tests failed.") (success)))

The first one takes 60 seconds, the second takes 112 seconds, the third times out (the timeout is set to 120 seconds), and the fourth finishes at 150 seconds (the timeout counter gets reset when tests start running). This result appears to be characteristic.

The interesting thing to note here is that in the 150 seconds taken to unsuccessfully evaluate four simultaneous submissions, the handin engine could have successfully finished five submissions if they’d been evaluated sequentially, at about 30 seconds each. For this load, then, it looks like it would make more sense to simply put submissions in a queue and handle them one at a time. This has the obvious drawback that students with submissions that fail fast (no test coverage, lines longer than 120 chars, written in the wrong language) will probably take much (much) longer to discover it, but honestly, I don’t mind penalizing those who failed to check things on their own machines before submitting.
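The queueing could be as simple as a semaphore serializing the existing checker. A minimal sketch, where `run-checker` is a hypothetical stand-in for the real per-submission checking procedure:

```racket
#lang racket
;; Sketch: serialize checking so only one submission is checked at a
;; time.  Each connection thread blocks on the semaphore until the
;; checker is free, which gives queue-like behavior without changing
;; the checker itself.  `run-checker` is a hypothetical stand-in for
;; the real per-submission checking procedure.
(define check-sema (make-semaphore 1))

(define (check-queued run-checker submission)
  (call-with-semaphore check-sema
                       (lambda () (run-checker submission))))
```

One consequence to think through: the submission timeout would presumably need to start counting only once a submission leaves the queue, not while it is waiting.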
So, before I plunge into implementation-land: any thoughts about this? Anything obviously wrong with this idea?

*** As an aside, another easy addition would be to have a list of host/port combinations and have the handin client choose one at random to submit to. The obvious problem with this would be the post-facto synchronization of the handin results. ***

Oh! One other thing: it kills my students when their handins run out of memory or time and they get no info on test case passes, and neither do I. It seems to me like it would be fairly straightforward to use a logger to split the test-case output into a separate stream and write the two streams separately. Any thoughts about this?

John

--
You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
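P.S. In case it clarifies the logger idea, here is a minimal sketch of sending per-test-case results through a dedicated logger so they can be written out incrementally and survive a timeout or memory kill. All names here are illustrative, not taken from the handin server.

```racket
#lang racket
;; Sketch: report each test case through its own logger topic, and
;; have a receiver thread write the messages out as they arrive, so
;; partial results survive a timeout or out-of-memory kill.
(define-logger tests)  ; defines tests-logger, log-tests-info, ...

;; A receiver thread drains the log and records each message
;; immediately, e.g. to a per-student results file:
(define receiver (make-log-receiver tests-logger 'info))
(thread
 (lambda ()
   (let loop ()
     (match-define (vector level msg data topic) (sync receiver))
     (displayln msg)  ; or append to the student's results file
     (loop))))

;; In the checker, each test would report itself as it runs, e.g.:
;; (log-tests-info "test ~a: ~a" name (if passed "passed" "FAILED"))
```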
server-stress-test.rkt