I don't know what's going on here, but could it be that two threads
are, in parallel, trying to load the same implementation of an
unloaded checker and then stomping on each other?

The file handin-server/private/reloadable has some dynamic-requires
without appropriate synchronization around them, at least as far as I
can see, which seems suspicious.
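
If unsynchronized loading does turn out to be the cause, one way to rule it out would be to funnel every load through a single lock. The following is only a minimal sketch of that idea, not the handin server's actual code: `load-lock` and `locked-dynamic-require` are hypothetical names, and it assumes that one coarse, global lock is acceptable for how often checkers are reloaded.

```racket
#lang racket/base
;; Hypothetical sketch: serialize module loading so that two threads never
;; run dynamic-require at the same time. The names below are illustrative
;; and do not come from handin-server/private/reloadable.

;; A binary semaphore used as a mutex around all loads.
(define load-lock (make-semaphore 1))

;; Like dynamic-require, but only one thread at a time may be loading.
;; call-with-semaphore waits on the semaphore, calls the thunk, and posts
;; the semaphore again when the thunk returns or raises an exception.
(define (locked-dynamic-require mod-path name)
  (call-with-semaphore
   load-lock
   (lambda () (dynamic-require mod-path name))))

;; Usage: several threads request the same binding concurrently; the
;; loads themselves happen one after another.
(define workers
  (for/list ([i (in-range 4)])
    (thread (lambda () (locked-dynamic-require 'racket/list 'first)))))
(for-each thread-wait workers)
```

A single global lock is the simplest choice; a per-module lock would allow unrelated checkers to load in parallel, at the cost of more bookkeeping.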

Robby


On Wed, Nov 25, 2015 at 6:35 AM, Paolo Giarrusso <p.giarru...@gmail.com> wrote:
> Hi all,
> it's me, handin server guy again. Sorry to bother.
>
> Our handin server started "crashing" with "bad variable linkage" errors at 
> deadline time (presumably under somewhat high load), and since it happened 
> twice, I thought I'd report it. Any ideas on what's causing this?
>
> After this "crash", the server keeps running, but rejects all submissions 
> because the same checker keeps not loading.
>
> ==
>
> [1|2015-11-23T14:51:31] (re)loading module from (file 
> /var/handin_config/info1-teaching-material/checkers/06-Datentypen/REDACTED-USER-NAME/../checker.rkt)
> [1|2015-11-23T14:51:33] ERROR: link: bad variable linkage;
> [1|2015-11-23T14:51:33]  reference to a variable that is uninitialized
> [1|2015-11-23T14:51:33]   reference phase level: 0
> [1|2015-11-23T14:51:33]   variable module: 
> "/var/handin_home/handin/handin-server/checker.rkt"
> [1|2015-11-23T14:51:33]   variable phase: 0
> [1|2015-11-23T14:51:33]   reference in module: 
> "/var/handin_config/info1-teaching-material/checkers/checker-extras.rkt"
> [1|2015-11-23T14:51:33]   in: submission-eval
>
> Bigger log fragment available at 
> https://gist.github.com/Blaisorblade/7f9c6e7f4f456b588a8a
>
> Other info:
> - Restarting the server does fix the error. Somehow.
> - For those unfamiliar with the handin server: it has code which 
> automatically reloads checkers, as witnessed by the log above 
> (https://github.com/ps-tuebingen/handin/blob/master/handin-server/private/reloadable.rkt).
>  But that code doesn't fix the problem.
> - Googling suggests that stale compiled code might be there. But the source 
> code hadn't changed. (Also, I found no description of how this arises).
> - Since the server sometimes gets "stuck", I built a trivial watchdog (a 
> cronjob) that restarts the server if the status server becomes too slow. The 
> above happened after the server was restarted by the watchdog.
>
> One set of hypotheses:
> is it possible that stopping the server at the wrong moment corrupts compiled 
> files? (But then, why does the first restart not fix the problem?)
> Do you take care to make compilation atomic with `rename`?
>
> However, according to docs, the server is designed to survive brutal restarts.
>
> One non-standard thing I do is that I have a `checker-extras.rkt` module with 
> some utilities shared across checkers*, and that's not deployed as part of 
> the server (for various reasons), but together with the checkers, so it's 
> loaded with (require "../checker-extras.rkt"), and seems to be compiled, 
> probably when starting the server. Could this interfere badly with the 
> reloading code or with restarting?
>
> *I'm aware of your checker utilities, but here we have slightly different 
> requirements.
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
