Re: [racket-users] "bad variable linkage" after restarting handin server under load

2015-12-02 Thread Robby Findler
Thanks: I've pushed the synchronization code to the handin repo.

And yes, you're right that `thread` (which is what I think the handin
server uses) doesn't make use of multiple cores. Places sound like the
right construct to use here, but I can already predict one problem:
2htdp/universe depends on racket/gui so it can't be run except in the
main place.
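
For readers unfamiliar with places, here is a minimal sketch of handing work to a second place, which (unlike a `thread`) can run on another core. The name `square-in-place` and the squaring workload are made up for illustration; nothing here is taken from the handin server.

```racket
#lang racket/base
(require racket/place)
(provide square-in-place)

;; Hypothetical helper: starts a fresh place, sends it a number,
;; and waits for the squared result to come back.
(define (square-in-place n)
  (define worker
    (place ch
      ;; This body runs in the new place; `ch` is its end of the channel.
      (define m (place-channel-get ch))
      (place-channel-put ch (* m m))))
  (place-channel-put worker n)
  (place-channel-get worker))
```

Only values that are allowed as place messages (numbers, strings, lists of those, and so on) can cross the channel, so a real checker would have to send submissions and results in such a form.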

My best guess at a solution would be to provide an alternative version
of 2htdp/universe that doesn't depend on racket/gui (and so doesn't
actually open any windows or do any GUI stuff but instead either
raises errors or simply simulates the tick handler only) and then set
up a namespace for evaluating student programs in another place that
has that dummy version of 2htdp/universe. (I think that the dummy
version of 2htdp/universe would be useful even without different
places for running tests on student code so the server doesn't have
windows popping up anyway!).
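
As a rough sketch of what "simulates the tick handler only" could mean, the loop below repeatedly applies a tick handler to a world state without opening any window. The name `simulate-ticks` and its arguments are hypothetical; this is not the 2htdp/universe API, only an illustration of the idea under the assumption that a tick budget is an acceptable stand-in for real time.

```racket
#lang racket/base
(provide simulate-ticks)

;; Hypothetical headless stand-in for a big-bang run:
;; applies on-tick until stop? holds or max-ticks ticks have elapsed,
;; then returns the final world state. No GUI is involved.
(define (simulate-ticks initial-state on-tick stop? max-ticks)
  (let loop ([state initial-state] [ticks 0])
    (if (or (>= ticks max-ticks) (stop? state))
        state
        (loop (on-tick state) (add1 ticks)))))
```

The tick budget also bounds how long a student program can keep the server busy, which seems relevant to the load problem discussed in this thread.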

That is a more significant change, however, and would probably
take the form of a change to the handin server that sets up the
separate places. I'd be happy to provide advice if you want to look
into this. I don't think it will be too hard, but it will
require actual work. Or I could put it on a list of things I want to
work on (that sadly doesn't seem to ever shrink...)

I imagine it would be possible to do this in the checker itself if you
wanted to experiment there. (That wouldn't help with the
2htdp/universe dependency problem, however.)

Providing an alternate version of 2htdp/universe is also in the
category of not-too-hard, but requiring-work. Mostly, I guess,
refactoring the library to move dependencies around and then building
a simple layer on top of the refactored code that avoids the
racket/gui dependency. (Or maybe it's already factored well!)

Robby



On Tue, Dec 1, 2015 at 6:55 AM, Paolo Giarrusso  wrote:
> Hi!
> After a new deadline, I got good news and bad news.
>
> # Good news
>
> I think *this* bug is fixed. Evidence: instead of crashing at the
> first reboot under load, the server survived to 10-20 automated
> reboots with the students submitting en masse without never showing
> the bug. So not only the patch makes sense, but it seems to be for the
> same bug.
>
> # Bad news
>
> The setup still didn't scale, though this wasn't as bad, and part of
> it was due to my setup. One student compared it to new releases from
> Blizzard. While our beefy server isn't even remotely sweating O_O.
>
> So I'd like to understand Racket threads and the handin server, to
> plan accordingly:
>
> - Does the whole handin server actually run on *one* processor,
> because of Racket multithreading?
> - What's your largest deployment with active checkers?
> - Do you actually use this for HtDP courses, or only for advanced
> classes (as a number of signs suggest)?
>
> 1. Under load, requesting the home page takes more than 20 seconds, so
> my watchdog scripts restarts the server. We have a watchdog script
> because when we didn't, the server just hung sometimes, so for our
> previous lecture (smaller, only ~100 students instead of 500, and no
> checkers) this watchdog script did wonders.
> 2. Here's the watchdog:
> curl --max-time 20 -s
> https://handin-ps.informatik.uni-tuebingen.de:7979/ > /dev/null || {
> docker restart handin-server-production; }
> That's even running every minute :-(
>
> Usually that request takes 20 ms, so (naive me thought) how on Earth
> could this balloon to 20 seconds?
> Now that I know of Racket threads, I understand: that includes both
> the web server and the checkers, together with an unspecified number
> of big-bang instances from students. For extra fun, one students
> called animate with big-bang as step function — essentially, a sweet
> HtDP fork bomb.
>
> I don't expect a patch for this, I'm just trying to understand things
> and contemplating workarounds, beyond a more lenient watchdog (or
> disabling it altogether and acting by hand), which I guess won't be
> enough.
>
> Cheers,
> Paolo
>
> On 29 November 2015 at 16:12, Robby Findler  
> wrote:
>>
>>
>> On Sunday, November 29, 2015, Paolo Giarrusso  wrote:
>>>
>>> On Friday, November 27, 2015 at 3:44:20 AM UTC+1, Robby Findler wrote:
>>> > Yes, I think you're right. I originally wrote that because I was
>>> > thinking that this code might be involved in evaluating the user's
>>> > submission, but I am not pretty sure I was wrong about that.
>>>
>>> "not pretty sure"?
>>
>>
>> Sorry. No "not".
>>
>>
>>>
>>>
>>> AFAICS, `auto-reload-value` is used to extract the `checker` binding from
>>> the various checker.rkt. but the lock will not be held while running
>>> `checker`. (Luckily we're not using hooks, I haven't studied that code).
>>
>>
>> Yes that's also what I noticed and why I sent a second diff. Or did I miss
>> another place?
>
> Was just rechecking because of the above confusion. We agree.
>
> --
> Paolo G. Giarrusso - 

Re: [racket-users] "bad variable linkage" after restarting handin server under load

2015-12-01 Thread Paolo Giarrusso
Hi!
After a new deadline, I got good news and bad news.

# Good news

I think *this* bug is fixed. Evidence: instead of crashing at the
first reboot under load, the server survived 10-20 automated
reboots with the students submitting en masse without ever showing
the bug. So not only does the patch make sense, but it seems to be for
the same bug.

# Bad news

The setup still didn't scale, though this wasn't as bad, and part of
it was due to my setup. One student compared it to new releases from
Blizzard. While our beefy server isn't even remotely sweating O_O.

So I'd like to understand Racket threads and the handin server, to
plan accordingly:

- Does the whole handin server actually run on *one* processor,
because of Racket multithreading?
- What's your largest deployment with active checkers?
- Do you actually use this for HtDP courses, or only for advanced
classes (as a number of signs suggest)?

1. Under load, requesting the home page takes more than 20 seconds, so
my watchdog script restarts the server. We have a watchdog script
because when we didn't, the server just hung sometimes, so for our
previous lecture (smaller, only ~100 students instead of 500, and no
checkers) this watchdog script did wonders.
2. Here's the watchdog:
curl --max-time 20 -s \
  https://handin-ps.informatik.uni-tuebingen.de:7979/ > /dev/null || {
  docker restart handin-server-production; }
That's even running every minute :-(

Usually that request takes 20 ms, so (naive me thought) how on Earth
could this balloon to 20 seconds?
Now that I know of Racket threads, I understand: that includes both
the web server and the checkers, together with an unspecified number
of big-bang instances from students. For extra fun, one student
called animate with big-bang as step function: essentially, a sweet
HtDP fork bomb.

I don't expect a patch for this, I'm just trying to understand things
and contemplating workarounds, beyond a more lenient watchdog (or
disabling it altogether and acting by hand), which I guess won't be
enough.

Cheers,
Paolo

On 29 November 2015 at 16:12, Robby Findler  wrote:
>
>
> On Sunday, November 29, 2015, Paolo Giarrusso  wrote:
>>
>> On Friday, November 27, 2015 at 3:44:20 AM UTC+1, Robby Findler wrote:
>> > Yes, I think you're right. I originally wrote that because I was
>> > thinking that this code might be involved in evaluating the user's
>> > submission, but I am not pretty sure I was wrong about that.
>>
>> "not pretty sure"?
>
>
> Sorry. No "not".
>
>
>>
>>
>> AFAICS, `auto-reload-value` is used to extract the `checker` binding from
>> the various checker.rkt. but the lock will not be held while running
>> `checker`. (Luckily we're not using hooks, I haven't studied that code).
>
>
> Yes that's also what I noticed and why I sent a second diff. Or did I miss
> another place?

Was just rechecking because of the above confusion. We agree.

-- 
Paolo G. Giarrusso - Ph.D. Student, Tübingen University
http://ps.informatik.uni-tuebingen.de/team/giarrusso/

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] "bad variable linkage" after restarting handin server under load

2015-11-29 Thread Paolo Giarrusso
On Friday, November 27, 2015 at 3:44:20 AM UTC+1, Robby Findler wrote:
> Yes, I think you're right. I originally wrote that because I was
> thinking that this code might be involved in evaluating the user's
> submission, but I am not pretty sure I was wrong about that.

"not pretty sure"?

AFAICS, `auto-reload-value` is used to extract the `checker` binding from the 
various checker.rkt. but the lock will not be held while running `checker`. 
(Luckily we're not using hooks, I haven't studied that code).

> So, yes I
> agree that putting this into the production server may be worth a try
> (depending on how severe the problem you're having is I suppose).
> 
> It may make sense to try it out yourself a little bit between due
> dates or something. I did some testing, but more is better.

Yes, we have a staging server for that, I think these courses were my first 
time :-).

As a side note, having proper unit tests is somewhat hard though, many 
interfaces seem not designed for testability. (Many side effects get in the 
way; fixing this is easyish but tedious).
I'm almost ready to "unit test" checkers, but haven't looked into automatic 
submissions.

> Also, while I was reading the code again, however, I noticed that my
> diff didn't have enough syncronization. Here's an improved one.

>  (reload-module modspec path)
>  (set! proc (dynamic-require modspec procname))
>  (reload)
> -(lambda xs (reload) (apply proc xs
> +(lambda xs (protect (reload)) (apply proc xs



> diff --git a/info.rkt b/info.rkt
> index 00a42f9..a5686d5 100644
> --- a/info.rkt
> +++ b/info.rkt
> @@ -11,6 +11,7 @@
> "net-lib"
> "pconvert-lib"
> "sandbox-lib"
> +   "rackunit-lib"
> "web-server-lib"))
>  (define build-deps '("gui-doc"
>   "racket-doc"
> 
> 
> On Thu, Nov 26, 2015 at 6:23 PM, Paolo Giarrusso wrote:
> > On 25 November 2015 at 14:54, Robby Findler wrote:
> >> I'm still not completely
> >> sure, but since you seem to be able to provoke the error, that
> >> emboldens me to suggest you apply the diff below and see if it goes
> >> away.
> >
> > I'm doing this. Annoyingly, I can't force the crash at will yet, and
> > an automated handin-server stress-tester is not in our plans yet :-|.
> >
> >> That diff is probably not what we'd want in the end, since it is too
> >> much locking (we would want a namespace-specific lock not a global
> >> one) but if the error does go away, that means that this is probably
> >> the right place to put the lock in.
> >
> > Makes sense.
> >
> > But for my production environment, I guess that this patch won't
> > overly restrict concurrency: I guess namespaces correspond to checkers
> > in the handin-server, right? So, since I only ever open one
> > assignment, I should only have one namespace anyway?
> >
> > Cheers,
> > Paolo



Re: [racket-users] "bad variable linkage" after restarting handin server under load

2015-11-29 Thread Robby Findler
On Sunday, November 29, 2015, Paolo Giarrusso  wrote:

> On Friday, November 27, 2015 at 3:44:20 AM UTC+1, Robby Findler wrote:
> > Yes, I think you're right. I originally wrote that because I was
> > thinking that this code might be involved in evaluating the user's
> > submission, but I am not pretty sure I was wrong about that.
>
> "not pretty sure"?


Sorry. No "not".



>
> AFAICS, `auto-reload-value` is used to extract the `checker` binding from
> the various checker.rkt. but the lock will not be held while running
> `checker`. (Luckily we're not using hooks, I haven't studied that code).


Yes that's also what I noticed and why I sent a second diff. Or did I miss
another place?

Robby



>
> > So, yes I
> > agree that putting this into the production server may be worth a try
> > (depending on how severe the problem you're having is I suppose).
> >
> > It may make sense to try it out yourself a little bit between due
> > dates or something. I did some testing, but more is better.
>
> Yes, we have a staging server for that, I think these courses were my
> first time :-).
>
> As a side note, having proper unit tests is somewhat hard though, many
> interfaces seem not designed for testability. (Many side effects get in the
> way; fixing this is easyish but tedious).
> I'm almost ready to "unit test" checkers, but haven't looked into
> automatic submissions.
>
> > Also, while I was reading the code again, however, I noticed that my
> > diff didn't have enough syncronization. Here's an improved one.
>
> >  (reload-module modspec path)
> >  (set! proc (dynamic-require modspec procname))
> >  (reload)
> > -(lambda xs (reload) (apply proc xs
> > +(lambda xs (protect (reload)) (apply proc xs
>
>
>
> > diff --git a/info.rkt b/info.rkt
> > index 00a42f9..a5686d5 100644
> > --- a/info.rkt
> > +++ b/info.rkt
> > @@ -11,6 +11,7 @@
> > "net-lib"
> > "pconvert-lib"
> > "sandbox-lib"
> > +   "rackunit-lib"
> > "web-server-lib"))
> >  (define build-deps '("gui-doc"
> >   "racket-doc"
> >
> >
> > On Thu, Nov 26, 2015 at 6:23 PM, Paolo Giarrusso wrote:
> > > On 25 November 2015 at 14:54, Robby Findler wrote:
> > >> I'm still not completely
> > >> sure, but since you seem to be able to provoke the error, that
> > >> emboldens me to suggest you apply the diff below and see if it goes
> > >> away.
> > >
> > > I'm doing this. Annoyingly, I can't force the crash at will yet, and
> > > an automated handin-server stress-tester is not in our plans yet :-|.
> > >
> > >> That diff is probably not what we'd want in the end, since it is too
> > >> much locking (we would want a namespace-specific lock not a global
> > >> one) but if the error does go away, that means that this is probably
> > >> the right place to put the lock in.
> > >
> > > Makes sense.
> > >
> > > But for my production environment, I guess that this patch won't
> > > overly restrict concurrency: I guess namespaces correspond to checkers
> > > in the handin-server, right? So, since I only ever open one
> > > assignment, I should only have one namespace anyway?
> > >
> > > Cheers,
> > > Paolo
>



Re: [racket-users] "bad variable linkage" after restarting handin server under load

2015-11-26 Thread Paolo Giarrusso
On 25 November 2015 at 14:54, Robby Findler  wrote:
> I'm still not completely
> sure, but since you seem to be able to provoke the error, that
> emboldens me to suggest you apply the diff below and see if it goes
> away.

I'm doing this. Annoyingly, I can't force the crash at will yet, and
an automated handin-server stress-tester is not in our plans yet :-|.

> That diff is probably not what we'd want in the end, since it is too
> much locking (we would want a namespace-specific lock not a global
> one) but if the error does go away, that means that this is probably
> the right place to put the lock in.

Makes sense.

But for my production environment, I guess that this patch won't
overly restrict concurrency: I guess namespaces correspond to checkers
in the handin-server, right? So, since I only ever open one
assignment, I should only have one namespace anyway?

Cheers,
Paolo



Re: [racket-users] "bad variable linkage" after restarting handin server under load

2015-11-26 Thread Robby Findler
Yes, I think you're right. I originally wrote that because I was
thinking that this code might be involved in evaluating the user's
submission, but I am not pretty sure I was wrong about that. So, yes I
agree that putting this into the production server may be worth a try
(depending on how severe the problem you're having is I suppose).

It may make sense to try it out yourself a little bit between due
dates or something. I did some testing, but more is better.

Also, while I was reading the code again, I noticed that my
diff didn't have enough synchronization. Here's an improved one.

Robby

diff --git a/handin-server/private/reloadable.rkt b/handin-server/private/reloadable.rkt
index 1055822..32de38f 100644
--- a/handin-server/private/reloadable.rkt
+++ b/handin-server/private/reloadable.rkt
@@ -1,8 +1,36 @@
 #lang racket/base

 (require syntax/moddep "logger.rkt")
+(module mon racket/base
+  (define sema (make-semaphore 1))
+  (define-syntax-rule
+    (provide/monitor (id x ...))
+    (begin
+      (define -id
+        (let ([id (λ (x ...)
+                    (call-with-semaphore
+                     sema
+                     (λ () (id x ...))))])
+          id))
+      (provide (rename-out [-id id]))))
+  (define-syntax-rule
+    (protect e)
+    (call-with-semaphore sema (λ () e)))
+  (provide provide/monitor protect))
+(require (submod "." mon))

-(provide reload-module)
+(module+ test
+  (module m racket/base
+(require (submod ".." ".." mon))
+(define (f x) (* 2 (g x)))
+(define (g x) (+ x 1))
+(provide/monitor (f x))
+(provide/monitor (g x)))
+  (require (submod "." m) rackunit)
+  (check-equal? (g 2) 3)
+  (check-equal? (f 11) 24))
+
+(provide/monitor (reload-module modspec path))
 (define (reload-module modspec path)
   ;; the path argument is not needed (could use resolve-module-path here), but
   ;; its always known when this function is called
@@ -20,7 +48,7 @@

 ;; pulls out a value from a module, reloading the module if its source file was
 ;; modified
-(provide auto-reload-value)
+(provide/monitor (auto-reload-value modspec valname))
 (define module-times (make-hash))
 (define (auto-reload-value modspec valname)
   (define path0 (resolve-module-path modspec #f))
@@ -43,7 +71,7 @@
 ;; pulls out a procedure from a module, and returns a wrapped procedure that
 ;; automatically reloads the module if the file was changed whenever the
 ;; procedure is used
-(provide auto-reload-procedure)
+(provide/monitor (auto-reload-procedure x y))
 (define (auto-reload-procedure modspec procname)
   (let ([path (resolve-module-path modspec #f)] [date #f] [proc #f] [poll #f])
 (define (reload)
@@ -55,4 +83,4 @@
 (reload-module modspec path)
 (set! proc (dynamic-require modspec procname))
 (reload)
-(lambda xs (reload) (apply proc xs))))
+(lambda xs (protect (reload)) (apply proc xs))))
diff --git a/info.rkt b/info.rkt
index 00a42f9..a5686d5 100644
--- a/info.rkt
+++ b/info.rkt
@@ -11,6 +11,7 @@
"net-lib"
"pconvert-lib"
"sandbox-lib"
+   "rackunit-lib"
"web-server-lib"))
 (define build-deps '("gui-doc"
  "racket-doc"


On Thu, Nov 26, 2015 at 6:23 PM, Paolo Giarrusso  wrote:
> On 25 November 2015 at 14:54, Robby Findler  
> wrote:
>> I'm still not completely
>> sure, but since you seem to be able to provoke the error, that
>> emboldens me to suggest you apply the diff below and see if it goes
>> away.
>
> I'm doing this. Annoyingly, I can't force the crash at will yet, and
> an automated handin-server stress-tester is not in our plans yet :-|.
>
>> That diff is probably not what we'd want in the end, since it is too
>> much locking (we would want a namespace-specific lock not a global
>> one) but if the error does go away, that means that this is probably
>> the right place to put the lock in.
>
> Makes sense.
>
> But for my production environment, I guess that this patch won't
> overly restrict concurrency: I guess namespaces correspond to checkers
> in the handin-server, right? So, since I only ever open one
> assignment, I should only have one namespace anyway?
>
> Cheers,
> Paolo



Re: [racket-users] "bad variable linkage" after restarting handin server under load

2015-11-25 Thread Robby Findler
I don't know what's going on here, but could it be that two threads
are, in parallel, trying to load the same implementation of an
unloaded checker and then stomping on each other?

The file handin-server/private/reloadable has some dynamic-requires
without appropriate synchronization around them, at least that I see,
which seems suspicious.
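
To make the concern concrete, here is a minimal sketch of serializing `dynamic-require` behind a single semaphore, so that two threads cannot be loading a module at the same moment. The names `load-lock` and `locked-dynamic-require` are hypothetical and this is not the code in reloadable.rkt; it only illustrates the kind of synchronization that appears to be missing.

```racket
#lang racket/base
(provide locked-dynamic-require)

;; One global lock for all loads (an assumption made for simplicity).
(define load-lock (make-semaphore 1))

;; Hypothetical wrapper: like dynamic-require, but only one thread
;; at a time may be inside the load.
(define (locked-dynamic-require modspec name)
  (call-with-semaphore
   load-lock
   (lambda () (dynamic-require modspec name))))
```

For example, `(locked-dynamic-require 'racket/list 'first)` returns the `first` procedure. Semaphores are not reentrant, so code that runs while the lock is held must not call the wrapper again.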

Robby


On Wed, Nov 25, 2015 at 6:35 AM, Paolo Giarrusso  wrote:
> Hi all,
> it's me, handin server guy again. Sorry to bother.
>
> Our handin server started "crashing" with "bad variable linkage" errors at 
> deadline time (presumably under somewhat high load), and since it happened 
> twice, I thought I'd report it. Any ideas on what's causing this?
>
> After this "crash", the server keeps running, but rejects all submissions 
> because the same checker keeps not loading.
>
> ==
>
> [1|2015-11-23T14:51:31] (re)loading module from (file 
> /var/handin_config/info1-teaching-material/checkers/06-Datentypen/REDACTED-USER-NAME/../checker.rkt)
> [1|2015-11-23T14:51:33] ERROR: link: bad variable linkage;
> [1|2015-11-23T14:51:33]  reference to a variable that is uninitialized
> [1|2015-11-23T14:51:33]   reference phase level: 0
> [1|2015-11-23T14:51:33]   variable module: 
> "/var/handin_home/handin/handin-server/checker.rkt"
> [1|2015-11-23T14:51:33]   variable phase: 0
> [1|2015-11-23T14:51:33]   reference in module: 
> "/var/handin_config/info1-teaching-material/checkers/checker-extras.rkt"
> [1|2015-11-23T14:51:33]   in: submission-eval
>
> Bigger log fragment available at 
> https://gist.github.com/Blaisorblade/7f9c6e7f4f456b588a8a
>
> Other info:
> - Restarting the server does fix the error. Somehow.
> - For those unfamiliar with the handin server: it has code which 
> automatically reloads checkers, as witnessed by the log above 
> (https://github.com/ps-tuebingen/handin/blob/master/handin-server/private/reloadable.rkt).
>  But that code doesn't fix the problem.
> - Googling suggests that stale compiled code might be there. But the source 
> code hadn't changed. (Also, I found no description of how this arises).
> - Since the server gets sometimes "stuck", I built a trivial watchdog (a 
> cronjob) that restarts the server if the status server becomes too slow. The 
> above happened after the server was restarted by the watchdog.
>
> One set of hypotheses:
> is it possible that stopping the server at the wrong moment corrupts compiled 
> files? (But then, why does the first restart not fix the problem?)
> Do you take care to make compilation atomic with `rename`?
>
> However, according to docs, the server is designed to survive brutal restarts.
>
> One non-standard thing I do is that I have a `checker-extras.rkt` module with 
> some utilities shared across checkers*, and that's not deployed as part of 
> the server (for various reasons), but together with the checkers, so it's 
> loaded with (require "../checker-extras.rkt"), and seems to be compiled, 
> probably when starting the server. Could this interfere badly with the 
> reloading code or with restarting?
>
> *I'm aware of your checker utilities, but here we have slightly different 
> requirements.



Re: [racket-users] "bad variable linkage" after restarting handin server under load

2015-11-25 Thread Paolo Giarrusso
Hi and thanks for reacting promptly!

On 25 November 2015 at 13:52, Robby Findler  wrote:
> I don't know what's going on here, but could it be that two threads
> are, in parallel, trying to load the same implementation of an
> unloaded checker and then stomping on each other?

Interesting! Sounds consistent with the logs:

[2|2015-11-23T14:51:31] (re)loading module from (file
/var/handin_config/info1-teaching-material/checkers/06-Datentypen/REDACTED-USER-NAME/checker.rkt)
[1|2015-11-23T14:51:31] (re)loading module from (file
/var/handin_config/info1-teaching-material/checkers/06-Datentypen/REDACTED-USER-NAME/checker.rkt)

[... error from thread 1, then error from thread 2...]

> The file handin-server/private/reloadable has some dynamic-requires
> without appropriate syncronization around them, at least that I see,
> which seems suspicious.

I don't know if that's *the* problem, but I've probably built a
testcase for it. (I also wonder about the set!, but hopefully they
don't modify global variables).
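
As an aside on why an unsynchronized `set!` on shared state is worth wondering about, here is a small self-contained sketch. Everything in it (`count-with-threads`, the counter, the `sleep` used to force a thread switch) is made up for illustration and is unrelated to the actual variables in reloadable.rkt.

```racket
#lang racket/base
(provide count-with-threads)

(define lock (make-semaphore 1))

;; Runs n threads that each do a read, yield, write update of a shared
;; variable, and returns the final count.
(define (count-with-threads n guarded?)
  (define counter 0)
  (define (bump!)
    (define v counter)
    (sleep 0)                 ; yield so another thread can run mid-update
    (set! counter (add1 v)))
  (define (worker)
    (if guarded?
        (call-with-semaphore lock bump!)
        (bump!)))
  (for-each thread-wait
            (for/list ([i (in-range n)]) (thread worker)))
  counter)
```

With `guarded?` set to `#t` every update happens under the lock, so the result equals `n`; with `#f`, updates can be lost because several threads read the same old value before any of them writes.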

And contrary to what I thought, no compiled files get written on the
server (only when testing checkers locally).

Is it still plausible that avoiding checker-extras (or precompiling
it) would help?

BTW, docs don't seem to mention synchronization:
http://docs.racket-lang.org/reference/Module_Names_and_Loading.html?q=dynamic-require#%28def._%28%28quote._~23~25kernel%29._dynamic-require%29%29
Should I file an issue on those docs? (I'm afraid I couldn't say much though).

Cheers,
Paolo

> On Wed, Nov 25, 2015 at 6:35 AM, Paolo Giarrusso  
> wrote:
>> Hi all,
>> it's me, handin server guy again. Sorry to bother.
>>
>> Our handin server started "crashing" with "bad variable linkage" errors at 
>> deadline time (presumably under somewhat high load), and since it happened 
>> twice, I thought I'd report it. Any ideas on what's causing this?
>>
>> After this "crash", the server keeps running, but rejects all submissions 
>> because the same checker keeps not loading.
>>
>> ==
>>
>> [1|2015-11-23T14:51:31] (re)loading module from (file 
>> /var/handin_config/info1-teaching-material/checkers/06-Datentypen/REDACTED-USER-NAME/../checker.rkt)
>> [1|2015-11-23T14:51:33] ERROR: link: bad variable linkage;
>> [1|2015-11-23T14:51:33]  reference to a variable that is uninitialized
>> [1|2015-11-23T14:51:33]   reference phase level: 0
>> [1|2015-11-23T14:51:33]   variable module: 
>> "/var/handin_home/handin/handin-server/checker.rkt"
>> [1|2015-11-23T14:51:33]   variable phase: 0
>> [1|2015-11-23T14:51:33]   reference in module: 
>> "/var/handin_config/info1-teaching-material/checkers/checker-extras.rkt"
>> [1|2015-11-23T14:51:33]   in: submission-eval
>>
>> Bigger log fragment available at 
>> https://gist.github.com/Blaisorblade/7f9c6e7f4f456b588a8a
>>
>> Other info:
>> - Restarting the server does fix the error. Somehow.
>> - For those unfamiliar with the handin server: it has code which 
>> automatically reloads checkers, as witnessed by the log above 
>> (https://github.com/ps-tuebingen/handin/blob/master/handin-server/private/reloadable.rkt).
>>  But that code doesn't fix the problem.
>> - Googling suggests that stale compiled code might be there. But the source 
>> code hadn't changed. (Also, I found no description of how this arises).
>> - Since the server gets sometimes "stuck", I built a trivial watchdog (a 
>> cronjob) that restarts the server if the status server becomes too slow. The 
>> above happened after the server was restarted by the watchdog.
>>
>> One set of hypothesis:
>> is it possible that stopping the server at the wrong moment corrupts 
>> compiled files? (But then, why does the first restart not fix the problem?)
>> Do you take care to make compilation atomic with `rename`?
>>
>> However, according to docs, the server is designed to survive brutal 
>> restarts.
>>
>> One non-standard thing I do is that I have a `checker-extras.rkt` module 
>> with some utilities shared across checkers*, and that's not deployed as part 
>> of the server (for various reasons), but together with the checkers, so it's 
>> loaded with (require "../checker-extras.rkt"), and seems to be compiled, 
>> probably when starting the server. Could this interfere badly with the 
>> reloading code or with restarting?
>>
>> *I'm aware of your checker utilities, but here we have slightly different 
>> requirements.



-- 
Paolo G. Giarrusso - Ph.D. Student, Tübingen University
http://ps.informatik.uni-tuebingen.de/team/giarrusso/


Re: [racket-users] "bad variable linkage" after restarting handin server under load

2015-11-25 Thread Robby Findler
The stomping I was worried about would happen at a lower level, as
I don't think that, in general, dynamic-require is thread-safe. After
all, it loads and runs arbitrary code, although in this case it appears
to be a system-level lack of thread safety? I'm still not completely
sure, but since you seem to be able to provoke the error, that
emboldens me to suggest you apply the diff below and see if it goes
away.

That diff is probably not what we'd want in the end, since it is too
much locking (we would want a namespace-specific lock not a global
one) but if the error does go away, that means that this is probably
the right place to put the lock in.
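
A namespace-specific lock could look roughly like the sketch below: one semaphore per namespace, created on demand and kept in a weak table. The names `lock-for` and `call-with-namespace-lock` are hypothetical, and this is only a sketch of the idea, not what the diff below does (the diff uses one global semaphore).

```racket
#lang racket/base
(provide lock-for call-with-namespace-lock)

;; Weak table so that a lock goes away together with its namespace.
(define locks (make-weak-hasheq))
;; Guards the table itself against concurrent updates.
(define table-guard (make-semaphore 1))

;; Returns the semaphore for ns, creating it on first use.
(define (lock-for ns)
  (call-with-semaphore
   table-guard
   (lambda ()
     (hash-ref! locks ns (lambda () (make-semaphore 1))))))

;; Runs thunk while holding the lock that belongs to ns.
(define (call-with-namespace-lock ns thunk)
  (call-with-semaphore (lock-for ns) thunk))
```

Loads into different namespaces could then proceed concurrently, while two loads into the same namespace would still be serialized.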

Robby

☕  git diff | cat
diff --git a/handin-server/private/reloadable.rkt b/handin-server/private/reloadable.rkt
index 1055822..0089fcb 100644
--- a/handin-server/private/reloadable.rkt
+++ b/handin-server/private/reloadable.rkt
@@ -1,8 +1,33 @@
 #lang racket/base

 (require syntax/moddep "logger.rkt")
+(module mon racket/base
+  (define sema (make-semaphore 1))
+  (define-syntax-rule
+    (provide/monitor (id x ...))
+    (begin
+      (define -id
+        (let ([id (λ (x ...)
+                    (call-with-semaphore
+                     sema
+                     (λ () (id x ...))))])
+          id))
+      (provide (rename-out [-id id]))))
+  (provide provide/monitor))
+(require (submod "." mon))

-(provide reload-module)
+(module+ test
+  (module m racket/base
+(require (submod ".." ".." mon))
+(define (f x) (* 2 (g x)))
+(define (g x) (+ x 1))
+(provide/monitor (f x))
+(provide/monitor (g x)))
+  (require (submod "." m) rackunit)
+  (check-equal? (g 2) 3)
+  (check-equal? (f 11) 24))
+
+(provide/monitor (reload-module modspec path))
 (define (reload-module modspec path)
   ;; the path argument is not needed (could use resolve-module-path here), but
   ;; its always known when this function is called
@@ -20,7 +45,7 @@

 ;; pulls out a value from a module, reloading the module if its source file was
 ;; modified
-(provide auto-reload-value)
+(provide/monitor (auto-reload-value modspec valname))
 (define module-times (make-hash))
 (define (auto-reload-value modspec valname)
   (define path0 (resolve-module-path modspec #f))
@@ -43,7 +68,7 @@
 ;; pulls out a procedure from a module, and returns a wrapped procedure that
 ;; automatically reloads the module if the file was changed whenever the
 ;; procedure is used
-(provide auto-reload-procedure)
+(provide/monitor (auto-reload-procedure x y))
 (define (auto-reload-procedure modspec procname)
   (let ([path (resolve-module-path modspec #f)] [date #f] [proc #f] [poll #f])
 (define (reload)
☕  [robby@gongguan] ~/git/exp/plt/extra-pkgs/handin
☕
