Just to update the email list, for posterity, on the rare process crash that I saw in a production server app that launched many thousands of short-lived processes...

Matthew Flatt kindly did some debugging, and, if I understand correctly, the cause he found was a combination of the app's Linux servers using address space randomization, and the app's code having thunder thighs. On very rare occasions, the planets would align just wrong, and randomization would mean that the address range of the app's stack would be pushed outside the range that the GC expected. Or something like that.

The app developers are currently stress-testing the code with address space randomization disabled on a test server. So far they haven't been able to elicit another crash.
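
For anyone who wants to check how their own Linux servers are set, here is a minimal sketch. It is an illustration, not what the app developers actually ran (how they disabled randomization is not stated here). It assumes the standard Linux sysctl file /proc/sys/kernel/randomize_va_space; the function names and the short mode descriptions are my own.

```python
from pathlib import Path

# Standard Linux sysctl file that controls address space layout randomization.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Short paraphrases of the documented values of kernel.randomize_va_space.
ASLR_MODES = {
    0: "disabled",
    1: "partial (stack, mmap base, VDSO)",
    2: "full (also randomizes the heap/brk)",
}


def parse_aslr_mode(text):
    """Return (value, description) for the contents of the sysctl file."""
    value = int(text.strip())
    return value, ASLR_MODES.get(value, "unknown")


def read_aslr_mode(path=ASLR_SYSCTL):
    """Read the current setting; the file only exists on Linux."""
    return parse_aslr_mode(Path(path).read_text())


if __name__ == "__main__":
    if ASLR_SYSCTL.exists():
        value, description = read_aslr_mode()
        print(f"randomize_va_space = {value}: {description}")
    else:
        print("No randomize_va_space file found (probably not Linux)")
```

To turn randomization off on a test box, root can run `sysctl -w kernel.randomize_va_space=0`, or a single process can be launched without it via `setarch "$(uname -m)" -R <command>`. Either is only appropriate for testing, since randomization is a security hardening measure.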

We also will be making this app more svelte on the stack, now that the Dr. has pointed out the weight problem.

I wanted to mention that no fault of Racket is implicated here, and that Racket has been nicely reliable for this app...

I believe that any huge stack frames of this app are due to historical peculiarities of the app's code, not a fault of Racket. The code has been in production for years, and (p)reinvented a few wheels that it would not need to with contemporary Racket.

Some Googling suggests that people have encountered a similar problem with address space randomization messing things up for the official JVM. You can also find occasional mentions of this if you Google for it with the names of some other language implementations. I think it's not a well-known problem, and an app developer needs to have enough volume to encounter the problem, followed by the will to investigate a crash rather than consider the occasional freak crash to be acceptable.

--
http://www.neilvandyke.org/