On Sat, Dec 05, 2009 at 07:24:00PM +0100, Henrik Sarvell wrote:
> I wasn't doing that and my site was subsequently visited by bots
> (hence the random feeling time factor that was involved) which
> triggered output on stderr, which I hadn't directed to a log file,
> which in turn somehow tripped up the redirection in picolisp, hence
> the strange problems I experienced.
This was one of the problems. We observed it mainly while writing
debug logs. I had mistakenly told Henrik to redirect only stderr,
and forgot about stdout.
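
A minimal sketch of the redirection discussed above, assuming the server is
started from a small Python wrapper. The helper name run_logged and the log
path are hypothetical and do not come from this thread; the point is only
that stdout and stderr both end up in the same log file, so neither stream
is left attached to a terminal that may go away.

```python
import subprocess


def run_logged(cmd, log_path):
    """Run cmd with stdout AND stderr appended to log_path.

    run_logged and log_path are illustrative names, not part of PicoLisp.
    """
    with open(log_path, "ab") as log:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # detach from the terminal's input
            stdout=log,                 # stdout goes to the log file
            stderr=subprocess.STDOUT,   # stderr follows stdout into the same file
        )
    return result.returncode
```

Redirecting only one of the two streams leaves the other pointing at
whatever the process inherited, which is the situation described above.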
> After starting with the above line I haven't been able to reproduce the problem.
> Maybe Alex can elaborate?
However, there seems to have been another cause, which I believe I also
see in the logs. I had seen sporadic errors in production applications
(quite rarely, only every few months), which I believe I have finally
traced down to two problems:
- Functions which handle I/O events (wait, key, sync and listen) were
  not completely reentry-safe. This could cause problems in the GUI
  if, for example, 'sync' was called internally by DB functions.
- The Post/Redirect/Get mechanism in the GUI could fail if that
  sequence was disturbed by another out-of-band POST caused by a
  browser with multiple parallel HTTP connections to the client. This
  typically happened if the server was too sluggish to complete the
  Post/Redirect/Get sequence before the next event occurred.
I fixed both issues about two weeks ago on the train over the Alps from
Slovenia to Bavaria. That's why I called these the "alpine bugs" :-)
The first one was fixed by improved internal event handling (waitFd()
in "src/io.c"), so that nested calls to waitFd() will not wait for
file/socket descriptors already handled at a higher level.
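
The idea can be illustrated with a small Python sketch. This is not the code
in "src/io.c"; the EventLoop class, its "active" set and the demo() function
are assumptions made only to show the principle: a nested wait skips every
descriptor whose handler is still running further up the call stack.

```python
import select
import socket


class EventLoop:
    """Toy event loop; all names here are hypothetical."""

    def __init__(self):
        self.handlers = {}   # socket -> callback
        self.active = set()  # sockets whose handler is currently running

    def register(self, sock, callback):
        self.handlers[sock] = callback

    def wait_fd(self, timeout=0.0):
        """Dispatch at most one ready descriptor; may be called from a handler."""
        # Skip descriptors already being handled at a higher level.
        candidates = [s for s in self.handlers if s not in self.active]
        if not candidates:
            return False
        ready, _, _ = select.select(candidates, [], [], timeout)
        if not ready:
            return False
        sock = ready[0]
        self.active.add(sock)
        try:
            self.handlers[sock](sock)
        finally:
            self.active.discard(sock)
        return True


def demo():
    """Handler A waits again while A still has unread data pending."""
    loop = EventLoop()
    a_r, a_w = socket.socketpair()
    b_r, b_w = socket.socketpair()
    trace = []

    def on_a(sock):
        sock.recv(1)           # consume one byte, a second one stays pending
        trace.append("a-start")
        loop.wait_fd(0.0)      # nested wait: must not re-enter on_a
        trace.append("a-end")

    def on_b(sock):
        sock.recv(1)
        trace.append("b")

    loop.register(a_r, on_a)
    loop.register(b_r, on_b)
    a_w.send(b"xx")
    b_w.send(b"y")
    loop.wait_fd(1.0)
    for s in (a_r, a_w, b_r, b_w):
        s.close()
    return trace
```

In demo(), descriptor A is still readable when its handler calls wait_fd()
again. Without the "active" check the nested call could re-enter on_a; with
it, only B is served inside the nested call.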
The second one was repaired by introducing a sequential event number
into the GUI protocol (passed as "&*Evt=" argument in the URL).
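
How such a number can guard the Post/Redirect/Get sequence is sketched below
in Python. How PicoLisp actually evaluates the number is not described in
this message, so the GuiSession class and its methods are purely
hypothetical: the server hands out the current number with each page, and
accepts a POST only if it carries that number.

```python
class GuiSession:
    """Hypothetical per-session state; not the PicoLisp implementation."""

    def __init__(self):
        self.expected_evt = 0  # number the next valid POST must carry
        self.applied = []      # actions that were actually executed

    def render_form(self):
        """Return the event number to embed in the form's action URL."""
        return self.expected_evt

    def handle_post(self, evt, action):
        """Apply the action only if the POST is in sequence."""
        if evt != self.expected_evt:
            # Out-of-band or repeated POST: ignore it, so the pending
            # Post/Redirect/Get sequence is not disturbed.
            return "stale"
        self.applied.append(action)
        self.expected_evt += 1
        return "redirect"
```

A second POST carrying an already used number is then recognized as stale
instead of being treated as a new event.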
After that, I could not reproduce these problems any more, and
Henrik's server has also been running without problems since he upgraded to the new