Re: Core dump when throwing an exception from a resumed partial continuation

2013-03-21 Thread Andy Wingo
On Fri 15 Mar 2013 22:01, Brent Pinkney b...@4dst.com writes:

 When I resume the continuation in another thread, all works perfectly
 UNLESS the continued execution throws an exception.
 Then guile exits with a core dump.

 By contrast if I resume the continuation in the same thread and then
 throw an exception all works as expected.

I think I know what this is.

So, a delimited continuation should capture that part of the dynamic
environment made in its extent.  (See Oleg Kiselyov and Chung-Chieh
Shan's Delimited Dynamic Binding paper.)  That is what Guile does, for
fluids, prompts, and dynamic-wind blocks.

Our implementation of exception handling uses a fluid,
%exception-handler (boot-9.scm:86).  However that fluid references a
stack of exception handlers on the heap.  There is the problem: an
exception in a reinstated delimited continuation will walk
the captured exception handler stack from the heap, not from its own
dynamic environment.  Therefore it could abort to a continuation that is
not present on the new thread.

The solution is to have the exception handler find the next handler from
the dynamic environment.  This will need a new primitive to walk the
dynamic stack, I think.

I can't look at this atm as I broke my arm (!) and so typing is tough.
For now as a workaround I suggest you put a catch #t in each of your
delimited continuations.  This way all throws will be handled by catches
established by the continuation.
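
A minimal sketch of that workaround, run in a single thread for simplicity. The tag name `demo-tag` and the `'bad` trigger value are made up for illustration; the point is only that the `catch #t` sits inside the body that gets captured, so the handler travels with the continuation:

```scheme
;; Hypothetical prompt tag, for illustration only.
(define demo-tag (make-prompt-tag "demo"))

;; The body installs its own catch-all *inside* the prompt, so the
;; handler is part of the captured delimited continuation.
(define k
  (call-with-prompt demo-tag
    (lambda ()
      (catch #t
        (lambda ()
          (let ((answer (abort-to-prompt demo-tag))) ; suspend here
            (if (eq? answer 'bad)
                (throw 'oops answer)
                answer)))
        (lambda (key . args)
          (list 'caught key args))))
    (lambda (k) k))) ; the prompt handler just returns the continuation

(k 'good) ; → good
(k 'bad)  ; → (caught oops (bad))
```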

Regards,

Andy
-- 
http://wingolog.org/



Re: Core dump when throwing an exception from a resumed partial continuation

2013-03-21 Thread Andrew Gaylard

On 03/21/13 11:43, Andy Wingo wrote:

On Fri 15 Mar 2013 22:01, Brent Pinkney b...@4dst.com writes:


When I resume the continuation in another thread, all works perfectly
UNLESS the continued execution throws an exception.
Then guile exits with a core dump.

By contrast if I resume the continuation in the same thread and then
throw an exception all works as expected.

I think I know what this is.

So, a delimited continuation should capture that part of the dynamic
environment made in its extent.  (See Oleg Kiselyov and Chung-Chieh
Shan's Delimited Dynamic Binding paper.)  That is what Guile does, for
fluids, prompts, and dynamic-wind blocks.

Our implementation of exception handling uses a fluid,
%exception-handler (boot-9.scm:86).  However that fluid references a
stack of exception handlers on the heap.  There is the problem: an
exception in a reinstated delimited continuation will walk
the captured exception handler stack from the heap, not from its own
dynamic environment.  Therefore it could abort to a continuation that is
not present on the new thread.

The solution is to have the exception handler find the next handler from
the dynamic environment.  This will need a new primitive to walk the
dynamic stack, I think.

I can't look at this atm as I broke my arm (!) and so typing is tough.
For now as a workaround I suggest you put a catch #t in each of your
delimited continuations.  This way all throws will be handled by catches
established by the continuation.

Regards,

Andy

Andy,

Thanks for giving this some thought -- sorry to hear about your arm!

This does shed some light on things. If I change this:

(throw 'oops) ; should not crash the vm

to this:

(catch #t
  (λ ()
    (throw 'oops)) ; should not crash the vm
  (λ ()
    (display "Success!")(newline))) ; never reached

the VM still cores; "Success!" is never shown. However, you've probably
spotted my mistake: the handler should be (λ (key . args) ... ).

But this core shows up differently in the stack-trace in gdb:

#0 scm_error (key=0x1001854c0, subr=0x0, message=0x7e7ef518 "Wrong number of arguments to ~A", args=0x100db95b0, rest=0x4) at error.c:62


... which is exactly the exception one would expect. Fixing the handler 
thus:


(catch #t
  (λ ()
    (throw 'oops)) ; should not crash the vm
  (λ (key . args)
    (display "Success!")(newline))) ; works!

...solves the problem, and the VM doesn't core any more.
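
For reference, a small sketch of what a `catch` handler is called with. The helper name `try` is made up for this example; the handler takes the key followed by whatever arguments were passed to `throw`, which is why a zero-argument `(λ () ...)` handler fails with a wrong-number-of-arguments error:

```scheme
;; Hypothetical helper: run THUNK, and on any throw return the key
;; and the throw arguments as a list.
(define (try thunk)
  (catch #t
    thunk
    (lambda (key . args)   ; key first, then the throw arguments
      (cons key args))))

(try (lambda () (throw 'oops 1 2))) ; → (oops 1 2)
(try (lambda () 'fine))             ; → fine
```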

So it seems that although we *did* have a catch around our resumption,
there must have been some (different) error in its handler, which caused a
second exception, which caused the VM to crash.

Unfortunately, the test-case we made handles this second exception fine.
It'd be great to be able to distill this problem down to a pithy test-case.
(Our app is 4500 lines and still growing, so it's not really a candidate to
send to the list.)

The same problem happens (VM cores) if I do this:

(catch 'not-oops
  (λ ()
    (throw 'oops)) ; should not crash the vm
  (λ (key . args)
    (display "Success!")(newline))) ; never reached

So your answer to surround the resumption with a (catch #t ...) is a
good workaround. For our code, anyway.
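
A short sketch of why the key matters, independent of continuations: a `catch` with a specific key only handles throws to that key, and anything else propagates to the next enclosing handler. The symbols `inner` and `outer` are just markers for this example:

```scheme
;; The inner catch ignores 'oops because its key is 'not-oops;
;; the enclosing catch-all picks it up instead.
(define result
  (catch #t
    (lambda ()
      (catch 'not-oops
        (lambda () (throw 'oops))
        (lambda (key . args) 'inner)))
    (lambda (key . args) 'outer)))

result ; → outer
```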

(I'm now off to go read 
http://www.cs.indiana.edu/~sabry/papers/delim-dyn-bind.pdf :)

--
Andrew




Re: Core dump when throwing an exception from a resumed partial continuation

2013-03-21 Thread Andy Wingo
On Thu 21 Mar 2013 14:53, Andrew Gaylard a...@computer.org writes:

 (catch #t
   (λ ()
     (throw 'oops)) ; should not crash the vm
   (λ ()
     (display "Success!")(newline))) ; never reached

 the VM still cores; "Success!" is never shown. However, you've probably
 spotted my mistake: the handler should be (λ (key . args) ... ).

The core dump is another bug, but fixing the handler is the key thing:

 (catch #t
   (λ ()
     (throw 'oops)) ; should not crash the vm
   (λ (key . args)
     (display "Success!")(newline))) ; works!

 ...solves the problem, and the VM doesn't core any more.

Yep

Happy hacking :)

A
-- 
http://wingolog.org/



Re: Core dump when throwing an exception from a resumed partial continuation

2013-03-19 Thread Andrew Gaylard

On 03/15/13 23:30, Andy Wingo wrote:

On Fri 15 Mar 2013 22:01, Brent Pinkney b...@4dst.com writes:


I am using partial continuations to resume a computation when an
external system returns with an answer.
I am using (call-with-prompt ...) and (abort-to-prompt)

When I resume the continuation in another thread, all works perfectly

Neat :)


UNLESS the continued execution throws an exception.
Then guile exits with a core dump.

That's not good!  Can you work up a short test case?


We've tried to create a short test-case.  Unfortunately, it doesn't seem 
to trigger the core.  However, the app we're creating triggers the 
core-dump every time. So, to dig into this problem, I built a debuggable VM.


What we see in the debuggable cores is the first backtrace.  You'll note
that aside from the stack overflow at frame #3, the pattern of "Abort to
unknown prompt" is repeated /ad infinitum/.  Well, certainly to a stack
depth of 28,000 :).  So the stack overflow is understandable.  I guess
the question is: why does guile get stuck in a loop aborting to an
unknown prompt?


This is on Linux x86 Ubuntu 12.04, both 32- and 64-bit.  The same code 
crashes the same VM at the same point on Solaris SPARC 64-bit, but that 
core does not appear to show this repetitive pattern.  When I say "the
same VM", I mean it: all dependencies except for the kernel and libc
are built from identical sources, using as near as possible the same 
configure flags:


   gcc-4.7.2
   bdw-gc-7.2d
   libtool-2.2.10
   gmp-5.0.2
   libiconv-1.14
   libunistring-0.9.3
   libffi-3.0.10
   readline-6.1
   guile-2.0.7

Ubuntu's guile also shows the same problem.

To understand how guile gets into this state, I put a breakpoint in the 
VM at the point where it first calls abort. That reveals the second 
backtrace below.  This shows what happens immediately before the VM goes 
bananas, and fills up the stack.  Which is exactly what happens when gdb 
allows guile to continue beyond the breakpoint.


I then tried stepping through the scm_c_abort code in frame #2, and it 
indeed does not find anything in the wind list.  Certainly, the list 
returned by scm_i_dynwinds has 12 entries in it.  It's just that none of 
them match.


I'd be really grateful for any help on this -- as you can tell, I'm not 
a VM hacker!


--
Andrew

#0  0x0033b416 in __kernel_vsyscall ()
#1  0x004ff1df in __GI_raise (sig=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64

#2  0x00502825 in __GI_abort () at abort.c:91
#3  0x00a106b7 in vm_error_stack_overflow (vp=0x9c9cfc0) at vm.c:516
#4  0x00a204a4 in vm_regular_engine (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac055e70, nargs=4) at vm-engine.c:166
#5  0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac055e70, nargs=4) at vm.c:741
#6  0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, 
args=0x304) at vm.c:1033
#7  0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd4c8, 
args=0x95dd4c8) at eval.c:748
#8  0x00975f7c in scm_apply_1 (proc=0x93b50d0, arg1=0x937df10, 
args=0x95dd4d0) at eval.c:588

#9  0x00a0ba1d in scm_throw (key=0x937df10, args=0x95dd4d0) at throw.c:104
#10 0x00a102ff in vm_error (msg=0xa6631d "VM: Too many arguments",
arg=0x16) at vm.c:414

#11 0x00a105e6 in vm_error_too_many_args (nargs=5) at vm.c:490
#12 0x00a11a42 in vm_regular_engine (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac056770, nargs=5) at vm-engine.c:104
#13 0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac056770, nargs=5) at vm.c:741
#14 0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, 
args=0x304) at vm.c:1033
#15 0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd550, 
args=0x95dd550) at eval.c:748
#16 0x00975f7c in scm_apply_1 (proc=0x93b50d0, arg1=0x9362130, 
args=0x95dd558) at eval.c:588

#17 0x00a0ba1d in scm_throw (key=0x9362130, args=0x95dd558) at throw.c:104
#18 0x00a0c097 in scm_ithrow (key=0x9362130, args=0x95dd558, noreturn=1) 
at throw.c:441
#19 0x009735bf in scm_error_scm (key=0x9362130, subr=0x99105b0, 
message=0x99105c0, args=0x95dd5c8, data=0x4) at error.c:95
#20 0x00973576 in scm_error (key=0x9362130, subr=0xa4cd3b "abort",
message=0xa4cd23 "Abort to unknown prompt", args=0x95dd5c8, rest=0x4) at
error.c:62
#21 0x00973b6b in scm_misc_error (subr=0xa4cd3b "abort",
message=0xa4cd23 "Abort to unknown prompt", args=0x95dd5c8) at error.c:316
#22 0x0096aef5 in scm_c_abort (vm=0x9cb29e8, tag=0x9c08af0, n=5, 
argv=0xac056960, cookie=6614) at control.c:209

#23 0x00a0fe36 in vm_abort (vm=0x9cb29e8, n=0, vm_cookie=6614) at vm.c:264
#24 0x00a18942 in vm_regular_engine (vm=0x9cb29e8, program=0x93b5260, 
argv=0xac0571f4, nargs=6) at vm-i-system.c:1528
#25 0x00a34d00 in scm_c_vm_run (vm=0x9cb29e8, program=0x93b50d0, 
argv=0xac0571e0, nargs=5) at vm.c:741
#26 0x00a3585e in scm_call_with_vm (vm=0x9cb29e8, proc=0x93b50d0, 
args=0x304) at vm.c:1033
#27 0x009763e1 in scm_apply (proc=0x93b50d0, arg1=0x95dd678, 
args=0x95dd678) at eval.c:748
#28 0x00975f7c in scm_apply_1 (proc=0x93b50d0, 

Core dump when throwing an exception from a resumed partial continuation

2013-03-15 Thread Brent Pinkney

Hi,

I am using partial continuations to resume a computation when an 
external system returns with an answer.

I am using (call-with-prompt ...) and (abort-to-prompt)

When I resume the continuation in another thread, all works perfectly 
UNLESS the continued execution throws an exception.

Then guile exits with a core dump.

By contrast if I resume the continuation in the same thread and then 
throw an exception all works as expected.
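
For readers unfamiliar with the pattern, here is a rough sketch of the setup described above, not the actual application code. The tag name `answer-tag`, the doubling computation, and the value 21 are invented for illustration; the non-throwing case is shown, which is the one reported to work in both threads:

```scheme
(use-modules (ice-9 threads))

;; Hypothetical prompt tag; the name is for illustration only.
(define answer-tag (make-prompt-tag "external-answer"))

;; Start a computation that suspends itself until an answer arrives.
;; The prompt handler receives the delimited continuation and returns it.
(define resume
  (call-with-prompt answer-tag
    (lambda ()
      (let ((answer (abort-to-prompt answer-tag)))
        (* 2 answer)))
    (lambda (k) k)))

;; Resume in the same thread.
(define same-thread-result (resume 21)) ; → 42

;; Resume in another thread and collect its result.
(define other-thread-result
  (join-thread (call-with-new-thread (lambda () (resume 21)))))
```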


Is this a known issue?

All assistance welcomed.


Thanks

Brent



Re: Core dump when throwing an exception from a resumed partial continuation

2013-03-15 Thread Andy Wingo
Hi,

On Fri 15 Mar 2013 22:01, Brent Pinkney b...@4dst.com writes:

 I am using partial continuations to resume a computation when an
 external system returns with an answer.
 I am using (call-with-prompt ...) and (abort-to-prompt)

 When I resume the continuation in another thread, all works perfectly

Neat :)

 UNLESS the continued execution throws an exception.
 Then guile exits with a core dump.

That's not good!  Can you work up a short test case?

Thanks,

Andy
-- 
http://wingolog.org/