This just in.
It's a linking issue.
When I renamed my two functions, send() to my_send()
and recv() to my_recv(), the problem went away.
The behavior differs between Fedora 17 and Fedora 18.
I will post more if I learn something useful.
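For anyone hitting the same thing, here is a minimal sketch of the rename. It is not the code from the "node" directory: the in-memory mailbox, the function bodies, and roundtrip_demo() are hypothetical stand-ins. A plausible explanation, not confirmed in this thread, is that a program-level send() or recv() with external linkage can be resolved in place of the libc socket functions of the same name by a shared library the program links against. A distinct prefix (or static linkage) avoids that collision.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory mailbox standing in for the real transport. */
static char   mailbox[64];
static size_t mailbox_len = 0;

/* Named my_send(), not send(): send() is already the name of a
   socket function declared in <sys/socket.h>. */
int my_send(const char *msg)
{
    size_t len = strlen(msg);
    if (len >= sizeof mailbox)
        return -1;                     /* too long for the mailbox */
    memcpy(mailbox, msg, len + 1);     /* copy including the '\0' */
    mailbox_len = len;
    return 0;
}

/* Named my_recv(), not recv(), for the same reason.
   Returns the number of bytes received, or -1 if nothing is
   pending or the caller's buffer is too small. */
int my_recv(char *buf, size_t cap)
{
    if (mailbox_len == 0 || mailbox_len >= cap)
        return -1;
    memcpy(buf, mailbox, mailbox_len + 1);
    int n = (int)mailbox_len;
    mailbox_len = 0;                   /* the message is consumed */
    return n;
}

/* Usage: one message goes in and the same message comes out. */
void roundtrip_demo(void)
{
    char buf[64];
    assert(my_send("ping") == 0);
    assert(my_recv(buf, sizeof buf) == 4);
    assert(strcmp(buf, "ping") == 0);
}
```

Declaring the helpers static would be another way to keep them out of the global symbol namespace, assuming they are only used within one source file.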
----- Original Message -----
From: "Michael Goulish" <mgoul...@redhat.com>
Sent: Tuesday, February 19, 2013 7:40:16 AM
Subject: the killer node
Well, it looks like one of my nodes can kill the other one by doing a put.
No errors reported by either messenger before the fatality.
I'd like to see if someone else can confirm this result,
and maybe see something that I am not seeing.
Compile and run scripts are provided in the directory called "node".
I am testing this against unpatched 0.4 RC1 code. (The result was the same with
Ken's recent patch for infinite credit.)
1. Two instances of one program are used. Node A only receives,
Node B only sends to it.
2. Start node A first, with the script "r1".
It will go through its main loop, trying to receive
and timing out, for as long as you like.
3. Start node B, with the script "r2".
It will pause after formatting its first message, and will
then do a dramatic 5-second countdown. Then it calls
put ( not send! ) and node *A* dies horribly, its core
file spattering the hard disk.
Node B is unaware of the carnage it has caused, sedated
by a sleep loop, tragically still expecting to call send
and start talking to its partner, node A.
( see attached -- if you dare. )