Hi Philip,

OK, I see there is a subtle issue here in the interplay between IKC and POE. You alluded to this in your previous email when you said "keeping a request is hard". I thought about this a bit and I think I see your point.
It seems like this is a problem which may be naturally solved by having a tiny bit of control over a POE queue. We could solve it like this:

1. On an error, push the event back onto the queue, so that it gets retried (note we are putting it back in order, so we can't just post it again; it has to go back to the front of the line so it gets retried in sequence).

2. We'd need an event which halts/unhalts a POE queue's progress. This would be triggered by the success of our reconnect. Really, we just need a way to make a queue pause until some other thing (in this case reconnection) happens. The event is a "user defined" event, so we'd need a primitive to let us signal to POE to restart the queue.

Are there things in POE that let me do 1-2 above? Or perhaps I'm missing a far simpler solution that doesn't require these primitives? (I've put a rough sketch of the idea, using an ordinary buffering session rather than new primitives, after the quoted message below.)

I think the race you mention would be solved by simply having two places where the queue can be halted. One is the existing mechanism in the monitor's detection routine. The other is when we try to send and get the dead-socket error. Detecting the dead socket at the point of sending is unavoidable, since this is how TCP/IP detects dead sockets, and it is also the lowest-latency method. Using the monitor to detect problems (via ping polling) is useful too, but seems most valuable when there are very few IKC requests flowing through the system, so the two methods, polling and detect-on-send, are complementary.

As you mention, testing is hard, but this is always the case with highly concurrent things. In this case, I think testing could be done simply by turning off the monitor polling, to see that the "detect on send" path works.

Zero

On Sat, Mar 20, 2010 at 8:56 AM, Philip Gwyn <phi...@awale.qc.ca> wrote:
>
> On 19-Mar-2010 roger bush wrote:
> > Is it similarly possible to reconnect on a failed call (at the cost
> > of some latency)? Or are there other side effects from this that
> > are unwanted?
>
> If by "failed call" you mean "IKC request to a disconnected remote kernel"
> then currently, no. But it would be relatively trivial to add a monitor
> for that.
>
> > This seems to be the standard way to do things over TCP/IP, since we
> > discover sockets are dead via an attempt to read them.
>
> With IKC, you can know beforehand when something has disconnected. But
> there is a race condition between posting a request and getting the
> monitored disconnect event. In fact, the "monitor for failed request"
> would also have a race condition unless I can be very clever about
> detecting failure where the disconnect happens after the Responder has
> done its work.
>
> This might not be clear. Here's a list of the steps a request goes
> through before "hitting the wire":
>
> 1- Your session
> 2- IKC::Responder
> 3- IKC::Channel
> 4- Wheel::ReadWrite
> 5- Driver::SysRW
>
> If the disconnect happens before step 2, then adding a monitor for it is
> trivial. If the disconnect happens during 3, 4 or 5, then it will be
> trickier to detect. Especially writing test cases for it.
>
> Thinking further, if you don't trust an IKC connection to stay open, you
> need some sort of 'acknowledge' from the other side. And that is currently
> beyond the scope of IKC.
>
> -Philip
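Here is the rough sketch I mentioned above. It is only an illustration of the pause/requeue idea, not an existing POE or IKC feature: the "gatekeeper" alias, the send_request/link_down/link_up/requeue_front event names, and the shape of the request hash are all made up for this example, and I am assuming the usual Responder-style `$kernel->post(IKC => 'post', ...)` call for the actual send. The point is that an ordinary session holding a heap array can act as the halted/unhalted queue, so no new POE primitive is strictly required.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use POE;   # exports KERNEL, HEAP, ARG0, POE::Kernel, POE::Session

    # Hypothetical gatekeeper session: buffers outgoing IKC requests while
    # the link is down and flushes them, in order, once reconnect succeeds.
    POE::Session->create(
        inline_states => {
            _start => sub {
                my ($kernel, $heap) = @_[KERNEL, HEAP];
                $kernel->alias_set('gatekeeper');
                $heap->{paused}  = 0;    # 1 while the link is known dead
                $heap->{pending} = [];   # requests held back, in order
            },

            # All outgoing requests come through here instead of going to
            # IKC directly.
            send_request => sub {
                my ($kernel, $heap, $req) = @_[KERNEL, HEAP, ARG0];
                if ($heap->{paused}) {
                    push @{ $heap->{pending} }, $req;    # hold it, in order
                }
                else {
                    $kernel->post(IKC => 'post', $req->{spec}, $req->{args});
                }
            },

            # Called from the monitor's disconnect handler OR from the
            # send-time error path; either one halts the flow (point 2).
            link_down => sub {
                $_[HEAP]->{paused} = 1;
            },

            # Push a failed request back to the front of the line so the
            # original ordering is preserved (point 1).
            requeue_front => sub {
                my ($heap, $req) = @_[HEAP, ARG0];
                unshift @{ $heap->{pending} }, $req;
            },

            # Reconnect succeeded: unhalt and flush everything in order.
            link_up => sub {
                my ($kernel, $heap) = @_[KERNEL, HEAP];
                $heap->{paused} = 0;
                while (my $req = shift @{ $heap->{pending} }) {
                    $kernel->post(IKC => 'post', $req->{spec}, $req->{args});
                }
            },
        },
    );

    # In a real program the IKC client/responder sessions would be set up
    # before this, then:
    POE::Kernel->run();

A caller would then do something like `$kernel->post(gatekeeper => send_request => { spec => 'poe://Remote/session/event', args => $args })` instead of posting to IKC directly, and the monitor or the send-error path would post link_down / requeue_front / link_up as appropriate. Whether that covers the race you describe at steps 3-5 is exactly the open question.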