On Mon, Feb 16, 2009 at 19:12, Pavlin Radoslavov <[email protected]> wrote:
> Victor Faion <[email protected]> wrote:
>
>> On Fri, Feb 13, 2009 at 19:56, Victor Faion <[email protected]> wrote:
>> > On Fri, Feb 13, 2009 at 16:28, Victor Faion <[email protected]> wrote:
>> >> On Thu, Feb 12, 2009 at 18:55, Pavlin Radoslavov
>> >> <[email protected]> wrote:
>> >>> Victor Faion <[email protected]> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> I was trying to set up a process that tries to connect to its neighbours
>> >>>> over TCP and basically I wanted it to keep trying to connect to its
>> >>>> neighbours until it can, but I was having some trouble as the process
>> >>>> basically stops trying to connect when it can't connect the first
>> >>>> time.
>> >>>>
>> >>>> I iterate over all the neighbour objects calling their connect
>> >>>> function which calls send_tcp_open_bind_connect. The callback given to
>> >>>> send_tcp_open_bind_connect just checks if there was an error and if
>> >>>> there was it calls connectRetry() which pretty much does the same
>> >>>> thing as connect (calls send_tcp_open_bind_connect and passes it the
>> >>>> same callback as connect). The problem is the first time when it calls
>> >>>> connect and fails, it just calls the socket4_user_0_1_error_event
>> >>>> function (saying ``Transport endpoint is not connected'' which is
>> >>>> expected) but then it doesn't go back into connectRetry() and no
>> >>>> connection is made when its neighbours are actually listening for this
>> >>>> connection. Is there a better/easier way of doing this polling or am I
>> >>>> just doing the recursing with the callback the wrong way?
>> >>>
>> >>> Is connectRetry() a method in your protocol?
>> >>>
>> >>
>> >>
>> >> Yeah, connect() takes in the parameters needed to call
>> >> send_tcp_open_bind_connect() and saves them into the Neighbour object.
>> >> Then connectRetry() uses the cached values to call
>> >> send_tcp_open_bind_connect() if it fails the first time.
>> >>
>> >>
>> >>> In your event handler for socket4_user_0_1_error_event you need to
>> >>> handle the error conditions (e.g., schedule a call to
>> >>> connectRetry()).
>> >>>
>> >>
>> >>
>> >> I tried to avoid this as this means iterating over all the neighbours
>> >> again, checking each sockid and matching against the sockid received
>> >> in socket4_user_0_1_error_event to figure out which neighbour's
>> >> connect function to call again. Anyway I tried doing it like this but
>> >> it still doesn't repeatedly try to connect to a neighbour. It goes in
>> >> this order:
>> >>
>> >> 1. Try to connect normally using the neighbour's connect() (shouldn't be
>> >> able to)
>> >>
>> >> 2. Callback for send_tcp_open_bind_connect gets called (and the
>> >> XrlError object received is XrlError::OKAY() for some reason)
>> >>
>> >> 3. socketx_user_0_1_error_event gets called and says ``Transport
>> >> endpoint is not connected fatal''
>> >>
>> >> 4. Then socketx_user_0_1_error_event iterates over the neighbours,
>> >> when it matches the one which has the sockid that
>> >> socketx_user_0_1_error_event received it calls connect() again.
>> >>
>> >> 5. Then I get a warning that says ``Handling method for
>> >> socket4_user/0.1/error_event failed: XrlCmdError 102 Command failed
>> >> socket error''
>> >>
>> >> 6. Then the same thing as step 2 happens.
>> >>
>> >> The cycle ends there, connect() only gets called twice because
>> >> socketx_user_0_1_error_event only gets called once. Not sure why this
>> >> happens, something to do with that warning. Why does that happen
>> >> though?
>> >>
>> >>
>> >>> Also, are you saying that the first time you call
>> >>> send_tcp_open_bind_connect() and it fails, the callback for that XRL
>> >>> is not called at all? I would guess the callback might be called
>> >>> after socket4_user_0_1_error_event is received, but I wouldn't bet
>> >>> on the ordering.
>> >>>
>> >>> Pavlin
>> >>>
>> >>
>> >>
>> >> Well the callback gets called but the problem is that I'm not sure
>> >> which of the callback and the error event handler get called last in
>> >> order to reschedule the connecting.
>> >>
>> >> Victor
>> >>
>> >
>> >
>> > Sorry the reason for step 5 above was because my
>> > socket4_user_0_1_error_event was returning
>> > XrlCmdError::COMMAND_FAILED("socket error"). However when I changed it
>> > to return XrlCmdError::OKAY() basically it goes through steps 1-4 from
>> > above except sometimes, it doesn't happen in the order above but in
>> > the order 1, 3, 4, 2. When this happens it ends in step 2 and a
>> > connection is not made. This happens because the callback sets the
>> > sockid of the neighbour when a connection attempt is made, and the
>> > error handler uses this sockid to know which neighbour to connect to.
>> > So when the new sockid doesn't get set, the error handler doesn't find
>> > the neighbour. Not sure how to get the new sockid into the event
>> > handler when it doesn't get set into the callback.
>> >
>>
>>
>> Hello,
>>
>> Sorry to restart this thread, I'm not sure how to handle the case when
>> a router cannot connect to another router. I don't understand why when
>> I call send_tcp_open_bind_connect to another router (which isn't even
>> online) the callback to send_tcp_open_bind_connect receives
>> XrlError::OKAY(). I wanted to handle this error in the callback as I
>> don't have enough information to handle it in the
>> socket4_user_0_1_error_event function. I couldn't find any code in
>> XORP that does this sort of thing. Where do the errors that get passed
>> into the callback for send_tcp_open_bind_connect get set?
>
> For a reason that is unclear to me without further investigation,
> the order of the send_tcp_open_bind_connect callback and the
> socket4_user_0_1_error_event upcall are reversed
> (always/occasionally?).
> I had a quick look in the FEA, and the
> callback should be received first, but obviously from your
> description this doesn't seem to be the case.
> The correct solution should be to investigate the issue and fix it.
> This might require understanding of the FEA I/O internals and some
> XRL-related knowledge. Unfortunately, I can't give you an estimate
> how soon I/we can allocate the resources to fix that, so your best
> bet would be to submit a Bugzilla entry.
>

Yeah, they are reversed. I think it only happens when I reschedule
send_tcp_open_bind_connect from socket4_user_0_1_error_event. In that
case, it calls socket4_user_0_1_error_event before the callback. I can
submit a bug report; what sort of information should I include?

Here's the relevant output from when I start xorp_rtrmgr. At first it
attempts to connect normally, then goes to the callback, then
socket4_user_0_1_error_event, matches the sockid because it has been
set in the callback, then attempts to reconnect to that IP. However,
after this it goes back to socket4_user_0_1_error_event before the
callback (and so the new sockid is not set and there is no match and
no reconnection attempt).

[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] Connecting to 146.169.3.10
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] CB for 92a9030b-02ba706b-0006c2a9-3c670000
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] socketx_user_0_1_error_event 92a9030b-02ba706b-0006c2a9-3c670000 Transport endpoint is not connected fatal
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] sockid match: 146.169.3.10
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] connect retry
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] socketx_user_0_1_error_event 92a9030b-02ba706b-0006d31d-3c670000 Transport endpoint is not connected fatal
[ 2009/02/17 12:06:07 INFO xorp_bpsf XrlBpsfTarget ] CB for 92a9030b-02ba706b-0006d31d-3c670000

> For your own purpose you need to move forward by using some
> workaround. One possible solution that comes to mind is to have a
> map of states per sockid that can be populated/updated regardless of
> the order of the callbacks and the upcalls. E.g., if an upcall is
> received before the sockid is known, a new entry is created for that
> sockid and the state is set according to the upcall error.
> Then, after the send_tcp_open_bind_connect callback is invoked, at that
> time the sockid entry can be used for its intended purpose (and the
> error condition is already filled-in).

Sounds good, I will try to implement this :-)

> On the other hand, if the upcall is inbound_connect_event or
> outbound_connect_event (instead of error_event), then only after the
> send_tcp_open_bind_connect callback is called, then you take the
> appropriate actions.
>

Doesn't this also assume that
inbound_connect_event/outbound_connect_event get called after the
callback? (Not sure if this is what actually happens.)

> Hope that helps,
> Pavlin
>

Thanks for the help, I'll try to use your suggestion and see how that goes.

Victor
_______________________________________________
Xorp-hackers mailing list
[email protected]
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers
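
For reference, a minimal sketch of the per-sockid state map suggested in
the thread. It is only an illustration under stated assumptions: the
names SockState, SockStateMap, on_connect_callback and on_error_event
are hypothetical and not part of XORP, sockids and neighbours are
treated as plain strings, and the real XRL handler signatures are left
out. The idea is that whichever of the callback and the error_event
upcall arrives second finds the other half already recorded and can
trigger the retry.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-socket record; these names are illustrative, not XORP API.
struct SockState {
    bool has_neighbour;     // set once the connect callback has run
    bool has_error;         // set once an error_event upcall has arrived
    std::string neighbour;  // which neighbour this sockid belongs to
    std::string error;      // error text reported by the upcall
    SockState() : has_neighbour(false), has_error(false) {}
};

class SockStateMap {
public:
    // Call from the send_tcp_open_bind_connect callback, once the sockid is
    // known. Returns true and fills `retry` with the neighbour to reconnect
    // to if an error for this sockid has already been recorded.
    bool on_connect_callback(const std::string& sockid,
                             const std::string& neighbour,
                             std::string& retry) {
        SockState& s = states_[sockid];  // creates the entry if missing
        s.has_neighbour = true;
        s.neighbour = neighbour;
        return take_if_failed(sockid, retry);
    }

    // Call from the error_event upcall. Works even if the callback has not
    // run yet: the entry is created and the error is kept until it does.
    bool on_error_event(const std::string& sockid,
                        const std::string& error,
                        std::string& retry) {
        SockState& s = states_[sockid];  // creates the entry if missing
        s.has_error = true;
        s.error = error;
        return take_if_failed(sockid, retry);
    }

    // Number of sockids still being tracked.
    std::size_t pending() const { return states_.size(); }

private:
    // If both halves (neighbour and error) are known, hand back the
    // neighbour and drop the entry, since the failed socket is finished.
    bool take_if_failed(const std::string& sockid, std::string& retry) {
        std::map<std::string, SockState>::iterator it = states_.find(sockid);
        assert(it != states_.end());  // callers always create the entry first
        if (!it->second.has_neighbour || !it->second.has_error)
            return false;
        retry = it->second.neighbour;
        states_.erase(it);
        return true;
    }

    std::map<std::string, SockState> states_;
};
```

With this shape, both handlers make the same check, so the order 1, 2, 3
and the order 1, 3, 2 end in the same place: the second arrival returns
true and the caller schedules the reconnect for the returned neighbour.
Entries for sockets that connect successfully are never erased in this
sketch; a real version would presumably clear them when the connection
is established or closed.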
