[ https://forge.continuent.org/jira/browse/SEQUOIA-506?page=all ]
     
Marc Herbert reopened SEQUOIA-506:
----------------------------------

     Assign To:     (was: Marc Herbert)

Re-opened for a minor issue. There is a short window (shorter than the ping
interval) during which existing pinger threads are not yet aware of controllers
that have just come back up. During that window some network operations may
spuriously fail (for instance by throwing a NoMoreControllerException).

The appropriate fix may involve something like
   catch (NoMoreControllerException e) { Thread.sleep(N * pingInterval); retry(); }
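A minimal Java sketch of that catch/sleep/retry idea follows. The NoMoreControllerException class here is a local stand-in, and callWithRetry, the ping interval, the number of intervals to wait and the retry limit are hypothetical names and values chosen for illustration; this is not Sequoia's actual code.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch of the "catch, sleep, retry" workaround.
 * Names and values are illustrative assumptions, not Sequoia's real API.
 */
public class RetryAfterPing {

    /** Local stand-in for Sequoia's NoMoreControllerException (assumption). */
    static class NoMoreControllerException extends Exception {
        NoMoreControllerException(String message) {
            super(message);
        }
    }

    /**
     * Runs op. If it fails with NoMoreControllerException, sleeps for
     * intervalsToWait ping intervals, giving the pinger threads time to notice
     * controllers that just came back up, then tries again. Gives up and
     * rethrows after maxRetries retries.
     */
    static <T> T callWithRetry(Callable<T> op, long pingIntervalMs,
                               int intervalsToWait, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (NoMoreControllerException e) {
                if (attempt >= maxRetries) {
                    throw e; // the controllers really seem to be gone
                }
                Thread.sleep(pingIntervalMs * intervalsToWait);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: fails twice (pinger not yet updated), then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new NoMoreControllerException("no controller available");
            }
            return "ok";
        }, 10, 2, 5);
        System.out.println(result + " after " + calls[0] + " calls"); // prints "ok after 3 calls"
    }
}
```

Bounding the number of retries keeps a genuinely dead cluster from turning a fast failure into an endless loop.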



> break dependency on TCP timeouts
> --------------------------------
>
>          Key: SEQUOIA-506
>          URL: https://forge.continuent.org/jira/browse/SEQUOIA-506
>      Project: Sequoia
>         Type: Improvement
>   Components: Core
>     Versions: Sequoia 2.8.2, Sequoia 2.8.1, Sequoia 2.7, Sequoia 2.6.1, Sequoia 2.6, Sequoia 2.5
>     Reporter: Marc Herbert
>      Fix For: Sequoia 2.10.3
>
>
> [Also affects the driver]
> When a controller is down for some reason, the drivers need to notice it as
> soon as possible in order to reroute the query to another controller.
> Unfortunately TCP/IP sockets (used both by Java and Carob) by default keep
> trying to retransmit/reconnect/receive for a very long time, in some cases
> indefinitely (see for instance SO_RCVTIMEO in "man 7 socket" on Linux).
> Please note that if a controller fails but the hosting operating system is
> still up, then there is no issue, because the operating system will
> explicitly close the socket. Also note that this issue concerns sockets
> already connected to the failing controller; new connections are a very
> different story (system timeouts are much shorter in that case).
> The Very Great And Portable And Final Solution would be to use only
> asynchronous (non-blocking) I/O or UDP, so that we can implement our own
> timeouts and have full control over them. See java.nio for Java and
> "man select/poll/socket" for C. Unfortunately asynchronous I/O is much more
> complex to program, and we probably cannot afford such a huge refactoring in
> the short/medium term.
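For the asynchronous route, here is a small sketch of what "our own timeouts" could look like with java.nio: the channel is switched to non-blocking mode and a Selector waits for readability for at most timeoutMs milliseconds. readWithTimeout is a hypothetical helper, and main() only simulates a silent controller with a local server that accepts the connection but does not write; it is not a drop-in replacement for the driver's I/O.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class NioReadTimeout {

    /**
     * Reads into buf (which must have space left), waiting at most timeoutMs.
     * Returns the number of bytes read, or -1 at end of stream. Throws
     * SocketTimeoutException if nothing arrived in time: the timeout is ours,
     * not the TCP stack's.
     */
    static int readWithTimeout(SocketChannel ch, ByteBuffer buf, long timeoutMs)
            throws IOException {
        ch.configureBlocking(false);
        try (Selector selector = Selector.open()) {
            ch.register(selector, SelectionKey.OP_READ);
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (true) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    throw new SocketTimeoutException("no data within " + timeoutMs + " ms");
                }
                // select(0) would block forever, hence the left > 0 check above.
                if (selector.select(left) > 0) {
                    selector.selectedKeys().clear();
                    int n = ch.read(buf);
                    if (n != 0) {
                        return n;
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel controller = server.accept()) {
                ByteBuffer buf = ByteBuffer.allocate(16);
                try {
                    readWithTimeout(client, buf, 200); // "controller" stays silent
                } catch (SocketTimeoutException e) {
                    System.out.println("timed out: " + e.getMessage());
                }
                controller.write(ByteBuffer.wrap("hi".getBytes(StandardCharsets.US_ASCII)));
                System.out.println("read " + readWithTimeout(client, buf, 1000) + " byte(s)");
            }
        }
    }
}
```

Even this toy version shows where the complexity comes from: every blocking call in the driver would need the same deadline bookkeeping.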
> So one alternative is to tweak socket options, for instance:
> setsockopt(..., SO_SNDTIMEO, ...)
> Or, even less portable (and more intrusive!), to tweak the system-wide
> default settings for all sockets.
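On the Java side, the standard java.net.Socket API has no counterpart to SO_SNDTIMEO; the closest knob is Socket.setSoTimeout(), which bounds blocking reads and makes them throw SocketTimeoutException. A sketch follows; readTimesOut is a hypothetical helper, and the local ServerSocket merely plays a controller that first stays silent, then answers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SoTimeoutDemo {

    /**
     * Blocks on read() for at most timeoutMs. Returns true if the read timed
     * out, false if a byte (or end of stream) arrived first.
     */
    static boolean readTimesOut(Socket s, int timeoutMs) throws IOException {
        s.setSoTimeout(timeoutMs); // SO_TIMEOUT: applies to reads only, not writes
        InputStream in = s.getInputStream();
        try {
            in.read();
            return false;
        } catch (SocketTimeoutException e) {
            return true; // the socket is still valid: retry, or close and reroute
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket controller = server.accept()) {
            System.out.println("silent controller timed out: " + readTimesOut(client, 200));
            controller.getOutputStream().write(42);
            System.out.println("talking controller timed out: " + readTimesOut(client, 1000));
        }
    }
}
```

Note that this does nothing for a write() stuck behind a dead peer; that case is left to the TCP retransmission settings.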
> This issue is meant to collect useful information and URLs about these
> socket timeouts.
> C - setsockopt()
> Quoted from http://www.developerweb.net/forum/showthread.php?t=3439 :
> "SO_{SND,RCV}TIMEO are probably the most widely unimplemented, or
> strangely/incompatibly implemented, of all common sockopts in existence".
> This message may be somewhat old, and progress may have been made since.
> Either way, this solution will need testing.
> Java
> Is there any difference, or is the Java socket API a simple 1-to-1 mapping
> of the C interface?
> I (MH) noticed at least one (unrelated) difference on Linux with a 1.4.2 Sun
> JVM: the JVM always calls setsockopt(SO_REUSEADDR, ...) on new sockets. That
> is, SO_REUSEADDR is the default in Java, whereas it is NOT in C.
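To check the SO_REUSEADDR defaults on a given JVM and platform, and to get C-like behaviour back when needed, something like the following should do. The printed defaults depend on the JVM and OS (the 1.4.2 observation above may not hold elsewhere), so no output is claimed here; newCLikeServerSocket is a hypothetical helper name.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ReuseAddrCheck {

    /** Unbound ServerSocket with SO_REUSEADDR explicitly cleared, as in C. */
    static ServerSocket newCLikeServerSocket() throws IOException {
        ServerSocket ss = new ServerSocket(); // unbound: options can still be set
        ss.setReuseAddress(false);            // must be done before bind()
        return ss;
    }

    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket(); ServerSocket ss = new ServerSocket()) {
            // Defaults depend on the JVM version and the operating system.
            System.out.println("Socket SO_REUSEADDR default:       " + s.getReuseAddress());
            System.out.println("ServerSocket SO_REUSEADDR default: " + ss.getReuseAddress());
        }
        try (ServerSocket ss = newCLikeServerSocket()) {
            ss.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            System.out.println("C-like ServerSocket SO_REUSEADDR:  " + ss.getReuseAddress());
        }
    }
}
```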
> Linux
> It seems that SO_XXXTIMEO are implemented from kernel 2.4 onwards
> (confirmation, anyone?).
> The file
> /usr/share/doc/linux-doc-2.6.10/Documentation/networking/ip-sysctl.txt.gz
> (readable with zless) documents, badly, the very useful tcp_retries2 setting:
> the number of retransmissions before Linux disconnects the socket. The time
> between retransmissions grows exponentially. Warning: since all send()-ing
> happens asynchronously, only the NEXT socket operation will report an error.
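A back-of-the-envelope estimate of how long tcp_retries2 keeps a dead connection open can be computed as below. The sketch assumes the retransmission timeout starts at 200 ms, doubles after every retransmission and is capped at 120 s (Linux's TCP_RTO_MIN and TCP_RTO_MAX), which is only realistic on a low-latency LAN; the real timeout depends on the measured round-trip time. With the default tcp_retries2 = 15 this gives 924,600 ms (about 15.4 minutes), in line with the 924.6 s figure quoted in newer revisions of ip-sysctl.txt.

```java
public class TcpRetries2Estimate {

    /**
     * Rough estimate, in milliseconds, of how long Linux keeps retransmitting
     * on an established connection before giving up. Assumes the retransmission
     * timeout (RTO) starts at rtoMinMs, doubles after every retransmission and
     * is capped at rtoMaxMs. Sums retries + 1 timeouts: the wait before the
     * first retransmission plus the wait after each of the retries.
     */
    static long giveUpAfterMs(int retries, long rtoMinMs, long rtoMaxMs) {
        long total = 0;
        long rto = Math.min(rtoMinMs, rtoMaxMs);
        for (int i = 0; i <= retries; i++) {
            total += rto;
            rto = Math.min(rto * 2, rtoMaxMs); // exponential backoff, capped
        }
        return total;
    }

    public static void main(String[] args) {
        long rtoMin = 200;     // assumed: Linux TCP_RTO_MIN
        long rtoMax = 120_000; // assumed: Linux TCP_RTO_MAX
        for (int retries : new int[] {15, 8, 5}) { // 15 is the usual default
            System.out.println("tcp_retries2=" + retries + " -> about "
                    + giveUpAfterMs(retries, rtoMin, rtoMax) + " ms");
        }
    }
}
```

Lowering tcp_retries2 (sysctl net.ipv4.tcp_retries2) shortens this, but it is exactly the kind of system-wide tweak mentioned above.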

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   https://forge.continuent.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
