Attached please find the updated specification of this case.
I've incorporated the comments suggested in the discussion.


--

                                                K. Poon.
                                                [email protected]

This case introduces four TCP level socket options, TCP_RTO_INITIAL,
TCP_RTO_MIN, TCP_RTO_MAX, and TCP_LINGER2.  This case also documents
the existing options TCP_CONN_ABORT_THRESHOLD and TCP_ABORT_THRESHOLD.


TCP_RTO_INITIAL, TCP_RTO_MIN, TCP_RTO_MAX
-----------------------------------------

An application can use these three options to change or retrieve,
respectively, the initial, minimum and maximum retransmission timeout
(RTO) values of a TCP connection.  The option value is a uint32_t and
the unit is milliseconds (the actual value is rounded up to the
nearest clock tick interval).  The lower bound of all three options is
1ms.  The upper bound of TCP_RTO_INITIAL is 20s, and the upper bound
of TCP_RTO_MIN and TCP_RTO_MAX is 2 hours.  An option value of 0 means
no change to the timeout value.

These options correspond to the private TCP parameters
tcp_rexmit_interval_initial, tcp_rexmit_interval_min and
tcp_rexmit_interval_max, which control the default values used in an
IP stack.  The lower and upper bounds of the options are the same as
those of the private TCP parameters.
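
The bounds above can be checked on the application side before the
option is set.  The following is only a sketch: the helper names
(rto_kind, rto_value_ok, rto_set) are hypothetical and not part of
this case, and the caller is assumed to pass TCP_RTO_INITIAL,
TCP_RTO_MIN or TCP_RTO_MAX as `opt` on a system whose headers define
them.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical names; only the option names come from this case. */
enum rto_kind { RTO_KIND_INITIAL, RTO_KIND_MIN, RTO_KIND_MAX };

#define RTO_LOWER_MS          1u                        /* 1ms */
#define RTO_INITIAL_UPPER_MS  (20u * 1000u)             /* 20s */
#define RTO_MINMAX_UPPER_MS   (2u * 60u * 60u * 1000u)  /* 2 hours */

/*
 * Return 1 if ms lies within the bounds stated in this case, else 0.
 * 0 is accepted because it means "no change".
 */
int
rto_value_ok(enum rto_kind kind, uint32_t ms)
{
    uint32_t upper = (kind == RTO_KIND_INITIAL) ?
        RTO_INITIAL_UPPER_MS : RTO_MINMAX_UPPER_MS;

    if (ms == 0)
        return (1);
    return (ms >= RTO_LOWER_MS && ms <= upper);
}

/*
 * Set one RTO option on fd after a local range check.  opt is the
 * socket option constant matching kind (assumed to be supplied by
 * the caller from <netinet/tcp.h>).
 */
int
rto_set(int fd, int opt, enum rto_kind kind, uint32_t ms)
{
    assert(fd >= 0);
    if (!rto_value_ok(kind, ms)) {
        errno = EINVAL;
        return (-1);
    }
    return (setsockopt(fd, IPPROTO_TCP, opt, &ms, sizeof (ms)));
}
```

The local check only mirrors the documented bounds; the stack remains
the authority on which values it accepts.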

Note that these socket options do not change TCP's dynamic RTO
calculation.  They only set its boundaries and initial value.

For example, the default initial RTO is 3s, a "conservative" choice
(meaning it seems to work).  It is not really correct, since at that
point there is no estimate to base it on.  If an app knows what the
value should be, it is good to let the app set it.

There must be a maximum value when TCP exponentially backs off the
RTO while retransmitting; it does not make sense to back off without
bound.  If an app will abort the connection after a fixed time with no
response from its peer, it makes sense to allow the app to control the
maximum RTO.  Otherwise, it may happen that there is no retransmission
at all before the app terminates the connection.

Similarly, there must be a minimum.  The RTO calculation may not
converge quickly, or the algorithm may not work well for certain types
of network (*), and may cause false timeouts.  If an app knows better,
it makes sense to let it set the value.
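
The effect of the maximum on an app with a fixed deadline can be
illustrated with a small model.  This is a deliberately simplified
sketch, not the stack's algorithm: it assumes the first timeout equals
the initial RTO and that every later timeout doubles, clamped to the
minimum and maximum, and it ignores the RTT-based estimation that real
TCP performs.  The function name rexmit_count is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the retransmissions that fit before deadline_ms, assuming a
 * pure doubling backoff clamped to [min_ms, max_ms].  Simplified
 * model only; RTT sampling is not represented.
 */
unsigned
rexmit_count(uint32_t initial_ms, uint32_t min_ms, uint32_t max_ms,
    uint32_t deadline_ms)
{
    uint64_t rto = initial_ms;   /* 64-bit so doubling cannot overflow */
    uint64_t elapsed = 0;
    unsigned count = 0;

    assert(min_ms >= 1 && min_ms <= max_ms);
    if (rto < min_ms)
        rto = min_ms;
    if (rto > max_ms)
        rto = max_ms;

    /* Each pass is one timeout expiring, followed by a retransmit. */
    while (elapsed + rto <= deadline_ms) {
        elapsed += rto;
        count++;
        rto *= 2;
        if (rto > max_ms)
            rto = max_ms;
    }
    return (count);
}
```

Under this model, a connection whose RTO has already backed off to
60s gets no retransmission within a 30s application deadline
(rexmit_count(60000, 1, 60000, 30000) is 0), while capping the maximum
at 5s allows six (rexmit_count(60000, 1, 5000, 30000) is 6).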


TCP_CONN_ABORT_THRESHOLD, TCP_ABORT_THRESHOLD
---------------------------------------------

An application can use these two options to change or retrieve the
total time a TCP connection may spend retransmitting without getting
back an acknowledgement.  TCP_CONN_ABORT_THRESHOLD controls this
interval before a connection is established.  TCP_ABORT_THRESHOLD
controls the interval after a connection is established.  The option
value is a uint32_t and the unit is milliseconds (the actual value is
rounded up to the nearest clock tick interval).  The lower bound of
TCP_CONN_ABORT_THRESHOLD is 1000ms and the upper bound is UINT32_MAX
ms.  The lower bound of TCP_ABORT_THRESHOLD is 500ms and the upper
bound is UINT32_MAX ms.  The option value 0 means that TCP will try to
retransmit forever without timing out.

These options correspond to the private TCP parameters
tcp_ip_abort_cinterval and tcp_ip_abort_interval, which control the
default values used in an IP stack.  The lower and upper bounds of
the options are the same as those of the private parameters.
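
A minimal sketch of how the threshold semantics read, including the
special value 0.  The helper names are hypothetical, and treating the
threshold as reached when the elapsed time equals it is an assumption
made for illustration; the case does not specify that boundary.  The
getter assumes the caller passes TCP_CONN_ABORT_THRESHOLD or
TCP_ABORT_THRESHOLD as `opt` on a system that defines them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define CONN_ABORT_LOWER_MS  1000u  /* TCP_CONN_ABORT_THRESHOLD */
#define ABORT_LOWER_MS       500u   /* TCP_ABORT_THRESHOLD */

/*
 * Return 1 if ms is acceptable: either 0 (never time out) or at
 * least lower_ms.  The upper bound is UINT32_MAX, so every larger
 * uint32_t value passes.
 */
int
abort_threshold_ok(uint32_t ms, uint32_t lower_ms)
{
    return (ms == 0 || ms >= lower_ms);
}

/*
 * Return 1 if a connection that has been retransmitting for
 * elapsed_ms without an acknowledgement would be aborted.
 * Assumption: the abort happens once elapsed_ms reaches the
 * threshold.
 */
int
abort_due(uint32_t threshold_ms, uint64_t elapsed_ms)
{
    if (threshold_ms == 0)
        return (0);     /* 0 means retransmit forever */
    return (elapsed_ms >= threshold_ms);
}

/* Read the current threshold of fd into *ms. */
int
abort_threshold_get(int fd, int opt, uint32_t *ms)
{
    socklen_t len = sizeof (*ms);

    assert(ms != NULL);
    return (getsockopt(fd, IPPROTO_TCP, opt, ms, &len));
}
```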


TCP_LINGER2
-----------

An application can use this option to change or retrieve the amount
of time a closed TCP connection stays in the FIN-WAIT-2 state.  The
option value is an int and the unit is seconds.  The lower bound is 1s
and the upper bound is the value of the TCP private parameter
tcp_fin_wait_2_flush_interval.  An option value of 0 means no change
to the value.  A negative value is not allowed.

This option is introduced to ease porting Linux applications that use
it to Solaris.  It corresponds to the private TCP parameter
tcp_fin_wait_2_flush_interval, which controls the default value used
in an IP stack.  The unit of this parameter is milliseconds, which is
different from that of the socket option because the option value unit
needs to be the same as in Linux.  The lower and upper bounds of the
option are the same (after unit conversion) as those of the private
TCP parameter.  The default value of tcp_fin_wait_2_flush_interval is
also changed to 60s.
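
The unit conversion between the parameter (milliseconds) and the
option (seconds) can be sketched as follows.  The helper names are
hypothetical, and truncating division is an assumption; the case does
not say how a parameter value that is not a whole number of seconds is
converted.  The setter is guarded because TCP_LINGER2 is only present
in headers that define it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Upper bound of TCP_LINGER2 in seconds, derived from
 * tcp_fin_wait_2_flush_interval in milliseconds.
 * Assumption: the conversion truncates.
 */
int
linger2_upper_sec(uint32_t flush_interval_ms)
{
    /* The option's 1s lower bound corresponds to 1000ms here. */
    assert(flush_interval_ms >= 1000u);
    return ((int)(flush_interval_ms / 1000u));
}

/* Return 1 if secs is acceptable: 0 (no change) or 1..upper bound. */
int
linger2_value_ok(int secs, uint32_t flush_interval_ms)
{
    if (secs < 0)
        return (0);     /* negative values are not allowed */
    if (secs == 0)
        return (1);
    return (secs <= linger2_upper_sec(flush_interval_ms));
}

/* Set TCP_LINGER2 on fd where the option is defined. */
int
linger2_set(int fd, int secs)
{
#ifdef TCP_LINGER2
    return (setsockopt(fd, IPPROTO_TCP, TCP_LINGER2, &secs,
        sizeof (secs)));
#else
    (void) fd;
    (void) secs;
    errno = ENOPROTOOPT;
    return (-1);
#endif
}
```

With the new 60s default, linger2_value_ok(60, 60000) accepts the
value and linger2_value_ok(61, 60000) rejects it.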


Diff of tcp(7P)


***************
*** 216,223 ****
--- 216,245 ----
       If the local TCP receives no acknowledgements from its  peer
       for  a  period  of time, (for example, if the remote machine
       crashes), the connection is closed and an error is returned.
+      The TCP level socket options, TCP_CONN_ABORT_THRESHOLD and
+      TCP_ABORT_THRESHOLD can be used to change and retrieve this
+      period of time.  The option value is a uint32_t and the unit is
+      milliseconds.  TCP_CONN_ABORT_THRESHOLD and TCP_ABORT_THRESHOLD
+      control respectively this period before and after a connection
+      is established.  If the application does not want TCP to time
+      out, it can use the option value 0.
  
+      During this period, TCP tries to retransmit the unacknowledged
+      data multiple times, each after a timeout, and the timeout
+      interval is exponentially backed off.  The TCP level socket
+      options, TCP_RTO_INITIAL, TCP_RTO_MIN, and TCP_RTO_MAX can be
+      used to control the timeout interval.  TCP_RTO_INITIAL controls
+      the initial retransmission timeout period.  TCP_RTO_MIN and
+      TCP_RTO_MAX control the minimum and maximum timeout period
+      respectively.  The option value is a uint32_t and the unit
+      is milliseconds.
  
+      Note that the default values of the above options,
+      TCP_CONN_ABORT_THRESHOLD, TCP_ABORT_THRESHOLD, TCP_RTO_MIN,
+      TCP_RTO_MAX, and TCP_RTO_INITIAL are appropriate for most 
+      situations.  An application should only alter their values
+      in special circumstances and when it has detailed knowledge
+      of the network environment.
+ 
       TCP follows the congestion control  algorithm  described  in
       RFC  2581,  and  also supports the initial congestion window
       (cwnd) changes in RFC 3390. The initial cwnd calculation can
***************
*** 434,439 ****
--- 456,475 ----
       the  TCP  ndd  parameter  tcp_keepalive_abort_interval.  The
       default is eight minutes.
  
+      After an application closes a TCP connection, TCP enters the
+      shutdown sequence.  But if the peer does not respond (for
+      example, because it crashes), the connection will be stuck in
+      the FIN-WAIT-2 state.  To prevent this, SunOS starts a timer
+      when TCP enters this state.  If the timer fires and the
+      shutdown sequence has not completed, the connection will be
+      freed.  The socket option TCP_LINGER2 can be used to change and
+      retrieve this timeout period.  The option value is an int and
+      the unit is seconds.  The option value cannot be set higher
+      than the system default value, which is controlled by the TCP
+      private parameter tcp_fin_wait_2_flush_interval.  The default
+      value is appropriate for most situations.  An application
+      should only change the value in special circumstances and when
+      it has detailed knowledge of the network environment.
+      
+ 
  SEE ALSO
       svcs(1), ndd(1M), ioctl(2), read(2),  svcadm(1M),  write(2),
       accept(3SOCKET),       bind(3SOCKET),      connect(3SOCKET),



(*) Existing mobile networks are a good example.  They buffer
    extensively (how much depends on the provider) and TCP's
    current algorithm does not work well with that.
