Hi Ulrich,

Thanks for your reply!

>
> > I was testing the performance of open-iscsi initiator with IET target
> > over a 100Mbps Ethernet link with emulated rtt.  What I did was to do
> > raw disk sequential write by
>
> > $ dd if=/dev/zero of=/dev/sdb bs=1024 count=1048576
>
> > , in which /dev/sdb is the iSCSI device. I also measured TCP
> > throughput using iperf with the default setup except "-n 1024M". And I
> > got the following data on iSCSI throughput and TCP throughput vs. rtt:
>
> > rtt (ms)   iSCSI throughput by dd (MB/s)   TCP throughput by iperf (Mbit/s)
> > 0.2        11.3                            94.3
> > 4          11.1                            94.3
> > 8          10.2                            94.3
> > 12         8.6                             94.2
> > 16         7.2                             94.2
> > 20         6.0                             94.1
>
> > local disk throughput by dd was 26.7 MB/s.
>
> > As shown in the table above, iSCSI throughput declined rapidly as
> > rtt increased from 0.2ms to 20ms. TCP throughput, however, dropped
> > by less than 1 percent.
>
> From what I know the (estimated) RTT (Round Trip Time) increases if a link
> problem (i.e. lost packets) was detected (if other parameters are unchanged).

As explained at the beginning of my first message, this was an
experiment, run on two laptops connected by a straight-through cable.
I increased the RTT intentionally, since I was measuring iSCSI
performance against RTT changes. The other link parameters, such as
packet loss, were not changed, and no packet loss was observed when
pinging over the link.

> > Then I used Wireshark to grab the traces of iSCSI and iperf and I
> > found lots of iSCSI PDUs were divided into TCP segments of 1448 bytes
> > but with iperf TCP segments could be as large as 65000+ bytes.
>
> How would you transport such a segment unfragmented?

> > I also skimmed through the iSCSI specification, but had no luck
> > there either...
>
> > I know the Ethernet MTU is 1500 bytes and that might be the reason
> > for the 1448 byte TCP segments, but iperf did get to send much larger
> > TCP segments of 65000+ bytes...
>
> over which layer 2?

As Mike suggested in his reply, this could be a jumbo frame. The
following is the Wireshark dissection of one such packet, carrying
65160 bytes of TCP payload:

No.     Time        Source                S_Port Destination           D_Port Protocol Info
    266 0.137810    10.0.0.1              56099  10.0.0.2              5001   TCP      56099 > 5001 [ACK] Seq=376505 Ack=1 Win=92 Len=65160 [Packet size limited during capture]

Frame 266 (65226 bytes on wire, 58 bytes captured)
    Arrival Time: Jan  4, 2010 04:44:33.711762000
    [Time delta from previous captured frame: 0.000206000 seconds]
    [Time delta from previous displayed frame: 0.002861000 seconds]
    [Time since reference or first frame: 0.137810000 seconds]
    Frame Number: 266
    Frame Length: 65226 bytes
    Capture Length: 58 bytes
    [Frame is marked: True]
    [Protocols in frame: eth:ip:tcp]
    [Coloring Rule Name: TCP]
    [Coloring Rule String: tcp]
Ethernet II, Src: HonHaiPr_0f:35:65 , Dst: Ibm_8d:59:02
    Destination: Ibm_8d:59:02
        Address: Ibm_8d:59:02
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: HonHaiPr_0f:35:65
        Address: HonHaiPr_0f:35:65
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: IP (0x0800)
Internet Protocol, Src: 10.0.0.1 (10.0.0.1), Dst: 10.0.0.2 (10.0.0.2)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 65212
    Identification: 0x8729 (34601)
    Flags: 0x04 (Don't Fragment)
        0... = Reserved bit: Not set
        .1.. = Don't fragment: Set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (0x06)
    Header checksum: 0xa10f [correct]
        [Good: True]
        [Bad : False]
    Source: 10.0.0.1 (10.0.0.1)
    Destination: 10.0.0.2 (10.0.0.2)
Transmission Control Protocol, Src Port: 56099 (56099), Dst Port: commplex-link (5001), Seq: 376505, Ack: 1, Len: 65160
    Source port: 56099 (56099)
    Destination port: commplex-link (5001)
    [Stream index: 0]
    Sequence number: 376505    (relative sequence number)
    [Next sequence number: 441665    (relative sequence number)]
    Acknowledgement number: 1    (relative ack number)
    Header length: 32 bytes
    Flags: 0x10 (ACK)
        0... .... = Congestion Window Reduced (CWR): Not set
        .0.. .... = ECN-Echo: Not set
        ..0. .... = Urgent: Not set
        ...1 .... = Acknowledgement: Set
        .... 0... = Push: Not set
        .... .0.. = Reset: Not set
        .... ..0. = Syn: Not set
        .... ...0 = Fin: Not set
    Window size: 92
    Checksum: 0x12b2 [unchecked, not all data available]
        [Good Checksum: False]
        [Bad Checksum: False]
[Packet size limited during capture: TCP truncated]



> Both would be valid, but due to layer 2 and layer 3 restrictions (ISO OSI
> talk), only sending more packets while waiting for an answer will be a valid
> assumption (unless you have a dedicated single-hop line).

I totally agree that more concurrent IO requests, i.e. more packets
being sent at the same time, will boost the throughput of iSCSI.
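
To make that point concrete, here is a rough sketch of the usual
window-over-RTT bound. It is only a back-of-the-envelope model: the
64 KiB request size and the queue depths are hypothetical values picked
for illustration (not taken from my setup), the 16 ms RTT is one of the
values from my test, and protocol overhead is ignored.

```python
def throughput_bound(outstanding_bytes, rtt_s, link_bytes_per_s=12.5e6):
    """Upper bound on throughput (bytes/s) when at most
    outstanding_bytes can be unacknowledged at any time.
    12.5e6 bytes/s is the nominal rate of a 100 Mbit/s link."""
    return min(link_bytes_per_s, outstanding_bytes / rtt_s)


# Hypothetical request size and queue depths, for illustration only.
REQUEST_BYTES = 64 * 1024

for depth in (1, 2, 4, 8):
    bound = throughput_bound(depth * REQUEST_BYTES, rtt_s=0.016)
    print(f"{depth} outstanding x 64 KiB at 16 ms: <= {bound / 1e6:.1f} MB/s")
```

In this model the bound grows linearly with the amount of outstanding
data until the link rate takes over, which is why more concurrent
requests should help on a long-RTT link.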


> > I first thought this was because of the small default value (8192) for
> > MaxRecvDataSegmentLength. So I increased that value to 262144. But in
> > a later test with 16ms rtt, I found the iSCSI throughput was only
> > improved by 0.7 MB/s and a lot of iSCSI PDUs were still divided into
> > 1448 byte long TCP segments... So I think MaxRecvDataSegmentLength may
> > not be the reason.
>
> I think the question is how big the TCP receive window will be.
>
> > So does anyone have any idea about this: why iSCSI is not fully
> > utilizing the bandwidth on long rtt links by increasing the TCP
> > segment size?
>
> Sorry, but I think utilizing a high-delay connection works via increasing the
> window size (i.e. number of packets), not the size of the segments.

Thank you for the idea about the receiver window size. I checked the
ACK segments from the iSCSI target, i.e. the TCP receiver. The
advertised window size is quite large, 24531, which is much larger
than the 1448-byte TCP segments sent to the target.

Based on these numbers, it seems that the receiver has recognized the
high-bandwidth, long-RTT character of the link and advertised a big
window; the sender, i.e. the iSCSI initiator, however, does not seem
to take advantage of it. Do you have any idea why?
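
For reference, here is a rough sketch of the arithmetic behind this
question. It compares the bandwidth-delay product of the link with the
average amount of data in flight implied by the dd throughput figures
from the table in my first mail. It assumes the nominal 100 Mbit/s link
rate and that dd's MB means 10^6 bytes; the function names are just
mine.

```python
LINK_BPS = 100e6  # nominal link rate in bits per second (assumption)

# (rtt in ms, dd throughput in MB/s), copied from the table in my first mail
MEASURED = [(0.2, 11.3), (4, 11.1), (8, 10.2), (12, 8.6), (16, 7.2), (20, 6.0)]


def bdp_bytes(rtt_ms, link_bps=LINK_BPS):
    """Bytes that must be in flight to keep the link full at this RTT."""
    return link_bps / 8 * rtt_ms / 1000


def implied_in_flight(rtt_ms, mb_per_s):
    """Average bytes in flight implied by a measured throughput."""
    return mb_per_s * 1e6 * rtt_ms / 1000


for rtt_ms, mb_per_s in MEASURED:
    print(f"{rtt_ms:>5} ms  BDP {bdp_bytes(rtt_ms):>9.0f} B  "
          f"implied in flight {implied_in_flight(rtt_ms, mb_per_s):>9.0f} B")
```

With these figures the implied in-flight data levels off at roughly 100
to 120 KB between 12 ms and 20 ms, while the bandwidth-delay product
keeps growing to 250 KB at 20 ms. That would be consistent with some
cap on outstanding data on the sending side, although this arithmetic
alone cannot show where such a cap would be.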

The following is the TCP info of a typical TCP ACK from the target and
a typical packet sent from the initiator to the target.

A typical ACK from the target, i.e. the TCP receiver.

No.     Time        Source                S_Port        Destination           D_Port Protocol Info
 245660 39.597116   10.0.0.2              iscsi-target  10.0.0.1              52270  TCP      iscsi-target > 52270 [ACK] Seq=94129 Ack=288633393 Win=24531 Len=0 TSV=15486798 TSER=14244966

Transmission Control Protocol, Src Port: iscsi-target (3260), Dst Port: 52270 (52270), Seq: 94129, Ack: 288633393, Len: 0
    Source port: iscsi-target (3260)
    Destination port: 52270 (52270)
    [Stream index: 0]
    Sequence number: 94129    (relative sequence number)
    Acknowledgement number: 288633393    (relative ack number)
    Header length: 32 bytes
    Flags: 0x10 (ACK)
        0... .... = Congestion Window Reduced (CWR): Not set
        .0.. .... = ECN-Echo: Not set
        ..0. .... = Urgent: Not set
        ...1 .... = Acknowledgement: Set
        .... 0... = Push: Not set
        .... .0.. = Reset: Not set
        .... ..0. = Syn: Not set
        .... ...0 = Fin: Not set
    Window size: 24531
    Checksum: 0x3145 [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
    Options: (12 bytes)
        NOP
        NOP
        Timestamps: TSval 15486798, TSecr 14244966
-------------------------------------------------------------
And for comparison, the following is the info of a typical TCP packet
sent from the initiator to the target.

No.     Time        Source                S_Port Destination           D_Port        Protocol Info
 245697 39.608704   10.0.0.1              52270  10.0.0.2              iscsi-target  TCP      [TCP segment of a reassembled PDU]

Transmission Control Protocol, Src Port: 52270 (52270), Dst Port: iscsi-target (3260), Seq: 288707601, Ack: 94129, Len: 1448
    Source port: 52270 (52270)
    Destination port: iscsi-target (3260)
    [Stream index: 0]
    Sequence number: 288707601    (relative sequence number)
    [Next sequence number: 288709049    (relative sequence number)]
    Acknowledgement number: 94129    (relative ack number)
    Header length: 32 bytes
    Flags: 0x10 (ACK)
        0... .... = Congestion Window Reduced (CWR): Not set
        .0.. .... = ECN-Echo: Not set
        ..0. .... = Urgent: Not set
        ...1 .... = Acknowledgement: Set
        .... 0... = Push: Not set
        .... .0.. = Reset: Not set
        .... ..0. = Syn: Not set
        .... ...0 = Fin: Not set
    Window size: 3050
    Checksum: 0x19d1 [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
    Options: (12 bytes)
        NOP
        NOP
        Timestamps: TSval 14244971, TSecr 15486796
    [SEQ/ACK analysis]
        [Number of bytes in flight: 9736]
    [Reassembled PDU in frame: 245702]
    TCP segment data (1448 bytes)


Do you have any idea why the iSCSI initiator is not taking advantage
of the large receiver window?

Thanks a lot!

Jack