On 09/08/2011 09:55 PM, Vladislav Bolkhovitin wrote:
> On 09/02/2011 12:15 PM, Mike Christie wrote:
>> On 09/01/2011 10:04 PM, Vladislav Bolkhovitin wrote:
>>> Hi,
>>>
>>> I've done some tests, and it looks like open-iscsi doesn't support full
>>> duplex speed on bidirectional data transfers from a single drive.
>>>
>>> My test is simple: two dd's doing big transfers in parallel over a 1 GbE
>>> link from a ramdisk or nullio iSCSI device (see the command sketch at the
>>> end of this message). One dd is reading and the other is writing, and I
>>> watch throughput with vmstat. When either dd runs alone, I get full
>>> single-direction link utilization (~120 MB/s) in both directions, but as
>>> soon as both transfers run in parallel, the throughput of each immediately
>>> drops by half to 55-60 MB/s (the sum stays at the same 120 MB/s).
>>>
>>> To be sure, I tested the bidirectional capability of a single TCP
>>> connection, and it does provide a nearly 2x aggregate throughput
>>> increase (~200 MB/s).
>>>
>>> Interestingly, running the transfer in the other direction against the
>>> same device imported from a second iSCSI target does provide the expected
>>> full-duplex 2x aggregate throughput increase.
>>>
>>> I tried several iSCSI targets, and I'm pretty confident that iSCSI-SCST
>>> is capable of full duplex transfers, but from a look at the open-iscsi
>>> code I can't see a serialization point in it. Open-iscsi appears to
>>> receive and send data in different threads (the requesting process and
>>> the per-connection iscsi_q_X workqueue, respectively), so it should be
>>> capable of full duplex.
>>
>> Yeah, we send from the iscsi_q workqueue and receive from the network
>> softirq if the net driver supports NAPI.
>>>
>>> Does anyone have an idea what the serialization point preventing
>>> full-duplex speed could be?
>>
>> Did you do any lock profiling, and does the session->lock look like the
>> problem? It is taken in both the receive and xmit paths as well as in the
>> queuecommand path.
> 
> Just did it. /proc/lock_stat says there is no significant contention on
> session->lock (see the lock_stat recipe below).
> 
> On the other hand, session->lock is a spinlock, so if it were the
> serialization point, we would see heavy CPU consumption on the initiator.
> But there is plenty of idle CPU time there.
> 
> So there must be another serialization point.
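(For reference, a minimal recipe for the contention check above, assuming a
kernel built with CONFIG_LOCK_STAT=y; the grep pattern is only a rough filter
for how the lock class might show up, not taken from the original message:)

  echo 0 > /proc/lock_stat             # clear previously collected statistics
  echo 1 > /proc/sys/kernel/lock_stat  # start collecting
  # ... run the bidirectional workload here ...
  echo 0 > /proc/sys/kernel/lock_stat  # stop collecting
  grep -A 5 session /proc/lock_stat    # look for contention on session->lock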

Update: using sg_dd with blk_sgio=1 (i.e., SG_IO) instead of dd, I was able to
achieve bidirectional throughput of 92 MB/s in each direction.
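For reference, the sg_dd invocations were something like the following (device
path and sizes are illustrative, not the exact ones used; sg_dd is from
sg3_utils, and blk_sgio=1 makes it drive the block device via SG_IO ioctls
instead of normal read()/write()):

  # read side: SG_IO reads from the iSCSI-backed disk, 1 MB per transfer
  sg_dd if=/dev/sdX of=/dev/null blk_sgio=1 bs=512 bpt=2048 count=2000000 time=1 &
  # write side: SG_IO writes to the same disk, in parallel
  sg_dd if=/dev/zero of=/dev/sdX blk_sgio=1 bs=512 bpt=2048 count=2000000 time=1 &
  wait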

Thus, the iSCSI stack itself performs as expected, and the serialization point
must be somewhere higher up in the block stack. Both buffered and O_DIRECT dd
show the same serialized behavior described above.
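For completeness, the bidirectional dd test quoted at the top of this thread
boils down to something like this (again, device path and transfer sizes are
illustrative):

  # read side: sequential reads from the iSCSI-backed disk
  dd if=/dev/sdX of=/dev/null bs=1M count=10000 &
  # write side: sequential writes to the same disk, in parallel;
  # adding iflag=direct / oflag=direct gives the O_DIRECT variant
  dd if=/dev/zero of=/dev/sdX bs=1M count=10000 &
  # watch aggregate throughput while both run
  vmstat 1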

Vlad
