Mike Christie, on 09/02/2011 12:15 PM wrote:
> On 09/01/2011 10:04 PM, Vladislav Bolkhovitin wrote:
>> Hi,
>>
>> I've done some tests, and it looks like open-iscsi doesn't support
>> full-duplex speed on bidirectional data transfers from a single drive.
>>
>> My test is simple: two dd's doing big transfers in parallel over a 1 GbE
>> link from a ramdisk or nullio iSCSI device. One dd is reading and the other
>> is writing. I'm watching throughput using vmstat. When either dd is working
>> alone, I get full single-direction link utilization (~120 MB/s) in either
>> direction, but when both transfers run in parallel, the throughput of each
>> immediately halves to 55-60 MB/s (the sum is the same 120 MB/s).
>>
>> Of course, I tested the bidirectional capability of a single TCP connection,
>> and it does provide a nearly 2x throughput increase (~200 MB/s).
>>
>> Interestingly, doing the transfer in the other direction from the same
>> device imported from another iSCSI target provides the expected full-duplex
>> 2x aggregate throughput increase.
>>
>> I tried several iSCSI targets, and I'm pretty confident that iSCSI-SCST is
>> capable of providing full-duplex transfers, but from a quick look at the
>> open-iscsi code I can't see the serialization point in it. It looks like
>> open-iscsi receives and sends data in different threads (the requester
>> process and the per-connection iscsi_q_X workqueue, respectively), so it
>> should be capable of full duplex.
> 
> Yeah, we send from the iscsi_q workqueue and receive from the network
> softirq if the net driver supports NAPI.
> 
>>
>> Does anyone have an idea what the serialization point preventing
>> full-duplex speed could be?
>>
> 
> Did you do any lock profiling, and does the session->lock look like the
> problem? It is taken in both the receive and xmit paths and also the
> queuecommand path.

Just did it. /proc/lock_stat says that there is no significant contention for
session->lock.

On the other hand, session->lock is a spinlock, so if it were the
serialization point, we would see high CPU consumption on the initiator. But
we have plenty of CPU time there.

So there must be another serialization point.
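
The single-connection check mentioned in the quoted message can be sketched
roughly as follows. This is only an illustration under assumptions: it is
written in Python, it uses a loopback TCP connection rather than the GbE link,
and the names tcp_pair, pump, drain and full_duplex_transfer are made up for
the example. It pushes data in both directions over one connection at the same
time and reports the byte counts and the elapsed time.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def tcp_pair():
    """Return two connected TCP sockets over loopback (hypothetical helper)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    cli = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()
    srv.close()
    return cli, conn


def pump(sock, total):
    """Send exactly `total` bytes of zeros on `sock`."""
    buf = memoryview(bytes(CHUNK))
    left = total
    while left > 0:
        n = min(CHUNK, left)
        sock.sendall(buf[:n])
        left -= n


def drain(sock, total, result, key):
    """Receive up to `total` bytes from `sock`; store the count in `result`."""
    got = 0
    while got < total:
        data = sock.recv(min(CHUNK, total - got))
        if not data:  # peer closed early
            break
        got += len(data)
    result[key] = got


def full_duplex_transfer(total=8 * 1024 * 1024):
    """Move `total` bytes in each direction at once over one connection."""
    a, b = tcp_pair()
    result = {}
    threads = [
        threading.Thread(target=pump, args=(a, total)),
        threading.Thread(target=drain, args=(b, total, result, "a_to_b")),
        threading.Thread(target=pump, args=(b, total)),
        threading.Thread(target=drain, args=(a, total, result, "b_to_a")),
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    result["elapsed"] = time.monotonic() - start
    a.close()
    b.close()
    return result


if __name__ == "__main__":
    r = full_duplex_transfer()
    mb = (r["a_to_b"] + r["b_to_a"]) / 1e6
    print("aggregate: %.1f MB in %.3f s" % (mb, r["elapsed"]))
```

Comparing the aggregate rate against a run with only one direction active
gives a rough idea of whether the connection itself scales in both directions;
loopback numbers say nothing about the real link or about open-iscsi.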

Thanks,
Vlad


