Re: Full duplex

2011-09-10 Thread Vladislav Bolkhovitin
Vladislav Bolkhovitin, on 09/08/2011 09:55 PM wrote:
 Mike Christie, on 09/02/2011 12:15 PM wrote:
 On 09/01/2011 10:04 PM, Vladislav Bolkhovitin wrote:
 Hi,

 I've done some tests, and it looks like open-iscsi doesn't support full duplex speed on bidirectional data transfers from a single drive.

 My test is simple: 2 dd's doing big transfers in parallel over a 1 GbE link from a ramdisk or nullio iSCSI device. One dd is reading and the other one is writing. I'm watching throughput using vmstat. When either dd works alone, I get full single-direction link utilization (~120 MB/s) in either direction, but when both transfers work in parallel, the throughput of each immediately drops by half, to 55-60 MB/s (the sum stays the same 120 MB/s).

 To be sure, I tested the bidirectional capability of a single TCP connection, and it does provide a nearly 2x throughput increase (~200 MB/s).

 Interestingly, adding the opposite-direction transfer from the same device imported from another iSCSI target provides the expected full duplex 2x aggregate throughput increase.

 I tried several iSCSI targets, and I'm pretty confident that iSCSI-SCST is capable of providing full duplex transfers, but from a look at the open-iscsi code I can't see the serialization point in it. It looks like open-iscsi receives and sends data in different threads (the requesting process and the per-connection iscsi_q_X workqueue, respectively), so it should be capable of full duplex.

 Yeah, we send from the iscsi_q workqueue and receive from the network softirq if the net driver supports NAPI.

 Does anyone have an idea what could be the serialization point preventing full duplex speed?

 Did you do any lock profiling, and does the session lock look like the problem? It is taken in both the receive and xmit paths and also the queuecommand path.

 Just did it. /proc/lock_stat says that there is no significant contention for the session lock.

 On the other hand, the session lock is a spinlock, so if it were the serialization point, we would see heavy CPU consumption on the initiator. But we have plenty of spare CPU time there.

 So there must be some other serialization point.

Update: using sg_dd with blk_sgio=1 (SG_IO) instead of dd, I was able to achieve a bidirectional speed of 92 MB/s in each direction.
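
For reference, the test looked roughly like the sketch below; the device name and transfer sizes are placeholders for my setup:

  # read stream; blk_sgio=1 makes sg_dd issue SG_IO against the block device
  sg_dd if=/dev/sdb of=/dev/null bs=512 bpt=1024 count=20971520 blk_sgio=1 &
  # write stream, running in parallel
  sg_dd if=/dev/zero of=/dev/sdb bs=512 bpt=1024 count=20971520 blk_sgio=1 &
  wait

with vmstat 1 running in another shell to watch the per-direction throughput.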

Thus the iSCSI stack works well, as expected, and the serialization point must be somewhere higher in the block stack. Both buffered and direct dd demonstrate the same serialized behavior described above.
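
The dd variants were along these lines (again, /dev/sdb stands in for the imported iSCSI disk):

  # buffered I/O through the page cache
  dd if=/dev/sdb of=/dev/null bs=1M count=10000 &
  dd if=/dev/zero of=/dev/sdb bs=1M count=10000 &
  wait
  # direct I/O, bypassing the page cache
  dd if=/dev/sdb of=/dev/null bs=1M count=10000 iflag=direct &
  dd if=/dev/zero of=/dev/sdb bs=1M count=10000 oflag=direct &
  wait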

Vlad

-- 
You received this message because you are subscribed to the Google Groups open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.



Re: Full duplex

2011-09-10 Thread Vladislav Bolkhovitin
Vladislav Bolkhovitin, on 09/10/2011 05:44 PM wrote:
 Vladislav Bolkhovitin, on 09/08/2011 09:55 PM wrote:
 Mike Christie, on 09/02/2011 12:15 PM wrote:
 On 09/01/2011 10:04 PM, Vladislav Bolkhovitin wrote:
 Hi,

 I've done some tests, and it looks like open-iscsi doesn't support full duplex speed on bidirectional data transfers from a single drive.

 My test is simple: 2 dd's doing big transfers in parallel over a 1 GbE link from a ramdisk or nullio iSCSI device. One dd is reading and the other one is writing. I'm watching throughput using vmstat. When either dd works alone, I get full single-direction link utilization (~120 MB/s) in either direction, but when both transfers work in parallel, the throughput of each immediately drops by half, to 55-60 MB/s (the sum stays the same 120 MB/s).

 To be sure, I tested the bidirectional capability of a single TCP connection, and it does provide a nearly 2x throughput increase (~200 MB/s).

 Interestingly, adding the opposite-direction transfer from the same device imported from another iSCSI target provides the expected full duplex 2x aggregate throughput increase.

 I tried several iSCSI targets, and I'm pretty confident that iSCSI-SCST is capable of providing full duplex transfers, but from a look at the open-iscsi code I can't see the serialization point in it. It looks like open-iscsi receives and sends data in different threads (the requesting process and the per-connection iscsi_q_X workqueue, respectively), so it should be capable of full duplex.

 Yeah, we send from the iscsi_q workqueue and receive from the network softirq if the net driver supports NAPI.

 Does anyone have an idea what could be the serialization point preventing full duplex speed?

 Did you do any lock profiling, and does the session lock look like the problem? It is taken in both the receive and xmit paths and also the queuecommand path.

 Just did it. /proc/lock_stat says that there is no significant contention for the session lock.

 On the other hand, the session lock is a spinlock, so if it were the serialization point, we would see heavy CPU consumption on the initiator. But we have plenty of spare CPU time there.

 So there must be some other serialization point.

 Update: using sg_dd with blk_sgio=1 (SG_IO) instead of dd, I was able to achieve a bidirectional speed of 92 MB/s in each direction.

 Thus the iSCSI stack works well, as expected, and the serialization point must be somewhere higher in the block stack. Both buffered and direct dd demonstrate the same serialized behavior described above.

...even using the corresponding iSCSI device formatted with ext4, with the two dd's working over 2 _separate_ files.
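
That test was roughly as follows (mount point and file names are illustrative):

  mount /dev/sdb /mnt/iscsi
  # pre-fill the file that will be read back, then drop caches so the
  # reads actually go over the wire
  dd if=/dev/zero of=/mnt/iscsi/readfile bs=1M count=5000
  echo 3 > /proc/sys/vm/drop_caches
  # two dd's over two separate files, one reading, one writing
  dd if=/mnt/iscsi/readfile of=/dev/null bs=1M &
  dd if=/dev/zero of=/mnt/iscsi/writefile bs=1M count=5000 &
  wait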

In other words, it appears that for user-space applications not smart enough to use the sg interface, there is no way to use the full duplex capability of the link. No tricks can give them the double throughput.

I tried with Fibre Channel and see the same behavior, with the only difference that if the same device from the same target is imported as 2 LUNs (i.e. as multipath), both of those LUNs can work bidirectionally. With iSCSI you need to import the device from 2 separate iSCSI targets to achieve that.
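
For completeness, importing from 2 separate targets is just the usual pair of logins; the portal address and IQNs below are placeholders for my test setup:

  iscsiadm -m discovery -t sendtargets -p 192.168.1.10:3260
  # two target instances exporting the same backing device
  iscsiadm -m node -T iqn.2011-09.test:tgt1 -p 192.168.1.10:3260 --login
  iscsiadm -m node -T iqn.2011-09.test:tgt2 -p 192.168.1.10:3260 --login
  # each login creates its own session and its own /dev/sdX; run one dd
  # per device to get the bidirectional aggregate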

Vlad




RE: Full duplex

2011-09-06 Thread Ron Gilad (Sanrad)
Hi Mike
Sorry to bother you, but do you know how to unsubscribe from this group?
I have followed the instructions and it doesn't work.
Thanks,
Ron

-----Original Message-----
From: open-iscsi@googlegroups.com [mailto:open-iscsi@googlegroups.com] On Behalf Of Mike Christie
Sent: Friday, September 02, 2011 7:15 PM
To: open-iscsi@googlegroups.com
Cc: Vladislav Bolkhovitin
Subject: Re: Full duplex

On 09/01/2011 10:04 PM, Vladislav Bolkhovitin wrote:
 Hi,

 I've done some tests, and it looks like open-iscsi doesn't support full duplex speed on bidirectional data transfers from a single drive.

 My test is simple: 2 dd's doing big transfers in parallel over a 1 GbE link from a ramdisk or nullio iSCSI device. One dd is reading and the other one is writing. I'm watching throughput using vmstat. When either dd works alone, I get full single-direction link utilization (~120 MB/s) in either direction, but when both transfers work in parallel, the throughput of each immediately drops by half, to 55-60 MB/s (the sum stays the same 120 MB/s).

 To be sure, I tested the bidirectional capability of a single TCP connection, and it does provide a nearly 2x throughput increase (~200 MB/s).

 Interestingly, adding the opposite-direction transfer from the same device imported from another iSCSI target provides the expected full duplex 2x aggregate throughput increase.

 I tried several iSCSI targets, and I'm pretty confident that iSCSI-SCST is capable of providing full duplex transfers, but from a look at the open-iscsi code I can't see the serialization point in it. It looks like open-iscsi receives and sends data in different threads (the requesting process and the per-connection iscsi_q_X workqueue, respectively), so it should be capable of full duplex.

Yeah, we send from the iscsi_q workqueue and receive from the network softirq if the net driver supports NAPI.

 Does anyone have an idea what could be the serialization point preventing full duplex speed?

Did you do any lock profiling, and does the session lock look like the problem? It is taken in both the receive and xmit paths and also the queuecommand path.




Re: Full duplex

2011-09-02 Thread Mike Christie
On 09/01/2011 10:04 PM, Vladislav Bolkhovitin wrote:
 Hi,

 I've done some tests, and it looks like open-iscsi doesn't support full duplex speed on bidirectional data transfers from a single drive.

 My test is simple: 2 dd's doing big transfers in parallel over a 1 GbE link from a ramdisk or nullio iSCSI device. One dd is reading and the other one is writing. I'm watching throughput using vmstat. When either dd works alone, I get full single-direction link utilization (~120 MB/s) in either direction, but when both transfers work in parallel, the throughput of each immediately drops by half, to 55-60 MB/s (the sum stays the same 120 MB/s).

 To be sure, I tested the bidirectional capability of a single TCP connection, and it does provide a nearly 2x throughput increase (~200 MB/s).

 Interestingly, adding the opposite-direction transfer from the same device imported from another iSCSI target provides the expected full duplex 2x aggregate throughput increase.

 I tried several iSCSI targets, and I'm pretty confident that iSCSI-SCST is capable of providing full duplex transfers, but from a look at the open-iscsi code I can't see the serialization point in it. It looks like open-iscsi receives and sends data in different threads (the requesting process and the per-connection iscsi_q_X workqueue, respectively), so it should be capable of full duplex.

Yeah, we send from the iscsi_q workqueue and receive from the network softirq if the net driver supports NAPI.

 Does anyone have an idea what could be the serialization point preventing full duplex speed?

Did you do any lock profiling, and does the session lock look like the problem? It is taken in both the receive and xmit paths and also the queuecommand path.
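
In case it helps, a minimal lock_stat session looks like this (needs a kernel built with CONFIG_LOCK_STAT=y):

  echo 1 > /proc/sys/kernel/lock_stat    # enable collection
  echo 0 > /proc/lock_stat               # reset the counters
  # ...run the bidirectional transfer test...
  less /proc/lock_stat                   # per-lock contention counters;
                                         # look for the iscsi session lock
  echo 0 > /proc/sys/kernel/lock_stat    # disable when done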
