Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-17 Thread Thomas Glanzmann
Hello Eric,

> I'll do it tomorrow: today is President's Day in the US, and I am
> spending the day with my family.

Thank you. Enjoy your day.

Cheers,
Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-17 Thread Eric Dumazet
On Mon, 2014-02-17 at 16:32 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> > Unfortunately you did not have good results with the MSG_MORE applied
> > to the page fragments.
> 
> I agree. We should submit only the patch from this message:
> 
> Message-ID: 
> <1391886759.10160.114.ca...@edumazet-glaptop2.roam.corp.google.com>
> http://mid.gmane.org/1391886759.10160.114.ca...@edumazet-glaptop2.roam.corp.google.com
> 
> > I think I'll submit the part only dealing with the metadata.
> 
> May I submit the patch or do you want to do it yourself?

I'll do it tomorrow: today is President's Day in the US, and I am
spending the day with my family.

Thanks!




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-17 Thread Thomas Glanzmann
Hello Eric,

> Unfortunately you did not have good results with the MSG_MORE applied
> to the page fragments.

I agree. We should submit only the patch from this message:

Message-ID: <1391886759.10160.114.ca...@edumazet-glaptop2.roam.corp.google.com>
http://mid.gmane.org/1391886759.10160.114.ca...@edumazet-glaptop2.roam.corp.google.com

> I think I'll submit the part only dealing with the metadata.

May I submit the patch or do you want to do it yourself?

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-17 Thread Eric Dumazet
On Mon, 2014-02-17 at 15:08 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> may I submit your latest patch for upstream? Or do you plan on doing that
> yourself?

Unfortunately you did not have good results with the MSG_MORE applied to
the page fragments.

I think I'll submit the part only dealing with the metadata.

Then later we might take care of the pages themselves.




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-17 Thread Thomas Glanzmann
Hello Eric,
may I submit your latest patch for upstream? Or do you plan on doing that
yourself?

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Yes, this is much better: 2 frames per request/response, instead of 4.

Perfect. I sent the patch to the iSCSI target list in your name, since
you did the work, and added my own Signed-off-by. I hope that is how it
is handled. Or should I have put my name in the From line and mentioned
in the patch description that you did the heavy lifting?

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 22:36 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> > I was simply thinking about something like :
> > (might need further changes, but I guess this should solve your case)
> 
> thank you for your patch. It did not apply on top of Linux tip, so I put
> in the changes manually and fixed up another call to tx_data that you
> forgot in your initial patch to make it apply.
> 
> I gave it another run, can you confirm that it now behaves better?
> 
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more.pcap.bz2
> 
> And look at that round-trip graph: it is perfect. Also, the filesystem is
> now created in 3 seconds instead of 4.

Yes, this is much better: 2 frames per request/response, instead of 4.

13:32:04.665367 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 384:432, ack 2529, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.665483 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 2529:3089, ack 432, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.665642 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 432:480, ack 3089, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.665756 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 3089:3649, ack 480, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.665933 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 480:528, ack 3649, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.666046 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 3649:4209, ack 528, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.666214 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 528:576, ack 4209, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.666333 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 4209:4769, ack 576, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.78 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 576:624, ack 4769, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.666790 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 4769:5329, ack 624, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.666983 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 624:672, ack 5329, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.667097 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 5329:5889, ack 672, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.667280 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 672:720, ack 5889, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.667324 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 5889:6449, ack 720, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.667500 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 720:768, ack 6449, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.667540 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 6449:7009, ack 768, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> I was simply thinking about something like :
> (might need further changes, but I guess this should solve your case)

thank you for your patch. It did not apply on top of Linux tip, so I put
in the changes manually and fixed up another call to tx_data that you
forgot in your initial patch to make it apply.

I gave it another run, can you confirm that it now behaves better?

https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more.pcap.bz2

And look at that round-trip graph: it is perfect. Also, the filesystem is
now created in 3 seconds instead of 4.

https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-22:34:57.png

Nab, would you consider this patch for upstream? Would you take it if I
clean it up?

Cheers,
Thomas

PS: I'm asleep for the next 8 hours.

diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index e655b04..0eb9681 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -1168,7 +1168,7 @@ send_data:
iov_count = cmd->iov_misc_count;
}
 
-   tx_sent = tx_data(conn, &iov[0], iov_count, tx_size);
+   tx_sent = tx_data(conn, &iov[0], iov_count, tx_size, 0);
if (tx_size != tx_sent) {
if (tx_sent == -EAGAIN) {
pr_err("tx_data() returned -EAGAIN\n");
@@ -1199,7 +1199,8 @@ send_hdr:
iov.iov_base = cmd->pdu;
iov.iov_len = tx_hdr_size;
 
-   tx_sent = tx_data(conn, &iov, 1, tx_hdr_size);
+   data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
+   tx_sent = tx_data(conn, &iov, 1, tx_hdr_size, data_len ? MSG_MORE : 0);
if (tx_hdr_size != tx_sent) {
if (tx_sent == -EAGAIN) {
pr_err("tx_data() returned -EAGAIN\n");
@@ -1208,7 +1209,6 @@ send_hdr:
return -1;
}
 
-   data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
/*
 * Set iov_off used by padding and data digest tx_data() calls below
 * in order to determine proper offset into cmd->iov_data[]
@@ -1252,7 +1252,8 @@ send_padding:
if (cmd->padding) {
struct kvec *iov_p = &cmd->iov_data[iov_off++];
 
-   tx_sent = tx_data(conn, iov_p, 1, cmd->padding);
+   tx_sent = tx_data(conn, iov_p, 1, cmd->padding,
+ conn->conn_ops->DataDigest ? MSG_MORE : 0);
if (cmd->padding != tx_sent) {
if (tx_sent == -EAGAIN) {
pr_err("tx_data() returned -EAGAIN\n");
@@ -1266,7 +1267,7 @@ send_datacrc:
if (conn->conn_ops->DataDigest) {
struct kvec *iov_d = &cmd->iov_data[iov_off];
 
-   tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN);
+   tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN, 0);
if (ISCSI_CRC_LEN != tx_sent) {
if (tx_sent == -EAGAIN) {
pr_err("tx_data() returned -EAGAIN\n");
@@ -1352,11 +1353,13 @@ static int iscsit_do_rx_data(
 
 static int iscsit_do_tx_data(
struct iscsi_conn *conn,
-   struct iscsi_data_count *count)
+   struct iscsi_data_count *count,
+   int flags)
 {
int data = count->data_length, total_tx = 0, tx_loop = 0, iov_len;
struct kvec *iov_p;
struct msghdr msg;
+struct msghdr msg = { .msg_flags = flags };
 
if (!conn || !conn->sock || !conn->conn_ops)
return -1;
@@ -1366,8 +1369,6 @@ static int iscsit_do_tx_data(
return -1;
}
 
-   memset(&msg, 0, sizeof(struct msghdr));
-
iov_p = count->iov;
iov_len = count->iov_count;
 
@@ -1411,7 +1412,8 @@ int tx_data(
struct iscsi_conn *conn,
struct kvec *iov,
int iov_count,
-   int data)
+   int data,
+   int flags)
 {
struct iscsi_data_count c;
 
@@ -1424,7 +1426,7 @@ int tx_data(
c.data_length = data;
c.type = ISCSI_TX_DATA;
 
-   return iscsit_do_tx_data(conn, &c);
+   return iscsit_do_tx_data(conn, &c, flags);
 }


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 18:15 +0100, Thomas Glanzmann wrote:

> The iSCSI target uses one function to send all outbound data. So in
> order to do it right, every function that sends data in multiple
> chunks needs to mark it correctly. Of course someone could also do some
> wild guessing and say that everything below 512 bytes gets
> pushed out. I wonder what Nab has to say about this?

I was simply thinking about something like :
(might need further changes, but I guess this should solve your case)

diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 0819e688a398..44f0d62a88d6 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -1165,7 +1165,7 @@ send_data:
iov_count = cmd->iov_misc_count;
}
 
-   tx_sent = tx_data(conn, &iov[0], iov_count, tx_size);
+   tx_sent = tx_data(conn, &iov[0], iov_count, tx_size, 0);
if (tx_size != tx_sent) {
if (tx_sent == -EAGAIN) {
pr_err("tx_data() returned -EAGAIN\n");
@@ -1196,7 +1196,8 @@ send_hdr:
iov.iov_base = cmd->pdu;
iov.iov_len = tx_hdr_size;
 
-   tx_sent = tx_data(conn, &iov, 1, tx_hdr_size);
+   data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
+   tx_sent = tx_data(conn, &iov, 1, tx_hdr_size, data_len ? MSG_MORE : 0);
if (tx_hdr_size != tx_sent) {
if (tx_sent == -EAGAIN) {
pr_err("tx_data() returned -EAGAIN\n");
@@ -1205,7 +1206,6 @@ send_hdr:
return -1;
}
 
-   data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
/*
 * Set iov_off used by padding and data digest tx_data() calls below
 * in order to determine proper offset into cmd->iov_data[]
@@ -1249,7 +1249,8 @@ send_padding:
if (cmd->padding) {
struct kvec *iov_p = &cmd->iov_data[iov_off++];
 
-   tx_sent = tx_data(conn, iov_p, 1, cmd->padding);
+   tx_sent = tx_data(conn, iov_p, 1, cmd->padding,
+ conn->conn_ops->DataDigest ? MSG_MORE : 0);
if (cmd->padding != tx_sent) {
if (tx_sent == -EAGAIN) {
pr_err("tx_data() returned -EAGAIN\n");
@@ -1263,7 +1264,7 @@ send_datacrc:
if (conn->conn_ops->DataDigest) {
struct kvec *iov_d = &cmd->iov_data[iov_off];
 
-   tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN);
+   tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN, 0);
if (ISCSI_CRC_LEN != tx_sent) {
if (tx_sent == -EAGAIN) {
pr_err("tx_data() returned -EAGAIN\n");
@@ -1349,11 +1350,12 @@ static int iscsit_do_rx_data(
 
 static int iscsit_do_tx_data(
struct iscsi_conn *conn,
-   struct iscsi_data_count *count)
+   struct iscsi_data_count *count,
+   int flags)
 {
int data = count->data_length, total_tx = 0, tx_loop = 0, iov_len;
struct kvec *iov_p;
-   struct msghdr msg;
+   struct msghdr msg = { .msg_flags = flags };
 
if (!conn || !conn->sock || !conn->conn_ops)
return -1;
@@ -1363,8 +1365,6 @@ static int iscsit_do_tx_data(
return -1;
}
 
-   memset(&msg, 0, sizeof(struct msghdr));
-
iov_p = count->iov;
iov_len = count->iov_count;
 
@@ -1408,7 +1408,8 @@ int tx_data(
struct iscsi_conn *conn,
struct kvec *iov,
int iov_count,
-   int data)
+   int data,
+   int flags)
 {
struct iscsi_data_count c;
 
@@ -1421,7 +1422,7 @@ int tx_data(
c.data_length = data;
c.type = ISCSI_TX_DATA;
 
-   return iscsit_do_tx_data(conn, &c);
+   return iscsit_do_tx_data(conn, &c, flags);
 }
 
 void iscsit_collect_login_stats(
diff --git a/drivers/target/iscsi/iscsi_target_util.h b/drivers/target/iscsi/iscsi_target_util.h
index e4fc34a02f57..1b4f06801adc 100644
--- a/drivers/target/iscsi/iscsi_target_util.h
+++ b/drivers/target/iscsi/iscsi_target_util.h
@@ -54,7 +54,7 @@ extern int iscsit_print_dev_to_proc(char *, char **, off_t, int);
 extern int iscsit_print_sessions_to_proc(char *, char **, off_t, int);
 extern int iscsit_print_tpg_to_proc(char *, char **, off_t, int);
 extern int rx_data(struct iscsi_conn *, struct kvec *, int, int);
-extern int tx_data(struct iscsi_conn *, struct kvec *, int, int);
+extern int tx_data(struct iscsi_conn *, struct kvec *, int, int, int);
 extern void iscsit_collect_login_stats(struct iscsi_conn *, u8, u8);
 extern struct iscsi_tiqn *iscsit_snmp_get_tiqn(struct iscsi_conn *);
 

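The flag choices made in the patch above can be restated as a small standalone function. This is only a user-space sketch for illustration: the enum and the name pdu_part_flags() are hypothetical and do not exist in the kernel tree. It mirrors the three tx_data() call sites that send one PDU piece by piece: the header gets MSG_MORE when a payload follows, the padding gets MSG_MORE when a data digest follows, and the digest, being last, gets no flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>	/* MSG_MORE */

/* Hypothetical names for the pieces of a PDU sent by separate calls. */
enum pdu_part {
	PDU_HEADER,
	PDU_PADDING,
	PDU_DATA_DIGEST,
};

/*
 * Return the send flags for one piece of a PDU, following the logic
 * of the patch: set MSG_MORE only when another piece is known to
 * follow immediately.
 */
int pdu_part_flags(enum pdu_part part, size_t data_len, bool data_digest)
{
	switch (part) {
	case PDU_HEADER:
		/* A payload follows the header when data_len != 0. */
		return data_len ? MSG_MORE : 0;
	case PDU_PADDING:
		/* The CRC follows the padding when DataDigest is on. */
		return data_digest ? MSG_MORE : 0;
	case PDU_DATA_DIGEST:
	default:
		/* Last piece: let the stack push the segment. */
		return 0;
	}
}
```

The point of keeping the decision this narrow is that MSG_MORE is only a hint that more data is coming; setting it on the final piece would leave the stack waiting for data that never arrives.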



Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Yep, but the problem (at least on your pcap), is about sending the
> 48-byte header in a TCP segment of its own, then the 512-byte payload in
> a separate segment.

I agree.

> I suspect the sendpage() is only used for the payload. No need for
> MSG_MORE here.

I see.

> The MSG_MORE would need to be set on the first part (the 48-byte header),
> so that the TCP stack defers the push of the segment until the 512-byte
> payload is added.

The iSCSI target uses one function to send all outbound data. So in
order to do it right, every function that sends data in multiple
chunks needs to mark it correctly. Of course someone could also do some
wild guessing and say that everything below 512 bytes gets
pushed out. I wonder what Nab has to say about this?

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 17:57 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> > Note : We did some patches in the MSG_MORE logic for sendpage(), but
> > in your case I do not think it's related
> > (git grep -n MSG_SENDPAGE_NOTLAST ) if you are curious
> 
> thank you for the pointer. The iSCSI target code actually uses sendpage
> whenever it can.

Yep, but the problem (at least on your pcap), is about sending the
48-byte header in a TCP segment of its own, then the 512-byte payload in a
separate segment.

I suspect the sendpage() is only used for the payload. No need for
MSG_MORE here.

The MSG_MORE would need to be set on the first part (the 48-byte header),
so that the TCP stack defers the push of the segment until the 512-byte
payload is added.
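
For readers who want to try the idea outside the kernel, here is a minimal user-space sketch under stated assumptions. The helpers send_all() and send_pdu() are hypothetical names, not part of the iSCSI target code. In user space the flag is passed as the flags argument of send(); the kernel patch discussed in this thread sets msg_flags in struct msghdr instead.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send all of buf, retrying on short writes and EINTR. */
static int send_all(int fd, const void *buf, size_t len, int flags)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = send(fd, p, len, flags);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: send a header followed by an optional payload.
 * MSG_MORE on the header hints that more data follows, so the stack
 * may hold the small header until the payload is queued. The payload
 * is sent without the flag so the combined data can be pushed.
 * Returns 0 on success, -1 on error.
 */
int send_pdu(int fd, const void *hdr, size_t hdr_len,
	     const void *data, size_t data_len)
{
	if (send_all(fd, hdr, hdr_len, data_len ? MSG_MORE : 0) < 0)
		return -1;
	if (data_len && send_all(fd, data, data_len, 0) < 0)
		return -1;
	return 0;
}
```

Whether the two pieces really leave in one TCP segment depends on the stack and the socket state, so a packet capture is still the way to confirm the effect.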






Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Note : We did some patches in the MSG_MORE logic for sendpage(), but
> in your case I do not think it's related
> (git grep -n MSG_SENDPAGE_NOTLAST ) if you are curious

thank you for the pointer. The iSCSI target code actually uses sendpage
whenever it can.

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 16:00 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> > Idea would be to set this flag when calling sendmsg() of the 48 bytes
> > of the header, and not set it on the sendmsg() of the 512 bytes of the
> > payload.
> 
> I see.
> 
> > iscsi_sw_tcp_xmit_segment() already adds MSG_MORE, but
> > it would be nice to add a new _initial_ flags parameter to
> > iscsi_sw_tcp_xmit_segment()
> 
> This is for the iSCSI initiator implementation. I'm interested in the iSCSI
> target code. I already found it and experimented a little bit, but
> I need to dig deeper if I want to prepare a patch.

Fantastic !

Let me know if you want some help.

Note: We did some patches in the MSG_MORE logic for sendpage(), but in
your case I do not think it's related

(git grep -n MSG_SENDPAGE_NOTLAST ) if you are curious




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Idea would be to set this flag when calling sendmsg() of the 48 bytes
> of the header, and not set it on the sendmsg() of the 512 bytes of the
> payload.

I see.

> iscsi_sw_tcp_xmit_segment() already adds MSG_MORE, but
> it would be nice to add a new _initial_ flags parameter to
> iscsi_sw_tcp_xmit_segment()

This is for the iSCSI initiator implementation. I'm interested in the iSCSI
target code. I already found it and experimented a little bit, but
I need to dig deeper if I want to prepare a patch.

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 15:19 +0100, Thomas Glanzmann wrote:
> Hello Eric,

> I get the idea. However, I'm a little bit confused: when I do a 'git grep
> MSG_MORE' I don't see many places in the Linux kernel that use it at
> all. So do you have an example for me where this flag needs to be
> applied?

The idea would be to set this flag when calling sendmsg() for the 48 bytes
of the header, and not set it on the sendmsg() for the 512 bytes of the
payload.

iscsi_sw_tcp_xmit_segment() already adds MSG_MORE, but
it would be nice to add a new _initial_ flags parameter to
iscsi_sw_tcp_xmit_segment()





Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> > BTW this problem demonstrates there is room for improvement in iSCSI,
> > using MSG_MORE to avoid sending two small segments in separate frames.

> With the fix, new pcap is more explicit about this suboptimal behavior :

> 05:34:16.280900 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
> 05:34:16.280949 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5328:5376, ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

> 05:34:16.280982 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54353:54401, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
> 05:34:16.281000 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54401:54913, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

> 05:34:16.281107 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
> 05:34:16.281157 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5376:5424, ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

> 05:34:16.281190 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54913:54961, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
> 05:34:16.281208 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54961:55473, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

> 05:34:16.281337 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
> 05:34:16.281390 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5424:5472, ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

> 05:34:16.281423 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55473:55521, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
> 05:34:16.281440 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55521:56033, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

I get the idea. However, I'm a little bit confused: when I do a 'git grep
MSG_MORE' I don't see many places in the Linux kernel that use it at
all. So do you have an example for me where this flag needs to be
applied?

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 05:50 -0800, Eric Dumazet wrote:
> On Sat, 2014-02-08 at 05:33 -0800, Eric Dumazet wrote:
> > On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> > > Here is the combined patch, could you test it ?
> > 
> > Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> > ("tcp: autocork should not hold first packet in write queue")
> > in your tree.
> > 
> > 
> 
> BTW this problem demonstrates there is room for improvement in iSCSI,
> using MSG_MORE to avoid sending two small segments in separate frames.
> 
> [1] 00:32:35.726568 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 145:193, ack 144, win 235, options [nop,nop,TS val 4294960733 ecr 385385], length 48
> [2] 00:32:35.838074 IP 10.101.0.13.27778 > 10.101.99.5.3260: Flags [.], ack 193, win 514, options [nop,nop,TS val 385396 ecr 4294960733], length 0
> [3] 00:32:35.838099 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 193:705, ack 144, win 235, options [nop,nop,TS val 4294960761 ecr 385396], length 512
> 
> [1] & [3] could be coalesced, and [2] would be avoided.
> 

With the fix, the new pcap is more explicit about this suboptimal behavior:

05:34:16.280900 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
05:34:16.280949 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5328:5376, ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

05:34:16.280982 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54353:54401, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
05:34:16.281000 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54401:54913, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

05:34:16.281107 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
05:34:16.281157 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5376:5424, ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

05:34:16.281190 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54913:54961, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
05:34:16.281208 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54961:55473, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

05:34:16.281337 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
05:34:16.281390 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5424:5472, ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

05:34:16.281423 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55473:55521, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
05:34:16.281440 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55521:56033, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512
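
The pattern in this capture (a 48-byte segment followed by a 512-byte
segment from the target) can be picked out mechanically. The following is
only a rough sketch: the sample lines are copied from the capture above with
the ack/win/options fields dropped for brevity, and the helper names `parse`
and `split_pdus` are made up for illustration.

```python
import re

# Sample lines from the capture above (ack/win/options fields omitted).
PCAP = """\
05:34:16.280982 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54353:54401, length 48
05:34:16.281000 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54401:54913, length 512
05:34:16.281190 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54913:54961, length 48
05:34:16.281208 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54961:55473, length 512
"""

LINE = re.compile(r"^(\d+):(\d+):(\d+)\.(\d{6}) IP (\S+) > (\S+): .*length (\d+)$")


def parse(text):
    """Return (timestamp in microseconds, source, payload length) per line."""
    packets = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            h, mi, s, us = (int(m.group(i)) for i in range(1, 5))
            stamp = (h * 3600 + mi * 60 + s) * 1_000_000 + us
            packets.append((stamp, m.group(5), int(m.group(7))))
    return packets


def split_pdus(packets, header_len=48):
    """Find a header-sized segment directly followed by a larger segment
    from the same sender; return (gap in microseconds, combined length)."""
    pairs = []
    for (t1, src1, len1), (t2, src2, len2) in zip(packets, packets[1:]):
        if src1 == src2 and len1 == header_len and len2 > header_len:
            pairs.append((t2 - t1, len1 + len2))
    return pairs


pairs = split_pdus(parse(PCAP))
```

For the four sample lines, each pair is 18 microseconds apart and would add
up to 560 bytes, which fits easily into a single frame.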





Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

[RESEND: dropped CC accidentally]

> 10.101.99.5 or 10.101.0.13?

10.101.99.5 (iSCSI Target)

tcpdump -i bond0.101 -s 0 -w /tmp/tcp_auto_corking_on_patched.pcap host esx-03.v101.campusvl.de

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 14:37 +0100, Thomas Glanzmann wrote:

> 
> It fixes my case but if you look at the round trip time it is not even
> close to what it used to be. So while this fixes my problem I'm still for
> disabling it by default.
> 
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2

This pcap was taken on which host ?

10.101.99.5 or  10.101.0.13 ?






Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> What is your NIC model and driver?

I have four Intel Corporation I350 Gigabit Network Connection (rev 01).

(node-62) [~/work/linux-2.6] lspci -v | pbot
http://pbot.rmdir.de/rgu6yHMBDVQpflMmbcJACg
(node-62) [~/work/linux-2.6] ip a s | pbot
http://pbot.rmdir.de/xJjRT8u-ekC6mrWgl09ZtQ
(node-62) [~/work/linux-2.6] dmesg | pbot
http://pbot.rmdir.de/MigrSPtxGmp0fI1CRgXsHw

I use 802.3ad link aggregation (layer 2 hash) with two network cards
connected to one switch.

I'm running:
Linux node-62 3.14.0-rc1+ #23 SMP Sat Feb 8 14:27:47 CET 2014 x86_64 GNU/Linux

Driver: igb

If you need remote access to the machine let me know.

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 14:37 +0100, Thomas Glanzmann wrote:
> Hello Eric,

> 
> It fixes my case but if you look at the round trip time it is not even
> close to what it used to be. So while this fixes my problem I'm still for
> disabling it by default.
> 
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Very nice.

Now we have to check your NIC and how TX completion is performed.

What is your NIC model and driver ?




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 05:33 -0800, Eric Dumazet wrote:
> On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> > Here is the combined patch, could you test it ?
> 
> Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> ("tcp: autocork should not hold first packet in write queue")
> in your tree.
> 
> 

BTW this problem demonstrates there is room for improvement in iSCSI,
using MSG_MORE to avoid sending two small segments in separate frames.

[1] 00:32:35.726568 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 145:193, ack 144, win 235, options [nop,nop,TS val 4294960733 ecr 385385], length 48
[2] 00:32:35.838074 IP 10.101.0.13.27778 > 10.101.99.5.3260: Flags [.], ack 193, win 514, options [nop,nop,TS val 385396 ecr 4294960733], length 0
[3] 00:32:35.838099 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 193:705, ack 144, win 235, options [nop,nop,TS val 4294960761 ecr 385396], length 512

[1] & [3] could be coalesced, and [2] would be avoided.
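
In the trace above, [3] leaves roughly 111 ms after [1]. As an illustration
of what MSG_MORE does, here is a minimal userspace sketch; it is not the
iSCSI target code, which runs in the kernel. The function name `send_pdu` is
made up, and 0x8000 is assumed as the Linux value of MSG_MORE for Python
builds that do not export the constant.

```python
import socket

# MSG_MORE is Linux-specific; fall back to its Linux value (0x8000) if the
# socket module of this Python build does not export it (assumption).
MSG_MORE = getattr(socket, "MSG_MORE", 0x8000)


def send_pdu(sock, header, payload):
    """Send header and payload so the stack is allowed to coalesce them."""
    if payload:
        # Tell TCP that more data follows, so it need not push the
        # small header as a segment of its own.
        sock.sendall(header, MSG_MORE)
        # Last piece of the PDU: no flag, the data may go out now.
        sock.sendall(payload)
    else:
        sock.sendall(header)
```

With a connected TCP socket, `send_pdu(sock, header, data)` would hand the
48-byte header and the 512-byte payload to the stack as one unit of work, so
a single 560-byte segment becomes possible.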




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> ("tcp: autocork should not hold first packet in write queue")
> in your tree.

confirmed:

(node-62) [~/work/linux-2.6] git show a181ceb501b31b4bf8812a5c84c716cc31d82c2d | head
commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
Author: Eric Dumazet 
Date:   Tue Dec 17 09:58:30 2013 -0800

tcp: autocork should not hold first packet in write queue

Willem noticed a TCP_RR regression caused by TCP autocorking
on a Mellanox test bed. MLX4_EN_TX_COAL_TIME is 16 us, which can be
right above RTT between hosts.

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> > tcp corking kills iSCSI performance

> Here is the combined patch, could you test it?

the patch did not apply, so I edited by hand. Here is the resulting
patch:

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 03d26b8..40d1958 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -698,7 +698,8 @@ static void tcp_tsq_handler(struct sock *sk)
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
 	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK))
-		tcp_write_xmit(sk, tcp_current_mss(sk), 0, 0, GFP_ATOMIC);
+		tcp_write_xmit(sk, tcp_current_mss(sk), tcp_sk(sk)->nonagle,
+			       0, GFP_ATOMIC);
 }
 /*
  * One tasklet per cpu tries to send more skbs.
@@ -1904,7 +1905,16 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			break;
+			/* It is possible TX completion already happened
+			 * before we set TSQ_THROTTLED, so we must
+			 * test again the condition.
+			 * We abuse smp_mb__after_clear_bit() because
+			 * there is no smp_mb__after_set_bit() yet
+			 */
+			smp_mb__after_clear_bit();
+			if (atomic_read(&sk->sk_wmem_alloc) > limit)
+				break;
+
 		}
 
 		limit = mss_now;

-- cut here --
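
The second hunk follows a "set the flag, then test the condition again"
pattern. A toy model of just that control flow, under heavy assumptions:
`must_stop_sending` and `read_wmem_alloc` are invented names, the flag is a
plain Python set, and the memory barrier from the patch has no equivalent
here.

```python
def must_stop_sending(read_wmem_alloc, limit, flags):
    """Return True if the sender should stop and wait for TX completion.

    read_wmem_alloc is a callable returning the bytes currently queued;
    it stands in for atomic_read() on sk_wmem_alloc (illustration only).
    """
    if read_wmem_alloc() > limit:
        flags.add("TSQ_THROTTLED")
        # A TX completion may have run between the first read and setting
        # the flag. It would not have seen the flag, so nothing would
        # restart the queue later. Read again before deciding to stop.
        if read_wmem_alloc() > limit:
            return True
    return False
```

If the second read already shows the queue below the limit, the function
returns False and the caller keeps sending instead of waiting for a
completion that has already happened.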

It fixes my case but if you look at the round trip time it is not even
close to what it used to be. So while this fixes my problem I'm still for
disabling it by default.

https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> Here is the combined patch, could you test it ?

Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
("tcp: autocork should not hold first packet in write queue")
in your tree.





Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 10:38 +0100, Thomas Glanzmann wrote:
> Hello Eric,
> 
> [RESEND: the times it took to create the VMFS were swapped between
> on/off; with autocorking on it took over 2 minutes, with it off less
> than 4 seconds]
> 
> [RESEND 2: The throughput graphs were switched as well ;-(]
> 
> > * Thomas Glanzmann  [2014-02-07 08:55]:
> > > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > > and 15 minutes on 3.14.0-rc2+.
> 
> * Nicholas A. Bellinger  [2014-02-07 20:30]:
> > Would it be possible to try a couple of different stable kernel
> > versions to help track this down?
> 
> I bisected[1] it and found the offending commit f54b311 tcp auto corking
> [2] 'if we have a small send and a previous packet is already in the
> qdisc or device queue, defer until TX completion or we get more data.'
> - Description by David S. Miller
> 
> I gathered a pcap with tcp_autocorking on and off.
> 
> On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png
> 
> Off: - took 4 seconds to create a 500 GB VMFS file system
> sysctl net.ipv4.tcp_autocorking=0
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png
> 
> The first graph can be generated by bunzipping the file, opening it in
> Wireshark, selecting Statistics > IO Graph and changing the unit to
> Bytes/Tick. The second graph can be generated by selecting Statistics >
> TCP Stream Graph > Round Trip Time.
> 
> You can also see that the round trip time increases by a factor of at
> least 25.
> 
> I once saw a similar problem with delayed ACK packets in the
> paravirtualized network driver in Xen. It caused the TCP window to fill
> up and slowed the throughput from 30 MB/s to less than 100 KB/s. The
> symptom was that logging in to a Windows desktop took more than 10
> minutes, while it used to take less than 30 seconds, because the user's
> profile was loaded slowly from a CIFS server. At that time the culprit
> was also delayed small packets: ACK packets in the CIFS case. So far I
> have only proven the regression for iSCSI, but I assume we will see many
> others if we leave TCP auto corking enabled.
> 
> I found the problem by doing the following:
> - I compiled kernel by executing the following commands:
> yes '' | make oldconfig
> time make -j 24
> / make modules_install
> / mkinitramfs -o /boot/initrd.img-bisect 
> 
> - I cleaned the iSCSI configuration after each test by issuing:
> /etc/init.d/target stop
> rm /iscsi?/* /etc/target/*
> 
> - I configured iSCSI after each reboot
> cat > lio-v101.conf << EOF
> set global auto_cd_after_create=false
> /backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true
> 
> /iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
> /iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1
> 
> saveconfig
> yes
> EOF
> targetcli < lio-v101.conf
> 
> I then connected a freshly booted ESXi 5.5.0 1331820 (deployed via
> autodeploy) to the iSCSI target, configured the portal, rescanned,
> created a 500 GB VMFS 5 filesystem and noted how long it took: longer
> than 2 minutes was bad, below 10 seconds was good.
> git bisect good/bad
> 
> My network config is:
> 
> auto bond0
> iface bond0 inet static
>address 10.100.4.62
>netmask 255.255.0.0
>gateway 10.100.0.1
>slaves eth0 eth1
>bond-mode 802.3ad
>bond-miimon 100
> 
> auto bond0.101
> iface bond0.101 inet static
>address 10.101.99.4
>netmask 255.255.0.0
> 
> auto bond1
> iface bond1 inet static
>address 10.100.5.62
>netmask 255.255.0.0
>slaves eth2 eth3
>bond-mode 802.3ad
> 

REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

[RESEND: the times it took to create the VMFS were swapped between
on/off; with autocorking on it took over 2 minutes, with it off less
than 4 seconds]

[RESEND 2: The throughput graphs were switched as well ;-(]

> * Thomas Glanzmann  [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger  [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 4 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png
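
As a quick sanity check on the numbers, the slowdown can be computed directly
from the times reported in this message. This is only arithmetic on the
figures quoted above; the helper name `slowdown` is made up.

```python
def slowdown(slow_seconds, fast_seconds):
    """Ratio between the slow and the fast run."""
    return slow_seconds / fast_seconds


# 500 GB VMFS: 2 minutes 24 seconds with autocorking on, 4 seconds with it off.
factor_500gb = slowdown(2 * 60 + 24, 4)

# 4 TB VMFS: 15 minutes on 3.14.0-rc2+, 24 seconds on 3.12 (quoted above).
factor_4tb = slowdown(15 * 60, 24)
```

Both measurements land in the same range: 36x for the 500 GB case and 37.5x
for the 4 TB case.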

The first graph can be generated by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. It caused the TCP window to fill
up and slowed the throughput from 30 MB/s to less than 100 KB/s. The
symptom was that logging in to a Windows desktop took more than 10
minutes, while it used to take less than 30 seconds, because the user's
profile was loaded slowly from a CIFS server. At that time the culprit
was also delayed small packets: ACK packets in the CIFS case. So far I
have only proven the regression for iSCSI, but I assume we will see many
others if we leave TCP auto corking enabled.

I found the problem by doing the following:
- I compiled kernel by executing the following commands:
yes '' | make oldconfig
time make -j 24
/ make modules_install
/ mkinitramfs -o /boot/initrd.img-bisect 

- I cleaned the iSCSI configuration after each test by issuing:
/etc/init.d/target stop
rm /iscsi?/* /etc/target/*

- I configured iSCSI after each reboot
cat > lio-v101.conf 

REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

[RESEND: the times it took to create the VMFS were swapped between
on/off; with autocorking on it took over 2 minutes, with it off less
than 4 seconds]

> * Thomas Glanzmann  [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger  [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 4 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

The first graph can be generated by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. It caused the TCP window to fill
up and slowed the throughput from 30 MB/s to less than 100 KB/s. The
symptom was that logging in to a Windows desktop took more than 10
minutes, while it used to take less than 30 seconds, because the user's
profile was loaded slowly from a CIFS server. At that time the culprit
was also delayed small packets: ACK packets in the CIFS case. So far I
have only proven the regression for iSCSI, but I assume we will see many
others if we leave TCP auto corking enabled.

I found the problem by doing the following:
- I compiled kernel by executing the following commands:
yes '' | make oldconfig
time make -j 24
/ make modules_install
/ mkinitramfs -o /boot/initrd.img-bisect 

- I cleaned the iSCSI configuration after each test by issuing:
/etc/init.d/target stop
rm /iscsi?/* /etc/target/*

- I configured iSCSI after each reboot
cat > lio-v101.conf 

REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> * Thomas Glanzmann  [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger  [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 4 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

The first graph can be generated by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. It caused the TCP window to fill
up and slowed the throughput from 30 MB/s to less than 100 KB/s. The
symptom was that logging in to a Windows desktop took more than 10
minutes, while it used to take less than 30 seconds, because the user's
profile was loaded slowly from a CIFS server. At that time the culprit
was also delayed small packets: ACK packets in the CIFS case. So far I
have only proven the regression for iSCSI, but I assume we will see many
others if we leave TCP auto corking enabled.

I found the problem by doing the following:
- I compiled kernel by executing the following commands:
yes '' | make oldconfig
time make -j 24
/ make modules_install
/ mkinitramfs -o /boot/initrd.img-bisect 

- I cleaned the iSCSI configuration after each test by issuing:
/etc/init.d/target stop
rm /iscsi?/* /etc/target/*

- I configured iSCSI after each reboot
cat > lio-v101.conf 

REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> * Thomas Glanzmann tho...@glanzmann.de [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger n...@linux-iscsi.org [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 4 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

The first graph can be generated by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. It caused the TCP window to fill
up and slowed the throughput from 30 MB/s to less than 100 KB/s. The
symptom was that logging in to a Windows desktop took more than 10
minutes, while it used to take less than 30 seconds, because the user's
profile was loaded slowly from a CIFS server. At that time the culprit
was also delayed small packets: ACK packets in the CIFS case. So far I
have only proven the regression for iSCSI, but I assume we will see many
others if we leave TCP auto corking enabled.

I found the problem by doing the following:
- I compiled kernel by executing the following commands:
yes '' | make oldconfig
time make -j 24
/ make modules_install
/ mkinitramfs -o /boot/initrd.img-bisect <version>

- I cleaned the iSCSI configuration after each test by issuing:
/etc/init.d/target stop
rm /iscsi?/* /etc/target/*

- I configured iSCSI after each reboot
cat > lio-v101.conf << EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
targetcli < lio-v101.conf

I then connected a freshly booted ESXi 5.5.0 1331820 (deployed via
autodeploy) to the iSCSI target, configured the portal, rescanned,
created a 500 GB VMFS 5 filesystem and noted how long it took: longer
than 2 minutes was bad, below 10 seconds was good.
git bisect good/bad
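
The good/bad decision for each bisect step can be written down as a small
rule. A sketch using the thresholds from this message; the function name and
the "skip" answer for times between the two thresholds are assumptions, the
message itself only defines good and bad.

```python
def bisect_verdict(seconds, bad_above=120, good_below=10):
    """Map the VMFS creation time of one test kernel to a bisect verdict."""
    if seconds > bad_above:
        return "bad"    # longer than 2 minutes
    if seconds < good_below:
        return "good"   # below 10 seconds
    return "skip"       # in between: not covered by the rule above


verdicts = [bisect_verdict(t) for t in (144, 4, 60)]
```

The result would then be passed to git as `git bisect good`, `git bisect bad`
or `git bisect skip`.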

My network config is:

auto bond0
iface bond0 inet static
   address 10.100.4.62
   netmask 255.255.0.0
   gateway 10.100.0.1
   slaves eth0 eth1
   bond-mode 802.3ad
   bond-miimon 100

auto bond0.101
iface bond0.101 inet static
   address 10.101.99.4
   netmask 255.255.0.0

auto bond1
iface bond1 inet static
   address 10.100.5.62
   netmask 255.255.0.0
   slaves eth2 eth3
   bond-mode 802.3ad
   bond-miimon 100

auto bond1.101
iface bond1.101 inet static
   address 10.101.99.5
   netmask 255.255.0.0

I propose to disable tcp_autocorking by default because it obviously degrades
iSCSI performance and probably that of many other protocols. The commit also
mentions that applications can explicitly disable auto corking; we probably should do
that for the iSCSI target, but I don't know how. Anyone?
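
The system-wide switch used for the "Off" test above is the
net.ipv4.tcp_autocorking sysctl. A minimal sketch for checking its current
state from a script, assuming the usual /proc/sys mapping of that sysctl; the
function name is made up, and this does not answer the per-socket question.

```python
from pathlib import Path


def autocorking_enabled(path="/proc/sys/net/ipv4/tcp_autocorking"):
    """Return True/False for the sysctl value, or None if it is unavailable
    (for example on a kernel that does not have the sysctl)."""
    try:
        return Path(path).read_text().strip() != "0"
    except OSError:
        return None
```

Calling `autocorking_enabled()` before each measurement would make it harder
to mix up the on and off runs.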

[1] http://pbot.rmdir.de/a65q6MjgV36tZnn5jS-DUQ
[2] 

REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

[RESEND: the times it took to create the VMFS were swapped between
on/off; with autocorking on it took over 2 minutes, with it off less
than 4 seconds]

> * Thomas Glanzmann tho...@glanzmann.de [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger n...@linux-iscsi.org [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 4 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png

The first graph can be generated by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by a factor of at
least 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver in Xen. It caused the TCP window to fill
up and slowed the throughput from 30 MB/s to less than 100 KB/s. The
symptom was that logging in to a Windows desktop took more than 10
minutes, while it used to take less than 30 seconds, because the user's
profile was loaded slowly from a CIFS server. At that time the culprit
was also delayed small packets: ACK packets in the CIFS case. So far I
have only proven the regression for iSCSI, but I assume we will see many
others if we leave TCP auto corking enabled.

I found the problem by doing the following:
- I compiled kernel by executing the following commands:
yes '' | make oldconfig
time make -j 24
/ make modules_install
/ mkinitramfs -o /boot/initrd.img-bisect version

- I cleaned the iSCSI configuration after each test by issuing:
/etc/init.d/target stop
rm /iscsi?/* /etc/target/*

- I configured iSCSI after each reboot
cat > lio-v101.conf << EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
targetcli < lio-v101.conf
Then I connected a freshly booted ESXi 5.5.0 1331820 (deployed via
autodeploy) to the iSCSI target: I configured the portal, rescanned,
created a 500 GB VMFS 5 filesystem and noted the time. If it took longer
than 2 minutes the kernel was bad; if it took less than 10 seconds it
was good.
git bisect good/bad

My network config is:

auto bond0
iface bond0 inet static
   address 10.100.4.62
   netmask 255.255.0.0
   gateway 10.100.0.1
   slaves eth0 eth1
   bond-mode 802.3ad
   bond-miimon 100

auto bond0.101
iface bond0.101 inet static
   address 10.101.99.4
   netmask 255.255.0.0

auto bond1
iface bond1 inet static
   address 10.100.5.62
   netmask 255.255.0.0
   slaves eth2 eth3
   bond-mode 802.3ad
   bond-miimon 100

auto bond1.101
iface bond1.101 inet static
   address 10.101.99.5
   netmask 255.255.0.0

I propose to disable tcp_autocorking by default because it obviously degrades
iSCSI performance and probably many other protocols. Also the commit mentions
that applications can explicitly disable auto 

REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

[RESEND: the times it took to create the VMFS were switched between
on and off; with autocorking on it took over 2 minutes, with it off it
took less than 4 seconds]

[RESEND 2: The throughput graphs were switched as well ;-(]

> * Thomas Glanzmann <tho...@glanzmann.de> [2014-02-07 08:55]:
> > Creating a 4 TB VMFS filesystem over iSCSI takes 24 seconds on 3.12
> > and 15 minutes on 3.14.0-rc2+.

* Nicholas A. Bellinger <n...@linux-iscsi.org> [2014-02-07 20:30]:
> Would it be possible to try a couple of different stable kernel
> versions to help track this down?

I bisected[1] it and found the offending commit f54b311 tcp auto corking
[2] 'if we have a small send and a previous packet is already in the
qdisc or device queue, defer until TX completion or we get more data.'
- Description by David S. Miller

I gathered a pcap with tcp_autocorking on and off.

On: - took 2 minutes 24 seconds to create a 500 GB VMFS file system
https://thomas.glanzmann.de/tmp/tcp_auto_corking_on.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:46:34.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:52:28.png

Off: - took 4 seconds to create a 500 GB VMFS file system
sysctl net.ipv4.tcp_autocorking=0
https://thomas.glanzmann.de/tmp/tcp_auto_corking_off.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:45:43.png
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-09:53:17.png
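
Before timing a run it helps to confirm which mode the box is really in.
A minimal Python sketch, assuming the knob is visible at
/proc/sys/net/ipv4/tcp_autocorking (the procfs view of the sysctl above);
the function name and the path parameter are illustrative, not part of
any tool mentioned in this thread:

```python
from pathlib import Path

# Assumed procfs location of the net.ipv4.tcp_autocorking sysctl.
AUTOCORK_PATH = "/proc/sys/net/ipv4/tcp_autocorking"


def autocorking_enabled(path=AUTOCORK_PATH):
    """Return True or False for the sysctl value, or None if the file is
    absent (for example on a kernel that does not have the knob)."""
    p = Path(path)
    if not p.exists():
        return None
    # The file holds a single integer; "0" means autocorking is off.
    return p.read_text().strip() != "0"
```

Calling autocorking_enabled() with no argument reads the live setting;
passing another path makes it easy to try out against a scratch file.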

The first graph can be generated by bunzipping the file, opening it in
Wireshark, selecting Statistics > IO Graph, and changing the unit to
Bytes/Tick. The second graph can be generated by selecting Statistics >
TCP Stream Graph > Round Trip Time.

You can also see that the round trip time increases by at least a factor
of 25.

I once saw a similar problem with delayed ACK packets in the
paravirtualized network driver of Xen. It caused the TCP window to fill
up and slowed the throughput down from 30 MB/s to less than 100 KB/s.
The symptom was that logging in to a Windows desktop took more than 10
minutes, while it used to take less than 30 seconds, because the user's
profile was loaded slowly from a CIFS server. At that time the culprit
was also delayed small packets: ACK packets in the CIFS case. So far I
have only proven the regression for iSCSI, but I assume we will see many
others if we leave tcp auto corking enabled.

I found the problem by doing the following:
- I compiled kernel by executing the following commands:
yes '' | make oldconfig
time make -j 24
/ make modules_install
/ mkinitramfs -o /boot/initrd.img-bisect version

- I cleaned the iSCSI configuration after each test by issuing:
/etc/init.d/target stop
rm /iscsi?/* /etc/target/*

- I configured iSCSI after each reboot
cat > lio-v101.conf << EOF
set global auto_cd_after_create=false
/backstores/fileio create shared-01.v101.campusvl.de /iscsi1/shared-01.v101.campusvl.de size=500G buffered=true

/iscsi create iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.101.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.4
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/portals create 10.102.99.5
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/luns create /backstores/fileio/shared-01.v101.campusvl.de lun=10
/iscsi/iqn.2013-03.de.campusvl.v101.storage:shared-01.v101.campusvl.de/tpgt1/ set attribute authentication=0 demo_mode_write_protect=0 generate_node_acls=1 cache_dynamic_acls=1

saveconfig
yes
EOF
targetcli < lio-v101.conf
Then I connected a freshly booted ESXi 5.5.0 1331820 (deployed via
autodeploy) to the iSCSI target: I configured the portal, rescanned,
created a 500 GB VMFS 5 filesystem and noted the time. If it took longer
than 2 minutes the kernel was bad; if it took less than 10 seconds it
was good.
git bisect good/bad
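
The good/bad rule used for each bisect step can be written down as a
small Python sketch. The two thresholds come from the description above;
the function name and the "skip" answer for timings that fall between the
thresholds are assumptions added for illustration:

```python
GOOD_BELOW_S = 10    # "less than 10 seconds" counted as good
BAD_ABOVE_S = 120    # "longer than 2 minutes" counted as bad


def bisect_verdict(elapsed_s):
    """Map a measured VMFS creation time (in seconds) to a bisect verdict."""
    if elapsed_s < GOOD_BELOW_S:
        return "good"
    if elapsed_s > BAD_ABOVE_S:
        return "bad"
    # Neither threshold was crossed: rerun or skip rather than guess.
    return "skip"
```

With the timings reported in this mail, bisect_verdict(4) gives "good"
and bisect_verdict(144) gives "bad".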

My network config is:

auto bond0
iface bond0 inet static
   address 10.100.4.62
   netmask 255.255.0.0
   gateway 10.100.0.1
   slaves eth0 eth1
   bond-mode 802.3ad
   bond-miimon 100

auto bond0.101
iface bond0.101 inet static
   address 10.101.99.4
   netmask 255.255.0.0

auto bond1
iface bond1 inet static
   address 10.100.5.62
   netmask 255.255.0.0
   slaves eth2 eth3
   bond-mode 802.3ad
   bond-miimon 100

auto bond1.101
iface bond1.101 inet static
   address 10.101.99.5
   netmask 255.255.0.0

I propose to disable tcp_autocorking by default because it obviously degrades
iSCSI performance and probably many other protocols. Also the 

Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 10:38 +0100, Thomas Glanzmann wrote:

Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> Here is the combined patch, could you test it ?

Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
(tcp: autocork should not hold first packet in write queue)
in your tree.





Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> > tcp corking kills iSCSI performance

> Here is the combined patch, could you test it?

the patch did not apply, so I edited by hand. Here is the resulting
patch:

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 03d26b8..40d1958 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -698,7 +698,8 @@ static void tcp_tsq_handler(struct sock *sk)
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
 	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK))
-		tcp_write_xmit(sk, tcp_current_mss(sk), 0, 0, GFP_ATOMIC);
+		tcp_write_xmit(sk, tcp_current_mss(sk), tcp_sk(sk)->nonagle,
+			       0, GFP_ATOMIC);
 }
 /*
  * One tasklet per cpu tries to send more skbs.
@@ -1904,7 +1905,16 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			break;
+			/* It is possible TX completion already happened
+			 * before we set TSQ_THROTTLED, so we must
+			 * test again the condition.
+			 * We abuse smp_mb__after_clear_bit() because
+			 * there is no smp_mb__after_set_bit() yet
+			 */
+			smp_mb__after_clear_bit();
+			if (atomic_read(&sk->sk_wmem_alloc) > limit)
+				break;
+
 		}
 
 		limit = mss_now;

-- cut here --

It fixes my case, but if you look at the round trip time it is not even
close to what it used to be. So while this fixes my problem, I'm still
for disabling it by default.

https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> (tcp: autocork should not hold first packet in write queue)
> in your tree.

confirmed:

(node-62) [~/work/linux-2.6] git show a181ceb501b31b4bf8812a5c84c716cc31d82c2d | head
commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
Author: Eric Dumazet <eduma...@google.com>
Date:   Tue Dec 17 09:58:30 2013 -0800

    tcp: autocork should not hold first packet in write queue

    Willem noticed a TCP_RR regression caused by TCP autocorking
    on a Mellanox test bed. MLX4_EN_TX_COAL_TIME is 16 us, which can be
    right above RTT between hosts.
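
To keep the two autocork commits apart, here is a toy Python model of
the decision as it is described in this thread: defer a small send while
an earlier packet is still in the qdisc or device queue, but do not hold
the first packet in the write queue. This is not kernel code, and every
parameter name is made up for illustration:

```python
def should_autocork(send_len, full_size, earlier_data_in_flight,
                    is_first_in_write_queue, autocorking=True):
    """Toy model of the autocork decision; returns True when a send
    would be deferred until TX completion or until more data arrives."""
    if not autocorking:
        return False          # sysctl switched off
    if send_len >= full_size:
        return False          # not a small send, nothing to wait for
    if is_first_in_write_queue:
        return False          # a181ceb: never hold the first packet
    # Only wait when something earlier is still queued below TCP.
    return earlier_data_in_flight
```

For example, a 48 byte send behind queued data is deferred in this
model, while the same send at the head of the write queue is not.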

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 05:33 -0800, Eric Dumazet wrote:
> On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> > Here is the combined patch, could you test it ?
>
> Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> (tcp: autocork should not hold first packet in write queue)
> in your tree.

BTW this problem demonstrates there is room for improvement in iSCSI,
using MSG_MORE to avoid sending two small segments in separate frames.

[1] 00:32:35.726568 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 145:193, ack 144, win 235, options [nop,nop,TS val 4294960733 ecr 385385], length 48
[2] 00:32:35.838074 IP 10.101.0.13.27778 > 10.101.99.5.3260: Flags [.], ack 193, win 514, options [nop,nop,TS val 385396 ecr 4294960733], length 0
[3] 00:32:35.838099 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 193:705, ack 144, win 235, options [nop,nop,TS val 4294960761 ecr 385396], length 512

[1] & [3] could be coalesced, and [2] would be avoided.
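
The cost of the split is visible in the timestamps of the three lines
above. A small Python sketch that computes the gaps (the helper name is
illustrative; the timestamps are copied from the trace):

```python
from datetime import datetime


def gap_seconds(t_first, t_second):
    """Difference between two tcpdump-style HH:MM:SS.ffffff timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(t_second, fmt) - datetime.strptime(t_first, fmt)
    return delta.total_seconds()


# [1] 48 byte header -> [3] 512 byte payload: about 111.5 ms
header_to_payload = gap_seconds("00:32:35.726568", "00:32:35.838099")

# [2] ACK from the initiator -> [3] payload: 25 microseconds
ack_to_payload = gap_seconds("00:32:35.838074", "00:32:35.838099")
```

So in this exchange the payload left almost immediately after the ACK
for the header arrived, and nearly all of the delay was spent waiting
for that ACK.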




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 14:37 +0100, Thomas Glanzmann wrote:
> Hello Eric,
>
> It fixes my case, but if you look at the round trip time it is not even
> close to what it used to be. So while this fixes my problem, I'm still
> for disabling it by default.
>
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2
> https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-14:36:25.png

Very nice.

Now we have to check your NIC and how TX completion is performed.

What is your NIC model and driver ?




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> What is your NIC model and driver?

I have four Intel Corporation I350 Gigabit Network Connection (rev 01).

(node-62) [~/work/linux-2.6] lspci -v | pbot
http://pbot.rmdir.de/rgu6yHMBDVQpflMmbcJACg
(node-62) [~/work/linux-2.6] ip a s | pbot
http://pbot.rmdir.de/xJjRT8u-ekC6mrWgl09ZtQ
(node-62) [~/work/linux-2.6] dmesg | pbot
http://pbot.rmdir.de/MigrSPtxGmp0fI1CRgXsHw

I do 802.3ad link aggregation layer 2 hash with two network cards to one
switch.

I'm running:
Linux node-62 3.14.0-rc1+ #23 SMP Sat Feb 8 14:27:47 CET 2014 x86_64 GNU/Linux

Driver: igb

If you need remote access to the machine let me know.

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 14:37 +0100, Thomas Glanzmann wrote:

 
> It fixes my case, but if you look at the round trip time it is not even
> close to what it used to be. So while this fixes my problem, I'm still
> for disabling it by default.
>
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched.pcap.bz2

This pcap was taken on which host ?

10.101.99.5 or  10.101.0.13 ?






Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

[RESEND: dropped CC accidentally]

> 10.101.99.5 or 10.101.0.13?

10.101.99.5 (iSCSI Target)

tcpdump -i bond0.101 -s 0 -w /tmp/tcp_auto_corking_on_patched.pcap host esx-03.v101.campusvl.de

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 05:50 -0800, Eric Dumazet wrote:
> On Sat, 2014-02-08 at 05:33 -0800, Eric Dumazet wrote:
> > On Sat, 2014-02-08 at 05:14 -0800, Eric Dumazet wrote:
> > > Here is the combined patch, could you test it ?
> >
> > Also make sure you have commit a181ceb501b31b4bf8812a5c84c716cc31d82c2d
> > (tcp: autocork should not hold first packet in write queue)
> > in your tree.
>
> BTW this problem demonstrates there is room for improvement in iSCSI,
> using MSG_MORE to avoid sending two small segments in separate frames.
>
> [1] 00:32:35.726568 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 145:193, ack 144, win 235, options [nop,nop,TS val 4294960733 ecr 385385], length 48
> [2] 00:32:35.838074 IP 10.101.0.13.27778 > 10.101.99.5.3260: Flags [.], ack 193, win 514, options [nop,nop,TS val 385396 ecr 4294960733], length 0
> [3] 00:32:35.838099 IP 10.101.99.5.3260 > 10.101.0.13.27778: Flags [P.], seq 193:705, ack 144, win 235, options [nop,nop,TS val 4294960761 ecr 385396], length 512
>
> [1] & [3] could be coalesced, and [2] would be avoided.

With the fix, new pcap is more explicit about this suboptimal behavior :

05:34:16.280900 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
05:34:16.280949 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5328:5376, ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

05:34:16.280982 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54353:54401, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
05:34:16.281000 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54401:54913, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

05:34:16.281107 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
05:34:16.281157 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5376:5424, ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

05:34:16.281190 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54913:54961, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
05:34:16.281208 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54961:55473, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

05:34:16.281337 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
05:34:16.281390 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5424:5472, ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

05:34:16.281423 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55473:55521, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
05:34:16.281440 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55521:56033, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512
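
To make the pattern in this trace explicit, here is a rough Python
sketch that merges target-side segments when one ends exactly where the
next begins and both left within a short window. The 100 microsecond
window and the function name are arbitrary choices for illustration; the
timestamps (microsecond part only) and sequence numbers are copied from
the lines above:

```python
def coalesce(segments, max_gap_us=100):
    """Merge (time_us, seq_start, seq_end) tuples that are contiguous in
    sequence space and were sent at most max_gap_us apart."""
    merged = []
    for t_us, start, end in segments:
        if merged:
            last_t, last_start, last_end = merged[-1]
            if last_end == start and t_us - last_t <= max_gap_us:
                # Extend the previous segment instead of starting a new one.
                merged[-1] = (last_t, last_start, end)
                continue
        merged.append((t_us, start, end))
    return merged


# Segments sent by 10.101.99.5 (header, payload, header, payload, ...).
segments = [
    (280982, 54353, 54401), (281000, 54401, 54913),
    (281190, 54913, 54961), (281208, 54961, 55473),
    (281423, 55473, 55521), (281440, 55521, 56033),
]

merged = coalesce(segments)
sizes = [end - start for _, start, end in merged]
```

Under these assumptions the six segments collapse into three of 560
bytes each, one per response.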





Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> > BTW this problem demonstrates there is room for improvement in iSCSI,
> > using MSG_MORE to avoid sending two small segments in separate frames.

> With the fix, new pcap is more explicit about this suboptimal behavior :

> 05:34:16.280900 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
> 05:34:16.280949 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5328:5376, ack 54353, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

> 05:34:16.280982 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54353:54401, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
> 05:34:16.281000 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54401:54913, ack 5376, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

> 05:34:16.281107 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
> 05:34:16.281157 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5376:5424, ack 54913, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

> 05:34:16.281190 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54913:54961, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
> 05:34:16.281208 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 54961:55473, ack 5424, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

> 05:34:16.281337 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [.], ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 0
> 05:34:16.281390 IP 10.101.0.13.41531 > 10.101.99.5.3260: Flags [P.], seq 5424:5472, ack 55473, win 514, options [nop,nop,TS val 1732452 ecr 4294935370], length 48

> 05:34:16.281423 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55473:55521, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 48
> 05:34:16.281440 IP 10.101.99.5.3260 > 10.101.0.13.41531: Flags [P.], seq 55521:56033, ack 5472, win 235, options [nop,nop,TS val 4294935370 ecr 1732452], length 512

I get the idea. However, I'm a little confused: when I run 'git grep
MSG_MORE' I don't see many places in the Linux kernel that use it at
all. Do you have an example for me of where this flag needs to be
applied?

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 15:19 +0100, Thomas Glanzmann wrote:
> Hello Eric,

> I get the idea. However, I'm a little confused: when I run 'git grep
> MSG_MORE' I don't see many places in the Linux kernel that use it at
> all. Do you have an example for me of where this flag needs to be
> applied?

Idea would be to set this flag when calling sendmsg() of the 48 bytes of
the header, and not set it on the sendmsg() of the 512 bytes of the
payload.

iscsi_sw_tcp_xmit_segment() already adds MSG_MORE, but
it would be nice to add a new _initial_ flags parameter to
iscsi_sw_tcp_xmit_segment()





Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Idea would be to set this flag when calling sendmsg() of the 48 bytes
> of the header, and not set it on the sendmsg() of the 512 bytes of the
> payload.

I see.

> iscsi_sw_tcp_xmit_segment() already adds MSG_MORE, but
> it would be nice to add a new _initial_ flags parameter to
> iscsi_sw_tcp_xmit_segment()

This is for the iSCSI initiator implementation. I'm interested in the
iSCSI target code. I already found it and experimented a little bit, but
I need to dig deeper if I want to prepare a patch.

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 16:00 +0100, Thomas Glanzmann wrote:
> Hello Eric,
>
> > Idea would be to set this flag when calling sendmsg() of the 48 bytes
> > of the header, and not set it on the sendmsg() of the 512 bytes of the
> > payload.
>
> I see.
>
> > iscsi_sw_tcp_xmit_segment() already adds MSG_MORE, but
> > it would be nice to add a new _initial_ flags parameter to
> > iscsi_sw_tcp_xmit_segment()
>
> This is for the iSCSI initiator implementation. I'm interested in the
> iSCSI target code. I already found it and experimented a little bit, but
> I need to dig deeper if I want to prepare a patch.

Fantastic !

Let me know if you want some help.

Note : We did some patches in the MSG_MORE logic for sendpage(), but in
your case I do not think it's related

(git grep -n MSG_SENDPAGE_NOTLAST ) if you are curious




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Note : We did some patches in the MSG_MORE logic for sendpage(), but
> in your case I do not think it's related
> (git grep -n MSG_SENDPAGE_NOTLAST ) if you are curious

thank you for the pointer. The iSCSI target code actually uses sendpage
whenever it can.

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 17:57 +0100, Thomas Glanzmann wrote:
> Hello Eric,
>
> > Note : We did some patches in the MSG_MORE logic for sendpage(), but
> > in your case I do not think it's related
> > (git grep -n MSG_SENDPAGE_NOTLAST ) if you are curious
>
> thank you for the pointer. The iSCSI target code actually uses sendpage
> whenever it can.

Yep, but the problem (at least in your pcap) is that the 48-byte
header is sent in a TCP segment of its own, then the 512-byte payload in a
separate segment.

I suspect the sendpage() is only used for the payload. No need for
MSG_MORE here.

The MSG_MORE would need to be set on the first part (the 48-byte header),
so that the TCP stack defers the push of the segment until the 512-byte
payload is added.
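
A minimal userspace sketch of that idea, assuming a connected stream socket on Linux (MSG_MORE is Linux-specific). The header is sent with MSG_MORE only when a payload follows, which lets the stack hold the small header back and put both parts into one segment. The helper names send_all() and send_pdu() are made up for illustration; they are not part of the iSCSI target code.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send all of buf, passing flags to every send() call. */
static int send_all(int fd, const void *buf, size_t len, int flags)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = send(fd, p, len, flags);

		if (n < 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: send a header followed by an optional payload.
 * MSG_MORE on the header tells the stack that more data follows.
 */
int send_pdu(int fd, const void *hdr, size_t hdr_len,
	     const void *payload, size_t payload_len)
{
	if (send_all(fd, hdr, hdr_len, payload_len ? MSG_MORE : 0) < 0)
		return -1;
	if (payload_len == 0)
		return 0;
	/* Last part: no MSG_MORE, so the stack is free to push the segment. */
	return send_all(fd, payload, payload_len, 0);
}
```

A caller would pass the 48-byte header and the 512-byte payload as the two buffers; when there is no payload, the header goes out without MSG_MORE.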






Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Yep, but the problem (at least in your pcap) is that the 48-byte
> header is sent in a TCP segment of its own, then the 512-byte payload in a
> separate segment.

I agree.

> I suspect the sendpage() is only used for the payload. No need for
> MSG_MORE here.

I see.

> The MSG_MORE would need to be set on the first part (the 48-byte header),
> so that the TCP stack defers the push of the segment until the 512-byte
> payload is added.

The iSCSI target uses one function to send all outbound data. So in
order to do it right, every function that sends data in multiple
chunks needs to mark it correctly. Of course someone could also do some
wild guessing and say that everything below 512 bytes gets
pushed out. I wonder what Nab has to say about this?

Cheers,
Thomas


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 18:15 +0100, Thomas Glanzmann wrote:

> The iSCSI target uses one function to send all outbound data. So in
> order to do it right, every function that sends data in multiple
> chunks needs to mark it correctly. Of course someone could also do some
> wild guessing and say that everything below 512 bytes gets
> pushed out. I wonder what Nab has to say about this?

I was simply thinking about something like :
(might need further changes, but I guess this should solve your case)

diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 0819e688a398..44f0d62a88d6 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -1165,7 +1165,7 @@ send_data:
 		iov_count = cmd->iov_misc_count;
 	}
 
-	tx_sent = tx_data(conn, &iov[0], iov_count, tx_size);
+	tx_sent = tx_data(conn, &iov[0], iov_count, tx_size, 0);
 	if (tx_size != tx_sent) {
 		if (tx_sent == -EAGAIN) {
 			pr_err("tx_data() returned -EAGAIN\n");
@@ -1196,7 +1196,8 @@ send_hdr:
 	iov.iov_base = cmd->pdu;
 	iov.iov_len = tx_hdr_size;
 
-	tx_sent = tx_data(conn, &iov, 1, tx_hdr_size);
+	data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
+	tx_sent = tx_data(conn, &iov, 1, tx_hdr_size, data_len ? MSG_MORE : 0);
 	if (tx_hdr_size != tx_sent) {
 		if (tx_sent == -EAGAIN) {
 			pr_err("tx_data() returned -EAGAIN\n");
@@ -1205,7 +1206,6 @@ send_hdr:
 		return -1;
 	}
 
-	data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
 	/*
 	 * Set iov_off used by padding and data digest tx_data() calls below
 	 * in order to determine proper offset into cmd->iov_data[]
@@ -1249,7 +1249,8 @@ send_padding:
 	if (cmd->padding) {
 		struct kvec *iov_p = &cmd->iov_data[iov_off++];
 
-		tx_sent = tx_data(conn, iov_p, 1, cmd->padding);
+		tx_sent = tx_data(conn, iov_p, 1, cmd->padding,
+				  conn->conn_ops->DataDigest ? MSG_MORE : 0);
 		if (cmd->padding != tx_sent) {
 			if (tx_sent == -EAGAIN) {
 				pr_err("tx_data() returned -EAGAIN\n");
@@ -1263,7 +1264,7 @@ send_datacrc:
 	if (conn->conn_ops->DataDigest) {
 		struct kvec *iov_d = &cmd->iov_data[iov_off];
 
-		tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN);
+		tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN, 0);
 		if (ISCSI_CRC_LEN != tx_sent) {
 			if (tx_sent == -EAGAIN) {
 				pr_err("tx_data() returned -EAGAIN\n");
@@ -1349,11 +1350,12 @@ static int iscsit_do_rx_data(
 
 static int iscsit_do_tx_data(
 	struct iscsi_conn *conn,
-	struct iscsi_data_count *count)
+	struct iscsi_data_count *count,
+	int flags)
 {
 	int data = count->data_length, total_tx = 0, tx_loop = 0, iov_len;
 	struct kvec *iov_p;
-	struct msghdr msg;
+	struct msghdr msg = { .msg_flags = flags };
 
 	if (!conn || !conn->sock || !conn->conn_ops)
 		return -1;
@@ -1363,8 +1365,6 @@ static int iscsit_do_tx_data(
 		return -1;
 	}
 
-	memset(&msg, 0, sizeof(struct msghdr));
-
 	iov_p = count->iov;
 	iov_len = count->iov_count;
 
@@ -1408,7 +1408,8 @@ int tx_data(
 	struct iscsi_conn *conn,
 	struct kvec *iov,
 	int iov_count,
-	int data)
+	int data,
+	int flags)
 {
 	struct iscsi_data_count c;
 
@@ -1421,7 +1422,7 @@ int tx_data(
 	c.data_length = data;
 	c.type = ISCSI_TX_DATA;
 
-	return iscsit_do_tx_data(conn, c);
+	return iscsit_do_tx_data(conn, &c, flags);
 }
 
 void iscsit_collect_login_stats(
diff --git a/drivers/target/iscsi/iscsi_target_util.h b/drivers/target/iscsi/iscsi_target_util.h
index e4fc34a02f57..1b4f06801adc 100644
--- a/drivers/target/iscsi/iscsi_target_util.h
+++ b/drivers/target/iscsi/iscsi_target_util.h
@@ -54,7 +54,7 @@ extern int iscsit_print_dev_to_proc(char *, char **, off_t, int);
 extern int iscsit_print_sessions_to_proc(char *, char **, off_t, int);
 extern int iscsit_print_tpg_to_proc(char *, char **, off_t, int);
 extern int rx_data(struct iscsi_conn *, struct kvec *, int, int);
-extern int tx_data(struct iscsi_conn *, struct kvec *, int, int);
+extern int tx_data(struct iscsi_conn *, struct kvec *, int, int, int);
 extern void iscsit_collect_login_stats(struct iscsi_conn *, u8, u8);
 extern struct iscsi_tiqn *iscsit_snmp_get_tiqn(struct iscsi_conn *);
 

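The flag choice in the send_hdr, send_padding and send_datacrc hunks can be modelled as a small pure function: a part carries MSG_MORE only when another part of the same PDU is still to be sent. This is a userspace sketch for illustration only; enum pdu_part and pdu_part_flags() are made-up names that do not exist in the kernel.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical labels for the three tx_data() calls touched by the patch. */
enum pdu_part { PDU_HEADER, PDU_PADDING, PDU_DATA_DIGEST };

/*
 * Return the send flags for one part of a PDU.
 * data_len:    number of payload bytes that follow the header
 * data_digest: non-zero if a data digest (CRC) is sent after the padding
 */
int pdu_part_flags(enum pdu_part part, size_t data_len, int data_digest)
{
	switch (part) {
	case PDU_HEADER:
		/* Payload follows, so let the stack wait for it. */
		return data_len ? MSG_MORE : 0;
	case PDU_PADDING:
		/* Only the digest can follow the padding. */
		return data_digest ? MSG_MORE : 0;
	case PDU_DATA_DIGEST:
	default:
		/* Last part of the PDU: nothing more to wait for. */
		return 0;
	}
}
```

For the 48-byte header of a response with a 512-byte payload, pdu_part_flags(PDU_HEADER, 512, 0) yields MSG_MORE, which is the case seen in the pcap.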



Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> I was simply thinking about something like :
> (might need further changes, but I guess this should solve your case)

thank you for your patch. It did not apply on top of Linux tip, so I put
in the changes manually and fixed up another call to tx_data that you
forgot in your initial patch to make it apply.

I gave it another run, can you confirm that it now behaves better?

https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more.pcap.bz2

And look at that round-trip graph: it is perfect. Also, the filesystem is now
created in 3 seconds instead of 4.

https://thomas.glanzmann.de/tmp/screenshot-mini-2014-02-08-22:34:57.png

Nab, would you consider this patch for upstream? Would you take it if I clean
it up?

Cheers,
Thomas

PS: I'm asleep for the next 8 hours.

diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index e655b04..0eb9681 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -1168,7 +1168,7 @@ send_data:
 		iov_count = cmd->iov_misc_count;
 	}
 
-	tx_sent = tx_data(conn, &iov[0], iov_count, tx_size);
+	tx_sent = tx_data(conn, &iov[0], iov_count, tx_size, 0);
 	if (tx_size != tx_sent) {
 		if (tx_sent == -EAGAIN) {
 			pr_err("tx_data() returned -EAGAIN\n");
@@ -1199,7 +1199,8 @@ send_hdr:
 	iov.iov_base = cmd->pdu;
 	iov.iov_len = tx_hdr_size;
 
-	tx_sent = tx_data(conn, &iov, 1, tx_hdr_size);
+	data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
+	tx_sent = tx_data(conn, &iov, 1, tx_hdr_size, data_len ? MSG_MORE : 0);
 	if (tx_hdr_size != tx_sent) {
 		if (tx_sent == -EAGAIN) {
 			pr_err("tx_data() returned -EAGAIN\n");
@@ -1208,7 +1209,6 @@ send_hdr:
 		return -1;
 	}
 
-	data_len = cmd->tx_size - tx_hdr_size - cmd->padding;
 	/*
 	 * Set iov_off used by padding and data digest tx_data() calls below
 	 * in order to determine proper offset into cmd->iov_data[]
@@ -1252,7 +1252,8 @@ send_padding:
 	if (cmd->padding) {
 		struct kvec *iov_p = &cmd->iov_data[iov_off++];
 
-		tx_sent = tx_data(conn, iov_p, 1, cmd->padding);
+		tx_sent = tx_data(conn, iov_p, 1, cmd->padding,
+				  conn->conn_ops->DataDigest ? MSG_MORE : 0);
 		if (cmd->padding != tx_sent) {
 			if (tx_sent == -EAGAIN) {
 				pr_err("tx_data() returned -EAGAIN\n");
@@ -1266,7 +1267,7 @@ send_datacrc:
 	if (conn->conn_ops->DataDigest) {
 		struct kvec *iov_d = &cmd->iov_data[iov_off];
 
-		tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN);
+		tx_sent = tx_data(conn, iov_d, 1, ISCSI_CRC_LEN, 0);
 		if (ISCSI_CRC_LEN != tx_sent) {
 			if (tx_sent == -EAGAIN) {
 				pr_err("tx_data() returned -EAGAIN\n");
@@ -1352,11 +1353,12 @@ static int iscsit_do_rx_data(
 
 static int iscsit_do_tx_data(
 	struct iscsi_conn *conn,
-	struct iscsi_data_count *count)
+	struct iscsi_data_count *count,
+	int flags)
 {
 	int data = count->data_length, total_tx = 0, tx_loop = 0, iov_len;
 	struct kvec *iov_p;
-	struct msghdr msg;
+	struct msghdr msg = { .msg_flags = flags };
 
 	if (!conn || !conn->sock || !conn->conn_ops)
 		return -1;
@@ -1366,8 +1368,6 @@ static int iscsit_do_tx_data(
 		return -1;
 	}
 
-	memset(&msg, 0, sizeof(struct msghdr));
-
 	iov_p = count->iov;
 	iov_len = count->iov_count;
 
@@ -1411,7 +1411,8 @@ int tx_data(
 	struct iscsi_conn *conn,
 	struct kvec *iov,
 	int iov_count,
-	int data)
+	int data,
+	int flags)
 {
 	struct iscsi_data_count c;
 
@@ -1424,7 +1425,7 @@ int tx_data(
 	c.data_length = data;
 	c.type = ISCSI_TX_DATA;
 
-	return iscsit_do_tx_data(conn, c);
+	return iscsit_do_tx_data(conn, &c, flags);
 }


Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Eric Dumazet
On Sat, 2014-02-08 at 22:36 +0100, Thomas Glanzmann wrote:
> Hello Eric,
>
> > I was simply thinking about something like :
> > (might need further changes, but I guess this should solve your case)
>
> thank you for your patch. It did not apply on top of Linux tip, so I put
> in the changes manually and fixed up another call to tx_data that you
> forgot in your initial patch to make it apply.
>
> I gave it another run, can you confirm that it now behaves better?
>
> https://thomas.glanzmann.de/tmp/tcp_auto_corking_on_patched_tcp_more.pcap.bz2
>
> And look at that round-trip graph: it is perfect. Also, the filesystem is now
> created in 3 seconds instead of 4.

Yes, this is much better : 2 frames per request/response, instead of 4.

13:32:04.665367 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 384:432, ack 2529, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.665483 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 2529:3089, ack 432, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.665642 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 432:480, ack 3089, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.665756 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 3089:3649, ack 480, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.665933 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 480:528, ack 3649, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.666046 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 3649:4209, ack 528, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.666214 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 528:576, ack 4209, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.666333 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 4209:4769, ack 576, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.78 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 576:624, ack 4769, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.666790 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 4769:5329, ack 624, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.666983 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 624:672, ack 5329, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.667097 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 5329:5889, ack 672, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.667280 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 672:720, ack 5889, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.667324 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 5889:6449, ack 720, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
13:32:04.667500 IP 10.101.0.12.43418 > 10.101.99.5.3260: Flags [P.], seq 720:768, ack 6449, win 514, options [nop,nop,TS val 1576981 ecr 4294913967], length 48
13:32:04.667540 IP 10.101.99.5.3260 > 10.101.0.12.43418: Flags [P.], seq 6449:7009, ack 768, win 235, options [nop,nop,TS val 4294913967 ecr 1576981], length 560
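
The capture is consistent with the header and payload now sharing one segment: each response carries 560 bytes, which matches the 48-byte header plus the 512-byte payload discussed earlier in the thread, and the target's sequence numbers advance by 560 per response (2529, 3089, 3649, ...). A small sketch of that arithmetic, with function names that are illustrative only:

```c
#include <stdint.h>

/* Bytes on the wire when header and payload share one TCP segment. */
uint32_t coalesced_len(uint32_t hdr_len, uint32_t payload_len)
{
	return hdr_len + payload_len;
}

/* Sequence number after sending len bytes (unsigned math wraps at 2^32). */
uint32_t next_seq(uint32_t seq, uint32_t len)
{
	return seq + len;
}
```

With the values from the trace, coalesced_len(48, 512) gives 560 and next_seq(2529, 560) gives 3089, the start of the following response.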




Re: REGRESSION f54b311142a92ea2e42598e347b84e1655caf8e3 tcp auto corking slows down iSCSI file system creation by factor of 70 [WAS: 4 TB VMFS creation takes 15 minutes vs 26 seconds]

2014-02-08 Thread Thomas Glanzmann
Hello Eric,

> Yes, this is much better : 2 frames per request/response, instead of 4.

perfect. I sent out the patch to the iSCSI target list in your name, since
you did the work, and added my Signed-off-by. I hope that is how it is
handled, or should I have added my name to the From line and mentioned in
the description of the patch that you did the heavy lifting?

Cheers,
Thomas