Re: [E1000-devel] i40e card Tx resets

2016-03-20 Thread zhuyj

On 03/17/2016 10:20 AM, zhuyj wrote:

On 03/16/2016 10:36 PM, Sowmini Varadhan wrote:

On (03/16/16 19:46), zhuyj wrote:


I am busy today. Tomorrow I will share the steps for the pktgen tools.

Ok. I can give that a try on my machine. It would be good to
have some way to reproduce this as simply as possible, maybe
we can take the discussion to netdev to see if others there
have experienced this.

--Sowmini


1. modprobe pktgen (the kernel must be built with CONFIG_NET_PKTGEN)

2. Download the tar file and unpack it into any directory.
The tar file comes from the kernel source tree, under samples/pktgen/.

Sorry. The tar file is in the attachment. Please check it.

Zhu Yanjun


3. cd pktgen

4. ./pktgen_sample02_multiqueue.sh -i ethX -s size -t cpu_number

If size is set to a large value, a similar defect occurs.
If size is reduced to a suitable value, my defect does not occur.

In my tests, I found that some igb NICs, such as the i210, work well
no matter how large the size is.

Other NICs, such as the 82580, do not work well if the size is too large.

As such, I think my problem comes from the hardware, and the large size
triggers it.
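
The steps above could be scripted roughly as follows. This is only a sketch: it prints the command lines for a sweep of packet sizes instead of running them, so they can be reviewed before being run as root. The function names pktgen_cmd and pktgen_sweep, and any device name, thread count or size list passed to them, are made up for illustration. Only the script name and the -i/-s/-t options come from the steps above.

```shell
# Sketch only: print pktgen command lines for a sweep of packet sizes.
# Nothing is executed here; the output is meant to be reviewed first.

# Build one invocation of the sample script from samples/pktgen/.
# Arguments: device, packet size, number of threads (all placeholders).
pktgen_cmd() {
    dev=$1
    size=$2
    threads=$3
    printf './pktgen_sample02_multiqueue.sh -i %s -s %s -t %s\n' \
        "$dev" "$size" "$threads"
}

# Print one command per size.
# Arguments: device, number of threads, then any number of sizes.
pktgen_sweep() {
    sweep_dev=$1
    sweep_threads=$2
    shift 2
    for sweep_size in "$@"; do
        pktgen_cmd "$sweep_dev" "$sweep_size" "$sweep_threads"
    done
}
```

For example, "pktgen_sweep eth2 4 1500 9000" prints two command lines, one per size (eth2 and 4 threads are placeholders).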


I hope this can help us all.

Zhu Yanjun



--
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785231=/4140
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
http://communities.intel.com/community/wired


Re: [E1000-devel] i40e card Tx resets

2016-03-19 Thread zhuyj
On 03/16/2016 11:25 AM, Sowmini Varadhan wrote:
> On (03/16/16 11:19), zhuyj wrote:
>> Thanks for your reply.
>> Yesterday I ran tests with rds-tools. Unfortunately, I could
>> not reproduce my problem with rds-tools.
>>
>> But with the pktgen tools, I can reproduce the problem easily. I
>> found that if I set the packet size to 17792 or less, the problem
>> does not occur. But if I set the packet size above 17792,
>> for example to 17793, the problem occurs.
> I think it might have to do with number of sockets/cpu/cores and
> the irq balancing issues.
>
>> As such, maybe the packet size triggers my problem. I am not sure
>> whether the packet size also triggers yours.
> In my case I did try against netperf request-response (but that is
> single threaded) and iperf (but that is unidirectional, i.e.,
> it is not a bidirectional request-response test).
>
> Perhaps you could share the pktgen config (did you change the
> code itself?). Then some of the i40e experts at Intel (who are also
> on the To/Cc lists of this mail) can try it out themselves.
>
> I still think that the fundamental problem can be identified
> by looking for what's causing the mdd event.
>
> It must be some type of bug, because ixgbe works fine for me,
> and this seems like a regression on that performance.
>
> --Sowmini
>
I am busy today. Tomorrow I will share the steps for the pktgen tools.

Best Regards!
Zhu Yanjun



Re: [E1000-devel] i40e card Tx resets

2016-03-19 Thread Sowmini Varadhan
On (03/16/16 19:46), zhuyj wrote:

> I am busy today. Tomorrow I will share the steps for the pktgen tools.

Ok. I can give that a try on my machine. It would be good to
have some way to reproduce this as simply as possible, maybe
we can take the discussion to netdev to see if others there
have experienced this.

--Sowmini




Re: [E1000-devel] i40e card Tx resets

2016-03-15 Thread Sowmini Varadhan
On (03/16/16 11:19), zhuyj wrote:
> 
> Thanks for your reply.
> Yesterday I ran tests with rds-tools. Unfortunately, I could
> not reproduce my problem with rds-tools.
> 
> But with the pktgen tools, I can reproduce the problem easily. I
> found that if I set the packet size to 17792 or less, the problem
> does not occur. But if I set the packet size above 17792,
> for example to 17793, the problem occurs.

I think it might have to do with number of sockets/cpu/cores and
the irq balancing issues.

> As such, maybe the packet size triggers my problem. I am not sure
> whether the packet size also triggers yours.

In my case I did try against netperf request-response (but that is
single threaded) and iperf (but that is unidirectional, i.e.,
it is not a bidirectional request-response test).

Perhaps you could share the pktgen config (did you change the
code itself?). Then some of the i40e experts at Intel (who are also
on the To/Cc lists of this mail) can try it out themselves.

I still think that the fundamental problem can be identified
by looking for what's causing the mdd event. 

It must be some type of bug, because ixgbe works fine for me,
and this seems like a regression on that performance.

--Sowmini




Re: [E1000-devel] i40e card Tx resets

2016-03-15 Thread zhuyj
On 03/15/2016 06:54 PM, Sowmini Varadhan wrote:
> On (03/15/16 16:55), zhuyj wrote:
>> Sorry, let me explain in detail.
>> I have a similar problem. At first, I thought it was related to TSO.
>> Then I ran tests with the pktgen tools and found that the
>> problem still occurred whether TSO was enabled or not.
>>
>> So I suggest testing with the pktgen tools to rule out TSO.
>>
> I realize that TSO might not be the root cause (Tushar also
> pointed that out) but might just be triggering the issue...
>
> I don't think we need pktgen at this point: it's quite easy
> to reproduce this on commodity Haswell servers by installing
> rds-stress from the rpm below:
>
> http://public-yum.oracle.com/repo/OracleLinux/OL6/ofed_UEK/x86_64//getPackageSource/rds-tools-2.0.7-1.12.el6.src.rpm
>
> To run it, set up 2 nodes connected on i40e. I shall call them
> "client" and "server", though both will send traffic in the test.
>
> Start the listener:
>   server# modprobe rds-tcp
>   server# rds-stress -r 
>
> Start the test:
>
>   client# modprobe rds-tcp
>   client# rds-stress -r  -s  -q 256 -a 8192 -d16 -t16 -T30
>
> (all params are explained in the rds-stress man page)
>
> If you do this on ixgbe, you will see that the column for "tx+rx K/s"
> shows a steady throughput, whereas i40e numbers are bursty and low.
>
> Also, for i40e, you will see messages about TX hangs on the console.
>
> I think that, to find the root-cause, we need to see what is
> triggering the mdd error.
Hi,

Thanks for your reply.
Yesterday I ran tests with rds-tools. Unfortunately, I could not
reproduce my problem with rds-tools.

But with the pktgen tools, I can reproduce the problem easily. I found
that if I set the packet size to 17792 or less, the problem does not
occur. But if I set the packet size above 17792, for example to 17793,
the problem occurs.

As such, maybe the packet size triggers my problem. I am not sure
whether the packet size also triggers yours.
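
A threshold like this can be narrowed down with a binary search over the packet size. The sketch below is only an illustration of that idea, not the procedure actually used here. probe_size is a hypothetical hook: in a real run it would start pktgen with the given size and check the console for TX hang messages. Here it is a stub that mimics the behaviour described above. The search also assumes the failure is monotonic in the size, which has not been verified.

```shell
# Sketch only: binary-search the largest packet size for which a probe passes.

# Hypothetical probe. Replace the body with a real pktgen run plus a check
# of the kernel log. This stub passes for sizes up to 17792 and fails above.
probe_size() {
    [ "$1" -le 17792 ]
}

# Usage: largest_good_size LOW HIGH
# Assumes LOW passes, HIGH fails, and every size above the threshold fails.
largest_good_size() {
    lo=$1
    hi=$2
    # Invariant: lo passes, hi fails. Stop when they are adjacent.
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if probe_size "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}
```

With the stub probe, "largest_good_size 64 65536" prints 17792 after about ten probes instead of one probe per size.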

Best Regards!
Zhu Yanjun
>
> Would be good if someone from Intel could provide some hints on
> how to do that (or try the above tests!)
>
> --Sowmini




Re: [E1000-devel] i40e card Tx resets

2016-03-15 Thread Sowmini Varadhan
On (03/15/16 16:55), zhuyj wrote:
> Sorry, let me explain in detail.
> I have a similar problem. At first, I thought it was related to TSO.
> Then I ran tests with the pktgen tools and found that the
> problem still occurred whether TSO was enabled or not.
> 
> So I suggest testing with the pktgen tools to rule out TSO.
> 

I realize that TSO might not be the root cause (Tushar also
pointed that out) but might just be triggering the issue...

I don't think we need pktgen at this point: it's quite easy
to reproduce this on commodity Haswell servers by installing
rds-stress from the rpm below:

http://public-yum.oracle.com/repo/OracleLinux/OL6/ofed_UEK/x86_64//getPackageSource/rds-tools-2.0.7-1.12.el6.src.rpm

To run it, set up 2 nodes connected on i40e. I shall call them
"client" and "server", though both will send traffic in the test.

Start the listener:
 server# modprobe rds-tcp
 server# rds-stress -r 

Start the test:

 client# modprobe rds-tcp
 client# rds-stress -r  -s  -q 256 -a 8192 -d16 -t16 -T30

(all params are explained in the rds-stress man page)

If you do this on ixgbe, you will see that the column for "tx+rx K/s"
shows a steady throughput, whereas i40e numbers are bursty and low.

Also, for i40e, you will see messages about TX hangs on the console.

I think that, to find the root-cause, we need to see what is
triggering the mdd error.
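
As a starting point for collecting data, the console messages can be summarised with a few lines of shell. This is a rough sketch, not a diagnostic for the MDD event itself. The message strings are copied from the console output quoted at the start of this thread, and other driver versions may word them differently. The helper names are made up. Treating the distance from HWB to TAIL as the number of pending descriptors is an assumption, and ring wrap-around is not handled.

```shell
# Sketch only: summarise i40e TX-hang messages from kernel log text on stdin.

# Count the "PF reset issued" lines.
count_pf_resets() {
    grep -c 'TX driver issue detected, PF reset issued'
}

# For each "Hung TX queue" line, print the queue number and TAIL minus HWB.
# Assumption: that distance is the pending descriptor count (no wrap-around).
hung_queue_backlog() {
    sed -n 's/.*Hung TX queue \([0-9][0-9]*\),.*HWB: \(0x[0-9a-fA-F]*\),.*TAIL: \(0x[0-9a-fA-F]*\).*/\1 \2 \3/p' |
    while read -r queue hwb tail; do
        # The hex values are expanded textually into the arithmetic.
        echo "queue $queue backlog $(( $tail - $hwb ))"
    done
}
```

For example, "dmesg | count_pf_resets" and "dmesg | hung_queue_backlog". For the hung-queue line quoted in this thread (HWB 0xeb, TAIL 0x13d) the second helper reports a backlog of 82, which matches the tx_pending value printed by the driver.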

Would be good if someone from Intel could provide some hints on
how to do that (or try the above tests!)

--Sowmini



Re: [E1000-devel] i40e card Tx resets

2016-03-15 Thread zhuyj
On 03/15/2016 02:12 PM, zhuyj wrote:
> Hi,
>
> I have a similar problem. Would you like to run tests with the pktgen
> tools?
> Maybe the test result is not related to TSO.

Sorry, let me explain in detail.
I have a similar problem. At first, I thought it was related to TSO.
Then I ran tests with the pktgen tools and found that the problem
still occurred whether TSO was enabled or not.

So I suggest testing with the pktgen tools to rule out TSO.
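
When comparing runs with TSO on and off, it helps to record the offload state next to each result. A minimal sketch, assuming the usual "tcp-segmentation-offload: on/off" line in the output of "ethtool -k"; the helper name tso_state is made up for illustration.

```shell
# Sketch only: report the TSO state from `ethtool -k DEV` output on stdin.
# Prints "on" or "off"; prints nothing if the line is not present.
tso_state() {
    sed -n 's/^tcp-segmentation-offload: *\([a-z]*\).*/\1/p'
}
```

For example, "ethtool -k eth2 | tso_state" before each run, with "ethtool -K eth2 tso off" or "ethtool -K eth2 tso on" to switch between the two cases (eth2 is a placeholder for the interface under test).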

Best Regards!
Zhu Yanjun

>
> Zhu Yanjun
>
> On 03/15/2016 05:43 AM, Sowmini Varadhan wrote:
>> Hi,
>>
>> I am trying out some DB stress tests on both i40e and ixgbe. The
>> stress test that I use is rds-stress 
>> (http://linux.die.net/man/1/rds-stress),
>> and I can list out the entire set of steps and parameters that I
>> am using to run this if that info is  interesting.
>>
>> My test bed is a pair of X5-2 (Haswell) servers, each with a
>> Niantic (X540-AT2) card and a Fortville card. The Niantic/fortville
>> cards are connected back-to-back, so I essentially have a 10G
>> connection and a 40G connection.
>>
>> Everything else (kernel, RDS modules, stress test and parameters)
>> remaining the same, I get  the expected throughput on the 10G
>> connection, but the i40e connection goes through a lot of TX
>> errors that result in console messages like this:
>>
>>   i40e :81:00.0: TX driver issue detected, PF reset issued
>>   i40e :81:00.0 eth2: adding 68:05:ca:30:db:30 vid=0
>>   i40e :81:00.0: TX driver issue detected, PF reset issued
>>   i40e :81:00.0 eth2: VSI_seid 390, Hung TX queue 32, tx_pending: 82, NTC:0xeb, HWB: 0xeb, NTU: 0x13d, TAIL: 0x13d
>>   i40e :81:00.0 eth2: VSI_seid 390, Issuing force_wb for TX queue 32, Interrupt Reg
>>
>> I understand these are "mdd errors", but how can I find out what
>> triggered these errors, any hints?
>>
>> The other data-point here is that if I disable tso, and fall back
>> to gso, there are no tx errors, and throughput matches the 10G
>> connection (for the same set of test parameters).
>>
>> Please let me know if there is any other info that would help.
>> The kernel is a 4.5.0-rc2 kernel. Info for the i40e card is
>>
>>  # ethtool -i eth3
>>  driver: i40e
>>  version: 1.4.8-k
>>  firmware-version: 5.02 0x80002285 0.0.0
>>  bus-info: :81:00.1
>> :
>>
>> Thanks in advance,
>> --Sowmini
>>
>




Re: [E1000-devel] i40e card Tx resets

2016-03-15 Thread zhuyj
Hi,

I have a similar problem. Would you like to run tests with the pktgen tools?
Maybe the test result is not related to TSO.

Zhu Yanjun

On 03/15/2016 05:43 AM, Sowmini Varadhan wrote:
> Hi,
>
> I am trying out some DB stress tests on both i40e and ixgbe. The
> stress test that I use is rds-stress (http://linux.die.net/man/1/rds-stress),
> and I can list out the entire set of steps and parameters that I
> am using to run this if that info is  interesting.
>
> My test bed is a pair of X5-2 (Haswell) servers, each with a
> Niantic (X540-AT2) card and a Fortville card. The Niantic/fortville
> cards are connected back-to-back, so I essentially have a 10G
> connection and a 40G connection.
>
> Everything else (kernel, RDS modules, stress test and parameters)
> remaining the same, I get  the expected throughput on the 10G
> connection, but the i40e connection goes through a lot of TX
> errors that result in console messages like this:
>
>   i40e :81:00.0: TX driver issue detected, PF reset issued
>   i40e :81:00.0 eth2: adding 68:05:ca:30:db:30 vid=0
>   i40e :81:00.0: TX driver issue detected, PF reset issued
>   i40e :81:00.0 eth2: VSI_seid 390, Hung TX queue 32, tx_pending: 82, NTC:0xeb, HWB: 0xeb, NTU: 0x13d, TAIL: 0x13d
>   i40e :81:00.0 eth2: VSI_seid 390, Issuing force_wb for TX queue 32, Interrupt Reg
>
> I understand these are "mdd errors", but how can I find out what
> triggered these errors, any hints?
>
> The other data-point here is that if I disable tso, and fall back
> to gso, there are no tx errors, and throughput matches the 10G
> connection (for the same set of test parameters).
>
> Please let me know if there is any other info that would help.
> The kernel is a 4.5.0-rc2 kernel. Info for the i40e card is
>
>  # ethtool -i eth3
>  driver: i40e
>  version: 1.4.8-k
>  firmware-version: 5.02 0x80002285 0.0.0
>  bus-info: :81:00.1
> :
>
> Thanks in advance,
> --Sowmini
>

