[E1000-devel] ethtool and perfect match filters - how to use it?

2015-07-09 Thread Michał Purzyński
The 82599 data sheet says that masks must match for all the filters. If so, how do I filter traffic using Perfect Filters? I'm reading the data sheet now, here are the exact words: "The 82599 supports masking / range for the previously described fields. These masks are defined globally for a

[E1000-devel] Does 82599 DMA into directly skb data?

2016-09-04 Thread Michał Purzyński
My IXGBE data path journey made me ask questions, like the one above. Basically (I'm skipping a few steps here): 1. packet arrives to the card and is verified by MAC 2. packet is placed in the card's FIFO (which is small) 3. lots of steps here but finally card does DMA into one of receive buffers

Re: [E1000-devel] Does 82599 DMA into directly skb data?

2016-09-04 Thread Michał Purzyński
a great success. On Sun, Sep 4, 2016 at 7:20 PM, Alexander Duyck wrote: > On Sun, Sep 4, 2016 at 2:50 AM, Michał Purzyński > wrote: > > My IXGBE data path journey made me ask questions, like the one above. > > Basically (I'm skipping a few steps here): > > > > 1.

[E1000-devel] IOAT and ixgbe current status

2016-09-22 Thread Michał Purzyński
What's the current status of I/OAT and the ixgbe driver and 82599 cards? Is it even used? How's the performance? If it is used, do you know in which stage exactly? Is it used to copy data from driver buffers (those that card does DMA into from FIFO) to sk_buff or from sk_buff to socket buffers? -

[E1000-devel] rx_missed_errors grows while rx_no_buffer does not

2016-09-23 Thread Michał Purzyński
Hello. On my IDS workload with af_packet I can see rx_missed_errors growing while rx_no_buffer_count does not. Basically every other kind of rx_ error counter is 0, including rx_no_dma_resources. It's an 82599 based card. I don't know what to think about that. I went through ixgbe source code and

Re: [E1000-devel] rx_missed_errors grows while rx_no_buffer does not

2016-09-23 Thread Michał Purzyński
tting lost. Unfortunately HP documentation is a scam and they actively avoid publishing motherboard layout. Any other place I could look for hints? On Fri, Sep 23, 2016 at 7:01 PM, Alexander Duyck wrote: > On Fri, Sep 23, 2016 at 1:10 AM, Michał Purzyński > wrote: > > Hello.

Re: [E1000-devel] rx_missed_errors grows while rx_no_buffer does not

2016-09-23 Thread Michał Purzyński
; > > > Any other place I could look for hints? > > > > > > On Fri, Sep 23, 2016 at 7:01 PM, Alexander Duyck < > alexander.du...@gmail.com> > > wrote: > >> > >> On Fri, Sep 23, 2016 at 1:10 AM, Michał Purzyński > >> wrote: > >

Re: [E1000-devel] rx_missed_errors grows while rx_no_buffer does not

2016-09-24 Thread Michał Purzyński
llows us to avoid having to do two different atomic operations > that would have been more expensive otherwise. > > On Fri, Sep 23, 2016 at 12:46 PM, Michał Purzyński > wrote: > > Here's what I did > > > > ethtool -A p1p1 rx off tx off > > ethtool -A p3p1 rx off

Re: [E1000-devel] rx_missed_errors grows while rx_no_buffer does not

2016-09-25 Thread Michał Purzyński
ve. Should I disable HW prefetcher and Adjacent Sector Prefetch? Anything more? > On 25 Sep 2016, at 03:55, Alexander Duyck wrote: > > On Sat, Sep 24, 2016 at 4:40 PM, Michał Purzyński > wrote: >> Thank you for being persistent with answers. >> >> So right after

Re: [E1000-devel] rx_missed_errors grows while rx_no_buffer does not

2016-09-26 Thread Michał Purzyński
der to disable ATR and change > the RSS key on the device to use a 16 bit repeating value. You can > find a paper detailing some of that here: > http://www.ndsl.kaist.edu/~kyoungsoo/papers/TR-symRSS.pdf > > Other than these tips I don't know if there is much more info I can >

Re: [E1000-devel] rx_missed_errors grows while rx_no_buffer does not

2016-09-30 Thread Michał Purzyński
x queue for outgoing traffic. If it is enabled it would be > creating rules in the flow director filter table that would be > rerouting Rx traffic and could cause reordering. > > - Alex > > On Mon, Sep 26, 2016 at 6:53 AM, Michał Purzyński > wrote: > > Thank you a lot!

Re: [E1000-devel] rx queue

2016-10-06 Thread Michał Purzyński
Out of curiosity - does it use flow director for that, like creating a short(ish) living rule or is there another way? > On 6 Oct 2016, at 21:32, Skidmore, Donald C > wrote: > > Hey Lukasz, > > ATR works by targeting TCP flows as the adapter transmits a SYN packet. At > this point it creates

Re: [E1000-devel] rx queue

2016-10-06 Thread Michał Purzyński
X710 can use well over 100, logic tells me it's around 128 but the spec says 144 per PCI function. That's RSS. Just loaded i40e with 56 RSS queues. Ah and this card was cheaper than X520. Go figure. > On 6 Oct 2016, at 23:57, Skidmore, Donald C > wrote: > > Hey Lukasz, > > If you're using copp

[E1000-devel] Measuring LLC misses for DCA

2016-10-15 Thread Michał Purzyński
Hey, one more question to the already known (to some of you) tuning project. We can see excellent performance, findings how to achieve that will soon be published. There is one more little thing that keeps me up at night though ;) 2x E5-2697 v3, 2x X710 now, one per NUMA node. I use isolcpus kern

Re: [E1000-devel] Measuring LLC misses for DCA

2016-10-15 Thread Michał Purzyński
. Excellent. On Sat, Oct 15, 2016 at 10:38 AM, Michał Purzyński < michalpurzyns...@gmail.com> wrote: > Hey, one more question to the already known (to some of you) tuning > project. We can see excellent performance, findings how to achieve that > will soon be published. There is

[E1000-devel] How the ixgbe and Linux network stack manage memory

2016-10-16 Thread Michał Purzyński
Could you fill me in on how the bounce buffer approach to memory management in IXGBE and I40E works? Why do you allocate the same amount of memory in ixgbe_setup_rx_resources() twice? First time with a call to rx_ring->rx_buffer_info = vzalloc_node() - where size represents what I set with ethtool

Re: [E1000-devel] How the ixgbe and Linux network stack manage memory

2016-10-17 Thread Michał Purzyński
Thank you! Yes, it filled the last missing pieces :) Lesson learned - memory allocations are not free, after all. On Sun, Oct 16, 2016 at 10:54 PM, Alexander Duyck wrote: > On Sun, Oct 16, 2016 at 4:52 AM, Michał Purzyński > wrote: > > Could you fill me in on how the bounce buffer

[E1000-devel] IO non-posted prefetching and Intel cards

2016-10-19 Thread Michał Purzyński
Mellanox advises to disable I/O Non-posted prefetching. Is that true for Intel cards as well? What does it do, and why does it take some performance away? My current prefetching settings: HW Prefetcher - disabled Adjacent Sector Prefetch - disabled DCU Stream Prefetcher - enabled DCU IP Prefetche

[E1000-devel] DCA and DDIO detection

2016-11-01 Thread Michał Purzyński
Looking through some old posts on e1000 I found that one can detect if DCA is enabled with ethregs. Are these registers per queue? Here, the X520 is configured with a single queue and the highest bits are set only for DCA_RXCTRL[000]. On X520 DCA_RXCTRL[000]1f0002a0 DCA_RXCTRL[001]

Re: [E1000-devel] DCA and DDIO detection

2016-11-01 Thread Michał Purzyński
Did I ruin DDIO with DCA=2 parameter to X520? Looking at the driver code, I just enabled DCA instead. > On 1 Nov 2016, at 20:56, Alexander Duyck wrote: > > On Tue, Nov 1, 2016 at 12:04 PM, Michał Purzyński > wrote: >> Looking through some old posts on e1000 I found that one

[E1000-devel] Interrupt mitigation - two short questions

2016-11-01 Thread Michał Purzyński
For a good reason, Intel simplified the interrupt moderation and only respects the rx-usecs and rx-usecs-high for X520 and i40e. I can see why (reading the chip specifications) - with like 20 parameters the ethtool makes it unmanageable and better (internal, on chip) algorithms exist that would b

Re: [E1000-devel] DCA and DDIO detection

2016-11-03 Thread Michał Purzyński
see the tag > value change when you move from one socket to another. > > - Alex > > On Tue, Nov 1, 2016 at 1:07 PM, Michał Purzyński > wrote: > > Did I ruin DDIO with DCA=2 parameter to X520? > > > > Looking at the driver code, I just enabled DCA instead

Re: [E1000-devel] Symmetric hashing for ixgbe driver?

2016-11-16 Thread Michał Purzyński
Yes, that's ATR with OS scheduler reordering frames, I believe. There's even a paper about that. It actually has to reorder packets now that I think about that. Disable ATR, all offloading, set symmetric hashing and pin interrupts and IDS workers. Should work - my quick tests with the Bro IDS w
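
For readers wondering why "symmetric hashing" keeps both directions of a flow on the same queue, here is a minimal sketch. It assumes the NIC uses the standard Toeplitz RSS hash over (src IP, dst IP, src port, dst port) and that the RSS key is the 16-bit value 0x6d5a repeated, which is the value commonly cited for symmetric RSS; the addresses and ports are made up for illustration and nothing here is taken from the ixgbe source.

```python
import ipaddress

# Assumption: 40-byte RSS key built from a repeating 16-bit pattern.
SYMMETRIC_KEY = bytes.fromhex("6d5a") * 20


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash: XOR the 32-bit key window at each set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                # 32-bit window of the key starting at this bit offset
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_input(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Hash input for IPv4/TCP: src IP, dst IP, src port, dst port."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + src_port.to_bytes(2, "big")
        + dst_port.to_bytes(2, "big")
    )


# Hypothetical flow, hashed in both directions.
forward = toeplitz_hash(SYMMETRIC_KEY, rss_input("10.0.0.1", "192.168.1.20", 40000, 443))
reverse = toeplitz_hash(SYMMETRIC_KEY, rss_input("192.168.1.20", "10.0.0.1", 443, 40000))
print(hex(forward), hex(reverse), forward == reverse)
```

The two hashes match because swapping source and destination moves each input bit by 32 (addresses) or 16 (ports) positions, and a key with a 16-bit period presents the same 32-bit window at both positions.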

[E1000-devel] i40e and the kernel 4.14

2017-10-23 Thread Michał Purzyński
Hey, the 4.14 kernel seems to get rid of the tc_to_netdev structure. I tried building the current i40e stable against 4.14-rc5 and it failed. Applying the patch found here https://www.mail-archive.com/netdev@vger.kernel.org/msg181110.html seems to have fixed it. Can we have the i40e driver (and

[E1000-devel] How to read the RX flow hash indirection table

2017-11-22 Thread Michał Purzyński
Hello! How do I read the indirection table, the one that can be shown with ethtool -x? What is the meaning of columns vs rows and what are those numbers trying to tell me? I'm guessing it's kind of like a weight, but I'm not sure how to understand it. RX flow hash indirection table for p3p1 wit

Re: [E1000-devel] How to read the RX flow hash indirection table

2017-11-22 Thread Michał Purzyński
h of those cores would be used? The packet could go to core 8 or 9 or 12? On Wed, Nov 22, 2017 at 5:55 PM, Alexander Duyck wrote: > Comments inline below. > > On Wed, Nov 22, 2017 at 8:23 AM, Michał Purzyński > wrote: > > Hello! > > > > How do I read the indir

Re: [E1000-devel] How to read the RX flow hash indirection table

2017-11-22 Thread Michał Purzyński
Ah, so it's just the representation that's confusing! 352: 2 3 4 5 6 7 8 9 360: 10 11 12 13 0 1 2 3 Core 9 corresponds to 359 indeed. Thanks for the explanation, I can see why. On Wed, Nov 22, 2017 at 6:39 PM, Michał
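
The reading described in this thread can be sketched as follows: the number before the colon is the index of the first table entry on that row, and each value after it is the RX queue for the next consecutive index. This is a small illustrative parser, assuming the usual `ethtool -x` layout; the function name `parse_indirection_table` is hypothetical and the sample rows are the two quoted above.

```python
def parse_indirection_table(text: str) -> dict[int, int]:
    """Map each indirection-table index to its RX queue number."""
    table: dict[int, int] = {}
    for line in text.splitlines():
        if line.strip().startswith("RSS hash key"):
            break  # the key dump that may follow is not part of the table
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip the header line and anything else that is not a row
        first_index = int(label)
        for offset, queue in enumerate(rest.split()):
            table[first_index + offset] = int(queue)
    return table


sample = """\
  352:      2     3     4     5     6     7     8     9
  360:     10    11    12    13     0     1     2     3
"""

table = parse_indirection_table(sample)
print(table[359])  # entry 359 is the last value on the row labelled 352
```

So the rows and columns carry no meaning of their own; they are just the flat table wrapped at eight entries per line.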

[E1000-devel] X710 and packet reordering from ATR

2019-07-11 Thread Michał Purzyński
Hey! The 82599 chip had a known problem with FDir ATR causing packet reordering. The solution was to enable "perfect filters" for flow director. Does the X710 line have similar problems and should I disable ATR? I can see it's enabled by default root@nsm1~ # ethtool --show-priv-flags enp17s0f0 P

[E1000-devel] Flow director and GRE

2019-07-24 Thread Michał Purzyński
ethtool -N enp17s0f0 flow-type ip4 src-ip 10.251.0.193 action -1 When trying to drop some GRE traffic with the X710, I found that no matter how I configure my filters, traffic is not dropped. It looks like the GRE traffic cannot be matched by FD at all. ethtool -u enp17s0f0 4 RX rings available

[E1000-devel] The true meaning of the RSS redirection table

2019-10-15 Thread Michał Purzyński
Is the RSS redirection table (ethtool -X) ignored when interrupts are pinned to cores? I configured a card with 12 queues and pinned all of them to cores on the local NUMA node. Card is local to node 1, so with 12 cores per CPU I pinned card's interrupts to cores 12-23 - and interrupts seem to be

Re: [E1000-devel] The true meaning of the RSS redirection table

2019-10-15 Thread Michał Purzyński
> assume you are talking about an ixgbe NIC, probably something like > an x540 or an 82599? It would be helpful to specify the driver and > which specific NIC we are talking about. I’m talking about X720 so that’s the i40e driver. This could not be ATR, since I have perfect filters enabled and

[E1000-devel] Low cache hit rates with DDIO and Skylake

2019-10-24 Thread Michał Purzyński
When doing some low-level cache hit rates measurement I noticed that on Skylake (Xeon Gold 6126) the LLC hit rates are much worse than on previous generations of Xeons. Both servers were configured in the same way 2x CPU 2x X710 card - one for each NUMA node RSS enabled - 10 queues All interrupts

Re: [E1000-devel] Low cache hit rates with DDIO and Skylake

2019-10-24 Thread Michał Purzyński
Yes, I'm familiar with the Skylake architecture changes. Unfortunately, all of my questions here still hold :) On Thu, Oct 24, 2019 at 11:10 AM Damjan Marion wrote: > > > On 24 Oct 2019, at 11:16, Michał Purzyński > wrote: > > When doing some low-level cache hit rate

[E1000-devel] Is the 82599 perfect filter mask limit still present on X710?

2019-10-26 Thread Michał Purzyński
On 82599 cards one could not filter traffic on say, src and dst in the same rule set. The mask for all filters was global. Is that still the case on X710? ethtool -N enp216s0f0 flow-type udp4 src-port 514 action -1 Added rule with ID 7679 ethtool -N enp216s0f0 flow-type udp4 dst-port 514 action

[E1000-devel] 4K PCIe writes and X710

2019-10-26 Thread Michał Purzyński
There was this old trick on 82599 to force the card to use 4KB writes setpci -v -d 8086:1572 e6.b=2e is it still relevant on PCIe based X710? Does it even make sense to use it?

[E1000-devel] High rate of DDIO misses

2019-10-26 Thread Michał Purzyński
Yet another post, but I'm almost there and I promise to publish yet another piece of documentation. Correct me if I'm wrong but reading the UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 perf stat -e unc_iio_data_req_of_cpu.mem_write.part0 -C1 -r 3 sleep 1 should tell me what the rate of DDIO misses i

Re: [E1000-devel] High rate of DDIO misses

2019-10-30 Thread Michał Purzyński
, Oct 26, 2019 at 6:00 PM Michał Purzyński wrote: > Yet another post, but I'm almost there and I promise to publish yet > another piece of documentation. > > Correct me if I'm wrong but reading the > > UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0 > > perf stat -e unc_i