The 82599 data sheet says that masks must match for all the filters.
If so, how do I filter traffic, using Perfect Filters?
I'm reading the data sheet now; here are the exact words:
"The 82599 supports masking / range for the previously described
fields. These masks are defined globally for a[...]
My IXGBE data path journey made me ask questions like the one above.
Basically (I'm skipping a few steps here):
1. the packet arrives at the card and is verified by the MAC
2. the packet is placed in the card's FIFO (which is small)
3. lots of steps here, but finally the card DMAs the packet into one of the receive buffers
a great success.
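To tie step 3 to something visible from userspace: those receive buffers sit on a descriptor ring whose size can be read with ethtool; a minimal sketch, assuming an interface named p1p1:

ethtool -g p1p1    # show current and maximum RX/TX descriptor ring sizes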
On Sun, Sep 4, 2016 at 7:20 PM, Alexander Duyck
wrote:
> On Sun, Sep 4, 2016 at 2:50 AM, Michał Purzyński
> wrote:
> > My IXGBE data path journey made me ask questions like the one above.
> > Basically (I'm skipping a few steps here):
> >
> > 1.
What's the current status of I/OAT with the ixgbe driver and 82599 cards? Is it
even used? How's the performance?
If it is used, do you know at which stage exactly? Is it used to copy data
from the driver buffers (those the card DMAs into from the FIFO) to the sk_buff, or
from the sk_buff to socket buffers?
Hello.
On my IDS workload with af_packet I can see rx_missed_errors growing while
rx_no_buffer_count does not. Basically every other kind of rx_* error
counter is 0, including rx_no_dma_resources. It's an 82599-based card.
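For reference, a minimal way to watch exactly these counters, assuming an interface named p1p1:

ethtool -S p1p1 | grep -E 'rx_missed|rx_no_buffer|rx_no_dma'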
I don't know what to think about that. I went through the ixgbe source code and
am getting lost.
Unfortunately HP documentation is a scam and they actively avoid publishing
the motherboard layout.
Any other place I could look for hints?
On Fri, Sep 23, 2016 at 7:01 PM, Alexander Duyck wrote:
> On Fri, Sep 23, 2016 at 1:10 AM, Michał Purzyński wrote:
> > Hello.
> >
> > Any other place I could look for hints?
>
> [...] allows us to avoid having to do two different atomic operations
> that would have been more expensive otherwise.
>
> On Fri, Sep 23, 2016 at 12:46 PM, Michał Purzyński wrote:
> > Here's what I did
> >
> > ethtool -A p1p1 rx off tx off
> > ethtool -A p3p1 rx off
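A matching read-back to confirm pause frames really are off; a sketch, same interface names assumed:

ethtool -a p1p1    # show current pause parameters
ethtool -a p3p1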
[...] Should I disable the HW prefetcher and Adjacent Sector
Prefetch? Anything more?
> On 25 Sep 2016, at 03:55, Alexander Duyck wrote:
>
> On Sat, Sep 24, 2016 at 4:40 PM, Michał Purzyński
> wrote:
>> Thank you for being persistent with the answers.
>>
>> So right after
> [...] in order to disable ATR and change
> the RSS key on the device to use a 16-bit repeating value. You can
> find a paper detailing some of that here:
> http://www.ndsl.kaist.edu/~kyoungsoo/papers/TR-symRSS.pdf
>
> Other than these tips I don't know if there is much more info I can
>
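The linked paper builds the key from the 16-bit value 0x6d5a repeated; a sketch of loading such a symmetric key, assuming an 82599-style 40-byte key (newer ethtool required; check the key length your NIC reports with ethtool -x and extend the pattern accordingly):

ethtool -X p1p1 hkey 6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a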
> [...] Tx queue for outgoing traffic. If it is enabled it would be
> creating rules in the flow director filter table that would be
> rerouting Rx traffic and could cause reordering.
>
> - Alex
>
> On Mon, Sep 26, 2016 at 6:53 AM, Michał Purzyński
> wrote:
> > Thanks a lot!
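On ixgbe the usual way to stop that rerouting is to switch the flow director into perfect-filter mode, which turns ATR sampling off; a sketch, interface name assumed:

ethtool -K p1p1 ntuple on      # perfect filters on => ATR sampling off (ixgbe)
ethtool -k p1p1 | grep ntuple  # verify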
Out of curiosity - does it use Flow Director for that, e.g. creating a
short(ish)-lived rule, or is there another way?
> On 6 Oct 2016, at 21:32, Skidmore, Donald C
> wrote:
>
> Hey Lukasz,
>
> ATR works by targeting TCP flows as the adapter transmits a SYN packet. At
> this point it creates
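The drivers also export flow director counters, so the rule churn can be watched; a sketch, counter names vary between ixgbe and i40e:

ethtool -S enp17s0f0 | grep -i fdir   # e.g. fdir_atr_match / fdir_sb_match on i40e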
The X710 can use well over 100 queues; logic tells me it's around 128, but the spec
says 144 per PCI function. That's RSS.
Just loaded i40e with 56 RSS queues. Ah, and this card was cheaper than the X520. Go
figure.
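For reference, a minimal way to set and verify the channel count on i40e, interface name assumed:

ethtool -L enp17s0f0 combined 56   # request 56 queue pairs
ethtool -l enp17s0f0               # show maximums and current settings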
> On 6 Oct 2016, at 23:57, Skidmore, Donald C
> wrote:
>
> Hey Lukasz,
>
> If you're using copp
Hey, one more question for the tuning project already known to some of you.
We're seeing excellent performance; the findings on how to achieve it will
soon be published. There is one more little thing that keeps me up at
night though ;)
2x E5-2697 v3, 2x X710 now, one per NUMA node.
I use the isolcpus kernel
. Excellent.
On Sat, Oct 15, 2016 at 10:38 AM, Michał Purzyński <
michalpurzyns...@gmail.com> wrote:
> Hey, one more question for the tuning project already known to some of you.
> We're seeing excellent performance; the findings on how to achieve it will
> soon be published. There is
Could you fill me in on how the bounce buffer approach to memory management
in IXGBE and I40E works?
Why do you allocate the same amount of memory in ixgbe_setup_rx_resources()
twice?
The first time with a call to rx_ring->rx_buffer_info = vzalloc_node(), where
size represents what I set with ethtool
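For context, the size both allocations derive from is the descriptor count set from userspace; a sketch, assuming interface p1p1:

ethtool -G p1p1 rx 4096   # this count sizes both the rx_buffer_info array and the descriptor ring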
Thank you! Yes, it filled in the last missing pieces :) Lesson learned -
memory allocations are not free, after all.
On Sun, Oct 16, 2016 at 10:54 PM, Alexander Duyck wrote:
> On Sun, Oct 16, 2016 at 4:52 AM, Michał Purzyński
> wrote:
> > Could you fill me in on how the bounce buffer
Mellanox advises disabling I/O non-posted prefetching. Is that true for Intel
cards as well?
What does it do, and why does it take some performance away?
My current prefetching settings:
HW Prefetcher - disabled
Adjacent Sector Prefetch - disabled
DCU Stream Prefetcher - enabled
DCU IP Prefetcher
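As an aside, on recent Xeons these four switches map to bits 0-3 of MSR 0x1A4 (per Intel's hardware prefetcher control disclosure), so they can be inspected and toggled from Linux with msr-tools; a sketch, where a set bit disables the corresponding prefetcher:

modprobe msr
rdmsr -a 0x1a4      # bits: 0=HW prefetcher, 1=adjacent line, 2=DCU streamer, 3=DCU IP
wrmsr -a 0x1a4 0x3  # would match the settings above: HW + adjacent line off, both DCU on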
Looking through some old posts on e1000 I found that one can detect if DCA
is enabled with ethregs. Are these registers per queue? Here, the X520 is
configured with a single queue and the highest bits are set only for
DCA_RXCTRL[000].
On X520:
DCA_RXCTRL[000] 1f0002a0
DCA_RXCTRL[001]
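ethtool can produce the same dump without ethregs; a sketch, decode support depends on the driver:

ethtool -d p1p1                    # decoded register dump where supported
ethtool -d p1p1 raw on > regs.bin  # raw dump, for decoding later with: ethtool -d p1p1 file regs.bin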
Did I ruin DDIO with the DCA=2 parameter on the X520?
Looking at the driver code, it looks like I just enabled DCA instead.
> On 1 Nov 2016, at 20:56, Alexander Duyck wrote:
>
> On Tue, Nov 1, 2016 at 12:04 PM, Michał Purzyński
> wrote:
>> Looking through some old posts on e1000 I found that one
For a good reason, Intel simplified interrupt moderation and only
respects rx-usecs and rx-usecs-high on the X520 and i40e. I can see why
(reading the chips' specifications) - with some 20 parameters ethtool
becomes unmanageable, and better (internal, on-chip) algorithms exist that
would b
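For the record, the two knobs that are respected can be set like this; an example sketch, interface and value assumed, adaptive mode support depends on the driver:

ethtool -C p1p1 adaptive-rx off rx-usecs 62   # static moderation, roughly 16k interrupts/s
ethtool -c p1p1                               # verify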
> [...] see the tag
> value change when you move from one socket to another.
>
> - Alex
>
> On Tue, Nov 1, 2016 at 1:07 PM, Michał Purzyński
> wrote:
> > Did I ruin DDIO with the DCA=2 parameter on the X520?
> >
> > Looking at the driver code, it looks like I just enabled DCA instead
Yes, that's ATR with the OS scheduler reordering frames, I believe. There's even a
paper about that. It actually has to reorder packets, now that I think about
it.
Disable ATR, all offloading, set symmetric hashing, and pin interrupts and IDS
workers. Should work - my quick tests with the Bro IDS w
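The "all offloading" part of that recipe, sketched with ethtool; feature names as ethtool prints them, interface assumed:

ethtool -K p1p1 gro off lro off tso off gso off   # no coalesced superframes for the IDS
ethtool -k p1p1 | grep -E 'gro|lro'               # verify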
Hey,
the 4.14 kernel seems to get rid of the tc_to_netdev structure. I tried
building the current i40e stable against 4.14-rc5 and it failed. Applying
the patch found here
https://www.mail-archive.com/netdev@vger.kernel.org/msg181110.html
seems to have fixed it. Can we have the i40e driver (and
Hello!
How do I read the indirection table, the one that can be shown with
ethtool -x? What is the meaning of columns vs. rows, and what are those numbers
trying to tell me? I'm guessing it's kind of like a weight, but I'm not
sure how to understand it.
RX flow hash indirection table for p3p1 with
[...] which of those
cores would be used? Could the packet go to core 8, 9, or 12?
On Wed, Nov 22, 2017 at 5:55 PM, Alexander Duyck
wrote:
> Comments inline below.
>
> On Wed, Nov 22, 2017 at 8:23 AM, Michał Purzyński
> wrote:
> > Hello!
> >
> > How do I read the indir
Ah, so it's just the representation that's confusing!
352:  2  3  4  5  6  7  8  9
360: 10 11 12 13  0  1  2  3
Core 9 corresponds to 359 indeed. Thanks for the explanation, I can see why.
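And for completeness, the table can be rewritten as well as read; sketches, interface as above:

ethtool -X p3p1 equal 12         # spread table entries evenly across 12 queues
ethtool -X p3p1 weight 1 2 1 2   # or weight some queues more heavily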
On Wed, Nov 22, 2017 at 6:39 PM, Michał
Hey!
The 82599 chip had a known problem with FDir ATR causing packet reordering.
The solution was to enable "perfect filters" for flow director.
Does the X710 line have similar problems, and should I disable ATR? I can
see it's enabled by default:
root@nsm1~ # ethtool --show-priv-flags enp17s0f0
Private flags for enp17s0f0:
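If the answer turns out to be yes, the flag can be flipped back; a sketch, flag name as exposed by current i40e:

ethtool --set-priv-flags enp17s0f0 flow-director-atr off
ethtool --show-priv-flags enp17s0f0   # verify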
ethtool -N enp17s0f0 flow-type ip4 src-ip 10.251.0.193 action -1
When trying to drop some GRE traffic with the X710, I found that no matter
how I configure my filters, the traffic is not dropped. It looks like GRE
traffic cannot be matched by the flow director at all.
ethtool -u enp17s0f0
4 RX rings available
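One variant worth trying is matching the GRE protocol number explicitly instead of the tunneled addresses; whether the X710 flow director honors it is exactly the open question here:

ethtool -N enp17s0f0 flow-type ip4 l4proto 47 action -1   # IP protocol 47 = GRE, action -1 = drop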
Is the RSS redirection table (ethtool -X) ignored when interrupts are
pinned to cores?
I configured a card with 12 queues and pinned all of them to cores on the
local NUMA node. The card is local to node 1, so with 12 cores per CPU I
pinned the card's interrupts to cores 12-23 - and the interrupts seem to be
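For reference, the kind of pinning described; a sketch that assumes the queue vectors are the only /proc/interrupts lines matching the interface name, and that irqbalance is stopped first:

core=12
grep p3p1 /proc/interrupts | cut -d: -f1 | while read irq; do
    echo $core > /proc/irq/$irq/smp_affinity_list
    core=$((core + 1))
done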
> assume you are talking about an ixgbe NIC, probably something like
> an x540 or an 82599? It would be helpful to specify the driver and
> which specific NIC we are talking about.
I'm talking about the X710, so that's the i40e driver. This could not be ATR, since
I have perfect filters enabled and
When doing some low-level cache hit rate measurements I noticed that on
Skylake (Xeon Gold 6126) the LLC hit rates are much worse than on previous
generations of Xeons.
Both servers were configured in the same way:
2x CPU
2x X710 cards - one per NUMA node
RSS enabled - 10 queues
All interrupts
Yes, I'm familiar with the Skylake architecture changes. Unfortunately, all
of my questions here still hold :)
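The kind of measurement referred to, sketched with generic perf events; the core list is assumed to be where the IDS workers run:

perf stat -e LLC-loads,LLC-load-misses -C 12-23 -- sleep 10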
On Thu, Oct 24, 2019 at 11:10 AM Damjan Marion wrote:
>
>
> On 24 Oct 2019, at 11:16, Michał Purzyński
> wrote:
>
> When doing some low-level cache hit rate
On 82599 cards one could not filter traffic on, say, src and dst in the same
rule set. The mask for all filters was global.
Is that still the case on the X710?
ethtool -N enp216s0f0 flow-type udp4 src-port 514 action -1
Added rule with ID 7679
ethtool -N enp216s0f0 flow-type udp4 dst-port 514 action
There was this old trick on the 82599 to force the card to use 4KB writes:
setpci -v -d 8086:1572 e6.b=2e
Is it still relevant on the PCIe-based X710? Does it even make sense to use it?
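The read-back form of the same register byte, for checking what the device currently has; device ID as above:

setpci -v -d 8086:1572 e6.b   # read the current value instead of writing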
Yet another post, but I'm almost there and I promise to publish yet another
piece of documentation.
Correct me if I'm wrong but reading the
UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0
perf stat -e unc_iio_data_req_of_cpu.mem_write.part0 -C1 -r 3 sleep 1
should tell me what the rate of DDIO misses is.
On Sat, Oct 26, 2019 at 6:00 PM Michał Purzyński
wrote:
> Yet another post, but I'm almost there and I promise to publish yet
> another piece of documentation.
>
> Correct me if I'm wrong but reading the
>
> UNC_IIO_DATA_REQ_OF_CPU.MEM_WRITE.PART0
>
> perf stat -e unc_i