On 17 August 2016 at 08:43, Ben RUBSON wrote:
>
>> On 17 Aug 2016, at 17:38, Adrian Chadd wrote:
>>
>> [snip]
>>
>> ok, so this is what I was seeing when I was working on this stuff last.
>>
>> The big abusers are:
>>
>> * so_snd lock, for TX'ing producer/consumer socket data
>> * tcp stack pcb locking (which rss tries to work around, but it again
>>   doesn't help producer/consumer locking, only multiple sockets)
>> * for
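(For reference, not from the thread: one way to confirm which locks are the big abusers is FreeBSD's kernel lock profiling. A rough sketch, assuming a kernel built with "options LOCK_PROFILING" and the debug.lock.prof sysctls that option provides.)

  # sysctl debug.lock.prof.reset=1     # clear any previous counters
  # sysctl debug.lock.prof.enable=1    # start profiling, then run the iperf test
  # sysctl debug.lock.prof.enable=0    # stop after the run
  # sysctl debug.lock.prof.stats       # so_snd and tcp inpcb locks should rank high
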
On 16 August 2016 at 02:58, Ben RUBSON wrote:
>
>> On 16 Aug 2016, at 03:45, Adrian Chadd wrote:
>>
>> Hi,
>>
>> ok, can you try 5) but also running with the interrupt threads pinned to
>> CPU 1?
>>
>> It looks like the interrupt threads are running on CPU 0, and my
>> /guess/ (looking at the CPU usage distributions) is that sometimes the
>> userland bits run on the same CPU or numa domain as the interrupt
>> bits, and it
>
> What do you mean by interrupt threads?
> Perhaps you mean the NIC interrupts?
> In this case see 6) and 7) where NIC IRQs are
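(For reference, not from the thread: on FreeBSD the NIC's interrupt handler threads can be pinned with cpuset(1) by IRQ number; a minimal sketch, with the mlx4 IRQ numbers purely illustrative.)

  # vmstat -i | grep mlx4       # list the mlx4 interrupt lines and their rates
  # cpuset -l 1 -x 264          # pin the handler thread for IRQ 264 to CPU 1
  # cpuset -l 1 -x 265
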
> On 12 Aug 2016, at 00:52, Adrian Chadd wrote:
>
> Which ones of these hit the line rate comfortably?
So Adrian, I ran tests again using FreeBSD 11-RC1.
I put iperf throughput in result files (so that we can classify them), as well
as top -P ALL and pcm-memory.x.
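(A sketch of how those per-run result files could be captured; the iperf flags and address are illustrative, the two monitoring commands are the ones named above.)

  # iperf -c 10.0.0.2 -t 60 -i 1 > result.iperf 2>&1
  # ... and in two other terminals while the transfer runs:
  # top -P ALL
  # pcm-memory.x
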
Which ones of these hit the line rate comfortably?
-a
adrian did mean fixed-domain-rr. :-P sorry!
(Sorry, needed to update my NUMA boxes, things "changed" since I wrote this.)
-a
Hi!
mlx4_core0: mem
0xfbe0-0xfbef,0xfb00-0xfb7f irq 64 at device 0.0
numa-domain 1 on pci16
mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6
(Aug 11 2016)
so the NIC is in numa-domain 1. Try pinning the worker threads to
numa-domain 1 when you run the test:
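(Adrian's example command is cut off in the archive; a minimal sketch with cpuset(1), assuming numa-domain 1 covers CPUs 8-15 on this box; check your own topology first.)

  # server side: bind iperf and its worker threads to domain 1's cores
  cpuset -l 8-15 iperf -s
  # client side likewise, e.g. 4 parallel streams for 60 seconds
  cpuset -l 8-15 iperf -c 10.0.0.2 -P 4 -t 60
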
> On 11 Aug 2016, at 00:11, Adrian Chadd wrote:
>
> hi,
>
> ok, let's start by getting the NUMA bits into the kernel so you can
> mess with things.
>
> add this to the kernel
>
> options MAXMEMDOM=8
> (which hopefully is enough)
> options VM_NUMA_ALLOC
> options
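(The option list above is truncated by the archive; for completeness, a sketch of a custom kernel config and build using only the two options that are visible. The config name MYNUMA is made up.)

  # /usr/src/sys/amd64/conf/MYNUMA (kernel config):
  include GENERIC
  ident   MYNUMA
  options MAXMEMDOM=8
  options VM_NUMA_ALLOC

  # then build and install it:
  # cd /usr/src
  # make -j8 buildkernel KERNCONF=MYNUMA
  # make installkernel KERNCONF=MYNUMA
  # shutdown -r now
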
> On 10 Aug 2016, at 21:47, Adrian Chadd wrote:
>
> hi,
>
> yeah, I'd like you to do some further testing with NUMA. Are you able
> to run freebsd-11 or -HEAD on these boxes?
Hi Adrian,
Yes I currently have 11 BETA3 running on them.
I could also run BETA4.
Ben
hi,
yeah, I'd like you to do some further testing with NUMA. Are you able
to run freebsd-11 or -HEAD on these boxes?
-adrian
On Thu, Aug 4, 2016 at 11:33 AM, Ben RUBSON wrote:
> But even without RSS, I should be able to go up to 2x40Gbps, don't you
> think so?
> Has nobody done this already?
>
Try this patch, which should improve performance when multiple TCP streams
are running in parallel over an
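(The patch itself is elided above. Independent of it, the multiple-stream case it targets can be reproduced with iperf's -P option; address illustrative.)

  # 8 parallel TCP streams for 60 seconds against a waiting "iperf -s"
  iperf -c 10.0.0.2 -P 8 -t 60
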
> On 03 Aug 2016, at 20:02, Hans Petter Selasky wrote:
>
> The mlx4 send and receive queues have each their set of taskqueues. Look in
> output from "ps auxww".
I can't find them. I even unloaded/reloaded the driver in order to catch the
differences, but I did not find any
On 08/03/16 18:57, Ben RUBSON wrote:
> taskqueue threads?

The mlx4 send and receive queues have each their set of taskqueues. Look
in output from "ps auxww".

--HPS
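(A sketch of how one might look for them; the exact thread names depend on the driver version, which may be why they were not found.)

  # ps auxwwH | egrep -i 'mlx4|taskq'
  # procstat -ta | egrep -i 'mlx4|taskq'
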
> On 02 Aug 2016, at 21:35, Hans Petter Selasky wrote:
>
> The CX-3 driver doesn't bind the worker threads to specific CPU cores by
> default, so if your CPU has more than one so-called NUMA domain, you'll end
> up with the bottleneck being the high-speed link between the CPU cores
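(Since the driver does not bind them itself, a worker or taskqueue thread can be pinned by hand with cpuset(1) once its thread id is known; the tid and CPU list below are made up.)

  # procstat -ta | grep mlx4      # note the TID column
  # cpuset -l 8-15 -t 100500      # bind that thread to the NIC's NUMA domain cores
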
> On 03 Aug 2016, at 04:32, Eugene Grosbein wrote:
>
> If you have gateway_enable="YES" (sysctl net.inet.ip.forwarding=1)
> then try to disable this forwarding setting and rerun your tests to compare
> results.
Thank you Eugene for this, but net.inet.ip.forwarding is
On 03.08.2016 1:43, Ben RUBSON wrote:
> Hello,
> I'm trying to reach the 40Gb/s max throughput between 2 hosts running a
> ConnectX-3 Mellanox network adapter.

If you have gateway_enable="YES" (sysctl net.inet.ip.forwarding=1)
then try to disable this forwarding setting and rerun your tests to
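(For reference, checking and disabling forwarding looks like this; the rc.conf change only matters across reboots.)

  # sysctl net.inet.ip.forwarding        # 1 means forwarding is enabled
  # sysctl net.inet.ip.forwarding=0
  # sysrc gateway_enable="NO"            # or edit /etc/rc.conf by hand
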
> On 02 Aug 2016, at 21:35, Hans Petter Selasky wrote:
>
> Hi,

Thank you for your answer, Hans Petter!

> The CX-3 driver doesn't bind the worker threads to specific CPU cores by
> default, so if your CPU has more than one so-called NUMA domain, you'll end
> up with the
Hello,
I'm trying to reach the 40Gb/s max throughput between 2 hosts running a
ConnectX-3 Mellanox network adapter.
FreeBSD 10.3 just installed, latest updates applied.
Network adapters running the latest firmware / latest drivers.
No workload at all, just iPerf as the benchmark tool.
### Step 1:
I
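(The step list is cut off here; a baseline single-stream run with the tool named above would look roughly like this, addresses illustrative.)

  # on the receiver
  iperf -s
  # on the sender: one TCP stream, 60 seconds, one-second interval reports
  iperf -c 10.0.0.2 -t 60 -i 1
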