Re: Fwd: tap(4) performance tuning on (amd64)

2020-01-21 Thread Tom Smyth
Hello Claudio, All,


On Wed, 22 Jan 2020 at 01:01, Claudio Jeker wrote:

> Surprised by the 20% better performance of the threaded version. I wonder
> if the single-threaded version maxes out the performance of a single CPU.
> My tests running tcpbench just between two interfaces show no
> measurable performance difference between the different modes (for either
> tun or tap).

I will re-run the test using bsd rather than bsd.mp, if that would help?

Thanks


-- 
Kindest regards,
Tom Smyth.



Re: Fwd: tap(4) performance tuning on (amd64)

2020-01-21 Thread Claudio Jeker
On Tue, Jan 21, 2020 at 09:17:20PM +, Tom Smyth wrote:
> In testing tap(4) performance on the same box with the following config,
> using Claudio's userland bridge (tbridge) in between two tap interfaces;
> each tap was also added to its own standard bridge(4) along with 1 physical
> interface.
> 
> iperf3client--ix0--bridge0--tap0--tbridge--tap1--bridge1--ix1---iperf3svr
> 
> On a 1-socket, 2-core system that gives 3Gb/s, we got the following
> performance:
> 
> tbridge -t gave 557Mb/s TCP throughput
> 
> btw, tbridge -t did not stop after using ^C or kill,
> but did respond to kill -s SIGKILL

I forgot to mark the signals to interrupt read(2) instead of restarting it,
so you need another packet to arrive before the loop exits.
You can add
    siginterrupt(SIGTERM, 1);
    siginterrupt(SIGINT, 1);
    siginterrupt(SIGHUP, 1);
before the signal() calls that install the signal handler, and then ^C will
work.

> tbridge -s gave 455Mb/s TCP throughput
> 
> tbridge -p gave 448Mb/s TCP throughput
> 
> tbridge -k gave 458Mb/s TCP throughput
> 
> I'm going to try this again with more CPUs, as the workload of forwarding in
> this box involves 3 bridges in series.
> 
> I will also try with the tpmr(4) driver.
> So something about OpenVPN has a bottleneck that reduces performance
> by a factor of 3-4x.
> 

Surprised by the 20% better performance of the threaded version. I wonder
if the single-threaded version maxes out the performance of a single CPU.
My tests running tcpbench just between two interfaces show no
measurable performance difference between the different modes (for either
tun or tap).

-- 
:wq Claudio



Fwd: tap(4) performance tuning on (amd64)

2020-01-21 Thread Tom Smyth
In testing tap(4) performance on the same box with the following config,
using Claudio's userland bridge (tbridge) in between two tap interfaces;
each tap was also added to its own standard bridge(4) along with 1 physical
interface.

iperf3client--ix0--bridge0--tap0--tbridge--tap1--bridge1--ix1---iperf3svr

On a 1-socket, 2-core system that gives 3Gb/s, we got the following
performance:

tbridge -t gave 557Mb/s TCP throughput

btw, tbridge -t did not stop after using ^C or kill,
but did respond to kill -s SIGKILL

tbridge -s gave 455Mb/s TCP throughput

tbridge -p gave 448Mb/s TCP throughput

tbridge -k gave 458Mb/s TCP throughput

I'm going to try this again with more CPUs, as the workload of forwarding in
this box involves 3 bridges in series.

I will also try with the tpmr(4) driver.
So something about OpenVPN has a bottleneck that reduces performance
by a factor of 3-4x.








---------- Forwarded message ---------
From: Tom Smyth 
Date: Tue, 21 Jan 2020 at 11:15
Subject: Re: tap(4) performance tuning on (amd64)
To: Tom Smyth , Misc 


Thanks Claudio,

The program now seems to run without exiting... I'll do some tests
and get back to you later.
Tom Smyth

On Tue, 21 Jan 2020 at 03:09, Claudio Jeker wrote:
>
> On Tue, Jan 21, 2020 at 02:44:35AM +, Tom Smyth wrote:
> > Claudio,
> > Thanks for this,
> > I compiled it on OpenBSD 6.6 (stable) amd64
> >
> > it compiled without error
> >
> > The binary seems to run fine, but
> > ./tbridge -k /dev/tap0 /dev/tap1
> >
> > runs, displays the usage message, and gives an errorlevel of 1
> > every time I use the -k, -t, -s or -p arguments; see the terminal
> > conversation below.
> >
>
> Shit, I added a last minute check and as usual introduced a bug.
> On line 189, change if (ch != 0) to if (mode != 0).
>
> --
> :wq Claudio
>
> /*
>  * Copyright (c) 2020 Claudio Jeker 
>  *
>  * Permission to use, copy, modify, and distribute this software for any
>  * purpose with or without fee is hereby granted, provided that the above
>  * copyright notice and this permission notice appear in all copies.
>  *
>  * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
>  * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
>  * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
>  * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
>  * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
>  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
>  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>  */
> #include <sys/types.h>
> #include <sys/event.h>
> #include <sys/select.h>
> #include <sys/time.h>
>
> #include <err.h>
> #include <poll.h>
> #include <signal.h>
> #include <string.h>
> #include <unistd.h>
>
> volatile sig_atomic_t quit;
>
> /* Forward one packet: read it from "in" and write it to "out". */
> static void
> do_read(int in, int out)
> {
> 	char buf[2048];
> 	ssize_t n, o;
>
> 	n = read(in, buf, sizeof(buf));
> 	if (n == -1)
> 		err(1, "read");
> 	o = write(out, buf, n);
> 	if (o == -1)
> 		err(1, "write");
> 	if (o != n)
> 		errx(1, "short write");
> }
>
> /* poll(2) mode: wait on both descriptors and forward whatever is readable. */
> static void
> do_poll(int fd[2])
> {
> 	struct pollfd pfd[2];
> 	int n, i;
>
> 	while (quit == 0) {
> 		memset(pfd, 0, sizeof(pfd));
> 		pfd[0].fd = fd[0];
> 		pfd[0].events = POLLIN;
>
> 		pfd[1].fd = fd[1];
> 		pfd[1].events = POLLIN;
>
> 		n = poll(pfd, 2, INFTIM);
> 		if (n == -1)
> 			err(1, "poll");
> 		if (n == 0)
> 			errx(1, "poll: timeout");
> 		for (i = 0; i < 2; i++) {
> 			/* (i + 1) & 0x1 selects the other descriptor */
> 			if (pfd[i].revents & POLLIN)
> 				do_read(fd[i], fd[(i + 1) & 0x1]);
> 			else if (pfd[i].revents & (POLLHUP | POLLERR))
> 				errx(1, "fd %d revents %x", i, pfd[i].revents);
> 		}
> 	}
> }
>
> /* select(2) mode: same forwarding loop, using an fd_set instead of pollfd. */
> static void
> do_select(int fd[2])
> {
> 	fd_set readfds;
> 	int n, i, maxfd = -1;
>
> 	while (quit == 0) {
> 		/* select() modifies the set, so rebuild it on every pass */
> 		FD_ZERO(&readfds);
> 		for (i = 0; i < 2; i++) {
> 			if (fd[i] > maxfd)
> 				maxfd = fd[i];
> 			FD_SET(fd[i], &readfds);
> 		}
> 		n = select(maxfd + 1, &readfds, NULL, NULL, NULL);
> 		if (n == -1)
> 			err(1, "select");
> 		if (n == 0)
> 			errx(1, "select: timeout");
> 		for (i = 0; i < 2; i++) {
> 			if (FD_ISSET(fd[i], &readfds))
> 				do_read(fd[i], fd[(i + 1) & 0x1]);
> 		}
> 	}
> }
>
> static void
> do_kqueue(int fd[2])
> {
> 	struct kevent kev[2];
> 	int kq, i, n;
>
> 	if ((kq = kqueue()) == -1)
> 		err(1,