Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-05 Thread Mike Pumford

On 04/02/2019 21:45, Sad Clouds wrote:

> I've tried those options before, but the performance gain was marginal
> compared to gigabit ethernet speeds, i.e. it went up from 13 MiB/sec to
> around 17 MiB/sec. But I guess a 30% gain is better than nothing.
> 
> I think the Sun Ultra 10 has a 32-bit 33 MHz PCI bus, so in theory it
> has bandwidth for about 133 MB/sec. Not sure why there is such high CPU
> usage on this system when copying data via the network card. OK, it's a
> very old system, but I was expecting less overhead.



Knowing Intel, there will be some bit of a register or descriptor that 
works great if you are on a cache-coherent setup like i386 or amd64 but 
performs less well on the SPARC architecture. I have one PCI wm 
interface, but that's attached to a 2 GHz quad-core amd64 system, so it 
is a lot faster than your Ultra 10.


I had a quick look and couldn't see anything particularly odd in the 
driver code.


Mike




Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-04 Thread Sad Clouds
On Mon, 4 Feb 2019 20:45:13 +
Mike Pumford  wrote:

> On 03/02/2019 12:07, Sad Clouds wrote:
> > On Sun, 3 Feb 2019 11:27:07 +0100
> > tlaro...@polynum.com wrote:
> > 
> >> With all your help and from this summary, I suspect that the
> >> probable culprit is 3) above (linked also to 2) but mainly 3): an
> >> instance of Samba, serving a 10T or a 100T request is blocking on
> >> I/O,
> > 
> > You must have some ancient hardware that is not capable of utilising
> > full network bandwidth.
> > 
> > I have a Sun Ultra 10 with a 440 MHz CPU; it has a 1000baseT PCI
> > network card, but when sending TCP data it can only manage around 13
> > MiB/sec. Looking at 'vmstat 1' output it is clear that the CPU is
> > 100% busy, 50% system + 50% interrupt.
> > 
> > ultra10$ ifconfig wm0
> > wm0: flags=0x8843 mtu 1500
> >  capabilities=2bf80
> >  capabilities=2bf80
> >  capabilities=2bf80
> >  enabled=0
> Any reason you aren't using the hardware offload capability of this 
> interface? The card supports checksum offloading for v4 frames and 
> also the offloading of some of the TCP stack work for v4; this might
> help your performance. Looking at this output I'd suggest adding the 
> following to your ifconfig.wm0 file:
> tso4 ip4csum tcp4csum
> 
> Not exactly sure on the options for the V6 stuff (check the ifconfig
> man page). My wm card has more v6 offload so I don't think the v6
> options I use would work.
> 
> I suspect the offload would give you more of a boost than on my
> 2-4GHz amd64 systems, but even on those the difference that turning
> these on makes is quite noticeable.
> 
> Mike

I've tried those options before, but the performance gain was marginal
compared to gigabit ethernet speeds, i.e. it went up from 13 MiB/sec to
around 17 MiB/sec. But I guess a 30% gain is better than nothing.

I think the Sun Ultra 10 has a 32-bit 33 MHz PCI bus, so in theory it
has bandwidth for about 133 MB/sec. Not sure why there is such high CPU
usage on this system when copying data via the network card. OK, it's a
very old system, but I was expecting less overhead.
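
As a rough sanity check on that figure (theoretical bus peak only; real
transfers also pay for arbitration, descriptor fetches and, with offload
disabled, the CPU touching every byte for checksums):

  # 32-bit bus * 33.33 MHz / 8 bits per byte
  #   = ~133,000,000 bytes/sec, i.e. ~133 MB/sec peak
  # 13-17 MiB/sec is nowhere near that, so the bus is not the limit;
  # the 100% busy CPU seen in 'vmstat 1' is.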



Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-04 Thread Mike Pumford




On 03/02/2019 12:07, Sad Clouds wrote:

> On Sun, 3 Feb 2019 11:27:07 +0100
> tlaro...@polynum.com wrote:
> 
> > With all your help and from this summary, I suspect that the probable
> > culprit is 3) above (linked also to 2) but mainly 3): an instance
> > of Samba, serving a 10T or a 100T request is blocking on I/O,
> 
> You must have some ancient hardware that is not capable of utilising
> full network bandwidth.
> 
> I have a Sun Ultra 10 with a 440 MHz CPU; it has a 1000baseT PCI
> network card, but when sending TCP data it can only manage around 13
> MiB/sec. Looking at 'vmstat 1' output it is clear that the CPU is 100%
> busy, 50% system + 50% interrupt.
> 
> ultra10$ ifconfig wm0
> wm0: flags=0x8843 mtu 1500
>  capabilities=2bf80
>  capabilities=2bf80
>  capabilities=2bf80
>  enabled=0
Any reason you aren't using the hardware offload capability of this 
interface? The card supports checksum offloading for v4 frames and 
also the offloading of some of the TCP stack work for v4; this might help 
your performance. Looking at this output I'd suggest adding the 
following to your ifconfig.wm0 file:

tso4 ip4csum tcp4csum

I'm not exactly sure of the options for the v6 stuff (check the ifconfig 
man page). My wm card has more v6 offload, so I don't think the v6 
options I use would work.
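
To make the ifconfig.wm0 suggestion above concrete, here is a minimal
sketch of what /etc/ifconfig.wm0 could look like with those offload
flags (the address line just mirrors the ifconfig output quoted above,
and only flags listed under 'capabilities' can actually be enabled):

  # /etc/ifconfig.wm0 -- each line is passed as arguments to ifconfig wm0
  up
  tso4 ip4csum tcp4csum
  inet 192.168.1.3 netmask 255.255.255.0

The same flags can also be toggled at run time, e.g.
'ifconfig wm0 tso4 ip4csum tcp4csum' (prefix a flag with '-' to turn it
off); 'ifconfig wm0' should then list them under 'enabled'.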


I suspect the offload would give you more of a boost than on my 2-4GHz 
amd64 systems, but even on those the difference that turning these on 
makes is quite noticeable.


Mike


Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-03 Thread tlaronde
Hello,

On Sun, Feb 03, 2019 at 12:07:06PM +, Sad Clouds wrote:
> On Sun, 3 Feb 2019 11:27:07 +0100
> tlaro...@polynum.com wrote:
> 
> > With all your help and from this summary, I suspect that the probable
> > culprit is 3) above (linked also to 2) but mainly 3): an instance
> > of Samba, serving a 10T or a 100T request is blocking on I/O,
> 
> You must have some ancient hardware that is not capable of utilising
> full network bandwidth.
> 
> I have a Sun Ultra 10 with a 440 MHz CPU; it has a 1000baseT PCI network
> card, but when sending TCP data it can only manage around 13 MiB/sec.
> Looking at 'vmstat 1' output it is clear that the CPU is 100% busy, 50%
> system + 50% interrupt.
> 
> ultra10$ ifconfig wm0
> wm0: flags=0x8843 mtu 1500
> capabilities=2bf80
> capabilities=2bf80
> capabilities=2bf80
> enabled=0
> ec_capabilities=7
> ec_enabled=0
> address: 00:0e:04:b7:7f:47
> media: Ethernet autoselect (1000baseT 
> full-duplex,flowcontrol,rxpause,txpause)
> status: active
> inet 192.168.1.3/24 broadcast 192.168.1.255 flags 0x0
> inet6 fe80::20e:4ff:feb7:7f47%wm0/64 flags 0x0 scopeid 0x2
> 
> 
> ultra10$ ./sv_net -mode=cli -ip=192.168.1.2 -port= -threads=1 -block=64k 
> -size=100m
> Per-thread metrics:
>   T 1  connect 1.01 msec,  transfer 7255.29 msec (13.78 MiB/sec, 9899.03 
> Pks/sec)
> 
> Per-thread socket options:
>   T 1  rcvbuf=33580,  sndbuf=49688,  sndmaxseg=1460,  nodelay=Off
> 
> Aggregate metrics:
>   connect  1.01 msec
>   transfer 7255.29 msec (13.78 MiB/sec, 9899.03 Pks/sec)

Thank you for the data.

But there is one point that I failed to mention: on some nodes, the
connections to the server become totally unresponsive for several
seconds.

Since the server, if not brand new, is a decent dual-core Intel less
than 5 years old with plenty of RAM, upgraded to NetBSD 7.2, it is able
to use the Intel gigabit card at full speed.

Thanks to your data, I see that on some hardware, despite the
capabilities of the ethernet card, I might not get the full speed.

But since (as I forgot to write) the overall performance of the Samba
shares sometimes drops, and since, from the answers, the bottleneck is
not the ethernet card of the server, and since something is affecting
everything, the problem logically seems to be on the server: some
process there is grabbing one of the cores.

Thanks for your answer. I already learned valuable things!
-- 
Thierry Laronde 
 http://www.kergis.com/
   http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-03 Thread Sad Clouds
On Sun, 3 Feb 2019 11:27:07 +0100
tlaro...@polynum.com wrote:

> With all your help and from this summary, I suspect that the probable
> culprit is 3) above (linked also to 2) but mainly 3): an instance
> of Samba, serving a 10T or a 100T request is blocking on I/O,

You must have some ancient hardware that is not capable of utilising
full network bandwidth.

I have a Sun Ultra 10 with a 440 MHz CPU; it has a 1000baseT PCI network
card, but when sending TCP data it can only manage around 13 MiB/sec.
Looking at 'vmstat 1' output it is clear that the CPU is 100% busy, 50%
system + 50% interrupt.

ultra10$ ifconfig wm0
wm0: flags=0x8843 mtu 1500
capabilities=2bf80
capabilities=2bf80
capabilities=2bf80
enabled=0
ec_capabilities=7
ec_enabled=0
address: 00:0e:04:b7:7f:47
media: Ethernet autoselect (1000baseT 
full-duplex,flowcontrol,rxpause,txpause)
status: active
inet 192.168.1.3/24 broadcast 192.168.1.255 flags 0x0
inet6 fe80::20e:4ff:feb7:7f47%wm0/64 flags 0x0 scopeid 0x2


ultra10$ ./sv_net -mode=cli -ip=192.168.1.2 -port= -threads=1 -block=64k 
-size=100m
Per-thread metrics:
  T 1  connect 1.01 msec,  transfer 7255.29 msec (13.78 MiB/sec, 9899.03 
Pks/sec)

Per-thread socket options:
  T 1  rcvbuf=33580,  sndbuf=49688,  sndmaxseg=1460,  nodelay=Off

Aggregate metrics:
  connect  1.01 msec
  transfer 7255.29 msec (13.78 MiB/sec, 9899.03 Pks/sec)



Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-03 Thread tlaronde
Hello,

And thank you all for the answers!

So as not to have to piece together the various bits of information, I
will summarize:

My initial question: I have a NetBSD server serving FFSv2 filesystems
via Samba (last pkgsrc version) through a 1000T ethernet card to a bunch
of Windows clients, heterogeneous, both as OS version and as ethernet
cards, ranging from 10T to 1000T. All the nodes are connected to a Cisco
switch. The network performance (for Samba) seems poor, and I wonder how
the various ethernet speeds (10 to 1000) could affect it by impacting
the negotiations of the server's auto-select 1000T card.

From your answers (the details given are worth reading, and anyone
reading this should go back to the full replies; I'm just summarizing,
if I'm not mistaken):

1) The negociations are done by the switch, and the server card doesn't
handle it by itself;
2) On the server, the capabilities of the disks doing the serving should
be determined;
3) On the server, Samba is not multithreaded and spawns an instance for
each connection, so even on a multicore machine it is perhaps not using
all the cores, and even if it does, the instances are still concurrent;
4) The network performance should be measured, for example with iperf3,
available in pkgsrc (see the sketch below).
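
As a sketch of 4), assuming iperf3 (available in pkgsrc) is installed on
the server and on at least one client; host name, duration and stream
count are only examples:

  server$ iperf3 -s                    # listen on the default port (5201)
  client$ iperf3 -c server -t 30       # 30-second TCP test, client to server
  client$ iperf3 -c server -t 30 -R    # reversed, same direction as a download
  client$ iperf3 -c server -t 30 -P 4  # four parallel streams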

From your questions asking for more detail:

For the switch:
a) The switch is a Cisco gigabit 16-port switch (RV325) capable of
handling simultaneous full-duplex gigabit on all the ports;

b) The cards are correctly detected at their maximum speed by the
switch: the LEDs correctly indicate gigabit for the right cards, and
non-gigabit (no difference between 10T and 100T) for the others.

For the poor performance:
a) On a Windows machine (10, if I remember---I'm not on site) with a
Gigabit Ethernet card, downloading from a Samba share gives 12 MB/s,
which is the maximum performance of a 100T card; uploading to the
server via Samba gives 3 MB/s;
b) Testing on another Windows node with a 100T card, I get the same
throughput copying via Samba or using ftp.

With all your help and from this summary, I suspect that the probable
culprit is 3) above (linked also to 2), but mainly 3): an instance of
Samba serving a 10T or a 100T request is blocking on I/O, especially on
writing (sync?), while the other instances wait for a chance to get a
slice of CPU. That is, the problem is probably with caching and
syncing---there are some Samba parameters in the config file, but the
whole thing is a bit cryptic... And I'd like to use NFS, but with
Microsoft allowing it and then dropping it, I don't know if the NFS
client on Windows can still be installed without requiring a full
Linux-based distribution...
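
For reference, these are the kinds of smb.conf knobs (in [global], or
per share) that usually come up when write performance and syncing are
the suspects; the parameter names come from the Samba documentation,
the values are only starting points to experiment with, and relaxing
the sync options trades crash safety for speed:

  [global]
    # don't fsync on every sync request from the client
    strict sync = no
    sync always = no
    # allow reads/writes of at least this size to be handled asynchronously
    aio read size = 16384
    aio write size = 16384
    # socket-level buffers and Nagle
    socket options = TCP_NODELAY SO_SNDBUF=131072 SO_RCVBUF=131072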

Thank you all, once again!

Best regards,
-- 
Thierry Laronde 
 http://www.kergis.com/
   http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C


Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-03 Thread Sad Clouds
On Sat, 2 Feb 2019 17:01:18 +0100
tlaro...@polynum.com wrote:

> Hello,
> 
> I have a NetBSD serving FFSv2 filesystems to various Windows nodes via
> Samba.
> 
> The network efficiency seems to me underpar.

And how did you determine that? There are so many factors that can
affect performance, you need to run detailed tests and work your way
down the list. Normally, good switches are designed to handle
throughput for the number of ports they have, i.e. their switching
fabric should be able to cope with all of those ports transmitting at
the same time at the highest supported rate. So quite often it's not
the switch that causes performance issues, but disk I/O latency and
protocol overheads.

At work, I used to spend a fair amount of my time diagnosing SMB and
NFS performance issues. Below is a checklist that I would normally
run through:

- Check network speeds between various hosts using something like
ttcp or iperf. This should give you baseline performance for TCP
throughput with your hardware.

- Understand how your SMB clients are reading/writing data, i.e. what
block size they use and whether they do long sequential or small random
reads/writes.

- Understand disk I/O latency on your SMB server. What throughput can
it sustain for the workloads from your clients.

- What SMB protocol versions are your clients using: SMB1, SMB2 or SMB3?
Later versions of the SMB protocol are more efficient. Are your clients
using the SMB signing feature? This can cost as much as a 20%
performance hit (see the sketch after this list).

- Understand how many concurrent streams are running and whether they
are running from a single client or multiple clients. The Samba server
is not multithreaded, so I think it forks a single child process per
client. This means it won't scale on multicore hardware if you are
running many concurrent streams all from a single client. It is better
to spread the load across many different clients; this way multiple
Samba server processes can be scheduled to run on multiple CPU cores.

... and the list goes on.
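
As a sketch of the protocol-version and signing points above (parameter
names are from smb.conf(5); whether they are safe to change depends on
the clients, and turning signing off is a security trade-off):

  [global]
    # refuse the old SMB1 dialect, negotiate SMB2/SMB3 only
    server min protocol = SMB2_02
    # don't require packet signing
    server signing = disabled

On reasonably recent Samba versions, running 'smbstatus' on the server
shows which dialect each connected client actually negotiated.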



Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-02 Thread Johnny Billquist

On 2019-02-02 23:14, Michael van Elst wrote:

> tlaro...@polynum.com writes:
> 
> > Is the speed adapted to each connected device? Or does the serving card
> > fix the speed, during a slice of time, for all connexions to the minimum
> > speed?
> 
> Autonegotiation means that the card and the switch communicate and
> agree on a speed.
> 
> > What is the "cost" of switching the speed or, in other words, is
> > connecting a 10base card able to slow down the whole throughput of the
> > card even for other devices---due to the overhead of switching the speed
> > depending on connected devices?
> 
> The speed isn't switched unless you disconnect or reconnect the
> ethernet cable, but if the configuration is unchanged, card and
> switch will usually agree on the same speed as before.
> 
> > (The other question relates to the switch but not to NetBSD: does the
> > switch have a table for the connected devices and buffers the
> > transactions, rewriting the packets to adjust for the speed of each of
> > the devices?).
> 
> Almost all switches will allow you to connect devices with different
> speeds, and packets that are received at one speed are buffered and
> sent out at possibly a different speed. The packets don't need to
> be rewritten.


Right. And I would say "all switches". That's essentially the difference 
between a hub and a switch. A hub just forwards the bits in real time, 
while a switch stores and forwards the packet. Packets are never 
rewritten, but a switch will have buffers, so it can handle different 
speeds on different ports.



> Differences in speed between sender and receiver are usually handled
> by higher level protocols (e.g. TCP) or by lower level protocols
> (802.3x) when device and switch support it.


Well, in general, it is not "handled" at all. The packets are received 
by the switch at one speed, and sent out on a different port at a 
different speed. The two machines communicating will be unaware of the 
fact that they are actually communicating at different speeds. And the 
switch will just be using some memory, and introduce some small delay.


The one problem is when the switch runs out of memory, for example if a 
machine on a 100 Mbit/s link is producing data at full speed and it 
needs to be forwarded on a 10 Mbit/s port. But that's simply a case of 
the switch then just dropping packets.


A cleverer switch will use random early drops (RED), which potentially 
has less impact on TCP connections than outright dropping all packets 
once the switch is completely out of memory. But all of that is just an 
attempt at protocol performance improvement based on knowledge of how 
TCP reacts to dropped packets.


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-02 Thread Michael van Elst
tlaro...@polynum.com writes:

>Is the speed adapted to each connected device? Or does the serving card
>fix the speed, during a slice of time, for all connexions to the minimum
>speed?

Autonegotiation means that the card and the switch communicate and
agree on a speed.


>What is the "cost" of switching the speed or, in other words, is
>connecting a 10base card able to slow down the whole throughput of the
>card even for other devices---due to the overhead of switching the speed
>depending on connected devices?

The speed isn't switched unless you disconnect or reconnect the
ethernet cable, but if the configuration is unchanged, card and
switch will usually agree on the same speed as before.


>(The other question relates to the switch but not to NetBSD: does the
>switch have a table for the connected devices and buffers the
>transactions, rewriting the packets to adjust for the speed of each of
>the devices?).

Almost all switches will allow you to connect devices with different
speeds, and packets that are received at one speed are buffered and
sent out at possibly a different speed. The packets don't need to
be rewritten.

Differences in speed between sender and receiver are usually handled
by higher level protocols (e.g. TCP) or by lower level protocols
(802.3x) when device and switch support it.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-02 Thread Greg Troxel
tlaro...@polynum.com writes:

> I have a NetBSD serving FFSv2 filesystems to various Windows nodes via
> Samba.
>
> The network efficiency seems to me underpar.

I would recommend trying to test with ttcp or some such first, to
establish packet-handling baseline performance separate from remote
filesystem issues.
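
A minimal baseline along those lines, assuming ttcp (from pkgsrc) on
both ends; buffer length, count and host name are only placeholders:

  server$ ttcp -r -s                        # receive and discard ("sink")
  client$ ttcp -t -s -l 65536 -n 16384 server
          # transmit 16384 buffers of 64 KiB (1 GiB total) and report the rate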

Remember that network speeds are given in Mb/s but data throughput is
often in MB/s (bits vs bytes).

Then, there are various header overheads (ethernet, IP, TCP).  Many
interfaces cannot run at full Gb/s rate, but most 100 Mb/s ones get
close (for semi-modern or modern computers).

So you should explain what rates you are actually seeing.  80 Mb/s on
100 Mb/s is great, and 300 Mb/s on Gb/s ethernet is not unusual.
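
Put as numbers (line rate only, before protocol overhead):

  # 100 Mb/s  = 100,000,000 / 8 = 12.5 MB/s raw
  #             ~11.8 MB/s of TCP payload after ~5-6% framing/header
  #             overhead at a 1500-byte MTU
  # 1000 Mb/s = 125 MB/s raw, ~117 MB/s of TCP payload at best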

Plus, seek time and disk read time can start to matter.

> There is very probably Samba tuning involved. Windows tuning too. But a
> question arose to me about miscellaneous speeds of ethernet cards
> connecting to a card on the NetBSD server capable of auto-selecting the
> speed between 10 and 1000.
>
> The Windows boxes are very heterogeneous (one might even say that no
> two have the same Windows OS version, because some hardware is quite
> old) and the cards range from 10- to 1000-capable ethernet devices.
>
> Needless to say, there is a switch (Cisco) on which all the nodes are
> connected.

Modern (after 10base2 went away in the 90s) Ethernet is all point to
point, with switches or hubs.  These days, hubs are extremely rare.  I
believe hubs can only handle one speed.  There were certainly 10baseT
hubs, and I remember 10/100 hub/switch combos that were actually a 10
Mb/s hub and a 100 Mb/s hub with a 2-port switch between them, and each
port got connected to the right hub depending on what was connected.
But anything from the last 5 years, maybe even 10 years, is highly likely
a full switch.

A switch will negotiate a speed with each connected device.  Most have
lights to show you what was negotiated, explained in the manual.

Sometimes autonegotiation is flaky (speed, and full-duplex) and it helps
to force a speed on each client.  But I suspect, without a good basis,
that this is probably not your issue.
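
On the NetBSD side, forcing the media instead of autoselect is a
one-liner (the switch port has to be forced to match, otherwise a
duplex mismatch is likely); wm0 is just the interface from earlier in
the thread:

  # force 100 Mb/s full duplex instead of autonegotiation
  ifconfig wm0 media 100baseTX mediaopt full-duplex
  # and to go back to autonegotiation
  ifconfig wm0 media autoselect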

> When concurrent accesses to an auto-select ethernet card are done by
> ethernet cards ranging from 10 to 1000 speeds, how is this handled by
> the card?

Each computer's interface negotiates a speed with the switch.  With a
gigabit switch, that should be the highest speed the interface supports.

Packets are then sent from card to switch, at that card's rate, and then
from switch to the other card, at the second card's rate.  There is no
per-packet negotiation of speeds.

> What is the "cost" of switching the speed or, in other words, is
> connecting a 10base card able to slow down the whole throughput of the
> card even for other devices---due to the overhead of switching the speed
> depending on connected devices?

Broadcast packets can be an issue.  Not because the individual links
change speed, but because if you have a 10 Mb/s link then all broadcast
packets have to be sent on it.

> (The other question relates to the switch but not to NetBSD: does the
> switch have a table for the connected devices and buffers the
> transactions, rewriting the packets to adjust for the speed of each of
> the devices?).

If it is truly a switch, yes.  But not so much rewriting as clocking
them out at the right speed (with the right modulation).


Usually this sort of switch speed issue works just fine without anybody
having to pay attention.

I would try to measure the file read speed from each machine to the
NetBSD server, and make a table with machine name, local link speed, and
rate, and see if it makes any sense.  Also perhaps try the
newest/fastest windows box with the others all powered off.  Having a
machine on and idle should not really change things, and that's another
data point.

If you follow up on the list, it would be good to give the switch
model, and to find out what speed each device is connected at.  On
NetBSD, 'ifconfig' will show you.  An example line:

media: Ethernet autoselect (100baseTX full-duplex)

Probably there is some way to find this out on Windows.


Re: Ethernet auto-select and concurrent 10, 100 and 1000 connections

2019-02-02 Thread Andy Ruhl
On Sat, Feb 2, 2019 at 10:18 AM  wrote:
>
> Hello,
>
> I have a NetBSD serving FFSv2 filesystems to various Windows nodes via
> Samba.
>
> The network efficiency seems to me underpar.
>
> There is very probably Samba tuning involved. Windows tuning too. But a
> question arose to me about miscellaneous speeds of ethernet cards
> connecting to a card on the NetBSD server capable of auto-selecting the
> speed between 10 and 1000.
>
> The Windows boxes are very heterogeneous (one might even say that no
> two have the same Windows OS version, because some hardware is quite
> old) and the cards range from 10- to 1000-capable ethernet devices.
>
> Needless to say, there is a switch (Cisco) on which all the nodes are
> connected.
>
> When concurrent accesses to an auto-select ethernet card are done by
> ethernet cards ranging from 10 to 1000 speeds, how is this handled by
> the card?
>
> Is the speed adapted to each connected device? Or does the serving card
> fix the speed, during a slice of time, for all connexions to the minimum
> speed?
>
> What is the "cost" of switching the speed or, in other words, is
> connecting a 10base card able to slow down the whole throughput of the
> card even for other devices---due to the overhead of switching the speed
> depending on connected devices?
>
> (The other question relates to the switch but not to NetBSD: does the
> switch have a table for the connected devices and buffers the
> transactions, rewriting the packets to adjust for the speed of each of
> the devices?).
>
> If someone has any clue on the subject, I will be very thankful to
> learn!

As you probably suspect, this isn't a NetBSD issue, and it is something
you can read up on extensively on the internet. Maybe you need a place
to start, which is often where I find myself on many subjects. I
probably will miss something.

The switch negotiates connections on a port by port basis. So if one
device is 1 gig, it will negotiate 1 gig. If another device is 10
megabit, it will negotiate that. Each port is a separate entity. Then
you have half vs. full duplex. So what happens when they talk?

The switch does something at layer 2 called RED or WRED (Weighted
Random Early Detection) to decide if one port is going too fast for
another. It's not really an ideal place to be, and it usually happens
when either you have different adapter speeds or you have a whole lot
of machines on lots of ports trying to overrun 1 port (like an uplink
port).

But you're hoping it doesn't come to that. It's best if TCP just does
its thing and sets the window size to one that both sides can handle
nicely, so that things "just work". RED or WRED will happen, but
hopefully less.
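
On the NetBSD end, the socket buffer / window sizes TCP ends up with
can be inspected and, if necessary, raised with sysctl; the names below
are the usual NetBSD ones, and enabling automatic buffer sizing is only
a suggestion to experiment with:

  # current default send/receive buffer sizes
  sysctl net.inet.tcp.sendspace net.inet.tcp.recvspace
  # let the kernel grow buffers automatically up to the configured limits
  sysctl -w net.inet.tcp.sendbuf_auto=1
  sysctl -w net.inet.tcp.recvbuf_auto=1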

I'd love someone to correct me if I'm wrong on this.

If you're asking if using a 10 megabit adapter is the best way to do
traffic shaping, it isn't, and that's a whole different subject that
probably doesn't belong here.

Andy