[Bug 220076] [patch] [panic] [netgraph] repeatable kernel panic due to a race in ng_iface(4)

2017-06-25 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220076

Mark Linimon  changed:

   What|Removed |Added

 CC|freebsd-net@FreeBSD.org |
   Assignee|freebsd-b...@freebsd.org|freebsd-net@FreeBSD.org


[Bug 200382] Loading netgraph via bsnmpd, etc can cause domain to be registered after domain_finalize has been called

2017-06-25 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200382

Mark Linimon  changed:

   What|Removed |Added

   Assignee|n...@freebsd.org |freebsd-net@FreeBSD.org

--- Comment #3 from Mark Linimon  ---
Canonicalize assignment.


Problem reports for freebsd-net@FreeBSD.org that need special attention

2017-06-25 Thread bugzilla-noreply
To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status  |Bug Id | Description
+---+---
In Progress |165622 | [ndis][panic][patch] Unregistered use of FPU in k 
In Progress |206581 | bxe_ioctl_nvram handler is faulty 
New |204438 | setsockopt() handling of kern.ipc.maxsockbuf limi 
New |205592 | TCP processing in IPSec causes kernel panic   
New |206053 | kqueue support code of netmap causes panic
New |213410 | [carp] service netif restart causes hang only whe 
New |215874 | [patch] [icmp] [mbuf_tags] teach icmp_error() opt 
New |217748 | sys/dev/ixgbe/if_ix.c: PVS-Studio: Assignment to  
Open|173444 | socket: IPV6_USE_MIN_MTU and TCP is broken
Open|193452 | Dell PowerEdge 210 II -- Kernel panic bce (broadc 
Open|194485 | Userland cannot add IPv6 prefix routes
Open|194515 | Fatal Trap 12 Kernel with vimage  
Open|199136 | [if_tap] Added down_on_close sysctl variable to t 
Open|202510 | [CARP] advertisements sourced from CARP IP cause  
Open|206544 | sendmsg(2) (sendto(2) too?) can fail with EINVAL; 
Open|211031 | [panic] in ng_uncallout when argument is NULL 
Open|211962 | bxe driver queue soft hangs and flooding tx_soft_ 
Open|218653 | Intel e1000 network link drops under high network 

18 problems total for which you should take action.


Re: mbuf_jumbo_9k & iSCSI failing

2017-06-25 Thread Ben RUBSON
> On 25 Jun 2017, at 17:32, Ryan Stone  wrote:
> 
> Having looked at the original email more closely, I see that you showed an
> mlxen interface with a 9020 MTU.  Seeing allocation failures of 9k mbuf 
> clusters increase while you are far below the zone's limit means that you're 
> definitely running into the bug I'm describing, and this bug could plausibly 
> cause the iSCSI errors that you describe.
> 
> The issue is that the newer version of the driver tries to allocate a single 
> buffer to accommodate an MTU-sized packet.  Over time, however, memory will 
> become fragmented and eventually it can become impossible to allocate a 9k 
> physically contiguous buffer.  When this happens the driver is unable to 
> allocate buffers to receive packets and is forced to drop them.  Presumably, 
> if iSCSI suffers too many packet drops it will terminate the connection.  The 
> older version of the driver limited itself to page-sized buffers, so it was 
> immune to issues with memory fragmentation.

Thank you for your explanation, Ryan.
You say "over time", and you're right: I have to wait several days (88 here)
before the problem occurs.
It is strange, however, that with 2500 MB of free memory the system is unable
to find a physically contiguous 9k buffer. But you never know :)
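(As an illustration of that point: a hypothetical kernel-side sketch, using the
generic contigmalloc(9) interface rather than anything the driver actually
calls, and with an invented function name. With M_NOWAIT, a request for three
consecutive physical pages comes back NULL whenever no such run exists, no
matter how much total memory is free.)

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>

/*
 * Illustration only (hypothetical helper, not driver code): ask for 9 KB of
 * physically contiguous memory anywhere in RAM.  Total free memory is not
 * what matters; what matters is whether a run of three consecutive 4 KB
 * pages still exists, which after a long uptime it may not.
 */
static void *
try_contig_9k(void)
{
        return (contigmalloc(9216, M_DEVBUF, M_NOWAIT,
            (vm_paddr_t)0,      /* lowest acceptable physical address */
            ~(vm_paddr_t)0,     /* highest acceptable physical address */
            PAGE_SIZE,          /* alignment */
            0));                /* no boundary restriction */
}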

Let's wait for your patch, then!
(and reboot for now)

Many thx !

Ben


Re: mbuf_jumbo_9k & iSCSI failing

2017-06-25 Thread Ryan Stone
Having looked at the original email more closely, I see that you showed an
mlxen interface with a 9020 MTU.  Seeing allocation failures of 9k mbuf
clusters increase while you are far below the zone's limit means that
you're definitely running into the bug I'm describing, and this bug could
plausibly cause the iSCSI errors that you describe.

The issue is that the newer version of the driver tries to allocate a
single buffer to accommodate an MTU-sized packet.  Over time, however,
memory will become fragmented and eventually it can become impossible to
allocate a 9k physically contiguous buffer.  When this happens the driver
is unable to allocate buffers to receive packets and is forced to drop
them.  Presumably, if iSCSI suffers too many packet drops it will terminate
the connection.  The older version of the driver limited itself to
page-sized buffers, so it was immune to issues with memory fragmentation.
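
To make the contrast concrete, here is a minimal, hypothetical receive-buffer
sketch (invented function name, not the actual mlx4_en code) built on the
in-kernel m_getjcl() mbuf allocator: the 9k jumbo cluster spans several
physically contiguous pages and can fail once memory is fragmented, while a
page-sized cluster always fits in a single page.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/*
 * Hypothetical sketch of the two strategies described above.  A 9k jumbo
 * cluster (MJUM9BYTES, 9216 bytes) needs three physically contiguous 4 KB
 * pages, so the allocation can fail under fragmentation; a page-sized
 * cluster (MJUMPAGESIZE) never needs more than one page.
 */
static struct mbuf *
rx_alloc_jumbo_sketch(void)
{
        struct mbuf *m;

        /* Preferred: a single contiguous 9216-byte cluster. */
        m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES);
        if (m != NULL)
                return (m);

        /* Fallback: a page-sized cluster, immune to fragmentation. */
        return (m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUMPAGESIZE));
}

A driver taking the fallback path would then chain several page-sized clusters
to cover a 9020-byte MTU, which is effectively what the older driver did.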


Re: mbuf_jumbo_9k & iSCSI failing

2017-06-25 Thread Ben RUBSON
> On 25 Jun 2017, at 17:14, Ryan Stone  wrote:
> 
> Is this setup using the mlx4_en driver?  If so, recent versions of that
> driver have a regression when using MTUs greater than the page size (4096 on
> i386/amd64).  The bug will cause the card to drop packets when the system is
> under memory pressure, and in certain cases the card can get into a state
> where it is no longer able to receive packets.  I am working on a fix; I can
> post a patch when it's complete.

Thank you very much for your feedback, Ryan.

Yes, my system is using the mlx4_en driver, the one directly from the
FreeBSD 11.0 source tree.
Is there any indicator I could check to be sure I'm experiencing the issue
you are working on?

It sounds like I may be suffering from it anyway...
Of course I would be glad to help test your patch when it's complete.

Thank you again,

Ben


Re: mbuf_jumbo_9k & iSCSI failing

2017-06-25 Thread Ryan Stone
Is this setup using the mlx4_en driver?  If so, recent versions of that
driver have a regression when using MTUs greater than the page size (4096 on
i386/amd64).  The bug will cause the card to drop packets when the system
is under memory pressure, and in certain cases the card can get into a
state where it is no longer able to receive packets.  I am working on a fix;
I can post a patch when it's complete.


mbuf_jumbo_9k & iSCSI failing

2017-06-25 Thread Ben RUBSON
> On 30 Dec 2016, at 22:55, Ben RUBSON  wrote:
> 
> Hello,
> 
> 2 FreeBSD 11.0-p3 servers, one iSCSI initiator, one target.
> Both with Mellanox ConnectX-3 40G.
> 
> For a few days now, sometimes, under undetermined circumstances, as soon as
> there is some (very low) iSCSI traffic, some of the disks get disconnected:
> kernel: WARNING: 192.168.2.2 (iqn..): no ping reply (NOP-Out) after 5 
> seconds; dropping connection
> 
> At the same moment, the sysctl counters hw.mlxen1.stat.rx_ring*.error grow
> on the initiator side.
> 
> I then tried to reproduce these network errors by saturating the link at
> 40G full-duplex using iPerf.
> But I did not manage to increase these error counters.
> 
> It's strange because it's a sporadic issue: I can have traffic on the iSCSI
> disks without any problem, and sometimes they get disconnected while the
> error counters grow.

> On 01 Jan 2017, at 09:16, Meny Yossefi  wrote:
> 
> Any chance you ran out of mbufs in the system?

> On 02 Jan 2017, at 12:09, Ben RUBSON  wrote:
> 
> I think you are right, this could be an mbuf issue.
> Here are some more numbers :
> 
> # vmstat -z | grep -v "0,   0$"
> ITEM              SIZE   LIMIT   USED    FREE         REQ      FAIL  SLEEP
> 4 Bucket:           32,      0,  2673,  28327,   88449799,    17317,     0
> 8 Bucket:           64,      0,   449,  15609,   13926386,     4871,     0
> 12 Bucket:          96,      0,   335,   5323,   10293892,   142872,     0
> 16 Bucket:         128,      0,   533,   6070,    7618615,   472647,     0
> 32 Bucket:         256,      0,  8317,  22133,   36020376,   563479,     0
> 64 Bucket:         512,      0,  1238,   3298,   20138111, 11430742,     0
> 128 Bucket:       1024,      0,  1865,   2963,   21162182,   158752,     0
> 256 Bucket:       2048,      0,  1626,    450,   80253784,  4890164,     0
> mbuf_jumbo_9k:    9216, 603712, 16400,   8744, 4128521064,     2661,     0

> On 03 Jan 2017, at 07:27, Meny Yossefi  wrote:
> 
> Have you tried increasing the mbufs limit? 
> (sysctl) kern.ipc.nmbufs (Maximum number of mbufs allowed)

> On 04 Jan 2017, at 14:47, Ben RUBSON  wrote:
> 
> No, I did not try this yet.
> However, from the numbers above (and below), I think I should increase
> kern.ipc.nmbjumbo9 instead?

> On 30 Jan 2017, at 15:36, Ben RUBSON  wrote:
> 
> So, to give some news: increasing kern.ipc.nmbjumbo9 helped a lot.
> Just one very small issue (compared to the previous ones) over the last 3
> weeks.



Hello,

I'm back today with this issue.
Above is my discussion with Meny from Mellanox at the beginning of 2017.
(The topic was "iSCSI failing, MLX rx_ring errors ?" on the freebsd-net list.)

So this morning the issue came back: some of my iSCSI disks were disconnected.
Below are some numbers.



# vmstat -z | grep -v "0,   0$"
ITEM              SIZE    LIMIT     USED   FREE         REQ     FAIL  SLEEP
8 Bucket:           64,       0,     654,  8522,   28604967,      11,     0
12 Bucket:          96,       0,     976,  5092,   23758734,      78,     0
32 Bucket:         256,       0,     789,  4491,   43446969,     137,     0
64 Bucket:         512,       0,     666,  2750,   47568959, 1272018,     0
128 Bucket:       1024,       0,    1047,  1249,   28774042,  232504,     0
256 Bucket:       2048,       0,    1611,   369,  139988097, 8931139,     0
vmem btag:          56,       0, 2949738, 15506,   18092235,   20908,     0
mbuf_jumbo_9k:    9216, 2037529,   16400,  8776, 8610737115,     297,     0

# uname -rs
FreeBSD 11.0-RELEASE-p8

# uptime
 3:34p.m.  up 88 days, 15:57, 2 users, load averages: 0.95, 0.67, 0.62

# grep kern.ipc.nmb /boot/loader.conf 
kern.ipc.nmbjumbo9=2037529
kern.ipc.nmbjumbo16=1

# sysctl kern.ipc | grep mb
kern.ipc.nmbufs: 26080380
kern.ipc.nmbjumbo16: 4
kern.ipc.nmbjumbo9: 6112587
kern.ipc.nmbjumbop: 2037529
kern.ipc.nmbclusters: 4075060
kern.ipc.maxmbufmem: 33382887424

# ifconfig mlxen1
mlxen1: flags=8843 metric 0 mtu 9020
options=ed07bb
nd6 options=29
media: Ethernet autoselect (40Gbase-CR4 )
status: active



I just caught the issue as it was growing:

# vmstat -z | grep mbuf_jumbo_9k
ITEM              SIZE    LIMIT    USED   FREE          REQ  FAIL  SLEEP
mbuf_jumbo_9k:    9216, 2037529,  16415,  7316,  8735246407,  665,     0
mbuf_jumbo_9k:    9216, 2037529,  16411,  7320,  8735286748,  665,     0
mbuf_jumbo_9k:    9216, 2037529,  16415,  7316,  8735298937,  667,     0
mbuf_jumbo_9k:    9216, 2037529,  16438,  7293,  8735337634,  667,     0
mbuf_jumbo_9k:    9216, 2037529,  16407,  7324,  8735354339,  668,     0
mbuf_jumbo_9k:    9216, 2037529,  16400,  7331,  8735382105,  669,     0
mbuf_jumbo_9k:    9216, 2037529,  16402,  7329,  8735392836,  671,     0
mbuf_jumbo_9k:    9216, 2037529,  16400,  7331,  8735423910,  671,     0