[Bug 148807] [panic] "panic: sbdrop" and "panic: sbsndptr: sockbuf _ and mbuf _ clashing" (8.1-RELEASE/10.1-STABLE/11-CURRENT)

2016-10-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=148807

--- Comment #31 from Hiren Panchasara  ---
(In reply to Robert Watson from comment #29)

Robert,

Thanks for your response.

On a slightly modified stable/11 (no changes in driver space), I am seeing
repeated panics in sbsndptr() with igb while the box is nearly idle or handling
very little traffic.

(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:221
#1  doadump (textdump=-2121667464) at
/d2/hiren/freebsd/sys/kern/kern_shutdown.c:298
#2  0x80389f86 in db_fncall_generic (nargs=0, addr=,
rv=, 
args=) at /d2/hiren/freebsd/sys/ddb/db_command.c:568
#3  db_fncall (dummy1=, dummy2=,
dummy3=, dummy4=)
at /d2/hiren/freebsd/sys/ddb/db_command.c:616
#4  0x80389a29 in db_command (last_cmdp=,
cmd_table=, 
dopager=) at /d2/hiren/freebsd/sys/ddb/db_command.c:440
#5  0x80389784 in db_command_loop () at
/d2/hiren/freebsd/sys/ddb/db_command.c:493
#6  0x8038c76b in db_trap (type=, code=)
at /d2/hiren/freebsd/sys/ddb/db_main.c:251
#7  0x809a6f33 in kdb_trap (type=, code=,
tf=)
at /d2/hiren/freebsd/sys/kern/subr_kdb.c:654
#8  0x80d93521 in trap_fatal (frame=0xfe1f2bb38210, eva=24)
at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:836
#9  0x80d93753 in trap_pfault (frame=0xfe1f2bb38210, usermode=0)
at /d2/hiren/freebsd/sys/amd64/amd64/trap.c:691
#10 0x80d92cdc in trap (frame=0xfe1f2bb38210) at
/d2/hiren/freebsd/sys/amd64/amd64/trap.c:442
#11 
#12 sbsndptr (sb=0xf8060f8a5518, off=0, len=4294967287,
moff=0xfe1f2bb38420)
at /d2/hiren/freebsd/sys/kern/uipc_sockbuf.c:1191
#13 0x80ab9382 in tcp_output (tp=) at
/d2/hiren/freebsd/sys/netinet/tcp_output.c:1099
#14 0x80ab6105 in tcp_do_segment (m=, th=, so=0xf8060f8a5360, 
tp=, drop_hdrlen=60, tlen=, iptos=, 
ti_locked=)
at /d2/hiren/freebsd/sys/netinet/tcp_input.c:3182
#15 0x80ab2803 in tcp_input (mp=, offp=,
proto=)
at /d2/hiren/freebsd/sys/netinet/tcp_input.c:1444
#16 0x80aa6bc5 in ip_input (m=)
at /d2/hiren/freebsd/sys/netinet/ip_input.c:809
#17 0x80a82b35 in netisr_dispatch_src (proto=1, source=,
m=0x0)
at /d2/hiren/freebsd/sys/net/netisr.c:1120
#18 0x80a6c2ca in ether_demux (ifp=, m=0x0) at
/d2/hiren/freebsd/sys/net/if_ethersubr.c:850
#19 0x80a6cf22 in ether_input_internal (ifp=, m=0x0)
at /d2/hiren/freebsd/sys/net/if_ethersubr.c:639
#20 ether_nh_input (m=) at
/d2/hiren/freebsd/sys/net/if_ethersubr.c:669
#21 0x80a82b35 in netisr_dispatch_src (proto=5, source=,
m=0x0)
at /d2/hiren/freebsd/sys/net/netisr.c:1120
#22 0x80a6c546 in ether_input (ifp=, m=0x0) at
/d2/hiren/freebsd/sys/net/if_ethersubr.c:759
#23 0x804e2b3c in igb_rx_input (rxr=,
ifp=0xf80115614800, m=0xf8014eee7600, 
ptype=) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957
#24 igb_rxeof (que=, count=358700136, done=)
at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:5185
#25 0x804e1daf in igb_msix_que (arg=) at
/d2/hiren/freebsd/sys/dev/e1000/if_igb.c:1612
#26 0x8091425f in intr_event_execute_handlers (p=,
ie=)
at /d2/hiren/freebsd/sys/kern/kern_intr.c:1262
#27 0x80914876 in ithread_execute_handlers (ie=,
p=)
at /d2/hiren/freebsd/sys/kern/kern_intr.c:1275
#28 ithread_loop (arg=) at
/d2/hiren/freebsd/sys/kern/kern_intr.c:1356
#29 0x80910ea5 in fork_exit (callout=0x809147b0 ,
arg=0xf8011561a0e0, 
frame=0xfe1f2bb38ac0) at /d2/hiren/freebsd/sys/kern/kern_fork.c:1040
#30 



The most interesting frames are these two:

#22 0x80a6c546 in ether_input (ifp=, m=0x0) at
/d2/hiren/freebsd/sys/net/if_ethersubr.c:759
#23 0x804e2b3c in igb_rx_input (rxr=,
ifp=0xf80115614800, m=0xf8014eee7600, 
ptype=) at /d2/hiren/freebsd/sys/dev/e1000/if_igb.c:4957

Frame #23 has a valid mbuf pointer, while frame #22 shows it as NULL.

Does this support your hunch of
"device-driver bugs involving modifications to the mbuf chain after submitting
the mbuf to the network stack (e.g., due to concurrency bugs in the device
driver)"?

Or is something else going on?
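
One detail in frame #12 of the backtrace above may also be worth checking: the
len argument passed to sbsndptr() is 4294967287, which is 2^32 - 9. If that
value began life as a signed 32-bit length in the caller (an assumption; the
backtrace does not show how tcp_output() computed it), it corresponds to -9,
that is, a negative length rather than a genuine request for about 4 GB. A
minimal Python sketch of the reinterpretation, where as_signed32 is a
hypothetical helper name and not anything from the kernel:

```python
import struct


def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    # Pack as unsigned 32-bit, then unpack the same four bytes as signed.
    return struct.unpack("<i", struct.pack("<I", value))[0]


# 'len' argument shown in frame #12 of the backtrace.
len_arg = 4294967287
print(as_signed32(len_arg))
```

A negative result here would suggest looking at how the send length was
computed before the call, in addition to the driver-side mbuf question.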

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: FreeBSD10.3-RELEASE. Kernel panic.

2016-10-12 Thread Donald Baud via freebsd-net

On 10/12/16 3:24 PM, Zaphod Beeblebrox wrote:

> While my mpd5 servers are possibly less busy (I haven't had common
> crashes), I have noticed a "group" of problems.
>
> 1. The carrier dropping communication (i.e., fiber cut or L2 switch
> breakage) of the L2TP streams can leave mpd5 in a state where it will
> not die and will not destroy interfaces (requires reboot to clear).

I've encountered that once on 10.3, and I had tweaked some sysctl values
while monitoring with:

    vmstat -z | head -1; vmstat -z | grep -i netgraph

You might want to search for other people's experience with the following
values:

# net.graph.maxdgram    # this is set in /etc/sysctl.conf
# net.graph.recvspace   # this is set in /etc/sysctl.conf
# net.graph.maxdata     # this is set in /boot/loader.conf
# net.graph.maxalloc    # this is set in /boot/loader.conf

I'll leave it to others to comment on which values work best, based on
their experience with FreeBSD 10.3.
In my case, as I explained, one of the recipes that worked for me was
to comment those settings out and leave the kernel values at their defaults.

I've read on the mpd5 mailing list that FreeBSD 11 has had upgrades to
the netgraph modules.
I am now using FreeBSD 11, and it looks like I don't need any of the
kernel tweaks that I've described.

Also, may I suggest you troubleshoot the fiber-cut or L2 switch breakage
by using ipfw rules to simulate a fiber cut, e.g.:

    ipfw add 100 deny ip from 10.10.10.10 to me

> 2. There are race conditions between quagga and mpd5 for
> adding/dropping routes.

While troubleshooting the mpd5 crashes, I removed net/quagga
and installed net/bird instead.
I am now using net/bird, and I've written a little howto to get you started
with it:

https://forums.freebsd.org/threads/56988/

> 3. If A is a PPPoE client and B is the mpd5 server, A cannot access
> TCP services on B.  It can access TCP services _beyond_ B, but not on
> B. (There is a ticket open for this.)


On Wed, Oct 12, 2016 at 10:51 AM, Donald Baud via freebsd-net wrote:



On 10/12/16 1:13 AM, Julian Elischer wrote:

On 11/10/2016 8:56 PM, Donald Baud via freebsd-net wrote:

I've been plagued with these =daily= panics until I tried
the following recipes and the server has been up for 30
days so far:

Normally I should experiment more to see which one of the
recipes is really the fix, but I'm just glad that the
server is stable for now.


this is really great information.
It makes debugging a lot more possible.
I know it is a hard question, but do you have a way to
simulate this workload?

I have no real way to simulate this kind of workload


Sadly, I don't have a way to simulate the workload but I am very
interested to help fix these crashes since as Cassiano said, this
makes mpd5/freebsd useless for pppoe/l2tp termination.

At this point, I would suggest that Cassiano and Андрей confirm
that they don't get panics when they apply the recipes that I am
using.

I am still running many other cisco-vpdn gateways that I would
convert into mpd5/freebsd but my plan was stalled with the daily
crashes.
I'll wait a couple of weeks to be sure that my recipes are a valid
workaround before converting my remaining cisco gateways to mpd5.

-Dbaud



recipe-1: Don't let mpd5 start automatically when server
boots:
i.e. in: /etc/rc.conf
mpd5_enable="NO"
and wait about 5 minutes after server boots then issue:
/usr/local/etc/rc.d/mpd5 onestart


recipe-2: recompile the kernel with the NETGRAPH_DEBUG option:
options NETGRAPH
options NETGRAPH_DEBUG
options NETGRAPH_KSOCKET
options NETGRAPH_L2TP
options NETGRAPH_SOCKET
options NETGRAPH_TEE
options NETGRAPH_VJC
options NETGRAPH_PPP
options NETGRAPH_IFACE
options NETGRAPH_MPPC_COMPRESSION
options NETGRAPH_MPPC_ENCRYPTION
options NETGRAPH_TCPMSS
options IPFIREWALL

recipe-3: recompile the kernel and disable the IPv6 and
SCTP options:
nooptions   INET6
nooptions   SCTP

recipe-4: Don't use any of the sysctl optimizations
in other words I commented out all values in sysctl.conf:
# net.graph.maxdgram=20480  (this is the default)
# net.graph.recvspace=20480  (this is the default)

recipe-5: Don't use any of the loader.conf optimizations
in other words I commented out all values in loader.conf
# net.graph.maxdata=4096  (this is the default)
# 

Re: FreeBSD10.3-RELEASE. Kernel panic.

2016-10-12 Thread Zaphod Beeblebrox
While my mpd5 servers are possibly less busy (I haven't had common crashes),
I have noticed a "group" of problems.

1. The carrier dropping communication (ie: fiber cut or l2 switch breakage)
of the L2TP streams can leave mpd5 in a state where it will not die and
will not destroy interfaces (requires reboot to clear).
2. There are race conditions between quagga and mpd5 for adding/dropping
routes.
3. if A is a pppoe client and B is the mpd5 server, A cannot access TCP
services on B.  It can access tcp services _beyond_ B, but not on B. (there
is a ticket open for this).

On Wed, Oct 12, 2016 at 10:51 AM, Donald Baud via freebsd-net <
freebsd-net@freebsd.org> wrote:

>
> On 10/12/16 1:13 AM, Julian Elischer wrote:
>
>> On 11/10/2016 8:56 PM, Donald Baud via freebsd-net wrote:
>>
>>> I've been plagued with these =daily= panics until I tried the following
>>> recipes and the server has been up for 30 days so far:
>>>
>>> Normally I should experiment more to see which one of the recipes is
>>> really the fix, but I'm just glad that the server is stable for now.
>>>
>>
>> this is really great information.
>> It makes debugging a lot more possible.
>> I know it is a hard question, but do you have a way to simulate this
>> workload?
>>
>> I have no real way to simulate this kind of workload
>>
>
> Sadly, I don't have a way to simulate the workload but I am very
> interested to help fix these crashes since as Cassiano said, this makes
> mpd5/freebsd useless for pppoe/l2tp termination.
>
> At this point, I would suggest that Cassiano and Андрей confirm that they
> don't get panics when they apply the recipes that I am using.
>
> I am still running many other cisco-vpdn gateways that I would convert
> into mpd5/freebsd but my plan was stalled with the daily crashes.
> I'll wait a couple of weeks to be sure that my recipes are a valid
> workaround before converting my remaining cisco gateways to mpd5.
>
> -Dbaud
>
>
>>>
>>> recipe-1: Don't let mpd5 start automatically when server boots:
>>> i.e. in: /etc/rc.conf
>>> mpd5_enable="NO"
>>> and wait about 5 minutes after server boots then issue:
>>> /usr/local/etc/rc.d/mpd5 onestart
>>>
>>>
>>> recipe-2: recompile the kernel with the NETGRAPH_DEBUG option:
>>> options NETGRAPH
>>> options NETGRAPH_DEBUG
>>> options NETGRAPH_KSOCKET
>>> options NETGRAPH_L2TP
>>> options NETGRAPH_SOCKET
>>> options NETGRAPH_TEE
>>> options NETGRAPH_VJC
>>> options NETGRAPH_PPP
>>> options NETGRAPH_IFACE
>>> options NETGRAPH_MPPC_COMPRESSION
>>> options NETGRAPH_MPPC_ENCRYPTION
>>> options NETGRAPH_TCPMSS
>>> options IPFIREWALL
>>>
>>> recipe-3: recompile the kernel and disable the IPv6 and SCTP options:
>>> nooptions   INET6
>>> nooptions   SCTP
>>>
>>> recipe-4: Don't use any of the sysctl optimizations
>>> in other words I commented out all values in sysctl.conf:
>>> # net.graph.maxdgram=20480  (this is the default)
>>> # net.graph.recvspace=20480  (this is the default)
>>>
>>> recipe-5: Don't use any of the loader.conf optimizations
>>> in other words I commented out all values in loader.conf
>>> # net.graph.maxdata=4096  (this is the default)
>>> # net.graph.maxalloc=4096 (this is the default)
>>>
>>> 
>>> In my case, I had the panics with 10.3 and 11-PRERELEASE
>>> 11.0-PRERELEASE FreeBSD 11.0-PRERELEASE #2 r305587
>>>
>>> With those recipes, I have been running without any crash for a month
>>> and counting.  That's 300 l2tp tunnels and 1400 l2tp sessions generating
>>> 700Mbit/s.
>>>
>>>
>>> -DBaud
>>>
>>>
>>> On Tuesday, October 11, 2016 7:30 AM, Cassiano Peixoto <
>>> peixotocassi...@gmail.com> wrote:
>>> Hi,
>>>
>>> There are many users complaining about this:
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186114
>>>
>>> I've been dealing with this issue for one year with no solution. mpd5 as
>>> pppoe server on FreeBSD is useless with this bug.
>>>
>>> I really would like to see it working again, i think it's quite important
>>> to both project and many users.
>>>
>>> Thanks.
>>>
>>> On Tue, Oct 11, 2016 at 3:24 AM, Eugene Grosbein 
>>> wrote:
>>>
>>> 11.10.2016 11:02, Андрей Леушкин wrote:

 Hello. I have problem with "FreeBSD nas 10.3-RELEASE FreeBSD
> 10.3-RELEASE
> #0: Fri Oct  7 21:12:56 YEKT 2016 nas@nas:/usr/obj/usr/src/sys/nasv3
>amd64"
>
> Kernel panic is repeated at intervals of 2-3 days. At first I thought
> that
> the problem is in the hardware, but the problem did not go away after
> replacing the server platform.
>
> Coredumps and more info on link
> https://drive.google.com/open?id=0BxciMy2q7ZjTTkIxem9wTE1tM2M
>
> Sorry for my english.
> I'll wait for an answer.
>
> This is a known and long-standing problem in the FreeBSD network stack.
 It shows up when you have lots of network interfaces created/removed
 

Re: FreeBSD10.3-RELEASE. Kernel panic.

2016-10-12 Thread Donald Baud via freebsd-net


On 10/12/16 1:13 AM, Julian Elischer wrote:

On 11/10/2016 8:56 PM, Donald Baud via freebsd-net wrote:
I've been plagued with these =daily= panics until I tried the 
following recipes and the server has been up for 30 days so far:


Normally I should experiment more to see which one of the recipes is 
really the fix, but I'm just glad that the server is stable for now.


this is really great information.
It makes debugging a lot more possible.
I know it is a hard question, but do you have a way to simulate this 
workload?


I have no real way to simulate this kind of workload


Sadly, I don't have a way to simulate the workload but I am very 
interested to help fix these crashes since as Cassiano said, this makes 
mpd5/freebsd useless for pppoe/l2tp termination.


At this point, I would suggest that Cassiano and Андрей confirm that 
they don't get panics when they apply the recipes that I am using.


I am still running many other cisco-vpdn gateways that I would convert 
into mpd5/freebsd but my plan was stalled with the daily crashes.
I'll wait a couple of weeks to be sure that my recipes are a valid 
workaround before converting my remaining cisco gateways to mpd5.


-Dbaud



recipe-1: Don't let mpd5 start automatically when server boots:
i.e. in: /etc/rc.conf
mpd5_enable="NO"
and wait about 5 minutes after server boots then issue:
/usr/local/etc/rc.d/mpd5 onestart


recipe-2: recompile the kernel with the NETGRAPH_DEBUG option:
options NETGRAPH
options NETGRAPH_DEBUG
options NETGRAPH_KSOCKET
options NETGRAPH_L2TP
options NETGRAPH_SOCKET
options NETGRAPH_TEE
options NETGRAPH_VJC
options NETGRAPH_PPP
options NETGRAPH_IFACE
options NETGRAPH_MPPC_COMPRESSION
options NETGRAPH_MPPC_ENCRYPTION
options NETGRAPH_TCPMSS
options IPFIREWALL

recipe-3: recompile the kernel and disable the IPv6 and SCTP options:
nooptions   INET6
nooptions   SCTP

recipe-4: Don't use any of the sysctl optimizations
in other words I commented out all values in sysctl.conf:
# net.graph.maxdgram=20480  (this is the default)
# net.graph.recvspace=20480  (this is the default)

recipe-5: Don't use any of the loader.conf optimizations
in other words I commented out all values in loader.conf
# net.graph.maxdata=4096  (this is the default)
# net.graph.maxalloc=4096 (this is the default)
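
Recipes 4 and 5 above come down to making sure no net.graph.* setting is still
active in /etc/sysctl.conf or /boot/loader.conf. A rough way to check that is
sketched below in Python. This is only an illustration under stated
assumptions: the function names are hypothetical, and the parser handles
simple name=value lines (optionally quoted), not every syntax those files
accept.

```python
import re

# Matches an active (uncommented) assignment such as net.graph.maxdgram=20480.
# Lines starting with '#' do not match, so commented-out settings are ignored.
ACTIVE_NETGRAPH = re.compile(r'^\s*(net\.graph\.[\w.]+)\s*=\s*"?([^"#\s]*)"?')


def active_netgraph_settings(text):
    """Return {name: value} for every uncommented net.graph.* line in text."""
    found = {}
    for line in text.splitlines():
        match = ACTIVE_NETGRAPH.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def check_file(path):
    """Scan one config file; a missing or unreadable file counts as empty."""
    try:
        with open(path) as handle:
            return active_netgraph_settings(handle.read())
    except OSError:
        return {}


if __name__ == "__main__":
    for path in ("/etc/sysctl.conf", "/boot/loader.conf"):
        for name, value in check_file(path).items():
            print(f"{path}: {name}={value} is still active")
```

If the script prints nothing, neither file overrides the netgraph defaults,
which is the state the two recipes describe.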


In my case, I had the panics with 10.3 and 11-PRERELEASE
11.0-PRERELEASE FreeBSD 11.0-PRERELEASE #2 r305587

With those recipes, I have been running without any crash for a month 
and counting.  That's 300 l2tp tunnels and 1400 l2tp sessions 
generating 700Mbit/s.



-DBaud


On Tuesday, October 11, 2016 7:30 AM, Cassiano Peixoto wrote:

Hi,

There are many users complaining about this:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186114

I've been dealing with this issue for one year with no solution. mpd5 as
pppoe server on FreeBSD is useless with this bug.

I really would like to see it working again; I think it's quite important
to both the project and many users.

Thanks.

On Tue, Oct 11, 2016 at 3:24 AM, Eugene Grosbein wrote:



11.10.2016 11:02, Андрей Леушкин wrote:

Hello. I have a problem with "FreeBSD nas 10.3-RELEASE FreeBSD 10.3-RELEASE
#0: Fri Oct  7 21:12:56 YEKT 2016 nas@nas:/usr/obj/usr/src/sys/nasv3
amd64"

The kernel panic repeats at intervals of 2-3 days. At first I thought that
the problem was in the hardware, but the problem did not go away after
replacing the server platform.

Coredumps and more info on link
https://drive.google.com/open?id=0BxciMy2q7ZjTTkIxem9wTE1tM2M

Sorry for my english.
I'll wait for an answer.


This is a known and long-standing problem in the FreeBSD network stack.
It shows up when you have lots of network interfaces created/removed
frequently, as in your case of a Network Access Server (PPTP, PPPoE etc).

Generally, people run into this problem using the mpd5 network daemon.
mpd5 uses the NETGRAPH kernel subsystem to process traffic, and
if an interface disappears (e.g., a user disconnects)
while the kernel is still processing traffic obtained from this interface, it
panics.

There have been lots of reports of this problem. No one seems to be working on
it at the moment.
You should file a PR using Bugzilla and attach your logs to it.

Eugene Grosbein




[Bug 213410] [carp] service netif restart causes hang only when carp is enabled

2016-10-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213410

Mark Linimon  changed:

           What|Removed                 |Added
----------------------------------------------------------------
       Assignee|freebsd-b...@freebsd.org|freebsd-net@FreeBSD.org



[Bug 138782] [panic] sbflush_internal: cc 0 || mb 0xffffff004127b000 || mbcnt 2304

2016-10-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=138782

John W. O'Brien  changed:

           What|Removed |Added
----------------------------------------------------------------
             CC|        |j...@saltant.com

--- Comment #7 from John W. O'Brien  ---
I just encountered this panic on 10-STABLE r306933 with under two days of
uptime. I upgraded a few days ago from r301164 which had been running
continuously for over 120 days. Unfortunately I don't have a core, but I would
be glad to provide any other information and am interested in pursuing a fix.
