Re: using interface groups in pf tables stopped working in 13.0-RELEASE

2021-04-27 Thread Kristof Provost

On 16 Apr 2021, at 17:58, Kristof Provost wrote:

On 14 Apr 2021, at 16:16, Peter Ankerstål wrote:
In pf I use the interface group syntax a lot to make the configuration 
more readable. All interfaces are assigned to a group representing 
their use/VLAN name.


For example:

ifconfig_igb1_102="172.22.0.1/24 group iot description 'iot vlan' up"
ifconfig_igb1_102_ipv6="inet6 2001:470:de59:22::1/64"

ifconfig_igb1_300="172.26.0.1/24 group mgmt description 'mgmt vlan' up"

ifconfig_igb1_300_ipv6="inet6 2001:470:de59:26::1/64"

In pf.conf I use these group names all over the place. But since I 
upgraded to 13.0-RELEASE, defining a table using the :network syntax 
with interface groups no longer works:


table <...> const { trusted:network mgmt:network dmz:network \
    guest:network edmz:network admin:network iot:network \
    client:network }

If I reload the configuration I get the following:
# pfctl -f /etc/pf.conf
/etc/pf.conf:12: cannot create address buffer: Invalid argument
pfctl: Syntax error in config file: pf rules not loaded
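(Editor's sketch, not from the thread: the `:network` modifier expands each address of each interface in the group into its directly connected network. Python's stdlib `ipaddress` module reproduces the arithmetic for the iot vlan addresses from the rc.conf lines above.)

```python
import ipaddress

def connected_network(addr_with_prefix: str) -> str:
    """Return the directly connected network for an interface address,
    i.e. what pf's ':network' modifier expands the address to."""
    return str(ipaddress.ip_interface(addr_with_prefix).network)

# The iot vlan addresses from the rc.conf lines above:
print(connected_network("172.22.0.1/24"))           # 172.22.0.0/24
print(connected_network("2001:470:de59:22::1/64"))  # 2001:470:de59:22::/64
```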


I can reproduce that.

It looks like there’s some confusion inside pfctl about the network 
group. It ends up in pfctl_parser.c, append_addr_host(), and expects 
an AF_INET or AF_INET6, but instead gets an AF_LINK.
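A minimal sketch (mine, not the actual pfctl code) of the family check described above: enumerating an interface's addresses yields link-layer (AF_LINK) entries alongside the AF_INET/AF_INET6 ones, and an entry reaching the address-buffer builder with an unexpected family is rejected with EINVAL, which pfctl reports as "cannot create address buffer: Invalid argument". AF_LINK exists only on BSD-derived systems; the AF_PACKET fallback is just so the sketch also runs on Linux.

```python
import errno
import socket

# AF_LINK is BSD-only; fall back to Linux's AF_PACKET so the sketch runs anywhere.
AF_LINK = getattr(socket, "AF_LINK", socket.AF_PACKET)

def append_addr(buf: list, family: int, addr: str) -> int:
    """Accept only IPv4/IPv6 entries, as append_addr_host() effectively
    requires. Returns 0 on success or an errno value."""
    if family not in (socket.AF_INET, socket.AF_INET6):
        return errno.EINVAL  # surfaces as "Invalid argument"
    buf.append((family, addr))
    return 0

# A getifaddrs()-style listing for a group member: the AF_LINK entry must
# be skipped when expanding group:network, not passed through.
entries = [
    (AF_LINK, "00:1b:21:aa:bb:cc"),
    (socket.AF_INET, "172.22.0.1"),
    (socket.AF_INET6, "2001:470:de59:22::1"),
]
buf: list = []
for fam, addr in entries:
    if fam == AF_LINK:
        continue  # skip link-layer entries
    assert append_addr(buf, fam, addr) == 0
print(buf)
```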


It’s probably related to 250994, or possibly to 
d2568b024da283bd2b88a633eecfc9abf240b3d8.
Either way, it’s pretty deep in a part of the pfctl code I don’t 
much like. I’ll try to poke at it some more over the weekend.


It should be fixed as of d5b08e13dd6beb3436e181ff1f3e034cc8186584 in 
main. I’ll MFC that in about a week, and then it’ll turn up in 13.1 
in the fullness of time.


Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: [pf] stable/12: block by OS broken

2021-02-17 Thread Kristof Provost

On 18 Feb 2021, at 6:01, Xin Li wrote:

Hi,

It appears that some change between 939430f2377 (December 31) and
b4bf7bdeb70 (today) on stable/12 has broken pf in a way that the
following rule:

block in quick proto tcp from any os "Linux" to any port ssh

would get interpreted as:

block drop in quick proto tcp from any to any port = 22

(and block all SSH connections instead of just the ones initiated from
Linux).


Thanks for the report. I think I see the problem.

Can you test this patch?

diff --git a/sys/netpfil/pf/pf_ioctl.c b/sys/netpfil/pf/pf_ioctl.c
index 593a38d4a360..458c6af3fa5e 100644
--- a/sys/netpfil/pf/pf_ioctl.c
+++ b/sys/netpfil/pf/pf_ioctl.c
@@ -1623,7 +1623,7 @@ pf_rule_to_krule(const struct pf_rule *rule, struct pf_krule *krule)
 	/* Don't allow userspace to set evaulations, packets or bytes. */

 	/* kif, anchor, overload_tbl are not copied over. */

-	krule->os_fingerprint = krule->os_fingerprint;
+	krule->os_fingerprint = rule->os_fingerprint;

 	krule->rtableid = rule->rtableid;
 	bcopy(rule->timeout, krule->timeout, sizeof(krule->timeout));
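The one-character bug above is a self-assignment in a struct-copy routine: the removed line copied krule->os_fingerprint from krule itself instead of from rule, so the field kept its zeroed default and the OS match silently widened to "any". A toy sketch of that failure mode (not the kernel code, just the pattern; the PF_OSFP_* values here are made up for illustration):

```python
from dataclasses import dataclass

PF_OSFP_ANY = 0      # zeroed default: matches every OS (value invented here)
PF_OSFP_LINUX = 42   # made-up fingerprint id for the sketch

@dataclass
class Rule:
    os_fingerprint: int = PF_OSFP_ANY

def rule_to_krule(rule: Rule, buggy: bool = False) -> Rule:
    """Copy a userspace rule into a fresh kernel rule."""
    krule = Rule()
    if buggy:
        # Self-assignment: krule's field was just default-initialized,
        # so the user's fingerprint is lost and the rule matches any OS.
        krule.os_fingerprint = krule.os_fingerprint
    else:
        krule.os_fingerprint = rule.os_fingerprint  # the fix
    return krule

user_rule = Rule(os_fingerprint=PF_OSFP_LINUX)
assert rule_to_krule(user_rule, buggy=True).os_fingerprint == PF_OSFP_ANY
assert rule_to_krule(user_rule, buggy=False).os_fingerprint == PF_OSFP_LINUX
```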

With any luck we’ll be able to include the fix in 13.0.

Best regards,
Kristof


Re: latest stable 13.0-ALPHA3 cannot start varnish anymore.

2021-02-02 Thread Kristof Provost

On 2 Feb 2021, at 14:05, Johan Hendriks wrote:

On 01/02/2021 22:48, Johan Hendriks wrote:
I just updated my FreeBSD 13.0-ALPHA3 to the latest revision and now 
I cannot start varnish anymore.

This is on two machines.

If I start varnish it errors out, as the startup script does a config 
file check.

If I try to test the config file I get the following error.

root@jhost002:~ # varnishd -C -f /usr/local/etc/varnish/default.vcl
Error: Cannot create working directory '/tmp/varnishd_C_dwbl7mn/': Is a directory
Error: Cannot create working directory (/tmp/varnishd_C_dwbl7mn/): No error: 0

(-? gives usage)
root@jhost002:~ #
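An editor's aside on those two error lines: "Is a directory" is strerror(EISDIR), while "No error: 0" is what FreeBSD's strerror() produces for errno 0, i.e. varnishd reached a failure path after errno had been cleared or was never set, which fits a lookup returning a wrong result without a matching error code. A minimal illustration (the exact wording differs between libcs):

```python
import errno
import os

# strerror(EISDIR) is the "Is a directory" seen in the first varnishd error.
print(os.strerror(errno.EISDIR))

# strerror(0) is the tell-tale of reporting an "error" while errno is 0:
# FreeBSD renders it as "No error: 0", glibc as "Success".
print(os.strerror(0))
```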

This is on:
FreeBSD jhost002.mydomain.nl 13.0-ALPHA3 FreeBSD 13.0-ALPHA3 #35 
stable/13-c256281-gc415d0df47fc: Mon Feb  1 17:04:49 CET 2021 
r...@srv-01.home.local:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64


I did not update the package or installed any other software besides 
the buildworld.


regards,
Johan


I have tried some bisecting, as far as my understanding of git goes. I 
do not know which of these revisions is the later one, but on both of 
them varnish works.


FreeBSD jhost002 13.0-ALPHA3 FreeBSD 13.0-ALPHA3 #8 
c256261-g9375a93b6c22: Tue Feb  2 13:33:05 CET 2021 
root@jhost002:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64


uname -a
FreeBSD jhost002 13.0-ALPHA3 FreeBSD 13.0-ALPHA3 #7 
c256260-g247f652e622d: Tue Feb  2 13:07:37 CET 2021 
root@jhost002:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64



Can you try setting `sysctl vfs.cache_fast_lookup=0` ?

(As suggested by Mateusz elsewhere.)

Best regards,
Kristof


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-09 Thread Kristof Provost

Peter,

I’m not interested in discussing software development methodology 
here.


Please drop me from this thread. Let me know if/when you have a test 
case I can work from.


Regards,
Kristof

On 9 Dec 2020, at 11:54, Peter wrote:


On Tue, Dec 08, 2020 at 07:51:07PM -0600, Kyle Evans wrote:

! You seem to have misinterpreted this; he doesn't want to narrow it
! down to one bug, he wants simple steps that he can follow to
! reproduce


Maybe I did misinterpret, but then I don't really understand it.
I would suppose that, when testing a proposed fix, the fact that it
breaks under the exact same conditions as before is all the
information needed at that point. Put in simple words: that it does
not work.

! any failure, preferably steps that can actually be followed by just
! about anyone and don't require immense amounts of setup time or
! additional hardware.

Engineering does not normally work that way.

I'll try to explain: when a bug is first encountered, it is necessary
to isolate it insofar that somebody who is knowledgeable of the code,
can actually reproduce it, in order to have a look at it and analyze
what causes the mis-happening.

If then a remedy is devised, and that does not work as expected, then
the flaw is in the analysis, and we just start over from there.

In fact, I would have expected somebody who is trying to fix such
kind of bug, to already have testing tools available and tell me
exactly which kind of data I might retrieve from the dumps.

The open question now is: am I the only one seeing these failures?
Might they be attributed to a faulty configuration or maybe hardware
issues or whatever?
We cannot know this, we can only watch out what happens at other
sites. And that is why I sent out all these backtraces - because they
appear weird and might be difficult to associate with this issue.

I don't think there is much more we can do at this point, unless we
were willing to actually look into the details.


Am I discouraging? Indeed, I think engineering is discouraging by
its very nature, and that's the fun of it: to overcome odds and
finally, maybe, make things better. And when we start to forget about
that, bad things begin to happen (anybody remember Apollo 13?).

But talking about discouragement: I usually try to track down
defects I encounter and, if possible, do a viable root-cause
analysis. I tended to be very willing to share the outcomes and, if
a solution arises, by all means get it back into the code base;
but I found that even ready-made patches for easy matters would
linger forever in the sendbug system without anybody caring, or, in
more complex cases where I would need some feedback from the original
writer, if only to clarify the purpose of some defaults or verify
that an approach is viable, that communication is very difficult to
establish. And that is what I would call discouraging, and I for
my part have accepted to just leave the developers in their ivory
tower and tend to my own business.


cheerio,
PMc



Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kristof Provost

On 9 Dec 2020, at 2:31, Peter wrote:

On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:

! > Sorry for the bad news.
! >
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the
production code.

In any case, my current workaround, i.e. delaying in the exec.poststop


exec.poststop = "
   sleep 6 ;
   /usr/sbin/ngctl shutdown ${ifname1l}: ;
   ";


helps for it all and makes the system behave solid. This is true
with and without Your patch.

! Can you reduce your netgraph use case to a small test case that can
! trigger the problem?

I'm sorry, I fear I don't get Your point.
Assuming there are actually two or three bugs here, You are asking me
to reduce the config so that it will trigger only one of them? Is that
correct?

No, we need a simple case to reproduce these problems. It’s fine if 
that test case triggers multiple issues.



Then let me put this differently: assume this is the OS for the life
support system of the manned Jupiter mission. Which one of the
bugs do You want to get fixed, and which would You prefer to keep,
and have it cut off Your oxygen supply?

https://www.youtube.com/watch?v=BEo2g-w545A


Happily we’re not in space.



! I’m not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that.
From Your former mail I get the impression that you prefer to rely
on tests. I consider this a bad habit[1] and prefer logical thinking.





(Background: it is not that I would be unwilling to create clean and
precisely reproducible scenarios. But one of my problems is that
currently I only have two machines available: the graphical one where
I'm just typing, and the backend server with the jails that does
practically everything.

These issues should trigger just fine in VMs. There’s no need for 
hardware pain.


Regards,
Kristof


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kristof Provost

On 8 Dec 2020, at 19:49, Peter wrote:

On Tue, Dec 08, 2020 at 04:50:00PM +0100, Kristof Provost wrote:
! Yeah, the bug is not exclusive to epair but that’s where it’s most
! easily seen.

Ack.

! Try http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch


Great, thanks a lot.

Now I have bad news: when playing yo-yo with the next three
application jails (with all their installed stuff), it took about
ten ups and downs and then I got this one:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80aad73c
stack pointer   = 0x28:0xfe003f80e810
frame pointer   = 0x28:0xfe003f80e810
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 15486 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607450838
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe003f80e4d0

vpanic() at vpanic+0x17b/frame 0xfe003f80e520
panic() at panic+0x43/frame 0xfe003f80e580
trap_fatal() at trap_fatal+0x391/frame 0xfe003f80e5e0
trap_pfault() at trap_pfault+0x4f/frame 0xfe003f80e630
trap() at trap+0x4cf/frame 0xfe003f80e740
calltrap() at calltrap+0x8/frame 0xfe003f80e740
--- trap 0xc, rip = 0x80aad73c, rsp = 0xfe003f80e810, rbp 
= 0xfe003f80e810 ---
ng_eiface_mediastatus() at ng_eiface_mediastatus+0xc/frame 
0xfe003f80e810

ifmedia_ioctl() at ifmedia_ioctl+0x174/frame 0xfe003f80e850
ifhwioctl() at ifhwioctl+0x639/frame 0xfe003f80e8d0
ifioctl() at ifioctl+0x448/frame 0xfe003f80e990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe003f80e9f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe003f80eac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe003f80ebf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 
0xfe003f80ebf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 
0x7fffe358, rbp = 0x7fffe450 ---

Uptime: 9m51s
Dumping 899 out of 3959 MB:

I decided to give it a second try, and this is what I did:

root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 8  kerb.***.org  /j/kerb
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail stop rail
Stopping jails: rail.
root@edge:/var/crash # service jail stop tele
Stopping jails: tele.
root@edge:/var/crash # service jail stop kerb
Stopping jails: kerb.
root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
root@edge:/var/crash # jls -d
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail start kerb
Starting jails:Fssh_packet_write_wait: Connection to 1*** port 
22: Broken pipe


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code  = supervisor read instruction, page not 
present

instruction pointer = 0x20:0x0
stack pointer   = 0x28:0xfe00540ea658
frame pointer   = 0x28:0xfe00540ea670
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13420 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607451910
KDB

Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kristof Provost

On 8 Dec 2020, at 0:34, Peter wrote:

Hi Kristof,
  it's great to read You!

On Mon, Dec 07, 2020 at 09:11:32PM +0100, Kristof Provost wrote:

! That smells a lot like the epair/vnet issues in bugs 238870, 234985,
! 244703, 250870.

epair? No. It is purely Netgraph here.

Yeah, the bug is not exclusive to epair but that’s where it’s most 
easily seen.


! I pushed a fix for that in CURRENT in r368237. It’s scheduled to go
! into stable/12 sometime next week, but it’d be good to know that it
! fixes your problem too before I merge it.
! In other words: can you test a recent CURRENT? It’s likely fixed
! there, and if it’s not I may be able to fix it quickly.


Oh my Gods. No offense meant, but this is not really a good time
for that. This is the most horrible upgrade I have experienced in 25
years of FreeBSD (and it was prepared; 12.2 ran fine on the other machine).

I have an issue with the memory config:
https://forums.freebsd.org/threads/fun-with-upgrading-sysctl-unknown-oid-vm-pageout_wakeup_thresh.77955/
I have an issue with a damaged filesystem, for no apparent reason:
https://forums.freebsd.org/threads/no-longer-fun-with-upgrading-file-offline.77959/

Then I have this issue here, which is now happily worked around:
https://forums.freebsd.org/threads/panic-12-2-does-not-work-with-jails.77962/post-486365

and when I then dare to have a look at my applications, they look like
sheer horror: segfaults all over, and I don't even know where to begin
with these.


Other option: can you make this fix so that I can patch it into 12.2
source and just redeploy?

Try http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch


That’s currently running the regression tests that used to provoke the 
panic nearly instantly, and no panics so far.
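For readers wanting to reproduce this class of bug, a hypothetical driver that only assembles the kind of create/destroy stress sequence the regression tests exercise: tearing down a vnet jail while its epair is being destroyed. The jail names and exact command order are invented for illustration; the commands are built as strings here, never executed.

```python
def stress_commands(iterations: int = 10) -> list[str]:
    """Build an illustrative jail/epair create-destroy command sequence
    that races vnet teardown against epair destruction (sketch only)."""
    cmds = []
    for i in range(iterations):
        cmds += [
            f"jail -c name=stress{i} vnet persist",  # create a vnet jail
            "ifconfig epair create",                 # create an epair pair
            f"ifconfig epair0a vnet stress{i}",      # move one end into the jail
            f"jail -r stress{i} &",                  # destroy the vnet...
            "ifconfig epair0b destroy &",            # ...while destroying the epair
            "wait",
        ]
    return cmds

print("\n".join(stress_commands(2)))
```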


Best regards.
Kristof


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-07 Thread Kristof Provost

On 7 Dec 2020, at 13:54, Peter wrote:

After a clean upgrade (from source) from 11.4 to 12.2-p1 my jails
no longer work correctly.

Old-fashioned jails seem to work, but most are VIMAGE+NETGRAPH style,
and those do not work properly.
All worked flawlessly for nearly a year with release 11.

If I start 2-3 jails, and then stop them again, there is always a
panic.
Also reproducible with GENERIC kernel.

Can this be fixed, or do I need to revert to 11.4?

The backtrace looks like this:

#4 0x810bbadf at trap_pfault+0x4f
#5 0x810bb23f at trap+0x4cf
#6 0x810933f8 at calltrap+0x8
#7 0x80cdd555 at _if_delgroup_locked+0x465
#8 0x80cdbfbe at if_detach_internal+0x24e
#9 0x80ce305c at if_vmove+0x3c
#10 0x80ce3010 at vnet_if_return+0x50
#11 0x80d0e696 at vnet_destroy+0x136
#12 0x80ba781d at prison_deref+0x27d
#13 0x80c3e38a at taskqueue_run_locked+0x14a
#14 0x80c3f799 at taskqueue_thread_loop+0xb9
#15 0x80b9fd52 at fork_exit+0x82
#16 0x8109442e at fork_trampoline+0xe

This is my typical jail config, designed and tested with Rel.11:

That smells a lot like the epair/vnet issues in bugs 238870, 234985, 
244703, 250870.
I pushed a fix for that in CURRENT in r368237. It’s scheduled to go 
into stable/12 sometime next week, but it’d be good to know that it 
fixes your problem too before I merge it.
In other words: can you test a recent CURRENT? It’s likely fixed 
there, and if it’s not I may be able to fix it quickly.


Best regards,
Kristof


Re: Commit 367705+367706 causes a panic

2020-11-23 Thread Kristof Provost

Peter,

Is that backtrace from the first or the second situation you describe? 
What kernel config are you using with that backtrace?


This backtrace does not appear to involve the bridge. Given that part of 
the panic message is cut off, it’s very hard to conclude anything at 
all from it.


Best regards,
Kristof

On 23 Nov 2020, at 11:52, Peter Blok wrote:


Kristof,

With commits 367705+367706 and if_bridge statically linked, it crashes 
while adding a jail's epair.


With commits 367705+367706 and if_bridge dynamically loaded, there is a 
crash at reboot.


#0 0x8069ddc5 at kdb_backtrace+0x65
#1 0x80652c8b at vpanic+0x17b
#2 0x80652b03 at panic+0x43
#3 0x809c8951 at trap_fatal+0x391
#4 0x809c89af at trap_pfault+0x4f
#5 0x809c7ff6 at trap+0x286
#6 0x809a1ec8 at calltrap+0x8
#7 0x8079f7ed at ip_input+0x63d
#8 0x8077a07a at netisr_dispatch_src+0xca
#9 0x8075a6f8 at ether_demux+0x138
#10 0x8075b9bb at ether_nh_input+0x33b
#11 0x8077a07a at netisr_dispatch_src+0xca
#12 0x8075ab1b at ether_input+0x4b
#13 0x8077a80b at swi_net+0x12b
#14 0x8061e10c at ithread_loop+0x23c
#15 0x8061afbe at fork_exit+0x7e
#16 0x809a2efe at fork_trampoline+0xe

Peter


On 21 Nov 2020, at 17:22, Peter Blok  wrote:

Kristof,

With a GENERIC kernel it does NOT happen. I do have a different iflib 
related panic at reboot, but I’ll report that separately.


I brought the two config files closer together and found out that if 
I remove if_bridge from the config file and have it loaded 
dynamically when the bridge is created, the problem no longer happens 
and everything works ok.


Peter


On 20 Nov 2020, at 15:53, Kristof Provost  wrote:

I still can’t reproduce that panic.

Does it happen immediately after you start a vnet jail?

Does it also happen with a GENERIC kernel?

Regards,
Kristof

On 20 Nov 2020, at 14:53, Peter Blok wrote:

The panic with the ipsec code in the backtrace was already very 
strange. I was using IPsec, but only on one interface, totally 
separate from the members of the bridge as well as the bridge 
itself. The jails were not doing any ipsec either. Note that that 
panic was a while ago; it was after the first bridge epochification 
was done on stable-12, which was later backed out.


Today the system is no longer using IPsec, but it is still compiled 
in. I can remove it if need be for a test.



src.conf
WITHOUT_KERBEROS=yes
WITHOUT_GSSAPI=yes
WITHOUT_SENDMAIL=true
WITHOUT_MAILWRAPPER=true
WITHOUT_DMAGENT=true
WITHOUT_GAMES=true
WITHOUT_IPFILTER=true
WITHOUT_UNBOUND=true
WITHOUT_PROFILE=true
WITHOUT_ATM=true
WITHOUT_BSNMP=true
#WITHOUT_CROSS_COMPILER=true
WITHOUT_DEBUG_FILES=true
WITHOUT_DICT=true
WITHOUT_FLOPPY=true
WITHOUT_HTML=true
WITHOUT_HYPERV=true
WITHOUT_NDIS=true
WITHOUT_NIS=true
WITHOUT_PPP=true
WITHOUT_TALK=true
WITHOUT_TESTS=true
WITHOUT_WIRELESS=true
#WITHOUT_LIB32=true
WITHOUT_LPR=true

make.conf
KERNCONF=BHYVE
MODULES_OVERRIDE=opensolaris dtrace zfs vmm nmdm if_bridge 
bridgestp if_vxlan pflog libmchain libiconv smbfs linux linux64 
linux_common linuxkpi linprocfs linsysfs ext2fs

DEFAULT_VERSIONS+=perl5=5.30 mysql=5.7 python=3.8 python3=3.8
OPTIONS_UNSET=DOCS NLS MANPAGES

BHYVE
cpu HAMMER
ident   BHYVE

makeoptions DEBUG=-g# Build kernel with gdb(1) debug symbols
makeoptions WITH_CTF=1  # Run ctfconvert(1) for DTrace support

options CAMDEBUG

options SCHED_ULE   # ULE scheduler
options PREEMPTION  # Enable kernel thread preemption
options INET# InterNETworking
options INET6   # IPv6 communications protocols
options IPSEC
options TCP_OFFLOAD # TCP offload
options TCP_RFC7413 # TCP FASTOPEN
options SCTP# Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL# Enable gjournal-based UFS journaling
options QUOTA   # Enable disk quotas for UFS
options SUIDDIR
options NFSCL   # Network Filesystem Client
options NFSD# Network Filesystem Server
options NFSLOCKD# Network Lock Manager
options MSDOSFS # MSDOS Filesystem
options CD9660  # ISO 9660 Filesystem
options FUSEFS
options NULLFS  # NULL filesystem
options UNIONFS
options FDESCFS # File descriptor filesystem
options PROCFS  # Process

 capabilities
options MAC # TrustedBSD MAC Framework
options MAC_PORTACL
options MAC_NTPD
options KDTRACE_FRAME   # Ensure frames are compiled in
options KDTRACE_HOOKS   # Kernel DTrace hooks
options DDB_CTF # Kernel ELF linker loads CTF data
options INCLUDE_CONFIG_FILE # Include this file in kernel

# Debugging support.  Always need this:
options KDB # Enable kernel debugger support.
options KDB_TRACE   # Print a stack trace for a panic.
options KDB_UNATTENDED

# Make an SMP-capable kernel by default
options SMP # Symmetric MultiProcessor Kernel
options EARLY_AP_STARTUP

# CPU frequency control
device  cpufreq
device  cpuctl
device  coretemp

# Bus support.
device  acpi
options ACPI_DMAR
device  pci
options PCI_IOV # PCI SR-IOV support

device  iicbus
device  iicbb

device  iic
device  ic
device  iicsmb

device  ichsmb
device  smbus
device  smb

#device jedec_dimm

# ATA controllers
device  ahci# AHCI-compatible SATA controllers
device  mvs # Marvell 
88SX50XX/88SX60XX/88SX70XX/SoC SATA

# SCSI Controllers
device  mps # LSI-Logic MPT-Fusion 2

# ATA/SCSI peripherals
device  scbus   # SCSI bus (required for ATA/SCSI)
device  da  # Direct Access (disks)
device  cd  # CD
device  pass# Passthrough device (direct ATA/SCSI 
access)
device  ses # Enclosure Services (SES and SAF-TE)
device  sg

device  cfiscsi
device  ctl # CAM Target Layer
device  iscsi

# atkbdc0 controls both the keyboard and the PS/2 mouse
device  atkbdc  # AT keyboard controller
device  atkbd   # AT keyboard
device  psm # PS/2 mouse

device  kbdmux  # keyboard multiplexer

# vt is the new video console driver
device  vt
device  vt_vga
device  vt_efifb

# Serial (COM) ports
device  uart# Generic UART driver

# PCI/PCI-X/PCIe Ethernet NICs that use iflib infrastructure
device  iflib
device  em  # Intel PRO/1000 Gigabit Ethernet Family
device  ix  # Intel PRO/10GbE PCIE PF Ethernet

# Network stack virtualization.
options VIMAGE

# Pseudo devices.
device  crypto
device  cryptodev
device  loop# Network loopback
device  random  # Entropy device
device  padlock_rng # VIA Padlock RNG
device  rdrand_rng  # Intel Bull Mountain RNG
device  ipmi
device  smbios
device  vpd
device  aesni   # AES-NI OpenCrypto module
device  ether   # Ethernet support
device  lagg
device  vlan# 802.1Q VLAN support
device  tuntap  # Packet tunnel.
device  md  # Memory "disks"
device  gif # IPv6 and IPv4 tunneling
device  firmware# firmware assist module

device  pf
#device pflog
#device pfsync

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device  bpf # Berkeley packet filter

# The `epair' device implements a virtual back-to-back connected 
Ethernet

# like interface pair.
device  epair

# USB support
options USB_DEBUG   # enable debug msgs
device  uhci# UHCI PCI->USB interface
device  ohci# OHCI PCI->USB interface
device  ehci# EHCI PCI->USB interface (USB 2.0)
device  xhci# XHCI PCI->USB interface (USB 3.0)
device  usb # USB Bus (required)
device  uhid
device  ukbd# Keyboard
device  umass   # Disks/Mass storage - Requires scbus 
and da
device  ums

device  filemon

device  if_bridge


On 20 Nov 2020, at 12:53, Kristof Provost  wrote:

Can you share your kernel config file (and src.conf / make.conf if 
they exist)?


This second panic is in the IPSec code. My current thinking is that 
your kernel config is triggering a bug that’s manifesting in 
multiple places, but not actually caused by those places.

Re: Commit 367705+367706 causes a panic

2020-11-20 Thread Kristof Provost
Can you share your kernel config file (and src.conf / make.conf if they 
exist)?


This second panic is in the IPSec code. My current thinking is that your 
kernel config is triggering a bug that’s manifesting in multiple 
places, but not actually caused by those places.


I’d like to be able to reproduce it so we can debug it.

Best regards,
Kristof

On 20 Nov 2020, at 12:02, Peter Blok wrote:

Hi Kristof,

This is 12-stable. My config also panicked with the previous bridge 
epochification, the one that was backed out.


I don’t have any local modifications. I did a clean rebuild after 
removing /usr/obj/usr


My kernel is custom - I only have zfs.ko, opensolaris.ko, vmm.ko and 
nmdm.ko as modules. Everything else is statically linked. I have 
removed all drivers not needed for the hardware at hand.


My bridge connects two VLANs from the same trunk, the jail epair 
devices, and the bhyve tap devices.


The panic happens when the jails are starting.

I can try to narrow it down over the weekend and make the crash dump 
available for analysis.


Previously I had the following crash with 363492

kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x0410
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80692326
stack pointer   = 0x28:0xfe00c06097b0
frame pointer   = 0x28:0xfe00c06097f0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 2030 (ifconfig)
trap number = 12
panic: page fault
cpuid = 2
time = 1595683412
KDB: stack backtrace:
#0 0x80698165 at kdb_backtrace+0x65
#1 0x8064d67b at vpanic+0x17b
#2 0x8064d4f3 at panic+0x43
#3 0x809cc311 at trap_fatal+0x391
#4 0x809cc36f at trap_pfault+0x4f
#5 0x809cb9b6 at trap+0x286
#6 0x809a5b28 at calltrap+0x8
#7 0x803677fd at ck_epoch_synchronize_wait+0x8d
#8 0x8069213a at epoch_wait_preempt+0xaa
#9 0x807615b7 at ipsec_ioctl+0x3a7
#10 0x8075274f at ifioctl+0x47f
#11 0x806b5ea7 at kern_ioctl+0x2b7
#12 0x806b5b4a at sys_ioctl+0xfa
#13 0x809ccec7 at amd64_syscall+0x387
#14 0x809a6450 at fast_syscall_common+0x101





On 20 Nov 2020, at 11:30, Kristof Provost  wrote:

On 20 Nov 2020, at 11:18, peter.b...@bsd4all.org wrote:
I’m afraid the last epoch fix for bridge is not solving the 
problem (or perhaps it creates a new one).



We’re talking about the stable/12 branch, right?


This seems to happen when the jail epair is added to the bridge.

There must be something more to it than that. I’ve run the bridge 
tests on stable/12 without issue, and this is a problem we didn’t 
see when the bridge epochification initially went into stable/12.


Do you have a custom kernel config? Other patches? What exact 
commands do you run to trigger the panic?



kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address   = 0xc10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80695e76
stack pointer   = 0x28:0xfe00bf14e6e0
frame pointer   = 0x28:0xfe00bf14e720
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 1686 (jail)
trap number = 12
panic: page fault
cpuid = 6
time = 1605811310
KDB: stack backtrace:
#0 0x8069bb85 at kdb_backtrace+0x65
#1 0x80650a4b at vpanic+0x17b
#2 0x806508c3 at panic+0x43
#3 0x809d0351 at trap_fatal+0x391
#4 0x809d03af at trap_pfault+0x4f
#5 0x809cf9f6 at trap+0x286
#6 0x809a98c8 at calltrap+0x8
#7 0x80368a8d at ck_epoch_synchronize_wait+0x8d
#8 0x80695c8a at epoch_wait_preempt+0xaa
#9 0x80757d40 at vnet_if_init+0x120
#10 0x8078c994 at vnet_alloc+0x114
#11 0x8061e3f7 at kern_jail_set+0x1bb7
#12 0x80620190 at sys_jail_set+0x40
#13 0x809d0f07 at amd64_syscall+0x387
#14 0x809aa1ee at fast_syscall_common+0xf8


This panic is rather odd. This isn’t even the bridge code. This is 
during initial creation of the vnet. I don’t really see how this 
could even trigger panics.
That panic looks as if something corrupted the net_epoch_preempt, by 
overwriting the epoch->e_epoch. The bridge patches only access this 
variable through the well-established functions and macros. I see no 
obvious way that they could corrupt it.


Best regards,
Kristof



___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.or

Re: Commit 367705+367706 causes a panic

2020-11-20 Thread Kristof Provost

On 20 Nov 2020, at 11:18, peter.b...@bsd4all.org wrote:
I’m afraid the last Epoch fix for bridge is not solving the problem 
( or perhaps creates a new ).



We’re talking about the stable/12 branch, right?


This seems to happen when the jail epair is added to the bridge.

There must be something more to it than that. I’ve run the bridge 
tests on stable/12 without issue, and this is a problem we didn’t see 
when the bridge epochification initially went into stable/12.


Do you have a custom kernel config? Other patches? What exact commands 
do you run to trigger the panic?



kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address   = 0xc10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80695e76
stack pointer   = 0x28:0xfe00bf14e6e0
frame pointer   = 0x28:0xfe00bf14e720
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 1686 (jail)
trap number = 12
panic: page fault
cpuid = 6
time = 1605811310
KDB: stack backtrace:
#0 0x8069bb85 at kdb_backtrace+0x65
#1 0x80650a4b at vpanic+0x17b
#2 0x806508c3 at panic+0x43
#3 0x809d0351 at trap_fatal+0x391
#4 0x809d03af at trap_pfault+0x4f
#5 0x809cf9f6 at trap+0x286
#6 0x809a98c8 at calltrap+0x8
#7 0x80368a8d at ck_epoch_synchronize_wait+0x8d
#8 0x80695c8a at epoch_wait_preempt+0xaa
#9 0x80757d40 at vnet_if_init+0x120
#10 0x8078c994 at vnet_alloc+0x114
#11 0x8061e3f7 at kern_jail_set+0x1bb7
#12 0x80620190 at sys_jail_set+0x40
#13 0x809d0f07 at amd64_syscall+0x387
#14 0x809aa1ee at fast_syscall_common+0xf8


This panic is rather odd. This isn’t even the bridge code. This is 
during initial creation of the vnet. I don’t really see how this could 
even trigger panics.
That panic looks as if something corrupted the net_epoch_preempt, by 
overwriting the epoch->e_epoch. The bridge patches only access this 
variable through the well-established functions and macros. I see no 
obvious way that they could corrupt it.


Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: pf and hnX interfaces

2020-10-13 Thread Kristof Provost

On 13 Oct 2020, at 14:02, Eugene M. Zheganin wrote:

Hello,

On 13.10.2020 14:19, Kristof Provost wrote:

Are these symptoms of a bug ?



Perhaps. It can also be a symptom of resource exhaustion.
Are there any signs of memory allocation failures, or incrementing 
error counters (in netstat or in pfctl)?




Well, the only signs of resource exhaustion I know so far are:

- "PF state limit reached" in /var/log/messages (none so far)

- mbufs starvation in netstat -m (zero so far)

- various queue failure counters in netstat -s -p tcp, but since this 
only applies to TCP this is hardly related (although it seems like 
there's also none).



so, what should I take a look at ?


Disabled PF shows in pfctl -s info:


[root@gw1:/var/log]# pfctl -s info
Status: Disabled for 0 days 00:41:42  Debug: Urgent

State Table                          Total             Rate
  current entries                     9634
  searches                     24212900618      9677418.3/s
  inserts                        222708269        89012.1/s
  removals                       222698635        89008.2/s
Counters
  match                          583327668       233144.6/s
  bad-offset                             0            0.0/s
  fragment                               1            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                          76057           30.4/s
  proto-cksum                         9669            3.9/s
  state-mismatch                   3007108         1201.9/s
  state-insert                       13236            5.3/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
  map-failed                             0            0.0/s



What’s your current state limit? You’re getting a lot of 
state-mismatches. (Also note that ip-options and proto-cksum also 
indicate dropped packets.)


If you set pfctl -x loud you should get reports for those state 
mismatches. There’ll be a lot though, so maybe pick a quiet time to do 
that.
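[Editor's sketch of the two checks suggested above — the state limit and the debug level — using standard pfctl invocations; the log location assumes the default syslogd configuration:]

```shell
# Show the configured limits; the "states hard limit" (default 10000)
# is the one that matters for state-mismatch investigations:
pfctl -s memory

# Log details for every state mismatch:
pfctl -x loud
tail -f /var/log/messages

# Return to the default debug level (shown as "Debug: Urgent" above)
# once you have captured enough:
pfctl -x urgent
```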


Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: pf and hnX interfaces

2020-10-13 Thread Kristof Provost

On 13 Oct 2020, at 10:58, Eugene M. Zheganin wrote:
I'm running a FreeBSD 12.1 server as a VM under Hyper-V. And although
this letter will make the impression of another lame post blaming
FreeBSD for all of the issues while the author should blame himself,
I'm at the moment out of any other explanation. The thing is: I'm
getting loads of sendmail errors like:



===Cut===

Oct 13 13:49:33 gw1 sm-mta[95760]: 09D8mN2P092173: SYSERR(root): 
putbody: write error: Permission denied
Oct 13 13:49:33 gw1 sm-mta[95760]: 09D8mN2P092173: SYSERR(root): 
timeout writing message to .mail.protection.outlook.com.: 
Permission denied


===Cut===

A “Permission denied” on outbound packets can indeed happen when pf 
decides to block the packet.


The relay address is just random. The thing is, I can successfully
connect to it via telnet, and even send some commands. But when this is
done by sendmail - and when it's actually sending messages - I get
random errors. At first I was blaming myself and trying to find the
rule that actually blocks something. I ended up with none of the block
rules lacking a log clause, and at the same time tcpdump -netti pflog0
shows no dropped packets, but sendmail still eventually complains.


If it matters, I have a relatively high packet rate on this interface,
about 25 Kpps.


I've also found several postings mentioning that hnX handles the TSO
and LRO modes badly, so I switched those off. No luck, however; and
vlanhwtag and vlanmtu for some reason just cannot be switched off.
if_hn also lacks a man page for some reason, so it's unclear how to
tune it right.


While it’s possible that there are issues with TSO/LRO those 
wouldn’t look like this. (As an aside, I am interested in any 
reproducible setups where pf has issues with TSO/LRO. As far as I’ve 
been able to see all such issues have been resolved.)


And the most mysterious part  - when I switch the pf off, the errors 
stops to appear. This would clearly mean that pf blocks some packets, 
but then again, this way the pflog0 would show them up, right (and yes 
- it's "UP" )?


It’s possible for pf to drop packets without triggering log rules. For 
example, if pf decides to drop the packet before it matches any rule 
(e.g. it’s a corrupt packet) it won’t show up in pflog.
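[Editor's note: such pre-rule drops still increment pf's counters, so a sketch of how to spot them on the affected host is:]

```shell
# Drops that happen before any rule is evaluated never reach pflog,
# but they do bump these counters:
pfctl -s info | grep -E 'bad-offset|fragment|short|normalize|proto-cksum|state-mismatch'

# Run it again a few minutes later; any counter that climbs while
# sendmail reports "Permission denied" is a likely culprit.
```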



Is there some issue with pf and hn interfaces that I'm unaware about?

There’s no interface specific code in pf, so it wouldn’t be specific 
to hn interfaces.



Are these symptoms of a bug ?


Perhaps. It can also be a symptom of resource exhaustion.
Are there any signs of memory allocation failures, or incrementing error 
counters (in netstat or in pfctl)?


Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: net.pf.request_maxcount: UNDESIRABLE_OID

2020-08-21 Thread Kristof Provost

On 21 Aug 2020, at 8:56, Kristof Provost wrote:

On 21 Aug 2020, at 8:53, Chris wrote:

But why must it be a read-only OID?

It doesn’t have to be, and in CURRENT it’s not: 
https://svnweb.freebsd.org/base?view=revision&revision=355744

That hasn’t been MFC’d for the excellent reason that I forgot.

I’ll try to do that today, after I fix my dev-VM.


And done in r364456.

Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: net.pf.request_maxcount: UNDESIRABLE_OID

2020-08-21 Thread Kristof Provost

On 21 Aug 2020, at 8:53, Chris wrote:

On Fri, 21 Aug 2020 08:33:16 +0200 Kristof Provost k...@freebsd.org said


Hi Chris,

Hello, Kristof. Thanks for the reply.
Nice name BTW. ;-)


On 21 Aug 2020, at 2:40, Chris wrote:
> We've been developing an appliance/server based on FreeBSD &&
> pf(4). We started some time ago, and have been using a very
> early version of 12. We're now collecting some 20,000,000
> IPs/month. So we're satisfied we're close to releasing. As
> such, we needed to bring the release up to a supported
> (freebsd) version (12-STABLE). We would have done so sooner.
> But we need a stable (unchanging) testbed to evaluate what
> we're working on.
> We built and deployed a copy of 12-STABLE @r363918 that
> contained our work with pf(4). Booting into it failed
> unexpectedly with: cannot define table nets: too many
> elements. Consider increasing net.pf.request_maxcount.
> pfctl: Syntax error in config file: pf rules not loaded
> OK this didn't happen on our testbed prior to the upgrade
> with a combined count of ~97,000,900 IPs. In fact the OID
> mentioned didn't exist.
> For reference; our testbed provides DNS, www, mail for
> ~60 domains/hosts, as well as our pf(4) testing. We can
> happily load our tables, and run these services w/8Gb
> RAM.
> This OID is more a problem than a savior. Why not simply
> return ENOMEM?
>
To quote the commit message:

pf ioctls frequently take a variable number of elements as argument.
This can potentially allow users to request very large allocations.
These will fail, but even a failing M_NOWAIT might tie up resources
and result in concurrent M_WAITOK allocations entering vm_wait and
inducing reclamation of caches.

Limit these ioctls to what should be a reasonable value, but allow
users to tune it should they need to.

Now that pf can be used in vnet jails there’s a possibility of an 
attacker using pf to deny service to other jails (or the host) by 
exhausting memory. Imposing limits on pf request sizes mitigates 
this.

Hadn't considered vnet. Thanks for mentioning it.
But why must it be a read-only OID?

It doesn’t have to be, and in CURRENT it’s not: 
https://svnweb.freebsd.org/base?view=revision&revision=355744

That hasn’t been MFC’d for the excellent reason that I forgot.

I’ll try to do that today, after I fix my dev-VM.

Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: net.pf.request_maxcount: UNDESIRABLE_OID

2020-08-21 Thread Kristof Provost

Hi Chris,

On 21 Aug 2020, at 2:40, Chris wrote:

We've been developing an appliance/server based on FreeBSD &&
pf(4). We started some time ago, and have been using a very
early version of 12. We're now collecting some 20,000,000
IPs/month. So we're satisfied we're close to releasing. As
such, we needed to bring the release up to a supported
(freebsd) version (12-STABLE). We would have done so sooner.
But we need a stable (unchanging) testbed to evaluate what
we're working on.
We built and deployed a copy of 12-STABLE @r363918 that
contained our work with pf(4). Booting into it failed
unexpectedly with: cannot define table nets: too many
elements. Consider increasing net.pf.request_maxcount.
pfctl: Syntax error in config file: pf rules not loaded
OK this didn't happen on our testbed prior to the upgrade
with a combined count of ~97,000,900 IPs. In fact the OID
mentioned didn't exist.
For reference; our testbed provides DNS, www, mail for
~60 domains/hosts, as well as our pf(4) testing. We can
happily load our tables, and run these services w/8Gb
RAM.
This OID is more a problem than a savior. Why not simply
return ENOMEM?


To quote the commit message:

pf ioctls frequently take a variable number of elements as argument.
This can potentially allow users to request very large allocations.
These will fail, but even a failing M_NOWAIT might tie up resources
and result in concurrent M_WAITOK allocations entering vm_wait and
inducing reclamation of caches.

Limit these ioctls to what should be a reasonable value, but allow
users to tune it should they need to.

Now that pf can be used in vnet jails there’s a possibility of an 
attacker using pf to deny service to other jails (or the host) by 
exhausting memory. Imposing limits on pf request sizes mitigates this.



Isn't that what it used to do? pf.conf(5)
already facilitates thresholds, and they aren't _read
only_. Is there any way to turn this OID off; like using
a -1 value? Or will we need to simply back out the commit?

You can functionally disable it by setting a very large value. Try 
setting 4294967295.
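[Editor's sketch of the workaround suggested above; note that on kernels where the OID is still read-only (stable/12 before the MFC) the first command will fail:]

```shell
# Effectively disable the request-size limit by raising it to UINT32_MAX:
sysctl net.pf.request_maxcount=4294967295

# Make the setting persistent across reboots:
echo 'net.pf.request_maxcount=4294967295' >> /etc/sysctl.conf
```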


Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: CFT: if_bridge performance improvements

2020-04-24 Thread Kristof Provost

On 22 Apr 2020, at 18:15, Xin Li wrote:

On 4/22/20 01:45, Kristof Provost wrote:

On 22 Apr 2020, at 10:20, Xin Li wrote:

Hi,

On 4/14/20 02:51, Kristof Provost wrote:

Hi,

Thanks to support from The FreeBSD Foundation I’ve been able to 
work on

improving the throughput of if_bridge.
It changes the (data path) locking to use the NET_EPOCH 
infrastructure.

Benchmarking shows substantial improvements (x5 in test setups).

This work is ready for wider testing now.

It’s under review here: https://reviews.freebsd.org/D24250

Patch for CURRENT: https://reviews.freebsd.org/D24250?download=true
Patches for stable/12:
https://people.freebsd.org/~kp/if_bridge/stable_12/

I’m not currently aware of any panics or issues resulting from 
these

patches.


I have observed the following panic with the latest stable/12 after
applying the stable_12 patchset. It appears to be a race condition
involving a NULL pointer dereference, but I haven't taken a deeper
look yet.

The box has 7 igb(4) NICs, with several bridges and VLANs configured,
acting as a router.  Please let me know if you need additional
information; I can try -CURRENT as well, but it would take some time as
the box is relatively slow (it's a ZFS-based system, so I can create a
separate boot environment for -CURRENT if needed, but that would take
some time as I might have to upgrade the packages, should there be any
ABI breakages).

Thanks for the report. I don’t immediately see how this could 
happen.


Are you running an L2 firewall on that bridge by any chance? An 
earlier
version of the patch had issues with a stray unlock in that code 
path.


I don't think I have an L2 firewall (I assume that means filtering based
on MAC address, like what can be done with e.g. ipfw?  The bridges were
created on vlan interfaces though; do they count as an L2 firewall?);
the system is using pf with a few NAT rules:



That backtrace looks identical to the one Peter reported, up to and 
including the offset in the bridge_input() function.
Given that there’s no likely way to end up with a NULL mutex either I 
have to assume that it’s a case of trying to unlock a locked mutex, 
and the most likely reason is that you ran into the same problem Peter 
ran into.


The current version of the patch should resolve it.

Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: CFT: if_bridge performance improvements

2020-04-22 Thread Kristof Provost

On 22 Apr 2020, at 10:20, Xin Li wrote:

Hi,

On 4/14/20 02:51, Kristof Provost wrote:

Hi,

Thanks to support from The FreeBSD Foundation I’ve been able to 
work on

improving the throughput of if_bridge.
It changes the (data path) locking to use the NET_EPOCH 
infrastructure.

Benchmarking shows substantial improvements (x5 in test setups).

This work is ready for wider testing now.

It’s under review here: https://reviews.freebsd.org/D24250

Patch for CURRENT: https://reviews.freebsd.org/D24250?download=true
Patches for stable/12: 
https://people.freebsd.org/~kp/if_bridge/stable_12/


I’m not currently aware of any panics or issues resulting from 
these

patches.


I have observed the following panic with the latest stable/12 after
applying the stable_12 patchset. It appears to be a race condition
involving a NULL pointer dereference, but I haven't taken a deeper
look yet.

The box has 7 igb(4) NICs, with several bridges and VLANs configured,
acting as a router.  Please let me know if you need additional
information; I can try -CURRENT as well, but it would take some time as
the box is relatively slow (it's a ZFS-based system, so I can create a
separate boot environment for -CURRENT if needed, but that would take
some time as I might have to upgrade the packages, should there be any
ABI breakages).


Thanks for the report. I don’t immediately see how this could happen.

Are you running an L2 firewall on that bridge by any chance? An earlier 
version of the patch had issues with a stray unlock in that code path.


Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: CFT: if_bridge performance improvements

2020-04-16 Thread Kristof Provost

On 16 Apr 2020, at 10:36, Peter Blok wrote:
Another issue I found with pf was with “set skip on bridge”. It
doesn’t work on the interface group unless a bridge exists prior to
enabling pf. Makes sense, but I didn’t think of it. Other rules work
fine with interface groups.



I am aware of this problem and have unfinished work to fix it.

No promises about a timeline though.

Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: CFT: if_bridge performance improvements

2020-04-16 Thread Kristof Provost

Hi Mark,

I wouldn’t expect these changes to make a difference in the 
performance of this setup.
My work mostly affects setups with multi-core systems that see a lot of 
traffic. Even before these changes I’d expect the if_bridge code to 
saturate a wifi link easily.


I also wouldn’t expect ng_bridge vs. if_bridge to make a significant 
difference in wifi features.


Best regards,
Kristof

On 16 Apr 2020, at 3:56, Mark Saad wrote:


Kristof
  Up until a month ago I ran a set of FreeBSD-based APs in my house,
and even long ago at work. They were PC Engines APUs or ALIXes with
one em/igb NIC and one ath NIC in a bridge. They worked well for a
long time; however, the need for a more robust wifi setup caused me to
swap them out with COTS APs from TP-Link. The major issues were the
lack of WiFi features and standards that work OOB on Linux-based APs.

So I always wanted to experiment with ng_bridge vs. if_bridge for the
same task, but I never got around to it. Do you have any insight into
using one vs. the other? IMHO if_bridge is easier to set up and get
working.



---
Mark Saad | nones...@longcount.org


On Apr 15, 2020, at 1:37 PM, Kristof Provost  wrote:

On 15 Apr 2020, at 19:16, Mark Saad wrote:

All
  Should this improve wifi to wired bridges in some way ? Has this 
been tested ?


What sort of setup do you have to bridge wired and wireless? Is the 
FreeBSD box also a wifi AP?


I’ve not done any tests involving wifi.

Best regards,
Kristof

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: CFT: if_bridge performance improvements

2020-04-16 Thread Kristof Provost

On 16 Apr 2020, at 8:34, Pavel Timofeev wrote:

Hi!
Thank you for your work!
Do you know if epair suffers from the same issue as tap?

I’ve not tested it, but I believe that epair scales significantly 
better than tap.
It has a per-cpu mutex (or more accurately, a mutex in each of its 
per-cpu structures), so I’d expect much better throughput from epair 
than you’d see from tap.
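[Editor's illustration of the per-CPU locking pattern described above — a small user-space sketch, not the actual epair(4) code; all names here are invented:]

```python
import threading

NCPU = 4  # pretend CPU count for the illustration

class PerCpuQueue:
    """One queue per 'CPU', each protected by its own lock, so enqueues
    on different CPUs never contend on a single global lock."""
    def __init__(self):
        self.lock = threading.Lock()
        self.packets = []

queues = [PerCpuQueue() for _ in range(NCPU)]

def enqueue(cpu, pkt):
    q = queues[cpu]          # pick this CPU's private queue
    with q.lock:             # contends only with users of the same queue
        q.packets.append(pkt)

# Simulate four CPUs each enqueueing 250 packets concurrently.
threads = [
    threading.Thread(target=lambda c=c: [enqueue(c, i) for i in range(250)])
    for c in range(NCPU)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(len(q.packets) for q in queues))  # all 1000 packets accounted for
```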


Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: CFT: if_bridge performance improvements

2020-04-15 Thread Kristof Provost

On 15 Apr 2020, at 19:16, Mark Saad wrote:

All
   Should this improve wifi to wired bridges in some way ? Has this 
been tested ?


What sort of setup do you have to bridge wired and wireless? Is the 
FreeBSD box also a wifi AP?


I’ve not done any tests involving wifi.

Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: FreeBSD CI Weekly Report 2020-04-12

2020-04-15 Thread Kristof Provost

On 15 Apr 2020, at 16:49, Olivier Cochard-Labbé wrote:
On Wed, Apr 15, 2020 at 4:10 PM Kristof Provost  
wrote:




The problem appears to be that
/usr/local/lib/python3.7/site-packages/scapy/arch/unix.py is misparsing
the `netstat -rnW` output.



Shouldn't scapy use the libxo output of netstat to mitigate this
regression?



That would likely help, yes. I’m going to leave that decision up to 
the maintainer, because I’m not going to do the work :)


I’m also not sure how “stable” we want the netstat output to be.
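[Editor's illustration of why libxo output would sidestep the column-parsing problem: a minimal sketch that pulls routes out of libxo-style JSON. The JSON shape and field names below are assumptions trimmed down for illustration, not an authoritative rendering of what `netstat --libxo json -rn` emits:]

```python
import json

# Trimmed, illustrative sample of libxo JSON routing output.
# CAUTION: field names here are assumed for the sketch; check the
# real `netstat --libxo json -rn` output before relying on them.
SAMPLE = """
{
  "statistics": {
    "route-information": {
      "route-table": {
        "rt-family": [
          {
            "address-family": "Internet",
            "rt-entry": [
              {"destination": "127.0.0.1", "gateway": "link#2",
               "flags": "UH", "interface-name": "lo0"},
              {"destination": "192.0.2.0/24", "gateway": "link#4",
               "flags": "U", "interface-name": "epair0a"}
            ]
          }
        ]
      }
    }
  }
}
"""

def routes(doc):
    """Yield (destination, interface) pairs from libxo-style JSON routing
    output -- no fragile fixed-column text parsing involved."""
    table = json.loads(doc)["statistics"]["route-information"]["route-table"]
    for family in table["rt-family"]:
        for entry in family["rt-entry"]:
            yield entry["destination"], entry["interface-name"]

print(sorted(routes(SAMPLE)))
```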

Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: FreeBSD CI Weekly Report 2020-04-12

2020-04-15 Thread Kristof Provost

On 15 Apr 2020, at 15:34, Kristof Provost wrote:

On 15 Apr 2020, at 0:37, Li-Wen Hsu wrote:
(Please send the followup to freebsd-testing@ and note Reply-To is 
set.)


FreeBSD CI Weekly Report 2020-04-12
===

Here is a summary of the FreeBSD Continuous Integration results for 
the period

from 2020-04-06 to 2020-04-12.

During this period, we have:

* 1801 builds (94.0% (+0.4) passed, 6.0% (-0.4) failed) of buildworld and
  buildkernel (GENERIC and LINT) were executed on aarch64, amd64, armv6,
  armv7, i386, mips, mips64, powerpc, powerpc64, powerpcspe, riscv64,
  sparc64 architectures for head, stable/12, stable/11 branches.
* 288 test runs (25.1% (-24.6) passed, 29.9% (+10.6) unstable, 45.1% (+14.1)
  exception) were executed on amd64, i386, riscv64 architectures for head,
  stable/12, stable/11 branches.
* 30 doc and www builds (83.3% (-1.3) passed, 16.7% (+1.3) failed)

Test case status (on 2020-04-12 23:59):
| Branch/Architecture | Total     | Pass       | Fail     | Skipped  |
| ------------------- | --------- | ---------- | -------- | -------- |
| head/amd64          | 7744 (+4) | 7638 (+19) | 14 (+5)  | 92 (-20) |
| head/i386           | 7742 (+4) | 7628 (+15) | 16 (+5)  | 98 (-16) |
| 12-STABLE/amd64     | 7508 (0)  | 7449 (-3)  | 1 (+1)   | 58 (+2)  |
| 12-STABLE/i386      | 7506 (0)  | 7425 (-17) | 2 (+2)   | 79 (+15) |
| 11-STABLE/amd64     | 6882 (0)  | 6829 (-6)  | 1 (+1)   | 52 (+5)  |
| 11-STABLE/i386      | 6880 (0)  | 6749 (-82) | 80 (+80) | 51 (+2)  |


(The statistics from experimental jobs are omitted)

If any of the issues found by CI are in your area of interest or 
expertise

please investigate the PRs listed below.

The latest web version of this report is available at
https://hackmd.io/@FreeBSD-CI/report-20200412 and archive is 
available at

https://hackmd.io/@FreeBSD-CI/ , any help is welcome.

## News

* The test env now loads the required module for firewall tests.

* New armv7 job is added (to replace armv6 one):
  * FreeBSD-head-armv7-testvm
  The images are available at https://artifact.ci.freebsd.org
  FreeBSD-head-armv7-test is ready but needs test env update.

## Failing jobs

* https://ci.freebsd.org/job/FreeBSD-head-amd64-gcc6_build/
  * See console log for the error details.

## Failing tests

* https://ci.freebsd.org/job/FreeBSD-head-amd64-test/
  * local.kyua.integration.cmd_about_test.topic__authors__installed
  * sys.netipsec.tunnel.empty.v4
  * sys.netipsec.tunnel.empty.v6
  * sys.netpfil.common.forward.ipf_v4
  * sys.netpfil.common.forward.ipfw_v4
  * sys.netpfil.common.forward.pf_v4
  * sys.netpfil.common.tos.ipfw_tos
  * sys.netpfil.common.tos.pf_tos
  * sys.netpfil.pf.forward.v4
I can’t actually reproduce this failure in my test VM, but with the 
CI test VM I can reproduce the problem.
It’s not related to pf, because the sanity check ping we do before 
we set up pf already fails.
Or rather pft_ping.py sends an incorrect packet, because `ping` does 
get the packet to go where it’s supposed to go.


Scapy seems to fail to find the source IP address, so we get this:

	12:12:22.152652 IP 0.0.0.0 > 198.51.100.3: ICMP echo request, id 0, seq 0, length 12


I have a vague recollection that we’ve seen this problem before, but 
I can’t remember what we did about it.


In all likelihood most of the other netpfil tests fail for exactly the 
same reason.


The problem appears to be that 
/usr/local/lib/python3.7/site-packages/scapy/arch/unix.py is misparsing 
the `netstat -rnW` output.


For reference, this is the output in the test VM:

Routing tables

Internet:
Destination        Gateway     Flags   Nhop#    Mtu  Netif    Expire
127.0.0.1          link#2      UH          1  16384  lo0
192.0.2.0/24       link#4      U           2   1500  epair0a
192.0.2.1          link#4      UHS         1  16384  lo0
198.51.100.0/24    192.0.2.2   UGS         3   1500  epair0a

Internet6:
Destination                        Gateway   Flags   Nhop#    Mtu  Netif    Expire
::/96                              ::1       UGRS        4  16384  lo0
::1                                link#2    UH          1  16384  lo0
::ffff:0.0.0.0/96                  ::1       UGRS        4  16384  lo0
fe80::/10                          ::1       UGRS        4  16384  lo0
fe80::%lo0/64                      link#2    U           3  16384  lo0
fe80::1%lo0                        link#2    UHS         2  16384  lo0
fe80::%epair0a/64                  link#4    U           5   1500  epair0a
fe80::3d:9dff:fe7c:d70a%epair0a    link#4    UHS         1  16384  lo0
fe80::%epair1a/64                  link#6    U           6      1

Re: FreeBSD CI Weekly Report 2020-04-12

2020-04-15 Thread Kristof Provost

On 15 Apr 2020, at 0:37, Li-Wen Hsu wrote:
(Please send the followup to freebsd-testing@ and note Reply-To is 
set.)


FreeBSD CI Weekly Report 2020-04-12
===

Here is a summary of the FreeBSD Continuous Integration results for 
the period

from 2020-04-06 to 2020-04-12.

During this period, we have:

* 1801 builds (94.0% (+0.4) passed, 6.0% (-0.4) failed) of buildworld and
  buildkernel (GENERIC and LINT) were executed on aarch64, amd64, armv6,
  armv7, i386, mips, mips64, powerpc, powerpc64, powerpcspe, riscv64,
  sparc64 architectures for head, stable/12, stable/11 branches.
* 288 test runs (25.1% (-24.6) passed, 29.9% (+10.6) unstable, 45.1% (+14.1)
  exception) were executed on amd64, i386, riscv64 architectures for head,
  stable/12, stable/11 branches.
* 30 doc and www builds (83.3% (-1.3) passed, 16.7% (+1.3) failed)

Test case status (on 2020-04-12 23:59):
| Branch/Architecture | Total | Pass   | Fail | Skipped  |
| --- | - | -- |  |  |
| head/amd64  | 7744 (+4) | 7638 (+19) | 14 (+5)  | 92 (-20) |
| head/i386   | 7742 (+4) | 7628 (+15) | 16 (+5)  | 98 (-16) |
| 12-STABLE/amd64 | 7508 (0)  | 7449 (-3)  | 1 (+1)   | 58 (+2)  |
| 12-STABLE/i386  | 7506 (0)  | 7425 (-17) | 2 (+2)   | 79 (+15) |
| 11-STABLE/amd64 | 6882 (0)  | 6829 (-6)  | 1 (+1)   | 52 (+5)  |
| 11-STABLE/i386  | 6880 (0)  | 6749 (-82) | 80 (+80) | 51 (+2)  |

(The statistics from experimental jobs are omitted)

If any of the issues found by CI are in your area of interest or 
expertise

please investigate the PRs listed below.

The latest web version of this report is available at
https://hackmd.io/@FreeBSD-CI/report-20200412 and archive is available 
at

https://hackmd.io/@FreeBSD-CI/ , any help is welcome.

## News

* The test env now loads the required module for firewall tests.

* New armv7 job is added (to replace armv6 one):
  * FreeBSD-head-armv7-testvm
  The images are available at https://artifact.ci.freebsd.org
  FreeBSD-head-armv7-test is ready but needs test env update.

## Failing jobs

* https://ci.freebsd.org/job/FreeBSD-head-amd64-gcc6_build/
  * See console log for the error details.

## Failing tests

* https://ci.freebsd.org/job/FreeBSD-head-amd64-test/
  * local.kyua.integration.cmd_about_test.topic__authors__installed
  * sys.netipsec.tunnel.empty.v4
  * sys.netipsec.tunnel.empty.v6
  * sys.netpfil.common.forward.ipf_v4
  * sys.netpfil.common.forward.ipfw_v4
  * sys.netpfil.common.forward.pf_v4
  * sys.netpfil.common.tos.ipfw_tos
  * sys.netpfil.common.tos.pf_tos
  * sys.netpfil.pf.forward.v4
I can’t actually reproduce this failure in my test VM, but with the 
CI test VM I can reproduce the problem.
It’s not related to pf, because the sanity check ping we do before we 
set up pf already fails.
Or rather pft_ping.py sends an incorrect packet, because `ping` does get 
the packet to go where it’s supposed to go.


Scapy seems to fail to find the source IP address, so we get this:

	12:12:22.152652 IP 0.0.0.0 > 198.51.100.3: ICMP echo request, id 0, seq 0, length 12


I have a vague recollection that we’ve seen this problem before, but I 
can’t remember what we did about it.


In all likelihood most of the other netpfil tests fail for exactly the 
same reason.


Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


CFT: if_bridge performance improvements

2020-04-14 Thread Kristof Provost

Hi,

Thanks to support from The FreeBSD Foundation I’ve been able to work 
on improving the throughput of if_bridge.
It changes the (data path) locking to use the NET_EPOCH infrastructure. 
Benchmarking shows substantial improvements (x5 in test setups).


This work is ready for wider testing now.

It’s under review here: https://reviews.freebsd.org/D24250

Patch for CURRENT: https://reviews.freebsd.org/D24250?download=true
Patches for stable/12: 
https://people.freebsd.org/~kp/if_bridge/stable_12/


I’m not currently aware of any panics or issues resulting from these 
patches.


Do note that if you run a Bhyve + tap on bridges setup the tap code 
suffers from a similar bottleneck and you will likely not see major 
improvements in single VM to host throughput. I would expect, but have 
not tested, improvements in overall throughput (i.e. when multiple VMs 
send traffic at the same time).


Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: FreeBSD CI Weekly Report 2019-06-09

2019-06-15 Thread Kristof Provost

On 15 Jun 2019, at 11:35, Kristof Provost wrote:

On 12 Jun 2019, at 16:49, Li-Wen Hsu wrote:

* https://ci.freebsd.org/job/FreeBSD-head-i386-test/
* Same as amd64:
* sys.netinet.socket_afinet.socket_afinet_bind_zero
* Others:
* sys.netpfil.pf.forward.v6
* sys.netpfil.pf.forward.v4
* sys.netpfil.pf.set_tos.v4


I’ve finally gotten around to taking a look at this, and it appears 
to not be a pf problem. forward:v4 already fails at its sanity check, 
before it configures pf.


It creates a vnet jail, telling it to route traffic through, and then 
we run a sanity check with pft_ping.py.
Scapy tries to resolve the MAC address of the gateway (jail, 
192.0.2.1). The jail replies, but scapy never picks up the reply, so 
the traffic looks like this:


	13:19:29.953468 02:be:b4:57:9f:0a > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.0.2.2 tell 192.0.2.1, length 28
	13:19:29.953572 02:be:b4:57:9f:0b > 00:a0:98:b2:48:59, ethertype ARP (0x0806), length 42: Reply 192.0.2.2 is-at 02:be:b4:57:9f:0b, length 28
	13:19:32.082843 02:be:b4:57:9f:0a > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 52: 192.0.2.1 > 198.51.100.3: ICMP echo request, id 0, seq 0, length 18


Because the ARP resolution failed, the ICMP echo request goes out to 
the broadcast MAC address; the jail doesn't forward it, and the test 
fails.


My current guess is that it’s related to bpf. It’s interesting to 
note that it fails on i386, but succeeds on amd64.



I’ve done a little dtracing, and I think that points at bpf too:

#!/usr/sbin/dtrace -s

/* Dump the buffer bpf is about to copy out to userspace,
 * plus the kernel stack that got us here. */
fbt:kernel:bpf_buffer_uiomove:entry
{
	tracemem(arg1, 1500, arg2);
	stack();
}

Results in:

  1  49539 bpf_buffer_uiomove:entry
	     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
	 0: ce 0e 05 5d 17 ea 00 00 2a 00 00 00 2a 00 00 00  ...]....*...*...
	10: 12 00 ff ff ff ff ff ff 02 fd 10 30 e6 0a 08 06  ...........0....
	20: 00 01 08 00 06 04 00 01 00 a0 98 b2 48 59 c0 00  ............HY..
	30: 02 01 00 00 00 00 00 00 c0 00 02 02 ce 0e 05 5d  ...............]
	40: 60 ea 00 00 2a 00 00 00 2a 00 00 00 12 00 00 a0  `...*...*.......
	50: 98 b2 48 59 02 fd 10 30 e6 0b 08 06 00 01 08 00  ..HY...0........
	60: 06 04 00 02 02 fd 10 30 e6 0b c0 00 02 02 00 a0  .......0........
	70: 98 b2 48 59 c0 00 02 01                          ..HY....

  kernel`bpfread+0x137
  kernel`dofileread+0x6d
  kernel`kern_readv+0x3b
  kernel`sys_read+0x48
  kernel`syscall+0x2b4
  0xffc033b7

So, we see the ARP request through bpf, but we don’t see the reply, 
despite tcpdump capturing it. I have no idea how that’d happen, so 
I’d very much like someone more familiar with bpf to take a look at 
this problem.
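For reference, the traced buffer can be decoded with a short standalone parser (an illustrative script, not part of the test suite; it assumes the 32-bit struct bpf_hdr layout: an 8-byte struct timeval, u_int bh_caplen, u_int bh_datalen, u_short bh_hdrlen, 18 bytes in total). Decoding shows this particular read delivered both ARP frames, which is consistent with tcpdump seeing the reply:

```python
import struct

# The 120 bytes printed by tracemem() above: one bpf read buffer
# holding two records.
DUMP = bytes.fromhex(
    "ce0e055d17ea00002a0000002a000000"
    "1200ffffffffffff02fd1030e60a0806"
    "000108000604000100a098b24859c000"
    "0201000000000000c0000202ce0e055d"
    "60ea00002a0000002a000000120000a0"
    "98b2485902fd1030e60b080600010800"
    "0604000202fd1030e60bc000020200a0"
    "98b24859c0000201"
)

def bpf_frames(buf):
    """Yield the captured frames from a bpf read buffer (i386 layout)."""
    off = 0
    while off < len(buf):
        _sec, _usec, caplen, _datalen, hdrlen = struct.unpack_from(
            "<IIIIH", buf, off)
        yield buf[off + hdrlen:off + hdrlen + caplen]
        # records are BPF_WORDALIGN()ed to 4 bytes
        off += (hdrlen + caplen + 3) & ~3

def describe_arp(frame):
    """Return (operation, sender IP, target IP) for an Ethernet ARP frame."""
    ethertype, = struct.unpack_from("!H", frame, 12)
    assert ethertype == 0x0806, "not ARP"
    op, = struct.unpack_from("!H", frame, 20)
    spa = ".".join(str(b) for b in frame[28:32])
    tpa = ".".join(str(b) for b in frame[38:42])
    return ("request" if op == 1 else "reply", spa, tpa)

for frame in bpf_frames(DUMP):
    print(describe_arp(frame))
# -> ('request', '192.0.2.1', '192.0.2.2')
# -> ('reply', '192.0.2.2', '192.0.2.1')
```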


Regards,
Kristof




Re: Networking panic on 12 - found the cause

2019-02-12 Thread Kristof Provost
On 2019-02-12 13:54:21 (-0600), Eric van Gyzen  wrote:
> I see the same behavior on head (and stable/12).
> 
> (kgdb) f
> #16 0x80ce5331 in ether_output_frame (ifp=0xf80003672800,
> m=0xf8000c88b100) at /usr/src/sys/net/if_ethersubr.c:468
> 468   switch (pfil_run_hooks(V_link_pfil_head, , ifp, 
> PFIL_OUT,
> 
>0x80ce5321 <+81>:  mov    %gs:0x0,%rax
>0x80ce532a <+90>:  mov    0x500(%rax),%rax
> => 0x80ce5331 <+97>:  mov    0x28(%rax),%rax
> 
> I think this is part of the V_link_pfil_head.  I'm not very familiar
> with vnet.  Does this need a CURVNET_SET(), maybe in garp_rexmit()?
> 
Yes. I posted a proposed patch in
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235699

Basically we get called through a timer, so there's no vnet context. It
needs to be set, and then we can safely use any V_ variables.
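The shape of the fix is roughly the following (a sketch only, not the committed patch — see the PR above for the actual change; the argument type and field names here are assumptions):

```c
/* Sketch: a callout handler runs without a vnet context, so one must
 * be established before any V_ variable (such as V_link_pfil_head on
 * the transmit path it triggers) is touched. */
static void
garp_rexmit(void *arg)
{
	struct in_ifaddr *ia = arg;

	CURVNET_SET(ia->ia_ifa.ifa_ifp->if_vnet);
	/* ... send the gratuitous ARP; V_ accesses are now safe ... */
	CURVNET_RESTORE();
}
```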

Regards,
Kristof


Re: PF problems with 11-stable

2018-07-26 Thread Kristof Provost


On 26 Jul 2018, at 10:16, Patrick Lamaiziere wrote:

> Le Thu, 26 Jul 2018 09:58:05 +0200,
> Patrick Lamaiziere  a écrit :
>
> Hello,
>
>>> Hey,
>>> I am on
>>> 11.2-STABLE FreeBSD 11.2-STABLE #9 r336597
>>> Sun Jul 22 14:08:38 CEST 2018
>>>
>>> and I see 2 problems with PF that are still there:
>>>  1.) set skip on lo
>>> does not work even though ifconfig lo matches.
>>> SOLVED TEMPORARILY BY: set skip on lo0
>>
>> I've seen this while upgrading from 10.3 to 11.2-RELEASE. I've added
>> lo0 to set skip too.
>>
>> When the problem occurs, lo is marked '(skip)' (pfctl -vs
>> Interfaces) but not lo0.
>>
>> But I can't reproduce this, this happened only one time.
>
> I don't know if this is related but there were some kernel logs about
> 'loopback' :
>
> Feb 15 17:11:48 fucop1 kernel: ifa_del_loopback_route: deletion failed: 47
> Feb 15 17:11:48 fucop1 kernel: ifa_add_loopback_route: insertion failed: 47
> Jul 16 13:50:36 fucop1 kernel: ifa_maintain_loopback_route: deletion failed for interface ix2: 3
> Jul 16 14:07:31 fucop1 kernel: ifa_maintain_loopback_route: deletion failed for interface ix2: 3
> Jul 16 14:07:31 fucop1 kernel: ifa_maintain_loopback_route: deletion failed for interface igb1: 3
> Jul 16 14:10:43 fucop1 kernel: ifa_maintain_loopback_route: insertion failed for interface igb0: 17
>
No, those error messages are not related.

The issue with interface groups is known, and is being worked on.

The pfctl -n issue should be fixed as of r336164

Regards,
Kristof


Re: How to setup ethernet address and IPv4 address on interface?

2016-06-29 Thread Kristof Provost
On 29 Jun 2016, at 13:47, Slawa Olhovchenkov wrote:
> I am trying to change MAC address and setup IPv4 address and got
> error:
>
> # ifconfig em1 ether 00:30:48:63:19:04 inet 192.168.2.1/24
> ifconfig: can't set link-level netmask or broadcast
>
> Is this posible?

Yes, but you can’t do both in one call.

This works:
ifconfig em1 ether 00:30:48:63:19:04
ifconfig em1 inet 192.168.2.1/24

Regards,
Kristof

Re: ipfw fwd to closed port

2016-06-09 Thread Kristof Provost


On 9 Jun 2016, at 9:06, Slawa Olhovchenkov wrote:

> On Thu, Jun 09, 2016 at 03:00:17PM +0200, Kristof Provost wrote:
>
>> On 2016-06-09 02:02:40 (+0300), Slawa Olhovchenkov <s...@zxy.spb.ru> wrote:
>>> Forwarding by ipfw to a closed local port generates an RST packet with
>>> an incorrect checksum. Is this a known issue? Should I open a PR?
>>
>> Where did you capture the packet? If you've captured the packet on the
>> machine that generated it tcpdump may indeed claim that the checksum is
>> wrong, because it's computed by the hardware (so after tcpdump captured
>> it).
>
> On the tun0 (destination of RST packet routed to tun0).
> tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
> options=8
> inet 192.168.4.1 --> 192.168.4.1 netmask 0xff00
> inet6 fe80::240:63ff:fedc:ac9e%tun0 prefixlen 64 scopeid 0x9
> nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> Opened by PID 1345
>
> tun0 doesn't compute checksums.

I’m not sure I understand what you’re trying to say.

In any case: either capture the packet outside the machine, or confirm
that the checksum is wrong by watching the relevant netstat counters.

Regards,
Kristof

Re: ipfw fwd to closed port

2016-06-09 Thread Kristof Provost
On 2016-06-09 02:02:40 (+0300), Slawa Olhovchenkov  wrote:
> Forwarding by ipfw to a closed local port generates an RST packet with
> an incorrect checksum. Is this a known issue? Should I open a PR?

Where did you capture the packet? If you've captured the packet on the
machine that generated it tcpdump may indeed claim that the checksum is
wrong, because it's computed by the hardware (so after tcpdump captured
it).
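Recomputing the checksum by hand is one way to confirm independently of tcpdump's verdict. A minimal sketch of the 16-bit ones'-complement sum defined in RFC 1071 (the header bytes below are a standard textbook example, not a packet from this thread):

```python
def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: 16-bit ones'-complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Textbook IPv4 header with its checksum field (bytes 10-11) zeroed:
hdr = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(inet_checksum(bytes(hdr))))  # -> 0xb861

# A receiver verifies by summing over the header as received;
# a correct checksum makes the total come out to zero:
hdr[10:12] = (0xB861).to_bytes(2, "big")
print(inet_checksum(bytes(hdr)))  # -> 0
```

A transmit-side capture with checksum offload in play will typically fail this check even though the wire packet is fine, which is exactly the caveat above.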

Regards,
Kristof


Re: 10.3-STABLE - PF - possible regression in pf.conf set timeout interval

2016-05-09 Thread Kristof Provost

> On 09 May 2016, at 16:58, Damien Fleuriot  wrote:
> 
> Since the upgrade, pf rules won't load anymore at boot time, nor even
> manually with pfctl -f /etc/pf.conf :
> # pfctl -f /etc/pf.conf
> /etc/pf.conf:24: syntax error
> pfctl: Syntax error in config file: pf rules not loaded
> 
> The problematic line is :
> set timeout interval 10
> 
I think that was broken by the commit which added ALTQ support for CoDel.

It made ‘interval’ a keyword, and it looks like that breaks things for you.

I’ve cc’ed loos so he can take a look.

Regards,
Kristof

Re: Reducing the need to compile a custom kernel

2012-02-10 Thread Kristof Provost
On 2012-02-10 14:56:04 (+0100), Alexander Leidinger alexan...@leidinger.net 
wrote:
 The question is, is this enough? Or asked differently, why are you
 compiling a custom kernel in a production environment (so I rule out
 debug options which are not enabled in GENERIC)? Are there options
 which you add which you can not add as a module (SW_WATCHDOG comes
 to my mind)? If yes, which ones and how important are they for you?
 
VIMAGE and IPSEC

I currently require both of them.
The VIMAGE is sort of optional. I could run everything unjailed, but I
prefer this.
IPSEC is required, unless I add a separate device.

That's for a little home gateway (HP Microserver thingy), doing file
serving (NFS/ZFS), mail, web, backup, ...
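For illustration, such a custom configuration might look like this (a hypothetical config file, not Kristof's actual one; `GATEWAY` is an invented ident):

```
# /usr/src/sys/amd64/conf/GATEWAY -- sketch only
include GENERIC
ident   GATEWAY

options VIMAGE          # vnet jail support, not loadable as a module
options IPSEC           # IPsec in the kernel
device  crypto          # software crypto, required by options IPSEC
```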

Regards,
Kristof



Re: SIOCGIFADDR broken on 9.0-RC1?

2011-11-15 Thread Kristof Provost
On 2011-11-15 18:10:01 (+0100), GR free...@gomor.org wrote:
 more insights since my last post. Here is a small code to trigger the bug 
 (end of email).
 When you run it on 9.0-RC1, it gets an alias address instead of the main inet 
 address:
 
 % ./get-ip re0  
 inet: 192.168.2.10
 # Main address being 192.168.1.148
 
 On 8.2-RELEASE, all goes well:
 % ./get-ip re0
 inet: PUBLIC_IP4
 
 Is something broken, or has the behaviour changed since 8.2-RELEASE?
 

I think the relevant bit of the code is found in sys/netinet/in.c.

If your ioctl doesn't specify an IP address we end up in this bit:
TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
	iap = ifatoia(ifa);
	if (iap->ia_addr.sin_family == AF_INET) {
		if (td != NULL &&
		    prison_check_ip4(td->td_ucred,
		    &iap->ia_addr.sin_addr) != 0)
			continue;
		ia = iap;
		break;
	}
}

The 'ia' pointer is later used to return the IP address. 

In other words: it returns the first address on the interface
of family AF_INET (which isn't assigned to a jail).

I think the order of the addresses is not fixed, or rather it depends on 
the order in which you assign addresses. In the handling of SIOCSIFADDR
new addresses are just appended:

TAILQ_INSERT_TAIL(&ifp->if_addrhead, ifa, ifa_link);

I don't believe this has changed since 8.0. Is it possible something
changed in the network initialisation, leading to the addresses being
assigned in a different order?
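The first-match behaviour can be illustrated with a small model (plain Python standing in for the TAILQ walk above; the names are illustrative only):

```python
AF_INET, AF_INET6 = 2, 28  # illustrative family constants

def first_inet_addr(addr_list):
    """Model of the SIOCGIFADDR loop: return the first AF_INET entry."""
    for family, addr in addr_list:
        if family == AF_INET:
            return addr
    return None

# Addresses are appended in assignment order (TAILQ_INSERT_TAIL), so
# whichever AF_INET address was configured first is the one returned:
print(first_inet_addr([(AF_INET, "192.168.1.148"),
                       (AF_INET, "192.168.2.10")]))   # -> 192.168.1.148
print(first_inet_addr([(AF_INET, "192.168.2.10"),
                       (AF_INET, "192.168.1.148")]))  # -> 192.168.2.10
```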

Eagerly awaiting to be told I'm wrong,
Kristof
