Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kristof Provost

On 9 Dec 2020, at 2:31, Peter wrote:

On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:

! > Sorry for the bad news.
! >
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the
production code.

In any case, my current workaround, i.e. delaying in the exec.poststop


exec.poststop = "
   sleep 6 ;
   /usr/sbin/ngctl shutdown ${ifname1l}: ;
   ";


helps for it all and makes the system behave solid. This is true
with and without Your patch.

! Can you reduce your netgraph use case to a small test case that can 
trigger

! the problem?

I'm sorry, I fear I don't get Your point.
Assumed there are actually two or three bugs here, You are asking me
to reduce config so that it will trigger only one of them? Is that
correct?

No, we need a simple case to reproduce these problems. It’s fine if 
that test case triggers multiple issues.



Then let me put this different: assuming this is the OS for the life
support system of the manned Jupiter mission. Then, which one of the
bugs do You want to get fixed, and which would You prefer to keep and
make Your oxygen supply cut off?

https://www.youtube.com/watch?v=BEo2g-w545A


Happily we’re not in space.



! I’m not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that.
From Your former mail I get the impression that you prefer to rely
on tests. I consider this a bad habit[1] and prefer logical thinking.





(Background: It is not that I would be unwilling to create clean and
precisely reproducible scenarious, But, one of my problems is
currently, I only have two machines availabe: the graphical one where
I'm just typing, and the backend server with the jails that does
practically everything.

These issues should trigger just fine in VMs. There’s no need for 
hardware pain.


Regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 'ifconfig lagg0.300 create' fails with 'SIOIFCREATE2: Device not configured' on r368459

2020-12-08 Thread Raúl Muñoz - CUSTOS via freebsd-stable
El 9/12/20 a las 2:45, Raúl Muñoz - CUSTOS via freebsd-stable escribió:

[]
> maybe related to:
> https://svnweb.freebsd.org/base?view=revision=368346

Excuse me, I meant this one:
https://svnweb.freebsd.org/base?view=revision=368297
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kyle Evans
On Tue, Dec 8, 2020 at 7:45 PM Peter  wrote:
>
>
> On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:
> > Can you reduce your netgraph use case to a small test case that can trigger
> ? the problem?
>
> I'm sorry, I fear I don't get Your point.
> Assumed there are actually two or three bugs here, You are asking me
> to reduce config so that it will trigger only one of them? Is that
> correct?
>
> Then let me put this different: assuming this is the OS for the life
> support system of the manned Jupiter mission. Then, which one of the
> bugs do You want to get fixed, and which would You prefer to keep and
> make Your oxygen supply cut off?
>
> https://www.youtube.com/watch?v=BEo2g-w545A

You seem to have misinterpreted this; he doesn't want to narrow it
down to one bug, he wants simple steps that he can follow to reproduce
any failure, preferably steps that can actually be followed by just
about anyone and don't require immense amounts of setup time or
additional hardware.

Unfortunately, your tone following the misunderstanding was pretty discouraging.

Thanks,

Kyle Evans
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Peter

On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:

! > Sorry for the bad news.
! > 
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the
production code.

In any case, my current workaround, i.e. delaying in the exec.poststop

> exec.poststop = "
>sleep 6 ;
>/usr/sbin/ngctl shutdown ${ifname1l}: ;
>";

helps for it all and makes the system behave solid. This is true
with and without Your patch.

! Can you reduce your netgraph use case to a small test case that can trigger
! the problem?

I'm sorry, I fear I don't get Your point.
Assumed there are actually two or three bugs here, You are asking me
to reduce config so that it will trigger only one of them? Is that
correct?

Then let me put this different: assuming this is the OS for the life
support system of the manned Jupiter mission. Then, which one of the
bugs do You want to get fixed, and which would You prefer to keep and
make Your oxygen supply cut off?

https://www.youtube.com/watch?v=BEo2g-w545A

! I’m not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that.
From Your former mail I get the impression that you prefer to rely
on tests. I consider this a bad habit[1] and prefer logical thinking.

So lets try that:
We know that there is a problem with taking down an interface from a
VIMAGE, in the way it is done by "jail -r". We know this problem can
be solidly workarounded by delaying the interface takedown for a short
time.

Now with Your patch, we do not get the typical crash at interface
takedown. Instead, all of a sudden, there are strange crashes from
various other places. And, interestingly, we get these also when
STARTING a jail.

I think this is not an additional problem, it is instead a valuable
information (albeit not the one You might like to get).

Furthermore, we get these new crashes always invoked by "ifconfig",
and they seem to have in common that somebody tries to obtain
information about some interface configuration and receives some
bogus. I might conclude, just out of the belly without looking into
details, that either
 - your patch achieves to garble some internal interface data,
   instead of what it is intended to do, or
 - the original problem manages to garble internal interface data
   (leading to the usual crash), and Your patch does not achieve to
   solve this, but only protects from the immediate consequence.

It might also be worth consideration, that, while the problem may be
more easy to reproduce with epair, this effect may or may not be a
netgraph specific one[2].

Now lets keep in mind that a successful test means EXACTLY NOTHING.
By which other means can we confirm that Your patch fully achieves
what it is intended for? (E.g. something like dumping and verifying
the respective internal tables in-vivo)

(Background: It is not that I would be unwilling to create clean and
precisely reproducible scenarious, But, one of my problems is
currently, I only have two machines availabe: the graphical one where
I'm just typing, and the backend server with the jails that does
practically everything.
Therefore, experimenting on any of them creates considerable pain.
I'm working on that issue, trying to get a real server board for the
backend so to get the current one free for testing - but what I would
like to use, e.g. ASUS Z10PE+cores+regECC, is not something one would
easily find on yardsales - and seldom for an acceptable price.)


cheerio,
PMc

[1] Rationale: a failing test tells us that either the test or the
application has a bug (50/50 chance). A succeeding test tells us
that 1 equals 1, which we knew already before.
In fact, tests tell us *nothing at all* about the state of our
code, and specifically, 'successful' outcomes do NOT mean that
things are all correct.
The only true usefulness of tests is to protect against
re-introducing a fault that was already fixed before,
i.e. regressions.

[2] My netgraph configuration consists of bringing up some bridges
and then attaching the jails to them.

Here is the bridge starter (only respective component,
there are more of these populated, but probably not influencing
the issue):

#! /bin/sh

# PROVIDE: netgraphs
# REQUIRE: netwait
# BEFORE: NETWORKING

. /etc/rc.subr

name="netgraphs"
start_cmd="${name}_start"
stop_cmd="${name}_stop"

load_rc_config $name

netgraphs_graphs="svc"

netgraphs_svc_if1_name="nge_svc_1u"
netgraphs_svc_if1_mac="00:1d:92:01:02:01"
netgraphs_svc_if1_addr="***.***.***.***/29"

netgraphs_svc_start()
{
local _ifname
if ngctl info svcswitch: > /dev/null 2>&1; then
netgraphs_svc_stop
fi

echo "Creating SVC Switch"
ngctl -f - < /dev/null 2>&1; then
$_cmd
else
echo "netgraphs-start: object $i not found" >&2
fi
done

'ifconfig lagg0.300 create' fails with 'SIOIFCREATE2: Device not configured' on r368459

2020-12-08 Thread Raúl Muñoz - CUSTOS via freebsd-stable


But works on r368184
no problem on regular, not aggregated interfaces

can anyone give it a try?

maybe related to:
https://svnweb.freebsd.org/base?view=revision=368346

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Peter
Here is the next funny crashdump - I obtained this one twice
and also the sysctl_rtsock() again.

I can reproduce this by just starting and stopping a most simple jail
that does only
exec.start = "/bin/sleep 4 &";
(And as usual, when I let it time out, nothing bad happens.)


Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 02
instruction pointer = 0x20:0x80a2ac45
stack pointer   = 0x28:0xfe0047cf2890
frame pointer   = 0x28:0xfe0047cf2890
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13557 (ifconfig)
trap number = 9
panic: general protection fault
cpuid = 1
time = 1607469295
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0047cf25a0
vpanic() at vpanic+0x17b/frame 0xfe0047cf25f0
panic() at panic+0x43/frame 0xfe0047cf2650
trap_fatal() at trap_fatal+0x391/frame 0xfe0047cf26b0
trap() at trap+0x67/frame 0xfe0047cf27c0
calltrap() at calltrap+0x8/frame 0xfe0047cf27c0
--- trap 0x9, rip = 0x80a2ac45, rsp = 0xfe0047cf2890, rbp = 
0xfe0047cf2890 ---
strncmp() at strncmp+0x15/frame 0xfe0047cf2890
ifunit_ref() at ifunit_ref+0x59/frame 0xfe0047cf28d0
ifioctl() at ifioctl+0x427/frame 0xfe0047cf2990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe0047cf29f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe0047cf2ac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe0047cf2bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0047cf2bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 
0x7fffe3b8, rbp = 0x7fffe450 ---
Uptime: 8m54s
Dumping 880 out of 3959 MB:
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kristof Provost

On 8 Dec 2020, at 19:49, Peter wrote:

On Tue, Dec 08, 2020 at 04:50:00PM +0100, Kristof Provost wrote:
! Yeah, the bug is not exclusive to epair but that’s where it’s 
most easily

! seen.

Ack.

! Try 
http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch


Great, thanks a lot.

Now I have bad news: when playing yoyo with the next-best three
application  jails (with all their installed stuff) it took about
ten up and down's then I got this one:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80aad73c
stack pointer   = 0x28:0xfe003f80e810
frame pointer   = 0x28:0xfe003f80e810
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 15486 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607450838
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe003f80e4d0

vpanic() at vpanic+0x17b/frame 0xfe003f80e520
panic() at panic+0x43/frame 0xfe003f80e580
trap_fatal() at trap_fatal+0x391/frame 0xfe003f80e5e0
trap_pfault() at trap_pfault+0x4f/frame 0xfe003f80e630
trap() at trap+0x4cf/frame 0xfe003f80e740
calltrap() at calltrap+0x8/frame 0xfe003f80e740
--- trap 0xc, rip = 0x80aad73c, rsp = 0xfe003f80e810, rbp 
= 0xfe003f80e810 ---
ng_eiface_mediastatus() at ng_eiface_mediastatus+0xc/frame 
0xfe003f80e810

ifmedia_ioctl() at ifmedia_ioctl+0x174/frame 0xfe003f80e850
ifhwioctl() at ifhwioctl+0x639/frame 0xfe003f80e8d0
ifioctl() at ifioctl+0x448/frame 0xfe003f80e990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe003f80e9f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe003f80eac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe003f80ebf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 
0xfe003f80ebf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 
0x7fffe358, rbp = 0x7fffe450 ---

Uptime: 9m51s
Dumping 899 out of 3959 MB:

I decided to give it a second try, and this is what I did:

root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 8  kerb.***.org  /j/kerb
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail stop rail
Stopping jails: rail.
root@edge:/var/crash # service jail stop tele
Stopping jails: tele.
root@edge:/var/crash # service jail stop kerb
Stopping jails: kerb.
root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
root@edge:/var/crash # jls -d
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail start kerb
Starting jails:Fssh_packet_write_wait: Connection to 1*** port 
22: Broken pipe


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code  = supervisor read instruction, page not 
present

instruction pointer = 0x20:0x0
stack pointer   = 0x28:0xfe00540ea658
frame pointer   = 0x28:0xfe00540ea670
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13420 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607451910
KDB: 

Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Peter
On Tue, Dec 08, 2020 at 04:50:00PM +0100, Kristof Provost wrote:
! Yeah, the bug is not exclusive to epair but that’s where it’s most easily
! seen.

Ack.

! Try 
http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch

Great, thanks a lot.

Now I have bad news: when playing yoyo with the next-best three
application  jails (with all their installed stuff) it took about
ten up and down's then I got this one:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80aad73c
stack pointer   = 0x28:0xfe003f80e810
frame pointer   = 0x28:0xfe003f80e810
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 15486 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607450838
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe003f80e4d0
vpanic() at vpanic+0x17b/frame 0xfe003f80e520
panic() at panic+0x43/frame 0xfe003f80e580
trap_fatal() at trap_fatal+0x391/frame 0xfe003f80e5e0
trap_pfault() at trap_pfault+0x4f/frame 0xfe003f80e630
trap() at trap+0x4cf/frame 0xfe003f80e740
calltrap() at calltrap+0x8/frame 0xfe003f80e740
--- trap 0xc, rip = 0x80aad73c, rsp = 0xfe003f80e810, rbp = 
0xfe003f80e810 ---
ng_eiface_mediastatus() at ng_eiface_mediastatus+0xc/frame 0xfe003f80e810
ifmedia_ioctl() at ifmedia_ioctl+0x174/frame 0xfe003f80e850
ifhwioctl() at ifhwioctl+0x639/frame 0xfe003f80e8d0
ifioctl() at ifioctl+0x448/frame 0xfe003f80e990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe003f80e9f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe003f80eac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe003f80ebf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe003f80ebf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 
0x7fffe358, rbp = 0x7fffe450 ---
Uptime: 9m51s
Dumping 899 out of 3959 MB:

I decided to give it a second try, and this is what I did:

root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 8  kerb.***.org  /j/kerb
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail stop rail
Stopping jails: rail.
root@edge:/var/crash # service jail stop tele
Stopping jails: tele.
root@edge:/var/crash # service jail stop kerb
Stopping jails: kerb.
root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
root@edge:/var/crash # jls -d
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail start kerb
Starting jails:Fssh_packet_write_wait: Connection to 1*** port 22: 
Broken pipe

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code  = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer   = 0x28:0xfe00540ea658
frame pointer   = 0x28:0xfe00540ea670
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13420 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607451910
KDB: stack backtrace:
db_trace_self_wrapper() at 

virtio-9p support in bhyve coming to 12?

2020-12-08 Thread Marc Branchaud

Hi all,

Any chance of the virtio-9p support in bhyve (to mount a host directory 
directly inside a VM) landing in 12.3?  I ask mainly because r366413 
from October says "MFC after: 1 month" (and I'd like to avoid setting up 
local NFS shares if I can help it).


Thanks!

M.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kristof Provost

On 8 Dec 2020, at 0:34, Peter wrote:

Hi Kristof,
  it's great to read You!

On Mon, Dec 07, 2020 at 09:11:32PM +0100, Kristof Provost wrote:

! That smells a lot like the epair/vnet issues in bugs 238870, 234985, 
244703,

! 250870.

epair? No. It is purely Netgrh here.

Yeah, the bug is not exclusive to epair but that’s where it’s most 
easily seen.


! I pushed a fix for that in CURRENT in r368237. It’s scheduled to 
go into
! stable/12 sometime next week, but it’d be good to know that it 
fixes your

! problem too before I merge it.
! In other words: can you test a recent CURRENT? It’s likely fixed 
there, and

! if it’s not I may be able to fix it quickly.


Oh my Gods. No offense meant, but this is not really a good time
for that. This is the most horrible upgrade I experienced in 25 years
FreeBSD (and it was prepared, 12.2 did run fine on the other machine).

I have issue with mem config
https://forums.freebsd.org/threads/fun-with-upgrading-sysctl-unknown-oid-vm-pageout_wakeup_thresh.77955/
I have issue with damaged filesystem, for no apparent reason
https://forums.freebsd.org/threads/no-longer-fun-with-upgrading-file-offline.77959/

Then I have this issue here which is now gladly workarounded
https://forums.freebsd.org/threads/panic-12-2-does-not-work-with-jails.77962/post-486365

and when I then dare to have a look at my applications, they look like
sheer horror, segfaults all over, and I don't even know where to begin
with these.


Other option: can you make this fix so that I can patch it into 12.2
source and just redeploy?

Try 
http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch


That’s currently running the regression tests that used to provoke the 
panic nearly instantly, and no panics so far.


Best regards.
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"