Re: ENOTCAPABLE returned without Capsicum

2021-05-15 Thread Peter Jeremy via freebsd-stable
On 2021-May-16 11:48:24 +1000, Peter Jeremy via freebsd-stable 
 wrote:
>I am running 13-stable from a couple of weeks ago, without Capsicum
>(neither CAPABILITY_MODE nor CAPABILITIES are specified in my kernel).
>Despite this, I am getting Capsicum-related errors.  As an example:
>openat(AT_FDCWD, "/")
>will return ENOTCAPABLE.

Please ignore.  I worked out I was misreading how O_RESOLVE_BENEATH
worked.
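
For anyone who hits the same confusion: going by the open(2) manual page on FreeBSD 13, a lookup with O_RESOLVE_BENEATH is expected to fail with ENOTCAPABLE when the path is absolute, or when a ".." component steps above the starting directory, whether or not the kernel was built with the Capsicum options. The sketch below is only a userland model of that rule in Python. `resolves_beneath` is a hypothetical helper, symlinks are ignored, and it is not the kernel's implementation.

```python
import posixpath


def resolves_beneath(path: str) -> bool:
    """Model of the O_RESOLVE_BENEATH rule (assumption: symlinks are ignored).

    Returns False where the real lookup would be expected to fail with
    ENOTCAPABLE: absolute paths, or ".." escaping the starting directory.
    """
    if posixpath.isabs(path):
        return False          # an absolute path such as "/" is rejected outright
    depth = 0                 # how far below the starting directory we are
    for part in path.split("/"):
        if part in ("", "."):
            continue          # empty and "." components do not move
        if part == "..":
            depth -= 1
            if depth < 0:
                return False  # even a temporary escape is rejected
        else:
            depth += 1
    return True


if __name__ == "__main__":
    for p in ("/", "etc/passwd", "a/../b", "../etc", "a/../../b"):
        print(f"{p!r}: {'ok' if resolves_beneath(p) else 'ENOTCAPABLE'}")
```

Under this model, an openat() of "/" with that flag fails even though AT_FDCWD is a perfectly valid starting point, which would explain the error in the original report.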

-- 
Peter Jeremy


ENOTCAPABLE returned without Capsicum

2021-05-15 Thread Peter Jeremy via freebsd-stable
I am running 13-stable from a couple of weeks ago, without Capsicum
(neither CAPABILITY_MODE nor CAPABILITIES are specified in my kernel).
Despite this, I am getting Capsicum-related errors.  As an example:
openat(AT_FDCWD, "/")
will return ENOTCAPABLE.

Rummaging around the sources, it seems that there's a non-trivial
amount of code in kern/vfs_lookup.c that's capable of returning
capability-related errors but isn't protected by CAPABILITY_MODE.
This seems undesirable since it means that FreeBSD is defaulting to
being locked down but unless I build it with Capsicum, there's no
way to change the process's capabilities.

-- 
Peter Jeremy


fileargs_init(3) doesn't work without CAPABILITIES (was: Re: tail(1) broken in 13-stable)

2021-05-06 Thread Peter Jeremy via freebsd-stable
On 2021-May-06 19:07:23 -0400, monochrome  wrote:
...
>On 5/6/21 7:49 AM, Peter Jeremy via freebsd-stable wrote:
...
>> server% tail /COPYRIGHT <&-
>> Assertion failed: (procfd > STDERR_FILENO), function service_clean, file 
>> /usr/src/lib/libcasper/libcasper/service.c, line 394.
>> tail: unable to init casper: Socket is not connected

>I get a different error on a 13.0-RELEASE machine I converted from 12 to 
>current about a year ago (bash and sh):
>
>$ tail /COPYRIGHT <&-
>tail: can't limit stdio rights: Bad file descriptor

I've done some more testing across a number of systems and narrowed the
difference in behaviour down to the presence of the CAPABILITIES option in
the kernel (it looks like I never added it to my kernel config on that
system):

If CAPABILITIES is present then the cap_rights_limit(2) call for the closed
FD fails, generating the "can't limit stdio rights" error.  (Whether this
behaviour is reasonable is a different issue - it was introduced in r348708,
based on https://reviews.freebsd.org/D20393 and the issue of closed file
descriptors doesn't seem to have been considered).

If CAPABILITIES is not present then the cap_rights_limit() failure is
(correctly) ignored but the subsequent fileargs_init(3) call gets upset at
opening a FD <= 2.  This behaviour seems wrong - if CAPABILITIES isn't
present in the kernel then the userland behaviour should be the same as if
WITHOUT_CASPER is specified.

IMO, this is a bug in fileargs_init(3).
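
Both failure modes are consistent with ordinary descriptor rules, which can be shown without Casper. A sketch in Python, assuming only POSIX behaviour: using a closed descriptor fails with EBADF (matching the "Bad file descriptor" text in the "can't limit stdio rights" error), and a newly created descriptor takes the lowest free number, so with stdin closed the next descriptor a library opens can land at 0, which is <= STDERR_FILENO. That this is what trips the service_clean assertion is an inference from the messages above, not something verified against the libcasper source. The helper names are made up for illustration.

```python
import errno
import os


def closed_fd_errno() -> int:
    """Return the errno produced by using a descriptor after closing it."""
    r, w = os.pipe()
    os.close(w)
    os.close(r)
    try:
        os.fstat(r)               # any syscall on the closed descriptor
    except OSError as exc:
        return exc.errno
    return 0


def reuses_lowest_free_fd() -> bool:
    """Close a descriptor, open a new one, report whether the number is reused."""
    r, w = os.pipe()              # pipe() hands out the two lowest free numbers
    low = min(r, w)
    os.close(low)                 # stands in for a closed stdin ("<&-")
    new = os.open(os.devnull, os.O_RDONLY)
    try:
        return new == low
    finally:
        os.close(new)
        os.close(max(r, w))


if __name__ == "__main__":
    print("closed fd gives EBADF:", closed_fd_errno() == errno.EBADF)
    print("new fd reuses lowest free number:", reuses_lowest_free_fd())
```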

-- 
Peter Jeremy


Re: tail(1) broken in 13-stable

2021-05-06 Thread Peter Jeremy via freebsd-stable
On 2021-May-06 12:59:54 +0200, Mariusz Zaborski  wrote:
>Could you provide details how to reproduce this?
>
>On Thu, 6 May 2021 at 12:13, Peter Jeremy via freebsd-stable
> wrote:
>>
>> Since updating from 12-stable to 13-stable, I've found that tail(1)
>> crashes, reporting:
>> Assertion failed: (procfd > STDERR_FILENO), function service_clean, file 
>> /usr/src/lib/libcasper/libcasper/service.c, line 394.
>> tail: unable to init casper: Socket is not connected
>> unless all three of stdin, stdout and stderr are open.  Whilst it
>> probably doesn't make sense to call tail without stdout open, there's
>> no obvious reason to require that stdin or stderr must be open.

server% tail /COPYRIGHT <&-
Assertion failed: (procfd > STDERR_FILENO), function service_clean, file 
/usr/src/lib/libcasper/libcasper/service.c, line 394.
tail: unable to init casper: Socket is not connected

-- 
Peter Jeremy


tail(1) broken in 13-stable

2021-05-06 Thread Peter Jeremy via freebsd-stable
Since updating from 12-stable to 13-stable, I've found that tail(1)
crashes, reporting:
Assertion failed: (procfd > STDERR_FILENO), function service_clean, file 
/usr/src/lib/libcasper/libcasper/service.c, line 394.
tail: unable to init casper: Socket is not connected
unless all three of stdin, stdout and stderr are open.  Whilst it
probably doesn't make sense to call tail without stdout open, there's
no obvious reason to require that stdin or stderr must be open.

-- 
Peter Jeremy


Re: Congratulations on the stable/13 release!

2021-04-30 Thread Peter Libassi


> On 1 May 2021, at 03:45, Andrew Reilly wrote:
> 
> In case anyone's interested: for this morning's software maintenance 
> session (at home) I upgraded my file server from FreeBSD stable/12
> to the recently released stable/13.  From source, in-place, on a
> running, on-line system.  Despite the fact that the entire ZFS
> subsystem has been replaced, which is what caused me to wait for a
> couple of weeks, the upgrade appears to have been flawless.  Not a
> single error message on boot-up.  Not a single failed service.
> Everything is working perfectly.  Zpool status told me that I should
> upgrade the pools, and did: that turned on a dozen or so new features
> that I'm sure are useful.  Total downtime about a minute or so:
> just the time it took to reboot.  I'm amazed.  Good on the FreeBSD
> developers and (especially) the release engineers!
> 
> cd /usr/src
> git switch stable/13
> make -s -j20 buildworld kernel
> mergemaster -p
> make -s installworld
> mergemaster -U
> shutdown -r now
> 
> zpool status
> zpool upgrade backup20
> zpool upgrade root
> zpool upgrade tank
> 
> Done!
> 
> Cheers,
> 
> Andrew
> 
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


I will join in the congratulations! I’ve also upgraded my home NAS server and my 
remote backup server without a glitch, including the upgrade of ZFS and all ports. 
Everything now works as it did on 12.2.

freebsd-update -r 13.0-RELEASE upgrade
/usr/sbin/freebsd-update install
fix /etc/ssh/sshd_config
shutdown -r now
freebsd-update install
pkg-static install -f pkg
pkg bootstrap -f
pkg update
pkg upgrade
freebsd-update install
shutdown -r now
zpool upgrade nas

Excellent work from the FreeBSD team!

Thanks
Peter


Re: using interface groups in pf tables stopped working in 13.0-RELEASE

2021-04-27 Thread Peter Ankerstål

>>> 
>> I can 
>> It looks like there’s some confusion inside pfctl about the network group. 
>> It ends up in pfctl_parser.c, append_addr_host(), and expects an AF_INET or 
>> AF_INET6, but instead gets an AF_LINK.
>> 
>> It’s probably related to 250994 or possibly 
>> d2568b024da283bd2b88a633eecfc9abf240b3d8.
>> Either way it’s pretty deep in a part of the pfctl code I don’t much like. 
>> I’ll try to poke at it some more over the weekend.
>> 
> It should be fixed as of d5b08e13dd6beb3436e181ff1f3e034cc8186584 in main. 
> I’ll MFC that in about a week, and then it’ll turn up in 13.1 in the fullness 
> of time.

Nice, thanks. 

I also seem to have a problem in anchors, even when not using tables. But maybe 
this will also be fixed by this change.



Re: zfs native encryption best practices on RELENG13

2021-04-23 Thread Peter Libassi


> On 23 Apr 2021, at 23:23, Xin Li via freebsd-stable wrote:
> 
> On 4/23/21 13:53, mike tancsa wrote:
>> Starting to play around with RELENG_13 and wanted explore ZFS' built in
>> encryption.  Is there a best practices doc on how to do full disk
>> encryption anywhere thats not GELI based  ?  There are lots for 
>> GELI,
>> but nothing I could find for native OpenZFS encryption on FreeBSD
>> 
>> i.e box gets rebooted, enter in passphrase to allow it to boot kind of
>> thing from the boot loader prompt ?
> 
> I think loader do not support the native OpenZFS encryption yet.
> However, you can encrypt non-essential datasets on a boot pool (that is,
> if com.datto:encryption is "active" AND the bootfs dataset is not
> encrypted, you can still boot from it).
> 
> BTW instead of entering passphrase at loader prompt, if / is not
> encrypted, it's also possible to do something like
> https://lists.freebsd.org/pipermail/freebsd-security/2012-August/006547.html
> .
> 
> Personally I'd probably go with GELI (or other kind of full disk
> encryption) regardless if OpenZFS's native encryption is used because my
> primary goal is to be able to just throw away bad disks when they are
> removed from production [1].  If the pool is not fully encrypted, there
> is always a chance that the sensitive data have landed some unencrypted
> datasets and never gets fully overwritten.
> 
> [1] Also keep in mind: https://xkcd.com/538/
> 
> Cheers,
> 
Yes, I’ve come to the same conclusion. This should be used on a data zpool and 
not on the system pool (zroot). Encryption is per dataset. I also found that 
if the encrypted dataset is not mounted for some reason, you will be writing to 
the unencrypted parent dataset. At least it works for an encrypted thumb drive; I 
just posted this quick guide:
https://forums.freebsd.org/threads/freebsd-13-openzfs-encrypted-thumb-drive.80008/
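
The pitfall with an unmounted encrypted dataset can be guarded against in scripts by checking that the target really is a mount point before writing. A minimal sketch in Python: `write_if_mounted` and any paths passed to it are hypothetical, and `os.path.ismount` only tells you that something is mounted there, not that it is the encrypted dataset.

```python
import os


def write_if_mounted(mountpoint: str, name: str, data: bytes) -> str:
    """Write data under mountpoint, refusing if nothing is mounted there."""
    if not os.path.ismount(mountpoint):
        # Without this check the write would land in the parent filesystem.
        raise RuntimeError(f"{mountpoint} is not a mount point, refusing to write")
    target = os.path.join(mountpoint, name)
    with open(target, "wb") as fh:
        fh.write(data)
    return target
```

A stricter variant could also compare the mounted dataset name or its encryption property, but that needs the zfs tooling and is left out here.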

/Peter


Re: using interface groups in pf tables stopped working in 13.0-RELEASE

2021-04-14 Thread Peter Ankerstål

const { trusted:network mgmt:network dmz:network
>> guest:network edmz:network \
>>admin:network iot:network client:network }
>> If I reload the configuration I get the following:
>> # pfctl -f /etc/pf.conf
>> /etc/pf.conf:12: cannot create address buffer: Invalid argument
>> pfctl: Syntax error in config file: pf rules not loaded
> Some changes in the pf source have been made over the last couple
> of months. The error returned appears to be related. It appears
> that you're running into a table size/count and memory allocation
> related error. The first change moved/changed memory allocation to
> kernel space, requiring one to increase allocation via loader.conf(5).
> It was recently moved back to userspace allowing one to make changes
> to a running system via sysctl.conf(5) or the commandline.
> IOW if you're on the recent change you should be able to simply
> increase your table count by executing something like:
> # echo "set limit table-entries " | pfctl -m -f -
> OTOH if you're stuck with the change in kernelspace, increase
> net.pf.request_maxcount=
> by some amount in loader.conf(5). If you are on the newer userspace
> change, you can issue the sysctl(8) command at your terminal for
> net.pf.request_maxcount=
> as well.

I don't think so. Everything works normally if I switch from group name to 
interface name in the config. 

It seems to me that pf for some reason interprets group names differently in 
13.0-RELEASE than it did in 12.2-RELEASE-p4. 

I don't really get how "anchor in from trusted:network" can resolve to "anchor 
in inet6 all".

/Peter.


Re: using interface groups in pf tables stopped working in 13.0-RELEASE

2021-04-14 Thread Peter Ankerstål



> On 14 Apr 2021, at 16:16, Peter Ankerstål  wrote:
> 
> In pf I use the interface group syntax a lot to make the configuration more 
> readable. All interfaces are assigned to a group representing its use/vlan 
> name. 

It seems that the rest of my ruleset is also affected by this, and interface 
groups combined with :network no longer work.

For example I have this anchor:
anchor in from trusted:network {
}

which before resolved to 
anchor in inet from 172.25.0.0/24 to any {
}

but now resolves to:
anchor in inet6 all {
}

/Peter.


using interface groups in pf tables stopped working in 13.0-RELEASE

2021-04-14 Thread Peter Ankerstål
In pf I use the interface group syntax a lot to make the configuration more 
readable. All interfaces are assigned to a group representing its use/vlan 
name. 

For example:

ifconfig_igb1_102="172.22.0.1/24 group iot description 'iot vlan' up"
ifconfig_igb1_102_ipv6="inet6 2001:470:de59:22::1/64"

ifconfig_igb1_300="172.26.0.1/24 group mgmt description 'mgmt vlan' up"
ifconfig_igb1_300_ipv6="inet6 2001:470:de59:26::1/64"

In pf.conf I use these group names all over the place. But since I upgraded to 
13.0-RELEASE it no longer works to define a table using the :network syntax and 
interface groups:

tableconst { trusted:network mgmt:network dmz:network 
guest:network edmz:network \
admin:network iot:network client:network }

If I reload the configuration I get the following:
# pfctl -f /etc/pf.conf
/etc/pf.conf:12: cannot create address buffer: Invalid argument
pfctl: Syntax error in config file: pf rules not loaded

I have tried using just one network, double-checking the interface group setting 
and so on, but with no luck. 

Using the actual interface works just fine:

table{ igb1.300:network }

but using the group fails:

# ifconfig -g mgmt
igb1.300

table{ mgmt:network }

# pfctl -f /etc/pf.conf
/etc/pf.conf:12: cannot create address buffer: Invalid argument
pfctl: Syntax error in config file: pf rules not loaded
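
For reference, the expansion one would expect for `iot:network` and `mgmt:network` can be worked out from the rc.conf lines above. A sketch using Python's ipaddress module: the mapping just restates the addresses from this message, and `group_networks` is a hypothetical helper that models the expected result, not what pfctl does internally.

```python
import ipaddress

# Interface addresses as given in the rc.conf lines above, keyed by group.
GROUP_ADDRS = {
    "iot": ["172.22.0.1/24", "2001:470:de59:22::1/64"],
    "mgmt": ["172.26.0.1/24", "2001:470:de59:26::1/64"],
}


def group_networks(group: str) -> list:
    """Return the networks that '<group>:network' would be expected to cover."""
    return [str(ipaddress.ip_interface(a).network) for a in GROUP_ADDRS[group]]


if __name__ == "__main__":
    for g in GROUP_ADDRS:
        print(g, group_networks(g))
```

Each group should yield one IPv4 and one IPv6 network, so a result with no addresses at all, or with only an inet6 match, points at the group lookup rather than at the table syntax.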

Any ideas? 

Thanks!

/Peter.


JMicron jms561 umass on arm64?

2021-04-07 Thread Peter Cornelius
G'day, folks,
 
Is there, by chance, anyone out there who has a JMicron jms561-based [1] USB3 
'umass' kind of device up & running who can share experience or quirks, please?

I'm trying to get mine [2] to work under FreeBSD [3] but it does not even show 
up with usbconfig list, while with Raspbian I was able to make it work easily.
 
Thanks,
 
Peter.
 
---
 
[1] I believe, https://www.jmicron.com/file/download/1026/JMS561_Product+Brief.pdf
[2] https://wiki.radxa.com/Dual_Quad_SATA_HAT
[3] Note: Later builds so far have not booted despite a current u-boot (March 2021)
    FreeBSD rpi4 14.0-CURRENT FreeBSD 14.0-CURRENT #1: Tue Feb 23 02:30:31 UTC 2021
    root@rpi4:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC  arm64
 


Re: geli broken in 13.0-BETA4 and later on armv8

2021-03-06 Thread Peter Jeremy via freebsd-stable
On 2021-Mar-06 10:39:02 -0800, Oleksandr Tymoshenko  wrote:
>Peter Jeremy via freebsd-current (freebsd-curr...@freebsd.org) wrote:
>> [Adding arm@ and making it clearer that this is armv8-only]
>> 
>> On 2021-Mar-06 20:26:19 +1100, Peter Jeremy  
>> wrote:
>> >On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable 
>> > wrote:
>> >>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>> >>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>> >>RK3399, arm64) has changed so that a geli-encrypted partition (using
>> >>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>> >>13.0-BETA4.
>> >
>> >I've confirmed that the problem is f76393a6305b - reverting that
>> >commit fixes the problem in releng/13.0.
>> >
>> >I've further verified that the bug is still present in main (14.x)
>> >at 028616d0dd69.
>
>Could you test this patch and let me know if it fixes the issue?
>
>https://people.freebsd.org/~gonzo/patches/armv8crypto-xts-fix.diff

Yes, it does.  Thank you very much.

--- 
Peter Jeremy


Re: geli broken in 13.0-BETA4 and later on armv8

2021-03-06 Thread Peter Jeremy via freebsd-stable
[Adding arm@ and making it clearer that this is armv8-only]

On 2021-Mar-06 20:26:19 +1100, Peter Jeremy  wrote:
>On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable 
> wrote:
>>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>>RK3399, arm64) has changed so that a geli-encrypted partition (using
>>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>>13.0-BETA4.
>
>I've confirmed that the problem is f76393a6305b - reverting that
>commit fixes the problem in releng/13.0.
>
>I've further verified that the bug is still present in main (14.x)
>at 028616d0dd69.

-- 
Peter Jeremy


Re: geli broken in 13.0-BETA4 and later

2021-03-06 Thread Peter Jeremy via freebsd-stable
On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable 
 wrote:
>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>RK3399, arm64) has changed so that a geli-encrypted partition (using
>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>13.0-BETA4.

I've confirmed that the problem is f76393a6305b - reverting that
commit fixes the problem in releng/13.0.

I've further verified that the bug is still present in main (14.x)
at 028616d0dd69.

-- 
Peter Jeremy


geli broken in 13.0-BETA4 and later

2021-03-06 Thread Peter Jeremy via freebsd-stable
Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
RK3399, arm64) has changed so that a geli-encrypted partition (using
AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
13.0-BETA4.

I've verified that the garbage seems consistent between reboots and
isn't impacted by enabling the big cores in 7ba4d0f82955.  There's
nothing useful reported via geli debugging.

I've tried updating to releng/13.0 60e8939aa85b and it's still broken.

My suspicion is f76393a6305b - whilst that just talks about AES-GCM,
it does a reasonable job of roto-tilling the entire armv8crypto stack.
I notice that there are fixes to f76393a6305b that don't seem to
have made it into releng/13.0 and I will continue to investigate.

-- 
Peter Jeremy


Re: Rasberry Pi 4 has no USB

2021-02-28 Thread Peter Cornelius
[Retransmission, stupid HTML mails]

G'day, Carl,
 
On 2/28/21 11:26 AM, Carl Johnson wrote:
> I have an 8GB RPi 4B that I am trying out, but it has no USB response at
> all.  [...]
 
I second the suggestion to do a brief trial with Raspbian or so.  I currently 
run mine under -current as below, but can confirm that I have been able to run 
it since 13 with USB keyboard and mouse. I am currently logged in via the net, 
so without further USB devices; at least the hubs should be present:
 
[root@rpi4 ~]# usbconfig  
ugen0.1: <0x1106 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) 
pwr=SAVE (0mA)
ugen0.2:  at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) 
pwr=SAVE (100mA)
 
[root@rpi4 ~]# uname -a
FreeBSD rpi4 14.0-CURRENT FreeBSD 14.0-CURRENT #1: Tue Feb 23 02:30:31 UTC 2021 
    root@rpi4:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC  arm64
 
I can share a 14.0 image I used to initially boot if you have somewhere to push 
it (FreeBSD-aarch64-14.0-GENERIC-a63eae65ff8-RaspberryPi4.img, roughly seven and a 
half gig; as an xz, I'd expect around two).
 
I did, though, have to do a couple of tries with Raspbian initially to put a 
recent firmware onto my RPI4B8GB.
 
HOWEVER.
 
I also have an odd observation which is that there is NO OTHER USB device BUT 
the above and my mouse/keyboard. This is quite odd since I do have more devices 
connected (most notably a umass JMicron-JMS561-based device [1]) and naively 
hoped that also on arm, USB devices would at least show up as ugen-something. 
Well, at least keyboard and mouse did work, did they not?
 
I did boot with Raspbian, and all comes to life instantly and beautifully. I even 
was able to update some firmware on the JMS-thingo, see the disks, all 
blinkenlights there. Back to FreeBSD -- again, silence. I now really am 
scratching my head and also have run out of ideas... though that is kind of a 
luxury compared to you, I guess.
 
Any help appreciated, and Carl, as written, if you want my image, where should 
I put it?
 
Cheers,
 
Peter.
 
[1] https://www.jmicron.com/file/download/1033/JMS561_Product+Brief.pdf


Re: Rasberry Pi 4 has no USB

2021-02-28 Thread Peter Cornelius


Re: lots of "no such file or directory" errors in zfs filesystem

2021-02-23 Thread Peter Jeremy via freebsd-stable
On 2021-Feb-23 11:30:58 -0600, Chris Anderson  wrote:
>nope, it led a pretty boring life. that zfs filesystem was created on that
>server and has been on the same two mirrored disks for its lifetime.

Does the server have ECC RAM?  Possibly it's a bitflip somewhere before
the data got to disk.

>prior to the upgrades) the server does have a relatively modest amount of
>ram (2GB). dunno if that makes it more likely that these kinds of issues
>get triggered.

Low amounts of RAM are going to increase the IO load but shouldn't
otherwise impact the filesystem consistency.  I have a FreeBSD test
system that has been running ZFS in <1GB RAM and rebuilding itself daily
for multiple years, and I haven't run into any ZFS corruption issues.

-- 
Peter Jeremy


Re: stable/13 buildworld fail

2021-02-20 Thread Peter Cornelius


stable/13 buildworld fail

2021-02-20 Thread Peter Cornelius
G'day, folks,

Is anyone else seeing something like below? (Because if not, I messed up 
something and need to continue to search for what it may have been...)

Last change, I believe was commit ced29ea4fb42a70301ba0770ec23e350155289f1 
(HEAD -> stable/13, origin/stable/13), last successful build some three weeks 
ago, stable/13-c256239-gb06fd805cc8.

Thanks,

Peter.

--- 

FreeBSD walkabout 13.0-ALPHA3 FreeBSD 13.0-ALPHA3 #0 
stable/13-c256239-gb06fd805cc8: Mon Feb  1 13:51:31 CET 2021 
root@walkabout:/usr/obj/usr/Src-stable-13/amd64.amd64/sys/GENERIC  amd64

---

c++  -O2 -pipe -fno-common -I/usr/obj/usr/Src-stable-13/amd64.amd64/tmp/obj-tools/lib/clang/libllvm -I/usr/Src-stable-13/lib/clang/include -I/usr/Src-stable-13/contrib/llvm-project/llvm/include -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DHAVE_VCS_VERSION_INC -DNDEBUG -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd13.0\" -DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd13.0\" -DDEFAULT_SYSROOT=\"/usr/obj/usr/Src-stable-13/amd64.amd64/tmp\" -DLLVM_TARGET_ENABLE_X86 -DLLVM_NATIVE_ASMPARSER=LLVMInitializeX86AsmParser -DLLVM_NATIVE_ASMPRINTER=LLVMInitializeX86AsmPrinter -DLLVM_NATIVE_DISASSEMBLER=LLVMInitializeX86Disassembler -DLLVM_NATIVE_TARGET=LLVMInitializeX86Target -DLLVM_NATIVE_TARGETINFO=LLVMInitializeX86TargetInfo -DLLVM_NATIVE_TARGETMC=LLVMInitializeX86TargetMC -ffunction-sections -fdata-sections -gline-tables-only -MD -MF.depend.FixedLenDecoderEmitter.o -MTFixedLenDecoderEmitter.o -Wno-format-zero-length -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef -Wno-address-of-packed-member -Wno-switch -Wno-switch-enum -Wno-knr-promoted-parameter -Wno-parentheses -Qunused-arguments -I/usr/obj/usr/Src-stable-13/amd64.amd64/tmp/legacy/usr/include  -fno-exceptions -fno-rtti -std=c++14-stdlib=libc++ -Wno-c++11-extensions   -c /usr/Src-stable-13/contrib/llvm-project/llvm/utils/TableGen/FixedLenDecoderEmitter.cpp -o FixedLenDecoderEmitter.o
c++  -O2 -pipe -fno-common -I/usr/obj/usr/Src-stable-13/amd64.amd64/tmp/obj-tools/lib/clang/libllvm -I/usr/Src-stable-13/lib/clang/include -I/usr/Src-stable-13/contrib/llvm-project/llvm/include -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DHAVE_VCS_VERSION_INC -DNDEBUG -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd13.0\" -DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd13.0\" -DDEFAULT_SYSROOT=\"/usr/obj/usr/Src-stable-13/amd64.amd64/tmp\" -DLLVM_TARGET_ENABLE_X86 -DLLVM_NATIVE_ASMPARSER=LLVMInitializeX86AsmParser -DLLVM_NATIVE_ASMPRINTER=LLVMInitializeX86AsmPrinter -DLLVM_NATIVE_DISASSEMBLER=LLVMInitializeX86Disassembler -DLLVM_NATIVE_TARGET=LLVMInitializeX86Target -DLLVM_NATIVE_TARGETINFO=LLVMInitializeX86TargetInfo -DLLVM_NATIVE_TARGETMC=LLVMInitializeX86TargetMC -ffunction-sections -fdata-sections -gline-tables-only -MD -MF.depend.GICombinerEmitter.o -MTGICombinerEmitter.o -Wno-format-zero-length -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef -Wno-address-of-packed-member -Wno-switch -Wno-switch-enum -Wno-knr-promoted-parameter -Wno-parentheses -Qunused-arguments -I/usr/obj/usr/Src-stable-13/amd64.amd64/tmp/legacy/usr/include  -fno-exceptions -fno-rtti -std=c++14-stdlib=libc++ -Wno-c++11-extensions   -c /usr/Src-stable-13/contrib/llvm-project/llvm/utils/TableGen/GICombinerEmitter.cpp -o GICombinerEmitter.o
In file included from /usr/Src-stable-13/contrib/llvm-project/llvm/utils/TableGen/GICombinerEmitter.cpp:26:
/usr/Src-stable-13/contrib/llvm-project/llvm/utils/TableGen/GlobalISel/GIMatchDag.h:13:10: fatal error: 'GIMatchDagInstr.h' file not found
#include "GIMatchDagInstr.h"
^~~
1 error generated.
*** Error code 1

Stop.
make[3]: stopped in /usr/Src-stable-13/usr.bin/clang/llvm-tblgen
*** Error code 1

Stop.
make[2]: stopped in /usr/Src-stable-13
*** Error code 1

Stop.
make[1]: stopped in /usr/Src-stable-13
*** Error code 1

Stop.
make: stopped in /usr/Src-stable-13
 
 
 
 
 


Re: No more update on stable/12

2020-12-31 Thread Peter Blok
I just removed the previous workspace and ran:

git clone -b stable/12 --depth 1 https://git.freebsd.org/src.git src

> On 30 Dec 2020, at 21:18, Tj  wrote:
> 
> Wait, what exactly did you do btw? I tried just switching the url and now
> i'm  getting "your branch and Origin/stable/12 have diverged, and have
> 261519 and 243116 different commits each, respectively" is the github
> stable/12 not the same as the freebsd.org one?
> 
> On Wed, 30 Dec 2020 at 08:30, Peter Blok  wrote:
> 
>> I was using github instead of the git repository below.
>> 
>> Thx. All set now
>> 
>> Peter
>> 
>>> On 30 Dec 2020, at 10:32, Yasuhiro Kimura  wrote:
>>> 
>>> From: Peter Blok 
>>> Subject: No more update on stable/12
>>> Date: Wed, 30 Dec 2020 10:24:27 +0100
>>> 
>>>> I switched to git, but I noticed there were no MFC or any other updates
>>>> over the last couple of days after the migration. git fetch doesn’t show
>>>> any change.
>>>> 
>>>> Is this correct and just the learning process of git or is something
>>>> wrong with my git clone?
>>> 
>>> Which repository do you use?. Currently new canonical src repository
>>> (https://git.freebsd.org/src.git) is updated but not mirrored to
>>> GitHub yet.
>>> 
>>> ---
>>> Yasuhiro Kimura
>> 
>> 
> 
> -- 
> --
> -Tj Hariharan



Re: No more update on stable/12

2020-12-30 Thread Peter Blok
I was using github instead of the git repository below.

Thx. All set now

Peter

> On 30 Dec 2020, at 10:32, Yasuhiro Kimura  wrote:
> 
> From: Peter Blok 
> Subject: No more update on stable/12
> Date: Wed, 30 Dec 2020 10:24:27 +0100
> 
>> I switched to git, but I noticed there were no MFC or any other updates over 
>> the last couple of days after the migration. git fetch doesn’t show any 
>> change.
>> 
>> Is this correct and just the learning process of git or is something wrong 
>> with my git clone?
> 
> Which repository do you use?. Currently new canonical src repository
> (https://git.freebsd.org/src.git) is updated but not mirrored to
> GitHub yet.
> 
> ---
> Yasuhiro Kimura



No more update on stable/12

2020-12-30 Thread Peter Blok
Hi,

I switched to git, but I noticed there were no MFC or any other updates over 
the last couple of days after the migration. git fetch doesn’t show any change.

Is this correct and just the learning process of git or is something wrong with 
my git clone?

Peter




Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-09 Thread Peter
On Wed, Dec 09, 2020 at 02:00:37PM +1100, Dewayne Geraghty wrote:

! On a jail with config:
! exec.start = "/bin/sh -x /etc/rc";
! exec.stop = "/bin/sh /etc/rc.shutdown";
! exec.clean;
! 
! test_prod  { jid=7; persist; ip4.addr =
! "10.0.7.96,10.0.5.96,127.0.5.96"; devfs_ruleset = "6";
! host.hostuuid=---0001-0302; host.hostid=000302; }
! 
! I successfully performed
! for i in `seq 10`; do jail -vc test_prod; sleep 3; jail -vr test_prod; done

But, this is not a VIMAGE jail, is it?
Old-style jails are unaffected by this issue. Only VIMAGE jails, using
epair or netgraph, might be affected. (In that case, you would not
have an "ip4.addr" configured, but rather a "vnet.interface".)

! I think the normal use of jail.conf is to NOT explicitly use a jid in
! the definition, which may be why this may not have been picked up?
! (Maybe a clue).

This is an interesting point. When you stop a jail, it may stay for
a more or less long time in a "dying" state (visible with "jls -d"),
keeping the jid occupied. During that time, the jail cannot be
restarted with that same jid.
Once ago, I read people complaining about this, and the advice was to
just not define the jid in the definition, so that the jail can be
restarted immediately (and will probably grab another jid).

I did not find a solid explanation for what is happening in that
"dying" state (and why it does take more or less long), even less
an approach to fix that. I found some theories circling the net, but
these don't really figure. So I would need to look into the source
myself - and I did postpone that indefinitely. ;)

But what I found out, with the VIMAGE jails (those that can carry
their own network interfaces), when you make a slight mistake with
managing and handling the interfaces, then the jail will stay in the
dying state forever. If you don't make a mistake, then it will finally
die within some time.
So I decided to keep the jid, so that nothing is allowed to
linger unnoticed after a misconfiguration. (The tradeoff is obviously that
one might have to wait before restarting.)

cheerio,
PMc

P.S. 41 Celsius is fantastic! I envy You! :)
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-09 Thread Peter
On Tue, Dec 08, 2020 at 07:51:07PM -0600, Kyle Evans wrote:
 
! You seem to have misinterpreted this; he doesn't want to narrow it
! down to one bug, he wants simple steps that he can follow to reproduce

Maybe I did misinterpret, but then I don't really understand it.
I would suppose, when testing a proposed fix, the fact that it
does break under the exact same conditions as before, is all the
information needed at that point. Put in simple words: that it does
not work.

! any failure, preferably steps that can actually be followed by just
! about anyone and don't require immense amounts of setup time or
! additional hardware.

Engineering does not normally work that way. 

I'll try to explain: when a bug is first encountered, it is necessary
to isolate it insofar that somebody who is knowledgeable of the code,
can actually reproduce it, in order to have a look at it and analyze
what causes the mis-happening.

If then a remedy is devised, and that does not work as expected, then
the flaw is in the analysis, and we just start over from there.

In fact, I would have expected somebody who is trying to fix such
kind of bug, to already have testing tools available and tell me
exactly which kind of data I might retrieve from the dumps.

The open question now is: am I the only one seeing these failures?
Might they be attributed to a faulty configuration or maybe hardware
issues or whatever?
We cannot know this, we can only watch out what happens at other
sites. And that is why I sent out all these backtraces - because they
appear weird and might be difficult to associate with this issue.

I don't think there is much more we can do at this point, unless we
were willing to actually look into the details.


Am I discouraging? Indeed, I think, engineering is discouraging by
its very nature, and that's the fun of it: to overcome odds and
finally maybe make things better. And when we start to forget about
that, bad things begin to happen (anybody remember Apollo 13?). 

But talking about discouragement: I usually try to track down
defects I encounter, and, if possible, do a viable root-cause
analysis. I tended to be very willing to share the outcomes and, if
a solution arises, by all means make that get back into the code base;
but I found that even ready made patches for easy matters would
linger forever in the sendbug system without anybody caring, or, in
more complex cases where I would need some feedback from the original
writer, if only to clarify the purpose of some defaults or verify
that an approach is viable, that communication is very difficult to
establish. And that is what I would call discouraging, and I for
my part have accepted to just leave the developers in their ivory
tower and tend to my own business.


cheerio,
PMc


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Peter

On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:

! > Sorry for the bad news.
! > 
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the
production code.

In any case, my current workaround, i.e. delaying in the exec.poststop

> exec.poststop = "
>sleep 6 ;
>/usr/sbin/ngctl shutdown ${ifname1l}: ;
>";

helps for it all and makes the system behave solid. This is true
with and without Your patch.

! Can you reduce your netgraph use case to a small test case that can trigger
! the problem?

I'm sorry, I fear I don't get Your point.
Assumed there are actually two or three bugs here, You are asking me
to reduce config so that it will trigger only one of them? Is that
correct?

Then let me put this different: assuming this is the OS for the life
support system of the manned Jupiter mission. Then, which one of the
bugs do You want to get fixed, and which would You prefer to keep and
make Your oxygen supply cut off?

https://www.youtube.com/watch?v=BEo2g-w545A

! I’m not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that.
From Your former mail I get the impression that you prefer to rely
on tests. I consider this a bad habit[1] and prefer logical thinking.

So let's try that:
We know that there is a problem with taking down an interface from a
VIMAGE, in the way it is done by "jail -r". We know this problem can
be reliably worked around by delaying the interface takedown for a short
time.

Now with Your patch, we do not get the typical crash at interface
takedown. Instead, all of a sudden, there are strange crashes from
various other places. And, interestingly, we get these also when
STARTING a jail.

I think this is not an additional problem, it is instead a valuable
information (albeit not the one You might like to get).

Furthermore, we get these new crashes always invoked by "ifconfig",
and they seem to have in common that somebody tries to obtain
information about some interface configuration and receives some
bogus. I might conclude, just out of the belly without looking into
details, that either
 - your patch achieves to garble some internal interface data,
   instead of what it is intended to do, or
 - the original problem manages to garble internal interface data
   (leading to the usual crash), and Your patch does not achieve to
   solve this, but only protects from the immediate consequence.

It might also be worth consideration, that, while the problem may be
more easy to reproduce with epair, this effect may or may not be a
netgraph specific one[2].

Now let's keep in mind that a successful test means EXACTLY NOTHING.
By which other means can we confirm that Your patch fully achieves
what it is intended for? (E.g. something like dumping and verifying
the respective internal tables in-vivo)

(Background: It is not that I would be unwilling to create clean and
precisely reproducible scenarios. But one of my problems is that
currently I only have two machines available: the graphical one where
I'm just typing, and the backend server with the jails that does
practically everything.
Therefore, experimenting on any of them creates considerable pain.
I'm working on that issue, trying to get a real server board for the
backend so to get the current one free for testing - but what I would
like to use, e.g. ASUS Z10PE+cores+regECC, is not something one would
easily find on yardsales - and seldom for an acceptable price.)


cheerio,
PMc

[1] Rationale: a failing test tells us that either the test or the
application has a bug (50/50 chance). A succeeding test tells us
that 1 equals 1, which we knew already before.
In fact, tests tell us *nothing at all* about the state of our
code, and specifically, 'successful' outcomes do NOT mean that
things are all correct.
The only true usefulness of tests is to protect against
re-introducing a fault that was already fixed before,
i.e. regressions.

[2] My netgraph configuration consists of bringing up some bridges
and then attaching the jails to them.

Here is the bridge starter (only respective component,
there are more of these populated, but probably not influencing
the issue):

#! /bin/sh

# PROVIDE: netgraphs
# REQUIRE: netwait
# BEFORE: NETWORKING

. /etc/rc.subr

name="netgraphs"
start_cmd="${name}_start"
stop_cmd="${name}_stop"

load_rc_config $name

netgraphs_graphs="svc"

netgraphs_svc_if1_name="nge_svc_1u"
netgraphs_svc_if1_mac="00:1d:92:01:02:01"
netgraphs_svc_if1_addr="***.***.***.***/29"

netgraphs_svc_start()
{
local _ifname
if ngctl info svcswitch: > /dev/null 2>&1; then
netgraphs_svc_stop
fi

echo "Creating SVC Switch"
ngctl -f - < /dev/null 2>&1; then
$_cmd
else
echo "netgraphs-start: object $i not found" >&2
fi
done

Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Peter
Here is the next funny crashdump - I obtained this one twice
and also the sysctl_rtsock() again.

I can reproduce this by just starting and stopping a most simple jail
that does only
exec.start = "/bin/sleep 4 &";
(And as usual, when I let it time out, nothing bad happens.)
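
A rough sketch of such a start/stop loop, assuming the jail is defined
in jail.conf so that `jail -c NAME` and `jail -r NAME` can find it.
The function name and the JAIL_CMD override are hypothetical and only
there so the loop can be dry-run without touching real jails:

```sh
#!/bin/sh
# JAIL_CMD is overridable so the loop can be dry-run (e.g. JAIL_CMD=echo).
: "${JAIL_CMD:=jail}"

# cycle_jail NAME COUNT [PAUSE]
# Creates and removes jail NAME, COUNT times, sleeping PAUSE seconds
# between start and stop. Stops at the first failing command.
cycle_jail() {
    _name=$1 _count=$2 _pause=${3:-0}
    _i=1
    while [ "$_i" -le "$_count" ]; do
        $JAIL_CMD -c "$_name" || return 1
        sleep "$_pause"
        $JAIL_CMD -r "$_name" || return 1
        echo "iteration $_i done"
        _i=$((_i + 1))
    done
}
```

For example, `cycle_jail rail 10 3` would try ten cycles with a three
second pause. On an affected kernel the loop is expected to end in a
panic rather than return.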


Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 02
instruction pointer = 0x20:0x80a2ac45
stack pointer   = 0x28:0xfe0047cf2890
frame pointer   = 0x28:0xfe0047cf2890
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13557 (ifconfig)
trap number = 9
panic: general protection fault
cpuid = 1
time = 1607469295
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0047cf25a0
vpanic() at vpanic+0x17b/frame 0xfe0047cf25f0
panic() at panic+0x43/frame 0xfe0047cf2650
trap_fatal() at trap_fatal+0x391/frame 0xfe0047cf26b0
trap() at trap+0x67/frame 0xfe0047cf27c0
calltrap() at calltrap+0x8/frame 0xfe0047cf27c0
--- trap 0x9, rip = 0x80a2ac45, rsp = 0xfe0047cf2890, rbp = 
0xfe0047cf2890 ---
strncmp() at strncmp+0x15/frame 0xfe0047cf2890
ifunit_ref() at ifunit_ref+0x59/frame 0xfe0047cf28d0
ifioctl() at ifioctl+0x427/frame 0xfe0047cf2990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe0047cf29f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe0047cf2ac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe0047cf2bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0047cf2bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 
0x7fffe3b8, rbp = 0x7fffe450 ---
Uptime: 8m54s
Dumping 880 out of 3959 MB:


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Peter
On Tue, Dec 08, 2020 at 04:50:00PM +0100, Kristof Provost wrote:
! Yeah, the bug is not exclusive to epair but that’s where it’s most easily
! seen.

Ack.

! Try 
http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch

Great, thanks a lot.

Now I have bad news: when playing yoyo with the next-best three
application jails (with all their installed stuff) it took about
ten ups and downs, then I got this one:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80aad73c
stack pointer   = 0x28:0xfe003f80e810
frame pointer   = 0x28:0xfe003f80e810
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 15486 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607450838
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe003f80e4d0
vpanic() at vpanic+0x17b/frame 0xfe003f80e520
panic() at panic+0x43/frame 0xfe003f80e580
trap_fatal() at trap_fatal+0x391/frame 0xfe003f80e5e0
trap_pfault() at trap_pfault+0x4f/frame 0xfe003f80e630
trap() at trap+0x4cf/frame 0xfe003f80e740
calltrap() at calltrap+0x8/frame 0xfe003f80e740
--- trap 0xc, rip = 0x80aad73c, rsp = 0xfe003f80e810, rbp = 
0xfe003f80e810 ---
ng_eiface_mediastatus() at ng_eiface_mediastatus+0xc/frame 0xfe003f80e810
ifmedia_ioctl() at ifmedia_ioctl+0x174/frame 0xfe003f80e850
ifhwioctl() at ifhwioctl+0x639/frame 0xfe003f80e8d0
ifioctl() at ifioctl+0x448/frame 0xfe003f80e990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe003f80e9f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe003f80eac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe003f80ebf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe003f80ebf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 
0x7fffe358, rbp = 0x7fffe450 ---
Uptime: 9m51s
Dumping 899 out of 3959 MB:

I decided to give it a second try, and this is what I did:

root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 8  kerb.***.org  /j/kerb
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail stop rail
Stopping jails: rail.
root@edge:/var/crash # service jail stop tele
Stopping jails: tele.
root@edge:/var/crash # service jail stop kerb
Stopping jails: kerb.
root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
root@edge:/var/crash # jls -d
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail start kerb
Starting jails:Fssh_packet_write_wait: Connection to 1*** port 22: 
Broken pipe

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code  = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer   = 0x28:0xfe00540ea658
frame pointer   = 0x28:0xfe00540ea670
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13420 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607451910
KDB: stack backtrace:
db_trace_self_wrapper() at 

Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-07 Thread Peter

Hi Kristof,
  it's great to read You!
  
On Mon, Dec 07, 2020 at 09:11:32PM +0100, Kristof Provost wrote:

! That smells a lot like the epair/vnet issues in bugs 238870, 234985, 244703,
! 250870.

epair? No. It is purely Netgraph here.

! I pushed a fix for that in CURRENT in r368237. It’s scheduled to go into
! stable/12 sometime next week, but it’d be good to know that it fixes your
! problem too before I merge it.
! In other words: can you test a recent CURRENT? It’s likely fixed there, and
! if it’s not I may be able to fix it quickly.


Oh my Gods. No offense meant, but this is not really a good time
for that. This is the most horrible upgrade I experienced in 25 years
FreeBSD (and it was prepared, 12.2 did run fine on the other machine).

I have issue with mem config
https://forums.freebsd.org/threads/fun-with-upgrading-sysctl-unknown-oid-vm-pageout_wakeup_thresh.77955/
I have issue with damaged filesystem, for no apparent reason
https://forums.freebsd.org/threads/no-longer-fun-with-upgrading-file-offline.77959/

Then I have this issue here which is now thankfully worked around
https://forums.freebsd.org/threads/panic-12-2-does-not-work-with-jails.77962/post-486365

and when I then dare to have a look at my applications, they look like
sheer horror, segfaults all over, and I don't even know where to begin
with these.


Other option: can you make this fix so that I can patch it into 12.2
source and just redeploy?

I tried to apply the changes from r368237 into my 12.2 source, that
seemed to be quite obvious, but it doesn't work; jails fail to remove
entirely:

# service jail stop rail
Stopping jails: rail.
# jexec rail
jexec: jail "rail" not found

-> it works once.

# service jail start rail
Starting jails: rail.
# service jail stop rail
Stopping jails: rail.
# jexec rail
root@rail:/ # ps ax
ps: empty file: Invalid argument

-> And here it doesn't work anymore, and leaves a skull of a jail
   one cannot get rid of.


Cheerio,
PMc


Analyzing kernel panic from VIMAGE/Netgraph takedown

2020-12-07 Thread Peter


Stopping a VIMAGE+Netgraph jail in 12.2 in the same way as it
did work with Rel. 11.4, crashes the kernel after 2 or 3 start/stop
iterations.

Specifically, this does not work:

  exec.poststop = "/usr/sbin/ngctl shutdown ${ifname1l}:";

Also this new option from Rel.12 does not work either, it just
gives a few more iterations:

  exec.release = "/usr/sbin/ngctl shutdown ${ifname1l}:";

What seems to work is adding a delay:

  exec.poststop = "
  sleep 2 ;
  /usr/sbin/ngctl shutdown ${ifname1l}: ;
  ";

The big question now is: how long should the delay be?

This example did run a test with 100 start/stop iterations. But then,
on a loaded machine stopping a jail that had been running for a few
months, is an entirely different matter: in such a case the jail will
spend hours in "dying" state, while in this test the jid became
instantly free for restart.
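
One way to avoid guessing a fixed delay would be to poll for a
condition with an upper bound instead. The helper below is only a
sketch: the name wait_until is hypothetical, and which condition
actually indicates that the shutdown is safe is not known from the
tests above.

```sh
#!/bin/sh
# wait_until TIMEOUT CMD [ARGS...]
# Runs CMD once per second until it succeeds or TIMEOUT seconds
# have passed. Returns 0 on success, 1 on timeout.
wait_until() {
    _timeout=$1
    shift
    _waited=0
    until "$@"; do
        [ "$_waited" -ge "$_timeout" ] && return 1
        sleep 1
        _waited=$((_waited + 1))
    done
    return 0
}
```

In exec.poststop this might look like
`wait_until 10 /sbin/ifconfig nge_rail_1l && /usr/sbin/ngctl shutdown nge_rail_1l:`,
under the untested assumption that the interface being visible on the
host again is the right thing to wait for.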

In any case, as all this did work flawlessly with Rel. 11.4, there
is now something broken in the code, and should be fixed.

PMc


Panic: 12.2 fails to use VIMAGE jails

2020-12-07 Thread Peter


After clean upgrade (from source) from 11.4 to 12.2-p1 my jails do
no longer work correctly.

Old-fashioned jails seem to work, but most are VIMAGE+NETGRAPH style,
and do not work properly.
All did work flawlessly for nearly a year with Rel.11.

If I start 2-3 jails, and then stop them again, there is always a
panic.
Also reproducible with GENERIC kernel.

Can this be fixed, or do I need to revert to 11.4?

The backtrace looks like this:

#4 0x810bbadf at trap_pfault+0x4f
#5 0x810bb23f at trap+0x4cf
#6 0x810933f8 at calltrap+0x8
#7 0x80cdd555 at _if_delgroup_locked+0x465
#8 0x80cdbfbe at if_detach_internal+0x24e
#9 0x80ce305c at if_vmove+0x3c
#10 0x80ce3010 at vnet_if_return+0x50
#11 0x80d0e696 at vnet_destroy+0x136
#12 0x80ba781d at prison_deref+0x27d
#13 0x80c3e38a at taskqueue_run_locked+0x14a
#14 0x80c3f799 at taskqueue_thread_loop+0xb9
#15 0x80b9fd52 at fork_exit+0x82
#16 0x8109442e at fork_trampoline+0xe

This is my typical jail config, designed and tested with Rel.11:

rail {
jid = 10;
devfs_ruleset = 11;
host.hostname = "xxx.xxx.xxx.org";
vnet = "new";
sysvshm;
$ifname1l = nge_${name}_1l;
$ifname1l_mac = 00:1d:92:01:01:0a;
vnet.interface = "$ifname1l";
exec.prestart = "
echo -e \"mkpeer eiface crhook ether\nname .:crhook $ifname1l\" \
| /usr/sbin/ngctl -f -
/usr/sbin/ngctl connect ${ifname1l}: svcswitch: ether link2
ifname=`/usr/sbin/ngctl msg ${ifname1l}: getifname | \
awk '$1 == \"Args:\" { print substr($2, 2, length($2)-2)}'`
/sbin/ifconfig \$ifname name $ifname1l
/sbin/ifconfig $ifname1l link $ifname1l_mac
";
exec.poststart = "
/usr/sbin/jexec $name /sbin/sysctl kern.securelevel=3 ;
";
exec.poststop = "/usr/sbin/ngctl shutdown ${ifname1l}:";
}
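
The awk step in exec.prestart is easier to follow outside the
jail.conf quoting. The sketch below uses the same awk program and
assumes that `ngctl msg <node>: getifname` answers with a line of the
form `Args: "ngeth0"`; the function name is hypothetical:

```sh
#!/bin/sh
# Reads `ngctl msg <node>: getifname` output on stdin and prints the
# interface name with the surrounding double quotes stripped.
parse_ifname() {
    awk '$1 == "Args:" { print substr($2, 2, length($2) - 2) }'
}
```

The interface name printed this way is what the config then renames
to $ifname1l with ifconfig.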


Help! 12.2 mem ctrl knob missing, might need 3 times more memory

2020-12-06 Thread Peter
Hiya,

 after upgrading 11.4 -> 12.2, I get this error:

> sysctl: unknown oid 'vm.pageout_wakeup_thresh' at line 105

How do I adjust the paging now? The ARC is much too small:

Mem: 1929M Active, 109M Inact, 178M Laundry, 1538M Wired, 37M Buf, 88M Free
ARC: 729M Total, 428M MFU, 154M MRU, 196K Anon, 25M Header, 122M Other
 118M Compressed, 533M Uncompressed, 4.52:1 Ratio
Swap: 10G Total, 1672M Used, 8567M Free, 16% Inuse

With 11.4 there was 200M active, 2500M wired, 4200M swap and the ARC
stayed filled to the configured arc_max. And there are not even all
applications loaded yet!

Config: installed 4G ram, application footprint ~11G.

vm.pageout_wakeup_thresh=11000 # default 6886
vm.v_inactive_target=48000 # default 1.5x vm.v_free_target
vfs.zfs.arc_grow_retry=6 # override shrink-event from pageout (every 10sec.)

I did this intentionally: the ram is over-used with applications. These
applications are rarely accessed, but should respond to the network. So
they are best accommodated in paging space - taking a few seconds for
page-in at first access does not matter, and not many of them are
accessed at the same time. 

So, I want the machine to page out *before* shrinking the ARC, because
pageout is a normal happening in this layout. The above tuning
achieved exactly that, but now in 12.2 it seems missing.
Without that I would need to install the full 12G RAM, which is just a
waste. 

How do I get this behaviour back with 12.2?
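
Independent of what replaces the knob, tuning like the above could be
applied defensively so that an OID removed by an upgrade is reported
instead of aborting the rest. This is only a sketch: the function
name and the SYSCTL_CMD override are hypothetical, and it does not
restore the old paging behaviour.

```sh
#!/bin/sh
# SYSCTL_CMD is overridable so the function can be dry-run.
: "${SYSCTL_CMD:=sysctl}"

# set_if_present OID VALUE
# Sets OID only if the running kernel knows it; otherwise prints a
# warning to stderr and carries on.
set_if_present() {
    if $SYSCTL_CMD -n "$1" >/dev/null 2>&1; then
        $SYSCTL_CMD "$1=$2"
    else
        echo "skipping unknown oid: $1" >&2
    fi
}
```

For example, `set_if_present vm.pageout_wakeup_thresh 11000` would
set the value on 11.4 and only print a warning on 12.2.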


Re: Commit 367705+367706 causes a pabic

2020-11-23 Thread Peter Blok
Kristof,

It’s from the 2nd situation. It is so weird. Last time there was ipsec code in 
the backtrace, which wasn’t used on the bridge+members.

This is from my own kernel config, but during testing with the GENERIC kernel I 
had similar backtraces at reboot.

I can’t do a lot right now, but I’m planning to:

- build kernel with -O0
- do the deletion of the epair manually

I’ll get back to you if I find something.

Peter

> On 23 Nov 2020, at 12:15, Kristof Provost  wrote:
> 
> Peter,
> 
> Is that backtrace from the first or the second situation you describe? What 
> kernel config are you using with that backtrace?
> 
> This backtrace does not appear to involve the bridge. Given that part of the 
> panic message is cut off it’s very hard to conclude anything at all from it.
> 
> Best regards,
> Kristof
> 
> On 23 Nov 2020, at 11:52, Peter Blok wrote:
> 
>> Kristof,
>> 
>> With commit 367705+367706 and if_bridge statically linked. It crashes while 
>> adding an epair of a jail.
>> 
>> With commit 367705+367706 and if_bridge dynamically loaded there is a crash 
>> at reboot
>> 
>> #0 0x8069ddc5 at kdb_backtrace+0x65
>> #1 0x80652c8b at vpanic+0x17b
>> #2 0x80652b03 at panic+0x43
>> #3 0x809c8951 at trap_fatal+0x391
>> #4 0x809c89af at trap_pfault+0x4f
>> #5 0x809c7ff6 at trap+0x286
>> #6 0x809a1ec8 at calltrap+0x8
>> #7 0x8079f7ed at ip_input+0x63d
>> #8 0x8077a07a at netisr_dispatch_src+0xca
>> #9 0x8075a6f8 at ether_demux+0x138
>> #10 0x8075b9bb at ether_nh_input+0x33b
>> #11 0x8077a07a at netisr_dispatch_src+0xca
>> #12 0x8075ab1b at ether_input+0x4b
>> #13 0xffff8077a80b at swi_net+0x12b
>> #14 0xffff8061e10c at ithread_loop+0x23c
>> #15 0x8061afbe at fork_exit+0x7e
>> #16 0x809a2efe at fork_trampoline+0xe
>> 
>> Peter
>> 
>>> On 21 Nov 2020, at 17:22, Peter Blok  wrote:
>>> 
>>> Kristof,
>>> 
>>> With a GENERIC kernel it does NOT happen. I do have a different iflib 
>>> related panic at reboot, but I’ll report that separately.
>>> 
>>> I brought the two config files closer together and found out that if I 
>>> remove if_bridge from the config file and have it loaded dynamically when 
>>> the bridge is created, the problem no longer happens and everything works 
>>> ok.
>>> 
>>> Peter
>>> 
>>>> On 20 Nov 2020, at 15:53, Kristof Provost  wrote:
>>>> 
>>>> I still can’t reproduce that panic.
>>>> 
>>>> Does it happen immediately after you start a vnet jail?
>>>> 
>>>> Does it also happen with a GENERIC kernel?
>>>> 
>>>> Regards,
>>>> Kristof
>>>> 
>>>> On 20 Nov 2020, at 14:53, Peter Blok wrote:
>>>> 
>>>>> The panic with ipsec code in the backtrace was already very strange. I 
>>>>> was using IPsec, but only on one interface totally separate from the 
>>>>> members of the bridge as well as the bridge itself. The jails were not 
>>>>> doing any ipsec as well. Note that panic was a while ago and it was after 
>>>>> the 1st bridge epochification was done on stable-12 which was later 
>>>>> backed out
>>>>> 
>>>>> Today the system is no longer using ipsec, but it is still compiled in. I 
>>>>> can remove it if need be for a test
>>>>> 
>>>>> 
>>>>> src.conf
>>>>> WITHOUT_KERBEROS=yes
>>>>> WITHOUT_GSSAPI=yes
>>>>> WITHOUT_SENDMAIL=true
>>>>> WITHOUT_MAILWRAPPER=true
>>>>> WITHOUT_DMAGENT=true
>>>>> WITHOUT_GAMES=true
>>>>> WITHOUT_IPFILTER=true
>>>>> WITHOUT_UNBOUND=true
>>>>> WITHOUT_PROFILE=true
>>>>> WITHOUT_ATM=true
>>>>> WITHOUT_BSNMP=true
>>>>> #WITHOUT_CROSS_COMPILER=true
>>>>> WITHOUT_DEBUG_FILES=true
>>>>> WITHOUT_DICT=true
>>>>> WITHOUT_FLOPPY=true
>>>>> WITHOUT_HTML=true
>>>>> WITHOUT_HYPERV=true
>>>>> WITHOUT_NDIS=true
>>>>> WITHOUT_NIS=true
>>>>> WITHOUT_PPP=true
>>>>> WITHOUT_TALK=true
>>>>> WITHOUT_TESTS=true
>>>>> WITHOUT_WIRELESS=true
>>>>> #WITHOUT_LIB32=true
>>>>> WITHOUT_LPR=true
>>>>> 
>>>>>

Re: Commit 367705+367706 causes a pabic

2020-11-23 Thread Peter Blok
Kristof,

With commit 367705+367706 and if_bridge statically linked. It crashes while 
adding an epair of a jail.

With commit 367705+367706 and if_bridge dynamically loaded there is a crash at 
reboot

#0 0x8069ddc5 at kdb_backtrace+0x65
#1 0x80652c8b at vpanic+0x17b
#2 0x80652b03 at panic+0x43
#3 0x809c8951 at trap_fatal+0x391
#4 0x809c89af at trap_pfault+0x4f
#5 0x809c7ff6 at trap+0x286
#6 0x809a1ec8 at calltrap+0x8
#7 0x8079f7ed at ip_input+0x63d
#8 0x8077a07a at netisr_dispatch_src+0xca
#9 0x8075a6f8 at ether_demux+0x138
#10 0x8075b9bb at ether_nh_input+0x33b
#11 0x8077a07a at netisr_dispatch_src+0xca
#12 0x8075ab1b at ether_input+0x4b
#13 0x8077a80b at swi_net+0x12b
#14 0x8061e10c at ithread_loop+0x23c
#15 0x8061afbe at fork_exit+0x7e
#16 0x809a2efe at fork_trampoline+0xe

Peter

> On 21 Nov 2020, at 17:22, Peter Blok  wrote:
> 
> Kristof,
> 
> With a GENERIC kernel it does NOT happen. I do have a different iflib related 
> panic at reboot, but I’ll report that separately.
> 
> I brought the two config files closer together and found out that if I remove 
> if_bridge from the config file and have it loaded dynamically when the bridge 
> is created, the problem no longer happens and everything works ok.
> 
> Peter
> 
>> On 20 Nov 2020, at 15:53, Kristof Provost  wrote:
>> 
>> I still can’t reproduce that panic.
>> 
>> Does it happen immediately after you start a vnet jail?
>> 
>> Does it also happen with a GENERIC kernel?
>> 
>> Regards,
>> Kristof
>> 
>> On 20 Nov 2020, at 14:53, Peter Blok wrote:
>> 
>>> The panic with ipsec code in the backtrace was already very strange. I was 
>>> using IPsec, but only on one interface totally separate from the members of 
>>> the bridge as well as the bridge itself. The jails were not doing any ipsec 
>>> as well. Note that panic was a while ago and it was after the 1st bridge 
>>> epochification was done on stable-12 which was later backed out
>>> 
>>> Today the system is no longer using ipsec, but it is still compiled in. I 
>>> can remove it if need be for a test
>>> 
>>> 
>>> src.conf
>>> WITHOUT_KERBEROS=yes
>>> WITHOUT_GSSAPI=yes
>>> WITHOUT_SENDMAIL=true
>>> WITHOUT_MAILWRAPPER=true
>>> WITHOUT_DMAGENT=true
>>> WITHOUT_GAMES=true
>>> WITHOUT_IPFILTER=true
>>> WITHOUT_UNBOUND=true
>>> WITHOUT_PROFILE=true
>>> WITHOUT_ATM=true
>>> WITHOUT_BSNMP=true
>>> #WITHOUT_CROSS_COMPILER=true
>>> WITHOUT_DEBUG_FILES=true
>>> WITHOUT_DICT=true
>>> WITHOUT_FLOPPY=true
>>> WITHOUT_HTML=true
>>> WITHOUT_HYPERV=true
>>> WITHOUT_NDIS=true
>>> WITHOUT_NIS=true
>>> WITHOUT_PPP=true
>>> WITHOUT_TALK=true
>>> WITHOUT_TESTS=true
>>> WITHOUT_WIRELESS=true
>>> #WITHOUT_LIB32=true
>>> WITHOUT_LPR=true
>>> 
>>> make.conf
>>> KERNCONF=BHYVE
>>> MODULES_OVERRIDE=opensolaris dtrace zfs vmm nmdm if_bridge bridgestp 
>>> if_vxlan pflog libmchain libiconv smbfs linux linux64 linux_common linuxkpi 
>>> linprocfs linsysfs ext2fs
>>> DEFAULT_VERSIONS+=perl5=5.30 mysql=5.7 python=3.8 python3=3.8
>>> OPTIONS_UNSET=DOCS NLS MANPAGES
>>> 
>>> BHYVE
>>> cpu HAMMER
>>> ident   BHYVE
>>> 
>>> makeoptions DEBUG=-g# Build kernel with gdb(1) debug symbols
>>> makeoptions WITH_CTF=1  # Run ctfconvert(1) for DTrace support
>>> 
>>> options CAMDEBUG
>>> 
>>> options SCHED_ULE   # ULE scheduler
>>> options PREEMPTION  # Enable kernel thread preemption
>>> options INET# InterNETworking
>>> options INET6   # IPv6 communications protocols
>>> options IPSEC
>>> options TCP_OFFLOAD # TCP offload
>>> options TCP_RFC7413 # TCP FASTOPEN
>>> options SCTP# Stream Control Transmission Protocol
>>> options FFS # Berkeley Fast Filesystem
>>> options SOFTUPDATES # Enable FFS soft updates support
>>> options UFS_ACL # Support for access control lists
>>> options UFS_DIRHASH # Improve performance on big directories
>>> options UFS_GJOURNAL# Enable gjournal-based UFS journaling
>

Re: Commit 367705+367706 causes a pabic

2020-11-21 Thread Peter Blok
Kristof,

With a GENERIC kernel it does NOT happen. I do have a different iflib related 
panic at reboot, but I’ll report that separately.

I brought the two config files closer together and found out that if I remove 
if_bridge from the config file and have it loaded dynamically when the bridge 
is created, the problem no longer happens and everything works ok.

Peter

> On 20 Nov 2020, at 15:53, Kristof Provost  wrote:
> 
> I still can’t reproduce that panic.
> 
> Does it happen immediately after you start a vnet jail?
> 
> Does it also happen with a GENERIC kernel?
> 
> Regards,
> Kristof
> 
> On 20 Nov 2020, at 14:53, Peter Blok wrote:
> 
>> The panic with ipsec code in the backtrace was already very strange. I was 
>> using IPsec, but only on one interface totally separate from the members of 
>> the bridge as well as the bridge itself. The jails were not doing any ipsec 
>> as well. Note that panic was a while ago and it was after the 1st bridge 
>> epochification was done on stable-12 which was later backed out
>> 
>> Today the system is no longer using ipsec, but it is still compiled in. I 
>> can remove it if need be for a test
>> 
>> 
>> src.conf
>> WITHOUT_KERBEROS=yes
>> WITHOUT_GSSAPI=yes
>> WITHOUT_SENDMAIL=true
>> WITHOUT_MAILWRAPPER=true
>> WITHOUT_DMAGENT=true
>> WITHOUT_GAMES=true
>> WITHOUT_IPFILTER=true
>> WITHOUT_UNBOUND=true
>> WITHOUT_PROFILE=true
>> WITHOUT_ATM=true
>> WITHOUT_BSNMP=true
>> #WITHOUT_CROSS_COMPILER=true
>> WITHOUT_DEBUG_FILES=true
>> WITHOUT_DICT=true
>> WITHOUT_FLOPPY=true
>> WITHOUT_HTML=true
>> WITHOUT_HYPERV=true
>> WITHOUT_NDIS=true
>> WITHOUT_NIS=true
>> WITHOUT_PPP=true
>> WITHOUT_TALK=true
>> WITHOUT_TESTS=true
>> WITHOUT_WIRELESS=true
>> #WITHOUT_LIB32=true
>> WITHOUT_LPR=true
>> 
>> make.conf
>> KERNCONF=BHYVE
>> MODULES_OVERRIDE=opensolaris dtrace zfs vmm nmdm if_bridge bridgestp 
>> if_vxlan pflog libmchain libiconv smbfs linux linux64 linux_common linuxkpi 
>> linprocfs linsysfs ext2fs
>> DEFAULT_VERSIONS+=perl5=5.30 mysql=5.7 python=3.8 python3=3.8
>> OPTIONS_UNSET=DOCS NLS MANPAGES
>> 
>> BHYVE
>> cpu             HAMMER
>> ident           BHYVE
>> 
>> makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols
>> makeoptions     WITH_CTF=1              # Run ctfconvert(1) for DTrace support
>> 
>> options         CAMDEBUG
>> 
>> options         SCHED_ULE               # ULE scheduler
>> options         PREEMPTION              # Enable kernel thread preemption
>> options         INET                    # InterNETworking
>> options         INET6                   # IPv6 communications protocols
>> options         IPSEC
>> options         TCP_OFFLOAD             # TCP offload
>> options         TCP_RFC7413             # TCP FASTOPEN
>> options         SCTP                    # Stream Control Transmission Protocol
>> options         FFS                     # Berkeley Fast Filesystem
>> options         SOFTUPDATES             # Enable FFS soft updates support
>> options         UFS_ACL                 # Support for access control lists
>> options         UFS_DIRHASH             # Improve performance on big directories
>> options         UFS_GJOURNAL            # Enable gjournal-based UFS journaling
>> options         QUOTA                   # Enable disk quotas for UFS
>> options         SUIDDIR
>> options         NFSCL                   # Network Filesystem Client
>> options         NFSD                    # Network Filesystem Server
>> options         NFSLOCKD                # Network Lock Manager
>> options         MSDOSFS                 # MSDOS Filesystem
>> options         CD9660                  # ISO 9660 Filesystem
>> options         FUSEFS
>> options         NULLFS                  # NULL filesystem
>> options         UNIONFS
>> options         FDESCFS                 # File descriptor filesystem
>> options         PROCFS                  # Process filesystem (requires PSEUDOFS)
>> options         PSEUDOFS                # Pseudo-filesystem framework
>> options         GEOM_PART_GPT           # GUID Partition Tables.
>> options         GEOM_RAID               # Soft RAID functionality.
>> options         GEOM_LABEL              # Provides labelization
>> options         GEOM_ELI                # Disk encryption.
>> options         COMPAT_FREEBSD32        # Compatible with i386 binaries
>> options         COMPAT_FREEBSD4         # Compatible with FreeBSD4
>> options         COMPAT_FREEBSD5         # Compatible with Fre

Re: Commit 367705+367706 causes a panic

2020-11-20 Thread Peter Blok
 are compiled in
options KDTRACE_HOOKS   # Kernel DTrace hooks
options DDB_CTF # Kernel ELF linker loads CTF data
options INCLUDE_CONFIG_FILE # Include this file in kernel

# Debugging support.  Always need this:
options KDB # Enable kernel debugger support.
options KDB_TRACE   # Print a stack trace for a panic.
options KDB_UNATTENDED

# Make an SMP-capable kernel by default
options SMP # Symmetric MultiProcessor Kernel
options EARLY_AP_STARTUP

# CPU frequency control
device  cpufreq
device  cpuctl
device  coretemp

# Bus support.
device  acpi
options ACPI_DMAR
device  pci
options PCI_IOV # PCI SR-IOV support

device  iicbus
device  iicbb

device  iic
device  ic
device  iicsmb

device  ichsmb
device  smbus
device  smb

#device jedec_dimm

# ATA controllers
device  ahci    # AHCI-compatible SATA controllers
device  mvs     # Marvell 88SX50XX/88SX60XX/88SX70XX/SoC SATA

# SCSI Controllers
device  mps # LSI-Logic MPT-Fusion 2

# ATA/SCSI peripherals
device  scbus   # SCSI bus (required for ATA/SCSI)
device  da  # Direct Access (disks)
device  cd  # CD
device  pass    # Passthrough device (direct ATA/SCSI access)
device  ses # Enclosure Services (SES and SAF-TE)
device  sg

device  cfiscsi
device  ctl # CAM Target Layer
device  iscsi

# atkbdc0 controls both the keyboard and the PS/2 mouse
device  atkbdc  # AT keyboard controller
device  atkbd   # AT keyboard
device  psm # PS/2 mouse

device  kbdmux  # keyboard multiplexer

# vt is the new video console driver
device  vt
device  vt_vga
device  vt_efifb

# Serial (COM) ports
device  uart    # Generic UART driver

# PCI/PCI-X/PCIe Ethernet NICs that use iflib infrastructure
device  iflib
device  em  # Intel PRO/1000 Gigabit Ethernet Family
device  ix  # Intel PRO/10GbE PCIE PF Ethernet

# Network stack virtualization.
options VIMAGE

# Pseudo devices.
device  crypto
device  cryptodev
device  loop    # Network loopback
device  random  # Entropy device
device  padlock_rng # VIA Padlock RNG
device  rdrand_rng  # Intel Bull Mountain RNG
device  ipmi
device  smbios
device  vpd
device  aesni   # AES-NI OpenCrypto module
device  ether   # Ethernet support
device  lagg
device  vlan    # 802.1Q VLAN support
device  tuntap  # Packet tunnel.
device  md  # Memory "disks"
device  gif # IPv6 and IPv4 tunneling
device  firmware        # firmware assist module

device  pf
#device pflog
#device pfsync

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device  bpf # Berkeley packet filter

# The `epair' device implements a virtual back-to-back connected Ethernet
# like interface pair.
device  epair

# USB support
options USB_DEBUG   # enable debug msgs
device  uhci    # UHCI PCI->USB interface
device  ohci    # OHCI PCI->USB interface
device  ehci    # EHCI PCI->USB interface (USB 2.0)
device  xhci    # XHCI PCI->USB interface (USB 3.0)
device  usb     # USB Bus (required)
device  uhid
device  ukbd    # Keyboard
device  umass   # Disks/Mass storage - Requires scbus and da
device  ums

device  filemon

device  if_bridge

> On 20 Nov 2020, at 12:53, Kristof Provost  wrote:
> 
> Can you share your kernel config file (and src.conf / make.conf if they 
> exist)?
> 
> This second panic is in the IPSec code. My current thinking is that your 
> kernel config is triggering a bug that’s manifesting in multiple places, but 
> not actually caused by those places.
> 
> I’d like to be able to reproduce it so we can debug it.
> 
> Best regards,
> Kristof
> 
> On 20 Nov 2020, at 12:02, Peter Blo

Re: Commit 367705+367706 causes a panic

2020-11-20 Thread Peter Blok
Hi Kristof,

This is 12-stable. With the previous bridge epochification that was backed out 
my config had a panic too.

I don’t have any local modifications. I did a clean rebuild after removing 
/usr/obj/usr

My kernel is custom - I only have zfs.ko, opensolaris.ko, vmm.ko and nmdm.ko as 
modules. Everything else is statically linked. I have removed all drivers not 
needed for the hardware at hand.

My bridge is between two vlans from the same trunk and the jail epair devices 
as well as the bhyve tap devices.

The panic happens when the jails are starting.

I can try to narrow it down over the weekend and make the crash dump available 
for analysis.

Previously I had the following crash with 363492

kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x0410
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80692326
stack pointer           = 0x28:0xfe00c06097b0
frame pointer           = 0x28:0xfe00c06097f0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 2030 (ifconfig)
trap number             = 12
panic: page fault
cpuid = 2
time = 1595683412
KDB: stack backtrace:
#0 0x80698165 at kdb_backtrace+0x65
#1 0x8064d67b at vpanic+0x17b
#2 0x8064d4f3 at panic+0x43
#3 0x809cc311 at trap_fatal+0x391
#4 0x809cc36f at trap_pfault+0x4f
#5 0x809cb9b6 at trap+0x286
#6 0x809a5b28 at calltrap+0x8
#7 0x803677fd at ck_epoch_synchronize_wait+0x8d
#8 0x8069213a at epoch_wait_preempt+0xaa
#9 0x807615b7 at ipsec_ioctl+0x3a7
#10 0x8075274f at ifioctl+0x47f
#11 0x806b5ea7 at kern_ioctl+0x2b7
#12 0x806b5b4a at sys_ioctl+0xfa
#13 0x809ccec7 at amd64_syscall+0x387
#14 0x809a6450 at fast_syscall_common+0x101




> On 20 Nov 2020, at 11:30, Kristof Provost  wrote:
> 
> On 20 Nov 2020, at 11:18, peter.b...@bsd4all.org 
>  wrote:
>> I’m afraid the last Epoch fix for bridge is not solving the problem (or 
>> perhaps creates a new one).
>> 
> We’re talking about the stable/12 branch, right?
> 
>> This seems to happen when the jail epair is added to the bridge.
>> 
> There must be something more to it than that. I’ve run the bridge tests on 
> stable/12 without issue, and this is a problem we didn’t see when the bridge 
> epochification initially went into stable/12.
> 
> Do you have a custom kernel config? Other patches? What exact commands do you 
> run to trigger the panic?
> 
>> kernel trap 12 with interrupts disabled
>> 
>> 
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 6; apic id = 06
>> fault virtual address   = 0xc10
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0x80695e76
>> stack pointer           = 0x28:0xfe00bf14e6e0
>> frame pointer           = 0x28:0xfe00bf14e720
>> code segment            = base 0x0, limit 0xf, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = resume, IOPL = 0
>> current process         = 1686 (jail)
>> trap number             = 12
>> panic: page fault
>> cpuid = 6
>> time = 1605811310
>> KDB: stack backtrace:
>> #0 0x8069bb85 at kdb_backtrace+0x65
>> #1 0x80650a4b at vpanic+0x17b
>> #2 0x806508c3 at panic+0x43
>> #3 0x809d0351 at trap_fatal+0x391
>> #4 0x809d03af at trap_pfault+0x4f
>> #5 0x809cf9f6 at trap+0x286
>> #6 0x809a98c8 at calltrap+0x8
>> #7 0x80368a8d at ck_epoch_synchronize_wait+0x8d
>> #8 0x80695c8a at epoch_wait_preempt+0xaa
>> #9 0x80757d40 at vnet_if_init+0x120
>> #10 0x8078c994 at vnet_alloc+0x114
>> #11 0x8061e3f7 at kern_jail_set+0x1bb7
>> #12 0x80620190 at sys_jail_set+0x40
>> #13 0x809d0f07 at amd64_syscall+0x387
>> #14 0x809aa1ee at fast_syscall_common+0xf8
> 
> This panic is rather odd. This isn’t even the bridge code. This is during 
> initial creation of the vnet. I don’t really see how this could even trigger 
> panics.
> That panic looks as if something corrupted the net_epoch_preempt, by 
> overwriting the epoch->e_epoch. The bridge patches only access this variable 
> through the well-established functions and macros. I see no obvious way that 
> they could corrupt it.
> 
> Best regards,
> Kristof



smime.p7s
Description: S/MIME cryptographic signature


Commit 367705+367706 causes a panic

2020-11-20 Thread Peter Blok
Hi,

I’m afraid the last Epoch fix for bridge is not solving the problem (or 
perhaps creates a new one).

This seems to happen when the jail epair is added to the bridge.

Removing both fixes solves the problem.

Peter


kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address   = 0xc10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80695e76
stack pointer           = 0x28:0xfe00bf14e6e0
frame pointer           = 0x28:0xfe00bf14e720
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 1686 (jail)
trap number             = 12
panic: page fault
cpuid = 6
time = 1605811310
KDB: stack backtrace:
#0 0x8069bb85 at kdb_backtrace+0x65
#1 0x80650a4b at vpanic+0x17b
#2 0x806508c3 at panic+0x43
#3 0x809d0351 at trap_fatal+0x391
#4 0x809d03af at trap_pfault+0x4f
#5 0x809cf9f6 at trap+0x286
#6 0x809a98c8 at calltrap+0x8
#7 0x80368a8d at ck_epoch_synchronize_wait+0x8d
#8 0x80695c8a at epoch_wait_preempt+0xaa
#9 0x80757d40 at vnet_if_init+0x120
#10 0x8078c994 at vnet_alloc+0x114
#11 0x8061e3f7 at kern_jail_set+0x1bb7
#12 0x80620190 at sys_jail_set+0x40
#13 0x809d0f07 at amd64_syscall+0x387
#14 0x809aa1ee at fast_syscall_common+0xf8



Mount SMBv2+ Shares

2020-11-03 Thread Peter Fraser


___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


12.2 cpuset behaves unexpectedly

2020-10-15 Thread Peter


After upgrading 11.4 -> 12.2, cpuset now behaves rather differently:

# cpuset -C -p NNN

11.4: a new set is created with all CPUs enabled, and the process
  is moved into that set, with the thread mask unchanged.
12.2: nothing is done, but an error is raised if threadmask == setmask.

# cpuset -l XX -C -p NNN

11.4: a new set is created with all CPUs enabled, and the process
  is moved into that set, with the thread mask changed to the
  -l parameter.
12.2: an error is raised if threadmask == setmask; otherwise the
  threadmask is changed to the -l parameter.

It seems the -C option does not work anymore (except for producing
errors that appear somewhat bogus).


PMc


12.2 Firefox immediate crash "exiting due to channel error"

2020-10-15 Thread Peter


Hi all,

 I was forced to upgrade 11.4 -> 12.2, as Qt5 requires openssl 1.1.1.

I did a full rebuild from source as of this:
  12.2-RC2 FreeBSD 12.2-RC2 #11 r366648M#N1055:1078
(local patches applied - some published via sendbug 10 or 12 years ago)

I did a full rebuild of ALL ports from source, as of 2020Q4, Revision:
552058.
I verified all files in /usr/local were newly written.

Then I removed COMPAT_FREEBSD11.

Firefox (firefox-esr 78.3.1_3,1) reproducibly crashes immediately at
startup with an "exiting due to channel error" message. This is solved
by putting COMPAT_FREEBSD11 back in (after the better part of a day
spent with kernel builds while halving the diffs between GENERIC and
mine).
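
The "halving the diffs" search mentioned above can be sketched as a plain bisection. This is only an illustration, assuming exactly one config difference is responsible and that the differences are independent of each other; `find_culprit`, the `crashes` callback, and every entry in the example list other than COMPAT_FREEBSD11 are hypothetical names, and in real use `crashes` would mean "build and boot a kernel with these differences applied, then start Firefox".

```python
def find_culprit(diff_options, crashes):
    """Bisect a list of config differences down to the one that breaks things.

    diff_options: differences between the GENERIC and the custom config.
    crashes(applied): hypothetical callback; returns True if a kernel built
    with only the differences in `applied` shows the problem.
    Assumes exactly one difference is responsible.
    """
    candidates = list(diff_options)
    if not candidates:
        raise ValueError("nothing to bisect")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if crashes(half):
            candidates = half                    # culprit is in the first half
        else:
            candidates = candidates[len(half):]  # culprit is in the second half
    return candidates[0]


# Simulated run: the list entries are placeholders, and the lambda stands in
# for a real kernel build plus a Firefox start.
diffs = [
    "nooptions SCTP",
    "nooptions COMPAT_FREEBSD10",
    "nooptions COMPAT_FREEBSD11",
    "nodevice sound",
    "nooptions KDTRACE_HOOKS",
]
culprit = find_culprit(diffs, lambda applied: "nooptions COMPAT_FREEBSD11" in applied)
print(culprit)
```

Each round costs one kernel build, so the number of builds grows only with the logarithm of the number of differences, which is still most of a day when a build takes an hour or more.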

I found some comments, but they do not elaborate on the issue, e.g:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233028#c13
(that's two years ago and concerns 12.0-PRERELEASE!)

Finally I found this:
https://reviews.freebsd.org/D23100

"The Rust ecosystem currently uses pre-ino64 syscalls, so building
lang/rust without COMPAT_FREEBSD11 is not going to work."

It seems, *RUNNING* rust-built stuff w/o COMPAT11 is also not going
to work - and one wouldn't expect this (and probably search for a long
time), because removing compat switches finally before rebooting,
*AFTER* everything was rebuilt and installation verified, is just
good practice.

So, as a user I would expect to find this mentioned in some release
notes. OTOH, rust is an add-on, and so one could take the position
that base is not concerned.
But then at least ports/UPDATING should somehow mention it.


Re: How to free used Swap-Space? (from errno=8)

2020-09-22 Thread Peter
On Wed, Sep 23, 2020 at 12:03:32AM +0300, Konstantin Belousov wrote:
! On Tue, Sep 22, 2020 at 09:11:49PM +0200, Peter wrote:
! > So what happens then is this:
! > 
! > $ file scc.e
! > scc.e: ELF 32-bit LSB executable, Intel 80386, version 1
! > (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1,
! > for FreeBSD 9.3 (903504), stripped
! > 
! > $ ./scc.e
! > ELF interpreter /libexec/ld-elf.so.1 not found, error 8
! > Abort trap
! > 
! > And this costs some (hundred?) kB of swap space every time it
! > happens. That space is never released, nor can the affected jail
! > ever fully die.
! In what sense does it 'cost'?

Well, that amount of memory gets occupied. Forever, that is, until
poweroff/reset.

! Can you show the exact sequence of commands and outputs that demonstrates
! your point?  What type of filesystem do the binaries live on?

Oh, I didn't care. Originally on ZFS. When I tried to reproduce it,
most likely on an NFS-4 share, as I didn't bother to put it anywhere
special.

! I want to reproduce it locally.

Yes, that's great! Let's see which info you are lacking.
Here we are now on my desktop box (mostly same machine, same
configuration, i5-3570, 11.4-p3, amd64).

I explicitly removed all the files that do not get installed
when /etc/src.conf contains "WITHOUT_LIB32=", but I still have
COMPAT_FREEBSD32 in the kernel.

Now I fetch such an old 9.3/i386 binary from my backups and
drop it into some NFS filesystem.
(That binary is only 4kB; I just attach it here. If you want to try,
you can use that one straight away. In normal operation it just
converts some words from stdin to stdout.)

admin@disp:510:1/ext/Repos$ dir usr2sys
-rwxr-xr-x   1 bin   bin   4316 Apr  7  2016 usr2sys
admin@disp:511:1/ext/Repos$ file usr2sys 
usr2sys: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), 
dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 9.3 (903504), 
stripped
admin@disp:513:1/ext/Repos$ mount | grep Repos
edge-e:/ext/Repos on /ext/Repos (nfs, nfsv4acls)
admin@disp:514:1/ext/Repos$ top | cat
Mem: 952M Active, 1687M Inact, 419M Laundry, 4423M Wired, 774M Buf, 348M Free
ARC: 1940M Total, 1378M MFU, 172M MRU, 2492K Anon, 48M Header, 340M Other
 1134M Compressed, 2749M Uncompressed, 2.43:1 Ratio
Swap: 20G Total, 36M Used, 20G Free

As we see, this machine has 8 GB installed and currently almost no swap
used. Now watch what happens:
epos$ ./usr2sys 
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap

admin@disp:519:1/ext/Repos$ for i in `seq 1000`
> do ./usr2sys
> done
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap
...

admin@disp:514:1/ext/Repos$ top | cat
Mem: 1010M Active, 1807M Inact, 419M Laundry, 4523M Wired, 774M Buf, 69M Free
ARC: 1940M Total, 1383M MFU, 166M MRU, 2503K Anon, 48M Header, 340M Other
 1134M Compressed, 2750M Uncompressed, 2.43:1 Ratio
Swap: 20G Total, 36M Used, 20G Free

The free memory has already disappeared!

admin@disp:521:1/ext/Repos$ for i in `seq 5000`; do ./usr2sys ; done
...

admin@disp:522:1/ext/Repos$ top | cat
Mem: 2154M Active, 78M Inact, 787M Laundry, 4722M Wired, 774M Buf, 89M Free
ARC: 1753M Total, 1273M MFU, 97M MRU, 2653K Anon, 39M Header, 340M Other
 953M Compressed, 2445M Uncompressed, 2.56:1 Ratio
Swap: 20G Total, 358M Used, 20G Free, 1% Inuse

Now the swapspace starts filling.
Let's see if the filesystem the binary lives on makes any difference, and go onto UFS:

admin@disp:525:1/ext/Repos$ su -
Password:
root@disp:~ # cp /ext/Repos/usr2sys /var
root@disp:~ # dir /var/usr2sys 
-rwxr-xr-x  1 bin  bin  4316 Sep 22 23:55 /var/usr2sys
root@disp:~ # mount | grep /var
/dev/ada0p5 on /var (ufs, local, soft-updates)

admin@disp:527:1/var$ ./usr2sys 
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap

admin@disp:521:1/ext/Repos$ for i in `seq 5000`; do ./usr2sys ; done
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap
...

Ahh, that runs a LOT faster now than on the NFS!

admin@disp:529:1/var$ top | cat
Mem: 1546M Active, 67M Inact, 934M Laundry, 5121M Wired, 774M Buf, 161M Free
ARC: 1646M Total, 1159M MFU, 107M MRU, 2686K Anon, 37M Header, 340M Other
 849M Compressed, 2257M Uncompressed, 2.66:1 Ratio
Swap: 20G Total, 1658M Used, 18G Free, 8% Inuse

But the memory leakage is similar or worse.
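
As a rough cross-check of the "some (hundred?) kB" per-run estimate, the swap figures from the `top` outputs above can be turned into a per-run cost. A minimal Python sketch, assuming the "Used" values are MiB as printed (and rounded) by top, that the loop counts are exactly the `seq` arguments shown, and that nothing else on the box was swapping in the meantime; the function names are made up for this illustration.

```python
import re

# top(1) prints sizes with a K/M/G suffix; convert everything to KiB.
_UNITS_KIB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def swap_used_kib(top_swap_line):
    """Extract the 'Used' figure from a top(1) 'Swap:' line, in KiB."""
    match = re.search(r"(\d+)([KMG]) Used", top_swap_line)
    if match is None:
        raise ValueError("no 'Used' field in: " + top_swap_line)
    return int(match.group(1)) * _UNITS_KIB[match.group(2)]


def leak_per_run_kib(before, after, runs):
    """Average swap growth per failed exec, in KiB."""
    return (swap_used_kib(after) - swap_used_kib(before)) / runs


# Swap lines copied from the transcript above.
nfs_before = "Swap: 20G Total, 36M Used, 20G Free"
nfs_after = "Swap: 20G Total, 358M Used, 20G Free, 1% Inuse"
ufs_after = "Swap: 20G Total, 1658M Used, 18G Free, 8% Inuse"

# NFS: 1000 + 5000 runs between the two measurements.
nfs_rate = leak_per_run_kib(nfs_before, nfs_after, 6000)
# UFS: 5000 further runs.
ufs_rate = leak_per_run_kib(nfs_after, ufs_after, 5000)
print(round(nfs_rate, 1), round(ufs_rate, 1))
```

The NFS figure understates the leak, since the first thousand runs were absorbed by free memory before anything was pushed to swap; the UFS figure of a few hundred kB per run is probably closer to the real cost.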

admin@disp:530:1/var$ df tmp
Filesystem    1K-blocks   Used    Avail Capacity  Mounted on
zdesk/var/tmp  24747504 231052 24516452     1%    /var/tmp
admin@disp:531:1/var$ cp usr2sys tmp
admin@disp:532:1/var$ cd tmp
admin@disp:533:1/var/tmp$ ./usr2sys 
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap
admin@disp:534:1/var/tmp$ for i in `seq 5000`; do ./usr2sys ; done
...

You can see this is now on ZFS, and the behaviour is basically the same:

Mem: 1497M Active, 5292K Inact, 803M Laundry, 5313M Wired, 774M Buf, 212M Free
ARC: 1432M Total, 963M MFU, 105M MRU, 2511K Anon, 21M Header, 341M Other
 6

Re: How to free used Swap-Space? (from errno=8)

2020-09-22 Thread Peter


I think I can reproduce the problem now. See below.

On Tue, Sep 22, 2020 at 02:09:01PM -0400, Mark Johnston wrote:
! On Tue, Sep 22, 2020 at 07:31:07PM +0200, Peter wrote:
! > There is something, and I don't know who owns that:
! > $ vmstat -m | grep shmfd
! > shmfd1314K   -  473  64,256,1024,8192
! > 
! > But that doesn't look big either.
! 
! That is just the amount of kernel memory used to track a set of objects,
! not the actual object sizes.  Unfortunately, in 11 I don't think there's
! any way to enumerate them other than running kgdb and examining the
! shm_dictionary hash table.

One of the owners of this is also postgres (maybe among others).

! I think I see a possible problem in i915, though I'm not sure if you'd
! trigger it just by using vt(4).  It should be fixed in later FreeBSD
! versions, but is still a problem in 11.  Here's a (untested) patch:

Thank You, I'll keep that one in store, just in case.

But now I found something simpler, while tracking down error messages
that caught my eye along the way:

When patching to 11.4-p3, I had been reluctant to recompile lib32 and
install that everywhere, and had kicked it off the systems.
And obviously, I had failed to recompile some of my old self-written
binaries; they were still i386 and were called by various scripts.

So what happens then is this:

$ file scc.e
scc.e: ELF 32-bit LSB executable, Intel 80386, version 1
(FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1,
for FreeBSD 9.3 (903504), stripped

$ ./scc.e
ELF interpreter /libexec/ld-elf.so.1 not found, error 8
Abort trap

And this costs some (hundred?) kB of swap space every time it
happens. That space is never released, nor can the affected jail
ever fully die.

So, maybe, when removing the lib32 & friends from the system, one
must also remove the "options COMPAT_FREEBSD32" from the kernel, so
that it might not try to run that binary, and maybe that would avoid
the issue. (But then, what if one uses lib32 only in *some* jails?
Some evil user in another jail can then bring along an i386 binary
and crash the system by bloating the mem.)

Anyway, my problem is now solved; as I needed these binaries back in
working order anyway. 


regards,
PMc


Re: How to free used Swap-Space?

2020-09-22 Thread Peter
On Tue, Sep 22, 2020 at 12:33:19PM -0400, Mark Johnston wrote:

! On Tue, Sep 22, 2020 at 06:08:01PM +0200, Peter wrote:

! >  my machine should use about 3-4, maybe 5 GB swapspace. Today I found
! > it suddenly uses 8 GB (which is worryingly near the configured 10G).
! > 
! > I stopped all the big suckers - nothing found.
! > I stopped all the jails - no success.
! > I brought it down to singleuser: it tried to swapoff, but failed.
! > 
! > I unmounted all filesystems, exported all pools, detached all geli,
! > and removed most of the netgraphs. Swap is still occupied.
! > 
! > Machine is now running only the init and a shell processes, has
! > almost no filesystems mounted, has mostly native networks only, and
! > this still occupies 3 GB of swap which cannot be released.
! > 
! > What is going on, what is doing this, and how can I get this swapspace
! > released??
! 
! Do you have any shared memory segments lingering?  ipcs -a will show
! SysV shared memory usage.

I have four small shmem segments from four postgres clusters running.
These should cleanly disappear when the clusters are stopped, and
they are very small.

Shared Memory:
T       ID     KEY MODE    OWNER    GROUP    CREATOR  CGROUP   NATTCH SEGSZ CPID LPID    ATIME    DTIME    CTIME
m    65536 5432001 --rw--- postgres postgres postgres postgres      7    48 4793 4793  6:09:34 18:00:31  6:09:34
m    65537       0 --rw--- postgres postgres postgres postgres     11    48 6268 6268  6:09:42 10:48:27  6:09:42
m    65538       0 --rw--- postgres postgres postgres postgres      5    48 6968 6968  6:09:46 18:28:36  6:09:46
m    65539       0 --rw--- postgres postgres postgres postgres      6    48 6992 6992  6:09:47  3:38:34  6:09:47

! For POSIX shared memory, in 11.4 we do not
! have any good way of listing objects, but "vmstat -m | grep shmfd" will
! at least show whether any are allocated.

There is something, and I don't know who owns that:
$ vmstat -m | grep shmfd
shmfd1314K   -  473  64,256,1024,8192

But that doesn't look big either.

Furthermore, this machine is running for quite some time already; it
was running as i386 (with ZFS) until very recently, and I know quite
well what is using much memory: these 3 GB were illegitimate; they
came from nothing I did install. And they are new; this has not
happened before.

! If those don't turn anything
! up then it's possible that there's a swap leak.  Do you use any DRM
! graphics drivers on this system?

Probably yes. There is no graphics used at all; it just uses "device
vt" in text mode, but it uses i5-3570T CPU (IvyBridge HD2500) graphics
for that, and the driver is "drm2" and "i915drm" from /usr/src/sys (not
those from ports).
Not sure how that would account for 3 GB, unless there is indeed some
leak.

regards,
PMc


How to free used Swap-Space?

2020-09-22 Thread Peter
Hi all,

 my machine should use about 3-4, maybe 5 GB swapspace. Today I found
it suddenly uses 8 GB (which is worryingly near the configured 10G).

I stopped all the big suckers - nothing found.
I stopped all the jails - no success.
I brought it down to singleuser: it tried to swapoff, but failed.

I unmounted all filesystems, exported all pools, detached all geli,
and removed most of the netgraphs. Swap is still occupied.

Machine is now running only the init and a shell processes, has
almost no filesystems mounted, has mostly native networks only, and
this still occupies 3 GB of swap which cannot be released.

What is going on, what is doing this, and how can I get this swapspace
released??

It is 11.4-RELEASE-p3 amd64.


Script started on Mon Sep 21 05:43:20 2020
root@edge# ps axlww
UID   PID  PPID CPU PRI NI  VSZ  RSS MWCHAN   STAT TT        TIME COMMAND
  0     0     0   0 -16  0    0  752 swapin   DLs  -    291:32.41 [kernel]
  0     1     0   0  20  0 5416  248 wait     ILs  -      0:00.22 /sbin/init --
  0     2     0   0 -16  0    0   16 ftcl     DL   -      0:00.00 [ftcleanup]
  0     3     0   0 -16  0    0   16 crypto_w DL   -      0:00.00 [crypto]
  0     4     0   0 -16  0    0   16 crypto_r DL   -      0:00.00 [crypto returns]
  0     5     0   0 -16  0    0   32 -        DL   -     11:41.94 [cam]
  0     6     0   0  -8  0    0   80 t->zthr_ DL   -     13:07.13 [zfskern]
  0     7     0   0 -16  0    0   16 waiting_ DL   -      0:00.00 [sctp_iterator]
  0     8     0   0 -16  0    0   16 -        DL   -      2:05.20 [rand_harvestq]
  0     9     0   0 -16  0    0   16 -        DL   -      0:00.04 [soaiod1]
  0    10     0   0 155  0    0   64 -        RNL  -  17115:06.48 [idle]
  0    11     0   0 -52  0    0  352 -        WL   -     49:05.30 [intr]
  0    12     0   0 -16  0    0   64 sleep    DL   -     16:28.51 [ng_queue]
  0    13     0   0  -8  0    0   48 -        DL   -     23:10.60 [geom]
  0    14     0   0 -16  0    0   16 seqstate DL   -      0:00.00 [sequencer 00]
  0    15     0   0 -68  0    0  160 -        DL   -      0:23.64 [usb]
  0    16     0   0 -16  0    0   16 -        DL   -      0:00.04 [soaiod2]
  0    17     0   0 -16  0    0   16 -        DL   -      0:00.04 [soaiod3]
  0    18     0   0 -16  0    0   16 -        DL   -      0:00.04 [soaiod4]
  0    19     0   0 -16  0    0   16 idle     DL   -      0:00.83 [enc_daemon0]
  0    20     0   0 -16  0    0   48 psleep   DL   -     12:07.72 [pagedaemon]
  0    21     0   0  20  0    0   16 psleep   DL   -      4:12.41 [vmdaemon]
  0    22     0   0 155  0    0   16 pgzero   DNL  -      0:00.00 [pagezero]
  0    23     0   0 -16  0    0   64 psleep   DL   -      0:23.50 [bufdaemon]
  0    24     0   0  20  0    0   16 -        DL   -      0:04.21 [bufspacedaemon]
  0    25     0   0  16  0    0   16 syncer   DL   -      0:32.48 [syncer]
  0    26     0   0 -16  0    0   16 vlruwt   DL   -      0:02.31 [vnlru]
  0    27     0   0 -16  0    0   16 -        DL   -      7:11.58 [racctd]
  0   157     0   0  20  0    0   16 geli:w   DL   -      0:22.03 [g_eli[0] ada1p2]
  0   158     0   0  20  0    0   16 geli:w   DL   -      0:22.77 [g_eli[1] ada1p2]
  0   159     0   0  20  0    0   16 geli:w   DL   -      0:31.08 [g_eli[2] ada1p2]
  0   160     0   0  20  0    0   16 geli:w   DL   -      0:29.41 [g_eli[3] ada1p2]
  0 70865     1   0  20  0 7076 3104 wait     Ss   v0     0:00.21 -sh (sh)
  0 71135 70865   0  20  0 6392 2308 select   S+   v0     0:00.00 script
  0 71136 71135   0  23  0 7076 3068 wait     Ss   0      0:00.00 /bin/sh -i
  0 71142 71136   0  23  0 6928 2584 -        R+   0      0:00.00 ps axlww

root@edge# df
Filesystem  512-blocks    Used   Avail Capacity  Mounted on
/dev/ada3p3    1936568  860864  920784    48%    /
devfs                2       2       0   100%    /dev
procfs               8       8       0   100%    /proc
/dev/ada3p4    3099192 1184896 1666368    42%    /usr
/dev/ada3p5     580344    8112  525808     2%    /var

root@edge# pstat -s
Device            512-blocks     Used    Avail Capacity
/dev/ada1p2.eli     10485760  5839232  4646528    56%

root@edge# top | cat
last pid: 71147;  load averages:  0.19,  0.08,  0.09  up 3+03:21:00    05:44:12
5 processes:   1 running, 4 sleeping

Mem: 9732K Active, 10M Inact, 882M Laundry, 1920M Wired, 10M Buf, 1023M Free
ARC: 335K Total, 16K MFU, 304K MRU, 15K Header
 320K Compressed, 2944K Uncompressed, 9.20:1 Ratio
Swap: 5120M Total, 2851M Used, 2269M Free, 55% Inuse


  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
70865 root          1  20    0  7076K  3104K wait    2   0:00   0.00% sh
71135 root          1  20    0  6392K  2308K select  1   0:00   0.00% script
71136 root          1  20    0  7076K  3068K wait    2   0:00   0.00% sh
71146 root          1  20    0  7928K  2980K CPU0    0   0:00   0.00% top
71147 root          1  20    0  6300K  2088K piperd  1   0:00   0.00% 

Re: svn commit: r362848 - in stable/12/sys: net netinet sys

2020-08-24 Thread Peter Jeremy via freebsd-stable
TL;DR: Ensure you explicitly destroy all ZFS labels on disused root pools.

On 2020-Jul-19 21:21:02 +1000, Peter Jeremy  wrote:
>I'm sending this to -stable, rather than the src groups because I
>don't believe the problem is the commit itself, rather the commit
>has uncovered a latent problem elsewhere.
>
>On 2020-Jul-01 18:03:38 +, Michael Tuexen  wrote:
>>Author: tuexen
>>Date: Wed Jul  1 18:03:38 2020
>>New Revision: 362848
>>URL: https://svnweb.freebsd.org/changeset/base/362848
>>
>>Log:
>>  MFC r353480: Use event handler in SCTP
>
>I have no idea how, but this update breaks booting amd64 for me (r362847
>works and this doesn't).  I have a custom kernel with ZFS but no SCTP so I
>have no real idea how this could break booting - presumably the
>eventhandler change has uncovered a bug somewhere else.

To close the loop on this, the problem was a combination of:
* changes in GEOM provider ordering;
* insufficient checks when ZFS is looking for the root pool;
* my system having remnants of a disused pool with the same name as the root 
pool.

It seems that the order of GEOM providers is relatively unstable - even
including a device, that doesn't physically exist, in a kernel can change
the provider order.  Presumably r362848 also resulted in a change in order.

During a root-on-ZFS boot, the kernel scans all providers, looking for ZFS
labels with a pool name matching the root pool.  Only minimal checks are
performed, in particular, there's no check that it's a valid pool, and the
first such label found is assumed to describe the root pool.

In my case, some time ago, I'd moved things around on my boot disk.  My old
root pool went to the end of the physical disk but I'd decided to shrink it
and left some free space at the end of the disk.  This meant that ZFS found
one (out of 4) labels when it tasted the physical disk and if GEOM sorted
the physical disk prior to its partitions then ZFS would use the pool GUIDs
from the stray label on the physical disk and then fail to find a usable
pool matching those GUIDs.  My fix was to zero the end of my disk.
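
To illustrate why leftover data near the end of a disk can be picked up this way: a ZFS vdev carries four copies of its label, two at the front and two at the back of the provider. The following is a minimal Python sketch of where those copies sit, assuming the usual 256 KiB label size and a provider size rounded down to a label-size multiple; `label_offsets` and the example disk size are made up for this illustration.

```python
LABEL_SIZE = 256 * 1024   # one vdev label, assumed to be 256 KiB
LABEL_COUNT = 4           # two copies at the front, two at the back


def label_offsets(provider_bytes):
    """Byte offsets of the four label copies on a provider of the given size."""
    usable = provider_bytes - provider_bytes % LABEL_SIZE  # round down
    if usable < LABEL_COUNT * LABEL_SIZE:
        raise ValueError("provider too small to hold four labels")
    front = [0, LABEL_SIZE]
    back = [usable - 2 * LABEL_SIZE, usable - LABEL_SIZE]
    return front + back


# Hypothetical 500 GB disk: any stale label sitting at one of these offsets
# of the whole-disk provider can be found when that provider is tasted.
DISK_BYTES = 500 * 1000 ** 3
for index, offset in enumerate(label_offsets(DISK_BYTES)):
    print("L%d at byte offset %d" % (index, offset))
```

Because a partition that runs to the end of the disk shares its last sectors with the whole-disk provider, a pool that once lived there can leave its trailing labels exactly where a scan of the raw disk looks. Besides zeroing the region by hand, `zpool labelclear` on the stale device is the usual way to get rid of such labels.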

-- 
Peter Jeremy




Re: Commit 364003 causes immediate restart

2020-08-09 Thread Peter Blok
Hi Alexander,

No problem, this happens and it doesn’t impact me. I try to stay up to date and 
when there is a regression I rollback and narrow it down as an early warning.

Your commit fixed the problem in 364020.

Next time I have a crash and have narrowed it down to a specific commit I’ll 
add the committer again.

Peter



> On 8 Aug 2020, at 21:36, Alexander Motin  wrote:
> 
> Peter,
> 
> I am sorry, there appeared to be a merge mistake in commit 364003. It was 
> fixed in 364020.  Have you tried it? Do you still have a problem after it? If 
> so, please add any more details. 'acpidump -t' would be a good start, same as 
> last output before reboot.
> 
> PS: I don't know about "different committer", but I read mail all the time 
> and appreciate related problem reports.
> 
> 
> On Sat, Aug 8, 2020, 1:49 AM <peter.b...@bsd4all.org> wrote:
> Last time (a week ago or so) I did that for another crash and wasn’t getting 
> any response from (a different) committer.
> 
> 
> > On 7 Aug 2020, at 21:48, Konstantin Belousov <kostik...@gmail.com> wrote:
> > 
> > On Fri, Aug 07, 2020 at 03:09:00PM +0200, peter.b...@bsd4all.org wrote:
> >> Hi,
> >> 
> >> After commit 364003 STABLE-12 reboots almost immediately. No error 
> >> message, no dump. Just a reboot.
> >> 
> >> Last working commit 364002.
> >> 
> >> Please let me know what is needed - acpidump or something like that.
> > Why did you not add the committer to Cc:?
> 



Re: Commit 364003 causes immediate restart

2020-08-07 Thread Peter Blok
Hi David,

I was at r364011 when I encountered the problem the first time, so 364007 was 
included. Went back to 364002 => ok. Added 364003 and it failed again.
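
The narrowing-down described here amounts to a binary search over revisions. A minimal sketch in Python, assuming a hypothetical `boots_ok(rev)` callback that builds and boot-tests one revision (it is not a real tool), and assuming every revision after the first bad one in the searched range also fails:

```python
def first_bad_revision(good, bad, boots_ok):
    """Return the first failing revision in the range (good, bad].

    Assumes `good` boots, `bad` fails, and every revision after the
    first bad one also fails. `boots_ok` is a hypothetical callback
    that builds and boot-tests a single revision.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_ok(mid):
            good = mid   # the breakage was introduced after mid
        else:
            bad = mid    # mid is already broken
    return bad


# Stand-in for a real build-and-boot test: pretend r364003 broke things.
print(first_bad_revision(363947, 364011, lambda rev: rev < 364003))
```

With roughly 60 candidate revisions this needs about six build-and-boot cycles instead of one per revision.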

Peter



> On 7 Aug 2020, at 16:23, David Wolfskill  wrote:
> 
> On Fri, Aug 07, 2020 at 03:09:00PM +0200, peter.b...@bsd4all.org wrote:
>> Hi,
>> 
>> After commit 364003 STABLE-12 reboots almost immediately. No error message, 
>> no dump. Just a reboot.
>> 
>> Last working commit 364002.
>> 
>> Please let me know what is needed - acpidump or something like that.
>> 
> 
> I'm not sure what's needed to diagnose that, but I built:
> 
> FreeBSD g1-55.catwhisker.org 12.1-STABLE FreeBSD 12.1-STABLE #776 
> r364007M/364009: Fri Aug  7 03:36:03 PDT 2020 
> r...@g1-55.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY  
> amd64 1201522 1201522
> 
> on my laptop this morning, and am running it now without issue.
> 
> So it's possible that updating to r364007 would help.
> 
> (My previous snapshot was:
> 
> FreeBSD g1-55.catwhisker.org 12.1-STABLE FreeBSD 12.1-STABLE #775 
> r363947M/363947: Thu Aug  6 03:34:53 PDT 2020 
> r...@g1-48.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY  
> amd64 1201522 1201522
> 
> in case that is of use.)
> 
> Peace,
> david
> -- 
> David H. Wolfskillda...@catwhisker.org
> "Knowingly failing to do his job is a hallmark of this presidency and
> we're all less safe because of it." -- Samantha Vinograd
> 
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.


Commit 364003 causes immediate restart

2020-08-07 Thread peter . blok
Hi,

After commit 364003 STABLE-12 reboots almost immediately. No error message, no 
dump. Just a reboot.

Last working commit 364002.

Please let me know what is needed - acpidump or something like that.

Peter


Crash in stable 363430 and higher

2020-07-26 Thread peter . blok
Hi,

I’m getting the following crash during startup. It seems strongswan is setting 
a reqid.

Commit r363430 is on if_bridge. The IPSec interfaces are not bridged at all, so 
I’m clueless to why this crash relates to this commit. The only commonality is 
that the crash is Epoch related and the commit as well.

(kgdb) list
418              * Propagate our priority to any other waiters to prevent us
419              * from starving them. They will have their original priority
420              * restore on exit from epoch_wait().
421              */
422             curwaittd = tdwait->et_td;
423             if (!TD_IS_INHIBITED(curwaittd) && curwaittd->td_priority > td->td_priority) {
424                     critical_enter();
425                     thread_unlock(td);
426                     thread_lock(curwaittd);
427                     sched_prio(curwaittd, td->td_priority);
(kgdb) p/x tdwait
$3 = 0xfe0075dca778
(kgdb) p/x tdwait->et_td
$4 = 0x806
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0x8064d335 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0x8064d773 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
#4  0x8064d593 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:807
#5  0x809cc3d1 in trap_fatal (frame=0xfe00c8a0e6f0, eva=3094) at /usr/src/sys/amd64/amd64/trap.c:925
#6  0x809cc42f in trap_pfault (frame=0xfe00c8a0e6f0, usermode=, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:743
#7  0x809cba76 in trap (frame=0xfe00c8a0e6f0) at /usr/src/sys/amd64/amd64/trap.c:407
#8  
#9  epoch_block_handler_preempt (global=, cr=, arg=) at /usr/src/sys/kern/subr_epoch.c:423
#10 0x803677fd in epoch_block (global=0xf800020be600, cr=0xfe0075db9a00, cb=0x80692320 , ct=0x0) at /usr/src/sys/contrib/ck/src/ck_epoch.c:416
#11 ck_epoch_synchronize_wait (global=0xf800020be600, cb=, ct=) at /usr/src/sys/contrib/ck/src/ck_epoch.c:465
#12 0x806921da in epoch_wait_preempt (epoch=0xf800020be600) at /usr/src/sys/kern/subr_epoch.c:513
#13 0x80761687 in ipsec_set_reqid (sc=0xf8004261e200, reqid=103) at /usr/src/sys/net/if_ipsec.c:964
#14 ipsec_ioctl (ifp=, cmd=, data=) at /usr/src/sys/net/if_ipsec.c:764
#15 0x807527ef in ifioctl (so=0xf8011d766000, cmd=2149607841, data=0xfe00c8a0ea10 "btcd", td=) at /usr/src/sys/net/if.c:3147
#16 0x806b5f47 in fo_ioctl (fp=0xf800194846e0, com=2149607841, data=0x0, active_cred=0x0, td=0xf80122379740) at /usr/src/sys/sys/file.h:337
#17 kern_ioctl (td=0x80692320 , fd=, com=2149607841, data=0x0) at /usr/src/sys/kern/sys_generic.c:805
#18 0x806b5bea in sys_ioctl (td=0xf80122379740, uap=0xf80122379b00) at /usr/src/sys/kern/sys_generic.c:713
#19 0x809ccf87 in syscallenter (td=0xf80122379740) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#20 amd64_syscall (td=0xf80122379740, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1167
#21 
#22 0x00080044e0da in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffe1a8

Any pointers?


Re: svn commit: r362848 - in stable/12/sys: net netinet sys

2020-07-21 Thread Peter Jeremy
On 2020-Jul-21 00:47:23 +0300, Konstantin Belousov  wrote:
>On Tue, Jul 21, 2020 at 07:20:44AM +1000, Peter Jeremy wrote:
>> On 2020-Jul-19 14:48:28 +0300, Konstantin Belousov  
>> wrote:
>> >On Sun, Jul 19, 2020 at 09:21:02PM +1000, Peter Jeremy wrote:
>> >> The symptoms are that I get:
>> >> Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 
>> >> more seconds
>> >> Mounting from zfs:zroot/ROOT/r363310 failed with error 6
>> >> 
>> >> (r363310 is where I was trying to update to and I didn't change the BE
>> >> name as I was searching for the problem and error 6 is ENXIO).
>> >> 
>> >> I tried to reproduce the problem with GENERIC but it hangs after
>> >> displaying the EFI framebuffer information (I've seen that before and
>> >> suspect it is a loader problem but haven't dug into it).
>> 
>> I've confirmed that particular problem is bug 209821.  I've disabled
>> EFI and GENERIC r362848 boots and runs successfully.
>Did you mis-typed the PR number ?   The referenced bug talks about very
>early hang, while your report said that kernel boots up to the point of
>mounting root.

My failure was with a custom kernel.  Once I narrowed the problem to a
commit that seemed unrelated to my problem, I tried to boot a GENERIC
kernel at r362848.  The GENERIC kernel boot failed much earlier due to
the EFI problem documented in PR 209821.  When I disabled EFI, then
the GENERIC kernel worked, showing that my problem was due to my custom
kernel.

>> Since GENERIC worked, I did some more experimenting and tracked the
>> problem down to a lack of "options ACPI_DMAR" in my kernel config.
>> That makes more sense, though I have no idea why it suddenly became
>> mandatory for my system.
>No, this does not make too much sense either, since DMAR is disabled
>by default.  Did you enabled it ?

"options ACPI_DMAR" has been in GENERIC since you first submitted the
DMAR code in r257251.  I haven't ever set the hw.dmar.enable=1
loader tunable but it's not at all obvious that a kernel built without
"options ACPI_DMAR" is functionally equivalent to a kernel that has
DMAR compiled in but disabled - there's a lot of IOMMU manipulation
code that is purely conditional on ACPI_DMAR.

That said, I'm not using virtualisation and haven't actually enabled
DMAR in the loader so I suspect that I've only masked the real issue.
I currently have INVARIANTS and WITNESS but will look into some of the
more extensive debugging options.

(It looks like I missed the addition of "options ACPI_DMAR" when I was
updating my custom kernel config with the differences between r250963
and r259512 about 8 years ago, and it hasn't caused any obvious
problems until now.  Obviously, I need to do a more careful review of
my custom kernel config against GENERIC/NOTES).

>BTW, you are using stable, right ?  There were some code reorganization
>commits in HEAD moving DMAR code around, but they were not merged to
>stable.

I'm using 12-STABLE.

-- 
Peter Jeremy


Re: svn commit: r362848 - in stable/12/sys: net netinet sys

2020-07-20 Thread Peter Jeremy
On 2020-Jul-19 14:48:28 +0300, Konstantin Belousov  wrote:
>On Sun, Jul 19, 2020 at 09:21:02PM +1000, Peter Jeremy wrote:
>> I'm sending this to -stable, rather than the src groups because I
>> don't believe the problem is the commit itself, rather the commit
>> has uncovered a latent problem elsewhere.
>> 
>> On 2020-Jul-01 18:03:38 +, Michael Tuexen  wrote:
>> >Author: tuexen
>> >Date: Wed Jul  1 18:03:38 2020
>> >New Revision: 362848
>> >URL: https://svnweb.freebsd.org/changeset/base/362848
>> >
>> >Log:
>> >  MFC r353480: Use event handler in SCTP
>> 
>> I have no idea how, but this update breaks booting amd64 for me (r362847
>> works and this doesn't).  I have a custom kernel with ZFS but no SCTP so I
>> have no real idea how this could break booting - presumably the
>> eventhandler change has uncovered a bug somewhere else.
>> 
>> The symptoms are that I get:
>> Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 
>> more seconds
>> Mounting from zfs:zroot/ROOT/r363310 failed with error 6
>> 
>> (r363310 is where I was trying to update to and I didn't change the BE
>> name as I was searching for the problem and error 6 is ENXIO).
>> 
>> I tried to reproduce the problem with GENERIC but it hangs after
>> displaying the EFI framebuffer information (I've seen that before and
>> suspect it is a loader problem but haven't dug into it).

I've confirmed that particular problem is bug 209821.  I've disabled
EFI and GENERIC r362848 boots and runs successfully.

>> Does anyone have any ideas?
>
>Did you checked that the physical devices where your ZFS pool is located,
>are detected, and that kernel messages for their drivers are as usual ?
>Overall, is there anything strange in the verbose dmesg ?

There's nothing obviously strange (in particular, I can see the physical
boot/root disk) but the faulty kernel appears to have moved the msgbuf
somewhere unexpected so it's not saved across reboots and I'm limited to
eyeballing the messages via DDB.

Since GENERIC worked, I did some more experimenting and tracked the
problem down to a lack of "options ACPI_DMAR" in my kernel config.
That makes more sense, though I have no idea why it suddenly became
mandatory for my system.

-- 
Peter Jeremy


Re: svn commit: r362848 - in stable/12/sys: net netinet sys

2020-07-19 Thread Peter Jeremy
I'm sending this to -stable, rather than the src groups because I
don't believe the problem is the commit itself, rather the commit
has uncovered a latent problem elsewhere.

On 2020-Jul-01 18:03:38 +, Michael Tuexen  wrote:
>Author: tuexen
>Date: Wed Jul  1 18:03:38 2020
>New Revision: 362848
>URL: https://svnweb.freebsd.org/changeset/base/362848
>
>Log:
>  MFC r353480: Use event handler in SCTP

I have no idea how, but this update breaks booting amd64 for me (r362847
works and this doesn't).  I have a custom kernel with ZFS but no SCTP so I
have no real idea how this could break booting - presumably the
eventhandler change has uncovered a bug somewhere else.

The symptoms are that I get:
Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 more seconds
Mounting from zfs:zroot/ROOT/r363310 failed with error 6

(r363310 is where I was trying to update to and I didn't change the BE
name as I was searching for the problem and error 6 is ENXIO).

I tried to reproduce the problem with GENERIC but it hangs after
displaying the EFI framebuffer information (I've seen that before and
suspect it is a loader problem but haven't dug into it).

Does anyone have any ideas?

-- 
Peter Jeremy


Re: swap space issues

2020-06-30 Thread Peter Jeremy
On 2020-Jun-28 12:33:21 -0700, Donald Wilde  wrote:
>On 6/28/20, Donald Wilde  wrote:
>> On 6/27/20, Donald Wilde  wrote:
>>> 'spinning rust'  for a disk. My loader.conf has
>>> kern.maxswzone=420 and ccache is fully active and working for both
>>> root on tcsh and users on sh.

Based on my calculations, that maxswzone is good for just under 1GB
swap.  What do you have for vm.swap_maxpages and vm.swzone?

>> Synth is still crashing hard, same issue.
>An update. Synth still crashed with one swap zone of 16GB.

What do you mean by "swap zone"?  Do you mean you have one 16GB swap
device?

>stack overflow. As I say, there was no warning. Everything was fine,
>then memory usage went through the roof!

I've just tried building llvm80 via ports[1] on my laptop, using the
same options as you.  I have 4GB RAM and 4GB swap with system defaults
and had no problems with an 8-way build.  The highest swap usage I
noticed was <500MB.  I suspect your problems are related to either
ccache or synth.

>The second one, hopefully, contains every log up to the one that
>crashed and hopefully also the beginning of that task. As I say, ONE
>builder and ONE task, after a reboot. LLVM80 was the only builder
>input.

"one builder and one task" - these are presumably synth terms since
they aren't standard ports building terms.  You should be able to
do a single-threaded build of llvm80 in 4GB RAM without problems.

That said, I notice that the first log file suggests you were building
3 ports in parallel, and each port build was running 3 jobs - that's 9
jobs in parallel on a low-spec CPU with 4 threads.  You should limit
the number of CPU-bound processes to the number of CPU threads you have.
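
The arithmetic behind that advice can be sketched in a few lines of Python. The function names are made up for illustration and are not synth or ports options; the figures (3 builders, 3 jobs each, 4 CPU threads) are the ones read off the log above:

```python
import os


def total_jobs(builders, jobs_per_builder):
    """Number of CPU-bound processes that may run at the same time."""
    return builders * jobs_per_builder


def jobs_per_builder_limit(builders, cpu_threads):
    """Largest per-builder job count that keeps the total within the
    number of CPU threads (never less than one job per builder)."""
    return max(1, cpu_threads // builders)


cpu_threads = 4  # the machine discussed above; os.cpu_count() reports the local value
print(total_jobs(3, 3))                        # 9 jobs on 4 threads: oversubscribed
print(jobs_per_builder_limit(3, cpu_threads))  # 1 job per builder with 3 builders
print(jobs_per_builder_limit(1, cpu_threads))  # 4 jobs with a single builder
print(os.cpu_count())                          # thread count of the machine running this
```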

[1] cd /usr/ports/devel/llvm80 && make
-- 
Peter Jeremy


Re: swap space issues

2020-06-26 Thread Peter Jeremy
On 2020-Jun-25 11:30:31 -0700, Donald Wilde  wrote:
>Here's 'pstat -s' on the i3 (which registers as cpu HAMMER):
>
>Device          1K-blocks     Used    Avail Capacity
>/dev/ada0s1b     33554432        0 33554432     0%
>/dev/ada0s1d     33554432        0 33554432     0%
>Total            67108864        0 67108864     0%

I strongly suggest you don't have more than one swap device on spinning
rust - the VM system will stripe I/O across the available devices and
that will give particularly poor results when it has to seek between the
partitions.

Also, you can't actually use 64GB swap with 4GB RAM.  If you look back
through your boot messages, I expect you'll find messages like:
warning: total configured swap (524288 pages) exceeds maximum recommended amount (498848 pages).
warning: increase kern.maxswzone or reduce amount of swap.
or maybe:
WARNING: reducing swap size to maximum of MB per unit

The absolute limit on swap space is vm.swap_maxpages pages but the realistic
limit is about half that.  By default the realistic limit is about 4×RAM (on
64-bit architectures), but this can be adjusted via kern.maxswzone (which
defines the #bytes of RAM to allocate to swzone structures - the actual
space allocated is vm.swzone).
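
As a rough check of the numbers involved, here is a small Python sketch. It assumes 4 KiB pages (the usual page size on amd64), two swap devices of 33554432 1K-blocks each as in the pstat output, and it treats the "about 4×RAM" figure purely as the rule of thumb stated above, not as the kernel's exact computation:

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages, as on amd64
GIB = 2 ** 30


def kblocks_to_gib(kblocks):
    """Convert pstat's 1K-blocks to GiB."""
    return kblocks * 1024 / GIB


def pages_to_gib(pages):
    """Convert a page count (as in the boot warning) to GiB."""
    return pages * PAGE_SIZE / GIB


configured_gib = kblocks_to_gib(2 * 33554432)  # two 32 GiB devices
ram_gib = 4
rule_of_thumb_gib = 4 * ram_gib                # "about 4 x RAM"

print(configured_gib)                          # 64.0
print(rule_of_thumb_gib)                       # 16
print(configured_gib > rule_of_thumb_gib)      # True: far more swap than is usable

# The example warning quoted above, converted from pages:
print(pages_to_gib(524288))                    # 2.0
print(round(pages_to_gib(498848), 2))          # 1.9
```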

As a further piece of arcana, vm.pageout_oom_seq is a count that controls
the number of passes before the pageout daemon gives up and starts killing
processes when it can't free up enough RAM.  "out of swap space" messages
generally mean that this number is too low, rather than there being a
shortage of swap - particularly if your swap device is rather slow.

-- 
Peter Jeremy


Re: CFT: if_bridge performance improvements

2020-04-22 Thread peter . blok
Just using pf is enough to provoke this panic. I had the same back trace. This 
patch from Kristof fixed it for me.

diff --git a/sys/net/if_bridge.c b/sys/net/if_bridge.c
index 373fa096d70..83c453090bb 100644
--- a/sys/net/if_bridge.c
+++ b/sys/net/if_bridge.c
@@ -2529,7 +2529,6 @@ bridge_input(struct ifnet *ifp, struct mbuf *m)
 OR_PFIL_HOOKED_INET6)) {   \
if (bridge_pfil(&m, NULL, ifp,  \
PFIL_IN) != 0 || m == NULL) {   \
-   BRIDGE_UNLOCK(sc);  \
return (NULL);  \
}   \
eh = mtod(m, struct ether_header *);\


> On 22 Apr 2020, at 18:15, Xin Li  wrote:
> 
> On 4/22/20 01:45, Kristof Provost wrote:
>> On 22 Apr 2020, at 10:20, Xin Li wrote:
>>> Hi,
>>> 
>>> On 4/14/20 02:51, Kristof Provost wrote:
>>>> Hi,
>>>> 
>>>> Thanks to support from The FreeBSD Foundation I’ve been able to work on
>>>> improving the throughput of if_bridge.
>>>> It changes the (data path) locking to use the NET_EPOCH infrastructure.
>>>> Benchmarking shows substantial improvements (x5 in test setups).
>>>> 
>>>> This work is ready for wider testing now.
>>>> 
>>>> It’s under review here: https://reviews.freebsd.org/D24250
>>>> 
>>>> Patch for CURRENT: https://reviews.freebsd.org/D24250?download=true
>>>> Patches for stable/12:
>>>> https://people.freebsd.org/~kp/if_bridge/stable_12/
>>>> 
>>>> I’m not currently aware of any panics or issues resulting from these
>>>> patches.
>>> 
>>> I have observed the following panic with latest stable/12 after applying
>>> the stable_12 patchset, it appears like a race condition related NULL
>>> pointer dereference, but I haven't taken a deeper look yet.
>>> 
>>> The box have 7 igb(4) NICs, with several bridge and VLAN configured
>>> acting as a router.  Please let me know if you need additional
>>> information; I can try -CURRENT as well, but it would take some time as
>>> the box is relatively slow (it's a ZFS based system so I can create a
>>> separate boot environment for -CURRENT if needed, but that would take
>>> some time as I might have to upgrade the packages, should there be any
>>> ABI breakages).
>>> 
>> Thanks for the report. I don’t immediately see how this could happen.
>> 
>> Are you running an L2 firewall on that bridge by any chance? An earlier
>> version of the patch had issues with a stray unlock in that code path.
> 
> I don't think I have a L2 firewall (I assume means filtering based on
> MAC address like what can be done with e.g. ipfw?  The bridges were
> created on vlan interfaces though, do they count as L2 firewall?), the
> system is using pf with a few NAT rules:
> 
> $ sudo pfctl -s rules
> anchor "miniupnpd" all
> pass in quick inet6 proto tcp from  to any flags S/SA keep state
> block drop in quick inet6 proto tcp from !  to  flags S/SA
> block drop in quick proto tcp from any os "Linux" to any port = ssh
> pass out on igb6 inet proto tcp from (igb6) to any port = domain flags
> S/SA keep state queue dns
> pass out on igb6 inet proto udp from (igb6) to any port = domain keep
> state queue dns
> pass in on igb6 proto tcp from any to (igb6) port = http flags S/SA
> modulate state queue(web, ack)
> pass in on igb6 proto tcp from any to (igb6) port = https flags S/SA
> modulate state queue(web, ack)
> pass out on igb6 inet proto tcp from (igb6) to any flags S/SA modulate
> state queue bulk
> block drop in quick on igb6 proto tcp from  to any port = ssh
> label "ssh bruteforce"
> block drop in on igb6 from  to any
> 
> Cheers,

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: CFT: if_bridge performance improvements

2020-04-16 Thread peter . blok
Hi Mark/Kristof,

I have been using ng_bridge for more than a year. It was very stable and it 
allowed members with different MTUs. My jails were using jng to set up 
the bridge and I changed iohyve to use ng_bridge.

But I recently switched to if_bridge. I needed to have pf work on a member 
interface, which wasn’t easy with ng_bridge. It was not easy to make it work 
because two members (VLANs) come from the same trunk. The behavior was erratic.

I have a trusted VLAN bridged to an untrusted physical and Wifi network. All 
members are on the same IP segment, but with pf I can make sure that the 
untrusted IOT devices are only able to go outside towards the internet. The 
untrusted devices can’t create connections to the trusted devices, but the 
trusted devices can create connections to the untrusted devices.

Another issue I found with pf was with "set skip on bridge". It doesn’t work on 
the interface group unless a bridge exists prior to enabling pf. Makes sense, 
but I didn’t think of it. Other rules work fine with interface groups.

My jails and bhyve now run fine with if_bridge, which is easier to set up, and I 
don’t need any changes in iohyve.

Peter 

> On 16 Apr 2020, at 09:44, Kristof Provost  wrote:
> 
> Hi Mark,
> 
> I wouldn’t expect these changes to make a difference in the performance of 
> this setup.
> My work mostly affects setups with multi-core systems that see a lot of 
> traffic. Even before these changes I’d expect the if_bridge code to saturate 
> a wifi link easily.
> 
> I also wouldn’t expect ng_bridge vs. if_bridge to make a significant 
> difference in wifi features.
> 
> Best regards,
> Kristof
> 
> On 16 Apr 2020, at 3:56, Mark Saad wrote:
> 
>> Kristof
>> Up until a month ago I ran a set of FreeBSD based ap in my house and even 
>> long ago at work . They were Pc engines apu ‘s or Alix’s with one em/igb nic 
>> and one ath nic in a bridge .  They worked well for a long time however the 
>> need for more robust wifi setup caused me to swap them  out with cots aps 
>> from tp-link .  The major issues were the lack of WiFi features and 
>> standards that work oob on Linux based aps .
>> 
>> So I always wanted to experiment with ng_bridge vs if_bridge for the same 
>> task . But I never got around to it . Do you have any insight into using one 
>> vs the other . Imho if_bridge is easier to setup and get working .
>> 
>> 
>> ---
>> Mark Saad | nones...@longcount.org
>> 
>>> On Apr 15, 2020, at 1:37 PM, Kristof Provost  wrote:
>>> 
>>> On 15 Apr 2020, at 19:16, Mark Saad wrote:
>>>> All
>>>> Should this improve wifi to wired bridges in some way ? Has this been 
>>>> tested ?
>>>> 
>>> What sort of setup do you have to bridge wired and wireless? Is the FreeBSD 
>>> box also a wifi AP?
>>> 
>>> I’ve not done any tests involving wifi.
>>> 
>>> Best regards,
>>> Kristof


Re: New Xorg - different key-codes

2020-03-11 Thread Peter Jeremy
On 2020-Mar-11 10:29:08 +0100, Niclas Zeising  wrote:
>This has to do with switching to using evdev to handle input devices on 
>FreeBSD 12 and CURRENT.  There's been several reports, and suggested 
>solutions to this, as well as an UPDATING entry detailing the change.

The UPDATING entry says that it's switched from devd to udev.  There's no
mention of evdev or that the keycodes have been roto-tilled.  It's basically
a vanilla "things have been changed, see the documentation" entry.  Given
that entry, it's hardly surprising that people are confused.

-- 
Peter Jeremy


Re: ntp problems stratum 2 to 14?

2020-03-04 Thread Peter Jeremy
Hi Dewayne,

Sorry for the delay.  Unfortunately, I can't really suggest anything -
it's not clear to me why ntpd would prefer a stratum 14 clock over a
stratum 2 clock.  Have you tried looking through the debugging hints
page (https://www.eecis.udel.edu/~mills/ntp/html/debug.html)?

I haven't seen that problem but I don't use the local clock.

During startup, it would not seem unreasonable for the local clock to
become valid first because it will have a lower jitter.  But ntpd
should switch to the stratum 2 clock and stay with it as the better
time source.  One problem is that if ntpd decides to switch away from
the clock for any reason (eg a burst of jitter), it may get stuck on
the local clock as it drifts further from "real" time.

-- 
Peter Jeremy


Re: jedec_dimm fails to boot

2020-03-04 Thread Peter
On Wed, Mar 04, 2020 at 11:41:22PM +0300, Yuri Pankov wrote:
! On 04.03.2020 19:09, Peter wrote:
! > When I kldload jedec_dimm during runtime, it works just as expected,
! > and the DIMM data appears in sysctl.
! > 
! > But when I do
! >   * load the jedec_dimm at the loader prompt, or
! >   * add it to loader.conf, or
! >   * compile it into a custom kernel,
! > it does not boot anymore.

! Could you try backporting r351604 and see if it helps?

Yepp, that works. Thank You! :)


panic: too many modules

2020-03-04 Thread Peter
Up front: I do not like loadable modules. They are nice to try
something out, but when you start to depend on some dozen loaded
modules, debugging becomes a living hell: say you hunt some spurious
misbehaviour and compare logfiles with those from four weeks ago,
you will not know exactly which modules were loaded at that time.
Compiling everything into the kernel has the advantage that the
'uname' does change on every change and so does precisely describe
the running kernel.

So I came across the cc_vegas and cc_cdg modules, and they aren't
provided to compile into the kernel straightaway. But that should not
be a big deal: just add some arbitrary new device to the KERNCONF, and
then add the required files to sys/conf/files appropriately.

Should work. But it doesn't. Right after the startup message, before
even probing devices, it says
 panic: module_register_init: module named ertt not found
and a stacktrace from kern/init_main.c:mi_startup().
But definitely the h_ertt is present in the kernel (I checked).

To have a closer look, I added VERBOSE_SYSINIT to the kernel, and -
the panic is gone, everything working as expected. Without even
activating the output from VERBOSE_SYSINIT.

Then, I moved netinet/khelp/h_ertt.c to the very end of
sys/conf/files - and this also avoids the panic and things do work.
While this change does nothing but change the sequence in which
the files are compiled (and probably linked).

I think this is not good. Everybody likes modules (although, see
above, they come with a serious tradeoff in reproducibility). But if
we now deliver components only as loadable modules because a compound
kernel is no longer able to sort them out on boot, that's a more
serious issue.
I wouldn't complain if the module simply did not work (reproducibly)
when compiled into the kernel - but this appears to be a race,
most likely a timing race. And that such a thing can happen at the
point where the kernel sorts out its own components - oops, that does
worry me indeed...

There also seems to be a desire for a *fast* system bringup. I don't
share that. I boot once a quarter, and if that takes an hour I don't
mind.
Maybe there is need for an option, to give fast boot to those who want
something like a gaming console to be available immediately, and slow boot
for those who want a reliable system in 24/7 operation?

Maybe I'll take a closer look at the issue after switching to R.12
(probably not this year). Or, maybe somebody would like to point me
to some paper describing how the module fabric is supposed to
interface and by which steps the runtime linkage is achieved?

Platform: FreeBSD 11.3-RELEASE-p6, Intel(R) Core(TM) i5-3570T CPU (IvyBridge)

cheerio,
PMc


jedec_dimm fails to boot

2020-03-04 Thread Peter


I ran into an issue:

When I kldload jedec_dimm during runtime, it works just as expected,
and the DIMM data appears in sysctl.

But when I do
 * load the jedec_dimm at the loader prompt, or
 * add it to loader.conf, or
 * compile it into a custom kernel,
it does not boot anymore.

My custom kernel does just hang somewhere while switching the screen,
i.e. no output. The GENERIC does immediate-reboot during the device
probe phase. So both are not suitable for gathering additional info
in an easy way. (And since my DIMM appear to have neither thermal nor
serial, there is not much to gain for me here, so I will not pursue
this further, at least not before switching to R.12.)
But I fear there are some general problems with sorting out of the
modules during system bringup - see also my other message titled
"panic: too many modules".

Some data for those interested:

FreeBSD 11.3-RELEASE-p6
CPU: Intel(R) Core(TM) i5-3570T CPU (IvyBridge)
Board: https://www.asus.com/Motherboards/P8B75V/specifications/
Config:
hint.jedec_dimm.0.at="smbus12"
hint.jedec_dimm.0.addr="0xa0"
hint.jedec_dimm.1.at="smbus12"
hint.jedec_dimm.1.addr="0xa2"
hint.jedec_dimm.2.at="smbus12"
hint.jedec_dimm.2.addr="0xa4"
hint.jedec_dimm.3.at="smbus12"
hint.jedec_dimm.3.addr="0xa6"

ichsmb0:  port 0xf040-0xf05f mem 0xf7d15000-0xf7d150ff irq 18 at device 31.3 on pci0
smbus12:  on ichsmb0
smb12:  on smbus12

With GENERIC it becomes smbus0 (because drm2 is not loaded) and I need
to load "smbus" and "ichsmb" up front.

Cheerio,
PMc


Re: ntp problems stratum 2 to 14?

2020-02-26 Thread Peter Jeremy
On 2020-Feb-26 16:37:43 +1100, Dewayne Geraghty  
wrote:
>I usually run ntpd with both aslr and as user ntpd.  While testing I
>noticed that my server with a direct network cable to my main time keeper,
>jumped from the expected stratum 2 to 14 as follows (I record the date so I
>can synch with the debug log, also below):
>
>vm.loadavg={ 0.09 0.10 0.18 }
>
>Wed 26 Feb 2020 15:16:38 AEDT
>     remote           refid      st t when poll reach   delay   offset  jitter
>==============================================================================
> 10.0.7.6        203.35.83.242    2 u   44   64  377    0.147  -227.12  33.560
>*127.127.1.1     .LOCL.          14 l   59  128  377    0.000    0.000   0.000

>26 Feb 15:03:40 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== bad

Why is this bad?  You've specified that this is a valid clock source so
ntpd is free to use it if it decides it is the best source of time.

>server 127.127.1.1 minpoll 7 maxpoll 7
>fudge  127.127.1.1 stratum 14

Synchronizing to the local clock (ie using 127.127.1.x as a reference) is
almost never correct.  What external (to NTP) source is being used to
synchronize the local clock?

>I'm also very surprised that the jitter on the server (under testing) is so
>poor.  The internet facing time server is
>*x.y.z.t   .ATOM.   1 u   73  512    7   23.776   34.905  95.961
>but it's very old and not running aslr.

The 23ms distance to the peer suggests that this is over the Internet.  What
sort of link do you have to the Internet and how heavily loaded is it?  The
NTP protocol includes the assumption that the client-server path delay is
symmetric - this is often untrue for SOHO connections.  And SOHO connections
will often wind up saturated in one direction - which skews the apparent
timestamps and shows up as high jitter values.
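
The effect of an asymmetric path can be illustrated with the standard NTP on-wire calculation, where t1..t4 are the client-send, server-receive, server-send and client-receive timestamps. The Python below is only a sketch: the `simulate` helper and its one-way delays are invented for illustration and are not measurements from this thread.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP estimates from the four exchange timestamps.

    The offset formula is exact only if the outbound and return
    path delays are equal.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset, uplink, downlink):
    """Hypothetical exchange: server clock = client clock + true_offset,
    with the given one-way delays (seconds) and no server processing time."""
    t1 = 0.0                          # client clock at send
    t2 = t1 + uplink + true_offset    # server clock at receive
    t3 = t2                           # server clock at reply
    t4 = t1 + uplink + downlink       # client clock at receive
    return ntp_offset_and_delay(t1, t2, t3, t4)


# Clocks are actually in sync (true offset 0) in both cases.
print(simulate(0.0, 0.010, 0.010))   # symmetric path: estimated offset is 0
print(simulate(0.0, 0.005, 0.040))   # asymmetric path: offset is off by (uplink - downlink) / 2
```

In the asymmetric case the estimate is wrong by half the difference between the two one-way delays, so a link that saturates in one direction shows up as a shifting offset, which ntpd reports as jitter.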

> /usr/local/sbin/ntpd -c /etc/ntp.conf -g -g  -u ntpd --nofork
...
>I get similar results with /usr/sbin/ntpd, I've been testing both and
>happened to record details for the port ntpd.

It's probably not relevant but it would be useful for you to say up front
which ntpd you are having problems with and which version of the port you
have installed.

-- 
Peter Jeremy


Fwd: Re: session mgmt: does POSIX indeed prohibit NOOP execution?

2020-01-06 Thread Peter

> Not much room to argue?

Why that? This is not about laws you have to follow blindly whether
you understand them or not, this is all about an Outcome - a working
machine that should properly function.


"Not much to argue about what behaviour is required by the standard".
The standard could have been written to require different behaviour
and most probably still make sense, but it wasn't; but at least it's
unambiguous. After that, the discussion is rather... philosophical.

It is not the standard that concerns me, it is *failure* that concerns me.

When I try to run a daemon from the base OS (in the orderly way, via the
daemon command), and it just DOES NOT WORK, and I need to dig into it to
find out what's actually wrong, then for me that's not philosophy,
that's a failure that needs some effort to fix.
And I don't want such issues, and, more important, I don't want other
people to run into the same issue again! (Not sure what is so difficult to
understand about that.)


In any case, either the base system has a flaw, or the syscall has a flaw,
or POSIX has a flaw. I don't care which; You're free to choose.


But if you instead think that flaws are not allowed to exist because POSIX
is perfect, and therefore the much better solution is to just bully the
people who happen to run into the flaws, well, that's also okay.


rgds,
PMc
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: session mgmt: does POSIX indeed prohibit NOOP execution?

2020-01-06 Thread Peter
On Mon, 06 Jan 2020 01:10:57 +0100, Christoph Moench-Tegeder  
 wrote:



When a program is invoked via /usr/sbin/daemon, it should already be
session leader AND group leader, and then the above code WOULD be a
NOOP, unless POSIX would require the setpgid() to fail and thereby the
program to abort - which, btw, is NOT a NOOP :(


https://pubs.opengroup.org/onlinepubs/9699919799/
 "The setpgid() function shall fail if: [...] The process indicated by the
  pid argument is a session leader."


Okay, so, what You are saying is that I got correct information insofar  
that POSIX indeed demands the perceived behaviour. Thanks for that  
confirmation.



Not much room to argue?


Why that? This is not about laws you have to follow blindly whether you  
understand them or not, this is all about an Outcome - a working machine  
that should properly function.
So either there are other positive aspects of this behaviour that weigh
against the perceived malfunction, or the requirement is simply wrong. And
the latter case should be all the argument that is needed.


I do not say disobey Posix. I only say that one of the involved parts must  
certainly be wrong, and that should be fixed. So if You are saying, the  
problem is in Posix, but we are in the role of blind monkeys who have to  
follow that alien commandment by all means no matter the outcome, then  
this does not seem acceptable to me. Actually, as it seems to me, this  
whole session thing came originally out of Kirk McKusick's kitchen and  
made its way from there into Posix, so if there is indeed a flaw in it, it  
should well be possible to fix it going the same way.


In any case, this (to be found in /etc/rc.d/kadmind) is a crappy
workaround and not acceptable style:

   command_args="$command_args &"


We aren't slaves, or, are we?

I for my part came across this matter just by accident, and as my stance
is that 1. the code has to be solid enough to stand the Jupiter mission, and
therefore 2. do a root-cause analysis Always, on Every misbehaviour (and then
fix it once and for all), I figured that thing out.


rgds,
PMc


Re: ZFS and power management

2020-01-05 Thread Peter
On Wed, 18 Dec 2019 17:22:16 +0100, Karl Denninger   
wrote:



I'm curious if anyone has come up with a way to do this...

I have a system here that has two pools -- one comprised of SSD disks
that are the "most commonly used" things including user home directories
and mailboxes, and another that is comprised of very large things that
are far less-commonly used (e.g. video data files, media, build
environments for various devices, etc.)


I have been using such a configuration for more than 10 years, and have not
seen the problems You describe.
Disks are powered down with gstopd or other means, and they stay powered
down until filesystems in the pool are actively accessed.
One difficulty for me was that postgres autovacuum must be completely
disabled if there are tablespaces on the quiesced pools. Another thing
that comes to mind is smartctl in daemon mode (but I never used that).
There are probably a whole bunch of other potential culprits, so I suggest
You work through all the housekeeping stuff (daemons, cronjobs, etc.) to
find it.



session mgmt: does POSIX indeed prohibit NOOP execution?

2020-01-05 Thread Peter

   pgrp = getpid();
   if (setpgid(0, pgrp) < 0)
       err(1, "setpgid");


This appears to me to be a program trying to daemonize itself (in the old
style, when there was only job control but no session management).


In the case this program is already properly daemonized, e.g. by starting  
it from /usr/sbin/daemon, this code now fails, invoking the err() clause  
and thereby aborting.


From what I could find out, POSIX does not allow a session leader to do  
setpgid() on itself.
When a program is invoked via /usr/sbin/daemon, it should already be  
session leader AND group leader, and then the above code WOULD be a NOOP,  
unless POSIX would require the setpgid() to fail and thereby the program  
to abort - which, btw, is NOT a NOOP :(
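
The behaviour described above can be reproduced outside the daemon with a
small sketch.  The helper name is made up for illustration; it forks a child,
makes it a session leader with setsid(), and then repeats the quoted
setpgid(0, getpid()) call.  POSIX lists this case under EPERM, so that is the
errno one would expect to get back:

```python
import errno
import os


def setpgid_as_session_leader():
    """Return the errno of setpgid(0, getpid()) issued by a session leader.

    Returns 0 if the call succeeds, -1 if the scenario could not be set up.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: not a process group leader yet, so setsid() is allowed.
        os.close(read_fd)
        try:
            os.setsid()                    # now session AND group leader
        except OSError:
            os.write(write_fd, b"-1")
            os._exit(1)
        try:
            os.setpgid(0, os.getpid())     # the would-be no-op
            result = 0
        except OSError as exc:
            result = exc.errno
        os.write(write_fd, str(result).encode())
        os._exit(0)
    # Parent: collect the child's report.
    os.close(write_fd)
    data = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data.decode())


if __name__ == "__main__":
    err = setpgid_as_session_leader()
    print(errno.errorcode.get(err, err))
```

The process group would not change if the call were allowed, yet the call is
still rejected, which is exactly the NOOP-versus-failure question raised here.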


So, where is the mistake here?

Option 1: I have completely misunderstood something. Then please tell me
what.

Option 2: The quoted code is bogus. Then why is it in base?

Option 3: The setpgid() behaviour is bogus. It may stop a session leader
from executing it, but it should detect a NOOP and just go through with it.
Then why don't we fix that?

Option 4: POSIX is bogus. Unlikely, because as far as I could find out,
that part of it was written following the Berkeley implementation.



Re: Disabling speculative execution mitigations

2019-12-06 Thread Peter
On Fri, 06 Dec 2019 06:21:04 +0100, O'Connor, Daniel   
wrote:



vm.pmap.pti="0"          # Disable page table isolation
hw.ibrs_disable="1"      # Disable Indirect Branch Restricted Speculation
hw.mds_disable="0"       # Disable Microarchitectural Data Sampling flush
hw.vmm.vmx="1"           # Don't flush RSB on vmexit (presumably only affects bhyve etc)
hw.lazy_fpu_switch="1"   # Lazily flush FPU

Does anyone know of any others?


hw.spec_store_bypass_disable=2

I have that on 11.3 (no idea yet about 12). And honestly, I have lost track
of which of these should be on, off, automatic, opaque or elsewhere to
achieve either performance or security (not to mention for which cores and
under which circumstances it would matter, and what the impact might be),
and my oracle says this will not end with these.



Re: panic when stopping jails

2019-12-03 Thread peter . blok
Forgot to mention that it is a very recent 12-STABLE and I don’t suspect any 
recent commits. It is just that jails are now stopped more often.


> On 3 Dec 2019, at 11:47, peter.b...@bsd4all.org wrote:
> 
> Hi,
> 
> I’m getting the following panic when stopping jails. When ifunit_ref iterates 
> over the VNET ifnets it gets a bad ifp. I’m using netgraph bridges.
> 
> Any pointers on how to debug this are welcome. A crash dump is available.
> 
> Peter
> 
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 3; apic id = 03
> instruction pointer   = 0x20:0x807377c5
> stack pointer = 0x28:0xfe00d1e90870
> frame pointer = 0x28:0xfe00d1e90870
> code segment  = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 8537 (ifconfig)
> trap number   = 9
> panic: general protection fault
> cpuid = 3
> time = 1575297301
> KDB: stack backtrace:
> #0 0x8069a8d7 at kdb_backtrace+0x67
> #1 0x8064ec6d at vpanic+0x19d
> #2 0x8064eac3 at panic+0x43
> #3 0x809e450c at trap_fatal+0x39c
> #4 0x809e395a at trap+0x6a
> #5 0x809be97c at calltrap+0x8
> #6 0x80750ff1 at ifunit_ref+0x51
> #7 0x8075328c at ifioctl+0x47c
> #8 0x806b8b2e at kern_ioctl+0x2be
> #9 0x806b87fd at sys_ioctl+0x15d
> #10 0x809e50a2 at amd64_syscall+0x362
> #11 0x809bf2b0 at fast_syscall_common+0x101



panic when stopping jails

2019-12-03 Thread peter . blok
Hi,

I’m getting the following panic when stopping jails. When ifunit_ref iterates 
over the VNET ifnets it gets a bad ifp. I’m using netgraph bridges.

Any pointers on how to debug this are welcome. A crash dump is available.

Peter


Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
instruction pointer     = 0x20:0x807377c5
stack pointer           = 0x28:0xfe00d1e90870
frame pointer           = 0x28:0xfe00d1e90870
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 8537 (ifconfig)
trap number             = 9
panic: general protection fault
cpuid = 3
time = 1575297301
KDB: stack backtrace:
#0 0x8069a8d7 at kdb_backtrace+0x67
#1 0x8064ec6d at vpanic+0x19d
#2 0x8064eac3 at panic+0x43
#3 0x809e450c at trap_fatal+0x39c
#4 0x809e395a at trap+0x6a
#5 0x809be97c at calltrap+0x8
#6 0x80750ff1 at ifunit_ref+0x51
#7 0x8075328c at ifioctl+0x47c
#8 0x806b8b2e at kern_ioctl+0x2be
#9 0x806b87fd at sys_ioctl+0x15d
#10 0x809e50a2 at amd64_syscall+0x362
#11 0x809bf2b0 at fast_syscall_common+0x101


Re: wrong value from DTRACE (uint32 for int64)

2019-12-02 Thread Peter
On Mon, 02 Dec 2019 21:58:36 +0100, Mark Johnston   
wrote:

The DTRACE_PROBE* macros cast their parameters to uintptr_t, which
will be 32 bits wide on i386.  You might be able to work around the
problem by casting arg0 to uint32_t in the script.
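
The arithmetic behind this explanation can be sketched as follows.  The helper
names and the value -12288 are assumptions chosen for illustration (the value
happens to reproduce the 4294955008 seen in the dtrace output); the signed
reinterpretation is one possible reading of the workaround, not something
stated in the thread:

```python
import struct


def through_32bit_uintptr(value):
    """Model passing a 64-bit integer through a 32-bit uintptr_t:
    only the low 32 bits survive, and they are read back as unsigned."""
    return value & 0xFFFFFFFF


def as_signed_32(value):
    """Reinterpret the surviving low 32 bits as a signed integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    lowest = -12288                    # hypothetical int64_t value in arc.c
    seen = through_32bit_uintptr(lowest)
    print(seen)                        # 4294955008
    print(as_signed_32(seen))          # -12288
```

This only recovers the original number while it fits into 32 signed bits;
anything larger in magnitude is lost when the upper half is dropped.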


Thanks for the info - good that it has a logical explanation.


wrong value from DTRACE (uint32 for int64)

2019-12-02 Thread Peter

Hi @all,

I felt the need to look into my ZFS ARC, but DTRACE provided misleading  
(i.e., wrong) output (on i386, 11.3-RELEASE):


# dtrace -Sn 'arc-available_memory { printf("%x %x", arg0, arg1); }'
DIFO 0x286450a0 returns D type (integer) (size 8)
OFF OPCODE      INSTRUCTION
00: 29010601    ldgs DT_VAR(262), %r1           ! DT_VAR(262) = "arg0"
01: 2301        ret  %r1

NAME             ID   KND SCP FLAG TYPE
arg0             262  scl glb r    D type (integer) (size 8)

DIFO 0x286450f0 returns D type (integer) (size 8)
OFF OPCODE      INSTRUCTION
00: 29010701    ldgs DT_VAR(263), %r1           ! DT_VAR(263) = "arg1"
01: 2301        ret  %r1

NAME             ID   KND SCP FLAG TYPE
arg1             263  scl glb r    D type (integer) (size 8)
dtrace: description 'arc-available_memory ' matched 1 probe
  0     14  none:arc-available_memory 2fb000 2
  0     14  none:arc-available_memory 4e000 2
  1     14  none:arc-available_memory b000 2
  1     14  none:arc-available_memory b000 2
  1     14  none:arc-available_memory b000 2
  1     14  none:arc-available_memory 19000 2
  0     14  none:arc-available_memory d38000 2

# dtrace -n 'arc-available_memory { printf("%d %d", arg0, arg1); }'
  1     14  none:arc-available_memory 81920 5
  1     14  none:arc-available_memory 69632 5
  1     14  none:arc-available_memory 4294955008 5
  1     14  none:arc-available_memory 4294955008 5


The arg0 variable is obviously shown here as an unsigned int32 value. But
in fact, the probe argument in the source code in arc.c is a signed int64:


DTRACE_PROBE2(arc__available_memory, int64_t, lowest, int, r);


User @shkhin in the forum pointed me to check the bare dtrace program,  
unattached to the kernel code:

https://forums.freebsd.org/threads/dtrace-treats-int64_t-as-uint32_t-on-i386.73223/post-446517

And there everything appears correct.

So two questions:
1. can anybody check and confirm this happening?
2. any idea what could be wrong here? (The respective variable in arc.c  
bears the correct 64bit negative value, I checked that - and otherwise the  
ARC couldn't shrink.)


rgds,
PMc


Re: rev 355285 breaks stable build

2019-12-02 Thread peter . blok
Fixed by rev. 355290

> On 2 Dec 2019, at 16:21, peter.b...@bsd4all.org wrote:
> 
> Hi,
> 
> While building rescue
> 
> ld: error: undefined symbol: lz4_init
>>>> referenced by spa_misc.c:2066 
>>>> (/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:2066)
>>>>  spa_misc.o:(spa_init) in archive 
>>>> /usr/obj/usr/src/amd64.amd64/tmp/usr/lib/libzpool.a
> 
> ld: error: undefined symbol: lz4_fini
>>>> referenced by spa_misc.c:2096 
>>>> (/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:2096)
>>>>  spa_misc.o:(spa_fini) in archive 
>>>> /usr/obj/usr/src/amd64.amd64/tmp/usr/lib/libzpool.a
> cc: error: linker command failed with exit code 1 (use -v to see invocation)
> *** [rescue] Error code 1
> 
> Peter



rev 355285 breaks stable build

2019-12-02 Thread peter . blok
Hi,

While building rescue

ld: error: undefined symbol: lz4_init
>>> referenced by spa_misc.c:2066 
>>> (/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:2066)
>>>   spa_misc.o:(spa_init) in archive 
>>> /usr/obj/usr/src/amd64.amd64/tmp/usr/lib/libzpool.a

ld: error: undefined symbol: lz4_fini
>>> referenced by spa_misc.c:2096 
>>> (/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:2096)
>>>   spa_misc.o:(spa_fini) in archive 
>>> /usr/obj/usr/src/amd64.amd64/tmp/usr/lib/libzpool.a
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [rescue] Error code 1

Peter


Re: Long-shot: repeatable macOS samba share unmounting during Lightroom import

2019-11-25 Thread peter . blok
The fruit module forces avahi or mdns_responder to be compiled as well. A share 
disappearing could be due to some interaction with avahi. It could be that the 
combinations samba+fruit+avahi and samba+avahi behave differently.

Peter



> On 24 Nov 2019, at 12:15, Pete French  wrote:
> 
> I have a very similar setup to you for serving files to my Mac from a FreeBSD 
> server. I haven't seen the unmount problem, but I did have a few oddities 
> until I added the 'fruit' module on the Samba side, which helps with 
> compatibility with the Mac. The appropriate bit of my config looks like this:
> 
>   vfs objects = fruit streams_xattr zfsacl
>   fruit:resource = xattr
>   fruit:encoding = private
> 
> Don't ask me what they do anymore, I added them ages ago, but it does work 
> very nicely for me. You may already have this of course, but worth pointing 
> out just in case as it took me a few years to discover it!
> 
> As someone else has said though, this may well be a Catalina bug. I am not 
> running that (MacBook too old, and not buying another until the new keyboards 
> are available in the replacement I want).
> 
> -pete.



Re: Error building stable/12 (amd64) at r355087

2019-11-25 Thread peter . blok
I can confirm it has been fixed.

> On 25 Nov 2019, at 15:21, Konstantin Belousov  wrote:
> 
> On Mon, Nov 25, 2019 at 03:58:10AM -0800, David Wolfskill wrote:
>> This is during a source-based update from r355048 to r355087, during
>> "stage 4.3: building everything" (using META_MODE); meta file reads:
>> 
>> # Meta data file 
>> /common/S3/obj/usr/src/amd64.amd64/usr.sbin/camdd/camdd.o.meta
>> CMD cc -target x86_64-unknown-freebsd12.1 
>> --sysroot=/common/S3/obj/usr/src/amd64.amd64/tmp 
>> -B/common/S3/obj/usr/src/amd64.amd64/tmp/usr/bin  -O2 -pipe   -std=gnu99 
>> -fstack-protector-strong -Wsystem-headers -Werror -Wall -Wno-format-y2k -W 
>> -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes 
>> -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow 
>> -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs 
>> -Wredundant-decls -Wold-style-definition -Wno-pointer-sign 
>> -Wmissing-variable-declarations -Wno-empty-body -Wno-string-plus-int 
>> -Wno-unused-const-variable  -Qunused-arguments  -c 
>> /usr/src/usr.sbin/camdd/camdd.c -o camdd.o
>> CMD 
>> CWD /common/S3/obj/usr/src/amd64.amd64/usr.sbin/camdd
>> TARGET camdd.o
>> -- command output --
>> In file included from /usr/src/usr.sbin/camdd/camdd.c:54:
>> In file included from 
>> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/machine/bus.h:6:
>> In file included from 
>> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/x86/bus.h:1043:
>> In file included from 
>> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/machine/bus_dma.h:34:
>> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/x86/bus_dma.h:182:1: 
>> error: unknown type name 'bool'
>> bool bus_dma_dmar_set_buswide(device_t dev);
>> ^
>> /common/S3/obj/usr/src/amd64.amd64/tmp/usr/include/x86/bus_dma.h:182:31: 
>> error: unknown type name 'device_t'
>> bool bus_dma_dmar_set_buswide(device_t dev);
>>  ^
>> 2 errors generated.
>> 
>> *** Error code 1
> 
> I hope that this is fixed by r355089.  I did not track down how HEAD
> was immune to the problem.



Re: linker.hints not being update for ARMs

2019-11-12 Thread Peter Jeremy
On 2019-Nov-12 10:30:21 +0200, Daniel Braniss  wrote:
>   warning: KLD '/boot/kernel/wlan.ko' is newer than the linker.hints file
>   warning: KLD '/boot/kernel/rtwn.ko' is newer than the linker.hints file
...
>the linker.hints is indeed very old:
>neo-000# ls -ls /boot/kernel/linker.hints 
>224 -rw-r--r--  1 root  wheel  228972 Jan  1  2010 /boot/kernel/linker.hints

Well, that's a nonsense timestamp because FreeBSD didn't support AllWinner
in 2010.  My guess is that your system clock was wrong.

>how can this be fixed?

Try rerunning kldxref (with the clock set correctly).
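
To check whether the warnings are gone afterwards, the timestamps can be
compared directly.  This is only a sketch under the assumption that the
warning is triggered by a module's modification time being later than that of
linker.hints; the function name is made up for illustration:

```python
import os


def modules_newer_than_hints(kernel_dir="/boot/kernel"):
    """Return the .ko files in kernel_dir whose mtime is later than
    that of linker.hints in the same directory."""
    hints = os.path.join(kernel_dir, "linker.hints")
    hints_mtime = os.stat(hints).st_mtime
    newer = []
    for name in sorted(os.listdir(kernel_dir)):
        if name.endswith(".ko"):
            path = os.path.join(kernel_dir, name)
            if os.stat(path).st_mtime > hints_mtime:
                newer.append(name)
    return newer


if __name__ == "__main__":
    for module in modules_newer_than_hints():
        print(module)
```

An empty result after rerunning kldxref would suggest that the hints file is
current again.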

-- 
Peter Jeremy




Re: `uname -a' can't display revision

2019-08-20 Thread Peter Jeremy
On 2019-Aug-20 14:36:14 +0200, Trond Endrestøl  
wrote:
>Maybe NFS is to blame, particularly if file locks cannot be obtained.

Yes, it is.  SVN tries to obtain locks, even for read-only commands like
"svn info".  My solution is to mount /usr/src with the option "nolockd".

-- 
Peter Jeremy




Re: Rel. 11.3: Kernel doesn't compile anymore (SVN-334762, please fix!)

2019-07-25 Thread Peter
Hi Hans Petter,
 glad to read You! :)
 
On Thu, Jul 25, 2019 at 09:39:26AM +0200, Hans Petter Selasky wrote:
! On 2019-07-25 01:00, Peter wrote:

! >> The offending feature is either
! >> options ZFS
! >> or
! >> device dtrace
! >> (Adding any of these to the GENERIC config gives the same error.)

! Can you attach your kernel configuration file?

Yes, but to what point?
I can reproduce this with the GENERIC configuration by adding
  "options ZFS"

(My custom KERNCONF relates to my local patches, and is rather
pointless without these. So at first I tried to reproduce without
my local patches and with minimal changes from GENERIC config. And
the minimal change is to add "options ZFS" into the GENERIC conf.)

See here:

root@disp:/usr/src/sys/i386/compile/GENERIC # make 
linking kernel.full
atomic.o: In function `atomic_add_64':
/usr/src/sys/i386/compile/GENERIC/./machine/atomic.h:629: multiple definition 
of `atomic_add_64'
opensolaris_atomic.o:/usr/src/sys/i386/compile/GENERIC/../../../cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S:71:
 first defined here
*** Error code 1

Stop.
make: stopped in /usr/src/sys/i386/compile/GENERIC
root@disp:/usr/src/sys/i386/compile/GENERIC #

root@disp:/usr/src/sys/i386/compile/GENERIC # cd ../../../..
root@disp:/usr/src # svn stat
M   sys/i386/conf/GENERIC
root@disp:/usr/src # svn diff
Index: sys/i386/conf/GENERIC
===
--- sys/i386/conf/GENERIC   (revision 350287)
+++ sys/i386/conf/GENERIC   (working copy)
@@ -1,3 +1,4 @@
+options ZFS
 #
 # GENERIC -- Generic kernel configuration file for FreeBSD/i386
 #

root@disp:/usr/src # svn info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn0.us-east.freebsd.org/base/releng/11.3
Relative URL: ^/releng/11.3
Repository Root: https://svn0.us-east.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 350287
Node Kind: directory
Schedule: normal
Last Changed Author: gordon
Last Changed Rev: 350287
Last Changed Date: 2019-07-24 12:58:21 + (Wed, 24 Jul 2019)




Re: Rel. 11.3: Kernel doesn't compile anymore (SVN-334762, please fix!)

2019-07-24 Thread Peter
> Trying to compile my custom kernel in Rel. 11.3 results in this:
> 
> -- kernel.full ---
> linking kernel.full
> atomic.o: In function `atomic_add_64':
> /usr/obj/usr/src/sys/E1R11V1/./machine/atomic.h:629: multiple definition of 
> `atomic_add_64'
> opensolaris_atomic.o:/usr/src/sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S:71:
>  first defined here
> *** [kernel.full] Error code 1
> 
> Same config worked with 11.2
> 
> The offending feature is either
>options ZFS
> or
>device dtrace
> (Adding any of these to the GENERIC config gives the same error.)
> 
> This happens only when building for i386. Building amd64 with these
> options works.


Trying to analyze the issue:


The problem appears with SVN 334762 in 11.3:

This change adds two new functions to sys/i386/include/atomic.h:
   atomic_add_64()
   atomic_subtract_64()
[I don't really understand why this goes into a header file, but, well,
   never mind]

Also, this change deactivates two functions (only in case *i386*) from
sys/cddl/compat/opensolaris/kern/opensolaris_atomic.c
   atomic_add_64()
   atomic_del_64()
[Now, there seems to be a slight strangeness here: if we *deactivate*
atomic_del_64() and *insert* atomic_subtract_64(), then these two
names are not the same, and I would suppose that atomic_del_64()
is then somehow missing. But, well, never mind]

Now, the strange thing:
this file sys/cddl/compat/opensolaris/kern/opensolaris_atomic.c
from which now two functions get excluded *only in case i386*, is not
even compiled for i386:

>/usr/src/sys/conf$ grep opensolaris_atomic.c *
>files.arm:cddl/compat/opensolaris/kern/opensolaris_atomic.c optional zfs | 
>dtrace compile-with "${CDDL_C}"
>files.mips:cddl/compat/opensolaris/kern/opensolaris_atomic.coptional zfs | 
>dtrace compile-with "${CDDL_C}"
>files.powerpc:cddl/compat/opensolaris/kern/opensolaris_atomic.c
> optional zfs powerpc | dtrace powerpc compile-with "${ZFS_C}"
>files.riscv:cddl/compat/opensolaris/kern/opensolaris_atomic.c   optional zfs | 
>dtrace compile-with "${CDDL_C}"

[So maybe that's the reason why nothing complains about the now-missing
atomic_del_64()? Or maybe it's not used, or maybe I didn't find some
definition somewhere. Well, never mind]


Anyway, the actual name clash is with
sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S,
because that one *is* compiled:

>/usr/src/sys/conf$ grep i386/opensolaris_atomic.S *
>files.i386:cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S
> optional zfs | dtrace compile-with "${ZFS_S}"


I tried to back out the changes from SVN 334762. Sadly, that didn't
work, because something already uses the new atomic_add_64().

So instead, I did this:

--- sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S    (revision 350287)
+++ sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S    (working copy)
@@ -66,8 +66,7 @@
     * specific mapfile and remove the NODYNSORT attribute
     * from atomic_add_64_nv.
     */
-   ENTRY(atomic_add_64)
-   ALTENTRY(atomic_add_64_nv)
+   ENTRY(atomic_add_64_nv)
    pushl   %edi
    pushl   %ebx
    movl    12(%esp), %edi  // %edi = target address
@@ -87,7 +86,6 @@
    popl    %edi
    ret
    SET_SIZE(atomic_add_64_nv)
-   SET_SIZE(atomic_add_64)
 
    ENTRY(atomic_or_8_nv)
    movl    4(%esp), %edx   // %edx = target address


And at least it compiles now. Whether it actually runs remains to be
found out.


Bottomline:
Please, please, please, sort this out and fix it.


Rel. 11.3: Kernel doesn't compile anymore :(

2019-07-24 Thread Peter
Trying to compile my custom kernel in Rel. 11.3 results in this:

--- kernel.full ---
linking kernel.full
atomic.o: In function `atomic_add_64':
/usr/obj/usr/src/sys/E1R11V1/./machine/atomic.h:629: multiple definition of 
`atomic_add_64'
opensolaris_atomic.o:/usr/src/sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S:71:
 first defined here
*** [kernel.full] Error code 1

Same config worked with 11.2

The offending feature is either
   options ZFS
or
   device dtrace
(Adding any of these to the GENERIC config gives the same error.)

This happens only when building for i386. Building amd64 with these
options works.


kernel crash from adjacent partitions (gpart, zfs)

2019-05-14 Thread Peter

Hi,

 when creating partitions directly adjacent to each other, without a safety
gap of free space between them, the kernel may crash.


Does anybody know how big that free space needs to be?

How I found out (and how to reproduce the crash):
https://forums.freebsd.org/threads/create-degraded-raid-5-with-2-disks-on-freebsd.70750/post-426756

OS concerned: 11.2, amd64 and i386.

Or, does anybody know if this is fixed in 12?
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: route based ipsec

2019-05-09 Thread Peter Blok
I have tried certificates in the past, but racoon was never stable enough. 
It didn’t crash on me though.

I have moved over to Strongswan and never regretted this move. Very stable.

Peter

> On 8 May 2019, at 03:29, Eugene Grosbein  wrote:
> 
> 08.05.2019 3:23, KOT MATPOCKuH wrote:
> 
>> I don't understand what in my configuration can make a running daemon
>> dump core...
>> I've attached a sample racoon.conf. Can You check it for possible problems?
>> Also, on one host I got a crash in another function:
>> (gdb) bt
>> #0  0x0024717f in privsep_init ()
>> #1  0x002375f4 in inscontacted ()
>> #2  0x002337d0 in isakmp_plist_set_all ()
>> #3  0x0023210d in isakmp_ph2expire ()
>> #4  0x0023162a in isakmp_ph1delete ()
>> #5  0x0023110b in isakmp_ph2resend ()
>> #6  0x0008002aa000 in ?? ()
>> #7  0x in ?? ()
> 
> I guess configuration using certificates is not tested enough.
> It is stable for me, but I use psk only.
> 
> You need to fix the code yourself or stop using racoon with certificates.
> 





Re: Issues starting unbound on boot

2019-04-30 Thread Peter Jeremy
On 2019-Apr-30 19:44:36 +, Markus Wipp  wrote:
>I am currently facing an issue and don’t know why it happens or
>what I could do about it.
>I hope that this is the correct list to ask my question. If not please let me 
>know where else I might try my luck.
>I installed unbound from ports, configured it and can start / stop it from 
>command line with service unbound start without any problems.
>But whenever I reboot the machine it just doesn’t get started. The only 
>information I was able to find out so far can be found in /var/log/messages:
>root: /etc/rc: WARNING: failed to start unbound

I have seen unbound fail to start for a variety of reasons but in all cases,
it has written a useful hint to the console.  Can you confirm that it's not
writing anything to your console?  Are you able to share your configuration?

-- 
Peter Jeremy




Re: Replicable file-system corruption due to fsck/ufs

2019-04-13 Thread Peter Holm
On Fri, Apr 12, 2019 at 04:13:00PM -0700, Kirk McKusick wrote:
> > Peter Holm  wrote:
> > 
> >> I see this even with a single truncate on HEAD.
> >>
> >> $ ./truncate10.sh
> >> 96 -rw-r--r--  1 root  wheel  1073741824 11 apr. 06:33 test
> >> ** /dev/md10a
> >> ** Last Mounted on /mnt
> >> ** Phase 1 - Check Blocks and Sizes
> >> INODE 3: FILE SIZE 1073741824 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 
> >> 268435456
> >> ADJUST? yes
> > 
> > Thanks.. I should have tested that myself.. doh! I was trying to
> > closer replicate my real file that triggered the problem which
> > contained a number of sparse areas.
> > 
> > And thanks for adding Kirk to the discussion. I wanted to first be
> > sure it wasn't just me :-)
> > 
> > Cheers, Jamie
> 
> This is indeed a bug in the calculation of the location of the last
> block of a file. I believe that the following patch to head will
> fix it.
> 
> Peter, can you please test and let me know.
> 
> If Peter confirms that it fixes the bug, I will check it into head
> and MFC it to 12-stable and 11-stable after a 2-week settle-in time.
> 
>   Kirk McKusick
> 

Yes, this patch works for me.

-- 
Peter


Re: Replicable file-system corruption due to fsck/ufs

2019-04-10 Thread Peter Holm
y
>  | ** Phase 4 - Check Reference Counts
>  | ** Phase 5 - Check Cyl groups
>  | FREE BLK COUNT(S) WRONG IN SUPERBLK
>  | SALVAGE? [yn] y
>  |
>  | SUMMARY INFORMATION BAD
>  | SALVAGE? [yn] y
>  |
>  | BLK(S) MISSING IN BIT MAPS
>  | SALVAGE? [yn] y
>  |
>  | 4 files, 35 used, 507748 free (20 frags, 63466 blocks, 0.0% fragmentation)
>  |
>  | * FILE SYSTEM IS CLEAN *
>  |
>  | * FILE SYSTEM WAS MODIFIED *
>  |
>  | root@thompson# fsck /dev/md1
>  | ** /dev/md1
>  | ** Last Mounted on /root/x/mnt
>  | ** Phase 1 - Check Blocks and Sizes
>  | PARTIALLY TRUNCATED INODE I=4
>  | SALVAGE? [yn] y
>  |
>  | INCORRECT BLOCK COUNT I=4 (256 should be 128)
>  | CORRECT? [yn] y
>  |
>  | INODE 4: FILE SIZE 268468224 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 134610944
>  | ADJUST? [yn] y
>  |
>  | ** Phase 2 - Check Pathnames
>  | ** Phase 3 - Check Connectivity
>  | ** Phase 4 - Check Reference Counts
>  | ** Phase 5 - Check Cyl groups
>  | FREE BLK COUNT(S) WRONG IN SUPERBLK
>  | SALVAGE? [yn] y
>  |
>  | SUMMARY INFORMATION BAD
>  | SALVAGE? [yn] y
>  |
>  | BLK(S) MISSING IN BIT MAPS
>  | SALVAGE? [yn] y
>  |
>  | 4 files, 19 used, 507764 free (20 frags, 63468 blocks, 0.0% fragmentation)
>  |
>  | * FILE SYSTEM IS CLEAN *
>  |
>  | * FILE SYSTEM WAS MODIFIED *
>  |
>  | root@thompson# fsck /dev/md1
>  | ** /dev/md1
>  | ** Last Mounted on /root/x/mnt
>  | ** Phase 1 - Check Blocks and Sizes
>  | ** Phase 2 - Check Pathnames
>  | ** Phase 3 - Check Connectivity
>  | ** Phase 4 - Check Reference Counts
>  | ** Phase 5 - Check Cyl groups
>  | 4 files, 19 used, 507764 free (20 frags, 63468 blocks, 0.0% fragmentation)
>  |
>  | * FILE SYSTEM IS CLEAN *
>  |
>  | root@thompson# fsck /dev/md1
>  | ** /dev/md1
>  | ** Last Mounted on /root/x/mnt
>  | ** Phase 1 - Check Blocks and Sizes
>  | ** Phase 2 - Check Pathnames
>  | ** Phase 3 - Check Connectivity
>  | ** Phase 4 - Check Reference Counts
>  | ** Phase 5 - Check Cyl groups
>  | 4 files, 19 used, 507764 free (20 frags, 63468 blocks, 0.0% fragmentation)
>  |
>  | * FILE SYSTEM IS CLEAN *
>  |
>  | root@thompson# mount /dev/md1 mnt
>  |
>  | root@thompson# cd mnt/
>  | ~/x/mnt ~/x
>  |
>  | root@thompson# l
>  | total 80
>  |  4 drwxr-xr-x  3 root  wheel - 512 11 Apr 04:14 ./
>  |  4 drwxr-x---  3 root  wheel - 512 11 Apr 04:09 ../
>  |  4 drwxrwxr-x  2 root  operator  - 512 11 Apr 04:09 .snap/
>  |  4 -rw-r-  1 root  wheel -  70 11 Apr 04:14 sha256.out
>  | 64 -rw-r-  1 root  wheel - 134,610,944 11 Apr 04:14 test
>  |
>  | root@thompson# cat sha256.out
>  | 76b042e7fbb3ed1914cf600a0b5ed8e10b8d917a006dbbff774a996c9bbce941 test
>  |
>  | root@thompson# sha256 -r test
>  | 6b1a548d057244632b5d2897f8c17177236c262c6af54cc0a9db5ddc8285fbd4 test

I see this even with a single truncate on HEAD.

$ ./truncate10.sh
96 -rw-r--r--  1 root  wheel  1073741824 11 apr. 06:33 test
** /dev/md10a
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
INODE 3: FILE SIZE 1073741824 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 268435456
ADJUST? yes


-- 
Peter


Re: about zfs and ashift and changing ashift on existing zpool

2019-04-08 Thread Peter Jeremy
On 2019-Apr-07 16:36:40 +0100, tech-lists  wrote:
>storage            ONLINE       0     0     0
>  raidz1-0         ONLINE       0     0     0
>    replacing-0    ONLINE       0     0 1.65K
>      ada2         ONLINE       0     0     0
>      ada1         ONLINE       0     0     0  block size: 512B configured, 4096B native
>    ada3           ONLINE       0     0     0
>    ada4           ONLINE       0     0     0
>
>What I'd like to know is:
>
>1. is the above situation harmful to data

In general no.  The only danger is that ZFS is updating the uberblock
replicas at the start and end of the volume assuming 512B sectors which
means you are at a higher risk of losing one of the replica sets if a
power failure occurs during an uberblock update.

>2. given that vfs.zfs.min_auto_ashift=12, why does it still say 512B
>   configured for ada1 which is the new disk, or..
The pool is configured with ashift=9.

>3. does "configured" pertain to the pool, the disk, or both
"configured" relates to the pool - all vdevs match the pool

>4. what would be involved in making them all 4096B
Rebuild the pool - backup/destroy/create/restore

>5. does a 512B disk wear out faster than 4096B (all other things being
>   equal)
It shouldn't.  It does mean that the disk is doing read/modify/write at
the physical sector level but that should be masked by the drive cache.

>Given that the machine and disks were new in 2016, I can't understand why zfs
>didn't default to 4096B on installation

I can't answer that easily.  The current version of ZFS looks at the native
disk blocksize to determine the pool ashift but I'm not sure how things
were in 2016.  Possibilities include:
* The pool was built explicitly with ashift=9
* The initial disks reported 512B native (I think this is most likely)
* That version of ZFS was using logical, rather than native blocksize.

My guess (given that only ada1 is reporting a blocksize mismatch) is that
your disks reported a 512B native blocksize.  In the absence of any override,
ZFS will then build an ashift=9 pool.
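
As a rough illustration of the relationship discussed above: ashift is the base-2 logarithm of the sector size, so 512B corresponds to ashift=9 and 4096B to ashift=12. The sketch below is only arithmetic; `pool_ashift` is a hypothetical helper that assumes the ashift is taken from the largest reported native block size, floored at a minimum such as vfs.zfs.min_auto_ashift, and it is not the actual ZFS code path.

```python
def ashift_for(block_size: int) -> int:
    """Return log2(block_size); ZFS expresses the sector size this way."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    return block_size.bit_length() - 1


def pool_ashift(native_sizes, min_auto_ashift=9):
    """Hypothetical helper: largest ashift among the disks, floored at
    min_auto_ashift. A simplification, not the real ZFS logic."""
    return max([min_auto_ashift] + [ashift_for(s) for s in native_sizes])


print(ashift_for(512))    # 512B sectors
print(ashift_for(4096))   # 4096B sectors
print(pool_ashift([512, 512, 512]))                       # disks reporting 512B native
print(pool_ashift([512, 512, 512], min_auto_ashift=12))   # same disks with a floor of 12
```

Under these assumptions, disks that all report 512B native give ashift=9 at creation time unless a higher minimum is in force, which is consistent with a later 4096B replacement disk showing a mismatch.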

-- 
Peter Jeremy


signature.asc
Description: PGP signature


Re: Observations from a ZFS reorganization on 12-STABLE

2019-03-18 Thread peter . blok
Same here using mfsbsd from 11-RELEASE. On the first attempt I forgot to add
swap, and it killed the ssh session I was using to issue a zfs send on the
remote system.

On the next attempt I added swap, but ssh got killed too.

The third attempt, using mfsbsd from 12-RELEASE, succeeded.

Now I am using mfsbsd 11-RELEASE with added swap and with vfs.zfs.arc_min and
vfs.zfs.arc_max set to 128MB (it is a 4GB system), and it succeeds.



> On 18 Mar 2019, at 15:14, Karl Denninger  wrote:
> 
> On 3/18/2019 08:37, Walter Cramer wrote:
>> I suggest caution in raising vm.v_free_min, at least on 11.2-RELEASE
>> systems with less RAM.  I tried "65536" (256MB) on a 4GB mini-server,
>> with vfs.zfs.arc_max of 2.5GB.  Bad things happened when the cron
>> daemon merely tried to run `periodic daily`.
>> 
>> A few more details - ARC was mostly full, and "bad things" was 1:
>> `pagedaemon` seemed to be thrashing memory - using 100% of CPU, with
>> little disk activity, and 2: many normal processes seemed unable to
>> run. The latter is probably explained by `man 3 sysctl` (see entry for
>> "VM_V_FREE_MIN").
>> 
>> 
>> On Mon, 18 Mar 2019, Pete French wrote:
>> 
>>> On 17/03/2019 21:57, Eugene Grosbein wrote:
>>>> I agree. Recently I've found kind-of-workaround for this problem:
>>>> increase vm.v_free_min so when "FREE" memory goes low,
>>>> page daemon wakes earlier and shrinks UMA (and ZFS ARC too) moving
>>>> some memory
>>>> from WIRED to FREE quick enough so it can be re-used before bad
>>>> things happen.
>>>>
>>>> But avoid increasing vm.v_free_min too much (e.g. over 1/4 of total
>>>> RAM)
>>>> because kernel may start behaving strange. For 16Gb system it should
>>>> be enough
>>>> to raise vm.v_free_min upto 262144 (1GB) or 131072 (512M).
>>>>
>>>> This is not permanent solution in any way but it really helps.
>>> 
>>> Ah, that's very interesting, thank you for that! I've been bitten by
>>> this issue too in the past, and it is (as mentioned) much improved on
>>> 12, but the fact it could still cause issues worries me.
>>> 
> Raising free_target should *not* result in that sort of thrashing. 
> However, that's not really a fix standing alone either since the
> underlying problem is not being addressed by either change.  It is
> especially dangerous to raise the pager wakeup thresholds if you still
> run into UMA allocated-but-not-in-use not being cleared out issues as
> there's a risk of severe pathological behavior arising that's worse than
> the original problem.
> 
> 11.1 and before (I didn't have enough operational experience with 11.2
> to know, as I went to 12.x from mostly-11.1 installs around here) were
> essentially unusable in my workload without either my patch set or the
> Phabricator one.
> 
> This is *very* workload-specific however, or nobody would use ZFS on
> earlier releases, and many do without significant problems.
> 
> -- 
> Karl Denninger
> k...@denninger.net
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/
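
The vm.v_free_min values quoted in this thread are page counts, not bytes. A minimal sketch of the conversion follows, assuming 4 KiB pages (as on amd64); the helper names are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on amd64


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> int:
    """Convert a page count (as used by vm.v_free_min) to MiB."""
    return pages * page_size // (1024 * 1024)


def quarter_of_ram_pages(ram_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Page count equal to 1/4 of RAM, the ceiling suggested in the thread."""
    return ram_bytes // 4 // page_size


for pages in (65536, 131072, 262144):
    print(pages, "pages =", pages_to_mib(pages), "MiB")

# 1/4 of a 16 GiB system and of a 4 GiB system, in pages
print(quarter_of_ram_pages(16 * 1024**3))
print(quarter_of_ram_pages(4 * 1024**3))
```

On these assumptions, 65536 pages on a 4GB machine is 256MB, well under the 1/4-of-RAM ceiling, so the thrashing reported on that system is not explained by the ceiling alone.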



Re: 12.0-RELEASE zfs/vnode deadlock issue

2019-03-02 Thread Peter Avalos via freebsd-stable

> On Mar 1, 2019, at 7:00 AM, Nick Rogers  wrote:
> 
> I am hoping someone can help me figure out if this is a legitimate bug, or
> something already fixed in 12-STABLE. I wish I could reproduce it reliably
> to try against STABLE, but there doesn't appear to be any related ZFS fixes
> not in RELEASE. Thanks.
> 

I have also experienced this problem, but I haven’t been able to troubleshoot 
it at all.

Peter


Re: dmesg submission service -- please submit today

2018-10-07 Thread Peter Jeremy
On 2018-Oct-07 23:41:43 +, Roger Leigh  wrote:
>Out of interest, has FreeBSD considered implementing an equivalent of 
>Debian's "popularity-contest" package, which periodically submits 
>anonymised lists of installed packages?  On FreeBSD this could be from 
>the pkg database, and could also include hardware information.

There's ports/sysutils/bsdstats but I'm not sure how popular that is.

-- 
Peter Jeremy


signature.asc
Description: PGP signature


Re: FCP-0101: Deprecating most 10/100 Ethernet drivers

2018-10-04 Thread Peter Jeremy
On 2018-Oct-04 08:44:11 +, Alexey Dokuchaev  wrote:
>Looking at the commits they require near zero maintenance.  What exactly
>is the burden here?

As various others have stated, this isn't true.  All the code in FreeBSD has
an ongoing maintenance cost and is an impediment to adding new features.
There is no point in spending valuable developer effort to update drivers
and test them with unusual/obsolete hardware unless those drivers are going
to actually be used.

>Another question: why the fuck FreeBSD likes to kill
>non-broken, low-volatile and perfectly working stuff?

That language is uncalled for.

>We offer probably
>the best NIC driver support on the block, yet you're proposing to shrink
>one of the few areas where we shine.  WTF?!

Supporting NICs that no-one uses doesn't benefit anyone.  No-one is talking
about removing NICs that are in active use.

>ae(4) was used in Asus EeePC 701/900 which are still popular among hackers.

Those netbooks are more than a decade old now and I don't expect many are
still functional.  Will people still expect to use them with FreeBSD 13 in 5
years time?

>As it can be seen this list tends to cover nearly all 100 cards, yet no
>one (pardon me if I missed those) asks for 10.  So how about making this
>proposal cover only 10 cards,

What is the purpose in keeping unused FastEthernet cards in the tree?

>if you can't resist the itch to remove
>something from the tree?

Again, that language is uncalled for.

-- 
Peter Jeremy


signature.asc
Description: PGP signature


Re: Ryzen consensus

2018-07-22 Thread Peter Moody
I've had no issues with my r7 1700 for a while now. updated my bios
(msi x370) probably 2 months ago and i'm currently running 11.2-STABLE
r336329

On Sat, Jul 21, 2018 at 6:48 PM, George Mitchell  wrote:
> Based on people's recent Ryzen experiences, is it fair to say that
> FreeBSD 11.2 is now believed to work on Ryzens, if you have a recent
> enough Ryzen and your motherboard has been updated to the latest BIOS?
> -- George
>


Security patch SA-18:03 removed from 11.2 - why?

2018-07-08 Thread Peter
Release/update 11.1-p8 introduced so-called "mitigation for speculative 
execution vulnerabilities".


In RElease 11.2 these "mitigation" have been removed. What is the reason 
for the removal, and specifically why is Security advisory 18:03 still 
mentioned in the release notes?


Behaviour with 11.1-p8:

# sysctl hw.ibrs_disable
hw.ibrs_disable: 0
# sysctl hw.ibrs_active
hw.ibrs_active: 1

Behaviour with 11.2 w/ same CPU + microcode:

# sysctl hw.ibrs_disable
hw.ibrs_disable: 0
# sysctl hw.ibrs_active
hw.ibrs_active: 0


ZFS+find(1) wiring all RAM

2018-06-07 Thread Peter Jeremy
I've noticed that 11-stable/amd64 has been wiring seemingly excessive
amounts of RAM for some time (the problem goes back at least 6 months).
This extends to getting ENOMEM errors from g_io_deliver() and out-of-swap
errors killing processes on a low-memory system.  I'm not sure when it
started but it seems to have gotten worse between r331535 and r334494.

I can see the "excessive wired memory" on my main home system with 32GB RAM
but haven't seen it completely run out of RAM.  After some gentle use and a
nightly run, there is 10GB more wired RAM than ARC.

My "low memory" system is a Google GCE f1-micro instance[1] (600MB RAM) with
about 723k inodes used and the following ZFS tuning:
vfs.zfs.arc_max="128M"
vfs.zfs.arc_meta_limit="50M"
vfs.zfs.arc_min="25M"

The following numbers were gathered by looking at top(1).  Running r334494,
after booting to multi-user, the system has about 187MB wired (94MB ARC).
If I then run /etc/periodic/security/100.chksetuid, wired RAM increases to
about 580MB, with 380MB ARC, dropping to 467MB and 217MB ARC when the script
exits (this is still nearly twice arc_max).  Free memory can drop to <10KB
whilst the find(1) is running.
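
To make the figures above easier to compare, here is a small sketch that derives the ARC-to-arc_max ratio and the wired memory not accounted for by ARC at each of the three points. The numbers are the ones read from top(1) above; the dictionary layout and helper names are only for illustration.

```python
ARC_MAX_MB = 128  # vfs.zfs.arc_max from the tuning above

# (wired MB, ARC MB) as read from top(1)
observations = {
    "after boot": (187, 94),
    "during find(1)": (580, 380),
    "after script exits": (467, 217),
}


def arc_ratio(arc_mb: float, arc_max_mb: float = ARC_MAX_MB) -> float:
    """How many times arc_max the ARC currently occupies."""
    return round(arc_mb / arc_max_mb, 2)


def non_arc_wired(wired_mb: int, arc_mb: int) -> int:
    """Wired memory that is not attributed to ARC."""
    return wired_mb - arc_mb


for label, (wired, arc) in observations.items():
    print(f"{label}: ARC is {arc_ratio(arc)}x arc_max, "
          f"non-ARC wired = {non_arc_wired(wired, arc)}MB")
```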

I have several issues with this behaviour:
0) ARC usage can significantly exceed arc_max.  I understand that arc_max is
a soft limit but IMO, 3x is unreasonable - especially when the system is
under extreme memory pressure.
1) Significant amounts of wired memory are in use but I can't find anything
in "vmstat -mz" that would explain where it's going.

Does anyone have any suggestions for digging into this?

[1] I get the same behaviour using a VBox instance with similar dimensioning
and the same tuning.

-- 
Peter Jeremy


signature.asc
Description: PGP signature


Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2018-05-25 Thread Peter Blok
Same here. EARLY_AP_STARTUP no longer needs to be disabled on my boxes either.


> On 24 May 2018, at 19:12, Mark Martinec  wrote:
> 
> Just a short report to a thread I started when 11.1 came out.
> 
> This machine would stall in a busy loop while attaching disks
> during boot. Rebuilding a kernel with EARLY_AP_STARTUP disabled
> avoided the problem. This was a situation through the whole
> 11.1 life cycle (i.e. patch releases did not help).
> 
> Today I have upgraded this host to 11.2-BETA2, and it is
> no longer necessary to disable EARLY_AP_STARTUP. Good, thanks!
> 
>  Mark
> 
> 
>> 2017-07-20 02:03, Mark Johnston wrote:
>>> One thing to try at this point would be to disable EARLY_AP_STARTUP in
>>> the kernel config. That is, take a configuration with which you're able
>>> to reproduce the hang during boot, and remove "options
>>> EARLY_AP_STARTUP".
>> 2017-07-20 15:45, Mark Martinec wrote:
>> Done. And it avoids the problem altogether! Thanks.
>> Tried a reboot several times and it succeeds every time.
>> Here is all that I had in a config file for building a kernel,
>> i.e. I took away the 'options DDB' which also seemingly avoided
>> the problem:
>>  include GENERIC
>>  ident NELI
>>  nooptions EARLY_AP_STARTUP
>>> This feature has a fairly large impact on the bootup process and has
>>> had a few problems that manifested as hangs during boot. There was at
>>> least one other case where an innocuous change to the kernel
>>> configuration "fixed" the problem by introducing some second-order
>>> effect (causing kernel threads to be scheduled in a different
>>> order, for instance).
> [...]



smime.p7s
Description: S/MIME cryptographic signature


Re: problems with ssh-agent after running MATE desktop

2018-05-24 Thread Peter Moody
On Thu, May 24, 2018 at 9:01 AM, Charlie Li  wrote:

> MATE loads all of gnome-keyring, including the ssh-agent portion.

neither here nor there, but gnome-keyring is bad at being an
ssh-agent. if you can in any way avoid it, your life will be better if
you do.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-18 Thread Peter

Hi all of You,

  thank You very much for Your commenting and reports!

From what I see, we have (at least) two rather different demands here: 
while George looks at the over-all speed of compute throughput, others 
are concerned about interactive response.


My own issue is again a little bit different: I am running this small 
single-CPU machine as my home-office router, and it also runs a backup 
service, which involves compressing big files and handling an outgrown 
database (but that does not need to happen fast, as it's just backup stuff).
So, my demand is to maintain a good balance between realtime network 
activity being immediately served, and low-priority batch compute jobs, 
while still staying responsive to shell-commands - but the over-all 
compute throughput is not important here.


But then, I find it very difficult to devise metrics by which such 
a demand could be properly measured, to get comparable figures.



George Mitchell wrote:

I suspect my case (make buildworld while running misc/dnetc) doesn't
qualify.  However, I just completed a SCHED_ULE run with
preempt_thresh set to 5, and "time make buildworld" reports:
7336.748u 677.085s 9:25:19.86 23.6% 27482+473k 42147+431581io 38010pf+0w
Much closer to SCHED_4BSD!  I'll try preempt_thresh=0 next, and I
guess I'll at least try preempt_thresh=224 to see how that works
for me. -- George



I found that preempt_thresh=0 cannot be used in practice:
When I try to do this on my quad-core desktop, and then start four 
endless-loops to get the cores busy, the (internet)radio will have a 
dropout every 2-3 seconds (and there is nothing else running, just a 
sleeping icewm and a mostly sleeping firefox)!


So, the (SMP) system *depends* on preemption, it cannot handle streaming 
data without it. (@George: Your buildworld test is pure batch load, and 
may not be bothered by this effect.)



I think the problem is *not* to be solved by finding a good setting for
preempt_thresh (or other tuneables). I think the problem lies deeper, 
and these tuneables only change its appearance.


I have worked out a writeup explaining my thoughts in detail, and I 
would be glad if You stay tuned and evaluate that.


P.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-18 Thread Peter

EBFE via freebsd-stable wrote:

On Tue, 17 Apr 2018 09:05:48 -0700
Freddie Cash  wrote:


# Tune for desktop usage
kern.sched.preempt_thresh=224

Works quite nicely on a 4-core AMD Phenom-II X4 960T Processor
(3010.09-MHz K8-class CPU) running KDE4 using an Nvidia 210 GPU.


For interactive tasks, there is a "special" tunable:
% sysctl kern.sched.interact
kern.sched.interact: 10 # default is 30
% sysctl -d kern.sched.interact
kern.sched.interact: Interactivity score threshold

reducing the value from 30 to 10-15 keeps your gui/system responsive,
even under high load.


Yes, this may improve the "unresponsive-desktop" problem, because 
threads that are scored interactive are run as realtime threads, ahead 
of all regular workload queues.
But it will likely not solve the problem described by George, having two 
competing batch jobs. And for my problem as described at the beginning
of the thread, I could probably tune so far that my "worker" thread 
would be considered interactive, but then it would just toggle between
realtime and timesharing queues - and while this may make things better, 
it will probably not lead to a smooth system behaviour.


P.


Found the issue! - SCHED_ULE+PREEMPTION is the problem

2018-04-10 Thread Peter

Results:


1. The tdq_ridx pointer

The perceived slow advance (of the tdq_ridx pointer into the circular 
array) is correct behaviour. McKusick writes:



The pointer is advanced once per system tick, although it may not
advance on a tick until the currently selected queue is empty. Since
each thread is given a maximum time slice and no threads may be added
to the current position, the queue will drain in a bounded amount of
time.


Therefore, it is also normal that the process (the piglet in this case) 
does run until its time slice (aka quantum) is used up.
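
The rule quoted above can be pictured with a toy simulation. This is only a sketch under strong assumptions: four queues instead of the real number, every thread uses up its slice in exactly one tick, and no thread is requeued. None of the names come from the kernel sources.

```python
from collections import deque


def simulate(queues, ridx, ticks):
    """Toy model of the circular timeshare array.

    On each tick, if the queue at the read index holds a thread, that
    thread runs (and, in this toy model, leaves the queue); the index
    only moves forward on a tick where the current queue is empty.
    Returns the read index after each tick.
    """
    history = []
    for _ in range(ticks):
        if queues[ridx]:
            queues[ridx].popleft()            # thread consumes its slice
        else:
            ridx = (ridx + 1) % len(queues)   # queue drained: advance
        history.append(ridx)
    return history


# Hypothetical starting state: two threads in slot 0, one in slot 2.
queues = [deque(["a", "b"]), deque(), deque(["c"]), deque()]
print(simulate(queues, ridx=0, ticks=5))
```

The index stays on slot 0 for as long as threads remain there, which matches the "slow advance" observed in the measurements.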



2. The influence of preempt_thresh

This can be found in tdq_runq_add(). A simplified description of the 
logic there is as follows:


td_priority <  152 ? -> add to realtime-queue
td_priority <= 223 ? -> add to timeshare-queue
    if preempted
        circular_index = tdq_ridx
    else
        circular_index = tdq_idx + td_priority
else                 -> add to idle-queue

If the thread had been preempted, it is reinserted at the current 
working position of the circular array, otherwise the position is 
calculated from thread priority.
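
The simplified logic above can be written out as a small function. This is a sketch of the description in this mail, not of the kernel code: the thresholds 152 and 223 are taken from the text, while the queue count of 64 and the modulo wrap-around are assumptions made so that the index stays inside a circular array.

```python
NQUEUES = 64  # assumption: size of the circular timeshare array


def runq_for(td_priority, preempted, tdq_idx, tdq_ridx, nqueues=NQUEUES):
    """Return (queue name, slot) following the simplified description.

    slot is None for the realtime and idle queues, which are not
    indexed through the circular array in this sketch.
    """
    if td_priority < 152:
        return ("realtime", None)
    if td_priority <= 223:
        if preempted:
            # Preempted thread: back in at the current working position.
            slot = tdq_ridx
        else:
            # Otherwise the slot is derived from the priority
            # (wrap-around is an assumption of this sketch).
            slot = (tdq_idx + td_priority) % nqueues
        return ("timeshare", slot)
    return ("idle", None)


print(runq_for(200, preempted=True, tdq_idx=10, tdq_ridx=3))
print(runq_for(200, preempted=False, tdq_idx=10, tdq_ridx=3))
```

With the same priority, the preempted thread lands on the slot currently being served, while the non-preempted one is placed further along the array.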



3. The quantum

Most of the task switches come from device interrupts. Those are running 
at priority intr:8 or intr:12. So, as soon as preempt_thresh is 12 or 
bigger, the piglet is almost always reinserted in the runqueue due to 
preemption.
And, as we see, in that case we do not have a scheduling, we have a 
simple resume!


A real scheduling happens only after the quantum is exhausted. Therefore,
reducing the quantum helps.


4. History

This behaviour was deliberately introduced in r171713.

In r220198 it was fixed, with a focus on CPU-hogs and single-CPU.

In r239157 the fix was undone due to performance considerations, with 
the focus on rescheduling only at end of the time-slice.



5. Conclusion

The current defaults do not seem very well suited for certain CPU-intensive 
tasks. Possible solutions are one of:

 * not use SCHED_ULE
 * not use preemption
 * change kern.sched.quantum to minimal value.

P.