Re: git: 4a864f624a70 - main - vm_pageout: Print a more accurate message to the console before an OOM kill [MFC in time for 13.1?]

2022-02-26 Thread Mark Millard
On 2022-Jan-15, at 07:55, Mark Johnston  wrote:

> On Fri, Jan 14, 2022 at 09:38:56PM -0800, Mark Millard wrote:
>> Thanks. This will allow me to remove part of my personal additions
>> in this area --and my having to explain the misnomer when trying
>> to help someone analyze why they end up with OOM activity so they
>> can figure out what to do about it.
>> 
>> There seem to be two separate sources of VM_OOM_SWAPZ. Showing
>> my personal additions for them (just making them explicit in the
>> sequence of messages generated):
>> 
>> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
>> index 01cf9233329f..280621ca51be 100644
>> --- a/sys/vm/swap_pager.c
>> +++ b/sys/vm/swap_pager.c
>> @@ -2091,6 +2091,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t pindex, daddr_t swapblk)
>>0, 1))
>>printf("swap blk zone exhausted, "
>>"increase kern.maxswzone\n");
>> +   printf("swp_pager_meta_build: swap blk uma zone exhausted\n");
>>vm_pageout_oom(VM_OOM_SWAPZ);
>>pause("swzonxb", 10);
>>} else
>> @@ -2121,6 +2122,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t pindex, daddr_t swapblk)
>>0, 1))
>>printf("swap pctrie zone exhausted, "
>>"increase kern.maxswzone\n");
>> +   printf("swp_pager_meta_build: swap pctrie uma zone exhausted\n");
>>vm_pageout_oom(VM_OOM_SWAPZ);
>>pause("swzonxp", 10);
>>} else
>> 
>> Care to comment on the distinctions and why there are two
>> contexts classified as "out of swap space"? Would either
>> one show the swap space as (nearly?) all used in, say, top?
>> Or might one of them still end up looking like a misnomer
>> from just a top (or whatever) display?
> 
> Hmm, those cases should likely be changed from "out of swap space" to
> "failed to allocate swap metadata" or something like that.

The above does not seem to have happened yet in main [so: 14].

Will 13.1 get an MFC of 4a864f624a70 in time, possibly with the
above change also in place to fully avoid misnomer reporting
that misleads folks?

4a864f624a70 listed:

MFC after:  2 weeks

but it has been more than a month.

> . . .
> 

===
Mark Millard
marklmi at yahoo.com




Re: ZFS PANIC: HELP.

2022-02-26 Thread Larry Rosenman

On 02/26/2022 10:57 am, Larry Rosenman wrote:

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  # Use the loop variable; $1 is empty when the script is run without arguments.
  FS=$i
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
done




I’d put, like:

echo ${FS}

before “sudo zfs send”, to get at least a bit of a clue on where it 
can get to.


otis


—
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!) and I
bectl destroy'd the
BE that crashed it, and am running a new scrub -- we'll see whether
that was sufficient.

Thanks, all!

Well, it was NOT sufficient. More zfs export fun to come :(

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-26 Thread Larry Rosenman

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  # Use the loop variable; $1 is empty when the script is run without arguments.
  FS=$i
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
done




I’d put, like:

echo ${FS}

before “sudo zfs send”, to get at least a bit of a clue on where it can 
get to.


otis


—
Juraj Lutter
o...@freebsd.org
I just looked at the destination to see where it died (it did!) and I 
bectl destroy'd the
BE that crashed it, and am running a new scrub -- we'll see whether that 
was sufficient.


Thanks, all!
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: Build failure of editors/libreoffice only on src main (stable/13 is OK)

2022-02-26 Thread Rainer Hurling

On 26.02.22 at 14:14, Tomoaki AOKI wrote:

Thanks.
But unfortunately, as I described in Comment 21 [2] of Bug 262008,
setting kern.elf64.aslr.enable=0 didn't help.
As I'm building on amd64 and not building for compat32, I've not touched
kern.elf32.aslr.enable.
And as these are regular writable sysctls (and are also tunables),
I have not tested setting them in /boot/loader.conf and rebooting
before the build.


I just tried building _after a reboot_ with kern.elf64.aslr.enable=0 on
recent CURRENT and it doesn't work for me.


14.0-CURRENT #0 main-n253393-2bfdc1ee9b1 amd64

Best wishes,
Rainer



Should I set more sysctls? I thought the setting above actually disables
all ASLR-related features (for 64-bit), regardless of whether the others
are 1 or 0.

Error messages (with "MAKE_JOBS_UNSAFE=yes") and backtraces are
described at Comment 20 [3].

[2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008#c21

[3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008#c20


On Sat, 26 Feb 2022 13:29:26 +0100
Michael Gmelin  wrote:


Maybe it’s related to ASLR? (or is it also enabled in 13/stable?)


On 26. Feb 2022, at 13:05, Tomoaki AOKI  wrote:

(Re-sent as not yet delivered in more than 5 hours)

Hi.

I have a build failure of editors/libreoffice on src main, amd64.
As I've reported in Bug 262008 [1], the problems on stable/13 are already
fixed, but the build still fails on main with a different failure mode.

A tool, gengal.bin, built as part of the whole libreoffice build, dumps
core on main, but it ran OK on stable/13.

Port options are now default on both main and stable/13.

I now suspect differences between the toolchains in main and
stable/13, but as editors/libreoffice is giant and this failure
happens almost at the end of the build, the usual bisecting is not
realistic. (It would require tens of weekends, maybe.)

Any thoughts? Or am I missing something to check for?

Regards.


[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008

--
Tomoaki AOKI











Re: ZFS PANIC: HELP.

2022-02-26 Thread Alexander Leidinger
Quoting Larry Rosenman (from Fri, 25 Feb 2022 20:03:51 -0600):



On 02/25/2022 2:11 am, Alexander Leidinger wrote:

Quoting Larry Rosenman (from Thu, 24 Feb 2022 20:19:45 -0600):



I tried a scrub -- it panic'd on a fatal double fault. 

  Suggestions?


The safest / cleanest (but not fastest) option is data export and pool
re-creation. If you export dataset by dataset (instead of recursively
all), you can even see which dataset is causing the issue. In case this
per-dataset export narrows down the issue and it is a dataset you don't
care about (as in: 1) no issue to recreate from scratch or 2) there is a
backup available), you could delete this (or each such) dataset and
re-create it in-place (= not re-creating the entire pool).


Bye,
Alexander.
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


  I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  # Use the loop variable; $1 is empty when the script is run without arguments.
  FS=$i
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
done

   

  How will I know a "Problem" dataset?


You said a scrub panics the system. A scrub only touches occupied
blocks. As such, a problem dataset should panic your system during its
send. If it doesn't panic at all, the problem may be within a snapshot
that contains data which has been deleted in later versions of the
dataset.


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


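The per-dataset export discussed in this thread could be sketched roughly as below. This is only an illustration under assumptions, not the script anyone in the thread actually ran: DEST is a placeholder host, RUN_SEND is a hypothetical guard variable, and the helper names fs_to_filename and send_one are made up here. The idea is to print each dataset name before its send starts, so the last name printed (or the last file on the destination) points at the dataset being read when the panic hit.

```shell
#!/bin/sh
# Sketch: send each dataset separately and log its name first.
# SNAP defaults to the snapshot name used in the thread; DEST is a
# placeholder, not a host taken from the thread.
SNAP="${SNAP:-REPAIR_SNAP}"
DEST="${DEST:-user@backuphost}"

# Turn a dataset name such as pool/a/b into a flat file name pool_a_b.
fs_to_filename() {
    printf '%s\n' "$1" | sed -e 's@/@_@g'
}

# Send one dataset's snapshot to a file on the destination host.
# sudo's stdin is redirected so it cannot consume the dataset list.
send_one() {
    fs=$1
    fn=$(fs_to_filename "$fs")
    echo "sending ${fs}@${SNAP} -> ${fn}"
    sudo zfs send -vecLep "${fs}@${SNAP}" </dev/null |
        ssh "$DEST" "cat - > '${fn}'"
}

# Only run the loop when explicitly requested (RUN_SEND=yes is a
# hypothetical guard) and when zfs is available on this host.
if [ "${RUN_SEND:-no}" = yes ] && command -v zfs >/dev/null 2>&1; then
    zfs list -H -o name | while read -r fs; do
        send_one "$fs" || echo "FAILED: ${fs}" >&2
    done
fi
```

Running it as, say, RUN_SEND=yes DEST=backup@host sh script.sh would walk the datasets one at a time. If no send panics the machine, that is consistent with the remark above that the damage may sit in an older snapshot rather than in the current data.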


Re: Build failure of editors/libreoffice only on src main (stable/13 is OK)

2022-02-26 Thread Tomoaki AOKI
Thanks.
But unfortunately, as I described in Comment 21 [2] of Bug 262008,
setting kern.elf64.aslr.enable=0 didn't help.
As I'm building on amd64 and not building for compat32, I've not touched
kern.elf32.aslr.enable.
And as these are regular writable sysctls (and are also tunables),
I have not tested setting them in /boot/loader.conf and rebooting
before the build.

Should I set more sysctls? I thought the setting above actually disables
all ASLR-related features (for 64-bit), regardless of whether the others
are 1 or 0.

Error messages (with "MAKE_JOBS_UNSAFE=yes") and backtraces are
described at Comment 20 [3].

[2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008#c21

[3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008#c20


On Sat, 26 Feb 2022 13:29:26 +0100
Michael Gmelin  wrote:

> Maybe it’s related to ASLR? (or is it also enabled in 13/stable?)
> 
> > On 26. Feb 2022, at 13:05, Tomoaki AOKI  wrote:
> > 
> > (Re-sent as not yet delivered in more than 5 hours)
> > 
> > Hi.
> > 
> > I have a build failure of editors/libreoffice on src main, amd64.
> > As I've reported in Bug 262008 [1], the problems on stable/13 are already
> > fixed, but the build still fails on main with a different failure mode.
> > 
> > A tool, gengal.bin, built as part of the whole libreoffice build, dumps
> > core on main, but it ran OK on stable/13.
> > 
> > Port options are now default on both main and stable/13.
> > 
> > I now suspect differences between the toolchains in main and
> > stable/13, but as editors/libreoffice is giant and this failure
> > happens almost at the end of the build, the usual bisecting is not
> > realistic. (It would require tens of weekends, maybe.)
> > 
> > Any thoughts? Or am I missing something to check for?
> > 
> > Regards.
> > 
> > 
> > [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008
> > 
> > -- 
> > Tomoaki AOKI
> 
> 


-- 
青木 知明  [Tomoaki AOKI]
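On the question of whether more sysctls need setting: one way to check might be to list the whole kern.elf64.aslr subtree and see which knobs are still non-zero. The sketch below assumes the default "name: value" output format of sysctl(8); the helper name aslr_still_on is made up for illustration, and which knobs exist under that subtree depends on the kernel in use.

```shell
#!/bin/sh
# Sketch: report which ASLR-related knobs are still non-zero.
# Reads "name: value" lines on stdin and prints the names whose
# value is not 0.
aslr_still_on() {
    awk -F': ' '$2 != 0 { print $1 }'
}

# On a host that actually has the kern.elf64.aslr subtree, feed it in.
# Elsewhere this block is skipped, so the helper can be tested alone.
if command -v sysctl >/dev/null 2>&1 &&
   sysctl -n kern.elf64.aslr.enable >/dev/null 2>&1; then
    sysctl kern.elf64.aslr | aslr_still_on
fi
```

An empty result would suggest every knob in that subtree is already 0, in which case the core dump of gengal.bin is probably not explained by these settings alone.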



Build failure of editors/libreoffice only on src main (stable/13 is OK)

2022-02-26 Thread Tomoaki AOKI
(Re-sent as not yet delivered in more than 5 hours)

Hi.

I have a build failure of editors/libreoffice on src main, amd64.
As I've reported in Bug 262008 [1], the problems on stable/13 are already
fixed, but the build still fails on main with a different failure mode.

A tool, gengal.bin, built as part of the whole libreoffice build, dumps
core on main, but it ran OK on stable/13.

Port options are now default on both main and stable/13.

I now suspect differences between the toolchains in main and
stable/13, but as editors/libreoffice is giant and this failure
happens almost at the end of the build, the usual bisecting is not
realistic. (It would require tens of weekends, maybe.)

Any thoughts? Or am I missing something to check for?

Regards.


[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008

-- 
Tomoaki AOKI