Re: Deprecating base system ftpd

2021-04-10 Thread Scott Bennett via freebsd-stable
 On Fri, 09 Apr 2021 07:32:12 +0900 aventa...@fastmail.fm wrote:

>It makes me think that there should be an offering for two completely 
>different audiences:
>(1) FreeBSD core (a very minimal offering for folks that want to build things, 
>like a Desktop, etc.)
>(2) FreeBSD server (an offering for folks that want a server build)
>
>Perhaps that idea is just unreasonably crazy as well. 
>
 LOL!  You have what is called a very big ask.  I would like
something far smaller, namely, a choice of schedulers during/just
after installation of a -RELEASE without having to a) download the
entire source tree, b) make buildworld, and c) make buildkernel.
The kernel developers in their wisdom--ahem--have burdened all new
installations with the abysmal performance of the ULE scheduler.
The installation images for -STABLE versions are much the same.
The 4BSD scheduler has never been optimal, and ULE looked like a nice
idea on paper for newer CPUs, but in practice ULE's performance is awful
even when compared with 4BSD, which generally gives acceptable, though
not optimal, performance.
 If the owner of a new installation wants to get passably usable
performance from his new system, he must first perform the tasks
noted above.  The second and third tasks will take *a lot* of extra
time because they must be done under the ULE scheduler.  Then one
must install the new kernel, reboot, do the mergemaster or etcupdate
steps, install the new world, run mergemaster or etcupdate again, and
reboot again.
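 For anyone who has not been through it, that dance looks roughly like
the following (a sketch only, assuming the source tree is already in
/usr/src and a GENERIC kernel; adjust -j and KERNCONF to taste):

    cd /usr/src
    make -j4 buildworld                    # slow, and slower still under ULE
    make -j4 buildkernel KERNCONF=GENERIC
    make installkernel KERNCONF=GENERIC
    shutdown -r now
    # after the reboot, preferably in single-user mode:
    etcupdate -p                           # or: mergemaster -p
    cd /usr/src && make installworld
    etcupdate -B                           # or: mergemaster -iF
    shutdown -r now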
 Two ways of allowing a choice of scheduler are 1) to provide two
GENERIC kernels, e.g., GENERIC.ULE and GENERIC.4BSD, from which one
could choose at boot time, and 2) to compile both schedulers into the
GENERIC kernel and select between them with a loader tunable at boot
time.
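 As things stand today, the only supported way to get 4BSD at all is a
custom kernel config.  A minimal sketch (the config name GENERIC-4BSD is
mine, not something that ships with the system):

    # /usr/src/sys/amd64/conf/GENERIC-4BSD   (hypothetical file)
    include     GENERIC
    ident       GENERIC-4BSD
    nooptions   SCHED_ULE       # drop the default ULE scheduler
    options     SCHED_4BSD      # build the traditional 4BSD scheduler instead

followed by the buildkernel/installkernel steps above with
KERNCONF=GENERIC-4BSD, which is precisely the burden I am complaining
about.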
 The current system is yet another discouragement to upgrading to
a new -RELEASE via a new installation.  Further, this fix to bad
performance by default is not documented anywhere.  How is a user who
is new to FreeBSD to know about it?


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**


Re: Vinum deprecation for FreeBSD 14 - are there any remaining Vinum users?

2021-04-09 Thread Scott Bennett via freebsd-stable
Eugene Grosbein  wrote:

> 07.04.2021 12:49, Scott Bennett via freebsd-stable wrote:
>
> >  At least w.r.t. gvinum's raid5, I can attest that the kernel panics
> > are real.  Before settling on ZFS raidz2 for my largest storage pool, I
> > experimented with gstripe(8), gmirror(8), graid3(8), and graid5(8) (from
> > sysutils/graid5).  All worked reasonably well, except for one operation,
> > namely, "stop".  Most/all such devices cannot actually be stopped because
> > a stopped device does not *stay* stopped.  As soon as the GEOM device
> > node is destroyed, all disks are retasted, their labels, if any, are
> > recognized, and their corresponding device nodes are recreated and placed
> > back on line. :-(  All of this happens too quickly for even a series of
> > commands entered on one line to be able to unload the kernel module for
> > the device node type in question, so there is no practical way to stop
> > such a device once it has been started.
>
> In fact, you can disable re-tasting with sysctl kern.geom.notaste=1,
> stop a GEOM, clear its labels, and re-enable tasting by setting
> kern.geom.notaste=0 back.

 Thank you for this valuable, but undocumented, workaround!  However, it
serves to demonstrate the bugs in gstripe(8), gmirror(8), graid3(8), and
graid5(8), and perhaps a few others: either the commands themselves do not
behave as advertised in their respective man pages, or the man pages fail
to document the commands' actual behavior correctly.
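 For anyone else who needs it, the procedure Eugene describes presumably
amounts to something like the following (gmirror and the provider names
are purely illustrative):

    sysctl kern.geom.notaste=1     # keep GEOM from re-tasting the providers
    gmirror stop gm0               # stop the device
    gmirror clear ada1 ada2        # wipe gmirror metadata from the providers
    sysctl kern.geom.notaste=0     # turn tasting back on

None of that is hinted at in the man pages.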
>
> >  A special note is needed here regarding gcache(8) and graid3(8).  The
> > documentation of gcache parameters for sector size for physical devices
> > and gcache logical devices is very unclear, such that a user must have the
> > device nodes and space on them available to create test cases and do so,
> > whereas a properly documented gcache(8) would obviate the need to set up
> > such experiments.  There is similar lack of clarity in various size
> > specifications for blocks, sectors, records, etc. in many of the man pages
> > for native GEOM commands.
>
> I found gcache(8) very nice at first; it really boosts UFS performance
> provided you have extra RAM to dedicate to its cache.  gcache can be
> stacked with gmirror etc., but I found it guilty of some obscure
> UFS-related panics.  It seems there were races or something.
> No data loss, though, as it is intended to be transparent for writing.

 There are other, also undocumented, problems.  For example, I played with
gcache(8) for a short time as a method of dividing a ZFS pool into two extents
on a drive in order to place a frequently accessed partition between them.  It
worked nicely for a while, but the first time that gcache(8) choked it made a
real mess of the ZFS pool's copy on that drive.  As a result I immediately
abandoned that use of gcache(8).
 gcache(8) uses two poorly defined sysctl values, kern.geom.cache.used_hi
and kern.geom.cache.used_lo.  Its man page shows them with default values, but
neglects to say whether they are enforced limits or merely variables that
report current usage or high and low watermarks.
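 (They can at least be read and set like any other sysctl, e.g.

    sysctl kern.geom.cache.used_hi kern.geom.cache.used_lo

but whether writing them actually enforces anything is exactly the open
question; experimenting seems to be the only way to find out.)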
>
> I was forced to stop using gcache for the sake of stability and it's a shame.
> For example, dump(8) speed-up due to gcache was 2x at least with a big cache
> compared to dump -C32 without gcache.
>
 I used it to make all accesses to a graid3(8) set of partitions work with
64 KB and 32 KB block sizes for UFS2 efficiency on the resulting device.  That
use worked very nicely, but it took some experimentation to figure out how to
do it because the man page is so ambiguous about the gcache command's options
and arguments.
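 For what it's worth, the kind of stacking I mean looked roughly like the
following (device names are illustrative, and the exact semantics of -b
and -s are precisely what the man page leaves unclear):

    graid3 label -v r3data ada1p1 ada2p1 ada3p1      # three-component RAID3 set
    gcache label -b 65536 -s 268435456 cached /dev/raid3/r3data
    newfs -U -b 65536 -f 8192 /dev/cache/cached      # 64 KB blocks, 8 KB frags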
 A similar complaint could be leveled at the man pages for gstripe(8),
graid3(8), and graid5(8) w.r.t. their undocumented definitions of stripe size,
sector size, and block size.  At present, without reading the command and kernel
source code for each or experimenting extensively, it is difficult to understand
what the commands' options and arguments will do and which combinations of their
numerical values can be valid and accepted.


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**


Re: Vinum deprecation for FreeBSD 14 - are there any remaining Vinum users?

2021-04-06 Thread Scott Bennett via freebsd-stable
Ed,
 On Thu, 25 Mar 2021 13:25:44 -0400 Ed Maste  wrote:

>Vinum is a Logical Volume Manager that was introduced in FreeBSD 3.0,
>and for FreeBSD 5 was ported to geom(4) as gvinum. gvinum has had no
>specific development at least as far back as 2010 and it is not clear
>how well it works today. There are open PRs with reports of panics
>upon removing disks, etc. And, it also imposes an ongoing cost as it

 First off, the "port" to geom(4) was incomplete in that gvinum is
somehow not restricted to the geom(4) device nodes presented to it, but
instead always grabs the entire physical device and node to do its own
label processing.
 Second, gvinum is completely incompatible with GPT partitioning
because, regardless of the device nodes given it to use, it always writes
and reads its own label to and from the ends of the physical drives.
That means it overwrites the GPT secondary partition table with its own
labels, which soon causes error/warning messages from the kernel about
a damaged/missing secondary partition table and recommending a recovery
of that damaged/missing partition table.  Doing the recovery then will
overwrite gvinum's labels, which is likely to cause a kernel panic or
worse.
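 (The recovery the kernel keeps recommending is itself a one-liner, which
is exactly how easily gvinum's label gets clobbered; device name
illustrative:

    gpart recover ada1     # rewrites the backup GPT at the end of the disk

and with that, whatever gvinum had stored there is gone.)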
 My memory on gvinum's compatibility with glabel(8) labels is fuzzy
at the present remove, but I seem to recall having encountered problems
there, too.  This is not unique, unfortunately, to gvinum(8).  For
example, using glabel(8) to label swap partitions, as well as
bsdlabel(8)ed partitions, can lead to many unexpected problems.  Such
inconsistencies should be researched and fixed.
 GPT labels allow a partition type of "freebsd-vinum".  I did not try
to play with that one, but I suspect that it would not work correctly
because gvinum is somehow not limited to the GEOM device node for a
partition.  However, if you decide to keep gvinum(8) for some reason,
then this matter should be checked out in detail and its inconsistencies
fixed.
 At least w.r.t. gvinum's raid5, I can attest that the kernel panics
are real.  Before settling on ZFS raidz2 for my largest storage pool, I
experimented with gstripe(8), gmirror(8), graid3(8), and graid5(8) (from
sysutils/graid5).  All worked reasonably well, except for one operation,
namely, "stop".  Most/all such devices cannot actually be stopped because
a stopped device does not *stay* stopped.  As soon as the GEOM device
node is destroyed, all disks are retasted, their labels, if any, are
recognized, and their corresponding device nodes are recreated and placed
back on line. :-(  All of this happens too quickly for even a series of
commands entered on one line to be able to unload the kernel module for
the device node type in question, so there is no practical way to stop
such a device once it has been started.  Because gvinum's raid5 was
always unbearably slow and also subject to kernel panics, I soon excluded
it from further consideration.  GEOM is one of the brightest gems of
modern FreeBSD design.  GEOM's native functions should not be corrupted
or ignored as a result of a botched attempt to "modernize" an old
monstrosity like gvinum, which was originally written for a system that
lacked GEOM and has not fit well into a modern system that *has* GEOM,
not to mention GPT partitioning.
  All of these specific, native GEOM second-level devices otherwise
work pretty much as advertised.  graid5(8), however, was recently marked
as deprecated, which is a real shame.  I would vote for finishing its man
page, which is very incomplete, and for adding a subcommand to do some
sort of scrub procedure like many hardware RAID5 controllers do.  There
are perfectly valid reasons to use these devices in some situations
instead of ZFS, e.g., better performance for temporary/disposable data,
especially for situations involving millions of very short files like
ccache(1) directory trees, portmaster(8)'s $WRKDIRPREFIX, and likely
others.  gvinum(8) appears to have been broken in several ways since
FreeBSD 5.0, is unmaintained as you wrote, and should be deprecated and
eliminated for the reasons given above.  The simple GEOM devices provide
much the same flexibility that gvinum was intended to provide without
the need to learn gvinum's peculiar configuration method.  Once one
understands how GEOM devices work and can be stacked, they are generally
very simple to use in contrast to gvinum, which remains broken in
multiple ways.

>must be updated when other work is done (such as the recent MAXPHYS
>work). I suspect that by now all users have migrated to either
>graid(8) or ZFS.

 graid(8) is not always a good option.  If you read its man page,
you will see that RAID5 is usually only supported as read-only devices,
where it is supported at all.  This can be helpful for recovering data
from a proprietary RAID device, but is not generally useful for actively
used and updated data.  IOW, it can be helpful in a potentially large
number of situations for some users, especially for

Re: swap space issues

2020-07-12 Thread Scott Bennett via freebsd-stable
Don Wilde  wrote:

>
> On 7/11/20 11:28 PM, Scott Bennett via freebsd-stable wrote:
> >   I have read this entire thread to date with growing dismay, and I
> > thank Donald Wilde for reporting his ongoing troubles, although they
> > spoil my hopes that the kernel's memory management bugs that first became
> > apparent in 11.2-RELEASE (and -STABLE around the same time) were not
> > propagated into 12.x.  A recent update to stable/12 source tree made it
> > finally possible for me to build 12.1-STABLE under 11.4-PRERELEASE, and I
> > was just about to install the upgrade when this thread appeared.
> Spoiler alert. Since I gave up on Synth, I haven't had a single swap 
> issue. It does appear to be one particular port that drove it nuts 
> (apparently, one of the 'Google performance' bits, with a 
> mismatched-brackets problem). I have rebuilt the machine several times, 
> but that's more for my sense of tidiness than anything.
>
> I've got a little Crystal script that walks the installed packages and 
> ports and updates them with system() calls.
> The machine is very slow, but it's not swapping at all.

 That's good.  I use portmaster, but not often at present because a
"portmaster -a" run can only be done two or three times per boot before real
memory is locked down to the extent that the system is no longer functional
(i.e., even a scrub of ZFS pools comes to a halt in mid scrub due to lack of a
sufficient supply of free page frames).
 The build procedures of certain ports consistently get killed by the OOM
killer, along with much collateral damage.  I've noticed that lang/golang and
lang/rust are prime examples now, although both used to build without problems.
>
> It is quite usable now with 12-STABLE.

 I don't see any good reason to go through the hassle and lost time of an
upgrade across a major release boundary if I still won't have a production OS
afterward.  I'm already dealing with a graphics stack rendered unsafe to use by
the ongoing churn in X11 code.  (See PR #247441, kindly filed for me by Pau
Amma.)
> >
> >   On Fri, 26 Jun 2020 03:55:04 -0700 : Donald Wilde 
> > wrote:
> >
> >> On 6/26/20, Peter Jeremy  wrote:
> >>>
> [snip]
> >>> I strongly suggest you don't have more than one swap device on spinning
> >>> rust - the VM system will stripe I/O across the available devices and
> >>> that will give particularly poor results when it has to seek between the
> >>> partitions.
> >   True.  The only reason I can think of to use more than one swapping/
> > paging area on the same device for the same OS instance is for emergencies
> > or highly unusual, temporary situations in which more space is needed until
> > those situations conclude. and even in such situations, if the space can be
> > found on another device, it should be placed there.  Interleaving of swap
> > space across multiple devices is intended as a performance enhancement
> > akin to striping (a.k.a. RAID0), although the virtual memory isn't
> > necessarily always actually striped across those devices.  Adding a paging
> > area on the same device as an existing one is an abhorrent situation, as
> > Peter Jeremy noted, and it should be eliminated via swapoff(8) as soon as
> > the extraordinary situation has passed.  N.B. the GENERIC kernel sets a
> > limit of four swap devices, although it can be rebuilt with a different
> > limit.
> That's good data, Scott, thanks! The only reason I got into that 
> situation of trying to add another swap device was that it was crashing 
> with out-of-swap messages.

 I don't recall you posting those messages, but it sounds like exactly the
*temporary* situation in which an inappropriately placed paging area can be
added and used long enough to get you out of a bind without a reboot, even
though performance will probably suffer until you have removed it again.
Poor performance is usually preferable to no performance if it is only
temporary.
 One cautionary note in such situations, though, applies to remote paging
areas.  Sparse files allocated on the remote system should not be used as
paging areas.  For example, I discovered the hard way (i.e., the problem was
not documented) that SunOS would crash if a sparse file via NFS were added as
a paging area and the SunOS system tried to write a page out to an unallocated
region of the file, which was essentially all of the file at first.
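 (One way to avoid that particular trap, if a file-backed or remote paging
area is ever unavoidable, is to preallocate every block instead of creating
the file sparse; illustrative only:

    # sparse -- do NOT do this:  truncate -s 4g /net/server/swapfile
    dd if=/dev/zero of=/net/server/swapfile bs=1m count=4096   # writes real blocks

so that every page-out lands on storage that has already been allocated.)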

> >> My intent is to make this machine function -- getting the bear
> >> dancing. How deftly she dances is less important than that she dances
> >> at all. My for-real boxen will have real HP and real cores and RAM.
>

Re: swap space issues

2020-07-11 Thread Scott Bennett via freebsd-stable
 I have read this entire thread to date with growing dismay, and I
thank Donald Wilde for reporting his ongoing troubles, although they
spoil my hopes that the kernel's memory management bugs that first became
apparent in 11.2-RELEASE (and -STABLE around the same time) were not
propagated into 12.x.  A recent update to the stable/12 source tree made it
finally possible for me to build 12.1-STABLE under 11.4-PRERELEASE, and I
was just about to install the upgrade when this thread appeared.

 On Fri, 26 Jun 2020 03:55:04 -0700 : Donald Wilde 
wrote:

>On 6/26/20, Peter Jeremy  wrote:
>> On 2020-Jun-25 11:30:31 -0700, Donald Wilde  wrote:
>>>Here's 'pstat -s' on the i3 (which registers as cpu HAMMER):
>>>
>>>Device          1K-blocks     Used    Avail Capacity
>>>/dev/ada0s1b     33554432        0 33554432     0%
>>>/dev/ada0s1d     33554432        0 33554432     0%
>>>Total            67108864        0 67108864     0%
>>
>> I strongly suggest you don't have more than one swap device on spinning
>> rust - the VM system will stripe I/O across the available devices and
>> that will give particularly poor results when it has to seek between the
>> partitions.

 True.  The only reason I can think of to use more than one swapping/
paging area on the same device for the same OS instance is for emergencies
or highly unusual, temporary situations in which more space is needed until
those situations conclude, and even in such situations, if the space can be
found on another device, it should be placed there.  Interleaving of swap
space across multiple devices is intended as a performance enhancement
akin to striping (a.k.a. RAID0), although the virtual memory isn't
necessarily always actually striped across those devices.  Adding a paging
area on the same device as an existing one is an abhorrent situation, as
Peter Jeremy noted, and it should be eliminated via swapoff(8) as soon as
the extraordinary situation has passed.  N.B. the GENERIC kernel sets a
limit of four swap devices, although it can be rebuilt with a different
limit.
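 (For completeness, the temporary-emergency case looks something like the
following, to be torn down again with swapoff(8) as soon as the crunch has
passed; the paths and the md unit number are illustrative:

    dd if=/dev/zero of=/usr/swap0 bs=1m count=4096    # 4 GB, fully preallocated
    chmod 0600 /usr/swap0
    mdconfig -a -t vnode -f /usr/swap0 -u 99
    swapon /dev/md99
    # ...later, once the pressure is off...
    swapoff /dev/md99
    mdconfig -d -u 99
    rm /usr/swap0

Putting the file on a different spindle than the existing swap partition
avoids the seek penalty Peter described.)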
>
>My intent is to make this machine function -- getting the bear
>dancing. How deftly she dances is less important than that she dances
>at all. My for-real boxen will have real HP and real cores and RAM.
>
>>
>> Also, you can't actually use 64GB swap with 4GB RAM.  If you look back
>> through your boot messages, I expect you'll find messages like:
>> warning: total configured swap (524288 pages) exceeds maximum recommended
>> amount (498848 pages).
>> warning: increase kern.maxswzone or reduce amount of swap.

 Also true.  Unfortunately, no guidance whatsoever is provided to advise
system administrators who need more space as to how to increase the relevant
table sizes and limits.  However, that is a documentation bug, not a code
bug.
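 (For the record, the knob the warning names is a loader tunable, so it
has to go into /boot/loader.conf rather than /etc/sysctl.conf; the value
below is purely illustrative and needs tuning for the machine:

    kern.maxswzone="67108864"   # bytes of kernel memory for swap metadata

which is the sort of hint the warning itself ought to give.)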
>
>Yes, as I posted, those were part of the failure stream from the synth
>program. When I had kern.maxswzone increased, it got through boot
>without complaining.
>
>> or maybe:
>> WARNING: reducing swap size to maximum of MB per unit
>
>The warnings were there, in the as-it-failed complaints.
>
>> The absolute limit on swap space is vm.swap_maxpages pages but the
>> realistic limit is about half that.  By default the realistic limit is
>> about 4 x RAM (on 64-bit architectures), but this can be adjusted via
>> kern.maxswzone (which defines the #bytes of RAM to allocate to swzone
>> structures - the actual space allocated is vm.swzone).
>>
>> As a further piece of arcana, vm.pageout_oom_seq is a count that controls
>> the number of passes before the pageout daemon gives up and starts killing
>> processes when it can't free up enough RAM.  "out of swap space" messages
>> generally mean that this number is too low, rather than there being a
>> shortage of swap - particularly if your swap device is rather slow.
>>
>Thanks, Peter!

 A second round of thanks to Peter Jeremy for pointing out this sysctl
variable (vm.pageout_oom_seq), although thus far I have yet to see that it is
actually effective in working around the memory management bugs.  I have added
the following lines to /etc/sysctl.conf.

# Because FreeBSD 11.{2,3,4} tie up page frames unnecessarily, set value high
#vm.pageout_wakeup_thresh=14124 # Default value
vm.pageout_wakeup_thresh=112640 # 410 MB

Between the two changes, the pagedaemon *seems* to have stopped killing
important processes (or others) for now, which is a huge improvement and
relief.  Too bad FreeBSD needs the changes to be made to keep the system
usable somewhat longer.
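 (The other of the two changes was the vm.pageout_oom_seq knob Peter named,
set via something like the following in /etc/sysctl.conf; the value here is
illustrative rather than the one I settled on:

    vm.pageout_oom_seq=120   # default is 12; more passes before the OOM killer fires

together with the vm.pageout_wakeup_thresh line shown above.)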
 My system has 8 GB of real memory.  The kernel apparently refuses to swap
in *any* process, even one as small as /bin/sh, when the free page frame list
has less than ~410 MB of page frames on it.  Setting vm.pageout_wakeup_thresh
to at least 410 MB *seems* to help reduce the number of times a process gets
marked as swapped out when the system has been under some form of memory
pressure, but it doesn't stop it from happening w