Re: The state of amdgpu on DragonFly

2022-06-13 Thread Matthew Dillon
Our GPU goals are mostly limited to modesetting, we just don't have the dev
resources to achieve solid accel support.  Of course, if someone wanted to
work on accel support that would be great!

-Matt


Hammer2 related fixes in-tree, plus more

2022-04-30 Thread Matthew Dillon
Both master and release now have fixes for a fairly serious but specialized
hammer2 bug.  Updates are recommended if you mount more than one PFS from
the same block device, or if you regularly run things which generate a lot
of write activity to the filesystem.

The hammer2 bug is in the bulkfree code.  When more than one PFS is mounted
from the same device (which is fairly atypical for most people), there is a
window during the bulkfree where blocks might be marked as completely free
which are actually not free.  If reallocated prior to the completion of the
bulkfree, filesystem corruption can be incurred.

The fix has been tested thoroughly, we basically ran an unpack and the grok
indexer on around 20 million files and periodically ran a bulkfree during
the operation over two days to ensure that the code fixes work properly.

--

A second bug in the kernel was also fixed, this one related to systems that
process many (typically in the millions) of files and directories that are
in relatively deep directory trees.   The vnode recycler could get into a
situation where it would not be able to make any progress cleaning up
inactive directory vnodes due to dangling namecache references cached by
the system.  A sufficient number of vnodes in this state could fill-up the
inactive vnode list and prevent the system from being able to recycle
vnodes and thus also prevent it from being able to allocate new vnodes.

--

We will roll a sub-release in a week or two.  The fixes are both in the
release tree and in master so anyone who wishes to update can do so from
system sources with a few simple commands as outlined in our upgrading page:

https://www.dragonflybsd.org/docs/handbook/Upgrading/
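From memory, the from-source steps reduce to roughly the following; the
target names here are assumptions, so follow the handbook page above rather
than this sketch.

```shell
# Dry-run sketch of a from-source update (target names from memory; the
# handbook Upgrading page is authoritative).  Run the echoed commands as
# root inside /usr/src on a real system.
for cmd in "git pull" \
           "make buildworld" \
           "make buildkernel" \
           "make installkernel" \
           "make installworld" \
           "make upgrade"; do
    echo "$cmd"
done
```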

-Matt


Bug fix to ipfw in tree, bug fix to hammer2 bulkfree when many inodes are present

2022-03-12 Thread Matthew Dillon
There was a bug in ipfw related to adding IP addresses and networks to a
table.  When adding a mixed set of networks and hosts, where the hosts
occur after the networks, ipfw enters the hosts incorrectly by retaining
the network mask from the most recent network ip added on the same command
line:

ipfw table 1 add 10.0.0.0/8 192.0.2.1 # 2^24 + 1 addresses

A fix has been pushed to both -release and master.  If it affects you it
can be worked around either by upgrading, or by ensuring that only one IP
or network is added to the table for each table add command.
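The workaround can be scripted; this dry run uses the table number and
addresses from the example above.

```shell
# Workaround sketch: issue one "table add" per ipfw invocation so a host
# entry never inherits the mask of a network added earlier on the same
# command line.  Echoed as a dry run; drop "echo" on an affected system.
for entry in 10.0.0.0/8 192.0.2.1; do
    echo ipfw table 1 add "$entry"
done
```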

--

In other news, hammer2 bulkfree operations on filesystems with a very large
number of inodes (e.g. like a hundred million inodes) could exhaust kernel
memory allocating side-structures during deep recursions.   This bug has
finally been fixed!  Prior fixes only made some headway but did not
completely fix the bug.  The fix I just pushed to master fixes the issue
entirely.

The symptom is that the machine panics, typically during an overnight
automatic hammer2 bulkfree operation; on lower-memory machines the machine
may instead lock up during the bulkfree.

This latest fix is currently only in master for testing and will be
cherry-picked to the release branch in a month or so.

-Matt


Re: The DFly website is down.

2022-01-07 Thread Matthew Dillon
Yah, sorry about that folks.  Down overnight on 2 successive days.  It
turned out to be a kernel memory exhaustion bug on the blade that is
routing the dfly network.  It also handles one of our backups and there was
a bug where the hammer2 bulkfree scan on the backups (around 64 million
inodes) ran out of kernel memory due to a depth deferral mechanism in the
recursive radix tree scan.  Should all be fixed now.   I'll have to start
thinking of ways to clean up H2 radix trees that get fragmented due to
constant file deletions and creations.

-Matt


Re: lookupdotdot failing again

2021-09-22 Thread Matthew Dillon
I'll go through the namecache changes between the two versions and see if I
can find something obvious.

-Matt


tmux dport broken, will be fixed asap

2021-09-01 Thread Matthew Dillon
The tmux dport is broken in master (cpu-bound loop on startup).  We have a
fix for it and will get an updated binary package in-place asap.

-Matt


Re: gcc8 deprecated but dependencies exist

2021-09-01 Thread Matthew Dillon
Ok.  Antonio will look into it.  We do override some of the GCC
specifications in the FreeBSD ports tree when translating them to DPorts.
You can ignore the warning for now (as long as the package works).  It
isn't quite as bad as the message says.

-Matt

On Tue, Aug 31, 2021 at 8:29 PM Phansi  wrote:

> Upgraded dragonfly setup recently. Got this message:
>
> -
> Message from gcc8-8.5.0_1:
>
> --
> ===>   NOTICE:
>
> This port is deprecated; you may wish to reconsider installing it:
>
> Unsupported by upstream. Use GCC 10 or newer instead..
> -
>
>
> Cannot seem to remove gcc8 as blas and lapack depend on it.
>
> Did remove all the three and then found that (re)installing blas/lapack
> still requires gcc8.
>
> Any suggestions? I do not use dports.
>
> --
> cheers
> phansi
>
>


Re: [Code Bounty] NVMM hypervisor landed in DragonFly

2021-08-12 Thread Matthew Dillon
Yes, what we decided to do for this (and probably all the bounties that get
completed... Tuxillo is vetting the VALGRIND work next) is that they will
be paid out of the DragonFly paypal account, and then the individual
contributors to the bounty can pay into the DFly paypal account at their
leisure.

-Matt


Re: lookupdotdot failing again

2021-07-07 Thread Matthew Dillon
Well, I'm a bit at a loss at the moment.  Try exporting the base filesystem
with NFS instead of exporting the nullfs mount.  See if that works more
reliably.

-Matt


Re: Hammer errors.

2021-07-01 Thread Matthew Dillon
Upgrade to 6.0 for sure, as it fixes at least one bug in HAMMER2; that
eliminates one possibility.   RAM is a possibility, though unlikely.  If
you are overclocking, turn off the overclocking.  An overclocked CPU can
introduce corruption more easily than overclocked RAM can.  And check the
dmesg for any NVMe related errors.

-Matt


Re: Hammer errors.

2021-06-30 Thread Matthew Dillon
It looks like several different blocks failed a CRC test in your logs.  It
would make sense to try to track down exactly where.  If you want to dive
the filesystem meta-data you can dump it with full CRC tests using:

hammer2 -vv show /dev/serno/S59ANMFNB34055E-1.s1d  > (save to a file not on
the filesystem)

And then look for 'failed)' lines in the output and track the inodes back
to see which files are affected.  It's a bit roundabout and you have to get
familiar with the meta-data format, but that gives the most comprehensive
results.   The output file is typically a few gigabytes (depending on how
big the filesystem is).   For example, I wound up with a single data block
error in a mail file on one of my systems, easily rectified by copying away
the file and then deleting it.  I usually dump the output to a file and
then run less on it, then search for failed crc checks.

  data.106 00051676000f 206a/16 vol=0 mir=00149dc6 mod=02572acb lfcnt=0
           (xxhash64 32:b65e740a8f5ce753/799af250bfaf8651 failed)
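Scanning a saved dump for those lines is a one-liner; the sample input
below is fabricated just to show the pattern.

```shell
# Fabricated miniature of "hammer2 -vv show" output; on a real system,
# grep the multi-gigabyte dump file you saved instead.
cat > /tmp/h2dump.sample <<'EOF'
inode.2  000516760001 (xxhash64 32:aaaa/aaaa)
data.106 00051676000f (xxhash64 32:b65e740a8f5ce753/799af250bfaf8651 failed)
EOF
grep 'failed)' /tmp/h2dump.sample    # prints only the failing block line
```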

A 'quick' way to try to locate problems is to use tar, something like
this.  However, tar exits when it encounters the first error so that won't
find everything, and if the problem is in a directory block that can
complicate matters.

tar --one-file-system -cvf /dev/null /

-Matt


Re: Thousands of "lookupdotdot failed" messages - should I be worried?

2021-06-19 Thread Matthew Dillon
Yah, keep monitoring it.  I kinda suspect what is happening is that one or
more processes on the clients are winding up CD'd into NFS directories that
then get renamed or deleted/replaced, or something like that.  But it could
easily also have been a hash collision in the file handle calculation the
server makes.

-Matt


Re: Thousands of "lookupdotdot failed" messages - should I be worried?

2021-06-16 Thread Matthew Dillon
Oh, I forgot to mention... this patch changes the FSID for null mounts, so
any clients should umount, then reboot the server, then clients can
remount.  Or reboot the clients after rebooting the server.

-Matt


Re: Thousands of "lookupdotdot failed" messages - should I be worried?

2021-06-16 Thread Matthew Dillon
Ok, you could try this kernel patch and see if it helps.  If you know how
to rebuild the system from sources.  The client-side I/O errors would go
hand-in-hand with the dotdot stuff, but only because this is a NFS mount.
The server-side should not be having any I/O errors, and no filesystem data
is in error or corrupted or anything like that.

http://apollo.backplane.com/DFlyMisc/nullfs01.patch

Other questions I have are

(1) Just how many NULLFS filesystems are involved?
(2) Are you making multiple NFS mounts based on the same source path?
(3) And finally, what is the underlying filesystem type that the nullfs is
taking as its source?

What I try to do in this patch is construct a FSID based on the nullfs's
destination path instead of its source path, to try to reduce conflicts
there.  Another possible problem is that the nullfs's underlying filesystem
has a variable fsid due to not being a storage-based filesystem.  i.e. if
the underlying filesystem is a NFS or TMPFS filesystem, for example.

The only other thing that I can think of that could cause dotdot lookup
failures is if you rename a directory on one client and try to access it
from another, or rename a directory on the server and try to access it from
a client that had already cached it.  Directories are kinda finicky in NFS.

-Matt


Re: Thousands of "lookupdotdot failed" messages - should I be worried?

2021-06-16 Thread Matthew Dillon
I think this might be an issue with the filesystem id for the NULLFS
exports changing on reboot (or remount).  I thought I had figured out a
solution for that but apparently not.  I'll have to think about this a bit.

-Matt


Re: Thousands of "lookupdotdot failed" messages - should I be worried?

2021-06-16 Thread Matthew Dillon
If you are not seeing any actual I/O errors in the dmesg output, then there
is probably no issue with the filesystem.

The dotdot warnings might be some edge-case being caused by the null-mounts
(because the null-mount has a mount point, but it's being mounted on top of
a sub-directory in an underlying filesystem).  If you can track down the
operation that is causing the message, it might just wind up being a patch
to the kernel to get rid of the console warning for that particular case.
If you can find a simple configuration that I can throw onto a test box to
get the same error, I can track down the issue and fix it.

-Matt


Re: Improving I/O efficiency and resilience

2021-06-16 Thread Matthew Dillon
There are a lot of potential failure points; it's a long chain of software
and hardware from the application's behavior all the way down to the
storage device's behavior in a failure.  Failure paths tend not to be
well-tested.  Reliability guarantees are... kinda just a pile of nonsense
really, there are just too many moving parts so all systems in the world
rely on stability first (i.e. not crashing, not failing in the first
place).  Redundancy mechanisms improve matters up to a point but they also
introduce further complexities.

This should be readily apparent to everyone since nearly every service in
existence sees regular glitches.  Be it Google (GMail and Google Docs, for
example, glitch-out all the time), brokerage, bank, ATMs, whatever.
Fail-over subsystems can twist themselves into knots when just the wrong
sequence of events occurs.  There is a limit to just how reliable one can
make something.

For an application, ultimately the best guarantee is to have an
application-specific remote log that can be replayed to restore corrupted
state.  That is, to not entirely rely on localized fail-over, storage, or
other redundancy mechanisms.  One then relies on the near impossibility of
the dedicated remote log machine crashing and burning at exactly the same
time the primary servers crash and burn.

For HAMMER2, well... our failure paths are not well tested.  Like with most
other filesystems.  Usually I/O failures are simulated for testing but
actual storage system failures can have different false-flag behaviors.
What HAMMER2 does is flush in two stages.  In the first stage it
asynchronously writes all dirty blocks except the volume header (block copy
on write filesystem so writing dirty blocks does not modify the
originals).  Then it waits for those asynchronous writes to complete.  Then
it issues a device flush.  And finally it writes out an updated volume
header.  Any system crash occurring prior to the writing out of the updated
volume header simply restores the filesystem to its pre-flush state upon
reboot because the old volume header is not directly or indirectly pointing
to any of the new blocks.
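The flush ordering can be mimicked with a userland analogue (this is not
HAMMER2 code, just the same publish-last idea applied to a file):

```shell
# Publish-last sketch: write and flush the new data first, and update the
# top-level pointer (the volume header analogue) only afterwards.  A
# crash before the final rename leaves the old state fully intact.
cd "$(mktemp -d)"
echo "old state" > current
echo "new state" > staging    # stage 1: write new blocks aside
sync                          # stage 2: wait for writes / device flush
mv staging current            # stage 3: publish the new "header"
cat current                   # prints: new state
```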

And for DFly, an async block write failure leaves the buffer marked dirty
so the filesystem data and meta-data state remains consistent on the live
system (even if it cannot be flushed).  This is a choice taken from a list
of bad choices, because leaving a block dirty means that dirty blocks can
build-up in ram until you run out of ram.  But it is better than the
alternative (presenting stale data to a filesystem and/or to an application
which then causes a chain-reaction of corruption on a running system).

But realistically, even the most sophisticated fault-tolerant systems hit
situations which require manual intervention.  There are just too many
moving parts in a modern system that depend on a multitude of behaviors
that are specified by standards but not necessarily followed at every
stage.  So, ultimately, the best protection remains having
application-level redundancy via a replayable remote log (verses kernel,
filesystem, or block-level redundancy).  Other forms of redundancy can
reduce error rates but cannot eliminate them, and ultimately reach a point
where the new potential failure conditions introduced by the added
sophistication exceed the failure conditions that are being protected
against.

Also, redundancies can introduce points of attack.  If you want to crater
the performance of a competitor through hacking, the redundancy subsystems
offer a tempting target.

Almost universally, even commercial systems rely on stability and the added
redundancies are only able to deal with a subset of 'common' problems on a
live system.  And then they fall-back to replaying logs to restore
otherwise unrecoverably corrupted state.

-Matt


Re: Help to configure automatic fan speed and cpu frequency on ThinkPad X260

2021-04-25 Thread Matthew Dillon
If it is GPU related we just might not have a working solution to the power
management.  You could try adjusting the Xorg configuration to use the
"modesetting" driver with acceleration disabled but it's a long-shot.  It
kinda feels like the GPU is defaulting to a consumption mode that is
forcing the fans to run.

-Matt


Re: crc errors and installation troubles

2021-01-15 Thread Matthew Dillon
Well, it's clearly getting I/O errors trying to access the drive.  That
could point to several possibilities:

* Bad SATA port.  Try connecting the cable to a different SATA port on the
mobo

* Bad drive.  But you already changed the drive out, so maybe not a bad
drive.

* Bad power supply.  The power supply could be going wonky and causing the
drive to reset.

* Bad SATA cable.  Sometimes old SATA cables can't handle the higher
bandwidths of newer motherboards and drives.  I have piles of cables at
home and over the last 10 years I have had to throw half of them away
because they could only handle SATA I or SATA II speeds.

-Matt


Re: TeX Live 2021 status on DragonFlyBSD

2021-01-11 Thread Matthew Dillon
Thanks Nelson, this looks great!  I will look into generating a dragonfly
ports build (build- from sources) for it.

-Matt


Re: Booting issues

2021-01-11 Thread Matthew Dillon
Looks like the undo area got corrupted.  The filesystem will probably
refuse to mount.  You may have to recover the contents to a new filesystem
on another storage device using the 'hammer recover' utility.  Since this
is your root mount, you may be able to boot into the rescue ramdisk
(usually boot option 'r') to get a very basic shell and then work through
the problem from there.  Sometimes it is easier to boot a DFly image from a
usb stick and login as 'root' to work the problem from there.

-Matt


heads up - vkernels will be broken on master for a while

2021-01-06 Thread Matthew Dillon
It turns out that some of the pmap work last year is making MAP_VPAGETABLE
memory maps basically not work properly.  The pmap work was important and
it is undesirable to back any of it out, so what we will have to do instead
is take out MAP_VPAGETABLE support which will break vkernel support in
master until we can get hardware virtualization operational again.

This does not affect running dragonfly in a virtualized environment, it
just affects running the 'vkernel's under dragonfly.  I hope to bring
vkernel support back, but we will have to do it under the umbrella of
adding more reliable HVM (hardware virtualization) support to DragonFly.

-Matt


Re: Binary packages available for "staged" branch of DPorts?

2020-12-27 Thread Matthew Dillon
Antonio (tuxillo) and I build 'staged' regularly, but the staged repos are
always in a state of flux and I don't recommend depending on it.  That
said, I often have an external URL available for binary repo access to test
bulks, and at the moment it is operational and on staged:

http://apollo.backplane.com/Ripper/live_packages/

Your download bandwidth from this site is going to be pretty horrible
though, as it is in my home.  Maybe Antonio can set up a semi-official
staged HTTP link from the colo.

Again, these links are always in flux, so don't depend on it :-)

Antonio believes he can get the next sync to master done by the end of the
year.  It will have the chromium fix.

-Matt


Re: ifconfig seg fault

2020-11-07 Thread Matthew Dillon
It sounds like the kernel and the world are out of sync with each other.
e.g. old kernel but new world, or new kernel but old world.

-Matt


Re: hammer2: ls reports "No such file or directory"

2020-10-12 Thread Matthew Dillon
Theoretically as long as the RAID system properly handles the FLUSH
command, it should be ok to run it with a volatile write cache.  But if it
just accepts the FLUSH command without flushing the write cache, then bad
things can happen.

That error is a CRC failure.  Sorry, I wasn't clear before.  CHECK FAIL is
typically a CRC failure.  The CRCs don't match.  In terms of what you can
do... usually its best to backup and reformat.  In this case it looks like
a single inode #49686 is messed up.  You can try destroying that inode with
'hammer2 -s  destroy-inum 49686', then re-run the bulkfree and see
if there are any more issues.

Generally speaking, if the CHECK FAIL is at an inode you can 'hammer2
destroy-inum ...' the inode and then 'hammer2 destroy ...' any directory
entries that were pointing to that inode, but if it is at an indirect block
then the filesystem is probably really messed up and the only real choice
is to backup, reformat, and restore.

Those hammer2 directives are extremely dangerous, I recommend making a full
backup before messing around any further.

-Matt


Re: hammer2: ls reports "No such file or directory"

2020-10-12 Thread Matthew Dillon
Generally speaking this error occurs if a directory entry is present but
the related inode cannot be found.  You can use a hammer2 directive to
destroy the directory entry to clean it up.  But before you do so you want
to check the media for CHECK FAIL errors.

The easiest way to do this is to just read off the entire directory
structure with tar, e.g. 'tar cf /dev/null filesystem' and then check the
dmesg output for errors.  'dmesg | fgrep CHECK'.  Something like that.

If the filesystem appears clean other than the disconnected directory
entry, then you can use 'hammer2 destroy filename' to destroy the directory
entry.  Be very careful when doing that.

If the filesystem has other problems, such as CRC errors, other CHECK
errors, etc then it is best to make a full backup and reformat.

Also make sure that bulkfree runs don't have errors.  'hammer2 bulkfree
...' and then check dmesg output as well.
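Put together, the check might look like this dry run (the mount point is
hypothetical; drop the echoes on a real system and inspect dmesg after
each step):

```shell
# Dry-run sketch of the media check: read every file to force CRC
# verification, run a bulkfree, and grep the kernel log each time.
echo 'tar -cf /dev/null /pfs/usr'     # hypothetical mount point
echo 'dmesg | fgrep CHECK'
echo 'hammer2 bulkfree /pfs/usr'
echo 'dmesg | fgrep CHECK'
```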

--

In terms of how a disconnected inode can happen.  It has become more rare
but it might still be possible if a power failure or panic occurs during
heavy filesystem activity.  It shouldn't be possible for CRC errors to
occur unless the media itself corrupted the data.

-Matt


security update to ftpd now in master and release.

2020-09-16 Thread Matthew Dillon
ftpd has received a fairly significant security fix, and updating it to the
latest version on the master or release branch is recommended if you use
it.
That said, nobody should really be using either ftpd or telnetd any more
these days, neither is turned on by default, and we are contemplating
removing both from base entirely.

-Matt


Re:

2020-09-15 Thread Matthew Dillon
I'm fairly sure whatever is causing it is related to the VM and not the
filesystem.  But what, I don't know.  Maybe something related to the
storage device implementation on the VM.

-Matt


Re:

2020-09-14 Thread Matthew Dillon
It looks like an inode became corrupt.  How this happened I don't know, but
probably the easiest solution in this particular case is to use the
'hammer2' utility to destroy the inode and to delete the bad directory
entry (man hammer2), then run a few hammer2 bulkfree passes and see if that
cleans it up.

Usually when a H2 filesystem becomes corrupt it points to some other issue
in the system.  This being a VM, there could be any number of potential
issues causing the corruption but I don't have any ideas as to what it was
in this case.  It is usually best to copy the data off and reformat after
such events.

-Matt


vi or X updates utilize screen switching - how to disable

2020-09-10 Thread Matthew Dillon
Recent ports merges or the recent vi update (not sure which) turned on
screen switching by default.  This is where you edit a file with vi (or
other editors) and, when you quit, the xterm's contents go back to what
they were before vi was run.

If you don't like this operation (and frankly, I really really dislike it
myself), the simple solution is to add an X resource for xterm.  Create a
.Xresources file with this in it:

xterm*titeInhibit:  true

And install it with xrdb in your .xinitrc:

xrdb -merge .Xresources

I'm not sure about other X terminal apps but I presume they have a similar
feature.

-Matt


Re: git: calendar(1): Rewrite to support Chinese & Julian calendars

2020-09-07 Thread Matthew Dillon
Wow, those are pretty serious calculations!  Very cool!  A really nice
update to the utility.

-Matt


Re: may I suggest two new symlinks: newfs_ufs for newfs and newfs_hammer1 for newfs_hammer ?

2020-08-12 Thread Matthew Dillon
For now I'd rather not.  It would certainly be less ambiguous but there is
a lot of history there and, again, potential backwards portability issues.

One thing that would be nice would be for 'newfs' to look at the filesystem
type in the disklabel for the specified device (if possible) and exec the
correct newfs_* program based on that type.  Another big problem, though,
is that the options and other arguments are different for each newfs_*
variant.

-Matt


Re: State of IPSEC

2020-06-23 Thread Matthew Dillon
IPSEC is gone.  Most people use explicit VPNs these days (openvpn works
with DragonFly quite well, for example).

-Matt


Re: VPN options

2020-06-23 Thread Matthew Dillon
I personally use openvpn to good effect.  Getting the keys set up is rather
annoying and it takes a bit of time to get the network configuration
working properly, but once it is operational it works perfectly.

I usually pre-create a tap interface and configure openvpn to use that.

-Matt


Re: pf table size limit

2020-06-23 Thread Matthew Dillon
There is a table-entries limit specified, you can see current settings with
'pfctl -s all'.  You can adjust the limits in the /etc/pf.conf file
containing the rules with a line like this near the top:

set limit table-entries 10
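In context, a minimal pf.conf sketch might look like this (the table name
and limit value are made up; the set limit line belongs in the options
section near the top, before the rules):

```
# /etc/pf.conf sketch -- raise the table limit before declaring tables
set limit table-entries 200000
table <badhosts> persist
block in quick from <badhosts>
```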

-Matt


Re: Wrecked kernel

2020-06-18 Thread Matthew Dillon
It really sounds like a lot of random stuff is missing and you should
reinstall.  If the machine's hardware is not stable then that is the main
issue that needs to be addressed.

-Matt

On Thu, Jun 18, 2020 at 5:06 PM Jonathan Engwall <
engwalljonathanther...@gmail.com> wrote:

> Static libraries disappear for no reason
> ldconfig -i/usr/local/lib recovers them, though it must be run repeatedly.
> After that I wrenched X in semi working order, without a fully functioning
> window manager. Screen and mouse check!
> It is enough work to make me reinstall.
> How is the 5.8.1 installation process, honestly?
> Jonathan Engwall
>
> On Thu, Jun 18, 2020, 12:00 PM  wrote:
>
>> Send Users mailing list submissions to
>> users@dragonflybsd.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://lists.dragonflybsd.org/mailman/listinfo/users
>> or, via email, send a message with subject or body 'help' to
>> users-requ...@dragonflybsd.org
>>
>> You can reach the person managing the list at
>> users-ow...@dragonflybsd.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Users digest..."
>>
>>
>> Today's Topics:
>>
>>1. Wrecked kernel (Jonathan Engwall)
>>2. RE: Wrecked kernel (Jonathan Engwall)
>>
>>
>> --
>>
>> Message: 1
>> Date: Wed, 17 Jun 2020 16:31:33 -0700
>> From: Jonathan Engwall 
>> To: DragonFlyBSD Users 
>> Subject: Wrecked kernel
>>
>> New problems have begun.
>> In the build world of 5_8_1 I am stuck at something that looks like this,
>> if you can forgive my cellphone-only state:
>> --- libgreputils.a---
>> building static greputils library
>> rm -f libgreputils.a
>> ar can libgreputils.a 'lorder argmatch.o c-strcasecmp.o ...
>>
>> Many .o files follow after that specific line ends, I have this:
>>
>> /usr/libexec/binutils227/elf/nm: 'a file or another.o' : No such file
>>
>> I did check, nm is an executable. Currently I cannot make symlinks either.
>> What might be my trouble?
>>
>> How can I figure out what is going on?
>>
>> Jonathan Engwall
>>
>> --
>>
>> Message: 2
>> Date: Wed, 17 Jun 2020 19:07:07 -0700
>> From: Jonathan Engwall 
>> To: DragonFlyBSD Users 
>> Subject: RE: Wrecked kernel
>>
>> After replacing files from my partial backup X11 is acting strangely.
>>
>> Cannot open log file "/root/.local/share/xorg/Xorg.0.log"
>>
>> The log file is alive and well where it has always been in
>> /var/log/Xorg.0.log
>>
>> There is mention of this on the internet, mostly on ArchLinux, nothing
>> useful however.
>>
>> Another problem, if I compile kernel 5.8 can I get updates?
>>
>> Or is it so that I need to buildworld...because that was looking bad...but
>> if I could get updates maybe it would help.
>>
>> Thank you anyone at all for any help
>>
>> Jonathan Engwall
>>

Re: Damaged kernel

2020-06-14 Thread Matthew Dillon
Not enough info to really help you here.   Try going into the BIOS (usually
the F2 or DEL key in early boot) and check the boot order, it might be
trying to boot from internal storage first and USB second and needs to be
switched around.

-Matt


Re: dscheck(vn0): b_bcount 2 is not on a sector boundary

2020-06-11 Thread Matthew Dillon
Generally speaking these messages look like one or more of these window
managers are trying to probe all raw storage devices with sector sizes that
are not compatible with the devices.  I'm guessing they are doing so in
order to try to implement e.g. auto-mount or something similar (detecting
usb stick insertions, etc).  The error messages themselves can just be
ignored.  Whether the desktops do the right thing or not is another
question entirely.

-Matt


Re: jail questions

2020-06-03 Thread Matthew Dillon
We did a bunch of work on localhost bindings for jails.  Basically it works
as follows:

* You can specify a list of IPs for the jail (more than one if you desire)

* Any 'localhost' binding within the jail will automatically be translated
to the localhost IP specified when the jail was created, or the host IP if
no localhost IP was specified.

* You can, if you desire, isolate the jail's localhost by passing something
other than 127.0.0.1 to the jail.  You can add 127.0.0.2 to the localhost
interface as an alias for example and pass 127.0.0.2 into the jail.  Any
bindings to 127.0.0.1 within the jail will actually bind to 127.0.0.2 from
the point of view of the host outside the jail.

There are also a number of jail sysctl variables which affect how network
addresses are handled.
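The isolated-localhost setup described above might look like this dry run
(the addresses, jail path, and hostname are hypothetical):

```shell
# Give the jail its own loopback alias so in-jail binds to 127.0.0.1
# land on 127.0.0.2 instead of the host's loopback.  Echoed as a dry
# run; drop the "echo"s and run as root on a DragonFly host.
echo 'ifconfig lo0 alias 127.0.0.2 netmask 255.255.255.255'
echo 'jail /usr/jail/test testjail 192.0.2.10,127.0.0.2 /bin/sh'
```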
-Matt


Re: UNIX signals on DragonFly

2020-05-26 Thread Matthew Dillon
We can add defines but someone would need to research it.  Linux does have
a RTSIG_MAX (set to 32) in  which we do not appear to
have.  There is also a __SIGRTMAX define in linux (set to 64).  I don't see
anything for a RTSIG_MIN, however.

-Matt


Re: Upgrade from 5.5 failed

2020-05-12 Thread Matthew Dillon
If you installed the world and kernel properly then a 'uname -a' should
show the dragonfly version.  If it looks right, you can try forcing a pkg
update with:

pkg update -f
pkg upgrade -f

-Matt


Re: Upgrade from 5.5 failed

2020-05-12 Thread Matthew Dillon
The most recent package sync was May 2nd, which includes chromium version
81 for both -release and -master.

-Matt


Re: HAMMER2 PFSes

2020-04-23 Thread Matthew Dillon
For now don't try to cluster anything.  That work is still in-progress.
 You can create multiple independent masters on the same device, and you
can snapshot them, as well as be able to write to the snapshots.  That all
works.  The snapshots basically work the same as masters.

-Matt


Re: New "chromium" does not sync bookmarks or any data

2020-04-13 Thread Matthew Dillon
It should just be the "~/.config/chromium/Default/Bookmarks" file.  I don't
think it's changed format.  See if you can find your old bookmarks file.

-Matt


Re: TeX Live 2020 status on DragonFlyBSD

2020-03-08 Thread Matthew Dillon
We usually pull TeX via FreeBSD ports.  Looks like there is an open PR in
FreeBSD ports to update it.  We will poke them a bit on the side channel
from our side to see if we can get the port updated!

-Matt


Re: mdconfig

2020-02-28 Thread Matthew Dillon
We don't have any all-in-one utilities to manage jails.  Just the basic
jls, jexec, and jail utilities.  The jail(8) manual page should be mostly
up-to-date with recent changes to improve options management.

MD is very, very old.  Ancient, really.  It was removed ages ago because it
was old, crufty, and didn't work as well as VN or TMPFS.  It basically
formatted a UFS filesystem backed by ram.  You can use the VN device if you
need a block device backed by ram or swap, or the TMPFS filesystem if you
just need a ram and swap backed temporary filesystem.   We can't bring MD
back, sorry.
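For the common case of a RAM/swap-backed scratch area, TMPFS is a one-line fstab entry (the mount point is an example; a size limit can be set via mount_tmpfs(8) options):

```
# /etc/fstab -- swap-backed scratch filesystem instead of an MD ramdisk
tmpfs   /scratch   tmpfs   rw   0   0
```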

-Matt


Re: Bhyve

2020-02-28 Thread Matthew Dillon
preferred-stack-boundary is an option that specifies what the compiler
should align procedure stacks to, as a power of 2.  So the value 4 means to
align procedure stacks to 16 bytes.  You can probably remove this option,
modern compilers should align elements properly by default.  Older
compilers did not.

All of the floating point options such as '-mno-sse5' generally tell the
compiler not to use FP instructions or registers in generated kernel code.
This is because the kernel does not fully save the user floating point
state across system calls or interrupts.  These options or the equivalent
for clang are mandatory.
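As an illustration of the mandatory FP restrictions described above, a kernel build configuration passes flags along these lines (the exact set depends on the compiler version — treat this as a sketch, not the authoritative list):

```
# Keep FP/SIMD instructions and registers out of generated kernel code,
# since the kernel does not save user FP state across syscalls/interrupts.
CFLAGS+= -mno-mmx -mno-sse -msoft-float
```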

I don't know what the -mno-abm option does.

The indirect-branch mode... ummm. I think that turns on retpoline security
mitigations for indirect calls, but I'm not sure.

The inline-limit option tells the compiler to allow larger inline functions
to be inlined.  You can probably remove this.

-Matt


Re: mdconfig

2020-02-25 Thread Matthew Dillon
I don't think the FreeBSD tools are going to work, per se.   For any ram
drives, you should use TMPFS instead of MD.  TMPFS is heavily optimized for
performance.  We only use MD for boot-time strapping of crypto mounts.
It's very old and should not be used for anything else.

-Matt

On Mon, Feb 24, 2020 at 12:31 PM Quelrond  wrote:

> Hello,
>
> I am new in DragonFly, coming from FreeBSD world.
>
> Trying to install FreeBSD tool for jails management CBSD
> (https://github.com/cbsd/cbsd), on DragonFly, I stopped by a problem of
> mdconfig absent in my DragonFly installation:
>
> DragonFly drugoj.reseaucloud.local 5.6-RELEASE DragonFly v5.6.2-RELEASE
>
> It seems that md kernel module is loaded. Where is mdconfig?
>
> Best regards,
>
> Peter
>
>
>


Re: Bhyve

2020-02-22 Thread Matthew Dillon
Not sure what you are trying to do but porting bhyve is definitely not an
easy undertaking!

-Matt

On Fri, Feb 21, 2020 at 9:55 PM Jonathan Engwall <
engwalljonathanther...@gmail.com> wrote:

> I have the headers, folders and vmm.c...but I was thinking if I write over
> written memory with a foreign header it would be bad.
> And, of course I have to VM to test build a kernel module.
> The headers came from a clone of FreeBSD I made from GitHub. Vmm.c I
> simply found on the internet. One is missing vmm_ippi.h which I cannot find.
> Any suggestions or offers of help?
>


Re: DragonFlyBSD 5.6.2 installation failures

2020-01-31 Thread Matthew Dillon
Hmm.  We changed from hardwiring /dev/da8 to the part-by-label
specification to try to avoid situations where the disk configuration
didn't match a basic setup.  If the kernel boot is recognizing and
attaching the usb stick then it should work.   Perhaps the problem is that
it is having problems recognizing the USB port the stick is plugged into.
You could try plugging it into different ports.

I would definitely stick with the .img (usb image) files, and AHCI mode.  A
normal dd without special options usually works for copying (I usually do
bs=32k to improve copy performance).  It might also be worth trying to
install the latest master image rather than the release image.
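For reference, a typical image copy looks like this (the image filename and target device are placeholders — double-check the device node first, since dd overwrites it without asking):

```sh
# Write the installer image to the USB stick (da8 is an example device)
dd if=dfly-x86_64-5.6.2_REL.img of=/dev/da8 bs=32k
```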

If it recognizes the drive but is having problems with the uuid labeling,
you can mount the usb image on another dfly box (usually something like
mount /dev/da8s2a /mnt) and mess around with /mnt/boot/loader.conf and
/mnt/etc/fstab, then umount and try booting with that image.

-Matt


Re: HAMMER2 questions

2020-01-28 Thread Matthew Dillon
On Mon, Jan 27, 2020 at 11:53 PM Chuck Musser  wrote:

> I've run Dfly for several years now, but never paused to understand
> HAMMER2 (or for that matter, it's predecessor) very deeply. I've used
> undo(1) a few times, but haven't dug into concepts like clustering or
> even what exactly PFSes are used for. I have a feeling I'm missing out
> on interesting things. So I'm finally getting around to asking a
> scattershot bunch of questions. I'm using my simple one-disk setup as a
> starting point. Its df -h output is below:
>
> Filesystem Size   Used  Avail Capacity  Mounted on
> serno/5QE57G6D.s1d 193G  1054M   192G 1%/
> devfs 1024B  1024B 0B   100%/dev
> /dev/serno/5QE57G6D.s1a   1022M   187M   753M20%/boot
> /dev/serno/5QE57G6D.s1e@DATA  18.5G   123M  18.4G 1%/build
> /build/usr.obj18.5G   123M  18.4G 1%/usr/obj
> /build/var.crash  18.5G   123M  18.4G 1%/var/crash
> /build/var.cache  18.5G   123M  18.4G 1%/var/cache
> /build/var.spool  18.5G   123M  18.4G 1%/var/spool
> /build/var.log18.5G   123M  18.4G 1%/var/log
> /build/var.tmp18.5G   123M  18.4G 1%/var/tmp
> tmpfs  935M 0B   935M 0%/tmp
> procfs4096B  4096B 0B   100%/proc
> tmpfs  935M 0B   935M 0%/var/run/shm
>
> 1. What are are the three entries that begin with "serno" and why is the
> root slice not prefixed with /dev/ like the other two? What does
> the nomenclature mean, specifically "5QE57G6D", "s1a" and "@DATA", for
> instance).
>

All attached storage devices have a software-readable serial number and can
be addressed by that serial number instead of by the raw device name.  This
allows devices to be probed in any order, or the drive to be moved, without
having to make adjustments to /etc/fstab.
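Put together with the df output above, an fstab using serial-number paths might look roughly like this (illustrative — the serial and filesystem types are taken from the example, and /dev/ prefixes are shown explicitly):

```
# /etc/fstab addressing drives by serial number rather than raw device
/dev/serno/5QE57G6D.s1a        /boot    ufs       rw   1  1
/dev/serno/5QE57G6D.s1d        /        hammer2   rw   1  1
/dev/serno/5QE57G6D.s1e@DATA   /build   hammer2   rw   2  2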


> 2. Are the subdirectories under /build each their own PFS? What do those
> do and why are they there? As an administrator, when would I want to
> create my own PFSes? I saw someone's example of creating a set of jails
> on a dedicated PFS, but I don't understand the implications of doing
> that.
>

The installer used to create a PFS for each major subdirectory but no
longer does.  Instead it just separates everything into three partitions.
The 'a' partition is a small UFS filesystem containing /boot.  We use UFS
for /boot because it's tiny and deletions immediately free up media space.
The 'd' partition contains the root mount and anything that one would
normally want to backup.  For example, /home is on 'd'.   The 'e' partition
contains the /build mount and has all the stuff one would normally NOT want
to backup, such as /var/crash, /usr/obj, /var/log, and so forth.  I put
/var/spool in there too, which is debatable; mostly because it might see
significant activity and I wanted the root mount to be as unassuming as
possible in crash situations.


> 3. What are null mounts? It seems like that's how you mount PFSes, at
> least judging by the output of the mount(1) command.
>

A NULL mount mounts one directory that already exists in the filesystem
onto another part of the filesystem.  Both places reference the same
directory.  Also note that these NULL mounts are not PFSs.  The installer
no longer creates a separate PFS for each directory, it just creates three
partitions ('a' for boot, 'd' for root, and 'e' for /build) and then uses
NULL mounts to arrange the subdirectories that we want to physically be on
/build.
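In fstab form, the NULL-mount arrangement visible in the df output above looks roughly like this (illustrative subset):

```
# NULL mounts re-exposing directories that physically live on /build
/build/usr.obj    /usr/obj    null   rw   0  0
/build/var.crash  /var/crash  null   rw   0  0
/build/var.log    /var/log    null   rw   0  0
```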

> 4. What are clusters, in HAMMER2? Can I use them to create a big
> filesystem from storage on separate systems?
>

Clusters are not implemented yet, but that's the idea.


> 5. I think I recall hearing that the remote mirroring feature in HAMMER
> is not yet supported in HAMMER2. Is that status still accurate?
>

This is correct.  My recommendation for now is to just use a regular rsync
or cpdup, and then use hammer2's snapshot feature to create a frozen
snapshot of the backup.  Ultimately mirroring will be built into the
clustering.  Not yet though.
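A sketch of that backup cycle (paths and the snapshot label are examples, and the exact hammer2 snapshot invocation may differ — check hammer2(8) before relying on it):

```sh
# Copy live data to a hammer2-backed backup area...
rsync -a --delete /home/ /backup/home/        # or: cpdup /home /backup/home

# ...then freeze the copy as a named snapshot
hammer2 snapshot /backup/home home-20200128
```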


> 6. Although I have a feeling that RAID configuration is a lower level
> construct than HAMMER, what are the options? I did see that the
> natacontrol(8) is able to manage RAIDs, but If I happened to get my
> hands on NVMe storage, those might call for a different utility? There
> is a nvmectl, but it seems like it might be for a different purpose.
>
> Thanks,
>
> Chuck
>

Honestly I don't recommend using NATA or natacontrol in any manner.  The
ATA code is extremely crufty, old, and just not reliable.  Use AHCI mode
(which attaches as /dev/da0, /dev/da1, etc...).  The AHCI driver is fully
supported and reliable.  If the BIOS does not support an AHCI-based
softraid then your best raid option with DragonFly is some sort of 

Re: .note.tag, readelf and dsynth version detection

2020-01-28 Thread Matthew Dillon
I pushed a hack to dsynth to use the second field if the first is zero,
please test.

-Matt

On Mon, Jan 27, 2020 at 5:05 PM Matthew Dillon  wrote:

> Hmm.  that's a good point.  It does look like a repeated structure.  I
> will look into it.
>
> -Matt
>
> On Mon, Jan 27, 2020 at 4:15 PM Romick  wrote:
>
>> I was probably lucky :) Of course, I could be wrong, but it seems to me
>> that these pieces are not "fields" of the same structure, these are fields
>> that belong to two records in the file. Is the order of these records
>> guaranteed?
>>
>> I mean, it’s possible if I rebuild the world now, the linker will arrange
>> these records in a different order and everything will be fine, or maybe
>> not :)
>>
>> On Mon, Jan 27, 2020 at 03:56:25PM -0800, Matthew Dillon wrote:
>> > That's ... weird.  the 'zero' and the 'version' fields are transposed.
>> Are you
>> > compiling in any special way?   I've tested -release and -master on a
>> bunch of
>> > boxes and they all have the version in the right spot.
>> >
>> > -Matt
>> >
>> > On Mon, Jan 27, 2020 at 1:45 PM Romick 
>> wrote:
>> >
>> > Hello,
>> > It seems that dsynth defines the system version based on the
>> .note.tag(s)
>> > in
>> > /bin/sh and a necessary condition is that these entries follow in a
>> > certain order.  On my system this is not so :)
>> >
>> > ==
>> > rabbit@fly ~% readelf -x .note.tag /bin/sh
>> >
>> > Hex dump of section '.note.tag':
>> >   0x00400218 0a00 0400 2000 44726167  ...Drag
>> >   0x00400228 6f6e466c 7900  0a00 onFly...
>> >   0x00400238 0400 0100 44726167 6f6e466c DragonFl
>> >   0x00400248 7900 e5a30700   y...
>> >
>> > rabbit@fly ~%
>> > ==
>> >
>> > === /usr/src/usr.bin/dsynth/config.c ===
>> > struct NoteTag {
>> > Elf_Note note;
>> > char osname1[12];
>> > int version;/* e.g. 500702 -> 5.7 */
>> > int x1;
>> > int x2;
>> > int x3;
>> > char osname2[12];
>> > int zero;
>> > };
>> > 
>> >
>> > --
>> >   with best regards,
>> >   Yellow Rabbit @yrab...@mastodon.sdf.org
>> >   DragonFly 5.7-DEVELOPMENT x86_64
>> >
>>
>> --
>>   with best regards,
>>   Yellow Rabbit @yrab...@mastodon.sdf.org
>>   DragonFly 5.7-DEVELOPMENT x86_64
>>
>


Re: .note.tag, readelf and dsynth version detection

2020-01-27 Thread Matthew Dillon
Hmm.  that's a good point.  It does look like a repeated structure.  I will
look into it.

-Matt

On Mon, Jan 27, 2020 at 4:15 PM Romick  wrote:

> I was probably lucky :) Of course, I could be wrong, but it seems to me
> that these pieces are not "fields" of the same structure, these are fields
> that belong to two records in the file. Is the order of these records
> guaranteed?
>
> I mean, it’s possible if I rebuild the world now, the linker will arrange
> these records in a different order and everything will be fine, or maybe
> not :)
>
> On Mon, Jan 27, 2020 at 03:56:25PM -0800, Matthew Dillon wrote:
> > That's ... weird.  the 'zero' and the 'version' fields are transposed.
> Are you
> > compiling in any special way?   I've tested -release and -master on a
> bunch of
> > boxes and they all have the version in the right spot.
> >
> > -Matt
> >
> > On Mon, Jan 27, 2020 at 1:45 PM Romick 
> wrote:
> >
> > Hello,
> > It seems that dsynth defines the system version based on the
> .note.tag(s)
> > in
> > /bin/sh and a necessary condition is that these entries follow in a
> > certain order.  On my system this is not so :)
> >
> > ==
> > rabbit@fly ~% readelf -x .note.tag /bin/sh
> >
> > Hex dump of section '.note.tag':
> >   0x00400218 0a00 0400 2000 44726167  ...Drag
> >   0x00400228 6f6e466c 7900  0a00 onFly...
> >   0x00400238 0400 0100 44726167 6f6e466c DragonFl
> >   0x00400248 7900 e5a30700   y...
> >
> > rabbit@fly ~%
> > ==
> >
> > === /usr/src/usr.bin/dsynth/config.c ===
> > struct NoteTag {
> > Elf_Note note;
> > char osname1[12];
> > int version;/* e.g. 500702 -> 5.7 */
> > int x1;
> > int x2;
> > int x3;
> > char osname2[12];
> > int zero;
> > };
> > 
> >
> > --
> >   with best regards,
> >   Yellow Rabbit @yrab...@mastodon.sdf.org
> >   DragonFly 5.7-DEVELOPMENT x86_64
> >
>
> --
>   with best regards,
>   Yellow Rabbit @yrab...@mastodon.sdf.org
>   DragonFly 5.7-DEVELOPMENT x86_64
>


Re: .note.tag, readelf and dsynth version detection

2020-01-27 Thread Matthew Dillon
That's ... weird.  the 'zero' and the 'version' fields are transposed.  Are
you compiling in any special way?   I've tested -release and -master on a
bunch of boxes and they all have the version in the right spot.

-Matt

On Mon, Jan 27, 2020 at 1:45 PM Romick  wrote:

> Hello,
> It seems that dsynth defines the system version based on the .note.tag(s)
> in
> /bin/sh and a necessary condition is that these entries follow in a
> certain order.  On my system this is not so :)
>
> ==
> rabbit@fly ~% readelf -x .note.tag /bin/sh
>
> Hex dump of section '.note.tag':
>   0x00400218 0a00 0400 2000 44726167  ...Drag
>   0x00400228 6f6e466c 7900  0a00 onFly...
>   0x00400238 0400 0100 44726167 6f6e466c DragonFl
>   0x00400248 7900 e5a30700   y...
>
> rabbit@fly ~%
> ==
>
> === /usr/src/usr.bin/dsynth/config.c ===
> struct NoteTag {
> Elf_Note note;
> char osname1[12];
> int version;/* e.g. 500702 -> 5.7 */
> int x1;
> int x2;
> int x3;
> char osname2[12];
> int zero;
> };
> 
>
> --
>   with best regards,
>   Yellow Rabbit @yrab...@mastodon.sdf.org
>   DragonFly 5.7-DEVELOPMENT x86_64
>


Re: OT: third party relay attack

2020-01-10 Thread Matthew Dillon
I last looked at it a few years ago but there were numerous DNS based
services that you could use to test IP addresses and domains.  But they
never worked well... they tended to block a lot of legitimate mail along
with the spam, and tended to always be out of date.

You can also turn on SPF validation, which actually helps protect against
third-party relays (at least for well known domains).  There might be some
setup required though, it depends on the mail server.  You can google up
instructions for e.g. turning SPF on with postfix (I think it requires a
few perl modules).  Instructions won't be accurate for DFly but they will
give you a good template for what needs to be done.
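One common Postfix arrangement uses the postfix-policyd-spf-perl policy daemon; the fragment below is a sketch of that template (the service name, install path, and restriction ordering are illustrative, and as noted the details will differ on DFly):

```
# main.cf -- hook the SPF policy daemon into recipient restrictions
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_policy_service unix:private/policyd-spf

# master.cf -- the policy service itself (argv path is an example)
policyd-spf  unix  -  n  n  -  0  spawn
    user=nobody argv=/usr/local/libexec/postfix-policyd-spf-perl
```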

But if you do that, be sure to test that the mail server is still accepting
mail from important domains that you communicate with.  It's really easy to
misconfigure.

-Matt


Re: OT: third party relay attack

2020-01-10 Thread Matthew Dillon
There isn't a whole lot that can be done short of white-listing only
allowed originators and recipients.  Most anti-spam services filter out
critical non-spam emails along with the spam.

What I do for my personal domain is actually forward all my mail, spam and
all, to my gmail account and let Google's spam filters deal with it (to the
tune of hundreds of spams a day).   And for DragonFlyBSD's domain... we're
mostly not using it for email beyond the mailing list server, and the
mailing list server is essentially white-listed based on the subscriptions.

-Matt

On Fri, Jan 10, 2020 at 7:36 AM Steffen Nurpmeso  wrote:

> Pierre Abbat wrote in <3633605.BztBv1gPr2@puma>:
>  |My mailserver is being attacked by what looks like a botnet since
> December \
>  |16
>  |at 6:07 (11:07 UTC). Many hosts all over the world are sending mail \
>  |purporting
>  |to be from many domains all over the world to a few domains in Russia. \
>  |Most of
>  |the IP addresses are blocked by uceprotect.net; a few are blocked by
> other
>  |blocklists. A few are not blocked, but are rejected with "Relay access
>  |denied". The messages come at a rate of several per second.
>  |
>  |There are 133 emails stuck in leaf's mail queue, but they do not appear \
>  |to be
>  |related to this attack.
>
> Fwiw, not being an administrator and having had no idea of that
> side of the road, i learned to let connections "sleep" for
> a while.  This is possible with Postfix, for example.  First i let
> them hang, before blacklist lookups.  It reduced those attacks
> a little bit.  E.g.,
>
>   smtpd_relay_restrictions =
>   sleep NUMBER,
>   reject_invalid_helo_hostname,
>   reject_non_fqdn_helo_hostname,
>   reject_non_fqdn_sender,
>   reject_non_fqdn_recipient,
>   sleep NUMBER
>
> You can set restrictive error counts
>
>   smtpd_soft_error_limit = 1
>   smtpd_hard_error_limit = 1
>   smtpd_per_record_deadline = yes
>   smtpd_timeout = 21s
>
> This i did after i have switched to OpenSMTPD for one day.  Like
> magic, a few hours after i did, there was one connection, it did
> nothing for a few seconds, followed by another one, and then these
> two started sending mails like grazy to Taiwenese Yahoo addresses
> i think it was.  They then entered a wave of disconnections and
> reconnections with other addresses which continued this work.  (My
> firewall throttles over time.)  Well, i got a nice information
> mail from Yahoo Taiwan i think it was saying that they blocked my
> IP temporarily because of the activity.  Blocking had no influence
> on the attack itself.  Realizing the OpenSMTPD config error
> i fixed that, but their misuse continued, and OpenSMPTD did not
> seem to have something like Postfix's _error_limit (my query on
> OpenSMTPD bugs/tracker never received an answer), so after
> continuously blacklisting the bots' IP addresses i threw away
> OpenSMTPD and reinstalled Postfix, with the error_limit reduced
> from 3 to 1.  Attack over.
>
> Having said that, it would be tremendous if servers like Postfix,
> dovecot, ssh, would offer hooks which would get invoked on
> connection establishment and break, to be able to track
> un/successful logins as well as "nonsense connections" etc. so
> that the entire [di]notify/log file parse sauce could vanish.
> Always strived me being total nonsense that log files are parsed
> to collect the info that servers had at hand.  Christos Zoulas of
> NetBSD implemented the blacklistd with patches for i think at
> least Postfix and ssh, this does implement that for logins at
> least.  FreeBSD imported that.
>
> Of course all that does not help against firewall rules aka tables
> filling with lots of addresses to be blocked.  I have some general
> rate limiting, but sometimes this bites real connectivity, for
> example if people merge their readily prepared git topic branches
> into mainline repositories, and dozen of messages from the same
> server fly in.  I have no idea on what to do against these two
> problems.
>
> --steffen
> |
> |Der Kragenbaer,The moon bear,
> |der holt sich munter   he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)
>


Re: Failed make buildworld on 5.6.2

2019-12-09 Thread Matthew Dillon
Check the BIOS settings for the memory and, if possible, run the memory at
a lower frequency or pump the voltage up slightly from stock, and you may
be able to temporarily work around failing memory.

You can also try re-seating the memory, it could be dust.  Try this first,
actually.

-Matt


Re: One ssh hangs on exit, the other doesn't

2019-12-05 Thread Matthew Dillon
ssh won't exit unless all virtual links, such as the X forwarding, are also
gone.  I think there are some bugs related to that (I'm pretty sure it
isn't dragonfly specific).  So running with X forwarding can cause the
situation you describe.

-Matt

On Thu, Dec 5, 2019 at 12:15 PM Steffen Nurpmeso  wrote:

> Pierre Abbat wrote in <12216388.yjY6BSWRI4@mooncat>:
>  |On Thursday, 5 December 2019 03.48.18 EST Harald Brinkhof wrote:
>  |> Maybe some background processes keep running and prevent the X program
> \
>  |> from
>  |> sitting down?
>  |>
>  |> http://www.snailbook.com/faq/background-jobs.auto.html
>  |
>  |The examples at the top of that page don't make sense. If I start an \
>  |xterm in
>  |an ssh session, and log out of the ssh session with the xterm still \
>  |running, I
>  |expect ssh to keep running until I close the xterm. When I ssh into \
>  |leopard,
>  |my local mail server, and forward the mail ports, the ssh session hangs
> on
>  |exit, because kmail still has the mail ports open.
>  |
>  |When I shelled into zyxomma, I exited each program before logging out \
>  |of ssh.
>  |I don't understand why one ssh session closed and the other hung.
>
> Does it still hang when ControlMaster exits?
> I have seen this on Linux->Linux (same host even) with
> X forwarding in muxer sessions, but it went away as rolling
> updates flew by.  (At the moment i have
>
>   Load key "/home/steffen/local_ed25519.pub": invalid format
>
> that that makes no sense :})
>
>  |Pierre
>  |--
>  |loi mintu se ckaji danlu cu jmaji
>  --End of <12216388.yjY6BSWRI4@mooncat>
>
> --steffen
> |
> |Der Kragenbaer,The moon bear,
> |der holt sich munter   he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)
>


Re: Hang in tcdrain(3) after write(3)

2019-11-28 Thread Matthew Dillon
If modem control is turned on you have to go into clocal mode to drain
commands, otherwise it's waiting for a carrier.  That's one possibility.

-Matt

On Mon, Nov 25, 2019 at 3:14 PM Jeffrey Walton  wrote:

> Hi Everyone,
>
> I'm testing some software on DragonFly. There's not much to it. It
> talks to the modem, and sends an ATZ and then reads the response.
> Linux, FreeBSD, NetBSD, OpenBSD and OS X are OK.
>
> My test rig is DragonFly 5.6-RELEASE x86_64 (fully patched) with a
> USR5637 modem, https://www.amazon.com/gp/product/B0013FDLM0. The modem
> is located at /dev/cuaU0.
>
> DragonFly hangs on the call to tcdrain(3). Looking at the man page I
> don't see any special handling. Cf.,
> http://man.dragonflybsd.org/?command=tcdrain.
>
> Attached is the reproducer. The trace is:
>
> % ./test.exe
> Setting TIOCEXCL
> Getting tty
> Setting tty options
> Flushing tty
> Setting tty
> Writing ATZ
> Waiting for write  <<-- call to tcdrain(fd)
>
> Thanks.
>


master users - heads up, full buildworld/buildkernel needed for recent commits

2019-11-13 Thread Matthew Dillon
Everyone using master needs to be sure to do a full buildworld,
buildkernel, installkernel, installworld sequence due to the addition of a
new system call.  Piecemeal compilation could result in a non-working
system due to libc and pthreads linking for the new symbols.
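Spelled out, the full sequence is (run from the source tree; add KERNCONF=NAME to the kernel steps if you build a custom kernel config):

```sh
cd /usr/src
make buildworld
make buildkernel        # KERNCONF=NAME if not using the default config
make installkernel      # same KERNCONF if applicable
make installworld
# reboot afterwards so everything runs the new kernel and libraries
```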

-Matt


Signal safety additions now in master

2019-11-12 Thread Matthew Dillon
Some fairly significant signal safety additions have been made to libc and
pthreads in master.  Anyone upgrading please note that you will need to do
a full world and kernel build.

The basis for this is that for a very long time we've had problems building
a number of large packages, in particular lang/mono and lang/rust.  The
builds would only succeed one time out of ten, or one time out of twenty,
and just fail to complete the rest of the time.  After a lot of messing
around we traced the failures down to signal safety issues in both libc
(primarily the malloc subsystem), but also pthreads (primarily mutex
recursions and deadlocks from signal handlers).

But not just these two applications.  Really any large, sophisticated
application that tries to implement any sort of signal-based asynchronous
garbage collection or other mechanism is stretching the limits and causing
issues.  As these applications become more sophisticated they begin not
only to trip over our library code, but their own code as well.  The result
is that some applications just aren't as robust as we'd like them to be.

The basic problem with doing signal safety is that it normally eats at
least two system calls, often in performance-critical code paths, and it
messes with signal masks, which can interfere with the expectations of the
application.  To solve the problem properly and be able to enable
signal safety across every important function in libc and pthreads I have
created another shared user/kernel memory mapping called 'lpmap', which is
similar to the upmap and kpmap, but which is able to give us a per-thread
interface.  lpmap is per-thread, upmap is per-process, and kpmap is
system-wide.

With this new mechanism signal safety is as easy as incrementing and
decrementing a variable in memory that the kernel also has access to and
can use to prevent posts during critical section of code.  The signal
safety now being tested in master uses this new mechanism.  Two new libc
routines have been created to support it:

https://leaf.dragonflybsd.org/cgi/web-man?command=sigblockall&section=ANY

The memory allocator and most internal mutexes used by pthreads have been
wrapped with this functionality.  We couldn't do this without the 'lpmap',
it would just be too expensive otherwise.  Languages such as rust and mono
are now far more reliable because of this, and we continue to test other
applications.

The primary repo commits implementing this are shown below.  Note that
there are also some follow-up stability and cleanup commits to these:

https://gitweb.dragonflybsd.org/dragonfly.git/commit/721505dec240e78696660384d988da78813a33bd

https://gitweb.dragonflybsd.org/dragonfly.git/commit/64b5a8a550c3c782ab04d04d63723691ac054ffc

Master will see more commits in coming weeks wrapping more library code
and/or replacing existing uses of sigprocmask() system calls (particularly
in rtld) to improve performance.  sigblockall()/sigunblockall() is a much
cleaner mechanism.

-Matt


Re: Re[2]: pkg upgrade error after system failure

2019-11-10 Thread Matthew Dillon
A complete wipe and reinstall... ok, well, you may have to re-bootstrap the
packages.  If you are getting a certificate verification failure you may be
missing /etc/ssl/cert.pem or /usr/local/etc/ssl/cert.pem.

cd /usr
make pkg-bootstrap-force

Then see if the 'pkg' command works again by trying to install a few things.

I'm not sure if that will wipe the corrupt database.  If not you will have
to rm the database... Antonio, do you remember what files must be removed
to do that?

-Matt

On Sun, Nov 10, 2019 at 10:44 AM Ilia Gorstkine  wrote:

>
> Hi Antonio,
> I use Hammer.
> How can I reinstall all packages from scratch if both pkgng and dports
> fails with these errors respectively?
> Certificate verification failed for...
> and pkg: sqlite error while executing...
>
> Friday, 8 November 2019, 18:07 +03:00 from Antonio Huete Jiménez <
> tuxi...@quantumachine.net>:
>
> Hi,
>
> I've just made a fresh 5.6.2 installation to a VM and everything
> worked as expected. I suspect your hard reset has caused some kind of
> corruption both in the certificates and the pkgng database. You may
> have to reinstall all the packages again from the scratch.
>
> BTW, which filesystem did you use for the installation?
>
> Regards,
> Antonio Huete
>
>
>
>
> Ilia Gorstkine  escribió:
>
> > Sony vaio pcg-41213v
> > 5.6-release DragonFly v5.6.2.4.g39d387-RELEASE #4
> >
> > When updating packages by pkg upgrade at the installation stage my
> > laptop freezes tightly and I had to turn it off via the power button.
> > After booting the system, the pkg upgrade command throws errors:
> > Updating Avalon repository catalogue...
> > Certificate verification failed for /C=SE/O=AddTrust AB/OU=AddTrust
> > External TTP Network/CN=AddTrust External CA Root
> > 34371375740:error:14007086:SSL routines:CONNECT_CR_CERT:certificate
> > verify
> > failed:/usr/src/lib/libressl/../../crypto/libressl/ssl/ssl_clnt.c:1121:
> > pkg:
> >
> https://mirror-master.dragonflybsd.org/dports/dragonfly:5.6:x86:64/LATEST/meta.txz:
> Authentication
> > error...
> > ...
> > pkg:
> >
> https://mirror-master.dragonflybsd.org/dports/dragonfly:5.6:x86:64/LATEST/packagesite.txz:
> Authentication
> > error
> > Unable to update repository Avalon
> > Error updating repositories!
> >
> > I have tried other repositories, but with the same results.
> > I tried installing ca_root_nss, but pkg install ca_root_nss returned
> > the same error.
> > pkg fetch, pkg search, pkg update - same error.
> > pkg backup -r /var/backups.sql.xz - recovery went ok, but nothing
> changed.
> > pkg version returns the following:
> > pkg: sqlite error while executing ALTER TABLE packages ADD
> > licenselogic INTEGER NOT NULL DEFAULT(1); in file pkgdb.c:2477: no
> > such table: packages
> >
> > Can anyone help restore pkg database?
> > Thanks in advance!
> > --
> > Ilia Gorstkine
>
>
>
>
>
> --
> Ilia Gorstkine
>


Re: Arena and threads

2019-10-30 Thread Matthew Dillon
Generally speaking the allocation algorithm depends on the language, the OS
doesn't have a lot to do with it.  Anything linked against libc (that is,
any C program), however, will use the allocator in libc which we do
control.  That allocator allocates per-thread zones.

That said, the memory allocator itself, no matter what the language, is
still just allocating memory that is shared across all the threads.  No
copying is involved.  All the threads share the same address space.  Any
locking is up to the program (or the language) itself.

-Matt


Re: Samsung QX411L laptop...

2019-10-04 Thread Matthew Dillon
Yah, those older laptops do tend to have better compatibility.  The newer
ones have wifi chipsets that we often don't have support for, among other
issues that crop up.

-Matt


Master can now be updated - a full world + kernel build and pkgs upgrade is required.

2019-09-16 Thread Matthew Dillon
For people using master, all the ABI breakage has been committed and new
binary packages have been generated.  A full world + kernel build is
required, reboot, and then a full replacement of all packages (typically
via 'pkg upgrade -f') is required.  Remember to upgrade packages after
rebooting, not before.

Generally speaking, the breakage means that numerous new packages will not
run on an old kernel and numerous old packages will not run on a new
kernel.  Packages which use messaging (which would be many of the web
packages such as chrome and X applications such as xpdf) or mess with
network interfaces (such as named / bind) will break if not matched.

Sometimes the packages database can get confused when doing a major
replacement like this.  If the pkg upgrade finds dangling packages that it
can't figure out it is usually best to ^C, delete them with pkg delete, and
then try again, until pkg upgrade is able to run cleanly with only its
normal deinstall/install/reinstall output.  If things get really bad you
might have to delete your packages and reinstall.  You really don't want
any older packages or libraries sitting around if you upgrade to the latest
master.
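The recovery path described above, as a sketch (package names are placeholders):

```sh
# If 'pkg upgrade' stalls on dangling packages it can't resolve:
pkg delete -f some-dangling-package   # repeat for each offender
pkg upgrade                           # retry until it runs cleanly

# Worst case -- remove everything and start over:
pkg delete -af
pkg install pkg-one pkg-two           # reinstall what you need
```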

-Matt


ABI breakage commits complete - a few more days for dports (for master)

2019-09-13 Thread Matthew Dillon
The ABI-breaking commits to master are now in the tree.  However, it is
going to be a few more days before we can rebuild dports.  If you are using
master, I recommend refraining from updating until early next week when we
get the binary package set updated.

For master users, when you do update next week you will have to rebuild and
reinstall everything.  world, kernel, and pkg upgrade.

-Matt


Heads up - ABI breakage going into master

2019-09-11 Thread Matthew Dillon
Commits today, this evening, and tomorrow are going to break a few ABI's.
Since we're breaking one, we might as well fix the other little niggling
issues at the same time.

If you are on master, we recommend not updating until new binary packages
are available for it, which might not be until the weekend or possibly even
early next week.  If you decide to upgrade anyway, a full world and kernel
build is required.

Commits are still incoming, but by tomorrow afternoon they should all be
in.  I do not recommend rebuilding until at least tomorrow evening, or
later if you also need to update ports.

-Matt


New servers in the colo, monster is being retired.

2019-08-13 Thread Matthew Dillon
We have three new servers in the colo now that will be taking most/all bulk
package building duties from monster and the two blades (muscles and
pkgbox64) that previously did the work.   Monster will be retired.   The
new servers are a dual-socket Xeon (sting) and two 3900X based systems
(thor and loki), which all together burn only around half the wattage that
monster burned (500W vs 1000W) while delivering 3 times the performance.
That's at least a 6:1 improvement in performance efficiency.

With SSD prices down significantly, the new machines are all-SSD.  These
new machines allow us to build dports binary packages for release, master,
and staged at the same time, and reduce the full-on bulk build time for
getting all three done from 2 weeks to 2 days.  This will allow us to
more promptly synchronize updates to ports with dports and get binary
packages up sooner.

--

Monster, our venerable 48-core quad-socket opteron is being retired.  This
was a wonderful dev machine for working on DragonFly's SMP algorithms over
the last 6+ years precisely because its inter-core and inter-socket
latencies were quite high.  If an SMP algorithm wasn't spot-on, you could
feel it.  Over the years DragonFly's performance on monster in doing things
like bulk builds increased radically as the SMP algorithms got better and
the cores became more and more localized.  This kept monster relevant far
longer than I thought it would be.

But we are at a point now where improvements in efficiency are just too
good to ignore.  Monster's quad-socket opteron (4 x 12-core 6168s) pulls
1000W under full load while a single Ryzen 3900X (12 core / 24 thread) in a
server configuration pulls only 150W, and is slightly faster on the same
workload to boot.

I would like to thank everyone for their generous donations over the last
few years!  We burned a few thousand on the new machines (as well as the major
SSD upgrades we did to the blades) and made very good use of the money,
particularly this year as prices for all major components (RAM, SSDs, CPUs,
Mobos, etc) have dropped significantly.

-Matt


Heads up - TAP and TUN changes in master may require a little rejiggering

2019-08-02 Thread Matthew Dillon
For anyone using openvpn or otherwise using TAP and TUN, changes in TAP and
TUN in master may break your systems.  However, they should be easy to
fix.  Basically TAP and TUN no longer pre-create the first four interfaces
(tap0...tap3 and tun0...tun3).  Some application code might depend on
scanning unit numbers to find an available device and may no longer work.

These are auto-clone devices, which means that opening "/dev/tap" or
"/dev/tun" automatically creates a new interface.  Not all application code
handles this properly, but there is an easier way to deal with these
interfaces: use 'ifconfig tap8 create' (or similar) to pre-create a
specific tap or tun unit, and then specify that specific device, e.g.
"/dev/tap8", in the application that needs it.

For example:

ifconfig tap8 create

... and in the openvpn configuration one might then specify 'dev tap8'.

If you are running release, this methodology works the same way, so you
can get an early start and fix up the system now so that it upgrades
smoothly to a new DFly version later on.

-Matt


Re: DragonFlyBSD Project Update - colo upgrade, future trends

2019-07-29 Thread Matthew Dillon
The mailing list software has been less than stellar, but the bigger
problem is in areas that we have very little control over.  We have no
control over other people's spam filters, and the mailing list software
itself has to deal with a constant influx of spam (which is why you have to
be subscribed, now).   It is almost impossible to manage it any other way.
Nearly all of the internet has moved on to WWW based forum-like mechanisms
because they are a whole lot easier to manage.  We're going to have to as
well.

I feel that we do not have a choice here.  Privately-run mail systems, in
general, are almost dead due to the spam load.  I have to forward my own
personal domain email through GMail just to be able to continue using it
and my GMail spam mailbox consistently contains more than 3000 spams in it
(30-day expiration, so ... 100+ spams per day).  And that doesn't count
the ones Google auto-deletes immediately or the ones my smtp server
discards.  I've tried everything possible to keep my personal domain and
dragonfly's domain email usable, but it's an impossible task.

-Matt

On Sun, Jul 28, 2019 at 1:41 PM Constantine A. Murenin 
wrote:

> On Mon, 22 Jul 2019 at 15:56, Matthew Dillon  wrote:
>
>> The mailing lists are not seeing much if any activity any more.  This is
>> more a generational issue... people kinda prefer web-based forums these
>> days and younger generations do not use mailing lists at all for group
>> stuff (not really).  Even the devs almost universally use IRC and not
>> mailing lists for discussions now (its kinda too bad that we don't have a
>> permanent irc log stored on DFly servers for posterity).  So we are looking
>> into potentially shifting user interaction to a web-based forum, perhaps
>> this year, and retiring the mailing lists, leaving just an archive for the
>> mailing list.  Possibly sometime this year, so look for action on that
>> upcoming.
>>
>
> I would think that part of this must be because messages sent to the
> mailing lists are silently discarded from non-subscribers.  IME,
> DragonFlyBSD.org doesn't even send out any error messages in this
> instance.  I've repeatedly had this happen to me, several times over the
> years, and I bet others have been affected as well.  On OpenBSD.org, in
> these instances, you simply receive a confirmation email asking you to
> confirm that you've sent the message (good because the feedback and the
> resolution are both instant — apart from the greylisting by PF spamd).  On
> FreeBSD.org, depending on list, moderators eventually approve any such
> messages (often causing a delay of several days).
>
> I think it'll be a sad day to see the mailing lists go.  They are so much
> better than the forums from so many perspectives, including archival.  I
> find forums problematic due to censorship and lack of accountability, not
> to mention archival issues — not even posters themselves would have copies
> of their own posts, unless extra care is taken, usually on the part of the
> poster, requiring quite some discipline.  On nginx.org, there is some
> sort of a forum-based mirror and gateway for the mailing lists, perhaps
> that's what DragonFly might be interested in adopting as well, if forum
> availability is a requirement?
>
> Cheers,
> Constantine.
> http://cm.su/
>


DragonFlyBSD Project Update - colo upgrade, future trends

2019-07-22 Thread Matthew Dillon
 NetBSD and OpenBSD and I'd kinda
like to know what their plans are, because the future is clearly going not
only multi-core, but many-core.  For everything.  But as I like to say, for
SMP there are only three at the moment.  One can't dispute that Linux has
nearly all the eyeballs, and DragonFly has very few.  But OpenSource tends
to live on forever and algorithms never die... I think there is a place for
all of these projects and there really aren't any alternatives if you want
a sparkling clean system that doesn't have too many layers of abstraction.
At the current juncture DragonFlyBSD is doing well and there are no plans
to slow down or stop.

There are many other developers who help out with DragonFlyBSD on a regular
basis, or drop in from time to time, as well as past developers who did an
awful lot of work.   For this I am going to run the names out of the git
log in alphabetical order, so I don't miss anyone (hopefully).  And to
'User' and 'Charlie Root'... we will never know who you were, but the party
is still going!

-Matt

Aaron LI
Adam Hoka
Adam Sakareassen
Adrian Chadd
Aggelos Economopoulos
Alex Hornung
Alexander Kuleshov
Alexander Polakov
Alexandre Perrin
Antonio Huete
Antonio Huete Jimenez
Antonio Nikishaev
Aycan iRiCAN
Ben Woolley
Bill Yuan
Brad Hoffman
Brills Peng
Charlie Root
Chris Pressey
Chris Turner
Chris Turner
Chris Wilson
Christian Groessler
Constantine A. Murenin
Daniel Bilik
Dave Hayes
David P. Reese
David Rhodus
David Shao
David Xu
Diederik de Groot
Dimitris Papastamos
Dylan Reinhold
Ed Schouten
Eirik Nygaard
Eitan Adler
Francis GUDIN
Franco Fichtner
François Tigeot
Gregory Neil Shapiro
Gwenio
Hasso Tepper
Hidetoshi Shimokawa
Hiroki Sato
Hiten Pandya
Ilya Dryomov
Imre Vadasz
Imre Vadász
Imre Vadász
Jan Lentfer
Jan Sucan
Javier Alcázar
Jean-Sébastien Pédron
Jeffrey Hsu
Jeremy C. Reed
Jeroen Ruigrok/asmodai
Joe Talbott
Joerg Sonnenberger
Johannes Hofmann
John Marino
Jordan Gordeev
Joris Giovannangeli
Justin C. Sherrill
Levente Kurusa
Liam J. Foy
Lubos Boucek
Magliano Andrea
Markus Pfeiffer
Matt Dillon
Matteo Cypriani
Matthew Dillon
Matthias Rampke
Matthias Schmidt
Maurizio Lombardi
Max Herrgard
Max Herrgård
Max Okumoto
Maxim Ag
Michael Neumann
Michael Neumann
Michael Neumann
Mihai Carabas
Nicolas Thery
Nicolas Thery
Nolan Lum
Noritoshi Demizu
Nuno Antunes
Nuno Antunes
Peeter
Peeter Must
Peter Avalos
Pierre-Alain TORET
Robert Garrett
Robin Hahling
Rui Paulo
Rumko
Samuel J. Greear
Sascha Wildner
Scott Ullrich
Sepherosa Ziehau
Simon 'corecode' Schubert
Simon Arlott
Simon Schubert
Simon Schubert
Stathis Kamperis
Sylvestre Gallon
Thomas E. Spanjaard
Thomas Nikolajsen
Tim
Tim Bisson
Tobias Heilig
Tomasz Konojacki
Tomohiro Kusumi
Ulrich Spörlein
User
Venkatesh Srinivas
Victor Balada Diaz
Vishesh Yadav
Walter Sheets
YONETANI Tomokazu
Yellow Rabbit
Yonghong Yan
Zach Crownover
b86
dumbbell
glebius
hrs
jkim
minux
rnoland
sinetek
zrj
Ákos Kovács

-Matt


Re: Taskset for Dragonfly BSD

2019-07-03 Thread Matthew Dillon
We have a utility called 'usched' which does basically that, though our
usched is pretty old and minimalist... it doesn't handle specifications for
more than 64 cores whereas the cpumask API handles up to (currently) 256
cores.  Our usched definitely needs to be updated.

-Matt


Re: Errors during mirror-copy after upgrading to 5.6.1

2019-06-29 Thread Matthew Dillon
No changes were made to HAMMER1 that would affect the btree code, so I
suspect it is a coincidence.

If the main filesystems look ok it could be that the error resides on one
of the snapshots.  You can try deleting all the snapshots on the source
machine via 'hammer prune-everything ' and then see if you
still get mirroring errors.

If you still get mirroring errors after doing that, your best bet would be
to backup the two PFSs in question, then destroy and recreate them and
restore.
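A sketch of the snapshot-pruning step described above, assuming the source PFS is mounted at /home and the mirror target is /backup/home (both paths are hypothetical):

```shell
hammer prune-everything /home           # delete all snapshots on the source PFS
hammer mirror-copy /home /backup/home   # then retry the mirroring operation
```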

-Matt


Three more items brought into -release (fix for paging-to-swap bug, ahci polling, improved read/write fairness for hard drives)

2019-06-28 Thread Matthew Dillon
The release branch's kernel has two more bug fixes and a new
read/write fairness feature for (primarily) hard drive accesses.

The first bug fix deals with a paging issue.  A machine which pages
heavily to swap can wind up looping on pages in the inactive queue,
writing the same pages out over and over again without making
progress.   This can eventually make a machine unusable.  This bug has
been fixed.

The second item is a mitigation for a possible bug in AHCI chipsets,
which is how most SATA drives are attached.  The mitigation is to add a
poll every 10 seconds just in case the chipset misses an interrupt
somehow.  We've had a number of reports of SATA drives deadlocking for
no reason.  This mitigation is an attempt to narrow down the problem.

The third item is a modification to the 'da*' disk device attachment
which balances read and write I/O when both operations are present.
Hard drives have large write buffers and even though the driver makes
sure that both reads and writes get tags, a hard drive can wind up
starving read requests due to its write buffer filling up.  A single
tag is all that is needed to fill up a hard drive's write buffer.  The
new feature detects this situation and ensures that read TPS is given
a fair shake by temporarily blocking writes.

This last item significantly improves concurrent reads and writes to a
hard drive, particularly one used for swap (NOTE: we recommend only
using SSDs for swap, but yes, some people still use hard drives for
swap).  It may also avoid a possible read starvation issue caused by
the hard drive itself that could cause a read command tag to not get
processed in a reasonable period of time, long enough to potentially
cause a CMD TIMEOUT by the driver.

--

We are tracking several bug reports related to "indefinite wait
buffer" messages on the console, typically related to heavy paging
to/from swap.  The AHCI polling mitigation and the TPS balancing
feature are an attempt to narrow down the possible problem source and
possibly even fix the issue.

-Matt


Re: DragonFly 5.6.1 tagged and built

2019-06-25 Thread Matthew Dillon
As karma would have it, we've been working on getting a more recent version
of chrome operational and we were having issues with blank or incomplete
pages coming up.  It turns out that the best way to deal with it is to make
yet another change to the kernel.  This change has been pushed to master
and release.

We now expect to have chromium-75 in dports and our binary packages
sometime this weekend for both release and master.  It will be
significantly more stable than the current chromium package.  However, you
will need to compile and install the latest release or master kernel
depending on your system.

I don't think we are going to roll a 5.6.2 for this so soon after 5.6.1,
I'd like to wait maybe a month to see if anything else comes out of the
woodwork before rolling 5.6.2.

-Matt


Re: How to get the thread number and every thread's ID of a running process?

2019-06-25 Thread Matthew Dillon
My favorite is:   ps axlRH

That not only gives you the threads, it gives you all the parent/child
relationships in nicely indented output.   Use a wide xterm.  Or do it
without the 'l' (ps axRH) to make more room.

-Matt


Re: Late commit to drm code fixes lockup - for 5.6 and master, and opie removal instructions for master

2019-06-22 Thread Matthew Dillon
Ok, I'll try to reproduce it.  What GPU do you have?  Is this an Intel iGPU
(which cpu?), or is this a radeon of some sort ?

-Matt


Re: Late commit to drm code fixes lockup - for 5.6 and master, and opie removal instructions for master

2019-06-21 Thread Matthew Dillon
Try going back to the default /usr/src/sys/config/X86_64_GENERIC kernel
config and see if you still have the lockup problem.  There are two
possibilities.  One is that the removal of the DDB option caused the kernel
to enter into a cpu-bound loop instead of panicking on an improper user
space address, and the second possibility is that there is some other bit
of code related to one of the other options, such as INVARIANTS, whose
removal is causing problems (if you commented out other options as well).

Since you are running a custom kernel config, if you could post a diff -u
of the default versus your current config, that would be helpful.  I might
need it to reproduce the issue.

-Matt

On Fri, Jun 21, 2019 at 5:46 PM Matthew Dillon  wrote:

> The unused function is because DDB isn't compiled into your kernel.  I
> will push a fix for that so DDB doesn't have to be compiled in.
>
> I'll see if I can reproduce the lockup issue with compton.
>
> -Matt
>


Re: Late commit to drm code fixes lockup - for 5.6 and master, and opie removal instructions for master

2019-06-21 Thread Matthew Dillon
The unused function is because DDB isn't compiled into your kernel.  I will
push a fix for that so DDB doesn't have to be compiled in.

I'll see if I can reproduce the lockup issue with compton.

-Matt


Late commit to drm code fixes lockup - for 5.6 and master, and opie removal instructions for master

2019-06-17 Thread Matthew Dillon
We had a bit of an oops with the 5.6.0 release.  We did insufficient
testing of a particular drm change and a serious X lockup bug wound up in
the release.  We will be rolling 5.6.1, but in the meantime anyone running
the release can update their sources to get the fix and then simply rebuild
and reinstall their kernel.  There is no need to rebuild/reinstall the
world.  Just the kernel (including modules).

(if you don't have the system sources:  cd /usr; make src-create-shallow)
cd /usr/src
git pull
make -j 4 nativekernel
make installkernel

And then reboot normally.  This will fix an issue with X that could result
in a machine freeze.

--

The opie removal only applies to master.  Opie still exists in release (but
is deprecated).

Also note that the upgrade procedure for master has not yet been fixed to
deal with the opie removal, so if you are using a recent master or updating
to master please manually remove any 'opie' related lines from
/etc/pam.d/*, if any are present.

If you have updated to the latest master and accidentally rebooted the
system into an unusable state (where you are unable to log in), the
solution is to reboot and choose single-user mode, make sure / is
mounted rw (usually just 'mount -u -o rw /' does the job), and make the
appropriate edits to /etc/pam.d/*.  Then reboot once more.
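A minimal sketch of that single-user cleanup, assuming BSD sed semantics for in-place editing (check the matched lines before deleting, since this edits system PAM configs):

```shell
mount -u -o rw /                  # remount root read-write in single-user mode
grep -l opie /etc/pam.d/*         # list the configs that still reference opie
sed -i '' '/opie/d' /etc/pam.d/*  # delete the pam_opie lines (BSD sed syntax)
reboot
```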

-Matt


Re: HEADS UP on master

2019-06-17 Thread Matthew Dillon
We're going to fix 'make upgrade' to not require this manual intervention,
but it may take a day or two, so anyone using master... beware of this
issue.

-Matt

On Mon, Jun 17, 2019 at 7:21 AM Rimvydas Jasinskas 
wrote:

> Hi,
>
> deprecated OPIE removal from base requires manual intervention
> *before* rebooting on updated master,
> The "make upgrade" script only warns about detected opie presence and
> suggest to manually edit or reinstall default PAM configs (cd
> /usr/src/etc/pam.d && make install) on both 5.6-RELEASE and
> 5.7-DEVELOPMENT.
> Make sure that /etc/pam.d/* configs no longer have hardcoded  pam_opie
> entries on master. For more information, see
> ed5666c1699a23a9ae3c0aca97dabaae71e26431
>
> Also OpenSSH was recently updated to 8.0p1. UsePAM option is enabled
> by default if sshd(8) is compiled with -DUSE_PAM.
> From now on default base sshd(8) configs are installed into
> /usr/share/examples/ssh/ together with cert.pem and openssl.cnf in
> /usr/share/examples/ssl/ too.
> Please check your /etc/ssh/sshd_config for deprecated options.
>
> RJ
>


VM work will be in the upcoming 5.6 release.

2019-06-14 Thread Matthew Dillon
   June 9 2019
5.4 RELEASE vs UPCOMING 5.6 RELEASE

Here is a set of simple semi-scientific tests headlining performance
improvements in the upcoming 5.6 release over the 5.4 release.  These
improvements were primarily obtained by rewriting (again) major chunks of
the VM system and the PMAP system.

Prior work was able to move many exclusive locks to shared locks.  This new
work is able to do-away with many locks entirely, and reduces the amount of
cache-line ping-ponging occurring between cpu cores when taking faults on
shared VM objects.

These tests were done on a little Haswell 2/4 box and on a Xeon 16/32
dual-socket box.  It demonstrates the following:

* The massive VM rework modestly reduces per-thread VM fault overheads
  and significantly reduces VM fault overheads on shared VM pages.

  Thus we see a MASSIVE improvement in the concurrent self-exec tests
  when any part of the binary is shared or if it is a dynamic binary
  (uses shared libraries).

  We see a modest improvement for ad-hoc concurrent compile tests.

  We see a small improvement in the buildkernel test on the haswell
  and a more significant improvement on the xeon, which roughly matches
  expectations.  Buildkernel bottlenecks in the linker and a few other
  places (even with NO_MODULES=TRUE).  What is important to note here
  is the huge reduction in system time.  System time dropped by 40%.

* The zero-fill fault rate has significantly improved.  It's a bit hard
  to test because I am butting up against bandwidth limitations in the
  hardware, but the improvement is a very real 17% (haswell) and
  14% (xeon), respectively.

* Scheduler fixes in 5.6 improve concurrency and reduce cache-line
  ping-ponging.  Note, however, that the scheduler heuristic in 5.4
  was a bit broken so this mostly restores scheduler performance from
5.2.  This only affects the DOCOMP test (see note 2 below).

Other observations (not shown here)

* The VM rework got rid of all pv_entry structures for terminal PTEs.
  This can save an enormous amount of ram in certain limited situations
  such as a postgres server with many service processes sharing a single,
  huge, shared-memory cache.

* There is a huge reduction in system overheads in some tests.  In fact,
  in most tests, but keep in mind that most tests are already cpu-bound
  in user-mode so the overall real-time improvement in those tests is
  more modest.

* In synth-based bulk runs I am observing a drop in system overhead
  from 15-20% to 10-15%, and the bulk build does appear to take
  commensurately less time (around 5%).

  That said, certain aspects of the synth bulk run are much, much faster
  now.  The port scans used to be able to run around 5%/sec on our
  threadripper (and that was already considered fast!).  Now the port
  scans run around 10%/sec.  This is because the insane concurrent exec
  load involved with doing the port scan is directly impacted by this
  work.


SELF-EXEC TESTS
   This tests a concurrent exec loop sequencing across N CPUs.  It is a
   simple program which exec's itself and otherwise does nothing.

   We test (1) a statically linked binary that copies itself to $NAME.$N
   so each cpu is exec()ing a separate copy, (2) a statically linked
   binary that does not do the copy step so multiple CPUs are exec()ing
   the same binary, (3) a dynamic binary that copies itself (but not the
   shared libraries it links against), meaning that the shared libraries
   cause shared faults, and (4) a dynamic binary that is fully shared,
   along with the libraries, so all vnode faults are shared faults.

FAULTZF
   This tests N concurrent processes doing zero-fill VM faults in a
   private per-process mmap().  Each process is doing a
   mmap()/force-faults/munmap() loop.

DOCOMP
   This does N concurrent compiles of a small .c program, waits for them
   to complete, and then loops.  The compiler is locked to gcc-4.7 (same
   compiler for release vs master).

   This is a good concurrent fork/exec/exit/wait test with a smattering
   of file ops and VM faults in the mix.  It tests some of the scheduler
   heuristics, too.

NATIVEKERNEL
   This does a high-concurrency buildkernel test that does not include
   modules, simulating 

Major kernel VM pmap work in master

2019-05-21 Thread Matthew Dillon
Master has received some major VM work, so please take care if you decide
to update or upgrade your system; it may lose a little stability.  A full
buildworld and buildkernel is needed due to internal structural changes.
The work is also not entirely complete, there are two or three memory
conservation routines that have not been put back in yet.  That said, the
work looks pretty solid under brute force testing.

The new work going in basically rewrites the handling of leaf PTEs in the
pmap subsystem.  Each vm_page entered into the MMU's pmap used to be
tracked with a 'pv_entry' structure.  The new work gets rid of these
tracking structures for leaf pages.  This saves memory, helps deal with
certain degenerate situations when many processes share lots of memory, and
significantly improves concurrent page fault performance because we no
longer have to do any list manipulation on a per-page basis.

Replacing this old system is a new system where we use vm_map_backing
structures which hang off of vm_map_entry's... essentially one structure
for each 'whole mmap() operation', with some replication for copy-on-write
shadowing.  So, instead of having a structure for each individual page in
each individual pmap, we now have a single structure that covers
potentially many pages.  The new tracking structures are locked, but the
number of lock operations is reduced by a factor of 100 (at least), or even
better.

Currently the committed work is undergoing stability testing and there will
be follow-up commits to fix things like minor memory leaks and so forth, so
expect those to be incoming.

Work still to do:

* I need to optimize vm_fault_collapse() to retain backing vnodes.
Currently any shadow object chain deeper than 5 causes the entry to fault
all pages to the front object and then disconnect the backing objects.  But
this includes the terminal vnode object which I don't actually want to
include.

* I need to put page table pruning back in (right now empty page table
pages are just left in the pmap until exit() to avoid racing the pmap's
pmap_page_*() code)

* I need to implement a new algorithm to locate and destroy completely
shadowed anonymous pages.

None of this is critical for the majority of use cases, though.  The
vm_object shadowing code does limit the depth so completely shadowed
objects won't just build up forever.

--

These changes significantly improve page fault performance, particularly
under heavy concurrent loads.

* kernel overhead during the 'synth everything' bulk build is now under 15%
system time.  It used to be over 20%.  (system time / (system time + user
time)).  Tested on the threadripper (32-core/64-thread).

* The heavy use of shared mmap()s across processes no longer multiplies the
pv_entry use, saving a lot of memory.  This can be particularly important
for postgres.

* Concurrent page faults now have essentially no SMP lock contention and
only four cache-line bounces for atomic ops per fault (something that we
may now also be able to deal with with the new work as a basis).

* Zero-fill fault rate appears to max-out the CPU chip's internal data
busses, though there is still room for improvement.  I top out at 6.4M
zfod/sec (around 25 GBytes/sec worth of zero-fill faults) on the
threadripper and I can't seem to get it to go higher.  Note that obviously
there is a little more dynamic ram overhead than that from the executing
kernel code, but still...

* Heavy concurrent exec rate on the TR (all 64 threads) for a shared
dynamic binary increases from around 6000/sec to 45000/sec.  This is
actually important, because bulk builds

* Heavy concurrent exec rate on the TR for independent static binaries now
caps out at around 45 execs per second, which is an insanely high
number.

* Single-threaded page fault rate is still a bit wonky but hit 500K-700K
faults/sec (2-3 GBytes/sec).

--

Small system comparison using a Ryzen 2400G (4-core/8-thread), release vs
master (this includes other work that has gone into master since the last
release, too):

* Single threaded exec rate (shared dynamic binary) - 3180/sec to 3650/sec

* Single threaded exec rate (independent static binary) - 10307/sec to
12443/sec

* Concurrent exec rate (shared dynamic binary x 8) - 15160/sec to 19600/sec

* Concurrent exec rate (independent static binary x 8) - 60800/sec to
78900/sec

* Single threaded zero-fill fault rate - 550K zfod/sec -> 604K zfod/sec

* Concurrent zero-fill fault rate (8 threads) - 1.2M zfod/sec -> 1.7M
zfod/sec

* make -j 16 buildkernel test (tmpfs /usr/src, tmpfs /usr/obj):

4.4% improvement in overall time on the first run (6.2% improvement on
subsequent runs).  system% 15.6% down to 11.2% of total cpu seconds.  This
is a kernel overhead reduction of 31%.  Note that the increased time on
release is probably due to inefficient buffer cache recycling.

1309.445u 242.506s 3:53.54 664.5%   (release)
1315.890u 258.165s 4:00.97 653.2%   (release, run 2)
1318.458u 259.394s 4:00.51 

Some recent fairly important fixes in master and release

2019-05-03 Thread Matthew Dillon
Two important fixes have gone into master with a version of them also
brought in to the release branch.  The first is a floating point bug
related to a (long time) known hardware issue on Intel CPUs.  We thought we
had fixed this bug ages ago, but it turns out we didn't, so it is being
fixed permanently with the removal of the remainder of the FP switching
heuristic.

If you do not want to update immediately, you can fix the problem live with
a simple sysctl:

sysctl machdep.npx_fpu_heuristic=1

or put 'machdep.npx_fpu_heuristic=1' in your /etc/sysctl.conf and reboot.

--

The second bug is related to mmap()'s MAP_STACK feature, which a number of
interpreted languages use, in particular 'ruby'.   The kernel was not
handling several of the cases properly.  In addition to fixing that case,
we have also basically stopped allowing user programs to create grow-down
segments in memory any more.  We do this by converting MAP_STACK into a
normal anonymous mapping.

The grow-down feature will ultimately be removed entirely as it is not
really applicable to 64-bit systems, but because the -release threading
libraries still assume that the main user stack uses this mode, it will be
another one or two release cycles before we actually scrap it completely.

This fix requires updating sources and building and installing a new kernel.

--

This work is in both master and the release branch.  No sub-release has
been scheduled for the release branch at this time, we need to talk about
it internally a bit first.

Thanks for the bug reports everyone!

-Matt


Re: How to run a vkernel as a background process

2019-04-28 Thread Matthew Dillon
Try redirecting stdout and stderr to a file:   notty -12 blahblahblah
>& /tmp/logfile

That's with csh or tcsh.  With sh it would be:  notty -12 blahblahblah >
/tmp/logfile 2>&1

And then see what error you are getting.

-Matt


Re: How to run a vkernel as a background process

2019-04-27 Thread Matthew Dillon
You should be able to run the vkernel without a tty by using 'notty'
instead of 'nohup'.  It will fork and run detached, and then you can ssh
into it.

-Matt


Hammer2 fix now in the -release branch

2019-04-13 Thread Matthew Dillon
The H2 fix is now in the release branch so pulling sources and compiling up
a new kernel will have it.  We will roll a sub-release that also includes
the fix in a little more than a week.

-Matt


Re: What programming languages ​​and operating systems that will to be used after Jesus to return?

2019-04-13 Thread Matthew Dillon
Presumably the original post was just spam.  In any case...

-Matt


Hammer2 corruption bug fix under test

2019-04-12 Thread Matthew Dillon
A Hammer 2 corruption bug fix is currently under test in master.  My
expectation is that it will be merged into the -release branch on Saturday
after further testing (I will post another message then).  The corruption
is modestly difficult to cause but please read the commit message for more
details.  It can occur when significant filesystem write activity occurs
during a bulkfree operation.  This operation typically occurs in the early
morning (~3 a.m.).  The corruption is caught by the crc code and reported
on the console and in /var/log/messages as CHECK FAIL entries.

https://gitweb.dragonflybsd.org/dragonfly.git/commit/83815ec6515002d007c3800cb9fd83c9451852f7

Master also has some performance work for H2 under test that may be of
interest.

https://gitweb.dragonflybsd.org/dragonfly.git/commit/1159c75c92fbfdd230dd598904ede92791c00843

-Matt


Re: HAMMER2 snapshots and other help

2019-03-06 Thread Matthew Dillon
Originally I wanted to be able to snapshot subdirectory trees but I could
never make it work.  Snapshots are always of the mount-point now.  The
hammer2 manual page is a bit behind in that respect.  Otherwise, though,
the snapshot command works as advertised.  The 'pfs-list' command lists
available snapshots.

You can mount a snapshot with device@LABEL, but if anything on that
device has already been mounted you can use a simple shortcut 'mount @LABEL
...' to mount the snapshot.  Snapshots are writable.  Unlike Hammer-1, in
Hammer-2 the snapshot must be mounted to access it.

-Matt


Re: Realtek NIC patch to try

2019-03-04 Thread Matthew Dillon
Yah, unfortunately the patch didn't fix the 'chip stops working every month
or so' problem on my server (not a big deal, I have other NICs on that box
that I can move the ethernet cable to).  But it didn't break anything new,
either.  So in that respect it certainly doesn't hurt :-)

-Matt

On Mon, Mar 4, 2019 at 1:07 AM Sepherosa Ziehau  wrote:

> Chip stops working.  Well, after all, it's a desktop level chip.
>
> On Mon, Mar 4, 2019 at 1:26 PM Eric Melville  wrote:
> >
> > I put this on a pair of cheap Chinese mini PCs, also 8111/8168. Looks
> good so far.
> >
> > What is the problem seen after a month or two?
> >
> >
> > On February 19, 2019 at 1:00 PM Matthew Dillon 
> wrote:
> >
> > So far so good on the RealTek 8111/8168 PCIe Gigabit Ethernet.  I'll run
> it and see if it continues to lock up every once in a while, but it usually
> takes a month or two to reproduce that particular hw bug.
> >
> > -Matt
> >
> > On Sat, Feb 16, 2019 at 3:47 AM Sepherosa Ziehau < sepher...@gmail.com>
> wrote:
> >
> > Hi all,
> >
> > Please help testing the following patch:
> >
> https://gitweb.dragonflybsd.org/~sephe/dragonfly.git/commit/9df33aeb3d49f4ac11af479ea0f3a2a2a48b538d
> >
> > I have tested it for a while, should be safe to apply.
> >
> > Thanks,
> > sephe
> >
> > --
> > Tomorrow Will Never Die
>
>
>
> --
> Tomorrow Will Never Die
>


Note: current master breaks smartctl (will be fixed in 1-3 days)

2019-03-03 Thread Matthew Dillon
The current master made a change to the kernel callout structure which
broke the CAM ABI used by smartctl, camcontrol, etc.  The problem is that
userland is being exposed to a kernel structure that it should not be
exposed to.

A followup commit will be made in 1-3 days (as early as tomorrow) to fix
the issue and make the kernel compatible with smartctl and so forth again.

-Matt


Re: Can't shell into leaf on IPv6

2019-02-25 Thread Matthew Dillon
The colo had connectivity problems today.  It appears to be resolved now.

-Matt


Re: desktop problem

2019-02-19 Thread Matthew Dillon
Make sure /etc/rc.conf has:

dbus_enable="YES"

You should be able to add and remove packages at-will with the 'pkg'
program as root.  If it isn't on the system you can bootstrap it with:

cd /usr
make pkg-bootstrap

-Matt

On Tue, Feb 19, 2019 at 3:46 PM Jonathan Engwall <
engwalljonathanther...@gmail.com> wrote:

> Downloading 5.4.1 now hoping for better results.
> Any advice, suggestion, inquiry is welcome.
> Jonathan Engwall
>
> On Tue, Feb 19, 2019, 2:06 PM Jonathan Engwall <
> engwalljonathanther...@gmail.com wrote:
>
>> Also you should know that I am using it on virtualbox.
>>
>> On Tue, Feb 19, 2019, 1:52 PM Jonathan Engwall <
>> engwalljonathanther...@gmail.com wrote:
>>
>>> I have spent all day working on this, since 8 a.m. it is now 1:45. X11
>>> and DBUS did not configure properly. Mkdesktop is the pkg I used to build
>>> KDE and now, because /var/run/dbus is literally not there I cannot log in.
>>> I need to bypass the desktop to get console access. I have used the live
>>> image several times and made changes which ultimately have not produced
>>> results.
>>> I have changed the entry for tty8, I have written an .xinitrc, I have
>>> changed the .login and .login-config both to BAK.login. And I get nowhere.
>>> At this point I want to remove KDE, but I can't. Using the live image as
>>> root does not allow me to remove any packages.
>>> How can I get around the desktop login?
>>> Jonathan Engwall
>>> Also I have noticed that /etc/X11/ is empty
>>>
>>


Re: Realtek NIC patch to try

2019-02-19 Thread Matthew Dillon
So far so good on the RealTek 8111/8168 PCIe Gigabit Ethernet.  I'll run it
and see if it continues to lock up every once in a while, but it usually
takes a month or two to reproduce that particular hw bug.

-Matt

On Sat, Feb 16, 2019 at 3:47 AM Sepherosa Ziehau 
wrote:

> Hi all,
>
> Please help testing the following patch:
>
> https://gitweb.dragonflybsd.org/~sephe/dragonfly.git/commit/9df33aeb3d49f4ac11af479ea0f3a2a2a48b538d
>
> I have tested it for a while, should be safe to apply.
>
> Thanks,
> sephe
>
> --
> Tomorrow Will Never Die
>


Re: 5.4 hangs in UFS

2019-02-11 Thread Matthew Dillon
This feels more like an issue with the I/O and not with UFS specifically.
But since you tried two different storage devices it couldn't be that.
Perhaps there is a power or overheating issue on the system.

-Matt

On Sun, Feb 10, 2019 at 1:42 PM Eric Melville  wrote:

> Hello there,
>
> After installing 5.4 my system has been getting stuck in UFS, apparently
> in softdeps.
>
> At first I was faulting the -j12 buildworld, but then saw it in lower
> parallel counts, and eventually saw it when looping buildworld with no
> -j option at all.  Then I was faulting my fast new NVMe but eventually
> factored that out too by changing back to an old hard drive.  In any
> case, the faster the hardware and the more work running, the more
> quickly and easily this seems to reproduce.
>
> Typically during the phase that removes old output, the build will hang
> indefinitely.  Some processes continue to run but new ones never get
> going, and the old world clean never makes any progress.  For example,
> an ssh to the host in this state would succeed to connect and
> authenticate, but the new shell never seems to run.
>
> I suppose I should try disabling softdeps next.
>


Re: system crashing with 5.2 and 5.0 but not with 4.8

2019-02-09 Thread Matthew Dillon
EHCI has some sort of niggling problem that I haven't been able to track
down.  I may try to re-port that piece from FreeBSD, if the infrastructure
hasn't diverged too much.  The code in the original port was extremely
fragile.

-Matt

On Sat, Feb 9, 2019 at 9:56 AM  wrote:

> On 10/10/18 7:06 AM, Aaron LI wrote:
> > On Sun, 7 Oct 2018 13:11:42 +0200
> > spuelmasch...@email.de wrote:
> >
> >> Hallo,
> >>
> >> my HP Microserver N40L (Turion2) keeps crashing while or shortly after
> >> installing any 5.x version of DragonFly.
> >> I reinstalled 4.8 and this is running for several days now.
> >>
> >> A memcheck run from a KNOPPIX CD was successful.
> >> KNOPPIX itself was also running for several days.
> >>
> >> Below are two ocr-ed screen photographs of these crashes. I do not know
> >> how to keep that text in a more elegant way.
> >>
> >> Now this is kind of a dilemma.
> >> - hardware ok -> run another OS
> >> - hardware defect -> renew server, run dfly ...
> >>
> >> Hopefully there's a better solution.
> >
> > Hi Sascha,
> >
> > According to your panic logs below, both happened during the installation
> > phase and are related to the EHCI (USB 2.0).
> >
> > Did you use the same USB stick and the same USB port on the machine for
> > 4.8, 5.0 and 5.2 installations?  If yes, this is a strange and serious bug.
> >
> > I don't remember any EHCI-related changes after 4.8.  Maybe others
> > have better knowledge about this.
> >
> >
>
> Hallo Aaron,
>
> Finally I set up the external SATA port to install directly onto
> the SSD which I had been using via a SATA-USB adapter taken from an
> external harddisk.
> Now it's running without any problems since then and is still running.
> The SSD meanwhile went into the case.
>
> For the external SATA port I had to find a modified BIOS for the HP
> microserver.  There are some fans who have the knowledge to modify
> the BIOS and activate the port ...
>
> Thanks for your useful hints.
>
> Tschuess,
> Sascha
>
>
>
>
>
>
>
>
>
> >> = 1 ==
> >> | /mnt/boot/kernel/radeonkmsfwTAHITI_rlc.ko copy-ok
> >>
> >> panic: Bad link elm 0xff8125709368 prev->next != elm
> >> cpuid = 1
> >> Trace beginning at frame 0xff81258d2870
> >> panic() at panic+0x236 0x805ec136
> >> panic() at panic+0x236 0x805ec136
> >> _callout_stop() at _callout_stop+0x3c3 0x80610973
> >> callout_reset() at callout_reset+0x96 0x80610bd6
> >> ehci_interrupt() at ehci_interrupt+0x198 0x821aa3d8
> >> ithread_handler() at ithread_handler+0x2e9 0x805ba109
> >> Debugger("panic")
> >> CPU1 stopping CPUs: 0x0001
> >> stopped
> >> Stopped at Debugger+0x7c: movb $0,0xd31929(%rip)
> >> db> |
> >> = 1 ==
> >>
> >> = 2 ==
> >> >>> Executing '/sbin/newfs_hammer -f -L ROOT /dev/serno/2011090623F8.s2d'
> >> panic: Bad link elm 0xf80125b47368 prev->next != elm
> >> cpuid = 1
> >> Trace beginning at frame 0xf80125ccb870
> >> panic() at panic+0x236 0x805f8666
> >> panic() at panic+0x236 0x805f8666
> >> _callout_stop() at _callout_stop+0x3d3 0x8061d803
> >> callout_reset() at callout_reset+0x96 0x8061d916
> >> ehci_interrupt() at ehci_interrupt+0x198 0x822973c8
> >> ithread_handler() at ithread_handler+0x2e9 0x805c5d09
> >> Debugger("panic")
> >> CPU1 stopping CPUs: 0x0001
> >> stopped
> >> Stopped at Debugger+0x7c: movb $0,0xe67a49(%rip)
> >> db> |
> >> = 2 ==
> >
> >
> > Cheers,
> >
>
>


Re: scheduling permissions (jackd)

2019-01-22 Thread Matthew Dillon
Yah, only the superuser can access the rtprio classes.  I do not recommend
using rtprio though, because you can completely lock up the system if the
program you are running under rtprio goes into a cpu-bound loop.  Try using
nice -n -20 without the rtprio; it should work nearly as well.  You still
need root to be able to nice to -20, but your solution of su'ing from root
to the user works well.
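[Editor's note: a runnable illustration of the nice-based approach. 'sleep' stands in for jackd so the commands work anywhere; without root, the -20 request is clamped to an allowed niceness rather than failing.]

```shell
#!/bin/sh
# Start the daemon stand-in at the highest priority we are allowed to
# request (-20 needs root; unprivileged users get a clamped value).
nice -n -20 sleep 2 &
pid=$!
# Show the niceness the process actually received.
echo "niceness: $(ps -o nice= -p "$pid" | tr -d ' ')"
wait "$pid"
```

Unlike rtprio, a runaway cpu-bound process at nice -20 can still be preempted, so it cannot wedge the whole system.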

-Matt


Re: Wired memory analysis

2019-01-20 Thread Matthew Dillon
Daniel, Francois Tigeot just made a commit to master (not release yet) that
fixes a probable wired-memory leak in the drm (gpu) code.  It could very
well be the cause of the excessive wired memory usage you have been
reporting.  This fix will eventually go into -release but we have to test
it first.  It's in master now.

-Matt


Re: HAMMER2 - mounting root from dm_crypt

2019-01-18 Thread Matthew Dillon
HAMMER2 by default names the PFS based on the partition it was created on,
and the H2 mount code will automatically add an @blah suffix using the same
heuristics.  But both get confused when the mount goes through the mapper,
and that's why the problem occurs.  It might have been a bit of a stretch
on my part to try to be fancy in defining the default names when they
weren't otherwise being provided.  I think we could detect the crypto or
mapper paths and fix the defaults that way, but nobody has gotten around to
coding and validating it.

-Matt

On Fri, Jan 18, 2019 at 12:38 PM Daniel Bilik  wrote:

> Hi, Aaron.
>
> On Fri, 18 Jan 2019 23:02:45 +0800
> Aaron LI  wrote:
>
> > Since the HAMMER2 FS was created with '-L ROOT', it requires '@ROOT' to
> > be mounted.
>
> Ah, that was it. Thank you for pointing out this detail. I've not expected
> it could be related.
>
> > Another simple way to workaround your issues is use the following setting
> > in /boot/loader.conf:
> > vfs.root.realroot="crypt:hammer2:/dev/serno/.s1d:rootd"
> > Note the final 'd'!
>
> Indeed, this workaround really works. I've not examined the magic behind
> it but my "ROOT" hammer2 filesystem is correctly mounted during boot with
> original non-patched initrd. Thank you very much for the hint.
>
> --
> Daniel
>

