Re: The state of amdgpu on DragonFly
Our GPU goals are mostly limited to modesetting, we just don't have the dev resources to achieve solid accel support. Of course, if someone wanted to work on accel support that would be great! -Matt
Hammer2 related fixes in-tree, plus more
Both master and release now have fixes for a fairly serious but specialized hammer2 bug. Updates are recommended if you mount more than one PFS from the same block device, or if you regularly run things which generate a lot of write activity to the filesystem.

The hammer2 bug is in the bulkfree code. When more than one PFS is mounted from the same device (which is fairly atypical for most people), there is a window during the bulkfree where blocks might be marked as completely free which are actually not free. If reallocated prior to the completion of the bulkfree, filesystem corruption can result. The fix has been tested thoroughly: we basically ran an unpack and the grok indexer on around 20 million files and periodically ran a bulkfree during the operation over two days to ensure that the code fixes work properly.

A second bug in the kernel was also fixed, this one related to systems that process many (typically in the millions of) files and directories that are in relatively deep directory trees. The vnode recycler could get into a situation where it would not be able to make any progress cleaning up inactive directory vnodes due to dangling namecache references cached by the system. A sufficient number of vnodes in this state could fill up the inactive vnode list and prevent the system from being able to recycle vnodes, and thus also prevent it from being able to allocate new vnodes.

We will roll a sub-release in a week or two. The fixes are both in the release tree and in master, so anyone who wishes to update can do so from system sources with a few simple commands as outlined in our upgrading page: https://www.dragonflybsd.org/docs/handbook/Upgrading/ -Matt
Bug fix to ipfw in tree, bug fix to hammer2 bulkfree when many inodes are present
There was a bug in ipfw related to adding IP addresses and networks to a table. When adding a mixed set of networks and hosts, where the hosts occur after the networks, ipfw enters the hosts incorrectly by retaining the network mask from the most recent network IP added on the same command line:

ipfw table 1 add 10.0.0.0/8 192.0.2.1    # 2^24 + 1 addresses

A fix has been pushed to both -release and master. If it affects you, it can be worked around either by upgrading, or by ensuring that only one IP or network is added to the table for each 'table add' command.

In other news, hammer2 bulkfree operations on filesystems with a very large number of inodes (e.g. a hundred million inodes) could exhaust kernel memory allocating side-structures during deep recursions. This bug has finally been fixed! Prior fixes only made some headway but did not completely fix it; the fix I just pushed to master resolves the issue entirely. The symptom is that the machine panics, typically during an overnight automatic hammer2 bulkfree operation, or, on lower-memory machines, that the machine locks up during a bulkfree operation. This latest fix is currently only in master for testing and will be cherry-picked to the release branch in a month or so. -Matt
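The mask-retention bug described above can be illustrated with a small simulation (this is hypothetical Python, not the actual ipfw source; the helper name and structure are made up for illustration):

```python
import ipaddress

def parse_table_add(tokens, buggy=False):
    """Parse 'table add' arguments into network entries.

    Illustrative sketch of the described bug: the buggy variant keeps
    the previous network's mask for a bare host address instead of
    defaulting it back to /32.
    """
    entries = []
    masklen = 32                 # default mask for a bare host address
    for tok in tokens:
        if '/' in tok:
            addr, m = tok.split('/')
            masklen = int(m)     # explicit mask given on this token
        else:
            addr = tok
            if not buggy:
                masklen = 32     # fixed behavior: a bare host is always /32
        entries.append(ipaddress.ip_network(f'{addr}/{masklen}', strict=False))
    return entries

# The example from the report: a network followed by a bare host.
tokens = ['10.0.0.0/8', '192.0.2.1']
print(parse_table_add(tokens, buggy=True)[1])   # 192.0.0.0/8 (2^24 addresses)
print(parse_table_add(tokens, buggy=False)[1])  # 192.0.2.1/32 (1 address)
```

The workaround in the post corresponds to never letting a bare host token follow a network token on the same command line.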
Re: The DFly website is down.
Yah, sorry about that folks. Down overnight on 2 successive days. It turned out to be a kernel memory exhaustion bug on the blade that is routing the dfly network. It also handles one of our backups and there was a bug where the hammer2 bulkfree scan on the backups (around 64 million inodes) ran out of kernel memory due to a depth deferral mechanism in the recursive radix tree scan. Should all be fixed now. I'll have to start thinking of ways to clean up H2 radix trees that get fragmented due to constant file deletions and creations. -Matt
Re: lookupdotdot failing again
I'll go through the namecache changes between the two versions and see if I can find something obvious. -Matt
tmux dport broken, will be fixed asap
The tmux dport is broken in master (cpu-bound loop on startup). We have a fix for it and will get an updated binary package in-place asap. -Matt
Re: gcc8 deprecated but dependencies exist
Ok. Antonio will look into it. We do override some of the GCC specifications in the FreeBSD ports tree when translating them to DPorts. You can ignore the warning for now (as long as the package works). It isn't quite as bad as the message says. -Matt

On Tue, Aug 31, 2021 at 8:29 PM Phansi wrote:
> Upgraded dragonfly setup recently. Got this message:
>
> Message from gcc8-8.5.0_1:
> ===> NOTICE:
> This port is deprecated; you may wish to reconsider installing it:
> Unsupported by upstream. Use GCC 10 or newer instead.
>
> Cannot seem to remove gcc8 as blas and lapack depend on it.
> Did remove all three and then found that (re)installing blas/lapack
> still requires gcc8.
> Any suggestions? I do not use dports.
>
> cheers
> phansi
Re: [Code Bounty] NVMM hypervisor landed in DragonFly
Yes, what we decided to do for this (and probably all the bounties that get completed... Tuxillo is vetting the VALGRIND work next) is that they will be paid out of the DragonFly paypal account, and then the individual contributors to the bounty can pay into the DFly paypal account at their leisure. -Matt
Re: lookupdotdot failing again
Well, I'm a bit at a loss at the moment. Try exporting the base filesystem with NFS instead of exporting the nullfs mount. See if that works more reliably. -Matt
Re: Hammer errors.
Upgrade to 6.0 for sure, as it fixes at least one bug in HAMMER2, to eliminate that possibility. RAM is a possibility, though unlikely. If you are overclocking, turn off the overclocking; an overclocked CPU can introduce corruption more easily than overclocked RAM can. And check the dmesg for any NVMe related errors. -Matt
Re: Hammer errors.
It looks like several different blocks failed a CRC test in your logs. It would make sense to try to track down exactly where. If you want to dive into the filesystem meta-data, you can dump it with full CRC tests using:

hammer2 -vv show /dev/serno/S59ANMFNB34055E-1.s1d > output-file

(save the output to a file that is not on the filesystem being checked). Then look for 'failed)' lines in the output and track the inodes back to see which files are affected. It's a bit roundabout and you have to get familiar with the meta-data format, but it gives the most comprehensive results. The output file is typically a few gigabytes (depending on how big the filesystem is).

For example, I wound up with a single data block error in a mail file on one of my systems, easily rectified by copying-away the file and then deleting it. I usually dump the output to a file, run less on it, and search for failed crc checks:

data.106 00051676000f 206a/16 vol=0 mir=00149dc6 mod=02572acb lfcnt=0 (xxhash64 32:b65e740a8f5ce753/799af250bfaf8651 failed)

A 'quick' way to try to locate problems is to use tar, something like this. However, tar exits when it encounters the first error, so that won't find everything, and if the problem is in a directory block that can complicate matters.

tar --one-file-system -cvf /dev/null /

-Matt
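Since the dump can run to gigabytes, a trivial filter along these lines can pull out just the failing blocks (a hypothetical helper, shown against a made-up two-line excerpt in the style of the `hammer2 -vv show` output quoted above):

```python
def find_failed_checks(lines):
    """Return dump lines that record a failed block check ('failed)' marker)."""
    return [ln.rstrip() for ln in lines if 'failed)' in ln]

# Made-up excerpt imitating `hammer2 -vv show` output; only the second
# block's stored/computed check codes disagree.
dump = [
    "data.105 000516760000 206a/16 vol=0 mir=00149dc5 (xxhash64 ok)\n",
    "data.106 00051676000f 206a/16 vol=0 mir=00149dc6 mod=02572acb "
    "lfcnt=0 (xxhash64 32:b65e740a8f5ce753/799af250bfaf8651 failed)\n",
]
for ln in find_failed_checks(dump):
    print(ln)   # prints only the data.106 line, which failed its check
```

In practice you would feed the saved dump file to such a filter (or just grep for "failed)") and then trace the listed blocks back to their inodes.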
Re: Thousands of "lookupdotdot failed" messages - should I be worried?
Yah, keep monitoring it. I kinda suspect what is happening is that one or more processes on the clients are winding up CD'd into NFS directories that then get renamed or deleted/replaced, or something like that. But it could easily also have been a hash collision in the file handle calculation the server makes. -Matt
Re: Thousands of "lookupdotdot failed" messages - should I be worried?
Oh, I forgot to mention... this patch changes the FSID for null mounts, so any clients should umount, then reboot the server, then the clients can remount. Or reboot the clients after rebooting the server. -Matt
Re: Thousands of "lookupdotdot failed" messages - should I be worried?
Ok, you could try this kernel patch and see if it helps, if you know how to rebuild the system from sources. The client-side I/O errors would go hand-in-hand with the dotdot stuff, but only because this is a NFS mount. The server side should not be having any I/O errors, and no filesystem data is in error or corrupted or anything like that.

http://apollo.backplane.com/DFlyMisc/nullfs01.patch

Other questions I have are: (1) Just how many NULLFS filesystems are there? (2) Are you making multiple NFS mounts based on the same source path? (3) And finally, what is the underlying filesystem type that the nullfs is taking as its source?

What I try to do in this patch is construct a FSID based on the nullfs's destination path instead of its source path, to try to reduce conflicts there. Another possible problem is that the nullfs's underlying filesystem has a variable fsid due to not being a storage-based filesystem, i.e. if the underlying filesystem is a NFS or TMPFS filesystem, for example.

The only other thing I can think of that could cause dotdot lookup failures is if you rename a directory on one client and try to access it from another, or rename a directory on the server and try to access it from a client that had already cached it. Directories are kinda finicky in NFS. -Matt
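The idea of keying the FSID off the destination path rather than the source path can be sketched like this (purely illustrative Python; the kernel's actual FSID derivation is different, and crc32 here is just a stand-in hash):

```python
import zlib

def fsid_from(path):
    """Stand-in for the kernel's FSID derivation (illustration only)."""
    return zlib.crc32(path.encode())

# Two null mounts exporting the same source directory to NFS clients:
src = '/data'
dests = ['/export/a', '/export/b']

# Keyed off the source path, both mounts get the same FSID and collide.
source_keyed = [fsid_from(src) for _ in dests]

# Keyed off the destination path (the patch's approach), the FSIDs differ.
dest_keyed = [fsid_from(d) for d in dests]

print(source_keyed[0] == source_keyed[1])  # True  (collision)
print(dest_keyed[0] == dest_keyed[1])      # False (no collision)
```

Distinct FSIDs matter because NFS file handles embed the FSID; colliding or unstable FSIDs are one way clients can end up with stale handles and lookup failures.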
Re: Thousands of "lookupdotdot failed" messages - should I be worried?
I think this might be an issue with the filesystem id for the NULLFS exports changing on reboot (or remount). I thought I had figured out a solution for that but apparently not. I'll have to think about this a bit. -Matt
Re: Thousands of "lookupdotdot failed" messages - should I be worried?
If you are not seeing any actual I/O errors in the dmesg output, then there is probably no issue with the filesystem. The dotdot warnings might be some edge case being caused by the null-mounts (because the null-mount has a mount point, but it's being mounted on top of a sub-directory in an underlying filesystem). If you can track down the operation that is causing the message, it might just wind up being a patch to the kernel to get rid of the console warning for that particular case. If you can find a simple configuration that I can throw onto a test box to get the same error, I can track down the issue and fix it. -Matt
Re: Improving I/O efficiency and resilience
There are a lot of potential failure points; it's a long chain of software and hardware from the application's behavior all the way down to the storage device's behavior in a failure. Failure paths tend to not be well-tested. Reliability guarantees are... kinda just a pile of nonsense really; there are just too many moving parts, so all systems in the world rely on stability first (i.e. not crashing, not failing in the first place). Redundancy mechanisms improve matters up to a point, but they also introduce further complexities. This should be readily apparent to everyone, since nearly every service in existence sees regular glitches. Be it Google (GMail and Google Docs, for example, glitch out all the time), brokerage, bank, ATMs, whatever. Fail-over subsystems can twist themselves into knots when just the wrong sequence of events occurs. There is a limit to just how reliable one can make something.

For an application, ultimately the best guarantee is to have an application-specific remote log that can be replayed to restore corrupted state. That is, to not entirely rely on localized fail-over, storage, or other redundancy mechanisms. One then relies on the near impossibility of the dedicated remote log machine crashing and burning at exactly the same time the primary servers crash and burn.

For HAMMER2, well... our failure paths are not well tested, like with most other filesystems. Usually I/O failures are simulated for testing, but actual storage system failures can have different false-flag behaviors. What HAMMER2 does is flush in two stages. In the first stage it asynchronously writes all dirty blocks except the volume header (it is a block copy-on-write filesystem, so writing dirty blocks does not modify the originals). Then it waits for those asynchronous writes to complete. Then it issues a device flush. And finally it writes out an updated volume header.
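The two-stage flush ordering described above can be sketched as follows (assumed pseudocode, not the actual HAMMER2 implementation). The key property is that the volume header is written only after every new block and a device-level flush have completed, so a crash at any earlier point leaves the old header pointing at the old, consistent tree:

```python
# Record the order of operations so the ordering invariant can be checked.
log = []

def write_block(b):  log.append(('write', b))   # async write of a new block
def wait_io():       log.append(('wait',))      # wait for async writes
def device_flush():  log.append(('flush',))     # force drive cache to media

def hammer2_flush(dirty_blocks, volume_header):
    for b in dirty_blocks:      # stage 1: write copy-on-write blocks;
        write_block(b)          # the original blocks are never modified
    wait_io()                   # wait for all asynchronous writes to finish
    device_flush()              # commit the device's volatile write cache
    write_block(volume_header)  # only now commit the new tree root

hammer2_flush(['meta0', 'data1'], 'volhdr')
print(log)
```

A crash before the final `write_block('volhdr')` is harmless: on reboot the old volume header is still in place and none of the new blocks are reachable from it.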
Any system crash occurring prior to the writing out of the updated volume header simply restores the filesystem to its pre-flush state upon reboot, because the old volume header is not directly or indirectly pointing to any of the new blocks. And for DFly, an async block write failure leaves the buffer marked dirty, so the filesystem data and meta-data state remains consistent on the live system (even if it cannot be flushed). This is a choice taken from a list of bad choices, because leaving a block dirty means that dirty blocks can build up in ram until you run out of ram. But it is better than the alternative (presenting stale data to a filesystem and/or to an application, which then causes a chain reaction of corruption on a running system).

But realistically, even the most sophisticated fault-tolerant systems hit situations which require manual intervention. There are just too many moving parts in a modern system that depend on a multitude of behaviors that are specified by standards but not necessarily followed at every stage. So, ultimately, the best protection remains having application-level redundancy via a replayable remote log (versus kernel, filesystem, or block-level redundancy). Other forms of redundancy can reduce error rates but cannot eliminate them, and ultimately reach a point where the new potential failure conditions introduced by the added sophistication exceed the failure conditions being protected against.

Also, redundancies can introduce points of attack. If you want to crater the performance of a competitor through hacking, the redundancy subsystems offer a tempting target. Almost universally, even commercial systems rely on stability, and the added redundancies are only able to deal with a subset of 'common' problems on a live system. And then they fall back to replaying logs to restore otherwise unrecoverably corrupted state. -Matt
Re: Help to configure automatic fan speed and cpu frequency on ThinkPad X260
If it is GPU related, we just might not have a working solution for the power management. You could try adjusting the Xorg configuration to use the "modesetting" driver with acceleration disabled, but it's a long shot. It kinda feels like the GPU is defaulting to a power-consumption mode that is forcing the fans to run. -Matt
Re: crc errors and installation troubles
Well, it's clearly getting I/O errors trying to access the drive. That could point to several possibilities:

* Bad SATA port. Try connecting the cable to a different SATA port on the mobo.
* Bad drive. But you already changed the drive out, so maybe not a bad drive.
* Bad power supply. The power supply could be going wonky and causing the drive to reset.
* Bad SATA cable. Sometimes old SATA cables can't handle the higher bandwidths of newer motherboards and drives. I have piles of cables at home, and over the last 10 years I have had to throw half of them away because they could only handle SATA I or SATA II speeds.

-Matt
Re: TeX Live 2021 status on DragonFlyBSD
Thanks Nelson, this looks great! I will look into generating a dragonfly ports build (built from sources) for it. -Matt
Re: Booting issues
Looks like the undo area got corrupted. The filesystem will probably refuse to mount. You may have to recover the contents to a new filesystem on another storage device using the 'hammer recover' utility. Since this is your root mount, you may be able to boot into the rescue ramdisk (usually boot option 'r') to get a very basic shell and then work through the problem from there. Sometimes it is easier to boot a DFly image from a usb stick and login as 'root' to work the problem from there. -Matt
heads up - vkernels will be broken on master for a while
It turns out that some of the pmap work last year is making MAP_VPAGETABLE memory maps basically not work properly. The pmap work was important and it is undesirable to back any of it out, so what we will have to do instead is take out MAP_VPAGETABLE support which will break vkernel support in master until we can get hardware virtualization operational again. This does not affect running dragonfly in a virtualized environment, it just affects running the 'vkernel's under dragonfly. I hope to bring vkernel support back, but we will have to do it under the umbrella of adding more reliable HVM (hardware virtualization) support to DragonFly. -Matt
Re: Binary packages available for "staged" branch of DPorts?
Antonio (tuxillo) and I build 'staged' regularly, but the staged repos are always in a state of flux and I don't recommend depending on them. That said, I often have an external URL available for binary repo access to test bulks, and at the moment it is operational and on staged: http://apollo.backplane.com/Ripper/live_packages/ Your download bandwidth from this site is going to be pretty horrible though, as it is in my home. Maybe Antonio can set up a semi-official staged HTTP link from the colo. Again, these links are always in flux, so don't depend on them :-) Antonio believes he can get the next sync to master done by the end of the year. It will have the chromium fix. -Matt
Re: ifconfig seg fault
It sounds like the kernel and the world are out of sync with each other. e.g. old kernel but new world, or new kernel but old world. -Matt
Re: hammer2: ls reports "No such file or directory"
Theoretically, as long as the RAID system properly handles the FLUSH command, it should be ok to run it with a volatile write cache. But if it just accepts the FLUSH command without flushing the write cache, then bad things can happen.

That error is a CRC failure. Sorry, I wasn't clear before: CHECK FAIL is typically a CRC failure, meaning the stored and computed CRCs don't match. In terms of what you can do... usually it's best to back up and reformat. In this case it looks like a single inode, #49686, is messed up. You can try destroying that inode with 'hammer2 -s destroy-inum 49686', then re-run the bulkfree and see if there are any more issues.

Generally speaking, if the CHECK FAIL is at an inode, you can 'hammer2 destroy-inum ...' the inode and then 'hammer2 destroy ...' any directory entries that were pointing to that inode. But if it is at an indirect block, then the filesystem is probably really messed up and the only real choice is to back up, reformat, and restore. Those hammer2 directives are extremely dangerous; I recommend making a full backup before messing around any further. -Matt
Re: hammer2: ls reports "No such file or directory"
Generally speaking, this error occurs if a directory entry is present but the related inode cannot be found. You can use a hammer2 directive to destroy the directory entry to clean it up. But before you do so, you want to check the media for CHECK FAIL errors. The easiest way to do this is to just read off the entire directory structure with tar, e.g. 'tar cf /dev/null filesystem', and then check the dmesg output for errors: 'dmesg | fgrep CHECK', something like that.

If the filesystem appears clean other than the disconnected directory entry, then you can use 'hammer2 destroy filename' to destroy the directory entry. Be very careful when doing that. If the filesystem has other problems, such as CRC errors or other CHECK errors, then it is best to make a full backup and reformat. Also make sure that bulkfree runs don't have errors: 'hammer2 bulkfree ...' and then check the dmesg output as well.

In terms of how a disconnected inode can happen: it has become more rare, but it might still be possible if a power failure or panic occurs during heavy filesystem activity. It shouldn't be possible for CRC errors to occur unless the media itself corrupted the data. -Matt
security update to ftpd now in master and release.
ftpd has received a fairly significant security fix, and updating it to the latest version on the master or release branch is recommended if you use it. That said, nobody should really be using either ftpd or telnetd anymore these days; neither is turned on by default, and we are contemplating removing both from base entirely. -Matt
Re:
I'm fairly sure whatever is causing it is related to the VM and not the filesystem. But what, I don't know. Maybe something related to the storage device implementation on the VM. -Matt
Re:
It looks like an inode became corrupt. How this happened I don't know, but probably the easiest solution in this particular case is to use the 'hammer2' utility to destroy the inode and to delete the bad directory entry (man hammer2), then run a few hammer2 bulkfree passes and see if that cleans it up. Usually when a H2 filesystem becomes corrupt it points to some other issue in the system. This being a VM, there could be any number of potential issues causing the corruption but I don't have any ideas as to what it was in this case. It is usually best to copy the data off and reformat after such events. -Matt
vi or X updates utilize screen switching - how to disable
Recent ports merges or the recent vi update (not sure which) turned on screen switching by default. This is where you edit a file with vi (or other editors) and, when you quit, the xterm's contents go back to what they were before vi was run. If you don't like this behavior (and frankly, I really really dislike it myself), the simple solution is to add an X resource for xterm. Create a .Xresources file with this in it:

xterm*titeInhibit: true

And install it with xrdb in your .xinitrc:

xrdb -merge .Xresources

I'm not sure about other X terminal apps, but I presume they have a similar feature. -Matt
Re: git: calendar(1): Rewrite to support Chinese & Julian calendars
Wow, those are pretty serious calculations! Very cool! A really nice update to the utility. -Matt
Re: may I suggest two new symlinks: newfs_ufs for newfs and newfs_hammer1 for newfs_hammer ?
For now I'd rather not. It would certainly be less ambiguous, but there is a lot of history there and, again, potential backwards compatibility issues. One thing that would be nice would be for 'newfs' to look at the filesystem type in the disklabel for the specified device (if possible) and exec the correct newfs_* program based on that type. Another big problem, though, is that the options and other arguments differ for each newfs_* variant. -Matt
Re: State of IPSEC
IPSEC is gone. Most people use explicit VPNs these days (openvpn works with DragonFly quite well, for example). -Matt
Re: VPN options
I personally use openvpn to good effect. Getting the keys set up is rather annoying and it takes a bit of time to get the network configuration working properly, but once it is operational it works perfectly. I usually pre-create a tap interface and configure openvpn to use that. -Matt
Re: pf table size limit
There is a table-entries limit specified; you can see the current settings with 'pfctl -s all'. You can adjust the limits in the /etc/pf.conf file containing the rules, with a line like this near the top:

set limit table-entries 10

-Matt
Re: Wrecked kernel
It really sounds like a lot of random stuff is missing and you should reinstall. If the machine's hardware is not stable, then that is the main issue that needs to be addressed. -Matt

On Thu, Jun 18, 2020 at 5:06 PM Jonathan Engwall wrote:
> Static libraries disappear for no reason.
> ldconfig -i/usr/local/lib recovers them, though it must be run repeatedly.
> After that I wrenched X into semi-working order, without a fully functioning
> window manager. Screen and mouse check!
> It is enough work to make me reinstall.
> How is the 5.8.1 installation process, honestly?
>
> On Wed, Jun 17, 2020, Jonathan Engwall wrote:
>> New problems have begun. In the build world of 5_8_1 I am stuck at something
>> that looks like this, if you can forgive my cellphone-only state:
>> --- libgreputils.a ---
>> building static greputils library
>> rm -f libgreputils.a
>> ar can libgreputils.a 'lorder argmatch.o c-strcasecmp.o ...
>> Many .o files follow. After that specific line ends, I have this:
>>
>> /usr/libexec/binutils227/elf/nm: 'a file or another.o' : No such file
>>
>> I did check, nm is an executable. Currently I cannot make symlinks either.
>> What might be my trouble? How can I figure out what is going on?
>
> On Wed, Jun 17, 2020, Jonathan Engwall followed up:
>> After replacing files from my partial backup, X11 is acting strangely:
>>
>> Cannot open log file "/root/.local/share/xorg/Xorg.0.log"
>>
>> The log file is alive and well where it has always been, in
>> /var/log/Xorg.0.log. There is mention of this on the internet, mostly on
>> ArchLinux, nothing useful however.
>> Another problem: if I compile kernel 5.8 can I get updates? Or is it so that
>> I need to buildworld... because that was looking bad... but if I could get
>> updates maybe it would help.
>> Thank you anyone at all for any help.
>> Jonathan Engwall
Re: Damaged kernel
Not enough info to really help you here. Try going into the BIOS (usually the F2 or DEL key in early boot) and check the boot order, it might be trying to boot from internal storage first and USB second and needs to be switched around. -Matt
Re: dscheck(vn0): b_bcount 2 is not on a sector boundary
Generally speaking these messages look like one or more of these window managers are trying to probe all raw storage devices with sector sizes that are not compatible with the devices. I'm guessing they are doing so in order to try to implement e.g. auto-mount or something similar (detecting usb stick insertions, etc). The error messages themselves can just be ignored. Whether the desktops do the right thing or not is another question entirely. -Matt
Re: jail questions
We did a bunch of work on localhost bindings for jails. Basically it works as follows:

* You can specify a list of IPs for the jail (more than one if you desire).
* Any 'localhost' binding within the jail will automatically be translated to the localhost IP specified when the jail was created, or the host IP if no localhost IP was specified.
* You can, if you desire, isolate the jail's localhost by passing something other than 127.0.0.1 to the jail. You can add 127.0.0.2 to the localhost interface as an alias, for example, and pass 127.0.0.2 into the jail. Any bindings to 127.0.0.1 within the jail will actually bind to 127.0.0.2 from the point of view of the host outside the jail.

There are also a number of jail sysctl variables which affect how network addresses are handled. -Matt
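The translation rule described above can be sketched in a few lines (a simplified, hypothetical model of the behavior, not the kernel code; the function name and addresses are made up for illustration):

```python
def translate_bind(addr, jail_localhost_ip, jail_host_ip):
    """Model of the jail localhost-binding translation described above.

    A bind to 127.0.0.1 inside the jail is rewritten to the jail's
    configured localhost IP, or to the jail's host IP when no localhost
    IP was specified.  Other addresses pass through unchanged.
    """
    if addr == '127.0.0.1':
        return jail_localhost_ip if jail_localhost_ip else jail_host_ip
    return addr

# Jail created with 127.0.0.2 passed in as its isolated localhost alias:
print(translate_bind('127.0.0.1', '127.0.0.2', '10.0.0.5'))  # 127.0.0.2

# Jail created with no localhost IP: the bind lands on the host IP.
print(translate_bind('127.0.0.1', None, '10.0.0.5'))         # 10.0.0.5
```

The point of the alias scheme is that each jail's "localhost" traffic stays distinct from the host's 127.0.0.1 and from other jails' localhost traffic.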
Re: UNIX signals on DragonFly
We can add defines, but someone would need to research it. Linux does have a RTSIG_MAX (set to 32), which we do not appear to have. There is also a __SIGRTMAX define in Linux (set to 64). I don't see anything for a RTSIG_MIN, however. -Matt
Re: Upgrade from 5.5 failed
If you installed the world and kernel properly, then 'uname -a' should show the dragonfly version. If it looks right, you can try forcing a pkg update with:

pkg update -f
pkg upgrade -f

-Matt
Re: Upgrade from 5.5 failed
The most recent package sync was May 2nd, which includes chromium version 81 for both -release and -master. -Matt
Re: HAMMER2 PFSes
For now don't try to cluster anything. That work is still in-progress. You can create multiple independent masters on the same device, and you can snapshot them, as well as be able to write to the snapshots. That all works. The snapshots basically work the same as masters. -Matt
Re: New "chromium" does not sync bookmarks or any data
It should just be the "~/.config/chromium/Default/Bookmarks" file. I don't think it's changed format. See if you can find your old bookmarks file. -Matt
Re: TeX Live 2020 status on DragonFlyBSD
We usually pull TeX via FreeBSD ports. Looks like there is an open PR in FreeBSD ports to update it. We will poke them a bit on the side channel from our side to see if we can get the port updated! -Matt
Re: mdconfig
We don't have any all-in-one utilities to manage jails, just the basic jls, jexec, and jail utilities. The jail(8) manual page should be mostly up-to-date with recent changes to improve options management.

MD is very, very old. Ancient, really. It was removed ages ago because it was old, crufty, and didn't work as well as VN or TMPFS. It basically formatted a UFS filesystem backed by ram. You can use the VN device if you need a block device backed by ram or swap, or the TMPFS filesystem if you just need a ram- and swap-backed temporary filesystem. We can't bring MD back, sorry. -Matt
Re: Byhve
preferred-stack-boundary is an option that specifies what the compiler should align procedure stacks to, as a power of 2. So the value 4 means to align procedure stacks to 16 bytes. You can probably remove this option; modern compilers should align elements properly by default. Older compilers did not.

All of the floating point options such as '-mno-sse5' generally tell the compiler not to use FP instructions or registers in generated kernel code. This is because the kernel does not fully save the user floating point state across system calls or interrupts. These options, or the equivalent for clang, are mandatory.

I don't know what the -mno-abm option does. The indirect-branch mode... ummm, I think that turns on retpoline security mitigations for indirect calls, but I'm not sure. The inline-limit option tells the compiler to allow larger inline functions to be inlined. You can probably remove this. -Matt
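The power-of-two convention mentioned above works out like this (simple arithmetic for illustration, not compiler code; the example addresses are made up):

```python
def stack_alignment(n):
    # -mpreferred-stack-boundary=N aligns procedure stacks to 2**N bytes,
    # so N=4 means 16-byte alignment.
    return 2 ** n

def is_aligned(addr, n):
    # An address is aligned to 2**N bytes when its low N bits are zero.
    return addr & (stack_alignment(n) - 1) == 0

print(stack_alignment(4))           # 16
print(is_aligned(0x7fffffe0, 4))    # True  (low 4 bits are zero)
print(is_aligned(0x7fffffe8, 4))    # False (low 4 bits are 0x8)
```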
Re: mdconfig
I don't think the FreeBSD tools are going to work, per se. For any ram drives, you should use TMPFS instead of MD; TMPFS is heavily optimized for performance. We only use MD for boot-time strapping of crypto mounts. It's very old and should not be used for anything else. -Matt

On Mon, Feb 24, 2020 at 12:31 PM Quelrond wrote:
> Hello,
> I am new to DragonFly, coming from the FreeBSD world.
> Trying to install the FreeBSD tool for jails management, CBSD
> (https://github.com/cbsd/cbsd), on DragonFly, I was stopped by the absence
> of mdconfig in my DragonFly installation:
> DragonFly drugoj.reseaucloud.local 5.6-RELEASE DragonFly v5.6.2-RELEASE
> It seems that the md kernel module is loaded. Where is mdconfig?
> Best regards,
> Peter
Re: Bhyve
Not sure what you are trying to do but porting bhyve is definitely not an easy undertaking! -Matt On Fri, Feb 21, 2020 at 9:55 PM Jonathan Engwall < engwalljonathanther...@gmail.com> wrote: > I have the headers, folders and vmm.c...but I was thinking if I write over > written memory with a foreign header it would be bad. > And, of course I have to VM to test build a kernel module. > The headers came from a clone of FreeBSD I made from GitHub. Vmm.c I > simply found on the internet. One is missing vmm_ippi.h which I cannot find. > Any suggestions or offers of help? >
Re: DragonFlyBSD 5.6.2 installation failures
Hmm. We changed from hardwiring /dev/da8 to the part-by-label specification to try to avoid situations where the disk configuration didn't match a basic setup. If the kernel boot is recognizing and attaching the usb stick then it should work. Perhaps the problem is that it is having problems recognizing the USB port the stick is plugged into. You could try plugging it into different ports. I would definitely stick with the .img (usb image) files, and AHCI mode. A normal dd without special options usually works for copying (I usually do bs=32k to improve copy performance). It might also be worth trying to install the latest master image rather than the release image. If it recognizes the drive but is having problems with the uuid labeling, you can mount the usb image on another dfly box (usually something like mount /dev/da8s2a /mnt) and mess around with /mnt/boot/loader.conf and /mnt/etc/fstab, then umount and try booting with that image. -Matt
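[The copy step Matt describes is "dd if=dfly.img of=/dev/da8 bs=32k", where /dev/da8 is a placeholder for whatever device node your USB stick attaches as. The same invocation can be rehearsed safely on ordinary files first and verified with cmp(1) — a sketch:]

```shell
# Rehearse the image copy on plain files before pointing dd at a real device.
img=$(mktemp) && dup=$(mktemp)
dd if=/dev/urandom of="$img" bs=32k count=4 2>/dev/null   # stand-in for the .img file
dd if="$img" of="$dup" bs=32k 2>/dev/null                 # bs=32k improves copy speed
cmp -s "$img" "$dup" && echo "copy verified"
rm -f "$img" "$dup"
```

For the real thing, substitute the downloaded image for if= and the raw USB device for of=, and double-check the device name first with dmesg.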
Re: HAMMER2 questions
On Mon, Jan 27, 2020 at 11:53 PM Chuck Musser wrote: > I've run Dfly for several years now, but never paused to understand > HAMMER2 (or for that matter, its predecessor) very deeply. I've used > undo(1) a few times, but haven't dug into concepts like clustering or > even what exactly PFSes are used for. I have a feeling I'm missing out > on interesting things. So I'm finally getting around to asking a > scattershot bunch of questions. I'm using my simple one-disk setup as a > starting point. Its df -h output is below: > > Filesystem Size Used Avail Capacity Mounted on > serno/5QE57G6D.s1d 193G 1054M 192G 1% / > devfs 1024B 1024B 0B 100% /dev > /dev/serno/5QE57G6D.s1a 1022M 187M 753M 20% /boot > /dev/serno/5QE57G6D.s1e@DATA 18.5G 123M 18.4G 1% /build > /build/usr.obj 18.5G 123M 18.4G 1% /usr/obj > /build/var.crash 18.5G 123M 18.4G 1% /var/crash > /build/var.cache 18.5G 123M 18.4G 1% /var/cache > /build/var.spool 18.5G 123M 18.4G 1% /var/spool > /build/var.log 18.5G 123M 18.4G 1% /var/log > /build/var.tmp 18.5G 123M 18.4G 1% /var/tmp > tmpfs 935M 0B 935M 0% /tmp > procfs 4096B 4096B 0B 100% /proc > tmpfs 935M 0B 935M 0% /var/run/shm > > 1. What are the three entries that begin with "serno" and why is the > root slice not prefixed with /dev/ like the other two? What does > the nomenclature mean (specifically "5QE57G6D", "s1a" and "@DATA", for > instance)? > All attached storage devices have a software-readable serial number and can be addressed by that serial number instead of by the raw device. This allows the raw device numbers to probe in random order or for the drive to be moved without having to make adjustments to /etc/fstab. > 2. Are the subdirectories under /build each their own PFS? What do those > do and why are they there? As an administrator, when would I want to > create my own PFSes? I saw someone's example of creating a set of jails > on a dedicated PFS, but I don't understand the implications of doing > that. 
> The installer used to create a PFS for each major subdirectory but no longer does. Instead it just separates everything into three partitions. The 'a' partition is a small UFS filesystem containing /boot. We use UFS for /boot because it's tiny and deletions immediately free up media space. The 'd' partition contains the root mount and anything that one would normally want to backup. For example, /home is on 'd'. The 'e' partition contains the /build mount and has all the stuff one would normally NOT want to backup, such as /var/crash, /usr/obj, /var/log, and so forth. I put /var/spool in there too, but it's debatable. Mostly I did so because it might see significant activity and I wanted the root mount to be as unassuming as possible in crash situations. > 3. What are null mounts? It seems like that's how you mount PFSes, at > least judging by the output of the mount(1) command. > A NULL mount mounts one directory that already exists in the filesystem onto another part of the filesystem. Both places reference the same directory. Also note that these NULL mounts are not PFSs. The installer no longer creates a separate PFS for each directory, it just creates three partitions ('a' for boot, 'd' for root, and 'e' for /build) and then uses NULL mounts to arrange the subdirectories that we want to physically be on /build. > 4. What are clusters, in HAMMER2? Can I use them to create a big > filesystem from storage on separate systems? > Clusters are not implemented yet, but that's the idea. > 5. I think I recall hearing that the remote mirroring feature in HAMMER > is not yet supported in HAMMER2. Is that status still accurate? > This is correct. My recommendation for now is to just use a regular rsync or cpdup, and then use hammer2's snapshot feature to create a frozen snapshot of the backup. Ultimately mirroring will be built into the clustering. Not yet though. > 6. 
Although I have a feeling that RAID configuration is a lower level > construct than HAMMER, what are the options? I did see that the > natacontrol(8) is able to manage RAIDs, but if I happened to get my > hands on NVMe storage, those might call for a different utility? There > is a nvmectl, but it seems like it might be for a different purpose. > > Thanks, > > Chuck > Honestly I don't recommend using NATA or natacontrol in any manner. The ATA code is extremely crufty, old, and just not reliable. Use AHCI mode (which attaches as /dev/da0, /dev/da1, etc...). The AHCI driver is fully supported and reliable. If the BIOS does not support an AHCI-based softraid then your best raid option with DragonFly is some sort of
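[The a/d/e partition split plus NULL mounts that Matt describes corresponds to an /etc/fstab roughly along these lines — a trimmed sketch using the serial number from Chuck's df output; a real installer-generated fstab will differ in entries and options:]

```
# 'a' = UFS /boot, 'd' = HAMMER2 root, 'e' = HAMMER2 /build (DATA PFS)
/dev/serno/5QE57G6D.s1a       /boot      ufs      rw    1 1
/dev/serno/5QE57G6D.s1d       /          hammer2  rw    1 1
/dev/serno/5QE57G6D.s1e@DATA  /build     hammer2  rw    2 2
# NULL mounts arrange /build subdirectories into their visible places
/build/usr.obj                /usr/obj   null     rw    0 0
/build/var.log                /var/log   null     rw    0 0
/build/var.tmp                /var/tmp   null     rw    0 0
tmpfs                         /tmp       tmpfs    rw    0 0
```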
Re: .note.tag, readelf and dsynth version detection
I pushed a hack to dsynth to use the second field if the first is zero, please test. -Matt On Mon, Jan 27, 2020 at 5:05 PM Matthew Dillon wrote: > Hmm. that's a good point. It does look like a repeated structure. I > will look into it. > > -Matt > > On Mon, Jan 27, 2020 at 4:15 PM Romick wrote: > >> I was probably lucky :) Of course, I could be wrong, but it seems to me >> that these pieces are not "fields" of the same structure, these are fields >> that belong to two records in the file. Is the order of these records >> guaranteed? >> >> I mean, it’s possible if I rebuild the world now, the linker will arrange >> these records in a different order and everything will be fine, or maybe >> not :) >> >> On Mon, Jan 27, 2020 at 03:56:25PM -0800, Matthew Dillon wrote: >> > That's ... weird. the 'zero' and the 'version' fields are transposed. >> Are you >> > compiling in any special way? I've tested -release and -master on a >> bunch of >> > boxes and they all have the version in the right spot. >> > >> > -Matt >> > >> > On Mon, Jan 27, 2020 at 1:45 PM Romick >> wrote: >> > >> > Hello, >> > It seems that dsynth defines the system version based on the >> .note.tag(s) >> > in >> > /bin/sh and a necessary condition is that these entries follow in a >> > certain order. On my system this is not so :) >> > >> > == >> > rabbit@fly ~% readelf -x .note.tag /bin/sh >> > >> > Hex dump of section '.note.tag': >> > 0x00400218 0a00 0400 2000 44726167 ...Drag >> > 0x00400228 6f6e466c 7900 0a00 onFly... >> > 0x00400238 0400 0100 44726167 6f6e466c DragonFl >> > 0x00400248 7900 e5a30700 y... >> > >> > rabbit@fly ~% >> > == >> > >> > === /usr/src/usr.bin/dsynth/config.c === >> > struct NoteTag { >> > Elf_Note note; >> > char osname1[12]; >> > int version;/* e.g. 
500702 -> 5.7 */ >> > int x1; >> > int x2; >> > int x3; >> > char osname2[12]; >> > int zero; >> > }; >> > >> > >> > -- >> > with best regards, >> > Yellow Rabbit @yrab...@mastodon.sdf.org >> > DragonFly 5.7-DEVELOPMENT x86_64 >> > >> >> -- >> with best regards, >> Yellow Rabbit @yrab...@mastodon.sdf.org >> DragonFly 5.7-DEVELOPMENT x86_64 >> >
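[The fallback Matt pushed can be modelled outside dsynth: of two 4-byte little-endian version slots, whichever the linker's note ordering filled, take the nonzero one. A toy sketch — it assumes a little-endian host, and the 8-byte file layout here is invented for illustration, not the real .note.tag layout:]

```shell
# Slot one is zero, slot two holds 500702 (0x0007a3de -> bytes de a3 07 00),
# modelling the "transposed" ordering Romick observed.
f=$(mktemp)
printf '\000\000\000\000\336\243\007\000' > "$f"
le32() {  # read an unsigned 32-bit value at byte offset $2 of file $1
    od -An -tu4 -j "$2" -N 4 "$1" | tr -d ' '
}
v1=$(le32 "$f" 0)
v2=$(le32 "$f" 4)
[ "$v1" -ne 0 ] && version=$v1 || version=$v2
echo "detected version: $version"   # 500702, i.e. 5.7 per the struct comment
rm -f "$f"
```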
Re: .note.tag, readelf and dsynth version detection
Hmm. that's a good point. It does look like a repeated structure. I will look into it. -Matt On Mon, Jan 27, 2020 at 4:15 PM Romick wrote: > I was probably lucky :) Of course, I could be wrong, but it seems to me > that these pieces are not "fields" of the same structure, these are fields > that belong to two records in the file. Is the order of these records > guaranteed? > > I mean, it’s possible if I rebuild the world now, the linker will arrange > these records in a different order and everything will be fine, or maybe > not :) > > On Mon, Jan 27, 2020 at 03:56:25PM -0800, Matthew Dillon wrote: > > That's ... weird. the 'zero' and the 'version' fields are transposed. > Are you > > compiling in any special way? I've tested -release and -master on a > bunch of > > boxes and they all have the version in the right spot. > > > > -Matt > > > > On Mon, Jan 27, 2020 at 1:45 PM Romick > wrote: > > > > Hello, > > It seems that dsynth defines the system version based on the > .note.tag(s) > > in > > /bin/sh and a necessary condition is that these entries follow in a > > certain order. On my system this is not so :) > > > > == > > rabbit@fly ~% readelf -x .note.tag /bin/sh > > > > Hex dump of section '.note.tag': > > 0x00400218 0a00 0400 2000 44726167 ...Drag > > 0x00400228 6f6e466c 7900 0a00 onFly... > > 0x00400238 0400 0100 44726167 6f6e466c DragonFl > > 0x00400248 7900 e5a30700 y... > > > > rabbit@fly ~% > > == > > > > === /usr/src/usr.bin/dsynth/config.c === > > struct NoteTag { > > Elf_Note note; > > char osname1[12]; > > int version;/* e.g. 500702 -> 5.7 */ > > int x1; > > int x2; > > int x3; > > char osname2[12]; > > int zero; > > }; > > > > > > -- > > with best regards, > > Yellow Rabbit @yrab...@mastodon.sdf.org > > DragonFly 5.7-DEVELOPMENT x86_64 > > > > -- > with best regards, > Yellow Rabbit @yrab...@mastodon.sdf.org > DragonFly 5.7-DEVELOPMENT x86_64 >
Re: .note.tag, readelf and dsynth version detection
That's ... weird. the 'zero' and the 'version' fields are transposed. Are you compiling in any special way? I've tested -release and -master on a bunch of boxes and they all have the version in the right spot. -Matt On Mon, Jan 27, 2020 at 1:45 PM Romick wrote: > Hello, > It seems that dsynth defines the system version based on the .note.tag(s) > in > /bin/sh and a necessary condition is that these entries follow in a > certain order. On my system this is not so :) > > == > rabbit@fly ~% readelf -x .note.tag /bin/sh > > Hex dump of section '.note.tag': > 0x00400218 0a00 0400 2000 44726167 ...Drag > 0x00400228 6f6e466c 7900 0a00 onFly... > 0x00400238 0400 0100 44726167 6f6e466c DragonFl > 0x00400248 7900 e5a30700 y... > > rabbit@fly ~% > == > > === /usr/src/usr.bin/dsynth/config.c === > struct NoteTag { > Elf_Note note; > char osname1[12]; > int version;/* e.g. 500702 -> 5.7 */ > int x1; > int x2; > int x3; > char osname2[12]; > int zero; > }; > > > -- > with best regards, > Yellow Rabbit @yrab...@mastodon.sdf.org > DragonFly 5.7-DEVELOPMENT x86_64 >
Re: OT: third party relay attack
I last looked at it a few years ago but there were numerous DNS based services that you could use to test IP addresses and domains. But they never worked well... they tended to block a lot of legitimate mail along with the spam, and tended to always be out of date. You can also turn on SPF validation, which actually helps protect against third-party relays (at least for well known domains). There might be some setup required though, it depends on the mail server. You can google up instructions for e.g. turning SPF on with postfix (I think it requires a few perl modules). Instructions won't be accurate for DFly but they will give you a good template for what needs to be done. But if you do that, be sure to test that the mail server is still accepting mail from important domains that you communicate with. It's really easy to misconfigure. -Matt
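[For the SPF route Matt mentions, the usual postfix hookup is an external policy service. He mentions perl modules; the widely-used python policyd-spf is another common implementation. A sketch — install paths and the service name are assumptions and will differ on DFly:]

```
# main.cf -- delegate recipient checks to the SPF policy daemon
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_policy_service unix:private/policyd-spf

# master.cf -- spawn the policy daemon (binary path depends on the port)
policyd-spf  unix  -  n  n  -  0  spawn
    user=nobody argv=/usr/local/bin/policyd-spf
```

After reloading postfix, watch the maillog and confirm mail from important correspondents still gets through before trusting it.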
Re: OT: third party relay attack
There isn't a whole lot that can be done short of white-listing only allowed originators and recipients. Most anti-spam services filter out critical non-spam emails along with the spam. What I do for my personal domain is actually forward all my mail, spam and all, to my gmail account and let Google's spam filters deal with it (to the tune of hundreds of spams a day). And for DragonFlyBSD's domain... we're mostly not using it for email beyond the mailing list server and the mailing list server is essentially white-listed based on the subscriptions. -Matt On Fri, Jan 10, 2020 at 7:36 AM Steffen Nurpmeso wrote: > Pierre Abbat wrote in <3633605.BztBv1gPr2@puma>: > |My mailserver is being attacked by what looks like a botnet since > December \ > |16 > |at 6:07 (11:07 UTC). Many hosts all over the world are sending mail \ > |purporting > |to be from many domains all over the world to a few domains in Russia. \ > |Most of > |the IP addresses are blocked by uceprotect.net; a few are blocked by > other > |blocklists. A few are not blocked, but are rejected with "Relay access > |denied". The messages come at a rate of several per second. > | > |There are 133 emails stuck in leaf's mail queue, but they do not appear \ > |to be > |related to this attack. > > Fwiw, not being an administrator and having had no idea of that > side of the road, i learned to let connections "sleep" for > a while. This is possible with Postfix, for example. First i let > them hang, before blacklist lookups. It reduced those attacks > a little bit. E.g., > > smtpd_relay_restrictions = > sleep NUMBER, > reject_invalid_helo_hostname, > reject_non_fqdn_helo_hostname, > reject_non_fqdn_sender, > reject_non_fqdn_recipient, > sleep NUMBER > > You can set restrictive error counts > > smtpd_soft_error_limit = 1 > smtpd_hard_error_limit = 1 > smtpd_per_record_deadline = yes > smtpd_timeout = 21s > > This i did after i have switched to OpenSMTPD for one day. 
Like > magic, a few hours after i did, there was one connection, it did > nothing for a few seconds, followed by another one, and then these > two started sending mails like grazy to Taiwenese Yahoo addresses > i think it was. They then entered a wave of disconnections and > reconnections with other addresses which continued this work. (My > firewall throttles over time.) Well, i got a nice information > mail from Yahoo Taiwan i think it was saying that they blocked my > IP temporarily because of the activity. Blocking had no influence > on the attack itself. Realizing the OpenSMTPD config error > i fixed that, but their misuse continued, and OpenSMPTD did not > seem to have something like Postfix's _error_limit (my query on > OpenSMTPD bugs/tracker never received an answer), so after > continuously blacklisting the bots' IP addresses i threw away > OpenSMTPD and reinstalled Postfix, with the error_limit reduced > from 3 to 1. Attack over. > > Having said that, it would be tremendous if servers like Postfix, > dovecot, ssh, would offer hooks which would get invoked on > connection establishment and break, to be able to track > un/successful logins as well as "nonsense connections" etc. so > that the entire [di]notify/log file parse sauce could vanish. > Always strived me being total nonsense that log files are parsed > to collect the info that servers had at hand. Christos Zoulas of > NetBSD implemented the blacklistd with patches for i think at > least Postfix and ssh, this does implement that for logins at > least. FreeBSD imported that. > > Of course all that does not help against firewall rules aka tables > filling with lots of addresses to be blocked. I have some general > rate limiting, but sometimes this bites real connectivity, for > example if people merge their readily prepared git topic branches > into mainline repositories, and dozen of messages from the same > server fly in. I have no idea on what to do against these two > problems. 
> > --steffen > | > |Der Kragenbaer,The moon bear, > |der holt sich munter he cheerfully and one by one > |einen nach dem anderen runter wa.ks himself off > |(By Robert Gernhardt) >
Re: Failed make buildworld on 5.6.2
Check the BIOS settings for the memory and, if possible, run the memory at a lower frequency or pump the voltage up slightly from stock, and you may be able to temporarily work around failing memory. You can also try re-seating the memory; it could be dust. Try this first, actually. -Matt
Re: One ssh hangs on exit, the other doesn't
ssh won't exit unless all virtual links, such as the X forwarding, are also gone. I think there are some bugs related to that (I'm pretty sure it isn't dragonfly specific). So running with X forwarding can cause the situation you describe. -Matt On Thu, Dec 5, 2019 at 12:15 PM Steffen Nurpmeso wrote: > Pierre Abbat wrote in <12216388.yjY6BSWRI4@mooncat>: > |On Thursday, 5 December 2019 03.48.18 EST Harald Brinkhof wrote: > |> Maybe some background processes keep running and prevent the X program > \ > |> from > |> sitting down? > |> > |> http://www.snailbook.com/faq/background-jobs.auto.html > | > |The examples at the top of that page don't make sense. If I start an \ > |xterm in > |an ssh session, and log out of the ssh session with the xterm still \ > |running, I > |expect ssh to keep running until I close the xterm. When I ssh into \ > |leopard, > |my local mail server, and forward the mail ports, the ssh session hangs > on > |exit, because kmail still has the mail ports open. > | > |When I shelled into zyxomma, I exited each program before logging out \ > |of ssh. > |I don't understand why one ssh session closed and the other hung. > > Does it still hang when ControlMaster exits? > I have seen this on Linux->Linux (same host even) with > X forwarding in muxer sessions, but it went away as rolling > updates flew by. (At the moment i have > > Load key "/home/steffen/local_ed25519.pub": invalid format > > that that makes no sense :}) > > |Pierre > |-- > |loi mintu se ckaji danlu cu jmaji > --End of <12216388.yjY6BSWRI4@mooncat> > > --steffen > | > |Der Kragenbaer,The moon bear, > |der holt sich munter he cheerfully and one by one > |einen nach dem anderen runter wa.ks himself off > |(By Robert Gernhardt) >
Re: Hang in tcdrain(3) after write(3)
If modem control is turned on you have to go into clocal mode to drain commands, otherwise it's waiting for a carrier. That's one possibility. -Matt On Mon, Nov 25, 2019 at 3:14 PM Jeffrey Walton wrote: > Hi Everyone, > > I'm testing some software on DragonFly. There's not much to it. It > talks to the modem, and sends an ATZ and then reads the response. > Linux, FreeBSD, NetBSD, OpenBSD and OS X are OK. > > My test rig is DragonFly 5.6-RELEASE x86_64 (fully patched) with a > USR5637 modem, https://www.amazon.com/gp/product/B0013FDLM0. The modem > is located at /dev/cuaU0. > > DragonFly hangs on the call to tcdrain(3). Looking at the man page I > don't see any special handling. Cf., > http://man.dragonflybsd.org/?command=tcdrain. > > Attached is the reproducer. The trace is: > > % ./test.exe > Setting TIOCEXCL > Getting tty > Setting tty options > Flushing tty > Setting tty > Writing ATZ > Waiting for write <<-- call to tcdrain(fd) > > Thanks. >
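[A quick way to test the clocal theory before touching the program — on the BSDs, stty takes -f to address a device node; the device path below is the reporter's, and this is only a sketch:]

```
# Make the line discipline ignore modem carrier on the port, then re-run
# the reproducer.  Equivalently, set CLOCAL in c_cflag via tcsetattr() in
# the test program before writing.
stty -f /dev/cuaU0 clocal
stty -f /dev/cuaU0          # print current flags to confirm clocal stuck
```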
master users - heads up, full buildworld/buildkernel needed for recent commits
Everyone using master needs to be sure to do a full buildworld, buildkernel, installkernel, installworld sequence due to the addition of a new system call. Piecemeal compilation could result in a non-working system due to libc and pthreads linking for the new symbols. -Matt
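[The sequence, roughly as laid out in build(7) and the handbook's upgrading page — run from system sources, rebooting after the installs:]

```
cd /usr/src
make buildworld
make buildkernel
make installkernel
make installworld
make upgrade        # bring /etc and friends up to date
# then reboot
```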
Signal safety additions now in master
Some fairly significant signal safety additions have been made to libc and pthreads in master. Anyone upgrading please note that you will need to do a full world and kernel build. The basis for this is that for a very long time we've had problems building a number of large packages, in particular lang/mono and lang/rust. The builds would only succeed one time out of ten, or one time out of twenty, and just fail to complete the rest of the time. After a lot of messing around we traced the failures down to signal safety issues in both libc (primarily the malloc subsystem) and pthreads (primarily mutex recursions and deadlocks from signal handlers). But not just these two applications. Really any large, sophisticated application that tries to implement any sort of signal-based asynchronous garbage collection or other mechanism is stretching the limits and causing issues. As these applications become more sophisticated they begin not only to trip over our library code, but their own code as well. The result is that some applications just aren't as robust as we'd like them to be. The basic problem with doing signal-safety is that it normally eats at least two system calls, often in performance-critical code paths, as well as messes with signal masks which can interfere with the expectations of the application. To solve the problem properly and be able to enable signal safety across every important function in libc and pthreads I have created another shared user/kernel memory mapping called 'lpmap', which is similar to the upmap and kpmap, but which is able to give us a per-thread interface. lpmap is per-thread, upmap is per-process, and kpmap is system-wide. With this new mechanism signal safety is as easy as incrementing and decrementing a variable in memory that the kernel also has access to and can use to prevent posts during critical sections of code. The signal safety now being tested in master uses this new mechanism. 
Two new libc routines have been created to support it: https://leaf.dragonflybsd.org/cgi/web-man?command=sigblockall&section=ANY The memory allocator and most internal mutexes used by pthreads have been wrapped with this functionality. We couldn't do this without the 'lpmap', it would just be too expensive otherwise. Languages such as rust and mono are now far more reliable because of this, and we continue to test other applications. The primary repo commits implementing this are shown below. Note that there are also some follow-up stability and cleanup commits to these: https://gitweb.dragonflybsd.org/dragonfly.git/commit/721505dec240e78696660384d988da78813a33bd https://gitweb.dragonflybsd.org/dragonfly.git/commit/64b5a8a550c3c782ab04d04d63723691ac054ffc Master will see more commits in coming weeks wrapping more library code and/or replacing existing uses of sigprocmask() system calls (particularly in rtld) to improve performance. sigblockall()/sigunblockall() is a much cleaner mechanism. -Matt
Re: Re[2]: pkg upgrade error after system failure
A complete wipe and reinstall... ok, well, you may have to re-bootstrap the packages. If you are getting a certificate verification failure you may be missing /etc/ssl/cert.pem or /usr/local/etc/ssl/cert.pem. cd /usr make pkg-bootstrap-force Then see if the 'pkg' command works again by trying to install a few things. I'm not sure if that will wipe the corrupt database. If not you will have to rm the database... Antonio, do you remember what files must be removed to do that? -Matt On Sun, Nov 10, 2019 at 10:44 AM Ilia Gorstkine wrote: > > Hi Antonio, > I use Hammer. > How can I reinstall all packages from scratch if both pkgng and dports > fails with these errors respectively? > Certificate verification failed for... > and pkg: sqlite error while executing... > > Пятница, 8 ноября 2019, 18:07 +03:00 от Antonio Huete Jiménez < > tuxi...@quantumachine.net>: > > Hi, > > I've just made a fresh 5.6.2 installation to a VM and everything > worked as expected. I suspect your hard reset has caused some kind of > corruption both in the certificates and the pkgng database. You may > have to reinstall all the packages again from the scratch. > > BTW, which filesystem did you use for the installation? > > Regards, > Antonio Huete > > > > > Ilia Gorstkine escribió: > > > Sony vaio pcg-41213v > > 5.6-release DragonFly v5.6.2.4.g39d387-RELEASE #4 > > > > When updating packages by pkg upgrade at the installation stage my > > laptop freezes tightly and I had to turn it off via the power button. > > After booting the system, the pkg upgrade command throws errors: > > Updating Avalon repository catalogue... 
> > Certificate verification failed for /C=SE/O=AddTrust AB/OU=AddTrust > > External TTP Network/CN=AddTrust External CA Root > > 34371375740:error:14007086:SSL routines:CONNECT_CR_CERT:certificate > > verify > > failed:/usr/src/lib/libressl/../../crypto/libressl/ssl/ssl_clnt.c:1121: > > pkg: > > > https://mirror-master.dragonflybsd.org/dports/dragonfly:5.6:x86:64/LATEST/meta.txz: > Authentication > > error... > > ... > > pkg: > > > https://mirror-master.dragonflybsd.org/dports/dragonfly:5.6:x86:64/LATEST/packagesite.txz: > Authentication > > error > > Unable to update repository Avalon > > Error updating repositories! > > > > I have tried other repositories, but with the same results. > > I tried installing ca_root_nss, but pkg install ca_root_nss returned > > the same error. > > pkg fetch, pkg search, pkg update - same error. > > pkg backup -r /var/backups.sql.xz - recovery went ok, but nothing > changed. > > pkg version returns the following: > > pkg: sqlite error while executing ALTER TABLE packages ADD > > licenselogic INTEGER NOT NULL DEFAULT(1); in file pkgdb.c:2477: no > > such table: packages > > > > Can anyone help restore pkg database? > > Thanks in advance! > > -- > > Ilia Gorstkine > > > > > > -- > Ilia Gorstkine >
Re: Arena and threads
Generally speaking the allocation algorithm depends on the language, the OS doesn't have a lot to do with it. Anything linked against libc (that is, any C program), however, will use the allocator in libc which we do control. That allocator allocates per-thread zones. That said, the memory allocator itself, no matter what the language, is still just allocating memory that is shared across all the threads. No copying is involved. All the threads share the same address space. Any locking is up to the program (or the language) itself. -Matt
Re: Samsung QX411L laptop...
Yah, those older laptops do tend to have better compatibility. The newer ones have wifi chipsets that we often don't have support for, among other issues that crop up. -Matt
Master can now be updated - a full world + kernel build and pkgs upgrade is required.
For people using master, all the ABI breakage has been committed and new binary packages have been generated. A full world + kernel build is required, reboot, and then a full replacement of all packages (typically via 'pkg upgrade -f') is required. Remember to upgrade packages after rebooting, not before. Generally speaking, the breakage means that numerous new packages will not run on an old kernel and numerous old packages will not run on a new kernel. Packages which use messaging (which would be many of the web packages such as chrome and X applications such as xpdf) or mess with network interfaces (such as named / bind) will break if not matched. Sometimes the packages database can get confused when doing a major replacement like this. If the pkg upgrade finds dangling packages that it can't figure out it is usually best to ^C, delete them with pkg delete, and then try again, until pkg upgrade is able to run cleanly with only its normal deinstall/install/reinstall output. If things get really bad you might have to delete your packages and reinstall. You really don't want any older packages or libraries sitting around if you upgrade to the latest master. -Matt
ABI breakage commits complete - a few more days for dports (for master)
The ABI-breaking commits to master are now in the tree. However, it is going to be a few more days before we can rebuild dports. If you are using master, I recommend refraining from updating until early next week when we get the binary package set updated. For master users, when you do update next week you will have to rebuild and reinstall everything. world, kernel, and pkg upgrade. -Matt
Heads up - ABI breakage going into master
Commits today, this evening, and tomorrow are going to break a few ABI's. Since we're breaking one, we might as well fix the other little niggling issues at the same time. If you are on master, we recommend not updating until new binary packages are available for it, which might not be until the weekend or possibly even early next week. If you decide to upgrade anyway, a full world and kernel build is required. Commits are still incoming, but by tomorrow afternoon they should all be in. I do not recommend rebuilding until at least tomorrow evening, or later if you also need to update ports. -Matt
New servers in the colo, monster is being retired.
We have three new servers in the colo now that will be taking most/all bulk package building duties from monster and the two blades (muscles and pkgbox64) that previously did the work. Monster will be retired. The new servers are a dual-socket Xeon (sting) and two 3900X based systems (thor and loki) which all together burn only around half the wattage that monster burned (500W vs 1000W) while delivering 3 times the performance. That's at least a 6:1 improvement in performance efficiency. With SSD prices down significantly, the new machines are all-SSD. These new machines allow us to build dports binary packages for release, master, and staged at the same time and reduce the full-on bulk build times for getting all three done from 2 weeks to 2 days. This will allow us to more promptly synchronize updates to ports with dports and get binary packages up sooner. -- Monster, our venerable 48-core quad-socket opteron, is being retired. This was a wonderful dev machine for working on DragonFly's SMP algorithms over the last 6+ years precisely because its inter-core and inter-socket latencies were quite high. If an SMP algorithm wasn't spot-on, you could feel it. Over the years DragonFly's performance on monster in doing things like bulk builds increased radically as the SMP algorithms got better and the cores became more and more localized. This kept monster relevant far longer than I thought it would be. But we are at a point now where improvements in efficiency are just too good to ignore. Monster's quad-socket opteron (4 x 12 core 6168's) pulls 1000W under full load while a single Ryzen 3900X (12 core / 24 thread) in a server configuration pulls only 150W, and is slightly faster on the same workload to boot. I would like to thank everyone's generous donations over the last few years! 
We burned a few thousand on the new machines (as well as the major SSD upgrades we did to the blades) and made very good use of the money, particularly this year as prices for all major components (RAM, SSDs, CPUs, Mobos, etc) have dropped significantly. -Matt
Heads up - TAP and TUN changes in master may require a little rejiggering
For anyone using openvpn or otherwise using TAP and TUN, changes in TAP and TUN in master may break your systems. However, they should be easy to fix. Basically TAP and TUN no longer pre-create the first four interfaces (tap0...tap3 and tun0...tun3). Some application code might depend on scanning unit numbers to find an available device and no longer work. These are auto-clone devices, which means that opening "/dev/tap" or "/dev/tun" automatically creates a new interface. Not all application code handles this properly, but there is an easier way to deal with these interfaces: use 'ifconfig tap8 create' (or similar, for tap or tun) to pre-create a specific unit, then specify that specific device ("/dev/tap8", etc.) in the application that needs it. For example: ifconfig tap8 create ... and in the openvpn configuration one might then specify 'dev tap8'. This methodology also works the same on release, so you can get an early start and fix up the system now so it smoothly upgrades to a new DFly version later on. -Matt
Re: DragonFlyBSD Project Update - colo upgrade, future trends
The mailing list software has been less than stellar, but the bigger problem is in areas that we have very little control over. We have no control over other people's spam filters, and the mailing list software itself has to deal with a constant influx of spam (which is why you have to be subscribed, now). It is almost impossible to manage it any other way. Nearly all of the internet has moved on to WWW-based forum-like mechanisms because they are a whole lot easier to manage. We're going to have to move as well. I feel that we do not have a choice here. Privately-run mail systems, in general, are almost dead due to the spam load. I have to forward my own personal domain email through GMail just to be able to continue using it, and my GMail spam mailbox consistently contains more than 3000 spams (30-day expiration, so ... 100+ spams per day). And that doesn't count the ones Google auto-deletes immediately or the ones my smtp server discards. I've tried everything possible to keep my personal domain and dragonfly's domain email usable but it's an impossible task. -Matt On Sun, Jul 28, 2019 at 1:41 PM Constantine A. Murenin wrote: > On Mon, 22 Jul 2019 at 15:56, Matthew Dillon wrote: > >> The mailing lists are not seeing much if any activity any more. This is >> more a generational issue... people kinda prefer web-based forums these >> days and younger generations do not use mailing lists at all for group >> stuff (not really). Even the devs almost universally use IRC and not >> mailing lists for discussions now (its kinda too bad that we don't have a >> permanent irc log stored on DFly servers for posterity). So we are looking >> into potentially shifting user interaction to a web-based forum, perhaps >> this year, and retiring the mailing lists, leaving just an archive for the >> mailing list. Possibly sometime this year, so look for action on that >> upcoming. 
>> > > I would think that part of this must be because messages sent to the > mailing lists are silently discarded from non-subscribers. IME, > DragonFlyBSD.org doesn't even send out any error messages in this > instance. I've repeatedly had this happen to me, several times over the > years, and I bet others have been affected as well. On OpenBSD.org, in > these instances, you simply receive a confirmation email asking you to > confirm that you've sent the message (good because the feedback and the > resolution are both instant — apart from the greylisting by PF spamd). On > FreeBSD.org, depending on list, moderators eventually approve any such > messages (often causing a delay of several days). > > I think it'll be a sad day to see the mailing lists go. They are so much > better than the forums from so many perspectives, including archival. I > find forums problematic due to censorship and lack of accountability, not > to mention archival issues — not even posters themselves would have copies > of their own posts, unless extra care is taken, usually on the part of the > poster, requiring quite some discipline. On nginx.org, there is some > sort of a forum-based mirror and gateway for the mailing lists, perhaps > that's what DragonFly might be interested in adopting as well, if forum > availability is a requirement? > > Cheers, > Constantine. > http://cm.su/ >
DragonFlyBSD Project Update - colo upgrade, future trends
NetBSD and OpenBSD and I'd kinda like to know what their plans are, because the future is clearly going not only multi-core, but many-core. For everything. But as I like to say, for SMP there are only three at the moment. One can't dispute that Linux has nearly all the eyeballs, and DragonFly has very few. But OpenSource tends to live on forever and algorithms never die... I think there is a place for all of these projects and there really aren't any alternatives if you want a sparkling clean system that doesn't have too many layers of abstraction. At the current juncture DragonFlyBSD is doing well and there are no plans to slow down or stop. There are many other developers who help out with DragonFlyBSD on a regular basis, or drop in from time to time, as well as past developers who did an awful lot of work. For this I am going to run the names out of the git log in alphabetical order, so I don't miss anyone (hopefully). And to 'User' and 'Charlie Root'... we will never know who you were, but the party is still going! -Matt Aaron LI Adam Hoka Adam Sakareassen Adrian Chadd Aggelos Economopoulos Alex Hornung Alexander Kuleshov Alexander Polakov Alexandre Perrin Antonio Huete Antonio Huete Jimenez Antonio Nikishaev Aycan iRiCAN Ben Woolley Bill Yuan Brad Hoffman Brills Peng Charlie Root Chris Pressey Chris Turner Chris Wilson Christian Groessler Constantine A. Murenin Daniel Bilik Dave Hayes David P. Reese David Rhodus David Shao David Xu Diederik de Groot Dimitris Papastamos Dylan Reinhold Ed Schouten Eirik Nygaard Eitan Adler Francis GUDIN Franco Fichtner François Tigeot Gregory Neil Shapiro Gwenio Hasso Tepper Hidetoshi Shimokawa Hiroki Sato Hiten Pandya Ilya Dryomov Imre Vadasz Imre Vadász Jan Lentfer Jan Sucan Javier Alcázar Jean-Sébastien Pédron Jeffrey Hsu Jeremy C. 
Reed Jeroen Ruigrok/asmodai Joe Talbott Joerg Sonnenberger Johannes Hofmann John Marino Jordan Gordeev Joris Giovannangeli Justin C. Sherrill Levente Kurusa Liam J. Foy Lubos Boucek Magliano Andrea Markus Pfeiffer Matt Dillon Matteo Cypriani Matthew Dillon Matthias Rampke Matthias Schmidt Maurizio Lombardi Max Herrgard Max Herrgård Max Okumoto Maxim Ag Michael Neumann Mihai Carabas Nicolas Thery Nolan Lum Noritoshi Demizu Nuno Antunes Peeter Peeter Must Peter Avalos Pierre-Alain TORET Robert Garrett Robin Hahling Rui Paulo Rumko Samuel J. Greear Sascha Wildner Scott Ullrich Sepherosa Ziehau Simon 'corecode' Schubert Simon Arlott Simon Schubert Stathis Kamperis Sylvestre Gallon Thomas E. Spanjaard Thomas Nikolajsen Tim Tim Bisson Tobias Heilig Tomasz Konojacki Tomohiro Kusumi Ulrich Spörlein User Venkatesh Srinivas Victor Balada Diaz Vishesh Yadav Walter Sheets YONETANI Tomokazu Yellow Rabbit Yonghong Yan Zach Crownover b86 dumbbell glebius hrs jkim minux rnoland sinetek zrj Ákos Kovács -Matt
Re: Taskset for Dragonfly BSD
We have a utility called 'usched' which does basically that, though our usched is pretty old and minimalist... it doesn't handle specifications for more than 64 cores whereas the cpumask API handles up to (currently) 256 cores. Our usched definitely needs to be updated. -Matt
Re: Errors during mirror-copy after upgrading to 5.6.1
No changes were made to HAMMER1 that would affect the btree code, so I suspect it is a coincidence. If the main filesystems look OK, it could be that the error resides on one of the snapshots. You can try deleting all the snapshots on the source machine via 'hammer prune-everything' and then see if you still get mirroring errors. If you still get mirroring errors after doing that, your best bet would be to back up the two PFSs in question, then destroy and recreate them and restore. -Matt
Three more items brought into -release (fix for paging-to-swap bug, ahci polling, improved read/write fairness for hard drives)
The release branch's kernel has two more bug fixes and a new read/write fairness feature for (primarily) hard drive accesses. The first bug fix deals with a paging issue. A machine which pages heavily to swap can wind up looping on pages in the inactive queue, writing the same pages out over and over again without making progress. This can eventually make a machine unusable. This bug has been fixed. The second item is a mitigation for a possible AHCI chipset bug (AHCI being how most SATA drives are attached). The mitigation adds a poll every 10 seconds just in case the chipset misses an interrupt somehow. We've had a number of reports of SATA drives deadlocking for no reason, and this mitigation is an attempt to narrow down the problem. The third item is a modification to the 'da*' disk device attachment which balances read and write I/O when both operations are present. Hard drives have large write buffers, and even though the driver makes sure that both reads and writes get tags, a hard drive can wind up starving read requests due to its write buffer filling up. A single tag is all that is needed to fill up a hard drive's write buffer. The new feature detects this situation and ensures that read TPS is given a fair shake by temporarily blocking writes. This last item significantly improves concurrent reads and writes to a hard drive, particularly one used for swap (NOTE: we recommend only using SSDs for swap, but yes, some people still use hard drives for swap). It may also avoid a possible read starvation issue caused by the hard drive itself that could cause a read command tag to not get processed in a reasonable period of time, long enough to potentially cause a CMD TIMEOUT by the driver. -- We are tracking several bug reports related to "indefinite wait buffer" messages on the console, typically related to heavy paging to/from swap. 
The AHCI polling mitigation and the TPS balancing feature are an attempt to narrow down the possible problem source and possibly even fix the issue. -Matt
Re: DragonFly 5.6.1 tagged and built
As karma would have it, we've been working on getting a more recent version of chrome operational and we were having issues with blank or incomplete pages coming up. It turns out that the best way to deal with it is to make yet another change to the kernel. This change has been pushed to master and release. We now expect to have chromium-75 in dports and our binary packages sometime this weekend for both release and master. It will be significantly more stable than the current chromium package. However, you will need to compile and install the latest release or master kernel, depending on your system. I don't think we are going to roll a 5.6.2 for this so soon after 5.6.1; I'd like to wait maybe a month to see if anything else comes out of the woodwork before rolling 5.6.2. -Matt
Re: How to get the thread number and every thread's ID of a running process?
My favorite is: ps axlRH That not only gives you the threads, it gives you all the parent/child relationships in nicely indented output. Use a wide xterm. Or do it without the 'l' (ps axRH) to make more room. -Matt
Re: Late commit to drm code fixes lockup - for 5.6 and master, and opie removal instructions for master
Ok, I'll try to reproduce it. What GPU do you have? Is this an Intel iGPU (which CPU?), or is this a Radeon of some sort? -Matt
Re: Late commit to drm code fixes lockup - for 5.6 and master, and opie removal instructions for master
Try going back to the default /usr/src/sys/config/X86_64_GENERIC kernel config and see if you still have the lockup problem. There are two possibilities. One is that the removal of the DDB option caused the kernel to enter a cpu-bound loop instead of panicking on an improper user space address, and the second possibility is that there is some other bit of code related to one of the other options, such as INVARIANTS, whose removal is causing problems (if you commented out other options as well). Since you are running a custom kernel config, if you could post a diff -u of the default versus your current config, that would be helpful. I might need it to reproduce the issue. -Matt On Fri, Jun 21, 2019 at 5:46 PM Matthew Dillon wrote: > The unused function is because DDB isn't compiled into your kernel. I > will push a fix for that so DDB doesn't have to be compiled in. > > I'll see if I can reproduce the lockup issue with compton. > > -Matt >
Re: Late commit to drm code fixes lockup - for 5.6 and master, and opie removal instructions for master
The unused function is because DDB isn't compiled into your kernel. I will push a fix for that so DDB doesn't have to be compiled in. I'll see if I can reproduce the lockup issue with compton. -Matt
Late commit to drm code fixes lockup - for 5.6 and master, and opie removal instructions for master
We had a bit of an oops with the 5.6.0 release. We did insufficient testing of a particular drm change and a serious X lockup bug wound up in the release. We will be rolling 5.6.1, but in the meantime anyone running the release can update their sources to get the fix and then simply rebuild and reinstall their kernel. There is no need to rebuild/reinstall the world, just the kernel (including modules). (If you don't have the system sources: cd /usr; make src-create-shallow)

cd /usr/src
git pull
make -j 4 nativekernel
make installkernel

And then reboot normally. This will fix an issue with X that could result in a machine freeze. -- The opie removal only applies to master. Opie still exists in release (but is deprecated). Also note that the upgrade procedure for master has not yet been fixed to deal with the opie removal, so if you are using a recent master or updating to master, please manually remove any 'opie' related lines from /etc/pam.d/*, if any are present. If you have updated to the latest master and accidentally rebooted the system into an unusable state (where you are unable to log in), the solution is to reboot and choose single-user mode, make sure / is mounted rw (usually just 'mount -u -o rw /' does the job), and make the appropriate edits to /etc/pam.d/*. Then reboot once more. -Matt
Re: HEADS UP on master
We're going to fix 'make upgrade' to not require this manual intervention, but it may take a day or two, so anyone using master... beware of this issue. -Matt On Mon, Jun 17, 2019 at 7:21 AM Rimvydas Jasinskas wrote: > Hi, > > deprecated OPIE removal from base requires manual intervention > *before* rebooting on updated master, > The "make upgrade" script only warns about detected opie presence and > suggest to manually edit or reinstall default PAM configs (cd > /usr/src/etc/pam.d && make install) on both 5.6-RELEASE and > 5.7-DEVELOPMENT. > Make sure that /etc/pam.d/* configs no longer have hardcoded pam_opie > entries on master. For more information, see > ed5666c1699a23a9ae3c0aca97dabaae71e26431 > > Also OpenSSH was recently updated to 8.0p1. UsePAM option is enabled > by default if sshd(8) is compiled with -DUSE_PAM. > From now on default base sshd(8) configs are installed into > /usr/share/examples/ssh/ together with cert.pem and openssl.cnf in > /usr/share/examples/ssl/ too. > Please check your /etc/ssh/sshd_config for deprecated options. > > RJ >
VM work will be in the upcoming 5.6 release.
June 9 2019 5.4 RELEASE vs UPCOMING 5.6 RELEASE Here is a set of simple semi-scientific tests headlining performance improvements in the upcoming 5.6 release over the 5.4 release. These improvements were primarily obtained by rewriting (again) major chunks of the VM system and the PMAP system. Prior work was able to move many exclusive locks to shared locks. This new work is able to do away with many locks entirely, and reduces the amount of cache-line ping-ponging occurring between cpu cores when taking faults on shared VM objects. These tests were done on a little Haswell 2/4 box and on a Xeon 16/32 dual-socket box. It demonstrates the following: * The massive VM rework modestly reduces per-thread VM fault overheads and significantly reduces VM fault overheads on shared VM pages. Thus we see a MASSIVE improvement in the concurrent self-exec tests when any part of the binary is shared or if it is a dynamic binary (uses shared libraries). We see a modest improvement for ad-hoc concurrent compile tests. We see a small improvement in the buildkernel test on the Haswell and a more significant improvement on the Xeon, which roughly matches expectations. Buildkernel bottlenecks in the linker and a few other places (even with NO_MODULES=TRUE). What is important to note here is the huge reduction in system time. System time dropped by 40%. * The zero-fill fault rate has significantly improved. It's a bit hard to test because I am butting up against bandwidth limitations in the hardware, but the improvement is a very real 17% (Haswell) and 14% (Xeon), respectively. * Scheduler fixes in 5.6 improve concurrency and reduce cache-line ping-ponging. Note, however, that the scheduler heuristic in 5.4 was a bit broken, so this mostly restores scheduler performance from 5.2. This only affects the DOCOMP test (see note 2 below). Other observations (not shown here) * The VM rework got rid of all pv_entry structures for terminal PTEs. 
This can save an enormous amount of ram in certain limited situations such as a postgres server with many service processes sharing a single, huge, shared-memory cache. * There is a huge reduction in system overheads in some tests. In fact, in most tests, but keep in mind that most tests are already cpu-bound in user-mode so the overall real-time improvement in those tests is more modest. * In synth-based bulk runs I am observing a drop in system overhead from 15-20% to 10-15%, and the bulk build does appear to take commensurately less time (around 5%). That said, certain aspects of the synth bulk run are much, much faster now. The port scans used to be able to run around 5%/sec on our threadripper (and that was already considered fast!). Now the port scans run around 10%/sec. This is because the insane concurrent exec load involved with doing the port scan is directly impacted by this work. SELF-EXEC TESTS This tests a concurrent exec loop sequencing across N CPUs. It is a simple program which exec's itself and otherwise does nothing. We test (1) A statically linked binary that copies itself to $NAME.$N so each cpu is exec()ing a separate copy, (2) A statically linked binary that does not do the copy step so multiple CPUs are exec()ing the same binary. (3) A dynamic binary that copies itself (but not the shared libraries it links against), meaning that the shared libraries cause shared faults, and (4) A dynamic binary that is fully shared, along with the libraries, so all vnode faults are shared faults. FAULTZF This tests N concurrent processes doing zero-fill VM faults in a private per-process mmap(). Each process is doing a mmap()/force-faults/munmap() loop. DOCOMP This does N concurrent compiles of a small .c program, waits for them to complete, and then loops. The compiler is locked to gcc-4.7 (same compiler for release vs master). This is a good concurrent fork/exec/exit/wait test with a smattering of file ops and VM faults in the mix. 
It tests some of the scheduler heuristics, too. NATIVEKERNEL This does a high-concurrency buildkernel tests that does not include modules, simulating
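The benchmark binaries themselves are small C programs, but the core mechanism of the SELF-EXEC test can be sketched in shell for illustration (this script is not the real test program; the file name and iteration count are made up):

```shell
# Sketch of the self-exec loop: each pass replaces the running process
# with a fresh exec of the same script until the counter reaches zero.
cat > /tmp/selfexec.sh <<'EOF'
#!/bin/sh
n=${1:-10}
if [ "$n" -gt 0 ]; then
    # the process image is replaced here; the "loop" is a chain of execs
    exec /bin/sh "$0" $((n - 1))
fi
echo done
EOF
/bin/sh /tmp/selfexec.sh 5    # chains through 5 execs, then prints: done
```

The real tests run one such loop per CPU and measure the exec rate; the shared-binary variants ((2) and (4)) make the faults on the binary's pages shared faults, which is exactly what the VM rework speeds up.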
Major kernel VM pmap work in master
Master has received some major VM work, so please take care if you decide to update or upgrade your system; it may lose a little stability. A full buildworld and buildkernel is needed due to internal structural changes. The work is also not entirely complete; there are two or three memory conservation routines that have not been put back in yet. That said, the work looks pretty solid under brute-force testing. The new work going in basically rewrites the handling of leaf PTEs in the pmap subsystem. Each vm_page entered into the MMU's pmap used to be tracked with a 'pv_entry' structure. The new work gets rid of these tracking structures for leaf pages. This saves memory, helps deal with certain degenerate situations when many processes share lots of memory, and significantly improves concurrent page fault performance because we no longer have to do any list manipulation on a per-page basis. Replacing this old system is a new system where we use vm_map_backing structures which hang off of vm_map_entry's... essentially one structure for each 'whole mmap() operation', with some replication for copy-on-write shadowing. So, instead of having a structure for each individual page in each individual pmap, we now have a single structure that covers potentially many pages. The new tracking structures are locked, but the number of lock operations is reduced by a factor of 100 (at least), or even better. Currently the committed work is undergoing stability testing and there will be follow-up commits to fix things like minor memory leaks and so forth, so expect those to be incoming. Work still to do: * I need to optimize vm_fault_collapse() to retain backing vnodes. Currently any shadow object chain deeper than 5 causes the entry to fault all pages to the front object and then disconnect the backing objects. But this includes the terminal vnode object which I don't actually want to include. 
* I need to put page table pruning back in (right now empty page table pages are just left in the pmap until exit() to avoid racing the pmap's pmap_page_*() code) * I need to implement a new algorithm to locate and destroy completely shadowed anonymous pages. None of this is critical for the majority of use cases, though. The vm_object shadowing code does limit the depth so completely shadowed objects won't just build up forever. -- These changes significantly improve page fault performance, particularly under heavy concurrent loads. * kernel overhead during the 'synth everything' bulk build is now under 15% system time. It used to be over 20%. (system time / (system time + user time)). Tested on the threadripper (32-core/64-thread). * The heavy use of shared mmap()s across processes no longer multiplies the pv_entry use, saving a lot of memory. This can be particularly important for postgres. * Concurrent page faults now have essentially no SMP lock contention and only four cache-line bounces for atomic ops per fault (something that we may now also be able to deal with with the new work as a basis). * Zero-fill fault rate appears to max-out the CPU chip's internal data busses, though there is still room for improvement. I top out at 6.4M zfod/sec (around 25 GBytes/sec worth of zero-fill faults) on the threadripper and I can't seem to get it to go higher. Note that obviously there is a little more dynamic ram overhead than that from the executing kernel code, but still... * Heavy concurrent exec rate on the TR (all 64 threads) for a shared dynamic binary increases from around 6000/sec to 45000/sec. This is actually important, because bulk builds * Heavy concurrent exec rate on the TR for independent static binaries now caps out at around 45 execs per second. Which is an insanely high number. * Single-threaded page fault rate is still a bit wonky but hit 500K-700K faults/sec (2-3 GBytes/sec). 
-- Small system comparison using a Ryzen 2400G (4-core/8-thread), release vs master (this includes other work that has gone into master since the last release, too): * Single threaded exec rate (shared dynamic binary) - 3180/sec to 3650/sec * Single threaded exec rate (independent static binary) - 10307/sec to 12443/sec * Concurrent exec rate (shared dynamic binary x 8) - 15160/sec to 19600/sec * Concurrent exec rate (independent static binary x 8) - 60800/sec to 78900/sec * Single threaded zero-fill fault rate - 550K zfod/sec -> 604K zfod/sec * Concurrent zero-fill fault rate (8 threads) - 1.2M zfod/sec -> 1.7M zfod/sec * make -j 16 buildkernel test (tmpfs /usr/src, tmpfs /usr/obj): 4.4% improvement in overall time on the first run (6.2% improvement on subsequent runs). system% 15.6% down to 11.2% of total cpu seconds. This is a kernel overhead reduction of 31%. Note that the increased time on release is probably due to inefficient buffer cache recycling. 1309.445u 242.506s 3:53.54 664.5% (release) 1315.890u 258.165s 4:00.97 653.2% (release, run 2) 1318.458u 259.394s 4:00.51
Some recent fairly important fixes in master and release
Two important fixes have gone into master, with a version of them also brought in to the release branch. The first is a floating point bug related to a (long time) known hardware issue on Intel CPUs. We thought we had fixed this bug ages ago, but it turns out we didn't, so it is being fixed permanently with the removal of the remainder of the FP switching heuristic. If you do not want to update immediately, you can fix the problem live with a simple sysctl: sysctl machdep.npx_fpu_heuristic=1 or put 'machdep.npx_fpu_heuristic=1' in your /etc/sysctl.conf and reboot. -- The second bug is related to mmap()'s MAP_STACK feature, which a number of interpreted languages use, in particular 'ruby'. The kernel was not handling several of the cases properly. In addition to fixing that case, we have also basically stopped allowing user programs to create grow-down segments in memory. We do this by converting MAP_STACK into a normal anonymous mapping. The grow-down feature will ultimately be removed entirely as it is not really applicable to 64-bit systems, but because the -release threading libraries still assume that the main user stack uses this mode, it will be another one or two release cycles before we actually scrap it completely. This fix requires updating sources and building and installing a new kernel. -- This work is in both master and the release branch. No sub-release has been scheduled for the release branch at this time; we need to talk about it internally a bit first. Thanks for the bug reports everyone! -Matt
Re: How to run a vkernel as a background process
Try redirecting stdout and stderr to a file. notty -12 blahblahblabhalh >& /tmp/logfile That's with csh or tcsh. With sh it would be notty -12 blsahblahblbha > /tmp/logfile 2>&1 And then see what error you are getting. -Matt
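For anyone unsure about the Bourne-shell form, the `> file 2>&1` idiom can be checked with a generic example (nothing here is notty-specific; the file name is illustrative):

```shell
# '> log 2>&1' points stdout at the file first, then points stderr at
# wherever stdout now goes, so both streams land in the log.
# (The reverse order, '2>&1 > log', would leave stderr on the terminal.)
{ echo "to stdout"; echo "to stderr" >&2; } > /tmp/notty-demo.log 2>&1
cat /tmp/notty-demo.log    # shows both lines
```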
Re: How to run a vkernel as a background process
You should be able to run the vkernel without a tty by using 'notty' instead of 'nohup'. It will fork and run detached, and then you can ssh into it. -Matt
Hammer2 fix now in the -release branch
The H2 fix is now in the release branch so pulling sources and compiling up a new kernel will have it. We will roll a sub-release that also includes the fix in a little more than a week. -Matt
Re: What programming languages and operating systems that will to be used after Jesus to return?
Presumably the original post was just spam. In any case... -Matt
Hammer2 corruption bug fix under test
A Hammer 2 corruption bug fix is currently under test in master. My expectation is that it will be merged into the -release branch on Saturday after further testing (I will post another message then). The corruption is modestly difficult to cause but please read the commit message for more details. It can occur when significant filesystem write activity occurs during a bulkfree operation. This operation typically occurs in the early morning (~3 a.m.). The corruption is caught by the crc code and reported on the console and in /var/log/messages as CHECK FAIL entries. https://gitweb.dragonflybsd.org/dragonfly.git/commit/83815ec6515002d007c3800cb9fd83c9451852f7 Master also has some performance work for H2 under test that may be of interest. https://gitweb.dragonflybsd.org/dragonfly.git/commit/1159c75c92fbfdd230dd598904ede92791c00843 -Matt
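Since the corruption is caught by the CRC code and logged, a quick way to see whether a machine has been affected is to search the logs for the CHECK FAIL entries mentioned above (the sample line below is fabricated for illustration; real kernel messages will differ in detail):

```shell
# Scan the system log for hammer2 CRC check failures
# (no output means none were recorded):
grep "CHECK FAIL" /var/log/messages 2>/dev/null || true

# Self-contained illustration against a fabricated sample log line:
echo "Mar 30 03:14:15 host kernel: hammer2: CHECK FAIL" > /tmp/messages.sample
grep -c "CHECK FAIL" /tmp/messages.sample    # prints: 1
```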
Re: HAMMER2 snapshots and other help
Originally I wanted to be able to snapshot subdirectory trees, but I could never make it work. Snapshots are always of the mount-point now; the hammer2 manual page is a bit behind in that respect. Otherwise, though, the snapshot command works as advertised. The 'pfs-list' command lists available snapshots. You can mount a snapshot with @LABEL, and if anything on that device has already been mounted you can use a simple shortcut, 'mount @LABEL ...', to mount the snapshot. Snapshots are writable. Unlike HAMMER1, in HAMMER2 the snapshot must be mounted to access it. -Matt
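A minimal command sketch of the workflow described above. The directive names (snapshot, pfs-list) and the @LABEL mount shortcut are from the post; the argument order, snapshot label, and mount point are illustrative assumptions, so check hammer2(8) before relying on them:

```shell
# Take a snapshot of a mounted PFS (snapshots are always of the mount-point):
hammer2 snapshot /home home.snap-20190301

# List the PFSs and snapshots available on that device:
hammer2 pfs-list /home

# Mount the snapshot; because something on the device is already mounted,
# the shortcut form from the post works:
mount @home.snap-20190301 /mnt/snap
```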
Re: Realtek NIC patch to try
Yah, unfortunately the patch didn't fix the 'chip stops working every month or so' problem on my server (not a big deal, I have other NICs on that box that I can move the ethernet cable to). But it didn't break anything new, either. So in that respect it certainly doesn't hurt :-) -Matt On Mon, Mar 4, 2019 at 1:07 AM Sepherosa Ziehau wrote: > Chip stops working. Well, after all, it's a desktop level chip. > > On Mon, Mar 4, 2019 at 1:26 PM Eric Melville wrote: > > > > I put this on a pair of cheap Chinese mini PCs, also 8111/8168. Looks > good so far. > > > > What is the problem seen after a month or two? > > > > > > On February 19, 2019 at 1:00 PM Matthew Dillon > wrote: > > > > So far so good on the RealTek 8111/8168 PCIe Gigabit Ethernet. I'll run > it and see if it continues to lock up every once in a while, but it usually > takes a month or two to reproduce that particular hw bug. > > > > -Matt > > > > On Sat, Feb 16, 2019 at 3:47 AM Sepherosa Ziehau < sepher...@gmail.com> > wrote: > > > > Hi all, > > > > Please help testing the following patch: > > > https://gitweb.dragonflybsd.org/~sephe/dragonfly.git/commit/9df33aeb3d49f4ac11af479ea0f3a2a2a48b538d > > > > I have tested it for a while, should be safe to apply. > > > > Thanks, > > sephe > > > > -- > > Tomorrow Will Never Die > > > > -- > Tomorrow Will Never Die >
Note: current master breaks smartctl (will be fixed in 1-3 days)
The current master made a change to the kernel callout structure which broke the CAM ABI used by smartctl, camcontrol, etc. The problem is that userland is being exposed to a kernel structure that it should not be exposed to. A followup commit will be made in 1-3 days (as early as tomorrow) to fix the issue and make the kernel compatible with smartctl and so forth again. -Matt
Re: Can't shell into leaf on IPv6
The colo had connectivity problems today. It appears to be resolved now. -Matt
Re: desktop problem
Make sure /etc/rc.conf has: dbus_enable="YES" You should be able to add and remove packages at-will with the 'pkg' program as root. If it isn't on the system you can bootstrap it with: cd /usr make pkg-bootstrap -Matt On Tue, Feb 19, 2019 at 3:46 PM Jonathan Engwall < engwalljonathanther...@gmail.com> wrote: > Downloading 5.4.1 now hoping for better results. > Any advice, suggestion, inquiry is welcome. > Jonathan Engwall > > On Tue, Feb 19, 2019, 2:06 PM Jonathan Engwall < > engwalljonathanther...@gmail.com wrote: > >> Also you should know that I am using it on virtualbox. >> >> On Tue, Feb 19, 2019, 1:52 PM Jonathan Engwall < >> engwalljonathanther...@gmail.com wrote: >> >>> I have spent all day working on this, since 8 a.m. it is now 1:45. X11 >>> and DBUS did not configure properly. Mkdesktop is the pkg I used to build >>> KDE and now, because /var/run/dbus is literaly not there I cannot log in. >>> I need to bypass the desktop to get console access. I have used the live >>> image several times and made changes which ultimately have not produced >>> results. >>> I have changed the entry for tty8, I have written an .xinitrc, I have >>> changed the .login and .login-config both to BAK.login. And I get nowhere. >>> At this point I want to remove KDE, but I can't. Using the live image as >>> root does not allow me to remove any packages. >>> How can I get around the desktop login? >>> Jonathan Engwall >>> Also I have noticed that /etc/X11/ is empty >>> >>
Re: Realtek NIC patch to try
So far so good on the RealTek 8111/8168 PCIe Gigabit Ethernet. I'll run it and see if it continues to lock up every once in a while, but it usually takes a month or two to reproduce that particular hw bug. -Matt On Sat, Feb 16, 2019 at 3:47 AM Sepherosa Ziehau wrote: > Hi all, > > Please help testing the following patch: > > https://gitweb.dragonflybsd.org/~sephe/dragonfly.git/commit/9df33aeb3d49f4ac11af479ea0f3a2a2a48b538d > > I have tested it for a while, should be safe to apply. > > Thanks, > sephe > > -- > Tomorrow Will Never Die >
Re: 5.4 hangs in UFS
This feels more like an issue with the I/O and not with UFS specifically. But since you tried two different storage devices it couldn't be that. Perhaps there is a power or overheat issue on the system. -Matt On Sun, Feb 10, 2019 at 1:42 PM Eric Melville wrote: > Hello there, > > After installing 5.4 my system has been getting stuck in UFS, apparently > in softdeps. > > At first I was faulting the -j12 buildworld, but then saw it in lower > parallel counts, and > > eventually saw it when looping buildworld with no -j option at all. Then I > was faulting > > my fast new NVME but eventually factored that out too by changing back to > an old > > hard drive. In any case, the faster the hardware and the more work > running, the > > more quickly and easily this seems to reproduce. > > > Typically during the phase that removes old output, the build will hang > indefinitely. > > Some processes continue to run but new ones never get going, and the old > world > > clean never makes any progress. For example ssh to the host in this state > would > > succeed to connect and authenticate, but the new shell never seems to run. > > > I suppose I should try disabling softdeps next. >
Re: system crashing with 5.2 and 5.0 but not with 4.8
EHCI has some sort of niggling problem that I haven't been able to track down. I may try to re-port that piece from FreeBSD, if the infrastructure hasn't diverged too much. The code in the original port was extremely fragile. -Matt On Sat, Feb 9, 2019 at 9:56 AM wrote: > On 10/10/18 7:06 AM, Aaron LI wrote: > > On Sun, 7 Oct 2018 13:11:42 +0200 > > spuelmasch...@email.de wrote: > > > >> Hello, > >> > >> my HP Microserver N40L (Turion2) keeps crashing while or shortly after > >> installing any 5.x version of DragonFly. > >> I reinstalled 4.8 and this is running for several days now. > >> > >> A memcheck run from a KNOPPIX CD was successful. > >> KNOPPIX itself was also running for several days. > >> > >> Below are two ocr-ed screen photographs of these crashes. I do not know > >> how to keep that text in a more elegant way. > >> > >> Now this is kind of a dilemma. > >> - hardware ok -> run another OS > >> - hardware defect -> renew server, run dfly ... > >> > >> Hopefully there's a better solution. > > > > Hi Sascha, > > > > According to your panic log below, both happened during the installation > > phase and are related to the EHCI (USB 2.0). > > > > Did you use the same USB stick and the same USB port on the machine for 4.8, > > 5.0 and 5.2 installations? If yes, this is a strange and serious bug. > > > > I don't remember there were any EHCI-related changes after 4.8. Maybe others > > have better knowledge about this. > > > > > > Hello Aaron, > > Finally I set up the external SATA port to install directly onto > the SSD, which I had been using via a SATA-USB adapter taken from an > external hard disk. > It has been running without any problems since then and is still running. > The SSD meanwhile went into the case. > > For the external SATA port I had to find a modified BIOS for the HP > microserver. There are some fans who have the knowledge to modify > the BIOS and activate the port ... > > Thanks for your useful hints. 
> > Bye, > Sascha > > > > > > > > > > >> = 1 == > >> | /mnt/boot/kernel/radeonkmsfwTAHITI_rlc.ko copy-ok > >> > >> panic: Bad link elm 0xff8125709368 prev->next != elm > >> cpuid = 1 > >> Trace beginning at frame 0xff81258d2870 > >> panic() at panic+0x236 0x805ec136 > >> panic() at panic+0x236 0x805ec136 > >> _callout_stop() at _callout_stop+0x3c3 0x80610973 > >> callout_reset() at callout_reset+0x96 0x80610bd6 > >> ehci_interrupt() at ehci_interrupt+0x198 0x821aa3d8 > >> ithread_handler() at ithread_handler+0x2e9 0x805bal09 > >> Debugger("panic") > >> CPU1 stopping CPUs: 0x0001 > >> stopped > >> Stopped at Debugger+0x7c: movb $0,0xd31929(%rip) > >> db> | > >> = 1 == > >> > >> = 2 == > >> >>> Executing '/sbin/newfs_hammer -f -L ROOT > /dev/serno/2011090623F8.s2d' > >> panic: Bad link elm 0xf80125b47368 prev->next != elm > >> cpuid = 1 > >> Trace beginning at frame 0xf80125ccbB870 > >> panic() at panic+0x236 0x805f8666 > >> panic() at panic+0x236 0x805f8666 > >> _callout_stop() at _callout_stop+0x3d3 0x8061d803 > >> callout_reset() at callout_reset+0x96 0xfff8061d916 > >> ehci_interrupt() at ehci_interrupt+0x198 0x822973c8 > >> ithread_handler() at ithread_handler+0x2e9 0x805c5d09 > >> Debugger("panic") > >> CPU1 stopping CPUs: 0x0001 > >> stopped > >> Stopped at Debugger+0x7c: movb $0,0xe67a49(%rip) > >> db> | > >> = 2 == > > > > > > > > > > >
Re: scheduling permissions (jackd)
Yah, only the superuser can access the rtprio classes. I do not recommend using rtprio though, because you can completely lock up the system if a program running under rtprio goes into a cpu-bound loop. Try using nice -n -20 without the rtprio; it should work nearly as well. You still need root to be able to nice to -20, but your solution of su'ing from root to the user works well. -Matt
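A minimal sketch of the suggestion: start the audio server under nice instead of rtprio. The jackd driver flag below is illustrative, not from the thread, and the negative niceness needs root; the last line is a harmless runnable demonstration using a positive value that any user can try.

```shell
# Preferred over rtprio -- plain nice, as root for negative values:
#   nice -n -20 jackd -d oss      # driver flag is illustrative
# Runnable demonstration with a positive niceness (no root required):
nice -n 10 sh -c 'echo running at reduced priority'
# prints: running at reduced priority
```

Unlike an rtprio process, a nice -20 process can still be preempted, so a cpu-bound loop degrades the system instead of locking it up.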
Re: Wired memory analysis
Daniel, Francois Tigeot just made a commit to master (not release yet) that fixes a probable wired memory leak in the drm (gpu) code. It could very well be the cause of the excessive wired memory usage you have been reporting. This fix will eventually go into -release but we have to test it first. It's in master now. -Matt
Re: HAMMER2 - mounting root from dm_crypt
HAMMER2 by default names the PFS based on the partition it was created on, and the H2 mount code will automatically add an @blah suffix following the same scheme. But both get confused when the mount goes through the device mapper, and that's why the problem occurs. It might have been a bit of a stretch on my part to try to be fancy in defining the default names when they weren't otherwise being provided. I think we could detect the crypto or mapper paths and fix the defaults that way, but nobody has gotten around to coding and validating it all. -Matt On Fri, Jan 18, 2019 at 12:38 PM Daniel Bilik wrote: > Hi, Aaron. > > On Fri, 18 Jan 2019 23:02:45 +0800 > Aaron LI wrote: > > > Since the HAMMER2 FS was created with '-L ROOT', it requires '@ROOT' to > > be mounted. > > Ah, that was it. Thank you for pointing out this detail. I hadn't expected > it could be related. > > > Another simple way to workaround your issues is use the following setting > > in /boot/loader.conf: > > vfs.root.realroot="crypt:hammer2:/dev/serno/.s1d:rootd" > > Note the final 'd'! > > Indeed, this workaround really works. I haven't examined the magic behind > it but my "ROOT" hammer2 filesystem is correctly mounted during boot with > the original, non-patched initrd. Thank you very much for the hint. > > -- > Daniel >
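Putting the thread's two workarounds side by side as a sketch: the PFS label comes from the thread (`-L ROOT`), but the mapper device name and the serial-number placeholder are illustrative, since the original message elides the actual serial.

```shell
# Option 1: name the PFS explicitly when mounting by hand.
# HAMMER2 uses a device@LABEL syntax; /dev/mapper/root is illustrative.
mount -t hammer2 /dev/mapper/root@ROOT /mnt

# Option 2: let the initrd handle it via /boot/loader.conf:
#   vfs.root.realroot="crypt:hammer2:/dev/serno/<serial>.s1d:rootd"
# Note the final 'd' in "rootd", as pointed out in the thread.
```

Option 2 avoids patching the initrd at all, which is why it worked with the stock boot environment.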