Re: 13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state
This time poudriere came to an end:

% sysctl vfs.highest_numvnodes
vfs.highest_numvnodes: 500976

On Wed, 17 Mar 2021 18:55:43 +0100
Mateusz Guzik wrote:

> Thanks, I'm going to have to ponder a little bit.
>
> In the meantime can you apply this:
> https://people.freebsd.org/~mjg/maxvnodes.diff
>
> Once you boot, tweak maxvnodes:
> sysctl kern.maxvnodes=1049226
>
> Run poudriere. Once it finishes, inspect sysctl vfs.highest_numvnodes
>
> On 3/17/21, Yamagi wrote:
> > Hi Mateusz,
> > the sysctl output after about 10 minutes into the problem is attached.
> > In case it's stripped by Mailman, a copy can be found here:
> > https://deponie.yamagi.org/temp/sysctl_vlruwk.txt.xz
> >
> > Regards,
> > Yamagi
> >
> > On Wed, 17 Mar 2021 15:57:59 +0100
> > Mateusz Guzik wrote:
> >
> >> Can you reproduce the problem and obtain "sysctl -a"?
> >>
> >> In general, there is a vnode limit which is probably too small. The
> >> reclamation mechanism is deficient in that it will eventually inject
> >> an arbitrary pause.
> >>
> >> On 3/17/21, Yamagi wrote:
> >> > Hi,
> >> > some other users in the ##bsdforen.de IRC channel and I have the
> >> > problem that during Poudriere runs processes get stuck in the
> >> > 'vlruwk' state.
> >> >
> >> > For me it's fairly reproducible. The problems begin about 20 to 25
> >> > minutes after I've started poudriere. At first only some ccache
> >> > processes hang in the 'vlruwk' state; after another 2 to 3 minutes
> >> > nearly everything hangs and the total CPU load drops to about 5%.
> >> > When I stop poudriere with ctrl-c it takes another 3 to 5 minutes
> >> > until the system recovers.
> >> >
> >> > First the setup:
> >> > * poudriere runs in a bhyve VM on a zvol. The host is 12.2-RELEASE-p2.
> >> >   The zvol has an 8k blocksize; the guest's partitions are aligned to
> >> >   8k. The guest has only one zpool, created with ashift=13. The VM
> >> >   has 16 E5-2620 and 16 gigabytes RAM assigned to it.
> >> > * poudriere is configured with ccache and ALLOW_MAKE_JOBS=yes. Removing
> >> >   either of these options significantly lowers the probability of the
> >> >   problem showing up.
> >> >
> >> > I've tried several git revisions starting with 14-CURRENT at
> >> > 54ac6f721efccdba5a09aa9f38be0a1c4ef6cf14 in the hope that I can find at
> >> > least one known-good revision. No chance, even a kernel built
> >> > from 0932ee9fa0d82b2998993b649f9fa4cc95ba77d6 (Wed Sep 2 19:18:27 2020
> >> > +) has the problem. The problem isn't reproducible with
> >> > 12.2-RELEASE.
> >> >
> >> > The kernel stack ('procstat -kk') of a hanging process is:
> >> > mi_switch+0x155 sleepq_switch+0x109 sleepq_catch_signals+0x3f1
> >> > sleepq_wait_sig+0x9 _sleep+0x2aa kern_wait6+0x482 sys_wait4+0x7d
> >> > amd64_syscall+0x140 fast_syscall_common+0xf8
> >> >
> >> > The kernel stack of vnlru keeps changing, even while the processes
> >> > are hanging:
> >> > * mi_switch+0x155 sleepq_switch+0x109 sleepq_timedwait+0x4b
> >> >   _sleep+0x29b vnlru_proc+0xa05 fork_exit+0x80 fork_trampoline+0xe
> >> > * fork_exit+0x80 fork_trampoline+0xe
> >> >
> >> > Since vnlru is accumulating CPU time, it looks like it's doing at least
> >> > something. As an educated guess I would say that vn_alloc_hard() is
> >> > waiting a long time or even forever to allocate new vnodes.
> >> >
> >> > I can provide more information, I just need to know what.
> >> >
> >> >
> >> > Regards,
> >> > Yamagi
>
> --
> Mateusz Guzik

--
Homepage: https://www.yamagi.org
Github: https://github.com/yamagi
GPG: 0x1D502515
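The check Mateusz asked for boils down to comparing the peak vnode count with the configured kern.maxvnodes. A minimal sketch, assuming both values have already been read (e.g. with sysctl(8)); the function names and the margin parameter are hypothetical and not part of FreeBSD:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Peak vnode count as a whole-number percentage of the limit (rounded down). */
static unsigned
vnode_peak_pct(uint64_t peak, uint64_t limit)
{
	assert(limit > 0);
	return ((unsigned)(peak * 100 / limit));
}

/*
 * Hypothetical rule of thumb: treat the limit as "tight" if the peak came
 * within margin_pct percent of it, i.e. vnode reclamation was probably
 * being forced during the run.
 */
static bool
vnode_limit_tight(uint64_t peak, uint64_t limit, unsigned margin_pct)
{
	return (vnode_peak_pct(peak, limit) + margin_pct >= 100);
}
```

With the numbers from this thread (a peak of 500976 with kern.maxvnodes raised to 1049226), vnode_peak_pct() returns 47, i.e. the run used a bit under half of the raised limit.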
Re: I'm upset about FreeBSD
On Mon, 17 Oct 2016 03:44:14 +0300
Rostislav Krasny wrote:

> First of all I faced an old problem that I reported here a year ago:
> http://comments.gmane.org/gmane.os.freebsd.stable/96598
> Completely new USB flash drive flashed by the
> FreeBSD-11.0-RELEASE-i386-mini-memstick.img file kills every Windows
> again. If I use the Rufus util to write the img file (using DD mode)
> the Windows dies immediately after the flashing. If I use the
> Win32DiskImager (suggested by the Handbook) it doesn't reinitialize
> the USB storage and Windows dies only if I remove and put that USB
> flash drive again or boot Windows when it is connected. Nothing was
> done to fix this nasty bug for a year.

As was already said in the other answers, this is a bug in Windows,
specifically in the partition parser. partmgr.sys (running in kernel
mode) crashes while parsing the GPT of the FreeBSD installation image.
This may be a variant of the bug known as "Kindle is crashing Win 10":
http://answers.microsoft.com/en-us/windows/forum/windows_10-performance/plugging-in-kindle-is-crashing-windows-10-after/5db0d867-0822-4512-919e-3d7786353f95?page=1

That bug was patched on September 13 and I'm unable to reproduce the
crash on a fully patched Win 10 VM. But there's no patch for Win 7: even
with all patches applied, my Win 7 VM still crashes as soon as the
FreeBSD installation image is connected.

I did some debugging and I'm pretty sure that the problem is not the
pmbr used for classic BIOS boot but the GPT itself. But my knowledge of
GPT and especially of Windows internals is limited, so maybe someone with
more insight can look into this. Or even better: complain to Microsoft.
Even if the GPT is invalid, it should not crash the kernel.

Regards,
Yamagi

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
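For anyone who wants to poke at the GPT side, a minimal sanity check of a GPT header (LBA 1) along the lines of the UEFI specification: the "EFI PART" signature, the HeaderSize field at offset 12 and the header CRC32 at offset 16, which is computed over HeaderSize bytes with the CRC field itself zeroed. This is only a sketch with hypothetical function names; it does not check the partition entry array and says nothing about what partmgr.sys actually validates:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reflected CRC-32 (polynomial 0xEDB88320), the CRC used by GPT. */
static uint32_t
crc32_ieee(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return (~crc);
}

/* GPT header fields are little-endian. */
static uint32_t
le32(const uint8_t *p)
{
	return ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

/*
 * Check signature, header size and header CRC32 of a GPT header sector.
 * Header sizes above 512 bytes are rejected by this sketch.
 */
bool
gpt_header_valid(const uint8_t *sector, size_t sector_len)
{
	uint8_t copy[512];
	uint32_t hdr_size, stored_crc;

	assert(sector != NULL);
	if (sector_len < 92 || memcmp(sector, "EFI PART", 8) != 0)
		return (false);
	hdr_size = le32(sector + 12);
	if (hdr_size < 92 || hdr_size > sector_len || hdr_size > sizeof(copy))
		return (false);
	stored_crc = le32(sector + 16);
	memcpy(copy, sector, hdr_size);
	memset(copy + 16, 0, 4);	/* CRC is computed with its own field zeroed */
	return (crc32_ieee(copy, hdr_size) == stored_crc);
}
```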
Re: UEFI & ZFS
Hello,
I've observed the slowness only on the local console; I haven't tested
the serial console. Put the FreeBSD legacy installation USB stick into
the box, select it as the boot device and watch the cursor spin for about
10 minutes until the kernel boots. Do the same with a UEFI installation
stick and it's a matter of seconds...

I've seen this on a Gigabyte GA-Z170XP-SLI with a Core i7 6700k and on
two Supermicro X11SBA-LN4F based machines with Skylake Xeon CPUs. There
are several other reports of slow boots on Skylake CPUs on the net. In
the thread mentioned below, changes in the hardware, the firmware or
somewhere else were suspected. I didn't even try to debug it; instead I
went with the UEFI loader.

Regards,
Yamagi

On Sun, 14 Feb 2016 13:47:05 +0200
Daniel Braniss <da...@cs.huji.ac.il> wrote:

> > On 14 Feb 2016, at 11:52, Yamagi Burmeister <li...@yamagi.org> wrote:
> >
> > https://lists.freebsd.org/pipermail/freebsd-current/2015-December/059037.html
> when saying ‘slow’, do you see slowness when printing output to the screen?
> I mention this, because in the past I saw something similar, and it was a
> misconfiguration with the serial console …
>
> danny
>
> >
> > Regards,
> >
> >
> > On Fri, 12 Feb 2016 15:36:10 -0500
> > "Thomas Laus" <lau...@acm.org> wrote:
> >
> >>> I have a new Asus H170-Plus-D3 motherboard that will be used for a
> >>> DOM0 Xen Server. It uses an Intel i5-6300 processor and a Samsung 840
> >>> EVO SSD. I would like to use ZFS on this new installation. The Xen
> >>> Kernel does not have UEFI support at this time, so I installed FreeBSD
> >>> CURRENT r295345 in 'legacy mode'. It takes about 7 minutes to go from
> >>> the first '|' character to getting the 'beastie' menu. I changed the
> >>> BIOS to UEFI and did another installation. The boot process goes in an
> >>> instant.
> >>
> >> Several others have the same problem. See here on the freebsd forums:
> >>
> >> http://tinyurl.com/z9oldkc
> >>
> >> That is my exact problem. It takes 4 minutes to get a complete 'beastie'
> >> menu and 7 minutes 34 seconds to login.
> >>
> >> Tom
> >>
> >> --
> >> Public Keys:
> >> PGP KeyID = 0x5F22FDC1
> >> GnuPG KeyID = 0x620836CF

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: UEFI & ZFS
Hello,
this is a known problem with Intel Skylake CPUs: legacy boot is dead
slow, UEFI boot is blazing fast. Have a look at this thread, it contains
more information:
https://lists.freebsd.org/pipermail/freebsd-current/2015-December/059037.html

As far as I know, no one has found or analyzed the root cause yet.

Regards,

On Fri, 12 Feb 2016 15:36:10 -0500
"Thomas Laus" wrote:

> > I have a new Asus H170-Plus-D3 motherboard that will be used for a DOM0 Xen
> > Server. It uses an Intel i5-6300 processor and a Samsung 840 EVO SSD. I
> > would like to use ZFS on this new installation. The Xen Kernel does not
> > have UEFI support at this time, so I installed FreeBSD CURRENT r295345 in
> > 'legacy mode'. It takes about 7 minutes to go from the first '|' character
> > to getting the 'beastie' menu. I changed the BIOS to UEFI and did another
> > installation. The boot process goes in an instant.
>
> Several others have the same problem. See here on the freebsd forums:
>
> http://tinyurl.com/z9oldkc
>
> That is my exact problem. It takes 4 minutes to get a complete 'beastie'
> menu and 7 minutes 34 seconds to login.
>
> Tom

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: NFS deadlock on 9.2-Beta1
On Wed, 21 Aug 2013 16:10:32 +0300
Konstantin Belousov kostik...@gmail.com wrote:

> I already described what to do with this. I need the debugging
> information to see what is going on. Without the data, it is only
> wasted time of everybody involved.
>
> Some technical notes. The sendfile() uses shared lock for the duration
> of vnode i/o, so any thread which is sleeping on the vnode lock cannot
> be in the sendfile path, at least for UFS and NFS which do support true
> shared locks.
>
> The right lock order is vnode lock -> page busy wait. From this PoV, the
> ordering in the sendfile is correct. Rick, are you aware of any
> situation where the VOP_READ in nfs client could drop vnode lock and
> then re-acquire it? I was not able to find this from the code
> inspection. But, if such situation exists, it would be problematic in 9.
>
> Last note. The HEAD dropped pre-busying pages in the sendfile() syscall.
> As I understand, this is because Attilio's new busy implementation
> cannot support both busy and sbusy states simultaneously, and
> vfs_busy_pages()/vfs_drain_busy_pages() actually created such situation.
> I think that because the sbusy is removed from the sendfile(), and the
> vm object lock is dropped, there is no sense to require vm_page_grab()
> to wait for the busy state to clean. It is done by buffer cache or
> filesystem code later. See the patch at the end.
>
> Still, I do not know what happens in the supposedly reported deadlock.
>
> diff --git a/sys/kern/uipc_syscalls.c b/sys/kern/uipc_syscalls.c
> index 4797444..b974f53 100644
> --- a/sys/kern/uipc_syscalls.c
> +++ b/sys/kern/uipc_syscalls.c
> @@ -2230,7 +2230,8 @@ retry_space:
>  			pindex = OFF_TO_IDX(off);
>  			VM_OBJECT_WLOCK(obj);
>  			pg = vm_page_grab(obj, pindex, VM_ALLOC_NOBUSY |
> -			    VM_ALLOC_NORMAL | VM_ALLOC_WIRED | VM_ALLOC_RETRY);
> +			    VM_ALLOC_IGN_SBUSY | VM_ALLOC_NORMAL |
> +			    VM_ALLOC_WIRED | VM_ALLOC_RETRY);
>
>  			/*
>  			 * Check if page is valid for what we need,

Could the problem be related to this deadlock / LOR?
- http://lists.freebsd.org/pipermail/freebsd-fs/2013-August/018052.html

My test setup is still in place. I will test with r250907 reverted
tomorrow morning and report back. Additional information can be provided
if necessary; I just need to know what exactly.

Ciao,
Yamagi

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
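The ordering rule in the quoted notes (vnode lock first, then the page busy wait) is the kind of invariant that witness(4) checks for real kernel locks. A toy, single-threaded sketch of such a rank check; the ranks and the tracker are hypothetical and far simpler than witness:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, ranked in the order they must be taken. */
enum lock_rank {
	RANK_VNODE_LOCK = 1,	/* taken first */
	RANK_PAGE_BUSY = 2,	/* taken (or waited on) second */
};

#define	MAX_HELD	8

/* Per-thread record of held locks; one instance per thread. */
struct lock_tracker {
	enum lock_rank held[MAX_HELD];
	int depth;
};

/*
 * Record an acquisition.  Returns false (a lock order reversal) if the
 * new lock does not rank strictly above the most recently taken one.
 */
static bool
tracker_acquire(struct lock_tracker *t, enum lock_rank r)
{
	assert(t->depth < MAX_HELD);
	if (t->depth > 0 && r <= t->held[t->depth - 1])
		return (false);
	t->held[t->depth++] = r;
	return (true);
}

/* Release the most recently acquired lock (LIFO order assumed). */
static void
tracker_release(struct lock_tracker *t)
{
	assert(t->depth > 0);
	t->depth--;
}
```

Taking the vnode lock and then waiting on page busy passes; holding a busy page and then going for the vnode lock is flagged as a reversal.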
Re: And for our next trick (Audio problems, Envy24HT driver)
On Fri, 15 Feb 2013 15:58:37 -0600
Karl Denninger k...@denninger.net wrote:

> FreeBSD 9.1-STABLE #2 r244942M: Tue Feb 5 21:54:29 CST 2013
> k...@dbms.denninger.net:/usr/obj/usr/src/sys/KSD-SMP
>
> (custom kernel is there to support PPS for my GPS clock)
>
> Attempting to add a generic card that claims to have a Envy24DT chipset
> in it; it identifies and loads under the snd_envy24ht driver as:
>
> pci6: multimedia, audio at device 0.0 (no driver attached)
> pcm0: Envy24HT audio (Generic) port 0xcc00-0xcc1f,0xc880-0xc8ff irq 16 at device 0.0 on pci6
> pcm0: [GIANT-LOCKED]
> pcm0: system configuration
>   SubVendorID: 0x1412, SubDeviceID: 0x2403
>   XIN2 Clock Source: 24.576MHz(96kHz*256)
>   MPU-401 UART(s) #: not implemented
>   ADC #: 1 and SPDIF receiver connected
>   DAC #: 4
>   Multi-track converter type: AC'97(SDATA_OUT:packed)
>   S/PDIF(IN/OUT): 1/1 ID# 0x00
>   GPIO(mask/dir/state): 0xff/0xff/0xff
>
> cat /dev/sndstat returns:
>
> [root@NewFS /boot/kernel]# cat /dev/sndstat
> FreeBSD Audio Driver (newpcm: 64bit 2009061500/amd64)
> Installed devices:
> pcm0: Envy24HT audio (Generic) at io 0xcc00:32,0xc880:128 irq 16 (1p:1v/5r:1v) default
>
> So it appears it did attach properly.

No, it did not. :) It's a longer story. While the Envy family shares a
common chip design, there is no common card design. So every Envy card
is different and needs card-specific handling in the driver. The
snd_envy24ht driver solves this problem with distinct device sections for
each supported device; see envy24ht.c line 279. If no device section is
found, the card is detected as generic and a default device section is
used. That may or may not work...

So the solution would be to add a device section for your card, but
that's anything but easy. In an ideal world there would be a datasheet
for that card; in reality it may be necessary to reverse engineer it.
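The device-section scheme described above amounts to a lookup keyed by PCI subvendor/subdevice ID with a generic fallback. A minimal sketch; the struct, the table entries and their IDs are placeholders, not the real tables from envy24ht.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-card section; the real driver stores much more. */
struct card_cfg {
	uint16_t subvendor;	/* PCI subsystem vendor ID */
	uint16_t subdevice;	/* PCI subsystem device ID */
	const char *name;
};

/* Placeholder entries: these IDs do not identify real cards. */
static const struct card_cfg known_cards[] = {
	{ 0xaaaa, 0x0001, "Hypothetical card A" },
	{ 0xaaaa, 0x0002, "Hypothetical card B" },
};

/* Defaults used when no section matches; they may or may not fit. */
static const struct card_cfg generic_cfg = { 0, 0, "Envy24HT audio (Generic)" };

/* Return the matching card section, or the generic one as a fallback. */
static const struct card_cfg *
card_lookup(uint16_t subvendor, uint16_t subdevice)
{
	for (size_t i = 0; i < sizeof(known_cards) / sizeof(known_cards[0]); i++) {
		if (known_cards[i].subvendor == subvendor &&
		    known_cards[i].subdevice == subdevice)
			return (&known_cards[i]);
	}
	return (&generic_cfg);
}

static bool
card_is_generic(const struct card_cfg *c)
{
	assert(c != NULL);
	return (c == &generic_cfg);
}
```

With the IDs from the dmesg above (0x1412/0x2403), card_lookup() falls through to the generic entry in this sketch, mirroring the "(Generic)" attach the driver reported.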
And there are other problems with the driver:
- It's not MPSAFE (at least it's not marked MPSAFE).
- Recording doesn't work.
- The debug mode is prone to panics.
- All channels supported by the Envy chip are exposed to the mixer,
  regardless of whether they're connected in hardware or not. This leads
  to invalid channels.

I once had the idea to clean it up but never found the time.
Nevertheless, it would still be really nice if someone could give this
driver some love. :)

Ciao,
Yamagi

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: [stable 9] broken hwpstate calls
On Sat, 26 May 2012 12:34:25 +0300
Andriy Gapon a...@freebsd.org wrote:

> > > > if we decide so, then I think that we could still keep the things
> > > > simple. As we currently use the wholesale approach (all CPUs are set
> > > > to the same P-state regardless of topology), then we could first
> > > > make a pass of writing the MSR on all processors with a new P-state
> > > > value and then make another pass of checking via MSR C001_0063 that
> > > > the P-state is acquired.
> > >
> > > No, I believe checking MSRC001_0071[18:16] is much simpler if it
> > > works. And it does not break current cpufreq(4) design principles.
> >
> > One potential problem with MSRC001_0071[18:16] is that on older CPUs
> > supported by hwpstate it's the same as C001_0063. Only on newer models
> > with turbo does it contain the actual hardware P-state. So additional
> > logic would be required:
> >
> > 1. Set new P-state
> > 2. Check CPUID for support of hardware P-states
> > 3.1 If yes, read MSRC001_0071[18:16] and convert the hardware P-state
> >     into a software P-state
> > 3.2 If not, just read MSRC001_0071[18:16]
> > 4. Compare the read (and converted) P-state to the requested P-state
> >
> > I don't think that it's worth this additional effort. The solution
> > suggested by Andriy Gapon is trivial, works fine and is supported by
> > all CPUs supported by hwpstate.
>
> I believe the approach that I suggested to be so trivial to implement
> (and also correct) that here is a patch:
>
> diff --git a/sys/x86/cpufreq/hwpstate.c b/sys/x86/cpufreq/hwpstate.c
> index 40e1943..9c17a41 100644
> --- a/sys/x86/cpufreq/hwpstate.c
> +++ b/sys/x86/cpufreq/hwpstate.c
> @@ -186,16 +186,21 @@ hwpstate_goto_pstate(device_t dev, int pstate)
>  		    id, PCPU_GET(cpuid));
>  		/* Go To Px-state */
>  		wrmsr(MSR_AMD_10H_11H_CONTROL, id);
> +	}
> +	CPU_FOREACH(i) {
> +		/* Bind to each cpu. */
> +		thread_lock(curthread);
> +		sched_bind(curthread, i);
> +		thread_unlock(curthread);
>  		/* wait loop (100*100 usec is enough ?) */
>  		for(j = 0; j < 100; j++){
> +			/* get the result. not assure msr=id */
>  			msr = rdmsr(MSR_AMD_10H_11H_STATUS);
>  			if(msr == id){
>  				break;
>  			}
>  			DELAY(100);
>  		}
> -		/* get the result. not assure msr=id */
> -		msr = rdmsr(MSR_AMD_10H_11H_STATUS);
>  		HWPSTATE_DEBUG(dev, "result P%d-state on cpu%d\n",
>  		    (int)msr, PCPU_GET(cpuid));
>  		if (msr != id) {

I can confirm that this patch works on a Bulldozer CPU and on an old
Phenom II Deneb.

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
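The two-pass idea behind the patch (write the control MSR on every CPU first, then poll each CPU's status MSR up to 100 times) can be sketched in user space with simulated registers. Everything here is hypothetical: the arrays stand in for MSR_AMD_10H_11H_CONTROL and MSR_AMD_10H_11H_STATUS, and the real code binds to each CPU with sched_bind() and calls DELAY(100) between polls:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	SIM_NCPU	4	/* hypothetical CPU count */
#define	MAX_POLLS	100	/* same bound as the wait loop in the patch */

/*
 * Simulated per-CPU registers: sim_control[] plays the control MSR,
 * sim_status[] the status MSR.  sim_lag[] is how many status reads a CPU
 * needs before it reports the requested P-state.
 */
static uint64_t sim_control[SIM_NCPU], sim_status[SIM_NCPU];
static int sim_lag[SIM_NCPU];

static void
sim_write_control(int cpu, uint64_t v)
{
	assert(cpu >= 0 && cpu < SIM_NCPU);
	sim_control[cpu] = v;
}

static uint64_t
sim_read_status(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NCPU);
	if (sim_lag[cpu] > 0)
		sim_lag[cpu]--;		/* transition still in progress */
	else
		sim_status[cpu] = sim_control[cpu];
	return (sim_status[cpu]);
}

/*
 * Pass 1: request P-state `id` on every CPU.
 * Pass 2: poll each CPU until it reports `id` or MAX_POLLS is reached.
 * Returns true only if every CPU reached the requested P-state.
 */
static bool
goto_pstate_two_pass(uint64_t id)
{
	bool all_ok = true;

	for (int cpu = 0; cpu < SIM_NCPU; cpu++)
		sim_write_control(cpu, id);

	for (int cpu = 0; cpu < SIM_NCPU; cpu++) {
		uint64_t msr = 0;

		for (int j = 0; j < MAX_POLLS; j++) {
			msr = sim_read_status(cpu);
			if (msr == id)
				break;
			/* the kernel code calls DELAY(100) here */
		}
		if (msr != id)
			all_ok = false;
	}
	return (all_ok);
}
```

Because all CPUs are told to switch before any of them is polled, the transitions overlap instead of being serialized behind each CPU's wait loop.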
Re: [stable 9] broken hwpstate calls
On Fri, 25 May 2012 16:05:56 -0400
Jung-uk Kim j...@freebsd.org wrote:

> > if we decide so, then I think that we could still keep the things
> > simple. As we currently use the wholesale approach (all CPUs are set to
> > the same P-state regardless of topology), then we could first make a
> > pass of writing the MSR on all processors with a new P-state value and
> > then make another pass of checking via MSR C001_0063 that the P-state
> > is acquired.
>
> No, I believe checking MSRC001_0071[18:16] is much simpler if it works.
> And it does not break current cpufreq(4) design principles.

Okay, thanks for your input. I'll come up with a patch, but it won't
happen until Tuesday or Wednesday next week.

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
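Reading a bit field such as MSRC001_0071[18:16] is a shift and a mask. A small sketch with hypothetical helper names; it only extracts the field and does not do the hardware-to-software P-state conversion discussed elsewhere in the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [hi:lo] (inclusive) of a 64-bit MSR value. */
static inline uint64_t
msr_bits(uint64_t msr, unsigned hi, unsigned lo)
{
	assert(hi >= lo && hi < 64);
	unsigned width = hi - lo + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return ((msr >> lo) & mask);
}

/* The 3-bit P-state field in bits [18:16] of MSR C001_0071. */
static inline unsigned
msr_c001_0071_pstate(uint64_t msr)
{
	return ((unsigned)msr_bits(msr, 18, 16));
}
```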
Re: kernel: panic: softdep_sync_buf: Unknown type jnewblk
Hi :)

On Thu, 26 Jan 2012 17:54:30 -0800
Jeremy Chadwick free...@jdc.parodius.com wrote:

> I'll also point out, though the OP isn't using snapshots, that it's
> confirmed use of snapshots on SU+J can result in a full filesystem hang.
> Confirmation is from Kirk McKusick (author of SU and designer of SU+J):
>
> http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013429.html
>
> Something really needs to be done about this combination of problems.
> Since SU+J is the default choice for all UFS filesystems on 9.0-RELEASE,
> the only solution I can think of is to send a massive announcement to
> relevant mailing lists, AND put something on the freebsd.org home page
> about these issues.

The problem was analyzed but no immediate solution was found. Therefore
snapshots on filesystems with SU+J (but not SU alone) were disabled by
Kirk McKusick in SVN r230250. The MFC to 9-STABLE still has to be done.
Maybe this fix is a candidate for an errata notice / patch, distributed
by freebsd-update?

> For the snapshot issue, I believe not using SU+J (and only SU) works
> around the problem, so possibly that would be the best choice of
> recommendation at this time.

Yes, snapshots with SU alone are working.

> I urge key members of the community and (as always) kernel developers
> to chime in here with advice. Something needs to be done, users need to
> be made aware of these problems, and so on.

Due to lack of time, and not being the author of the rather complex
code, Kirk McKusick was not able to fix the problem. We're now waiting
for Jeff Roberson to step up, but I was told that he is very busy and
has no time either.

--
Homepage: www.yamagi.org
XMPP: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
Re: Pack of CAM improvements
On Tue, 19 Jan 2010, Alexander Motin wrote:

> Hi.
> I've made a patch that should solve a set of problems of CAM ATA and
> CAM generally. I would like to ask for testing and feedback.
> [snip]

Hello,
I applied this patch to 8-STABLE, recompiled the kernel and rebooted.
The kernel did not boot; it hangs while probing the AHCI controller.
The error message is:

ahcich0: Timeout on slot 0
ahcich0: is 0002 cs ss 0 rs 0001 tfs 50 serr
run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config

After that the kernel hangs forever. An 8-STABLE without the patch shows

ahcich0: Timeout on slot 0
run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config

and boots, but doesn't find any hard disks.

It's an Asus M3A-H/HDMI motherboard with an AMD SB710 southbridge. The
hard disk is a Western Digital WD10EAVS. Both work with the old ata
implementation in AHCI mode. The pciconf output for the controller is:

atap...@pci0:0:17:0: class=0x010601 card=0x43911002 chip=0x43911002 rev=0x00 hdr=0x00

Thanks,
Yamagi

--
Homepage: www.yamagi.org
Jabber: yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB