Re: FreeBSD 9: Group quotas increase but don't decrease automatically

2012-02-03 Thread Konstantin Belousov
On Fri, Feb 03, 2012 at 07:30:54PM +0700, Adam Strohl wrote:
> I'm running FreeBSD 9 on a number of systems and finally decided to take 
> advantage of the quota system to enforce limits on my users.
> 
> No real issues setting it all up aside from finding that 
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/quotas.html 
> needs to be updated.   The new /etc/rc.conf entry is quota_enable="YES" 
> not enable_quotas="YES" as it says (assuming it used to be this in 
> 8.x?).  I'll file a PR for this shortly.
> 
> I did however run into a more serious issue (I think):
> 
> A group or user's allocation as reported by repquota(8) will increase 
> with new/growing files, however when a file is deleted or chgrped out of 
> the quota's group, the amount of space reported by repquota(8) does not 
> decrease.  I have verified that the system does not register the freed 
> space by going over the soft limit, being denied write, then deleting 
> files.  Even if I delete files which drop me below the soft quota limit, 
> I will not be able to add them as I am still "over quota".  So it does
> not appear to be a reporting issue; the system really doesn't realize the
> usage has gone down.
> 
> Interestingly the inode counts do decrease automatically/"instantly" as 
> I would expect.
> 
> Running quotacheck(8) fixes the issue and updates the allocation counts, 
> but does not magically fix auto-updating, so needs to be done 
> periodically which can be a bit intensive depending on file count.
> 
> I see this on all FreeBSD 9 machines with quotas turned on.
> 
> For now I have a cron script which tries to guess (based on changing 
> inode counts, etc) if it should run quotacheck, and does so if needed 
> (to avoid just blindly running it periodically).
> 
> Anyone else run into this?  Am I missing something?  Known issue?  Let 
> me know if anyone wants more info, etc.   I can also paste the work 
> around "smart" cron script if anyone is interested (and I'm not missing 
> something silly :P).

This is a bug in the +J (journaled softupdates) code, and it shows up even
if you do not use +J. Do you have softupdates enabled on the volume? If yes,
try the following patch.

diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
index 5b4b6b9..ed2db79 100644
--- a/sys/ufs/ffs/ffs_softdep.c
+++ b/sys/ufs/ffs/ffs_softdep.c
@@ -43,6 +43,7 @@
 __FBSDID("$FreeBSD$");
 
 #include "opt_ffs.h"
+#include "opt_quota.h"
 #include "opt_ddb.h"
 
 /*
@@ -6428,7 +6429,7 @@ softdep_setup_freeblocks(ip, length, flags)
    }
 #ifdef QUOTA
    /* Reference the quotas in case the block count is wrong in the end. */
-   quotaref(vp, freeblks->fb_quota);
+   quotaref(ITOV(ip), freeblks->fb_quota);
    (void) chkdq(ip, -datablocks, NOCRED, 0);
 #endif
    freeblks->fb_chkcnt = -datablocks;
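
A minimal userland sketch of why the added include matters; it is not the
kernel code. The assumption is that QUOTA is defined only through the
generated opt_quota.h, so a file that tests #ifdef QUOTA without including
that header silently loses the guarded code. The names usage_blocks,
charge_blocks() and release_blocks() are hypothetical.

```c
/*
 * Hypothetical stand-in for the generated opt_quota.h.  Remove this
 * define to reproduce the symptom: usage grows but never shrinks.
 */
#define QUOTA 1

static long usage_blocks;	/* hypothetical per-group block usage */

/* Charging is unconditional in this sketch. */
static void
charge_blocks(long n)
{
	usage_blocks += n;
}

/* Releasing is guarded, like the chkdq() call in the patch context. */
static void
release_blocks(long n)
{
#ifdef QUOTA
	usage_blocks -= n;
#else
	(void)n;	/* compiled out: the freed space is never credited */
#endif
}
```

With the define removed, release_blocks() still compiles and is still
called, which would match a count that only a full quotacheck(8) run
corrects.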




Re: 9-stable from i386 to amd64

2012-02-10 Thread Konstantin Belousov
On Sat, Feb 11, 2012 at 12:02:07AM -0600, Dan Nelson wrote:
> In the last episode (Feb 10), Randy Bush said:
> > is there a recipe for moving from i386 to amd64?
> > 
> > on a very remote system, i made the migration from 7.4 to 8.2 to 9.0, all
> > 32-bit.  it was done with repeated
> > 
> > make buildworld
> > make kernel.new [0]
> > nextboot -k kernel.new
> > reboot
> > make installworld
> > etc
> > 
> > [0] - well, there were some mv(1)s in there :)
> > 
> > so after it was happy with 9.0 i386, i went to move to amd64 with
> > 
> > make buildworld TARGET=amd64
> > make kernel TARGET=amd64 DESTDIR=kernel.new [0]
> > nextboot -k kernel.new
> > reboot
> > 
> > it did not come back from the reboot, and required a manual reset.  i have
> > no console access to the machine, not my choice.
> > 
> > clue bat please.
> 
> You probably got bit by a mismatched /libexec/ld-elf.so. The kernel expects
> that to be the "native" version, and on a 64-bit kernel it also expects a
> ld-elf32.so to be the "compat" 32-bit version.  When you rebooted onto the
> 64-bit kernel, it couldn't find /libexec/ld-elf32.so to run any of the
> 32-bit binaries on the system.  My guess is that your reboot attempt died in
> /sbin/init, prompting for a path to /bin/sh.  If you compiled with a static
> /bin/sh for performance, it probably died very early in /etc/rc.
These statements are false; esp. worrying is that they are
intertwined with some facts that get tilted to support a false presumption.

The kernel does not care which interpreter /libexec/ld-elf.so is.
The path to the interpreter is specified in the binary itself. So if you
have a 32bit binary that puts '/libexec/ld-elf.so.1' into PT_INTERP,
and /libexec/ld-elf.so.1 is 32bit, then the amd64 kernel properly executes
that combination.

The kernel has a hack that falls back to trying /libexec/ld-elf32.so.1
for some 'brands' of ELF images, in particular for 32bit binaries. This
is done to help in the situation where 32bit binaries also specify the
same path for the interpreter.

If you have a 32bit world installed and boot a 64bit kernel, it will boot.
It is the same as running a 32bit world in a jail.
The management functions, like configuring network interfaces, ZFS
and many other pieces of system setup functionality, do not work, indeed.
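
To make the point about the interpreter path concrete, here is a minimal
sketch that pulls the PT_INTERP string out of an ELF image held in memory.
It assumes the types and constants from <elf.h>; the function name
elf_interp() is hypothetical, and only images in the host byte order are
handled.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the interpreter path recorded in an ELF image, or NULL if the
 * image is not ELF, is truncated, or has no PT_INTERP program header.
 */
static const char *
elf_interp(const unsigned char *img, size_t len)
{
	size_t i, at, phoff, phentsize, phnum, off, filesz;
	int is64;

	if (len < sizeof(Elf32_Ehdr) || memcmp(img, ELFMAG, SELFMAG) != 0)
		return (NULL);
	if (img[EI_CLASS] == ELFCLASS64) {
		Elf64_Ehdr eh;

		if (len < sizeof(eh))
			return (NULL);
		memcpy(&eh, img, sizeof(eh));
		phoff = eh.e_phoff;
		phentsize = eh.e_phentsize;
		phnum = eh.e_phnum;
		is64 = 1;
	} else if (img[EI_CLASS] == ELFCLASS32) {
		Elf32_Ehdr eh;

		memcpy(&eh, img, sizeof(eh));
		phoff = eh.e_phoff;
		phentsize = eh.e_phentsize;
		phnum = eh.e_phnum;
		is64 = 0;
	} else
		return (NULL);
	if (phoff > len)
		return (NULL);

	for (i = 0; i < phnum; i++) {
		at = phoff + i * phentsize;
		if (is64) {
			Elf64_Phdr ph;

			if (at > len || len - at < sizeof(ph))
				return (NULL);
			memcpy(&ph, img + at, sizeof(ph));
			if (ph.p_type != PT_INTERP)
				continue;
			off = ph.p_offset;
			filesz = ph.p_filesz;
		} else {
			Elf32_Phdr ph;

			if (at > len || len - at < sizeof(ph))
				return (NULL);
			memcpy(&ph, img + at, sizeof(ph));
			if (ph.p_type != PT_INTERP)
				continue;
			off = ph.p_offset;
			filesz = ph.p_filesz;
		}
		/* The path must lie inside the image and be NUL-terminated. */
		if (filesz == 0 || off > len || len - off < filesz ||
		    img[off + filesz - 1] != '\0')
			return (NULL);
		return ((const char *)(img + off));
	}
	return (NULL);
}
```

The string comes from the binary alone; nothing in this lookup depends on
which kernel is running.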
> 
> I think copying ld-elf.so over to ld-elf32.so might have been all you needed
> to boot, but that would end up with a 64-bit kernel running a true 32-bit
> userland with all the libraries in the "wrong" place, and your
> "installworld" step would replace them with their 64-bit equivalents and
> your install would die halfway through, leaving you with a large mess to
> clean up.
Absolutely false.

> 
> The cleanest upgrade path is to prepare your 32-bit root to be bootable by
> both 32- and 64-bit kernels: copy the ld-elf32.so that was built during your
> buildworld over to /libexec/ld-elf32.so, and also make copies of
> /lib and /usr/lib to /lib32 and /usr/lib32 respectively.  That way when you
> reboot to a 64-bit kernel, your 32-bit executables will be running
> "correctly" out of compat32 paths and your installworld should succeed.
> 
> When I did all this on a local system, I made judicious use of ZFS snapshots
> and clones, preserving a bootable clone of my original system plus
> intermediate versions all the way until I was happy with the result.  I've
> never done it completely remotely, but if you do a trial run or two on a
> local machine or VM, you should be able to do it confidently remotely.
> 
> -- 
>   Dan Nelson
>   dnel...@allantgroup.com




Re: ZFS + nullfs + Linuxulator = panic?

2012-02-14 Thread Konstantin Belousov
On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last 
> built 2012-02-08).  It will panic during the daily periodic scripts that run 
> at 3am.  Here is the most recent panic message:
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer = 0x20:0x8069d266
> stack pointer   = 0x28:0xff8094b90390
> frame pointer   = 0x28:0xff8094b903a0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= resume, IOPL = 0
> current process = 72566 (ps)
> trap number = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0x8062cf8e at kdb_backtrace+0x5e
> #1 0x805facd3 at panic+0x183
> #2 0x808e6c20 at trap_fatal+0x290
> #3 0x808e715a at trap+0x10a
> #4 0x808cec64 at calltrap+0x8
> #5 0x805ee034 at fill_kinfo_thread+0x54
> #6 0x805eee76 at fill_kinfo_proc+0x586
> #7 0x805f22b8 at sysctl_out_proc+0x48
> #8 0x805f26c8 at sysctl_kern_proc+0x278
> #9 0x8060473f at sysctl_root+0x14f
> #10 0x80604a2a at userland_sysctl+0x14a
> #11 0x80604f1a at __sysctl+0xaa
> #12 0x808e62d4 at amd64_syscall+0x1f4
> #13 0x808cef5c at Xfast_syscall+0xfc

Please look up the line number for the fill_kinfo_thread+0x54.




Re: disk devices speed is ugly

2012-02-14 Thread Konstantin Belousov
On Wed, Feb 15, 2012 at 07:02:58AM +1100, Peter Jeremy wrote:
> On 2012-Feb-13 08:28:21 -0500, Gary Palmer  wrote:
> >The filesystem is the *BEST* place to do caching.  It knows what metadata
> >is most effective to cache and what other data (e.g. file contents) doesn't
> >need to be cached.
> 
> Agreed.
> 
> >  Any attempt to do this in layers between the FS and
> >the disk won't achieve the same gains as a properly written filesystem. 
> 
> Agreed - but traditionally, Unix uses this approach via block devices.
> For various reasons, FreeBSD moved caching into UFS and removed block
> devices.  Unfortunately, this means that any FS that wants caching has
> to implement its own - and currently only UFS & ZFS do.
Block caching is still there; only the user-accessible interface was removed.
UFS utilizes the buffer cache for the device which carries the volume,
for metadata caching. There are some memory areas in UFS which can be
classified as caches of their own, but they exist mostly to support
operation, and not for caching (e.g. the inodeblock copy accompanying each
inode).

> 
> What would be nice is a generic caching subsystem that any FS can use
> - similar to the old block devices but with hooks to allow the FS to
> request read-ahead, advise of unwanted blocks and ability to flush
> dirty blocks in a requested order with the equivalent of barriers
> (request Y will not occur until preceding request X has been
> committed to stable media).  This would allow filesystems to regain
> the benefits of block devices with minimal effort and then improve
> performance & cache efficiency with additional work.
> 
> One downside of "each FS does its own caching" is that the caches
> are all separate and need careful integration into the VM subsystem to
> prevent starvation (eg past problems with UFS starving ZFS L2ARC).
Other filesystems which use vfs_bio, like cd9660 or udf, use the same
disk cache layer as UFS.




Re: disk devices speed is ugly

2012-02-15 Thread Konstantin Belousov
On Wed, Feb 15, 2012 at 12:27:19AM -0600, Adam Vande More wrote:
> On Tue, Feb 14, 2012 at 10:50 PM, Scott Long  wrote:
> 
> >
> > Any filesystem that uses bread/bwrite/cluster_read is already using the
> > "generic caching subsystem" that you propose.  This includes UDF, CD9660,
> > MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local storage
> > filesystem in the tree except for ZFS.  Not all of them implement
> > VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations for the vnode
> > pager, not requirements for using buffer-cache services on block devices.
> >  As Kostik pointed out in a parallel email, the only thing that was removed
> > from FreeBSD was the userland interface to cached devices via /dev nodes.
> >
> 
> Does this mean the Architecture Handbook page is wrong?:
> 
> http://www.freebsd.org/doc/en/books/arch-handbook/driverbasics-block.html

No, why did you decide that it is wrong?




Re: ZFS + nullfs + Linuxulator = panic?

2012-02-16 Thread Konstantin Belousov
On Thu, Feb 16, 2012 at 10:09:27AM -0500, Paul Mather wrote:
> On Feb 14, 2012, at 7:47 PM, Konstantin Belousov wrote:
> 
> > On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> >> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, 
> >> last built 2012-02-08).  It will panic during the daily periodic scripts 
> >> that run at 3am.  Here is the most recent panic message:
> >> 
> >> Fatal trap 9: general protection fault while in kernel mode
> >> cpuid = 0; apic id = 00
> >> instruction pointer = 0x20:0x8069d266
> >> stack pointer   = 0x28:0xff8094b90390
> >> frame pointer   = 0x28:0xff8094b903a0
> >> code segment= base 0x0, limit 0xf, type 0x1b
> >>= DPL 0, pres 1, long 1, def32 0, gran 1
> >> processor eflags= resume, IOPL = 0
> >> current process = 72566 (ps)
> >> trap number = 9
> >> panic: general protection fault
> >> cpuid = 0
> >> KDB: stack backtrace:
> >> #0 0x8062cf8e at kdb_backtrace+0x5e
> >> #1 0x805facd3 at panic+0x183
> >> #2 0x808e6c20 at trap_fatal+0x290
> >> #3 0x808e715a at trap+0x10a
> >> #4 0x808cec64 at calltrap+0x8
> >> #5 0x805ee034 at fill_kinfo_thread+0x54
> >> #6 0x805eee76 at fill_kinfo_proc+0x586
> >> #7 0x805f22b8 at sysctl_out_proc+0x48
> >> #8 0x805f26c8 at sysctl_kern_proc+0x278
> >> #9 0x8060473f at sysctl_root+0x14f
> >> #10 0x80604a2a at userland_sysctl+0x14a
> >> #11 0x80604f1a at __sysctl+0xaa
> >> #12 0x808e62d4 at amd64_syscall+0x1f4
> >> #13 0x808cef5c at Xfast_syscall+0xfc
> > 
> > Please look up the line number for the fill_kinfo_thread+0x54.
> 
> 
> Is there a way for me to do this from the above information? As
> I said in the original message, I failed to get a crash dump after
> reboot (because, it turns out, I hadn't set up my gmirror swap device
> properly). Alas, with the latest panic, it appears to have hung[1]
> during the "Dumping" phase, so it looks like I won't get a saved crash
> dump this time, either. :-(

Load the kernel.debug into kgdb, and from there do
"list *fill_kinfo_thread+0x54".




Re: ZFS + nullfs + Linuxulator = panic?

2012-02-17 Thread Konstantin Belousov
On Thu, Feb 16, 2012 at 12:07:46PM -0500, Paul Mather wrote:
> On Feb 16, 2012, at 10:49 AM, Konstantin Belousov wrote:
> 
> > On Thu, Feb 16, 2012 at 10:09:27AM -0500, Paul Mather wrote:
> >> On Feb 14, 2012, at 7:47 PM, Konstantin Belousov wrote:
> >> 
> >>> On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> >>>> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, 
> >>>> last built 2012-02-08).  It will panic during the daily periodic scripts 
> >>>> that run at 3am.  Here is the most recent panic message:
> >>>> 
> >>>> Fatal trap 9: general protection fault while in kernel mode
> >>>> cpuid = 0; apic id = 00
> >>>> instruction pointer = 0x20:0x8069d266
> >>>> stack pointer   = 0x28:0xff8094b90390
> >>>> frame pointer   = 0x28:0xff8094b903a0
> >>>> code segment= base 0x0, limit 0xf, type 0x1b
> >>>>   = DPL 0, pres 1, long 1, def32 0, gran 1
> >>>> processor eflags= resume, IOPL = 0
> >>>> current process = 72566 (ps)
> >>>> trap number = 9
> >>>> panic: general protection fault
> >>>> cpuid = 0
> >>>> KDB: stack backtrace:
> >>>> #0 0x8062cf8e at kdb_backtrace+0x5e
> >>>> #1 0x805facd3 at panic+0x183
> >>>> #2 0x808e6c20 at trap_fatal+0x290
> >>>> #3 0x808e715a at trap+0x10a
> >>>> #4 0x808cec64 at calltrap+0x8
> >>>> #5 0x805ee034 at fill_kinfo_thread+0x54
> >>>> #6 0x805eee76 at fill_kinfo_proc+0x586
> >>>> #7 0x805f22b8 at sysctl_out_proc+0x48
> >>>> #8 0x805f26c8 at sysctl_kern_proc+0x278
> >>>> #9 0x8060473f at sysctl_root+0x14f
> >>>> #10 0x80604a2a at userland_sysctl+0x14a
> >>>> #11 0x80604f1a at __sysctl+0xaa
> >>>> #12 0x808e62d4 at amd64_syscall+0x1f4
> >>>> #13 0x808cef5c at Xfast_syscall+0xfc
> >>> 
> >>> Please look up the line number for the fill_kinfo_thread+0x54.
> >> 
> >> 
> >> Is there a way for me to do this from the above information? As
> >> I said in the original message, I failed to get a crash dump after
> >> reboot (because, it turns out, I hadn't set up my gmirror swap device
> >> properly). Alas, with the latest panic, it appears to have hung[1]
> >> during the "Dumping" phase, so it looks like I won't get a saved crash
> >> dump this time, either. :-(
> > 
> > Load the kernel.debug into kgdb, and from there do
> > "list *fill_kinfo_thread+0x54".
> 
> 
> gromit# kgdb /usr/obj/usr/src/sys/GENERIC/kernel.debug
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> (kgdb) list *fill_kinfo_thread+0x54
> 0x805ee034 is in fill_kinfo_thread (/usr/src/sys/kern/kern_proc.c:854).
> 849         thread_lock(td);
> 850         if (td->td_wmesg != NULL)
> 851                 strlcpy(kp->ki_wmesg, td->td_wmesg, sizeof(kp->ki_wmesg));
> 852         else
> 853                 bzero(kp->ki_wmesg, sizeof(kp->ki_wmesg));
> 854         strlcpy(kp->ki_ocomm, td->td_name, sizeof(kp->ki_ocomm));
> 855         if (TD_ON_LOCK(td)) {
> 856                 kp->ki_kiflag |= KI_LOCKBLOCK;
> 857                 strlcpy(kp->ki_lockname, td->td_lockname,
> 858                     sizeof(kp->ki_lockname));
> (kgdb) 

This is indeed strange. It can only occur if the td pointer is damaged.

Please try to get a core and at least print the content of *td in this case.




Re: mpslsi0 : Trying sleep, but thread marked as sleeping prohibited

2012-02-22 Thread Konstantin Belousov
On Wed, Feb 22, 2012 at 07:36:42PM +0530, Desai, Kashyap wrote:
> Hi,
> 
> I am making some code changes in the mps driver. While working on those
> changes, I came across something which is new to me.
> Some expert help is required to clarify my doubt.
> 
> 1. When any irq is registered with the FreeBSD OS, it sets the
> "TDP_NOSLEEPING" pflag. It means that, though an irq in FreeBSD is treated
> as a thread, we cannot sleep in an IRQ because "TDP_NOSLEEPING" is set.
> 2. In the mps driver we have the below code snippet in the ISR routine.
> 
> 
> mps_dprint(sc, MPS_TRACE, "%s\n", __func__);
> mps_lock(sc);
> mps_intr_locked(data);
> mps_unlock(sc);
> 
> I wonder why there is no issue with the above code? Theoretically we cannot
> sleep in an ISR (as explained in #1).
> Any thoughts?
> 
> 
> 3. I recently replaced DELAY with msleep() in a few places in ISR context
> and I see
> "Trying sleep, but thread marked as sleeping prohibited".
> 
FreeBSD has several basic ways to prevent a thread from executing on a CPU.
They mostly fall into two categories: bounded sleep, sometimes called
blocking, and unbounded sleep, usually abbreviated as sleep. "Bounded"
here refers to the amount of code executed by the other thread, which holds
the resource that prevents the blocked thread from making progress.

Examples of the blocking primitives are mutexes, rw locks and rm locks.
Blocking is not counted as sleeping, so interrupt threads, which are
designated as non-sleeping, can still lock mutexes.

Examples of the sleeping primitives are msleep(), sx locks, lockmgr locks
and condition variables.

In essence, the locking facilities are split into several classes that
form a hierarchy, and you cannot legally obtain a lock of a higher class
while holding a lock of a lower class:
spin mutexes -> blocking locks -> sleeping locks.
This establishes a meta-order on all the locks.
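
The two rules above (no sleeping for a non-sleeping thread, and no lock of
a higher class while a lower one is held) can be sketched as a small
predicate. This is only a model for illustration: struct thr_state and
may_acquire() are hypothetical names, not kernel interfaces, and the real
checks live in the kernel's own assertions.

```c
#include <stdbool.h>

/* Lock classes in the order given above; a larger value is a higher class. */
enum lock_class {
	LC_SPIN = 0,		/* spin mutexes */
	LC_BLOCKING = 1,	/* mutexes, rw locks, rm locks */
	LC_SLEEPING = 2		/* msleep(), sx locks, lockmgr locks, condvars */
};

/* Hypothetical per-thread state; not the kernel's struct thread. */
struct thr_state {
	bool		no_sleeping;	/* models TDP_NOSLEEPING */
	bool		holds_lock;	/* is any lock currently held? */
	enum lock_class	lowest_held;	/* lowest class among the held locks */
};

/* May this thread legally take (or wait on) something of class 'want'? */
static bool
may_acquire(const struct thr_state *t, enum lock_class want)
{
	/* An interrupt thread may block on a mutex but never sleep. */
	if (t->no_sleeping && want == LC_SLEEPING)
		return (false);
	/* Never go up the hierarchy while holding a lower-class lock. */
	if (t->holds_lock && want > t->lowest_held)
		return (false);
	return (true);
}
```

In this model the mps ISR taking its mutex is LC_BLOCKING and is allowed,
while an msleep() from the same context is LC_SLEEPING and is refused.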

Does this make sense ?




Re: panic in 8.3-PRERELEASE

2012-02-22 Thread Konstantin Belousov
On Wed, Feb 22, 2012 at 11:29:40AM -0500, Rick Macklem wrote:
> Hiroki Sato wrote:
> > Hi,
> > 
> > Just a report, but I got the following panic on an NFS server running
> > 8.3-PRERELEASE:
> > 
> > (from here)
> > pool.allbsd.org dumped core - see /var/crash/vmcore.0
> > 
> > Tue Feb 21 10:59:44 JST 2012
> > 
> > FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #7: Thu
> > Feb 16 19:29:19 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL
> > amd64
> > 
> > panic: Assertion lock == sq->sq_lock failed at
> > /usr/src/sys/kern/subr_sleepqueue.c:335
> > 
> Oops, I didn't know that mixing msleep() and tsleep() calls on the same
> event wasn't allowed.
> There are two places in the code where it did a:
>   mtx_unlock();
>   tsleep();
> left over from the days when it was written for OpenBSD.
This sequence allows a wakeup to be lost if it happens right after the
cache unlock (together with clearing the RC_WANTED flag) but before
the thread enters the sleep state. The tsleep has a timeout, so the thread
should recover in 10 seconds, but still.

Anyway, you should use a consistent outer lock for the same wchan, i.e.
either no lock (tsleep) or a mtx (msleep), but not a mix of them.
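
The lost wakeup can be shown with a single-threaded model of the two
interleavings. This is a sketch under stated assumptions, not kernel code:
struct chan_model and the model_*() helpers are hypothetical, and the only
property modelled is that a wakeup affects just the threads already asleep
on the channel.

```c
#include <stdbool.h>

/* Hypothetical model of one wait channel; not a kernel data structure. */
struct chan_model {
	bool	sleeper_queued;	/* a thread is asleep on the channel */
	bool	woken;		/* that thread has been made runnable */
};

/* wakeup(): only a thread already asleep on the channel is affected. */
static void
model_wakeup(struct chan_model *c)
{
	if (c->sleeper_queued) {
		c->sleeper_queued = false;
		c->woken = true;
	}
}

/* Going to sleep: the thread is queued on the channel. */
static void
model_sleep(struct chan_model *c)
{
	c->sleeper_queued = true;
}

/*
 * mtx_unlock(); tsleep(); with the waker running in the gap between the
 * two calls.  Returns true if the waiter was woken by the wakeup.
 */
static bool
unlock_then_tsleep(void)
{
	struct chan_model c = { false, false };

	/* waiter: mtx_unlock() has returned, but it is not asleep yet */
	model_wakeup(&c);	/* waker: finds nobody to wake */
	model_sleep(&c);	/* waiter: tsleep(), too late */
	return (c.woken);
}

/*
 * msleep() with the mutex: the waiter is queued before the mutex is
 * released, so the waker cannot run in between.
 */
static bool
msleep_with_mutex(void)
{
	struct chan_model c = { false, false };

	model_sleep(&c);	/* waiter: queued, then the mutex is dropped */
	model_wakeup(&c);	/* waker: takes the mutex, finds the sleeper */
	return (c.woken);
}
```

In the first ordering the waiter is left asleep and only the tsleep timeout
gets it going again; in the second the wakeup always finds it.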
> 
> I don't think the mix would actually break anything, except that the
> MPASS() assertion fails, but I've cc'd jhb@ since he seems to have been
> the author of the sleep() stuff.
> 
> Anyhow, please try the attached patch which replaces the mtx_unlock(); 
> tsleep(); with
> msleep()s using PDROP. If the attachment gets lost, the patch is also here:
>   http://people.freebsd.org/~rmacklem/tsleep.patch
> 
> Thanks for reporting this, rick
> ps: Is mtx_lock() now preferred over msleep()?
What do you mean ?




Re: panic in 8.3-PRERELEASE

2012-02-23 Thread Konstantin Belousov
On Wed, Feb 22, 2012 at 06:53:55PM -0500, Rick Macklem wrote:
> John Baldwin wrote:
> > On Wednesday, February 22, 2012 2:24:14 pm Konstantin Belousov wrote:
> > > On Wed, Feb 22, 2012 at 11:29:40AM -0500, Rick Macklem wrote:
> > > > Hiroki Sato wrote:
> > > > > Hi,
> > > > >
> > > > > Just a report, but I got the following panic on an NFS server
> > > > > running
> > > > > 8.3-PRERELEASE:
> > > > >
> > > > > (from here)
> > > > > pool.allbsd.org dumped core - see /var/crash/vmcore.0
> > > > >
> > > > > Tue Feb 21 10:59:44 JST 2012
> > > > >
> > > > > FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE
> > > > > #7: Thu
> > > > > Feb 16 19:29:19 JST 2012
> > > > > h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL
> > > > > amd64
> > > > >
> > > > > panic: Assertion lock == sq->sq_lock failed at
> > > > > /usr/src/sys/kern/subr_sleepqueue.c:335
> > > > >
> > > > Oops, I didn't know that mixing msleep() and tsleep() calls on the
> > > > same
> > > > event wasn't allowed.
> > > > There are two places in the code where it did a:
> > > >   mtx_unlock();
> > > >   tsleep();
> > > > left over from the days when it was written for OpenBSD.
> > > This sequence allows a wakeup to be lost if it happens right after
> > > the cache unlock (together with clearing the RC_WANTED flag) but
> > > before the thread enters the sleep state. The tsleep has a timeout,
> > > so the thread should recover in 10 seconds, but still.
> > >
> > > Anyway, you should use a consistent outer lock for the same wchan,
> > > i.e. either no lock (tsleep) or a mtx (msleep), but not a mix of them.
> > 
> > Correct.
> > 
> > > > I don't think the mix would actually break anything, except that
> > > > the
> > > > MPASS() assertion fails, but I've cc'd jhb@ since he seems to have
> > > > been
> > > > the author of the sleep() stuff.
> > > >
> > > > Anyhow, please try the attached patch which replaces the
> > > > mtx_unlock();
> > tsleep(); with
> > > > msleep()s using PDROP. If the attachment gets lost, the patch is
> > > > also
> > here:
> > > >   http://people.freebsd.org/~rmacklem/tsleep.patch
> > > >
> > > > Thanks for reporting this, rick
> > > > ps: Is mtx_lock() now preferred over msleep()?
> > > What do you mean ?
> > 
> > mtx_sleep() is preferred over msleep(), but I doubt I will remove
> > msleep()
> > anytime soon.
> > 
> Ok, I'll redo the patch with mtx_sleep() and get one of you guys to
> review it.
I do not see a need to change to mtx_sleep, esp. if other places
in nfsd use msleep(). There are more than 570 uses of msleep(9) in the
kernel, and an unknown number of them in third-party modules.

> 
> One question. Do you think this is serious enough to worry about for
> 8.3? (Just wondering if I need to rush a patch into head with a 1 week
> MFC. I realize it would still be up to re@, even if I rush it.)
I think it is a usual routine bugfix, which is as good to have in the
release as any other bugfix. 8.3 is in the stabilization period, which
exists exactly for pushing bugfixes.




Re: mpslsi0 : Trying sleep, but thread marked as sleeping prohibited

2012-02-23 Thread Konstantin Belousov
On Thu, Feb 23, 2012 at 05:52:12AM +0530, Desai, Kashyap wrote:
> 
> 
> > -----Original Message-----
> > From: Konstantin Belousov [mailto:kostik...@gmail.com]
> > Sent: Thursday, February 23, 2012 12:45 AM
> > To: Desai, Kashyap
> > Cc: freebsd-s...@freebsd.org; freebsd-stable; Justin T. Gibbs; Kenneth
> > D. Merry; McConnell, Stephen
> > Subject: Re: mpslsi0 : Trying sleep, but thread marked as sleeping
> > prohibited
> > 
> > On Wed, Feb 22, 2012 at 07:36:42PM +0530, Desai, Kashyap wrote:
> > > Hi,
> > >
> > > I am making some code changes in the mps driver. While working on
> > > those changes, I came across something which is new to me.
> > > Some expert help is required to clarify my doubt.
> > >
> > > 1. When any irq is registered with the FreeBSD OS, it sets the
> > > "TDP_NOSLEEPING" pflag. It means that, though an irq in FreeBSD is
> > > treated as a thread, we cannot sleep in an IRQ because
> > > "TDP_NOSLEEPING" is set.
> > > 2. In the mps driver we have the below code snippet in the ISR routine.
> > >
> > >
> > > mps_dprint(sc, MPS_TRACE, "%s\n", __func__);
> > > mps_lock(sc);
> > > mps_intr_locked(data);
> > > mps_unlock(sc);
> > >
> > > I wonder why there is no issue with the above code? Theoretically we
> > > cannot sleep in an ISR (as explained in #1). Any thoughts?
> > >
> > >
> > > 3. I recently replaced DELAY with msleep() in a few places in ISR
> > > context and I see "Trying sleep, but thread marked as sleeping prohibited".
> > >
> > FreeBSD has several basic ways to prevent a thread from executing on
> > a CPU.
> > They mostly fall into two categories: bounded sleep, sometimes called
> > blocking, and unbounded sleep, usually abbreviated as sleep. "Bounded"
> > here refers to the amount of code executed by the other thread, which
> > holds the resource that prevents the blocked thread from making progress.
> > 
> > Examples of the blocking primitives are mutexes, rw locks and rm locks.
> > Blocking is not counted as sleeping, so interrupt threads, which are
> > designated as non-sleeping, can still lock mutexes.
> Thanks for the tech help.
> 
> As per your comment, I now understand that "TDP_NOSLEEPING" only
> restricts unbounded sleep. Just curious to know, what is the
> reason that a thread can do a blocking sleep but can't do an unbounded
> sleep? Technically, the sleeping restriction on interrupt threads was
> introduced to avoid starvation, and that applies to either sleep
> type. Is this not true?
No, not to avoid starvation.

The intent of the blocking primitives is to acquire resources for a limited
amount of time. In other words, you never take a mutex for an indefinitely
long computation. On the other hand, an msleep() sleep usually has
no such limit.

You do not want the interrupt thread to be put off the processor for an
indefinite time, so sleep is prohibited.

Another issue is that sleeping locks do not do priority propagation
to the resource owners, while the turnstiles used for blocking do. This way,
an interrupt thread waiting for a mutex donates its priority to the current
mutex owner, or at least it should.
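
Priority propagation can be sketched in a few lines. This is a toy model,
not the turnstile code: struct thr_model, struct mtx_model, block_on() and
mtx_model_release() are hypothetical, and a smaller pri value is taken to
mean a more urgent thread.

```c
#include <stddef.h>

/* Hypothetical thread model; a smaller pri value means more urgent. */
struct thr_model {
	int	base_pri;	/* priority the thread normally runs at */
	int	pri;		/* effective priority, possibly lent */
};

/* Hypothetical mutex model that only records its owner. */
struct mtx_model {
	struct thr_model *owner;	/* NULL when unowned */
};

/* A waiter blocks on m: lend its priority to the owner if more urgent. */
static void
block_on(struct mtx_model *m, const struct thr_model *waiter)
{
	if (m->owner != NULL && waiter->pri < m->owner->pri)
		m->owner->pri = waiter->pri;
}

/* The owner releases m and drops back to its own priority. */
static void
mtx_model_release(struct mtx_model *m)
{
	if (m->owner != NULL) {
		m->owner->pri = m->owner->base_pri;
		m->owner = NULL;
	}
}
```

While the interrupt thread waits, the owner runs at the lent priority, so
the wait is bounded by the owner's short critical section. A sleeping lock
has no such step in this model, which is the second reason given above.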

> 
> I will be able to progress on my work based on your comment. Many thanks
> for clearing up my doubt.
> 
> ~ Kashyap
> 
> > 
> > Examples of the sleeping primitives are msleep(), sx locks, lockmgr
> > locks and condition variables.
> > 
> > In essence, the locking facilities are split into several classes that
> > form a hierarchy, and you cannot legally obtain a lock of a higher
> > class while holding a lock of a lower class:
> > spin mutexes -> blocking locks -> sleeping locks.
> > This establishes a meta-order on all the locks.
> > 
> > Does this make sense ?
> 




Re: another panic in 8.3-PRERELEASE

2012-02-24 Thread Konstantin Belousov
On Thu, Feb 23, 2012 at 11:45:58PM +0900, Hiroki Sato wrote:
> Hi,
> 
>  This is another reproducible panic.  This seems to happen only when
>  top(1) is running for a long time (a sysctl() call for
>  CTL_KERN.KERN_PROC.KERN_PROC_PROC MIB triggered it).
> 
> 
> pool.allbsd.org dumped core - see /var/crash/vmcore.0
> 
> Thu Feb 23 23:21:52 JST 2012
> 
> FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #8: Thu Feb 23 
> 04:40:54 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL  amd64
> 
> panic:
> 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 4; apic id = 04
> fault virtual address = 0x800e96000
> fault code= supervisor write data, protection violation
> instruction pointer   = 0x20:0x809440cb
> stack pointer = 0x28:0xff86c63890b0
> frame pointer = 0x28:0xff86c6389100
> code segment  = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 47211 (top)
> lock order reversal: (Giant after non-sleepable)
>  1st 0xff0244b85568 process lock (process lock) @ 
> /usr/src/sys/kern/kern_proc.c:1211
>  2nd 0x80d74c80 Giant (Giant) @ /usr/src/sys/dev/usb/input/ukbd.c:2018
> KDB: stack backtrace:
> Dumping 23903 out of 24550 MB:..1%..11%..21%..31% (CTRL-C to abort)  (CTRL-C 
> to abort) ..41%..51%..61%..71%..81%..91%
> 
> Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from 
> /boot/kernel/geom_mirror.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/geom_mirror.ko
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
> /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
> /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
> /boot/kernel/ipfw.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ipfw.ko
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
> 263   if (textdump_pending)
> (kgdb) #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
> #1  0x801f8cfc in db_fncall (dummy1=Variable "dummy1" is not 
> available.
> )
> at /usr/src/sys/ddb/db_command.c:548
> #2  0x801f9031 in db_command (last_cmdp=0x80d37f40, 
> cmd_table=Variable "cmd_table" is not available.
> 
> ) at /usr/src/sys/ddb/db_command.c:445
> #3  0x801f9280 in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:498
> #4  0x801fb369 in db_trap (type=Variable "type" is not available.
> ) at /usr/src/sys/ddb/db_main.c:229
> #5  0x8069dff1 in kdb_trap (type=12, code=0, tf=0xff86c6389000)
> at /usr/src/sys/kern/subr_kdb.c:548
> #6  0x809461ed in trap_fatal (frame=0xff86c6389000, eva=Variable 
> "eva" is not available.
> )
> at /usr/src/sys/amd64/amd64/trap.c:820
> #7  0x809468b5 in trap (frame=0xff86c6389000)
> at /usr/src/sys/amd64/amd64/trap.c:326
> #8  0x8092d2f4 in calltrap ()
> at /usr/src/sys/amd64/amd64/exception.S:228
> #9  0x809440cb in copyout () at /usr/src/sys/amd64/amd64/support.S:258
> #10 0x80675f1f in sysctl_old_user (req=0xff86c63899c0,
> p=0xff86c6389470, l=1088) at /usr/src/sys/kern/kern_sysctl.c:1276
> #11 0x8065f6a6 in sysctl_out_proc_copyout (ki=0xff86c6389470,
> req=0xff86c63899c0) at /usr/src/sys/kern/kern_proc.c:1085
> #12 0x8065ff6c in sysctl_out_proc (p=0xff0244b85470,
> req=0xff86c63899c0, flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/kern_proc.c:1114
> #13 0x8066245e in sysctl_kern_proc (oidp=Variable "oidp" is not 
> available.
> )
> at /usr/src/sys/kern/kern_proc.c:1302
> #14 0x806756e8 in sysctl_root (oidp=Variable "oidp" is not available.
> )
> at /usr/src/sys/kern/kern_sysctl.c:1455
> #15 0x8067598e in userland_sysctl (td=0x0, name=0xff86c6389a80,
> namelen=3, old=0x800e96000, oldlenp=Variable "oldlenp" is not available.
> )
> at /usr/src/sys/kern/kern_sysctl.c:1565
> #16 0x80675e3a in __sysctl (td=0xff0396ec5460,
> uap=0xff86c6389bc0) at /usr/src/sys/kern/kern_sysctl.c:1491
> #17 0x80945809 in amd64_syscall (td=0xff0396ec5460, traced=0)
> at subr_syscall.c:114

Re: another panic in 8.3-PRERELEASE

2012-02-24 Thread Konstantin Belousov
On Fri, Feb 24, 2012 at 04:33:36PM +0200, Konstantin Belousov wrote:
> On Thu, Feb 23, 2012 at 11:45:58PM +0900, Hiroki Sato wrote:
> > Hi,
> > 
> >  This is another reproducible panic.  This seems to happen only when
> >  top(1) is running for a long time (a sysctl() call for
> >  CTL_KERN.KERN_PROC.KERN_PROC_PROC MIB triggered it).
> > 
> > 
> > pool.allbsd.org dumped core - see /var/crash/vmcore.0
> > 
> > Thu Feb 23 23:21:52 JST 2012
> > 
> > FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #8: Thu Feb 
> > 23 04:40:54 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL  
> > amd64
> > 
> > panic:
> > 
> > GNU gdb 6.1.1 [FreeBSD]
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain 
> > conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > This GDB was configured as "amd64-marcel-freebsd"...
> > 
> > Unread portion of the kernel message buffer:
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 4; apic id = 04
> > fault virtual address   = 0x800e96000
> > fault code  = supervisor write data, protection violation
> > instruction pointer = 0x20:0x809440cb
> > stack pointer   = 0x28:0xff86c63890b0
> > frame pointer   = 0x28:0xff86c6389100
> > code segment= base 0x0, limit 0xf, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags= interrupt enabled, resume, IOPL = 0
> > current process = 47211 (top)
> > lock order reversal: (Giant after non-sleepable)
> >  1st 0xff0244b85568 process lock (process lock) @ 
> > /usr/src/sys/kern/kern_proc.c:1211
> >  2nd 0x80d74c80 Giant (Giant) @ 
> > /usr/src/sys/dev/usb/input/ukbd.c:2018
> > KDB: stack backtrace:
> > Dumping 23903 out of 24550 MB:..1%..11%..21%..31% (CTRL-C to abort)  
> > (CTRL-C to abort) ..41%..51%..61%..71%..81%..91%
> > 
> > Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from 
> > /boot/kernel/geom_mirror.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/geom_mirror.ko
> > Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
> > /boot/kernel/zfs.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/zfs.ko
> > Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
> > /boot/kernel/opensolaris.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/opensolaris.ko
> > Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
> > /boot/kernel/ipfw.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ipfw.ko
> > #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
> > 263 if (textdump_pending)
> > (kgdb) #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
> > #1  0x801f8cfc in db_fncall (dummy1=Variable "dummy1" is not 
> > available.
> > )
> > at /usr/src/sys/ddb/db_command.c:548
> > #2  0x801f9031 in db_command (last_cmdp=0x80d37f40, 
> > cmd_table=Variable "cmd_table" is not available.
> > 
> > ) at /usr/src/sys/ddb/db_command.c:445
> > #3  0x801f9280 in db_command_loop ()
> > at /usr/src/sys/ddb/db_command.c:498
> > #4  0x801fb369 in db_trap (type=Variable "type" is not available.
> > ) at /usr/src/sys/ddb/db_main.c:229
> > #5  0x8069dff1 in kdb_trap (type=12, code=0, tf=0xff86c6389000)
> > at /usr/src/sys/kern/subr_kdb.c:548
> > #6  0x809461ed in trap_fatal (frame=0xff86c6389000, 
> > eva=Variable "eva" is not available.
> > )
> > at /usr/src/sys/amd64/amd64/trap.c:820
> > #7  0x809468b5 in trap (frame=0xff86c6389000)
> > at /usr/src/sys/amd64/amd64/trap.c:326
> > #8  0x8092d2f4 in calltrap ()
> > at /usr/src/sys/amd64/amd64/exception.S:228
> > #9  0x809440cb in copyout () at 
> > /usr/src/sys/amd64/amd64/support.S:258
> > #10 0x80675f1f in sysctl_old_user (req=0xff86c63899c0,
> > p=0xff86c6389470, l=1088) at /usr/src/sys/kern/kern_sysctl.c:1276
> > #11 0x8065f6a6 in sysctl_out_proc_copyout (ki=0xff86c6389470,
> > req=0xf

Re: another panic in 8.3-PRERELEASE

2012-02-28 Thread Konstantin Belousov
On Sat, Feb 25, 2012 at 02:58:28AM +0900, Hiroki Sato wrote:
> Konstantin Belousov  wrote
>   in <20120224150259.gv55...@deviant.kiev.zoral.com.ua>:
> 
> ko> > > #19 0x000800abecfc in ?? ()
> ko> > > Previous frame inner to this frame (corrupt stack?)
> ko> > > (kgdb)
> ko> > Can you, please, print out the content of *td, e.g. from the frame 16 ?
> ko> 
> ko> And *req from the frame 11, please.
> 
>  Here:
> 
> (kgdb) f 16
> #16 0x80675e3a in __sysctl (td=0xff0396ec5460, 
> uap=0xff86c6389bc0) at /usr/src/sys/kern/kern_sysctl.c:1491
> 1491  error = userland_sysctl(td, name, uap->namelen,
> (kgdb) print *td
> $2 = {td_lock = 0x80d7f540, td_proc = 0xff03969bf470, td_plist = {
> tqe_next = 0x0, tqe_prev = 0xff03969bf480}, td_runq = {tqe_next = 
> 0x0, 
> tqe_prev = 0x80d7f788}, td_slpq = {tqe_next = 0x0, 
> tqe_prev = 0xff0396ebe800}, td_lockq = {tqe_next = 0x0, 
> tqe_prev = 0xff86c57b48a0}, td_cpuset = 0xff0005789dc8, 
>   td_sel = 0xff01b5dd0500, td_sleepqueue = 0xff0396ebe800, 
>   td_turnstile = 0xff01334cf600, td_umtxq = 0xff0396ec3a80, 
>   td_tid = 100763, td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, 
> sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, 
>   tqh_last = 0xff0396ec5500}, sq_proc = 0xff03969bf470, 
> sq_flags = 1}, td_flags = 65540, td_inhibitors = 0, td_pflags = 0, 
>   td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, 
>   td_lastcpu = 4 '\004', td_oncpu = 4 '\004', td_owepreempt = 0 '\0', 
>   td_tsqueue = 255 '?', td_locks = 4, td_rw_rlocks = 0, td_lk_slocks = 0, 
>   td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, 
>   td_sleeplocks = 0x80ecebf0, td_intr_nesting_level = 0, 
>   td_pinned = 0, td_ucred = 0xff007d537b00, td_estcpu = 0, td_slptick = 
> 0, 
>   td_blktick = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {
>   tv_sec = 0, tv_usec = 0}, ru_maxrss = 1864, ru_ixrss = 66288, 
> ru_idrss = 1347856, ru_isrss = 176768, ru_minflt = 263901, ru_majflt = 
> 10, 
> ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, 
> ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 14937, ru_nivcsw = 3286}, 
>   td_incruntime = 0, td_runtime = 15204044088, td_pticks = 15, td_sticks = 
> 15, 
>   td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 
>   0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_generation = 18223, 
>   td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, 
>   td_profil_addr = 0, td_profil_ticks = 0, 
>   td_name = "top", '\0' , td_fpop = 0x0, td_dbgflags = 0, 
>   td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {
>   si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, 
>   si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, 
> sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {
> _fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, 
> _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 
> 0, 
>   __spare2__ = {0, 0, 0, 0, 0, 0, 0, ksi_flags = 0, 
> ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, 
> osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, 
>   td_rqindex = 32 ' ', td_base_pri = 128 '\200', td_priority = 128 '\200', 
>   td_pri_class = 3 '\003', td_user_pri = 129 '\201', 
>   td_base_user_pri = 129 '\201', td_pcb = 0xff86c6389d10, 
>   td_state = TDS_RUNNING, td_retval = {0, 34375032832}, td_slpcallout = {
> c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, 
> tqe_prev = 0xff800042ccd0}}, c_time = 51568077, 
> c_arg = 0xff0396ec5460, c_func = 0x806a84c0 , 
> c_lock = 0x0, c_flags = 18, c_cpu = 4}, td_frame = 0xff86c6389c50, 
>   td_kstack_obj = 0xff03410b20d8, td_kstack = 18446743553049124864, 
>   td_kstack_pages = 4, td_unused1 = 0x0, td_unused2 = 0, td_unused3 = 0, 
>   td_critnest = 0, td_md = {md_spinlock_count = 0, md_saved_flags = 70}, 
>   td_sched = 0xff0396ec5890, td_ar = 0x0, td_syscalls = 469926, 
>   td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0x0, 
>   td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_rux = {
> rux_runtime = 15204044088, rux_uticks = 226, rux_sticks = 1140, 
> rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, 
>   td_map_def_user = 0x0, td_dbg_forked = 0}
> (kgdb) f 11
> #11 0x8065f6a6 in sysctl_out_proc_copyout (ki=0xff86c6

Re: [CFT] modular kernel config

2012-03-02 Thread Konstantin Belousov
On Fri, Mar 02, 2012 at 02:33:17PM +0100, Alexander Leidinger wrote:
> Quoting Pavel Timofeev  (from Thu, 1 Mar 2012  
> 10:35:17 +0400):
> 
> >I have just tried the latest configs and see the following messages during  
> >kernel boot:
> >
> >Copyright (c) 1992-2012 The FreeBSD Project.
> >Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> >The Regents of the University of California. All rights reserved.
> >FreeBSD is a registered trademark of The FreeBSD Foundation.
> >FreeBSD 10.0-CURRENT #0: Wed Feb 29 22:47:35 MSK 2012
> >mox@rock:/usr/obj/usr/src/sys/SMALL amd64
> >WARNING: WITNESS option enabled, expect reduced performance.
> >link_elf_obj: symbol xpt_create_path undefined
> >KLD file hptiop.ko - could not finalize loading
> >link_elf_obj: symbol firmware_get undefined
> >KLD file isp.ko - could not finalize loading
> >link_elf_obj: symbol xpt_freeze_simq undefined
> >KLD file mps.ko - could not finalize loading
> >link_elf_obj: symbol cam_simq_alloc undefined
> >KLD file hptmv.ko - could not finalize loading
> >CPU: Intel(R) Core(TM)2 Duo CPU E7500  @ 2.93GHz (2906.39-MHz  
> >K8-class CPU)
> >
> >Do you know why I get this?
> 
> The xpt_* symbols are all in the cam module. If you downloaded the  
> loader.conf the cam module should surely be there (unless you lost  
> it). Without the cam module I can understand the messages about xpt_*,  
> with the cam module I can't (I specified cam before hptiop/mps/hptmv).
> 
> Regarding the firmware_get symbol I changed the order of module  
> loading in the loader.conf, I've put the firmware_load to the front to  
> load it before a lot of other modules. Theoretically this should solve  
> the issue.

The issue there, at least with mps(4), is that the driver erroneously
lacks the line
MODULE_DEPEND(mps, cam, 1, 1, 1);
somewhere in mps_pci.c to record the dependency on cam(4).

I have not looked at the other drivers, but I suspect that the problem is
the same.
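
One rough way to check other drivers for the same omission is to scan their
sources for MODULE_DEPEND lines. The sketch below is only an illustration:
the function name module_depends and the sample source string are made up
for this example, and a regular expression is not a C parser, so it will
miss declarations generated by other macros.

```python
import re

# MODULE_DEPEND(module, dependency, min_version, preferred_version, max_version)
_DEPEND_RE = re.compile(
    r"MODULE_DEPEND\(\s*(\w+)\s*,\s*(\w+)\s*,"
    r"\s*(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*\)"
)


def module_depends(source_text):
    """Return {module: set of dependencies} declared in one source file."""
    deps = {}
    for mod, dep, _vmin, _vpref, _vmax in _DEPEND_RE.findall(source_text):
        deps.setdefault(mod, set()).add(dep)
    return deps


# Hypothetical driver source, not copied from any real file.
sample = """
DRIVER_MODULE(mps, pci, mps_pci_driver, mps_devclass, 0, 0);
MODULE_DEPEND(mps, pci, 1, 1, 1);
"""

deps = module_depends(sample)
if "cam" not in deps.get("mps", set()):
    print("mps does not record a dependency on cam")
```

A driver that calls xpt_* or cam_* functions but shows no cam entry in this
map is a candidate for the same fix.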




Re: FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-07 Thread Konstantin Belousov
On Wed, Mar 07, 2012 at 12:36:21AM +, Luke Marsden wrote:
> Thanks for your email, Chuck.
> 
> > > Conversely, if a page *does not* occur in the resident
> > > memory of any process, it must not occupy any space in the active +
> > > inactive lists.
> > 
> > Hmm...if a process gets swapped out entirely, the pages for it will be
> > moved to the cache list, flushed, and then reused as soon as the disk
> > I/O completes.  But there is a window where the process can be marked
> > as swapped out (and considered no longer resident), but still has some
> > of its pages in physical memory.
> 
> There's no swapping happening on these machines (intentionally so,
> because as soon as we hit swap everything goes tits up), so this window
> doesn't concern me.
> 
> I'm trying to confirm that, on a system with no pages swapped out, the
> following is a true statement:
> 
> a page is accounted for in active + inactive if and only if it
> corresponds to one or more of the pages accounted for in the
> resident memory lists of all the processes on the system (as per
> the output of 'top' and 'ps')
No.

The pages belonging to a vnode VM object can be active, inactive, or cached
without being mapped into any process address space.




Re: Text relocations in kernel modules

2012-03-30 Thread Konstantin Belousov
On Fri, Mar 30, 2012 at 01:38:14PM -0400, Richard Yao wrote:
> As a disclaimer, I would like to clarify that Gentoo/FreeBSD uses a
> FreeBSD userland and that Gentoo/FreeBSD has nothing to do with Debian
> GNU/kFreeBSD. People seem to think Gentoo/FreeBSD is related to Debian
> GNU/kFreeBSD, which has made collaboration difficult.
> 
> With that said, Gentoo Portage is warning about text relocations in
> kernel modules. This is in a Gentoo/FreeBSD port of
> emulators/freebsd-kmod that I wrote. For instance, I see:
> 
> # readelf -d /boot/modules/virtio.ko
> 
> Dynamic section at offset 0x2f6c contains 13 entries:
>   Tag        Type                         Name/Value
>  0x00000004 (HASH)                       0xd4
>  0x6ffffef5 (GNU_HASH)                   0x238
>  0x00000005 (STRTAB)                     0x4a8
>  0x00000006 (SYMTAB)                     0x298
>  0x0000000a (STRSZ)                      397 (bytes)
>  0x0000000b (SYMENT)                     16 (bytes)
>  0x00000011 (REL)                        0x638
>  0x00000012 (RELSZ)                      1568 (bytes)
>  0x00000013 (RELENT)                     8 (bytes)
>  0x00000016 (TEXTREL)                    0x0
>  0x0000001e (FLAGS)                      TEXTREL
>  0x6ffffffa (RELCOUNT)                   108
>  0x00000000 (NULL)                       0x0
> 
> Checking /boot/kernel, it seems that all modules have text relocations.
> My Gentoo/FreeBSD install is a 32-bit chroot on a ZFS Guru install of
> amd64 FreeBSD. amd64 FreeBSD does not appear to have any text relocations.
> 
> I don't have a reference i386 install, but according to frogs in
> ##freebsd on freenode, his i386 FreeBSD also has text relocations.
> 
> Is this a bug?
No. This is by design.

Why do you consider this a bug ?
> 
> Yours truly,
> Richard Yao
> 
> On 03/30/12 12:49, Richard Yao wrote:
> > Dear Ports Maintainers and kuriyama,
> > 
> > emulators/freebsd-kmod has a typo in pkg-descr, where it says "lodable"
> > instead of "loadable".
> > 
> > In addition, I have done the work necessary to port
> > emulators/freebsd-kmod to Gentoo/FreeBSD.
> > 
> > https://bugs.gentoo.org/show_bug.cgi?id=410199
> > 
> > The ebuild contains a few improvements on the original FreeBSD port
> > where we copy only the parts of SYSDIR that we need to build the module.
> > We also do hardlinks instead of copies when Gentoo Portage builds with
> > user privileges.
> > 
> > The NEEDSUBDIRS part of the ebuild was written by naota AT gentoo.org as
> > part of Gentoo's review process. I have permission from him to upstream
> > the improvements we made on the port. Feel free to adopt any
> > improvements in the attachments in that bug report.
> > 
> > Lastly, I have sent an email to gentoo-dev AT lists.gentoo.org and
> > gentoo-bsd AT lists.gentoo.org requesting that the FreeBSD specific
> > parts of the portage tree be relicensed under terms of the BSD-2
> > license. With a little luck, it will be possible to upstream
> > improvements made in Gentoo/FreeBSD without any hassle in the future.
> > 
> > Yours truly,
> > Richard Yao
> > 
> 
> 






Re: Text relocations in kernel modules

2012-03-30 Thread Konstantin Belousov
On Fri, Mar 30, 2012 at 04:11:29PM -0400, Richard Yao wrote:
> On 03/30/12 15:46, Konstantin Belousov wrote:
> > On Fri, Mar 30, 2012 at 03:42:22PM -0400, Richard Yao wrote:
> >> On 03/30/12 15:07, Konstantin Belousov wrote:
> >>>> Is this a bug?
> >>> No. This is by design.
> >>>
> >>> Why do you consider this a bug ?
> >>
> >> It occurs on i386, but not amd64. It could be that something is wrong
> >> with how things are being compiled on i386, or it could be that i386
> >> requires things to be compiled this way. I do not know which.
> >>
> > Again, let me repeat my question. Why do you consider the presence
> > of relocations against text section a problem ?
> 
> The linker emits warnings:
> i686-gentoo-freebsd9.0-ld: warning: creating a DT_TEXTREL in object.
> 
> Furthermore, this triggers a QA check in Gentoo/FreeBSD's package manager.
> 
>  * QA Notice: The following files contain runtime text relocations
>  *  Text relocations force the dynamic linker to perform extra
>  *  work at startup, waste system resources, and may pose a security
>  *  risk.  On some architectures, the code may not even function
>  *  properly, if at all.
>  *  For more information, see http://hardened.gentoo.org/pic-fix-guide.xml
>  *  Please include the following list of files in your report:
>  * TEXTREL boot/modules/if_vtnet.ko
>  * TEXTREL boot/modules/virtio_blk.ko
>  * TEXTREL boot/modules/virtio.ko
>  * TEXTREL boot/modules/virtio_balloon.ko
>  * TEXTREL boot/modules/virtio_pci.ko
> 
> I wrote that ebuild as part of something entirely unrelated. If it is a
> feature, I can disable the QA check, but I should at least know why the
> text relocations are needed.
> 
> Gentoo maintainers are expected to patch text relocations and send
> patches upstream. The only exception is in the case of binary packages,
> which they cannot patch.
> 
> Investigating the text relocations in my port of emulators/virtio-kmod
> revealed that all kernel modules on i386 Gentoo/FreeBSD have text
> relocations, yet none have them on amd64 FreeBSD, so I do not know
> whether this is a bug or a feature.
> 

First, there _are_ relocations against text in the amd64 modules, but I
suspect that your scripts do not detect this. Most likely, the scripts look
for the DT_TEXTREL dynamic tag, and dynamic tags are only present in
executables or shared objects, not in object files. The amd64 modules are
object files, so you are simply misinterpreting the situation.
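
To see which kind of file a module is, it is enough to read e_type from the
ELF header. The sketch below is an illustration under stated assumptions:
the helper names elf_type and may_carry_dt_textrel are invented here, and
the sample headers are built by hand rather than read from a real .ko file.
It relies only on the standard ELF layout (16 bytes of e_ident followed by
the 2-byte e_type).

```python
import struct

ET_NAMES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(header):
    """Return the e_type name given at least the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] is byte 5: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        raise ValueError("unknown ELF data encoding")
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return ET_NAMES.get(e_type, hex(e_type))


def may_carry_dt_textrel(header):
    # Dynamic tags live in the dynamic section, which normally exists only
    # in linked outputs (executables and shared objects), not in ET_REL files.
    return elf_type(header) in ("ET_EXEC", "ET_DYN")


# Hand-built headers: 64-bit relocatable object and 32-bit shared object.
rel_header = b"\x7fELF" + bytes([2, 1, 1, 9]) + bytes(8) + b"\x01\x00"
dyn_header = b"\x7fELF" + bytes([1, 1, 1, 9]) + bytes(8) + b"\x03\x00"

print(elf_type(rel_header), may_carry_dt_textrel(rel_header))
print(elf_type(dyn_header), may_carry_dt_textrel(dyn_header))
```

For a real module, pass the first 18 bytes of the file, e.g.
open(path, "rb").read(18). A check that only looks for DT_TEXTREL says
nothing about a file for which may_carry_dt_textrel returns False.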

Second, from what you wrote, I see the issue as either a wrong policy
being established in your project, or (another) misinterpretation of
the policy. Indeed, having text relocations in shared objects is
bad, because such relocations hinder sharing of text pages. A relocated
page is modified, so the COW mechanism makes it private to the process.

On the other hand, there is only one instance of a loaded kernel module, and
its text segment (or section, for amd64) is not shared, so modifications
to the text pages do not increase memory use. Moreover, not compiling
modules with -fPIC (the absence of -fPIC is what makes the text relocations
appear in the final link result) makes the code faster, especially on i386.

So there is nothing to report, and the fix is outside the FreeBSD domain:
either fix your policy so that it does not ban text relocations in kernel
modules, or recognize that the policy applies only to usermode
objects.




Re: Text relocations in kernel modules

2012-03-30 Thread Konstantin Belousov
On Fri, Mar 30, 2012 at 06:34:55PM -0400, Richard Yao wrote:
> On 03/30/12 16:36, Konstantin Belousov wrote:
> > First, there _are_ relocations against text in the amd64 modules, but I
> > suspect that your scripts do not detect this. Most likely, scripts look
> > for DT_TEXTREL dynamic tag, and tags are only present in the executables
> > or shared objects, not in the object files. The amd64 modules are object
> > files, so you just mis-interpret the situation.
> 
> readelf is a part of binutils. It is not a script. Here is the version
> that Gentoo/FreeBSD uses:
So you completely missed what I told you.

> 
> # readelf --version
> GNU readelf (GNU Binutils) 2.20.1.20100303
> Copyright 2009 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public License version 3 or (at your option) any later
> version.
> This program has absolutely no warranty.
> 
> In addition, this is what it says when I ask it to look at virtio_blk.ko:
> 
> # readelf -d /boot/modules/virtio_blk.ko
> 
> Dynamic section at offset 0x2f6c contains 13 entries:
>   Tag        Type                         Name/Value
>  0x00000004 (HASH)                       0xd4
>  0x6ffffef5 (GNU_HASH)                   0x480
>  0x00000005 (STRTAB)                     0x9d0
>  0x00000006 (SYMTAB)                     0x4e0
>  0x0000000a (STRSZ)                      1295 (bytes)
>  0x0000000b (SYMENT)                     16 (bytes)
>  0x00000011 (REL)                        0xee0
>  0x00000012 (RELSZ)                      1664 (bytes)
>  0x00000013 (RELENT)                     8 (bytes)
>  0x00000016 (TEXTREL)                    0x0
>  0x0000001e (FLAGS)                      TEXTREL
>  0x6ffffffa (RELCOUNT)                   87
>  0x00000000 (NULL)                       0x0
> 
> Running the same command on amd64 FreeBSD's version returns nothing. I
> have attached the result of `readelf -a ...` on both the i386 version
> and the amd64 version.
Reread what I wrote to you. Also, it pays to learn how ELF works
before drawing conclusions from the absence of output from readelf -d.
Amd64 modules _are not_ shared objects.

> 
> > Second, from what you wrote, I see the issue in either wrong policy
> > being established in your project, or (another) mis-interpretation of
> > the policy. Indeed, having text relocations in the shared objects is
> > bad, because said relocations hinder text pages sharing. Relocated page
> > is modified, so COW mechanism causes it to become private to process.
> 
> I believe that relocations also cause the linker to work harder when the
> modules themselves are loaded the first time. They can also cause bugs
> when code is ported to another architecture.
I can only answer that this is your fantasy, however ample.
I see that this conversation is going nowhere, so I will not reply further.

> 
> > On the other hand, there is only one instance of the loaded kernel module,
> > its text segment (or section, for amd64) is not shared, so modifications
> > to the text pages do not cause increased memory use. More, not compiling
> > modules with -fPIC (absence of -fPIC is what makes the text relocations to
> > appear in the final link result) makes the code faster, esp. on i386.
> 
> Compiling with -fPIC breaks the build.
> 
> > So, there is nothing to report, and fix is outside the FreeBSD domain:
> > either fix your policy by not stating that text relocation in kernel
> > module is banned, or just find that policy only applicable to usermode
> > objects.
> 
> Linux has no such text relocations in its modules. I have checked on
> both i386 and amd64. I have difficulty believing that FreeBSD needs text
> relocations when Linux does not.
> 
> I am fairly certain that this is going to interfere with ASLR in the
> kernel, which is a security issue. It is definitely something to report.




Re: kernel panic while detecting cpu in FreeBSD 9

2012-04-17 Thread Konstantin Belousov
On Tue, Apr 17, 2012 at 12:52:28PM -0400, Chad C wrote:
> 
> Hello,
> I posted to the FreeBSD forum and was told to seek help on the stable 
> mailing list.  I recently built a new system and attempted to install 
> FreeBSD 9 amd64 using the dvd.  Shortly after the boot loader menu while 
> the kernel attempts to detect the processor cores I receive a kernel 
> trap 12 error.  Kernel trap messages begin scrolling non stop but I was 
> able to get a picture when they paused and here is the text:
> 
> 
> kernel trap 12 with interrupts disabled
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x18
> fault code= supervisor read data, page not present
> instruction pointer= 0x20:0x80823368
> stack pointer= 0x28:0x811a5030
> frame pointer= 0x28:0x811a5070
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= resume, IOPL = 0
> current process= 0 ()
> trap number= 12
> panic: page fault
> cpuid = 0
> 
> 
> The first suggestion from a forum poster was bad memory but I swapped 
> out the memory and still received the panics.  Also tested the memory 
> with memtest86+ and the bios memory test feature.  Both reported no 
> errors.  I finally was able to get it to boot and install by breaking to 
> the loader prompt and typing "kern.smp.disabled=1".  But the installed 
> system also panics at the same point during boot and the only way to get 
> it to boot is by disabling smp at the loader prompt.
> 
> I recompiled the kernel with "options DDB" and tried tracing the 
> problem, but since it happens so soon into the boot process I cannot get 
> a crash dump.  Here is some of my system hardware if that helps as well:
> MSI P67A-GD65 (B3) mainboard
> intel core i5-2500K (quad core)
> G.SKILL Ripjaws X Series 8GB (2 x 4GB) DDR3 1600
ddb is not for getting a crash dump; it is for getting a backtrace.
A system with ddb present should print out the stack trace of the
panic location. Please post this information verbatim.

> 
> My original FreeBSD forum post is at: 
> http://forums.freebsd.org/showthread.php?t=31156 if that helps as well.  
> The last forum poster suggested the problem might be in 
> sys/amd64/amd64/mp_machdep.c and redirected me here.
> 
> Thanks for any assistance,
> 
> -Chad
> 
> 
> 
> 
> 
> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"




Re: kernel panic while detecting cpu in FreeBSD 9

2012-04-18 Thread Konstantin Belousov
On Wed, Apr 18, 2012 at 12:24:17AM -0400, Chad C wrote:
> Updated the mainboard UEFI to latest version but am still getting the 
> same kernel page fault.
> 
> I compiled "options DDB" and "options GDB" into the generic kernel.  
> After rebooting and entering DDB I get the following from various commands:
> 
> On bootup I am now getting the following:
> 
> real memory = 8589934592 (8192 MB)
> avail memory = 8198332416 (7818 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: 
> panic: AP #2 (PHY# 4) failed!
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at KDB_backtrace+0x37
> panic() at panic+0x187
> cpu_mp_start() at cpu_mp_start+0x589
> mp_start() at mp_start+0x85
> mi_startup() at mi_startup+0x77
> btext() at btext+0x2c
> KDB: enter: panic
> [ thread pid 0 tid 0 ]
> Stopped at      kdb_enter+0x3b: movq    $0,0x905dc2(%rip)
> db>

So your other core failed to start. You might try your luck by posting the
exact model and BIOS version of the machine and mainboard.

But indeed, this is most often caused by BIOS bugs, sometimes in strange
areas like USB, e.g. the SMI handler for emulating a legacy PS/2 keyboard.
As a shot in the dark, try fiddling with the legacy USB emulation setting
in the BIOS.




Re: Unable to build RELENG_9 (r234602)

2012-04-24 Thread Konstantin Belousov
On Wed, Apr 25, 2012 at 08:26:28AM +0200, Dimitry Andric wrote:
> On 2012-04-25 04:32, Alie Tan wrote:
> > I got this compilation error for 9-STABLE
> > 
> > -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality
> > -Wno-unused-function -Wno-conversion -Wno-switch-enum -Wno-empty-body -c
> > /usr/src/usr.sbin/sysinstall/dispatch.c
> > /usr/src/usr.sbin/sysinstall/dispatch.c:594:17: error: format string is not
> > a string literal (potentially insecure) [-Werror,-Wformat-security]
> > msgConfirm(err);
> > 
> 
> 9-STABLE doesn't compile without -Werror warnings with clang yet.  This
> sysinstall warning is specifically one that must still be fixed, but
> since sysinstall was removed from HEAD, it cannot be MFC'd.
Why not commit the fix directly into the stable branch?

> 
> For now, just put NO_WERROR= in your src.conf.




Re: Restricting users from certain privileges

2012-04-28 Thread Konstantin Belousov
On Sat, Apr 28, 2012 at 11:29:58AM +0200, Dimitry Andric wrote:
> On 2012-04-28 09:50, Zenny wrote:
> > On Sat, Apr 28, 2012 at 9:38 AM, Daniel Braniss  wrote:
> ...
> >> try sudo from ports, security/sudo
> > Thanks Daniel, but sudo gives all (not selective) root privileges to the
> > user (admin in my case).
> 
> This isn't true.  With sudo, you can give specific users, or groups of
> users, restricted lists of commands they can run, and even specify on
> which particular machines they can be run.
Sure, but if the allowed commands were not specifically designed to
be run with elevated privileges, you typically give the user the ability
to run any command with elevated privileges.

Even specially designed commands sometimes give away much more power
than intended.




Re: i386 -march=xxx behavior [Was: FreeBSD 8 i386 gptboot corrupt - SOLVED]

2012-05-11 Thread Konstantin Belousov
On Fri, May 11, 2012 at 09:37:10AM +0300, Andriy Gapon wrote:
> on 09/05/2012 15:09 Alfred Bartsch said the following:
> > Am 09.05.2012 12:42, schrieb Andriy Gapon:
> >> on 09/05/2012 12:29 Alfred Bartsch said the following:
> >>> This behavior is restricted to 32-bit servers (i386), all 64-bit 
> >>> servers (amd64) work without any problem, as expected.
> >>> 
> >>> After some analyzing, it seems to me that the actual size of gptboot
> >>> does matter (16723 bytes, >16kB). In amd64 environment (same source
> >>> version) the actual size of /boot/gptboot is only 15443 bytes.
> > 
> >> Weird.  Both amd64 and i386 builds should produce the same binaries as
> >> the boot code is built with -m32 -march=i386 on amd64. But I can 
> >> reproduce this, so it seems that the compilation is indeed done 
> >> differently.
> > 
> >> Heh, it seems that it is -march=i386 flag that makes all the difference.
> >> Maybe we should use this flag even when doing native i386 builds...
> > 
> > 
> > after adding "-march=i386" to CFLAGS in Makefile everything looks ok 
> > (filesize: 15443, as you predicted), so I would opt for using this flag in
> > the future.
> 
> Here is a small investigation into the -march flag.  Not sure if it is of any
> practical significance, I just was curious.
> 
> First, seems that neither of i386/i486/i586/i686 values for this flag nor
> absence of it implies features like MMX, SSE, and so on.  (Saying this because
> of some assumptions about i686)
> 
> For the base GCC specifying -march with the above values is equivalent to
> specifying -mtune with the same values, when mtune is not explicitly set.
> Using "i686" or omitting the flag is equivalent to -mtune=generic.
> 
> Note that this happens despite a FreeBSD-specific change to (base) GCC that
> makes i486 a default arch.  Derivation of the tune value from the arch value,
> if any, or defaulting it otherwise is done earlier than defaulting of the arch
> value.
> Specifically I am talking about the block that deals with ix86_tune_string
> that precedes the block for ix86_arch_string.
> 
> So it seems that at the moment our sys/boot code is effectively compiled with
> -mtune=generic for i386 target (amd64 target has an explicit -march=i386 - I
> wonder why not i486).
> 
> I think that in terms of instructions repertoire the difference is only in
> availability of cmpxchg, cmpxchg8b, and xadd instructions (ignoring the
> "system" instructions that should not be generated by a compiler from C code).
>  And I guess that the sys/boot code is simple enough to not require these
> instructions?
> Otherwise, mtune seems to affect layout of the generated code and preference
> for some instructions over others.
> 
> Again, not sure what conclusions can be made...

-march=i686 also turns on use of cmov*.




Re: i386 binaries on amd64: ldconfig problems

2012-05-31 Thread Konstantin Belousov
On Wed, May 30, 2012 at 06:15:59PM +0200, Oliver Fromme wrote:
> Hi,
> 
> I've recently migrated my workstation from i386 to amd64
> (finally, because I need to go beyond 4 GB RAM).  The
> transition went smoothly so far, except for one thing:
> I need to use several old i386 binaries, which all work
> well except for one:  olvwm.
> 
> $ uname -rsm
> FreeBSD 8.3-STABLE-20120528 amd64
> 
> $ olvwm
> /libexec/ld-elf.so.1: /usr/local/lib/libXpm.so.4: unsupported file layout
> 
> $ file olvwm
> olvwm: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD),
> dynamically linked (uses shared libs), for FreeBSD 8.2 (802502), stripped
> 
> $ ldd olvwm
> olvwm:
> libXpm.so.4 => not found (0x0)
> libolgx.so.3 => /usr/local/lib32/compat/libolgx.so.3 (0x280d1000)
> libXext.so.6 => not found (0x0)
> libX11.so.6 => not found (0x0)
> libm.so.5 => /usr/lib32/libm.so.5 (0x280df000)
> libc.so.7 => /usr/lib32/libc.so.7 (0x280f9000)
> 
> $ ldconfig -32 -r | head -2
> /var/run/ld-elf32.so.hints:
> search directories: /usr/lib32:/usr/local/lib32:/usr/local/lib32/compat
> 
> $ ldconfig -32 -r | egrep 'libXpm|libXext|libX11'
> 190:-lXpm.4 => /usr/local/lib32/compat/libXpm.so.4
> 192:-lXext.6 => /usr/local/lib32/compat/libXext.so.6
> 193:-lX11.6 => /usr/local/lib32/compat/libX11.so.6
> 
> So, the 32bit libraries are there, ldconfig knows about them,
> but the runtime linker does not, apparently.
> 
> Interestingly, it works when I force the library path (this
> is currently the work-around that I'm using):
> 
> LD_32_LIBRARY_PATH=/usr/local/lib32/compat ldd olvwm
> olvwm:
> libXpm.so.4 => /usr/local/lib32/compat/libXpm.so.4 (0x280d1000)
> libolgx.so.3 => /usr/local/lib32/compat/libolgx.so.3 (0x280e1000)
> libXext.so.6 => /usr/local/lib32/compat/libXext.so.6 (0x280ef000)
> libX11.so.6 => /usr/local/lib32/compat/libX11.so.6 (0x280fe000)
> libm.so.5 => /usr/lib32/libm.so.5 (0x28216000)
> libc.so.7 => /usr/lib32/libc.so.7 (0x2823)
> libxcb.so.2 => /usr/local/lib32/compat/libxcb.so.2 (0x2834b000)
> libXau.so.6 => /usr/local/lib32/compat/libXau.so.6 (0x28362000)
> libXdmcp.so.6 => /usr/local/lib32/compat/libXdmcp.so.6 (0x28365000)
> libpthread-stubs.so.0 => /usr/local/lib32/compat/libpthread-stubs.so.0 (0x2836a000)
> librpcsvc.so.5 => /usr/lib32/librpcsvc.so.5 (0x2836c000)
> 
> But actually I shouldn't have to use LD_32_LIBRARY_PATH.
> I mean, it's ldconfig's job to configure the directories for
> locating the libraries.
> 
> What is wrong here?
The library search order is LD_{32}_LIBRARY_PATH, then DT_RPATH from
the binary, then hints, then /lib:/usr/lib. So if rpath of the binary
contains /usr/local/lib, you get /usr/local/lib before hints.

Rtld uses only the search path from the hints file. When a library with
a matching name is found, rtld tries to load it. Regardless of the result
of the load attempt, further components of the search path list are not
tried.

Look at the olvwm binary with readelf and see whether DT_RPATH specifies
/usr/local/lib.
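
The search order described above can be modelled with a small Python
sketch. This is only a toy model, not rtld code: the function name, the
FILES table and the assumption that the binary carries an rpath of
/usr/local/lib are all hypothetical, chosen to match the symptoms in the
report.

```python
# Toy model of the library search described above; not rtld itself.
DEFAULT_DIRS = ["/lib", "/usr/lib"]


def resolve(name, ld_library_path, rpath, hints, files):
    """Return (path, loaded) for the first directory holding `name`.

    Order: LD_{32}_LIBRARY_PATH, DT_RPATH, hints, then /lib:/usr/lib.
    The search stops at the first name match, even if loading fails.
    `files` maps a path to True when the file is loadable (32-bit).
    """
    for directory in [*ld_library_path, *rpath, *hints, *DEFAULT_DIRS]:
        path = f"{directory}/{name}"
        if path in files:
            return path, files[path]
    return None, False


# Hypothetical file layout: a 64-bit copy shadows the 32-bit one.
FILES = {
    "/usr/local/lib/libXpm.so.4": False,          # "unsupported file layout"
    "/usr/local/lib32/compat/libXpm.so.4": True,  # the 32-bit library
}
HINTS = ["/usr/lib32", "/usr/local/lib32", "/usr/local/lib32/compat"]
RPATH = ["/usr/local/lib"]  # assumption: what DT_RPATH might contain

print(resolve("libXpm.so.4", [], RPATH, HINTS, FILES))
print(resolve("libXpm.so.4", ["/usr/local/lib32/compat"], RPATH, HINTS, FILES))
```

With the assumed rpath, the 64-bit file is matched first and the load
fails; setting LD_32_LIBRARY_PATH moves the 32-bit directory to the front,
which is consistent with the reported work-around.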




Re: i386 binaries on amd64: ldconfig problems

2012-05-31 Thread Konstantin Belousov
On Thu, May 31, 2012 at 05:28:42PM +0700, Eugene Grosbein wrote:
> 31.05.2012 16:58, Konstantin Belousov writes:
> 
> >> But actually I shouldn't have to use LD_32_LIBRARY_PATH.
> >> I mean, it's ldconfig's job to configure the directories for
> >> locating the libraries.
> >>
> >> What is wrong here?
> > The library search order is LD_{32}_LIBRARY_PATH, then DT_RPATH from
> > the binary, then hints, then /lib:/usr/lib. So if rpath of the binary
> > contains /usr/local/lib, you get /usr/local/lib before hints.
> > 
> > Rtld uses only the search path from the hints file. When a library with
> > the matched name found, rtld tries to load it. Regardless of the result
> > of the load attempt, further components of the search path list are not
> > tried.
> > 
> > Look at the olvwm binary with readelf and see whether DT_RPATH specifies
> > /usr/local/lib.
> 
> I've faced exactly the same problem. What can be done, other than
> rebuilding all such 32-bit binaries, to make them work for the transition period?
> Should libmap32.conf help? It seems it does not.
No idea.

The presence of rpath in the binary indicates self-inflicted damage.
Just do not specify -rpath for linking. If you have such a broken binary,
use LD_LIBRARY_PATH to override.

In fact, the ELF standard requires that DT_RPATH is not overridable by
the LD_LIBRARY_PATH environment variable, but DT_RUNPATH is. Currently our
rtld interprets both DT_RPATH and DT_RUNPATH as overridable, thus violating
the standard and diverging from other ELF platforms.

Dragonfly fixed this.
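
The difference between the two behaviours can be sketched in Python as
follows. This is a simplified model under stated assumptions: it ignores
the hints file and the default directories, the function name is
hypothetical, and the relative order of DT_RPATH and DT_RUNPATH in the
non-standard branch is a guess made only for illustration.

```python
def search_order(ld_library_path, rpath, runpath, standard=True):
    """Return the directory search order for the path sources given.

    standard=True models the ELF rule: DT_RPATH comes before
    LD_LIBRARY_PATH, but is ignored when DT_RUNPATH is present, and
    DT_RUNPATH comes after LD_LIBRARY_PATH.
    standard=False models the behaviour described above, where the
    environment variable overrides both tags.
    """
    if standard:
        if runpath:
            return [*ld_library_path, *runpath]
        return [*rpath, *ld_library_path]
    return [*ld_library_path, *rpath, *runpath]


env = ["/usr/local/lib32/compat"]
tag = ["/usr/local/lib"]

print(search_order(env, rpath=tag, runpath=[], standard=True))
print(search_order(env, rpath=[], runpath=tag, standard=True))
print(search_order(env, rpath=tag, runpath=[], standard=False))
```

Under the standard rule a DT_RPATH entry wins over the environment, so
the LD_LIBRARY_PATH override suggested above only works because of the
non-standard behaviour.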




Re: [releng_9 tinderbox] failure on powerpc64/powerpc

2012-06-15 Thread Konstantin Belousov
On Fri, Jun 15, 2012 at 10:08:55PM +, FreeBSD Tinderbox wrote:
> mmu_oea64.o:(.got+0x90): undefined reference to `elf32_nxstack'
> *** Error code 1
> 
> Stop in /obj/powerpc.powerpc64/src/sys/LINT.
> *** Error code 1

Should be fixed in r237150, sorry for the breakage.




Re: acpidump -dt broken in 9 stable

2012-06-16 Thread Konstantin Belousov
On Fri, Jun 15, 2012 at 09:57:45PM -0700, mnln.l4 wrote:
> Just upgrade from 9.0 to 9 stable.
> 
> `acpidump -dt` shows error message "realpath tmp file: No such file or
> directory"
> 
> It is related to the recent change made to realpath(3)

This was a bug/specific operation in acpidump relying on non-conforming
realpath(3) behaviour. r235948 should be merged.




Re: acpidump -dt broken in 9 stable

2012-06-18 Thread Konstantin Belousov
On Mon, Jun 18, 2012 at 01:32:53PM -0400, Jung-uk Kim wrote:
> On 2012-06-16 08:34:45 -0400, Konstantin Belousov wrote:
> > On Fri, Jun 15, 2012 at 09:57:45PM -0700, mnln.l4 wrote:
> >> Just upgrade from 9.0 to 9 stable.
> >> 
> >> `acpidump -dt` shows error message "realpath tmp file: No such
> >> file or directory"
> >> 
> >> It is related to the recent change made to realpath(3)
> > 
> > This was a bug/specific operation in acpidump relying on
> > non-conforming realpath(3) behaviour. The r235948 should be
> > merged.
> 
> Committed as r237232.  Thanks for letting me know.

Thank you for handling this.




Re: KMS on Sandy bridge error device_attach

2012-06-22 Thread Konstantin Belousov
On Fri, Jun 22, 2012 at 07:44:52PM +0200, Thomas Zander wrote:
> Hi,
> 
> I just updated my world to try kms which has recently been merged into
> stable. However I get this when kldload'ing i915kms:
> 
> drmn0:  on vgapci0
> info: [drm] MSI enabled 1 message(s)
> error: [drm:pid1295:drm_load] *ERROR* Card isn't AGP, or couldn't
> initialize AGP.
> device_attach: drmn0 attach returned 12
> 
> CPU is this model:
> 
> CPU: Intel(R) Xeon(R) CPU E31260L @ 2.40GHz (2400.07-MHz K8-class CPU)
>   Origin = "GenuineIntel"  Id = 0x206a7  Family = 6  Model = 2a  Stepping = 7
> 
> Is this model supposed to work with the current code?

Show pciconf -lv output. Also show the dmesg from verbose boot.




Re: KMS on Sandy bridge error device_attach

2012-06-22 Thread Konstantin Belousov
On Fri, Jun 22, 2012 at 08:28:29PM +0200, Thomas Zander wrote:
> Hello Konstantin,
Do not strip lists from Cc:, I am not your tech support.

> 
> On Fri, Jun 22, 2012 at 7:51 PM, Konstantin Belousov
>  wrote:
> >> CPU: Intel(R) Xeon(R) CPU E31260L @ 2.40GHz (2400.07-MHz K8-class CPU)
> >>   Origin = "GenuineIntel"  Id = 0x206a7  Family = 6  Model = 2a  Stepping = 7
> >>
> >> Is this model supposed to work with the current code?
> >
> > Show pciconf -lv output. Also show the dmesg from verbose boot.
> 
> Thank you for your quick response. Output is attached.
> This is probably one of the relatively rarely sold CPUs, so I might
> have an edge case here...
Yes, indeed, this is something Intel calls 'SandyBridge server Integrated
Graphics'. The device id is known to the agp driver, but it probably failed
to attach due to some (mis)interpretation of the state.

Your dmesg is not complete: the hda output displaced the previous
messages, which contained the agp attach diagnostic. Take agp or hda
out of the kernel and rerun the test. Or increase MSGBUF_SIZE; see
conf/NOTES for a description.




Re: KMS on Sandy bridge error device_attach

2012-06-23 Thread Konstantin Belousov
On Sat, Jun 23, 2012 at 09:25:40AM +0200, Thomas Zander wrote:
> On Fri, Jun 22, 2012 at 8:38 PM, Konstantin Belousov
>  wrote:
> >> Thank you for your quick response. Output is attached.
> >> This is probably one of the relatively rarely sold CPUs, so I might
> >> have an edge case here...
> > Yes, indeed, this is something Intel calls 'SandyBridge server Integrated
> > Graphics'. The device id is known to agp driver, but probably it failed
> > to attach due to some (mis)interpretation of the state.
> >
> > Your dmesg is not complete, the hda output displaced the previous
> > messages which contained the agp attach diagnostic. Take agp or hda
> > out from kernel, and rerun the test. Or, increase MSGBUF_SIZE, see
> > conf/NOTES for description.
> 
> Okay, thanks again. As you suggested, I have removed a few kernel
> modules from the boot process and it seems that the verbose dmesg
> output now fits into the buffer, see attached dmesg output.
> However I don't see more GPU attach diagnostic. Do I miss something?
Ok, but you did not try to load i915kms; at least the dmesg you posted
lacks any indication of it.

Let me repeat: I need to see the lines related to the agp probe and
attachment, I believe that it will show us the next direction to
investigate.




Re: KMS on Sandy bridge error device_attach

2012-06-23 Thread Konstantin Belousov
On Sat, Jun 23, 2012 at 01:38:35PM +0200, Thomas Zander wrote:
> On Sat, Jun 23, 2012 at 10:59 AM, Konstantin Belousov
>  wrote:
> > Ok, but you did not tried to load i915kms, at least the dmesg you posted
> > lacks an indication.
> 
> Actually, it did. i915kms was in loader.conf.
> 
> dmesg, line 37f:
> Preloaded elf obj module "/boot/kernel/i915kms.ko" at 0x8113a820.
> Preloaded elf obj module "/boot/kernel/drm2.ko" at 0x8113ae88.
> ...
> dmesg, lines 482-489:
> vgapci0:  port 0xf000-0xf03f mem
> 0xfb40-0xfb7f,0xd000-0xdfff irq 16 at device 2.0 on
> pci0
> drmn0:  on vgapci0
> vgapci0: attempting to allocate 1 MSI vectors (1 supported)
> msi: routing MSI IRQ 264 to local APIC 0 vector 59
> vgapci0: using IRQ 264 for MSI
> info: [drm] MSI enabled 1 message(s)
> error: [drm:pid0:drm_load] *ERROR* Card isn't AGP, or couldn't initialize AGP.
> device_attach: drmn0 attach returned 12
Do you have agp.ko loaded from loader.conf? If not, what happens
if you add it there?

The i915kms module does have a dependency on agp, so agp should have been
auto-loaded if not loaded explicitly. And agp should at least be
probed.

Loading i915kms from loader.conf is something I do not encourage
right now, since you lose the VGA console somewhere during the kernel
startup.
> 
> > Let me repeat: I need to see the lines related to the agp probe and
> > attachment, I believe that it will show us the next direction to
> > investigate.
> 
> I did understand what you were after, but sorry, there is nothing more
> in the dmesg output. Do I need to perform additional steps besides
> verbose boot to obtain this data?

Hmm, I probably see an issue. Please try the patch below.

diff --git a/sys/dev/agp/agp_i810.c b/sys/dev/agp/agp_i810.c
index a181ad7..c0f592c 100644
--- a/sys/dev/agp/agp_i810.c
+++ b/sys/dev/agp/agp_i810.c
@@ -700,7 +700,7 @@ static const struct agp_i810_match {
 		.driver = &agp_i810_sb_driver
 	},
 	{
-		.devid = 0x01088086,
+		.devid = 0x010a8086,
 		.name = "SandyBridge server IG",
 		.driver = &agp_i810_sb_driver
 	},
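
For readers unfamiliar with the .devid encoding in the patch above, here
is a small Python sketch that splits such a value. It assumes the usual
FreeBSD packing, with the PCI device ID in the upper 16 bits and the
vendor ID in the lower 16 bits; the helper name is hypothetical.

```python
def split_devid(devid):
    """Split a packed 32-bit PCI id into (vendor, device).

    Assumes the device ID sits in bits 31:16 and the vendor ID in
    bits 15:0, as in the match table touched by the patch.
    """
    vendor = devid & 0xFFFF
    device = (devid >> 16) & 0xFFFF
    return vendor, device


for devid in (0x01088086, 0x010A8086):
    vendor, device = split_devid(devid)
    print(f"devid {devid:#010x}: vendor {vendor:#06x}, device {device:#06x}")
```

Both values carry the Intel vendor ID 0x8086; the patch only changes the
device ID the driver matches, from 0x0108 to 0x010a. The packed value can
be compared against the chip= field that pciconf -l prints.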




Re: Page fault in _mca_init during startup

2021-02-04 Thread Konstantin Belousov
On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> On Thu, Feb 4, 2021 at 1:31 PM Alan Somers  wrote:
> >
> > After upgrading a machine to FreeBSD, 12.2, it hit the following panic on
> > its first reboot.  I suspect that a few other servers have hit this too,
> > but since it happens before swap is mounted there are no core dumps, and
> > they usually reboot immediately.  The code in question hasn't changed since
> > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody have
> > any suggestions for how I could debug further?  I can't readily reproduce
> > it, and I can't dump core, but I'd like to investigate it any way I can.
> > The server in question has dual Xeon Gold 6142 CPUs.
> >
> 
> I can't actually help :( but I can add a +1  with similar hardware or
> equivalent specs. It's not frequent, but it's often enough to be
> annoying.
> -M
> 
> > if (!(ctl & MC_CTL2_CMCI_EN))
> > /* This bank does not support CMCI. */
> > return;
> >
> > cc = &cmc_state[PCPU_GET(cpuid)][i];// <- panic here
> >
> > /* Determine maximum threshold. */
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 26; apic id = 34
> > fault virtual address = 0xd0
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0x8125a009
> > stack pointer= 0x28:0xfeb65f20
> > frame pointer= 0x28:0xfeb65f50
> > code segment = base 0x0, limit 0xf, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = resume, IOPL = 0
> > current process = 11 (idle: cpu26)
> > trap number = 12
> > panic: page fault
> > cpuid = 26
> > time = 1
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > 0xfeb65be0
> > vpanic() at vpanic+0x17b/frame 0xfeb65c30
> > panic() at panic+0x43/frame 0xfeb65c90
> > trap_fatal() at trap_fatal+0x391/frame 0xfeb65cf0
> > trap_pfault() at trap_pfault+0x4f/frame 0xfeb65d40
> > trap() at trap+0x286/frame 0xfeb65e50
> > calltrap() at calltrap+0x8/frame 0xfeb65e50
> > --- trap 0xc, rip = 0x8125a009, rsp = 0xfeb65f20, rbp =
> > 0xfeb65f50 ---
> > _mca_init() at _mca_init+0x5d9/frame 0xfeb65f50
> > init_secondary_tail() at init_secondary_tail+0xfd/frame 0xfeb65f80
> > init_secondary() at init_secondary+0x2d1/frame 0xfeb65ff0
> > KDB: enter: panic
> > [ thread pid 11 tid 100029 ]
> > Stopped at  kdb_enter+0x37: movq$0,0x12bc1f6(%rip)

Try this.

I think that there are no other dependencies in the startup order, but
I cannot know it for sure.

commit 19584e3d3e9606d591fa30999b370ed758960e8c
Author: Konstantin Belousov 
Date:   Fri Feb 5 00:56:09 2021 +0200

x86: init mca before APs are started

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d455..e2bf2673cf69 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
 
 	mca_init();
 }
-SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
+SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
 
 /* Called when a machine check exception fires. */
 void
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Page fault in _mca_init during startup

2021-02-04 Thread Konstantin Belousov
On Thu, Feb 04, 2021 at 04:05:42PM -0700, Alan Somers wrote:
> On Thu, Feb 4, 2021 at 3:58 PM Konstantin Belousov 
> wrote:
> 
> > On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> > > On Thu, Feb 4, 2021 at 1:31 PM Alan Somers  wrote:
> > > >
> > > > After upgrading a machine to FreeBSD, 12.2, it hit the following panic
> > on
> > > > its first reboot.  I suspect that a few other servers have hit this
> > too,
> > > > but since it happens before swap is mounted there are no core dumps,
> > and
> > > > they usually reboot immediately.  The code in question hasn't changed
> > since
> > > > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody
> > have
> > > > any suggestions for how I could debug further?  I can't readily
> > reproduce
> > > > it, and I can't dump core, but I'd like to investigate it any way I
> > can.
> > > > The server in question has dual Xeon Gold 6142 CPUs.
> > > >
> > >
> > > I can't actually help :( but I can add a +1  with similar hardware or
> > > equivalent specs. It's not frequent, but it's often enough to be
> > > annoying.
> > > -M
> > >
> > > > if (!(ctl & MC_CTL2_CMCI_EN))
> > > > /* This bank does not support CMCI. */
> > > > return;
> > > >
> > > > cc = &cmc_state[PCPU_GET(cpuid)][i];// <- panic here
> > > >
> > > > /* Determine maximum threshold. */
> > > >
> > > >
> > > > Fatal trap 12: page fault while in kernel mode
> > > > cpuid = 26; apic id = 34
> > > > fault virtual address = 0xd0
> > > > fault code = supervisor read data, page not present
> > > > instruction pointer = 0x20:0x8125a009
> > > > stack pointer= 0x28:0xfeb65f20
> > > > frame pointer= 0x28:0xfeb65f50
> > > > code segment = base 0x0, limit 0xf, type 0x1b
> > > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > processor eflags = resume, IOPL = 0
> > > > current process = 11 (idle: cpu26)
> > > > trap number = 12
> > > > panic: page fault
> > > > cpuid = 26
> > > > time = 1
> > > > KDB: stack backtrace:
> > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > > > 0xfeb65be0
> > > > vpanic() at vpanic+0x17b/frame 0xfeb65c30
> > > > panic() at panic+0x43/frame 0xfeb65c90
> > > > trap_fatal() at trap_fatal+0x391/frame 0xfeb65cf0
> > > > trap_pfault() at trap_pfault+0x4f/frame 0xfeb65d40
> > > > trap() at trap+0x286/frame 0xfeb65e50
> > > > calltrap() at calltrap+0x8/frame 0xfeb65e50
> > > > --- trap 0xc, rip = 0x8125a009, rsp = 0xfeb65f20, rbp =
> > > > 0xfeb65f50 ---
> > > > _mca_init() at _mca_init+0x5d9/frame 0xfeb65f50
> > > > init_secondary_tail() at init_secondary_tail+0xfd/frame
> > 0xfeb65f80
> > > > init_secondary() at init_secondary+0x2d1/frame 0xfeb65ff0
> > > > KDB: enter: panic
> > > > [ thread pid 11 tid 100029 ]
> > > > Stopped at  kdb_enter+0x37: movq$0,0x12bc1f6(%rip)
> >
> > Try this.
> >
> > I think that there is no other dependencies in the startup order, but
> > cannot know it for sure.
> >
> > commit 19584e3d3e9606d591fa30999b370ed758960e8c
> > Author: Konstantin Belousov 
> > Date:   Fri Feb 5 00:56:09 2021 +0200
> >
> > x86: init mca before APs are started
> >
> > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > index 03100e77d455..e2bf2673cf69 100644
> > --- a/sys/x86/x86/mca.c
> > +++ b/sys/x86/x86/mca.c
> > @@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
> >
> > mca_init();
> >  }
> > -SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
> > +SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
> >
> >  /* Called when a machine check exception fires. */
> >  void
> >
> 
> I can test this patch on development servers, but so far I've only seen the
> crash on production servers.  Do you have any suggestions for how to force
> the crash, or how to test this patch besides simply making sure that my dev
> servers can boot?

The race, as I see it, is that we call mca_init() on the BSP too late, so
the malloc() that provides the storage for the cmc_state array could be
called too late, after one of the APs has already been IPIed for startup.

The patch ensures that the mca_init_bsp() SYSINIT has finished before we
start the APs.

I do not think there is any reliable way to trigger the panic while keeping
the patch usable, except to observe enough successful boots.
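
The effect of changing the SYSINIT order argument can be illustrated with
a short Python sketch. SYSINIT entries run sorted by subsystem first and
by order within a subsystem second. The numeric constants below are
quoted from memory of sys/kernel.h and should be treated as illustrative,
and every entry name other than mca_init_bsp is a hypothetical
placeholder.

```python
# Illustrative constants; check sys/kernel.h for the real values.
SI_SUB_CPU = 0x2100000
SI_ORDER_FIRST = 0x0
SI_ORDER_SECOND = 0x1
SI_ORDER_MIDDLE = 0x1000000
SI_ORDER_ANY = 0xFFFFFFF


def boot_order(sysinits):
    """Return entry names sorted by (subsystem, order), like the boot loop."""
    ordered = sorted(sysinits, key=lambda entry: (entry[1], entry[2]))
    return [name for name, _subsystem, _order in ordered]


def entries(mca_order):
    """Hypothetical set of SI_SUB_CPU entries with a chosen mca order."""
    return [
        ("mca_init_bsp", SI_SUB_CPU, mca_order),
        ("placeholder_first", SI_SUB_CPU, SI_ORDER_FIRST),
        ("placeholder_middle", SI_SUB_CPU, SI_ORDER_MIDDLE),
    ]


print(boot_order(entries(SI_ORDER_ANY)))     # before the patch
print(boot_order(entries(SI_ORDER_SECOND)))  # after the patch
```

With SI_ORDER_ANY the mca entry runs last within SI_SUB_CPU; with
SI_ORDER_SECOND it runs directly after the SI_ORDER_FIRST entries.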


Re: Page fault in _mca_init during startup

2021-02-04 Thread Konstantin Belousov
On Thu, Feb 04, 2021 at 05:19:43PM -0700, Alan Somers wrote:
> On Thu, Feb 4, 2021 at 4:27 PM Mark Johnston  wrote:
> 
> > On Fri, Feb 05, 2021 at 12:58:34AM +0200, Konstantin Belousov wrote:
> > > On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> > > > On Thu, Feb 4, 2021 at 1:31 PM Alan Somers 
> > wrote:
> > > > >
> > > > > After upgrading a machine to FreeBSD, 12.2, it hit the following
> > panic on
> > > > > its first reboot.  I suspect that a few other servers have hit this
> > too,
> > > > > but since it happens before swap is mounted there are no core dumps,
> > and
> > > > > they usually reboot immediately.  The code in question hasn't
> > changed since
> > > > > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody
> > have
> > > > > any suggestions for how I could debug further?  I can't readily
> > reproduce
> > > > > it, and I can't dump core, but I'd like to investigate it any way I
> > can.
> > > > > The server in question has dual Xeon Gold 6142 CPUs.
> > > > >
> > > Try this.
> > >
> > > I think that there is no other dependencies in the startup order, but
> > > cannot know it for sure.
> > >
> > > commit 19584e3d3e9606d591fa30999b370ed758960e8c
> > > Author: Konstantin Belousov 
> > > Date:   Fri Feb 5 00:56:09 2021 +0200
> > >
> > > x86: init mca before APs are started
> >
> > APs only call mca_init() after they have been released by the BSP
> > though, and that happens later in SI_SUB_SMP.
> >
> > > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > > index 03100e77d455..e2bf2673cf69 100644
> > > --- a/sys/x86/x86/mca.c
> > > +++ b/sys/x86/x86/mca.c
> > > @@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
> > >
> > >   mca_init();
> > >  }
> > > -SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
> > > +SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
> > >
> > >  /* Called when a machine check exception fires. */
> > >  void
> >
> 
> kib's patch causes a different problem, and this one is reproducible:
> 
>  Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x18
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0x8125762c
> stack pointer= 0x28:0x828dad90
> frame pointer= 0x28:0x828dad90
> code segment = base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = resume, IOPL = 0
> current process = 0 ()
> trap number = 12
> panic: page fault
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0x828daa50
> vpanic() at vpanic+0x17b/frame 0x828daaa0
> panic() at panic+0x43/frame 0x828dab00
> trap_fatal() at trap_fatal+0x391/frame 0x828dab60
> trap_pfault() at trap_pfault+0x4f/frame 0x828dabb0
> trap() at trap+0x286/frame 0x828dacc0
> calltrap() at calltrap+0x8/frame 0x828dacc0
> --- trap 0xc, rip = 0x8125762c, rsp = 0x828dad90, rbp =
> 0x828dad90 ---
> native_lapic_enable_cmc() at native_lapic_enable_cmc+0x1c/frame
> 0x828dad90
> _mca_init() at _mca_init+0x94c/frame 0x828dadd0
> mi_startup() at mi_startup+0xdf/frame 0x828dadf0
> btext() at btext+0x2c
> KDB: enter: panic
> [ thread pid 0 tid 0 ]
> Stopped at  kdb_enter+0x37: movq$0,0x12bc396(%rip)
> 
> If you're wondering, the panic happens at this point in
> native_lapic_enable_cmc:
> 
> apic_id = PCPU_GET(apic_id);
> KASSERT(lapics[apic_id].la_present,
>("%s: missing APIC %u", __func__, apic_id));
> lapics[apic_id].la_lvts[APIC_LVT_CMCI].lvt_masked = 0;<- panic here
> lapics[apic_id].la_lvts[APIC_LVT_CMCI].lvt_active = 1;
> if (bootverbose)
> printf("lapic%u: CMCI unmasked\n", apic_id);
> }

Scratch this patch.

Do you have INVARIANTS enabled?  If not, I am curious if enabling them
would convert that rare page fault into rare "CPU %d has more MC banks"
assert.

Also, the output of the following command might show the issue
(0x179 is the MCG_CAP MSR):
# for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179 /dev/cpuctl$x; done
You need to load cpuctl(4) if it is not loaded yet.


Re: Page fault in _mca_init during startup

2021-02-04 Thread Konstantin Belousov
On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov 
> wrote:
> > Do you have INVARIANTS enabled?  If not, I am curious if enabling them
> > would convert that rare page fault into rare "CPU %d has more MC banks"
> > assert.
> >
> > Also might be the output of the
> > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179
> > /dev/cpuctl$x; done
> > command will show the issue (0x179 is the MCG_CAP MSR).
> > You need to load cpuctl(4) if it is not loaded yet.
> >
> 
> I don't have INVARIANTS enabled, and I can't enable it on the production
> servers.  However, I can turn those three KASSERTs into VERIFYs and see
> what happens.  Here is what your command shows on the server that panicked:
> $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m 0x179
> /dev/cpuctl$x; done | uniq -c
>   16 MSR 0x179: 0x 0x0f000c14
>   16 MSR 0x179: 0x 0x0f000814

It probably explains it, but it would be more telling if you left the
output as is, so that we can see which CPUs have the MCG_CMCI_P bit
(bit 10) set.
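
As a worked example, the two values reported above can be decoded with a
few lines of Python. The sketch assumes the IA32_MCG_CAP layout with the
bank count in bits 7:0 and the CMCI-present flag in bit 10; the constant
and function names are chosen for illustration only.

```python
MCG_CAP_COUNT = 0xFF      # bits 7:0: number of machine-check banks
MCG_CAP_CMCI_P = 1 << 10  # bit 10: CMCI present


def decode_mcg_cap(value):
    """Return (bank_count, cmci_present) for a raw MCG_CAP value."""
    return value & MCG_CAP_COUNT, bool(value & MCG_CAP_CMCI_P)


for raw in (0x0F000C14, 0x0F000814):
    banks, cmci = decode_mcg_cap(raw)
    print(f"{raw:#010x}: banks={banks} cmci_present={cmci}")
```

Both values report 20 banks and differ only in bit 10 (0xc14 versus
0x814 in the low bits).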

I suspect that your machine has two sockets, and the processor in one
socket has CPUs reporting MCG_CMCI_P, while the other processor does not.
Your SMP is not quite symmetric; perhaps the processors were from
different bins?

If the BSP is selected on the reporting socket, everything boots well. If
the other socket wins the BSP selection race, cmci is not initialized, but
when the per-cpu mca_init() sees the CMCI_P bit, it calls cmci_setup()
without allocated cmc state, because the BSP did not need it.

If I am right, then unconditionally allocating the memory is probably the
only choice there.
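
The hypothesis above is consistent with the reported fault address, as
the following Python sketch shows. It is a toy model, not the kernel
code, and it assumes that cmc_state is an array of per-CPU pointers
(8 bytes each on amd64) whose base is still NULL when the AP indexes it.
The function names are hypothetical.

```python
POINTER_SIZE = 8  # bytes per pointer on amd64


def fault_address(array_base, cpuid, pointer_size=POINTER_SIZE):
    """Address touched when reading array_base[cpuid] in a pointer array."""
    return array_base + cpuid * pointer_size


def ap_init(bsp_has_cmci, ap_has_cmci, always_allocate=False):
    """Toy model of the suspected failure.

    The BSP allocates the shared state only if it reports CMCI itself,
    unless always_allocate is set (the idea of allocating
    unconditionally).
    """
    allocated = bsp_has_cmci or always_allocate
    if ap_has_cmci and not allocated:
        return "page fault"
    return "ok"


print(hex(fault_address(0, 26)))  # NULL base, cpuid 26 -> 0xd0
print(ap_init(bsp_has_cmci=False, ap_has_cmci=True))
print(ap_init(bsp_has_cmci=False, ap_has_cmci=True, always_allocate=True))
```

The panic was reported on cpuid 26 with fault virtual address 0xd0, and
26 * 8 = 208 = 0xd0, which is what indexing an unallocated pointer array
would produce under these assumptions.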

commit 2e2c925ac3b626edc6492a57a80f6b87895801c2
Author: Konstantin Belousov 
Date:   Fri Feb 5 04:32:05 2021 +0200

x86 mca: unconditionally allocate memory for cmc state

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d455..dff3f7631f5c 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1047,7 +1047,7 @@ mca_setup(uint64_t mcg_cap)
 	    "force_scan", CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0,
 	    sysctl_mca_scan, "I", "Force an immediate scan for machine checks");
 #ifdef DEV_APIC
-	if (cmci_supported(mcg_cap))
+	if (cpu_vendor_id == CPU_VENDOR_INTEL)
 		cmci_setup();
 	else if (amd_thresholding_supported())
 		amd_thresholding_setup();


Re: Page fault in _mca_init during startup

2021-02-05 Thread Konstantin Belousov
On Thu, Feb 04, 2021 at 07:53:09PM -0700, Alan Somers wrote:
> On Thu, Feb 4, 2021 at 7:40 PM Konstantin Belousov 
> wrote:
> 
> > On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> > > On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov 
> > > wrote:
> > > > Do you have INVARIANTS enabled?  If not, I am curious if enabling them
> > > > would convert that rare page fault into rare "CPU %d has more MC banks"
> > > > assert.
> > > >
> > > > Also might be the output of the
> > > > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179
> > > > /dev/cpuctl$x; done
> > > > command will show the issue (0x179 is the MCG_CAP MSR).
> > > > You need to load cpuctl(4) if it is not loaded yet.
> > > >
> > >
> > > I don't have INVARIANTS enabled, and I can't enable it on the production
> > > servers.  However, I can turn those three KASSERTs into VERIFYs and see
> > > what happens.  Here is what your command shows on the server that
> > panicked:
> > > $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m 0x179
> > > /dev/cpuctl$x; done | uniq -c
> > >   16 MSR 0x179: 0x 0x0f000c14
> > >   16 MSR 0x179: 0x 0x0f000814
> >
> > It probably explains it, but it would be more telling if you left the
> > output as is, so that we can see which CPUs have MCG_CMCI_P (10) bit set.
> >
> 
> I didn't sort them, so the first 16 have bit 10 set and the second 16
> don't.
> 
> 
> >
> > I suspect that your machine has two sockets, and processor in one socket
> > has CPUs reporting MCG_CMCI_P, while other processor does not. Your SMP
> > is not quite symmetric, perhaps processors were from different bins?
> >
> 
> Could be.  Is there some MSR that reports a more specific version number?
CPUID leaf 1 (%eax=1) returns the processor signature in %eax, but it
requires some interpretation.
# cpucontrol -i 1 /dev/cpuctl$x
for $x iterating over the cpus.
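
The interpretation mentioned above can be sketched in Python. The decoder
follows the standard x86 convention for the CPUID leaf 1 %eax signature
(stepping, model and family fields, plus the extended model and family
fields); the function name is hypothetical, and the sample value 0x206a7
is simply a well-known Sandy Bridge signature, not output from the
machine discussed here.

```python
def decode_cpuid_signature(eax):
    """Return (family, model, stepping) from CPUID leaf 1 %eax."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended fields only apply to certain base family values.
    if family in (0x6, 0xF):
        model |= ext_model << 4
    if family == 0xF:
        family += ext_family
    return family, model, stepping


family, model, stepping = decode_cpuid_signature(0x206A7)
print(f"family={family:#x} model={model:#x} stepping={stepping}")
```

Comparing the decoded triples across all CPUs would show whether the two
sockets hold parts with different steppings.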


Re: Page fault in _mca_init during startup

2021-02-05 Thread Konstantin Belousov
On Fri, Feb 05, 2021 at 09:01:26AM -0700, Alan Somers wrote:
> On Fri, Feb 5, 2021 at 7:41 AM Konstantin Belousov 
> wrote:
> 
> > On Thu, Feb 04, 2021 at 07:53:09PM -0700, Alan Somers wrote:
> > > On Thu, Feb 4, 2021 at 7:40 PM Konstantin Belousov 
> > > wrote:
> > >
> > > > On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> > > > > On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov <
> > kostik...@gmail.com>
> > > > > wrote:
> > > > > > Do you have INVARIANTS enabled?  If not, I am curious if enabling
> > them
> > > > > > would convert that rare page fault into rare "CPU %d has more MC
> > banks"
> > > > > > assert.
> > > > > >
> > > > > > Also might be the output of the
> > > > > > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m 0x179
> > > > > > /dev/cpuctl$x; done
> > > > > > command will show the issue (0x179 is the MCG_CAP MSR).
> > > > > > You need to load cpuctl(4) if it is not loaded yet.
> > > > > >
> > > > >
> > > > > I don't have INVARIANTS enabled, and I can't enable it on the
> > production
> > > > > servers.  However, I can turn those three KASSERTs into VERIFYs and
> > see
> > > > > what happens.  Here is what your command shows on the server that
> > > > panicked:
> > > > > $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m
> > 0x179
> > > > > /dev/cpuctl$x; done | uniq -c
> > > > >   16 MSR 0x179: 0x 0x0f000c14
> > > > >   16 MSR 0x179: 0x 0x0f000814
> > > >
> > > > It probably explains it, but it would be more telling if you left the
> > > > output as is, so that we can see which CPUs have MCG_CMCI_P (10) bit
> > set.
> > > >
> > >
> > > I didn't sort them, so the first 16 have bit 10 set and the second 16
> > > don't.
> > >
> > >
> > > >
> > > > I suspect that your machine has two sockets, and processor in one
> > socket
> > > > has CPUs reporting MCG_CMCI_P, while other processor does not. Your SMP
> > > > is not quite symmetric, perhaps processors were from different bins?
> >
> 
> I found 2 other servers that exhibit the same problem: the first 16 cores
> have bit 10 set and the second 16 don't.  All 3 have dual Xeon Gold 6142
> CPUs and SuperMicro X11DPU motherboards with BIOS revision 5.12.  I have
> other examples of X11DPU motherboards that don't exhibit the problem, but
> they all have both different CPUs and different BIOS revisions.  So I can't
> be sure whether the bug follows the CPU model or the BIOS version.
I looked at the full spec update errata list for the first gen Skylake
Xeons, but did not notice anything relevant. The EDS doc does not provide
much useful info on the MSR 0x179 bit 10 either, except for rewording the
SDM definition.

In fact, I am not sure, but this bit might be writeable by software. Try
to flip the bit with cpucontrol(8). It might be a BIOS bug after all.

If you have a contact at Intel or Supermicro, try to engage them.  I do
not have any further ideas, since the spec update does not mention the
problem.

> 
> 
> > > >
> > >
> > > Could be.  Is there some MSR that reports a more specific version number?
> > There are CPUID %eax=1 values returned in %eax, but then it requires
> > some interpretation.
> > # cpucontrol -i 1 /dev/cpuctl$x
> > for $x iterating over the cpus.
> >
> 
> Apart from the Local APIC ID field, that returns the same value for all
> processors.
> 
> Your second patch doesn't cause any obvious problems on my dev system.
I hope that you will be able to confirm, after some time, that it solves
the issue.


Re: Page fault in _mca_init during startup

2021-02-07 Thread Konstantin Belousov
On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> On Fri, Feb 5, 2021 at 10:21 AM Konstantin Belousov 
> wrote:
> 
> > On Fri, Feb 05, 2021 at 09:01:26AM -0700, Alan Somers wrote:
> > > On Fri, Feb 5, 2021 at 7:41 AM Konstantin Belousov 
> > > wrote:
> > >
> > > > On Thu, Feb 04, 2021 at 07:53:09PM -0700, Alan Somers wrote:
> > > > > On Thu, Feb 4, 2021 at 7:40 PM Konstantin Belousov <
> > kostik...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > On Thu, Feb 04, 2021 at 07:01:30PM -0700, Alan Somers wrote:
> > > > > > > On Thu, Feb 4, 2021 at 5:59 PM Konstantin Belousov <
> > > > kostik...@gmail.com>
> > > > > > > wrote:
> > > > > > > > Do you have INVARIANTS enabled?  If not, I am curious if
> > enabling
> > > > them
> > > > > > > > would convert that rare page fault into rare "CPU %d has more
> > MC
> > > > banks"
> > > > > > > > assert.
> > > > > > > >
> > > > > > > > Also might be the output of the
> > > > > > > > # for x in $(jot $(sysctl -n hw.ncpu) 0) ; do cpucontrol -m
> > 0x179
> > > > > > > > /dev/cpuctl$x; done
> > > > > > > > command will show the issue (0x179 is the MCG_CAP MSR).
> > > > > > > > You need to load cpuctl(4) if it is not loaded yet.
> > > > > > > >
> > > > > > >
> > > > > > > I don't have INVARIANTS enabled, and I can't enable it on the
> > > > production
> > > > > > > servers.  However, I can turn those three KASSERTs into VERIFYs
> > and
> > > > see
> > > > > > > what happens.  Here is what your command shows on the server that
> > > > > > panicked:
> > > > > > > $ for x in $(jot $(sysctl -n hw.ncpu) 0) ; do sudo cpucontrol -m
> > > > 0x179
> > > > > > > /dev/cpuctl$x; done | uniq -c
> > > > > > >   16 MSR 0x179: 0x 0x0f000c14
> > > > > > >   16 MSR 0x179: 0x 0x0f000814
> > > > > >
> > > > > > It probably explains it, but it would be more telling if you left
> > the
> > > > > > output as is, so that we can see which CPUs have MCG_CMCI_P (10)
> > bit
> > > > set.
> > > > > >
> > > > >
> > > > > I didn't sort them, so the first 16 have bit 10 set and the second 16
> > > > > don't.
> > > > >
> > > > >
> > > > > >
> > > > > > I suspect that your machine has two sockets, and processor in one
> > > > socket
> > > > > > has CPUs reporting MCG_CMCI_P, while other processor does not.
> > Your SMP
> > > > > > is not quite symmetric, perhaps processors were from different
> > bins?
> > > >
> > >
> > > I found 2 other servers that exhibit the same problem: the first 16 cores
> > > have bit 10 set and the second 16 don't.  All 3 have dual Xeon Gold 6142
> > > CPUs and SuperMicro X11DPU motherboards with BIOS revision 5.12.  I have
> > > other examples of X11DPU motherboards that don't exhibit the problem, but
> > > they all have both different CPUs and different BIOS revisions.  So I
> > can't
> > > be sure whether the bug follows the CPU model or the BIOS version.
> > I looked at the full spec update errata list for the first gen Skylake
> > Xeons, but did not notice anything relevant. EDS doc does not provide
> > much useful info on the MSR 0x179 bit 10 either, except rewording SDM
> > definition.
> >
> > In fact I am not sure but this bit might be writeable by software. Try
> > to flip the bit with cpucontrol(8). Might be it is a BIOS bug after all.
> >
> > If you have Intel representative contact, or Supermicro contact, try to
> > engage them.  I do not have any further ideas, since spec update does not
> > mention the problem.
> >
> > >
> > >
> > > > > >
> > > > >
> > > > > Could be.  Is there some MSR that reports a more specific version
> > number?
> > > > There are CPUID %eax=1 values returned in %eax, but then it requires
> > > > some interpretation.
> > > > # cpucontrol -i 1 /dev/cpuctl$x
> > > > for $x iterating over the cpus.
> > > >
> > >
> > > Apart from the Local APIC ID field, that returns the same value for all
> > > processors.
> > >
> > > Your second patch doesn't cause any obvious problems on my dev system.
> > I hope that you would confirm that the issue is solved by it, after some
> > time.
> >
> 
> Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit on all
> processors.  I don't have strong opinions about whether we should commit
> kib's patch too.  Kib, what do you think?

The patch causes some memory over-use.

If this issue is not too widely experienced, I prefer to not commit the patch.


Re: Page fault in _mca_init during startup

2021-02-08 Thread Konstantin Belousov
On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston wrote:
> On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit on 
> > > all
> > > processors.  I don't have strong opinions about whether we should commit
> > > kib's patch too.  Kib, what do you think?
> > 
> > The patch causes some memory over-use.
> > 
> > If this issue is not too widely experienced, I prefer to not commit the 
> > patch.
> 
> Couldn't we short-circuit cmci_monitor() if the BSP did not allocate
> anything?
> 
> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d45..0619a41b128 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1070,6 +1070,13 @@ cmci_monitor(int i)
>  
>   KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> + /*
> +  * It is possible for some APs to report CMCI support even if the BSP
> +  * does not, apparently due to a BIOS bug.
> +  */
> + if (cmc_state == NULL)
> + return;
> +
>   ctl = rdmsr(MSR_MC_CTL2(i));
>   if (ctl & MC_CTL2_CMCI_EN)
>   /* Already monitored by another CPU. */
> @@ -1114,6 +1121,10 @@ cmci_resume(int i)
>  
>   KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> + /* See cmci_monitor(). */
> + if (cmc_state == NULL)
> + return;
> +
>   /* Ignore banks not monitored by this CPU. */
>   if (!(PCPU_GET(cmci_mask) & 1 << i))
>   return;
I think something should be printed in this case, at least once.
I believe printf() already works, because spin locks do.


Re: Page fault in _mca_init during startup

2021-02-08 Thread Konstantin Belousov
On Mon, Feb 08, 2021 at 10:48:46AM -0500, Mark Johnston wrote:
> On Mon, Feb 08, 2021 at 05:33:22PM +0200, Konstantin Belousov wrote:
> > On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston wrote:
> > > On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> > > > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > > > > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit 
> > > > > on all
> > > > > processors.  I don't have strong opinions about whether we should 
> > > > > commit
> > > > > kib's patch too.  Kib, what do you think?
> > > > 
> > > > The patch causes some memory over-use.
> > > > 
> > > > If this issue is not too widely experienced, I prefer to not commit the 
> > > > patch.
> > > 
> > > Couldn't we short-circuit cmci_monitor() if the BSP did not allocate
> > > anything?
> > > 
> > > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > > index 03100e77d45..0619a41b128 100644
> > > --- a/sys/x86/x86/mca.c
> > > +++ b/sys/x86/x86/mca.c
> 
> > I think something should be printed in this case, at least once.
> > I believe printf() already works, because spin locks do.
> 
> Indeed, the printf() below should only fire on an AP during SI_SUB_SMP.
> Access to the static flag is synchronized by mca_lock.
> 
> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d45..8098bcfb4bd 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1065,11 +1065,26 @@ mca_setup(uint64_t mcg_cap)
>  static void
>  cmci_monitor(int i)
>  {
> + static bool first = true;
>   struct cmc_state *cc;
>   uint64_t ctl;
>  
>   KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> + /*
> +  * It is possible for some APs to report CMCI support even if the BSP
> +  * does not, apparently due to a BIOS bug.
> +  */
> + if (cmc_state == NULL) {
> + if (first) {
> + printf(
I would write if (bootverbose) printf().
Also it might be useful to report the ACPI id/APIC id as well, since this
data will most likely be the source for a BIOS bug report.

Otherwise fine with me.

> + "AP %d reports CMCI support but the BSP does not\n",
> + PCPU_GET(cpuid));
> + first = false;
> + }
> + return;
> + }
> +
>   ctl = rdmsr(MSR_MC_CTL2(i));
>   if (ctl & MC_CTL2_CMCI_EN)
>   /* Already monitored by another CPU. */
> @@ -1114,6 +1129,10 @@ cmci_resume(int i)
>  
>   KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> + /* See cmci_monitor(). */
> + if (cmc_state == NULL)
> + return;
> +
>   /* Ignore banks not monitored by this CPU. */
>   if (!(PCPU_GET(cmci_mask) & 1 << i))
>   return;
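
For readers following along, a userland sketch of the warn-once guard under
discussion, with the bootverbose gating and the APIC id added as suggested
in the reply. Everything prefixed sketch_ is a stand-in invented for the
example (the real code uses cmc_state, bootverbose and PCPU_GET()), and the
locking that protects the static flag in the kernel (mca_lock) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool sketch_bootverbose = true;	/* stand-in for the kernel's bootverbose */
static void *sketch_cmc_state = NULL;	/* stand-in: the BSP allocated nothing */
static int sketch_warnings;		/* counts messages, for illustration only */

/*
 * Return true if CMCI monitoring may proceed on this CPU.  When the BSP
 * did not allocate the state, complain once (and only in verbose mode)
 * and tell the caller to skip the bank.
 */
static bool
sketch_cmci_usable(int cpuid, unsigned apic_id)
{
	static bool first = true;

	assert(cpuid >= 0);
	if (sketch_cmc_state != NULL)
		return (true);
	if (first) {
		first = false;
		if (sketch_bootverbose) {
			printf("AP %d (APIC id %u) reports CMCI support "
			    "but the BSP does not\n", cpuid, apic_id);
			sketch_warnings++;
		}
	}
	return (false);
}
```

Calling it for several APs prints the message for the first one only, while
every call still returns false so that the caller bails out early.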


Re: Microcode update prevents boot

2021-02-14 Thread Konstantin Belousov
On Sun, Feb 14, 2021 at 07:16:29PM -0500, Mark Johnston wrote:
> On Sun, Feb 14, 2021 at 02:01:14PM +0100, Leon Dietrich wrote:
> > Hi there,
> > 
> > I already worked around the issue myself. I'm just writing this here in
> > case someone else may have the same issue and is seeking an answer.
> > 
> > 
> > I recently upgraded the intel cpu microcode update package. Since then
> > the boot process hang at the stage where the other cpu cores where
> > enabled (shortly after enabling acpi). In order to resolve the issue one
> > has to boot in safe mode (not single user mode!) and comment (or remove)
> > the lines enabling the cpu microcode update on boot in
> > /boot/loader.conf. One can and should reboot then.
> > 
> > After making these changes the system boots again and all cores are
> > started and SMT works as well. One should note that one's not running
> > the newer microcode (including some security-) fixes. Having
> > microcode_update_enable="YES" in /etc/rc.conf doesn't prevent booting
> > and does not cause noticeable instability.
> > 
> > For reference: Im running FreeBSD 12.1 on a supermicro embedded board
> > with intel xeon E3-1585L v5 cpus.
> > 
> > 
> > I hope someone will find this info useful.
> 
> I see that r347931 was not merged to stable/12 branch, but the lockless
> delayed invalidation changes were indeed present in 12.1.  Could you see
> if the hang persists when boot-time ucode loading is enabled and
> vm.pmap.di_locked=1 is configured?  Note that you could apply both
> configurations at the loader prompt, i.e., without having to edit
> loader.conf and boot in safe mode to revert the change.

Please check that this patch helps:

commit c0faf2999bfaad2fdcead26d59d60c9b9e01988a
Author: Konstantin Belousov 
Date:   Fri May 17 17:11:01 2019 +

Free microcode memory later.

(cherry picked from commit 8f7f38457f940798c149ae40b73e0d20672812de)

diff --git a/sys/x86/x86/ucode.c b/sys/x86/x86/ucode.c
index 93f82e37eb66..d8beeed68215 100644
--- a/sys/x86/x86/ucode.c
+++ b/sys/x86/x86/ucode.c
@@ -260,7 +260,7 @@ ucode_release(void *arg __unused)
goto restart;
}
 }
-SYSINIT(ucode_release, SI_SUB_KMEM + 1, SI_ORDER_ANY, ucode_release, NULL);
+SYSINIT(ucode_release, SI_SUB_SMP + 1, SI_ORDER_ANY, ucode_release, NULL);
 
 void
 ucode_load_ap(int cpu)
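
To make the effect of the one-line change easier to see, here is a small
userland sketch of how SYSINIT-style entries are ordered by subsystem
number. The two numeric values are what I recall from sys/kernel.h and
should be treated as assumptions; only their relative order matters. The
"smp_stage" entry and all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_SUB_KMEM	0x1800000u	/* assumed value of SI_SUB_KMEM */
#define SKETCH_SUB_SMP	0xf000000u	/* assumed value of SI_SUB_SMP */

struct sketch_init {
	const char *name;
	unsigned subsystem;
};

/* Lower subsystem numbers run earlier. */
static int
sketch_init_cmp(const void *a, const void *b)
{
	const struct sketch_init *x = a, *y = b;

	return ((x->subsystem > y->subsystem) - (x->subsystem < y->subsystem));
}

/* Sort the table into run order and return the slot of the named entry. */
static int
sketch_run_slot(struct sketch_init *tab, size_t n, const char *name)
{
	size_t i;

	assert(tab != NULL && name != NULL);
	qsort(tab, n, sizeof(tab[0]), sketch_init_cmp);
	for (i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return ((int)i);
	return (-1);
}
```

With ucode_release registered at SKETCH_SUB_KMEM + 1 it sorts ahead of the
SMP stage; registered at SKETCH_SUB_SMP + 1 it sorts after it, which is the
"free later" behaviour the commit message describes.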


Re: FreeBSD 13.0-BETA2 and slow IO

2021-02-16 Thread Konstantin Belousov
On Tue, Feb 16, 2021 at 12:54:55PM +0200, Christos Chatzaras wrote:
> 
> > On 16 Feb 2021, at 12:20, Christos Chatzaras  wrote:
> > 
> > I build a test system with 13.0-BETA2 and it's very slow with at least IO.
> > 
> > Doing "portsnap auto" takes much more time than 12.2.
> > 
> > Also when I do "rm -fr /usr/ports" with 12.2 takes 5 seconds and the same 
> > command with 13.0-BETA2 takes 100 seconds.
> > 
> > The disks are similar 4TB HDD drives on both systems.
> > 
> > Is this related to debug enabled in 13.0-BETA2?
> 
> I install 12.2 in the same system and "rm -fr /usr/ports" was fast. So it's 
> not related to hardware.
> 
> If I upgrade it to 13.0-BETA2 the same command is slow again.

Are you using UFS+SU or SU+J?  If yes, this is known and a fix is planned
for BETA3 or BETA4.


Re: Filesystem operations slower in 13.0 than 12.2

2021-03-05 Thread Konstantin Belousov
On Sat, Mar 06, 2021 at 12:27:55AM +0200, Christos Chatzaras wrote:
> I did some more tests. Finally I see similar results (with the exception of 
> one "portsnap extract" test). Also with 13.0 I can't trigger a bug that I 
> describe here:
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=250576
> 
> --
> 
> Command: /usr/bin/time -l rm -fr /usr/ports /usr/src (these tests done with 
> exactly the same hardware - I upgrade 12.2p4 to 13.0-RC1 for the 2nd test)
> 
> FreeBSD 12.2p4
> 
>12.67 real 0.36 user 1.94 sys
>13.18 real 0.41 user 1.81 sys
>12.16 real 0.36 user 1.85 sys

> FreeBSD 13.0-RC1
> 
>16.71 real 0.63 user 3.02 sys
>14.53 real 0.48 user 2.98 sys
>13.97 real 0.70 user 2.85 sys
> 
> Command: /usr/bin/time -l tar xf src.tar (these tests done with 2 different 
> idle servers but with same 4TB HDDs models)
> 
> FreeBSD 12.2p4
> 
>37.35 real 1.03 user 3.34 sys
> 
> FreeBSD 13.0-RC1
> 
>44.97 real 1.15 user 3.34 sys
> 
> --
> 
> Command: /usr/bin/time -l tar xf ports.tar (these tests done with 2 different 
> idle servers but with same 4TB HDDs models)
> 
> FreeBSD 12.2p4
> 
>50.80 real 1.55 user 4.62 sys
> 
> FreeBSD 13.0-RC1
> 
>59.93 real 1.69 user 4.73 sys
> 
> --
> 
> 
> Command: /usr/bin/time -l portsnap extract (these tests done with 2 different 
> idle servers but with same 4TB HDDs models)
> 
> FreeBSD 12.2p4
> 
>99.45 real 34.90 user 59.63 sys
>100.00 real 34.91 user 59.97 sys
>82.95 real 35.98 user 60.68 sys
> 
> FreeBSD 13.0-RC1
> 
>217.43 real 75.67 user 110.97 sys
>125.50 real 63.00 user 96.47 sys
>118.93 real 62.91 user 96.28 sys
I trimmed the data above to show the interesting numbers more compactly.
In the portsnap results for 13RC1, the variance is too high to conclude
anything, I think.

There were (are) bugs in FreeBSD UFS SU < 13:
- some LoR existed in the SU code, where it needed to lock a containing
  directory to provide POSIX guarantees for fsync(), while owning the vnode
  lock.  I do not believe it is observable in real-world use
- in some situations UFS SU in < 13 did not perform the necessary fsync()
  of the directory, related to the previous item
The end result was that after a successful fsync() followed by a system
failure, e.g. power loss or panic, the parent directory for the synced
vnode would not be synced and the vnode's dirent would not be written to
the permanent store. This violates the POSIX requirement that after fsync,
the data can be read, since you plainly cannot open the file.
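
Seen from the application side, this is the guarantee that cautious
portable code sometimes enforces by hand: fsync the file, then fsync the
containing directory so that the new name is durable too. A minimal sketch
of that belt-and-braces pattern follows; it illustrates the requirement,
not the SU code, and sketch_durable_create() is a name invented here.

```c
#define _POSIX_C_SOURCE 200809L	/* for openat() */

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create dir/name with the given contents and return 0 only once both the
 * file and its directory entry have been flushed; return -1 on any error.
 */
static int
sketch_durable_create(const char *dir, const char *name, const void *buf,
    size_t len)
{
	int dfd, fd, error = -1;

	assert(dir != NULL && name != NULL);
	dfd = open(dir, O_RDONLY);
	if (dfd == -1)
		return (-1);
	fd = openat(dfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd != -1) {
		/* Flush the file data and inode first. */
		if (write(fd, buf, len) == (ssize_t)len && fsync(fd) == 0)
			error = 0;
		if (close(fd) != 0)
			error = -1;
	}
	/* Then flush the directory, so the dirent itself reaches the disk. */
	if (error == 0 && fsync(dfd) != 0)
		error = -1;
	close(dfd);
	return (error);
}
```

On a filesystem that honours the requirement described above, the second
fsync() is redundant; it is shown only to make the two halves of the
guarantee explicit.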

During the development of the patch to fix both the LoR and the related
omission of fsync, a mistake was made resulting in much more aggressive
syncing of directories. It was not exactly that, but approximately, on
most metadata operations that created or removed a directory entry,
the directory was fully synced. This resulted in a significant slowdown,
which was eliminated around BETA4..RC1. I.e. most of the fixes went into
BETA4, but minor parts were only discovered later and were ready for RC1.

There are still more fsync(dir) calls in 13RC1 than in any 12, by the nature
of the bug and its fix, but the current belief is that all fsync calls left
in the flow are required for correctness.


Re: Install of 13.0-RELEASE i386 with ZFS root hangs up

2021-05-07 Thread Konstantin Belousov
On Fri, May 07, 2021 at 09:48:07AM -0700, Freddie Cash wrote:
> On Fri, May 7, 2021 at 5:49 AM Yasuhiro Kimura  wrote:
> 
> > Does anyone succeed to install 13.0-RELEASE i386 with ZFS root?
> >
> > I tried this with VirtualBox and VMware Player on Windows with
> > following VM condition.
> >
> > * 4 CPUs
> > * 8GB memory
> > * 100GB disk
> > * Bridge mode NIC
> >
> > But in both cases, VM gets high CPU load and hangs up after I moved
> > to 'YES' at 'ZFS Configuration' menu and type return key.
> >
> > If I select UFS root installation completes successfully. So the
> > problem is specific to ZFS root.
> >
> 
> Running ZFS on 32-bit OSes is doable (although not recommended) but
> requires a lot of manual configuration and tweaking, especially around
> kernel memory and ARC usage.
> 
> You're limited to 4 GB of memory space, so you need to tune the ARC to use
> less than that.  The auto-tuning has improved a lot over the years, but you
> still need to limit the ARC size to around 2 GB (or less) to keep the
> system stable.  KVA memory space tuning shouldn't be needed anymore, but
> you can do research into that, just in case.
> 
> You can compile a custom kernel to enable PAE support, that will sometimes
> help with memory issues on i386 (and will allow you to use more than 4 GB
> of system RAM, although individual processes are still limited to 4 GB).
i386 kernel uses memory up to 24G since 13.0.

PAE only means that devices that can access full 64bit address are allowed
to avoid dma bouncing.


> 
> If you really need to, you can make ZFS work on i386.  If at all possible,
> though, you really should run it on amd64 instead.
> 
> -- 
> Freddie Cash
> fjwc...@gmail.com


Re: Install of 13.0-RELEASE i386 with ZFS root hangs up

2021-05-08 Thread Konstantin Belousov
On Sat, May 08, 2021 at 06:33:02PM +0700, Eugene Grosbein wrote:
> 08.05.2021 2:52, Konstantin Belousov wrote:
> 
> > i386 kernel uses memory up to 24G since 13.0.
> > 
> > PAE only means that devices that can access full 64bit address are allowed
> > to avoid dma bouncing.
> 
> Maybe you could tell something on similar topic?
> 
> There is FreeBSD 12.2-STABLE r369567 Base12 amd64 running
> with Intel Atom CPU capable of long mode and addressing 8GB RAM,
> ASRock A330ION motherboard and two memory modules installed: 4G+2GB.
> Why so small "avail memory"?
> 
> FreeBSD clang version 10.0.1 (g...@github.com:llvm/llvm-project.git 
> llvmorg-10.0.1-0-gef32c611aa2)
> CPU: Intel(R) Atom(TM) CPU  330   @ 1.60GHz (1600.03-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x106c2  Family=0x6  Model=0x1c  Stepping=2
>   
> Features=0xbfe9fbff
>   Features2=0x40e31d
>   AMD Features=0x2800
>   AMD Features2=0x1
>   TSC: P-state invariant, performance statistics
> real memory  = 6442450944 (6144 MB)
> Physical memory chunk(s):
> 0x0001 - 0x0009dfff, 581632 bytes (142 pages)
> 0x00103000 - 0x001f, 1036288 bytes (253 pages)
> 0x02b0 - 0xd8709fff, 3586170880 bytes (875530 pages)
> avail memory = 3571384320 (3405 MB)
> 
> Also http://www.grosbein.net/freebsd/dmidecode.txt

Some necromancy revealed that this CPU did not have an on-chip memory
controller; it was a design from 2008 where the MCH handled memory.  It is
up to the chipset and BIOS to configure and report the memory above 4G
to the OS.  As you can clearly see from the SMAP printed above, the BIOS
does not report anything above 4G.

Maybe look at the BIOS settings.  No other ideas.
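
The arithmetic behind that reading of the chunk list can be sketched in a
few lines of C. The helper names are invented for the example; the inputs
in the note below are the three chunk sizes and the end address of the
highest chunk as printed in the quoted dmesg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_4G	(1ULL << 32)

/* Sum the sizes of the physical memory chunks reported by the BIOS. */
static uint64_t
sketch_total_bytes(const uint64_t *chunk_bytes, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(chunk_bytes != NULL);
	for (i = 0; i < n; i++)
		total += chunk_bytes[i];
	return (total);
}

/* Is this (inclusive) end address at or above the 4G boundary? */
static bool
sketch_above_4g(uint64_t end_addr)
{
	return (end_addr >= SKETCH_4G);
}
```

The sizes 581632, 1036288 and 3586170880 add up to 3587788800 bytes,
roughly 3.3 GiB, and the highest chunk ends at 0xd8709fff, below 4G. So the
missing memory was never handed to the kernel in the first place.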


Re: FreeBSD 9.1 Beta 1 fails to install in qemu-kvm on Gentoo Linux

2012-07-20 Thread Konstantin Belousov
On Fri, Jul 20, 2012 at 08:40:30PM -0400, Richard Yao wrote:
> On 07/20/2012 06:23 PM, Richard Yao wrote:
> > Dear FreeBSD Developers,
> > 
> > Trying to install FreeBSD 9.1 Beta 1 in qemu-kvm on Gentoo Linux fails
> > before the kernel dmesg with 'kernel trap 9 with interrupts disabled'. I
> > am running the following command:
> > 
> > qemu-system-x86_64 -drive
> > file=/dev/zvol/rpool/KVM/freebsd,if=scsi -boot order=c
> > -cdrom /mnt/backup/isos/FreeBSD-9.1-BETA1-amd64-disc1.iso -m 2048 -smp
> > 6,cores=6,threads=1,sockets=1 -curses -net
> > nic,model=e1000,macaddr=52:54:00:00:ee:04 -cpu host
> > 
> > If I use FreeBSD-9.0-RELEASE-amd64-dvd1.iso, I can do an install without
> > any problems.
> > 
> > Yours truly,
> > Richard Yao
> > 
> 
> I have an update.
> 
> 1. There is no backtrace. The only thing that I see printed after the
> boot screen with beastie is a single line:
> 
> 'kernel trap 9 with interrupts disabled'
This line is probably printed after the banner and might be CPU features
line. Is this true ? If so, show it.

> 
> 2. Verbose mode does not change it.
> 
> 3. Removing `-cpu host` fixes it. Here is an excerpt of the host's
> /proc/cpuinfo:
> 
> processor   : 0
> vendor_id   : AuthenticAMD
> cpu family  : 16
> model   : 10
> model name  : AMD Phenom(tm) II X6 1090T Processor
> stepping: 0
> microcode   : 0x1dc
> cpu MHz : 1600.000
> cache size  : 512 KB
> physical id : 0
> siblings: 6
> core id : 0
> cpu cores   : 6
> apicid  : 0
> initial apicid  : 0
> fpu : yes
> fpu_exception   : yes
> cpuid level : 6
> wp  : yes
> flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
> fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
> nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_l
> m cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
> osvw ibs skinit wdt cpb npt lbrv svm_lock nrip_save pausefilter
> bogomips: 6421.39
> TLB size: 1024 4K pages
> clflush size: 64
> cache_alignment : 64
> address sizes   : 48 bits physical, 48 bits virtual
> power management: ts ttp tm stc 100mhzsteps hwpstate [9]
> 
> FreeBSD 9.0 had no problems in KVM with `-cpu host` on this system, so
> this would seem to be a regression.
> 

Can you boot the FreeBSD kernel on this machine on bare metal?

Also it could be useful to show the CPU features lines from FreeBSD 9.0,
or just full verbose dmesgs of the boots in KVM with and without -cpu host.




Re: FreeBSD 9.1 Beta 1 fails to install in qemu-kvm on Gentoo Linux

2012-07-21 Thread Konstantin Belousov
On Fri, Jul 20, 2012 at 10:37:29PM -0400, Richard Yao wrote:
> On 07/20/2012 08:56 PM, Konstantin Belousov wrote:
> > On Fri, Jul 20, 2012 at 08:40:30PM -0400, Richard Yao wrote:
> >> On 07/20/2012 06:23 PM, Richard Yao wrote:
> >>> Dear FreeBSD Developers,
> >>>
> >>> Trying to install FreeBSD 9.1 Beta 1 in qemu-kvm on Gentoo Linux fails
> >>> before the kernel dmesg with 'kernel trap 9 with interrupts disabled'. I
> >>> am running the following command:
> >>>
> >>> qemu-system-x86_64 -drive
> >>> file=/dev/zvol/rpool/KVM/freebsd,if=scsi -boot order=c
> >>> -cdrom /mnt/backup/isos/FreeBSD-9.1-BETA1-amd64-disc1.iso -m 2048 -smp
> >>> 6,cores=6,threads=1,sockets=1 -curses -net
> >>> nic,model=e1000,macaddr=52:54:00:00:ee:04 -cpu host
> >>>
> >>> If I use FreeBSD-9.0-RELEASE-amd64-dvd1.iso, I can do an install without
> >>> any problems.
> >>>
> >>> Yours truly,
> >>> Richard Yao
> >>>
> >>
> >> I have an update.
> >>
> >> 1. There is no backtrace. The only thing that I see printed after the
> >> boot screen with beastie is a single line:
> >>
> >> 'kernel trap 9 with interrupts disabled'
> > This line is probably printed after the banner and might be CPU features
> > line. Is this true ? If so, show it.
> 
> I do not know what the banner is. However, that line is printed
The UCB copyright and FF registered trademark lines are usually
referred to as the banner.

> immediately after the boot2 menu with Beastie. It occurs when I would
> expect to see "Copyright (c) 1992-2012 The FreeBSD Project.". I see no
> other visible characters aside from those from the boot2 menu with Beastie.
Ok.


> Here is the dmesg output from FreeBSD 9.0-RELEASE with -cpu host:

> Features=0x1783fbff
>   Features2=0x80802001
>   AMD Features=0xe6500800
>   AMD Features2=0x1f7
...
> Here is the dmesg output from FreeBSD 9.0-RELEASE without -cpu host:
...
> Features=0x1783fbfd
>   Features2=0x80802001
>   AMD Features=0x20100800
>   AMD Features2=0x67

The only difference between these two which triggers some memories is
the absence of Page1GB in the !-cpu host case. Could you try to comment out
the
	if ((amd_feature & AMDID_PAGE1GB) != 0)
		ndm1g = ptoa(Maxmem) >> PDPSHIFT;
lines in the create_pagetables() function from sys/amd64/amd64/pmap.c and
see whether the resulting kernel boots on the -cpu host configuration?
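
For anyone who wants to check the feature words by hand, here is a small C
sketch. Page1GB is bit 26 of the extended feature word (CPUID 0x80000001,
%edx) and a 1G page corresponds to a shift of 30; the SKETCH_ macros and
helper names are local to this example rather than taken from the kernel
headers, and the helper takes bytes where the kernel starts from Maxmem in
pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_AMDID_PAGE1GB	0x04000000u	/* CPUID 0x80000001 %edx bit 26 */
#define SKETCH_PDPSHIFT		30		/* log2 of a 1G page */

/* Does the "AMD Features" word advertise 1G pages? */
static bool
sketch_has_page1gb(uint32_t amd_feature)
{
	return ((amd_feature & SKETCH_AMDID_PAGE1GB) != 0);
}

/* Number of whole 1G pages that cover the given amount of memory. */
static uint64_t
sketch_ndm1g(uint64_t mem_bytes)
{
	uint64_t n = mem_bytes >> SKETCH_PDPSHIFT;

	assert((n << SKETCH_PDPSHIFT) <= mem_bytes);
	return (n);
}
```

Applied to the two dmesgs, 0xe6500800 (with -cpu host) has the bit set and
0x20100800 (without) does not; for the 2048 MB guest the shift yields two
1G pages.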





Re: local APIC error 0x40

2012-07-24 Thread Konstantin Belousov
On Mon, Jul 23, 2012 at 03:49:37PM -0600, Dan Allen wrote:
> Running FreeBSD 8.3 -- and updating sources on a daily base and building 
> everything -- I found a new APIC/ACPI problem introduced in the past week.
> 
> I have a Toshiba Satellite U205 with an Intel Core Duo (not a Core 2).  It 
> used to work fine with both cores but then sometime in on the road to BSD 8.0 
> the machine began hanging.  So I added to /boot/loader.conf
> 
>   hint.apic.0.disabled="1"
> 
> and the machine only had one core but it went back to being reliable.
> 
> The laptop sits idle a lot, so I also have in /etc/rc.conf
> 
>   performance_cx_lowest="LOW"
> 
> and the fans stay off unless I am doing a build.  Everything was good.
> 
> I went away on a trip last week for five days, came home, did a csup to 
> RELENG_8 and rebuilt the world, as usual, and now the fans are always running 
> full!
> 
> If I comment out hint.apic.0.disabled="1" from /boot/loader.conf and reboot, 
> the results are a mixed bag:
> 
> 1) I get my 2nd core back, and it no longer hangs! Hurray.
> 2) The fans go back to usually being off and silent.  Hurray!
> 3) I get zillions of error messages streaming saying:
> 
>   CPU0: local APIC error 0x40
>   CPU1: local APIC error 0x40
> 
> No good!
> 
> 
> I am sitting at a prompt, no X-Windows, no apps running (other than the usual 
> demons), and every few seconds I get another pair of these error messages.
> 
> 4) The error appears benign other than flooding the console.  Everything 
> works, nothing hangs, I can build the OS and everything appears fine.
> 
> So how do I get rid of these messages?  What does error 0x40 mean?

Does your system slow down with these messages? 0x40 means that some
code tried to send an IPI with an interrupt number from the range assigned
to CPU faults. I believe that FreeBSD code never does that.

Is there a BIOS upgrade for your machine ?
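
For reference, the value in the message is the local APIC Error Status
Register. A small decoder sketch follows, using the ESR bit layout from the
Intel SDM; the table and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Local APIC ESR bits 0..7, per the Intel SDM. */
static const char *const sketch_esr_names[8] = {
	"send checksum error",
	"receive checksum error",
	"send accept error",
	"receive accept error",
	"redirectable IPI",
	"send illegal vector",
	"received illegal vector",
	"illegal register address",
};

/* Human-readable name of one ESR bit. */
static const char *
sketch_esr_bit_name(unsigned bit)
{
	assert(bit < 8);
	return (sketch_esr_names[bit]);
}

/* Index of the lowest error bit set in the ESR value, or -1 if none. */
static int
sketch_esr_first_bit(uint32_t esr)
{
	int i;

	for (i = 0; i < 8; i++)
		if ((esr & (1u << i)) != 0)
			return (i);
	return (-1);
}
```

0x40 is bit 6, "received illegal vector": an interrupt message arrived with
a vector in the 0..15 range that is reserved for CPU exceptions.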




Re: Latest stable/8 broken for mozilla ports

2012-07-24 Thread Konstantin Belousov
On Tue, Jul 24, 2012 at 01:41:58PM -0700, Doug Barton wrote:
> For both firefox and thunderbird I'm getting this:
> 
> firefox
> Fatal error 'locklevel <= 0' at line 98 in file 
> /frontier/svn/stable/8/lib/libthr/thread/thr_kern.c (errno = 2)
> Redirecting call to abort() to mozalloc_abort
> 
> Segmentation fault: 11 (core dumped)
> 
> This is on r238752, previous working version was r238655
> 
> thr_kern.c hasn't been updated since the last 8-release, so it would 
> seem to be something else.
> 
> Insights welcome,
> 
> Doug

Does reverting r238715 fix your issue?




Re: Panic 9 .1-PRERELEASE on HP Servers

2012-08-13 Thread Konstantin Belousov
On Sun, Aug 12, 2012 at 06:25:17PM -0700, Dennis Glatting wrote:
> http://www.pki2.com/bpbt2.JPG
> 
> I removed the ipmi driver and rebooted the machine five times without
> further problem. I won't be able to test the second machine until later
> in the  week.
> 
I suspect that your trouble could be mitigated by r239128.




Re: FreeBSD 9.1-RC1 Available...

2012-08-23 Thread Konstantin Belousov
On Thu, Aug 23, 2012 at 05:41:49PM -0500, Mark Felder wrote:
> On Thu, 23 Aug 2012 12:37:04 -0500, Walter Hurry   
> wrote:
> 
> >One thing (welcome, but puzzling) which surprised me was that my
> >vboxguest.ko did *not* need to be recompiled. How did the upgrade manage
> >that?
> 
> FreeBSD has a stable ABI unlike Linux. A kernel module compiled for any  
> 9.x release should work on any other 9.x release without needing to be  
> recompiled.
This is a statement that is false at least two times, if not three.
This was a question about the Kernel Binary Interface, not the Application
Binary Interface.

First, we have zero guarantees about the ability to load, or have a system
survive loading of, a module compiled against a later kernel.

Second, we do not have a real KBI definition, and KBI stability is managed
only ad hoc. E.g. VFS quite often breaks, while network or disk controller
drivers are usually fine.

YMMV. Snobby false statements hurt the project.




Re: FreeBSD 9.1-RC1 Available...

2012-08-24 Thread Konstantin Belousov
On Thu, Aug 23, 2012 at 12:41:03PM -0700, Peter Wemm wrote:
> * We have some "seed" tarballs of recently synced repo images around
> somewhere. I'll see where they're available.  But in a nutshell, you
> do this:
>   /home/peter/svnsync$ fetch svnmirror-base-r123456.txz
>   /home/peter/svnsync$ tar xf svnmirror-base-r123456.txz
>   /home/peter/svnsync$ svnsync file:///home/peter/svnsync/base
> and run that from cron with a lock file, probably with "-q" for quiet.
> Then you can have a local copy of the repo for offline use.  It has
> the same repo uuid so you can svn switch/relocate at will.  I
> personally on my laptop.

Why do you recommend a lock file? svnsync locks the repository on its own,
AFAIR. Moreover, the lock is quite sticky, so a svnsync that died usually
requires manual intervention to allow other svnsync jobs to proceed.

Is there something I am not aware of that requires lock file ?




Re: Resume broken in 8.3-PRERELEASE

2012-08-28 Thread Konstantin Belousov
On Tue, Aug 28, 2012 at 09:07:51AM +0700, Alexey Dokuchaev wrote:
> On Mon, Aug 27, 2012 at 05:34:54PM +0200, Hans Petter Selasky wrote:
> > If the USB HC is feeding too many such IRQ's it will be stuck. However,
> > if you see that "uhub_read_port_status()" is called, the kernel is at least
> > running, though it might be that some IRQ is stuck, hence the 100% CPU
> > usage. Could you try to get some IRQ stats?
> 
> Before zzz'ing:
> 
> db> show intrcnt
> irq1: atkbd0  168
> irq9: acpi0   8300
> irc12: psm0   2
> irq14: ata0   6301
> irq16: bge0 uhci3 13
> irq23: uhci0 ehci02
> cpu0: timer   7306385
> irq256: hdac0 30
> 
> After (within a minute after botched resume)
> 
> db> show intrcnt
> irq1: atkbd0  479
> irq9: cdpi0   8379
Was the output pasted verbatim ? I am curious about the irq9 name mangling
in the second paste.

> irc12: psm0   2
> irq14: ata0   6377
> irq16: bge0 uhci3 26
> irq23: uhci0 ehci05
> cpu0: timer   7731880
> irq256: hdac0 34
> 
> Not too much difference.  Anything else I might get from DDB?  Unfortunately,
> I am yet unable to save crashdump for later gdb analysis.
> 
> ./danfe




Re: Killing processes from DDB

2012-08-30 Thread Konstantin Belousov
On Thu, Aug 30, 2012 at 07:43:46AM +0100, Matt Burke wrote:
> Is it possible to forcibly kill process from DDB which are unkillable from
> userland? My understanding is the 'kill' command is effectively the same as
> the userland version, so perhaps a process could be terminated by invoking
> an OOM handler or something?
Processes can only be terminated at safe points, where kernel code
explicitly checks for termination conditions and which are known not
to hold kernel resources.

Yes, the kill command in ddb just kills the process, i.e. it sends a signal
to it, and handling of that signal is subject to normal signal delivery.

> 
> 
> I just had a VirtualBox instance crash and hog 100% CPU on my desktop:
> 
> mattb  36939 100.0 13.6 2577328 2276108 ??  I     6:13AM   2:28.44
> /usr/local/lib/virtualbox/VirtualBox
> 
> I kill -9 it
> 
> mattb  36939 100.0 13.6 2577328 2275804 ??  T     6:13AM   3:10.89
> /usr/local/lib/virtualbox/VirtualBox
> 
> Note it's moved to 'stop' state for some reason, yet is still eating 100%
> cpu time
> 
> # procstat -k 36939
>   PID    TID COMM             TDNAME           KSTACK
> 36939 227509 VirtualBox       -
> 36939 227836 VirtualBox       -                mi_switch
> thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
The stop state indicates that the process is stopped or being stopped. The
latter is your case. The process has one thread executing the exit1() kernel
function, which terminates the process. In the course of its work, the
function notifies all other threads of the exiting process that they shall
terminate ASAP, at the next safe point.

According to the procstat output, there is another thread in the process which
seems to be executing in the kernel. My guess is that it loops somewhere, not
reaching any check-points for termination.
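
As a rough illustration of that guess, here is a minimal user-space sketch (not
kernel code; every name in it is hypothetical) of a thread that only notices a
pending termination request when it reaches a safe point. A loop that never
reaches one keeps spinning, which the model caps with a step budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: these names are illustrative, not kernel interfaces. */
struct model_thread {
	bool exit_requested;		/* set on behalf of the exiting thread */
	bool reaches_safe_point;	/* does the loop body ever check? */
};

/*
 * Run the thread for at most max_steps iterations.
 * Returns the step at which it terminated, or -1 if it was still
 * spinning when the step budget ran out.
 */
static int
model_run(const struct model_thread *t, int max_steps)
{
	assert(t != NULL && max_steps >= 0);

	for (int step = 0; step < max_steps; step++) {
		/* ... work that may hold resources goes here ... */
		if (t->reaches_safe_point && t->exit_requested)
			return (step);	/* safe point: honour the request */
	}
	return (-1);
}
```

In this model a thread with reaches_safe_point unset never returns a step
number, no matter how long ago exit_requested was set, which is the behaviour
the procstat output suggests.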

> 
> 
> Could this be the trigger - 9.0 binary (from pkgng) against 9.1?
> 
> $ procstat -b 1 36939
>   PID COMM       OSREL  PATH
>     1 init       901000 /sbin/init
> 36939 VirtualBox 900044 /usr/local/lib/virtualbox/VirtualBox
> 
> 
> I couldn't even kill it with "dtrace -n 'pid$target:::' -p 36939 -l" -
> which so far has proven reliable in killing anything:
> 
> # dtrace -n 'pid$target:::' -p 2021 -l    <--- unimportant proc
> Bus error: 10 (core dumped)
> # dtrace -n 'pid$target:::' -p 2044 -l    <--- unimportant proc
> Bus error: 10 (core dumped)
> # dtrace -n 'pid$target:::' -p 36939 -l   <--- virtualbox hangs dtrace
> ^C
> 
> I couldn't truss the process or use gcore to get a dump, so my only option
> was a reboot.  Does anyone have any suggestions on a course of action in
> case this happens again? I can't get a kernel dump since the machine
> doesn't have enough swap (small SSDs)

The way to debug the issue is to break into ddb on the console and get
a backtrace for the spinning thread, then continue, then break again
and get another backtrace. Do it several times to see where the code
spins.

It is impossible to even start guessing what is wrong without seeing
the backtrace.

Still, recompiling VB could be a good idea, since the VB kernel module uses
non-stable KPI and KBI, so what you see might be just a build issue.




Re: rpc.lockd exiting just after startup @r240811

2012-09-23 Thread Konstantin Belousov
On Sun, Sep 23, 2012 at 06:04:14AM -0700, David Wolfskill wrote:
> This (rpc.lockd exiting rather quickly) is happening on my home "build
> machine"; in hindsight, the first symptom I saw was from the
> cron-initiated attempt to perform "svn update" on the NFS-resident
> /usr/ports:
> 
> svn: E155036: Please see the 'svn upgrade' command
> svn: E155036: Working copy '/usr/ports' is an old development version (format 
> 12); to upgrade it, use a format 18 client, then use 
> 'tools/dev/wc-ng/bump-to-19.py', then use the current client
> 
> which, I admit, confused me a great deal for a while.
> 
> The machine in question is intended to do this every night -- and
> the preceding night, there was no problem, so while I didn't actually
> check explicitly to verify that rpc.lockd was running, /usr/ports
> did get updated.
> 
> I tried "sh -x /etc/rc.d/lockd restart", which wasn't exceptionally
> revealing, except to verify that I was trying to start rpc.lockd with no
> arguments.
> 
> So I did "ktrace -di /usr/sbin/rpc.lockd"; here's a cut/paste of the
> last page or so of the resulting kdump output:
> 
> ...
>   2336 rpc.lockd CALL  connect(0x3,0xbfbfce30,0x6a)
>   2336 rpc.lockd STRU  struct sockaddr { AF_LOCAL, /var/run/logpriv }
>   2336 rpc.lockd NAMI  "/var/run/logpriv"
>   2336 rpc.lockd RET   connect 0
>   2336 rpc.lockd CALL  sendto(0x3,0xbfbfd388,0x27,0,0,0)
>   2336 rpc.lockd GIO   fd 3 wrote 39 bytes
>"<30>Sep 23 05:38:34 rpc.lockd: Starting"
>   2336 rpc.lockd RET   sendto 39/0x27
>   2336 rpc.lockd CALL  sigaction(SIGALRM,0xbfbfdc70,0)
>   2336 rpc.lockd RET   sigaction 0
>   2336 rpc.lockd CALL  nlm_syscall(0,0x1e,0x4,0x2841d0c0)
>   2336 rpc.lockd RET   nlm_syscall -1 errno 14 Bad address
>   2336 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x2806ee0c,0xbfbfda88)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x2806ee20,0)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x2806ee0c,0xbfbfd208)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x2806ee20,0)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x2806ee0c,0xbfbfd208)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x2806ee20,0)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x2806ee0c,0xbfbfd208)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x2806ee20,0)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  sigprocmask(SIG_BLOCK,0x2806ee0c,0xbfbfd208)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  sigprocmask(SIG_SETMASK,0x2806ee20,0)
>   2336 rpc.lockd RET   sigprocmask 0
>   2336 rpc.lockd CALL  exit(0x1)
> 
> I've attached a gzipped copy of the complete file in case that's
> of interest or use.
> 
> When lockd was last working, the machine was running:
> 
> FreeBSD freebeast.catwhisker.org 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #474 
> 240772M: Fri Sep 21 05:03:35 PDT 2012 
> r...@freebeast.catwhisker.org:/usr/obj/usr/src/sys/GENERIC  i386
> 
> Last night, it was running:
> 
> FreeBSD freebeast.catwhisker.org 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #475 
> 240811M: Sat Sep 22 05:02:51 PDT 2012 
> r...@freebeast.catwhisker.org:/usr/obj/usr/src/sys/GENERIC  i386
> 
> and after updating this morning, it is now running:
> 
> FreeBSD freebeast.catwhisker.org 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #476 
> 240856M: Sun Sep 23 05:10:13 PDT 2012 
> r...@freebeast.catwhisker.org:/usr/obj/usr/src/sys/GENERIC  i386
> 
> and that's the environment in which the above ktrace/kdump was produced.

Try reverting r240799. If this does not help, then some digging with
gdb will be needed to see why the kernel dislikes the buffer. Well, the same
digging would be needed even if the revert helps.




Re: panic "Sleeping thread owns a non-sleepable lock" via cv_timedwait_signal, was "rsync over NFS"

2012-10-03 Thread Konstantin Belousov
On Wed, Oct 03, 2012 at 09:24:11AM -0400, Rick Macklem wrote:
> Norbert Aschendorff wrote:
> > Another logs - even with a /var/crash crash report :)
> > 
> > Please note: The /var/crash files stem from another crash than the big
> > syslog does!
> > 
> > The syslog file inside the tarball is about 4.7 MB; it contains
> > everything since the start of the crash.
> > 
> > New /var/crash files:
> > http://lbo.spheniscida.de/Files/nfs-rsync-crash-witnessII.tgz
> > New syslog file:
> > http://lbo.spheniscida.de/Files/nfs-rsync-crash-witnessIII-only-messages.tgz
> > 
> > (Both < 100 KB)
> > 
> > The used kernel is called GENERIC-OWN-WITNESS and has all three
> > WITNESS
> > options enabled and nothing else.
> > 
> I'll take a look at these later to-day.
> 
> > But I just get an idea: Should I try it without Rick's NFSv4
> > numeric-uid-gid patch? Or is that completely unrelated?
> > @Rick: Can you assure that it is impossible that the patch added this
> > bug?
> Doesn't seem likely, but I'd never guarantee that a patch isn't broken
> and/or can never have weird side effects. So, it might be worth trying
> backing the patch out and seeing if it still crashes.
> 
> Thanks for doing this testing, rick

So you export nullfs mounts? And this is on stable?
Can you try removing nullfs from the setup?

I wonder if there are any calls to VFS_FHTOVP() with LK_INTERLOCK set.
Specifically, nullfs probably does not handle LK_INTERLOCK properly at all,
in either nullfs_vget() or nullfs_fhtovp().




Re: panic "Sleeping thread owns a non-sleepable lock" via cv_timedwait_signal, was "rsync over NFS"

2012-10-03 Thread Konstantin Belousov
On Wed, Oct 03, 2012 at 07:26:47PM +0200, Norbert Aschendorff wrote:
> On 10/03/2012 05:54 PM, Konstantin Belousov wrote:
> > So do you use nullfs exported mounts ? And stable ?
> > Can you try to remove nullfs from the set up ?
> 
> Yes, I am using nullfs for the exports (mounting the exported
> directories to subdirectories of /srv). In the FreeBSD man pages and the
> documentation, the V4 export line in /etc/exports is often set to /, but
> as I come from Linux where /etc/exports looks completely different, I
> just followed my habits.
> 
> As I wrote this, I executed the critical operation. It's finished and
> the FreeBSD server still runs *yay* :) So it's quite likely that the
> problem stems from the nullfs mount.
> 
> I just remember an issue I already wanted to tell: Before the server
> crashed, the nfsd process ('nfsd: server') used 100% CPU (one core), but
> without progress on client side. Usually, the nfsd uses only up to 10%
> cpu time when rsyncing (values taken from htop - 100% = one used CPU).
> 
> ...aaand I just run it once more, and it worked again :)

Can you try a HEAD kernel?




Re: panic "Sleeping thread owns a non-sleepable lock" via cv_timedwait_signal, was "rsync over NFS"

2012-10-04 Thread Konstantin Belousov
On Thu, Oct 04, 2012 at 10:22:37AM +0200, Norbert Aschendorff wrote:
> Hehe, sure, if you assist me :)
> I'm not very experienced with SVN, I'm actually a git user and I think
> it's also better if I do /not/ express my opinion on SVN here ;)
> The only actions I'm currently able to do in SVN are checkout, update,
> commit and view log and status -- so any help is appreciated :P

I merged the changes for you, try the patch below.

Index: .
===
--- .   (revision 241191)
+++ .   (working copy)

Property changes on: .
___
Modified: svn:mergeinfo
   Merged /head/sys:r240283-240285
Index: fs
===
--- fs  (revision 241191)
+++ fs  (working copy)

Property changes on: fs
___
Modified: svn:mergeinfo
   Merged /head/sys/fs:r240285
Index: fs/nullfs/null.h
===
--- fs/nullfs/null.h(revision 241191)
+++ fs/nullfs/null.h(working copy)
@@ -56,6 +56,7 @@
 int nullfs_init(struct vfsconf *vfsp);
 int nullfs_uninit(struct vfsconf *vfsp);
 int null_nodeget(struct mount *mp, struct vnode *target, struct vnode **vpp);
+struct vnode *null_hashget(struct mount *mp, struct vnode *lowervp);
 void null_hashrem(struct null_node *xp);
 int null_bypass(struct vop_generic_args *ap);
 
Index: fs/nullfs/null_subr.c
===
--- fs/nullfs/null_subr.c   (revision 241191)
+++ fs/nullfs/null_subr.c   (working copy)
@@ -67,7 +67,6 @@
 static MALLOC_DEFINE(M_NULLFSHASH, "nullfs_hash", "NULLFS hash table");
 MALLOC_DEFINE(M_NULLFSNODE, "nullfs_node", "NULLFS vnode private part");
 
-static struct vnode * null_hashget(struct mount *, struct vnode *);
 static struct vnode * null_hashins(struct mount *, struct null_node *);
 
 /*
@@ -98,7 +97,7 @@
  * Return a VREF'ed alias for lower vnode if already exists, else 0.
  * Lower vnode should be locked on entry and will be left locked on exit.
  */
-static struct vnode *
+struct vnode *
 null_hashget(mp, lowervp)
struct mount *mp;
struct vnode *lowervp;
@@ -209,14 +208,10 @@
struct vnode *vp;
int error;
 
-   /*
-* The insmntque1() call below requires the exclusive lock on
-* the nullfs vnode.
-*/
-   ASSERT_VOP_ELOCKED(lowervp, "lowervp");
-   KASSERT(lowervp->v_usecount >= 1, ("Unreferenced vnode %p\n", lowervp));
+   ASSERT_VOP_LOCKED(lowervp, "lowervp");
+   KASSERT(lowervp->v_usecount >= 1, ("Unreferenced vnode %p", lowervp));
 
-   /* Lookup the hash firstly */
+   /* Lookup the hash firstly. */
*vpp = null_hashget(mp, lowervp);
if (*vpp != NULL) {
vrele(lowervp);
@@ -224,6 +219,19 @@
}
 
/*
+* The insmntque1() call below requires the exclusive lock on
+* the nullfs vnode.  Upgrade the lock now if hash failed to
+* provide ready to use vnode.
+*/
+   if (VOP_ISLOCKED(lowervp) != LK_EXCLUSIVE) {
+   vn_lock(lowervp, LK_UPGRADE | LK_RETRY);
+   if ((lowervp->v_iflag & VI_DOOMED) != 0) {
+   vput(lowervp);
+   return (ENOENT);
+   }
+   }
+
+   /*
 * We do not serialize vnode creation, instead we will check for
 * duplicates later, when adding new vnode to hash.
 * Note that duplicate can only appear in hash if the lowervp is
@@ -233,8 +241,7 @@
 * might cause a bogus v_data pointer to get dereferenced
 * elsewhere if MALLOC should block.
 */
-   xp = malloc(sizeof(struct null_node),
-   M_NULLFSNODE, M_WAITOK);
+   xp = malloc(sizeof(struct null_node), M_NULLFSNODE, M_WAITOK);
 
error = getnewvnode("null", mp, &null_vnodeops, &vp);
if (error) {
Index: fs/nullfs/null_vfsops.c
===
--- fs/nullfs/null_vfsops.c (revision 241191)
+++ fs/nullfs/null_vfsops.c (working copy)
@@ -65,6 +65,7 @@
 static vfs_unmount_t   nullfs_unmount;
 static vfs_vget_t  nullfs_vget;
 static vfs_extattrctl_t    nullfs_extattrctl;
+static vfs_reclaim_lowervp_t nullfs_reclaim_lowervp;
 
 /*
  * Mount null layer
@@ -121,8 +122,10 @@
 */
NDINIT(ndp, LOOKUP, FOLLOW|LOCKLEAF, UIO_SYSSPACE, target, curthread);
error = namei(ndp);
+
/*
 * Re-lock vnode.
+* XXXKIB This is deadlock-prone as well.
 */
if (isvnunlocked)
vn_lock(mp->mnt_vnodecovered, LK_EXCLUSIVE | LK_RETRY);
@@ -146,7 +149,7 @@
}
 
xmp = (struct null_mount *) malloc(sizeof(struct null_mount),
-   M_NULLFSMNT, M_WAITOK); /* XXX */
+   M_NULLFSMNT, M_WAI

Re: panic "Sleeping thread owns a non-sleepable lock" via cv_timedwait_signal, was "rsync over NFS"

2012-10-04 Thread Konstantin Belousov
On Thu, Oct 04, 2012 at 09:08:08AM +0200, Norbert Aschendorff wrote:
> On 10/04/2012 08:41 AM, Norbert Aschendorff wrote:
> > I just applied the numeric-uidgid patch to CURRENT (worked so far) and
> > compile the kernel with the patch and try it another time, just to
> > eliminate the possibility of a bug in this patch (the machine never
> > crashed before using this patch, but I'm not sure if I tested this
> > consciously before having applied the patch)
> 
> Sorry, does not compile. But we nevertheless know that it should be
> fixed in 10.0.
Good, thank you.

To finish the experiment, you could take r240283, r240284 and r240285
from head and apply them to stable/9. I suspect that the fix is in r240285, but
the other two revisions are prerequisites. I assume that you do not use ZFS.




Re: panic "Sleeping thread owns a non-sleepable lock" via cv_timedwait_signal, was "rsync over NFS"

2012-10-04 Thread Konstantin Belousov
On Thu, Oct 04, 2012 at 12:29:51PM +0200, Norbert Aschendorff wrote:
> Does not compile: http://nopaste.info/2bc2c189eb.html (I also #define-d
> a constant, but that works)
> 
You completely stripped the quotes and attributions, and your
To: address is bogus, so I do not know to whom you addressed the note.

But the errors have nothing to do with my nullfs backport.




Re: panic "Sleeping thread owns a non-sleepable lock" via cv_timedwait_signal, was "rsync over NFS"

2012-10-04 Thread Konstantin Belousov
On Thu, Oct 04, 2012 at 03:29:31PM +0200, Norbert Aschendorff wrote:
> Nop, the patch doesn't seem to work - the machine crashes again. :|
> 
This is the whole difference between the stable and HEAD nullfs.
Retest HEAD then.




Re: stable/9 @r241776 panic: REDZONE: Buffer underflow detected...

2012-10-21 Thread Konstantin Belousov
On Sat, Oct 20, 2012 at 07:10:19AM -0700, David Wolfskill wrote:
> This seems ... fairly weird to me.
> 
> Yesterday, I built & booted:
> 
> FreeBSD g1-227.catwhisker.org 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #274 
> 241726M: Fri Oct 19 05:40:05 PDT 2012 
> r...@g1-227.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
> 
> and used the machine all day; nothing unusual (including various
> reboots (e.g. when I disembarked the train for the final leg of my
> commute home, so I powered the laptop off).
> 
> This morning, I built:
> 
> FreeBSD g1-227.catwhisker.org 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #275 
> 241776M: Sat Oct 20 04:34:45 PDT 2012 
> r...@g1-227.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
> 
> and on first reboot, I got a panic.
> 
> After a bit of experimentation, it appears that I get a panic @r241776
> if I attempt a normal boot into multi-user mode, but if I first boot to
> single-user mode, then exit single-user mode, it comes up without a
> problem.
> 
> I don't have a serial console, so I started to write down some of the
> panic information, but my patience ran a bit short.  Here's what I
> recorded (warning: hand-transcripted -- twice!):
> 
> ...
> Starting devd.
> REDZONE: Buffer underflow detected.  1 byte corrupted before 0xced40080 
> (4294966796 bytes allocated).
> Allocation backtrace:
> #0 0xc0ceac8f at redzone_setup+0xcf
> #1 0xc0a5d5c9 at malloc+0x1d9
> ...[about 20 more such lines I didn't record]...
> 
> > bt
> Tracing pid 901 tid 100106 td 0xd2b99000
> kdb_enter(...)
> panic(...)
> free(...)
> devread(ce8c2d00,f7274c0c,0,c0b1e4f0,d279e380,...) at devread+0x1a6
> giant_read(...) at giant_read+0x87
> devfs_read(...) at devfs_read+0xc6
> dofileread(...) at dofileread+0x99
> sys_read(...) at sys_read+0x98
> syscall(f7274d08) at syscall+0x387
> 
> Within the bounds described above, this appears to be quite reproducible
> -- on my laptop.  My build machine (updated in parallel, at the same
> GRNs) does not exhibit the panic.
> 
> I was unable to get a crash dump; I have
> 
> dumpdev="AUTO"
> 
> in /etc/rc.conf, and the panic was occurring well after swap was
> enabled.  (Yes, I know I have swap over-allocated.  I plan to do
> something about it at some point.)
> 
> I've attached a copy of dmesg.boot.
> 
> Anyone else seeing this?  Any ideas how to diagnose it?

devread is the method of devctl(4) which passes devd notifications from
the kernel to userland (to devd, specifically). There have been no changes to
devctl(4) for quite some time.

The corruption is, most likely, in some unrelated piece of code. Could
you try to bisect stable to catch the offender? The bisect is not
guaranteed to work, obviously, since the effects of random corruption are
unpredictable.




Re: stable/9 @r241776 panic: REDZONE: Buffer underflow detected...

2012-10-21 Thread Konstantin Belousov
On Sun, Oct 21, 2012 at 09:46:34AM -0700, David Wolfskill wrote:
> On Sun, Oct 21, 2012 at 09:33:22AM -0700, David Wolfskill wrote:
> > ...
> > So I tried reverting 241749 ... and I failed to reproduce the problem.
> > 
> > Well, one boot out of one, at least.  I'll try a few more reality
> > checks, and report back if a correction is in order.  But (for now, at
> > least), it looks to me as if 241749 is presenting a problem on this
> > laptop.
> > ...
> 
> 5 for 5.  I'm convinced that 241749 causes problems on this laptop for
> attempts to boot without a stop in single-user mode first.
> 
> (So that sounds like a timing issue, somehow.)
> 
> And thanks again, Konstantin!

I do not know or understand the CAM code, so the question should
be addressed to Alexander. It still might be a false positive.




Re: pty/tty or signal strangeness, or grep/bsdgrep bug?

2012-10-23 Thread Konstantin Belousov
On Tue, Oct 23, 2012 at 07:27:03AM -0700, Jeremy Chadwick wrote:
> Please keep me CC'd as I'm not subscribed to the list.
> 
> Something "fun" today.  First off: yes, I should have been using
> 'bsdgrep -r -- "-2011" .', and yes that works fine, but that's besides
> the point.  Here we go:
> 
> % bsdgrep -r "-2011" .
> ^Z
> Suspended
> % bg
> [1]bsdgrep -r -2011 . &
> % qqq
> %
> % fg(standard input):qfgfg
> 
> bsdgrep -r -2011 .
> ^C
> %
> 
> Let me explain what transpired from an input perspective:
> 
> 1. Ran bsdgrep -r "-2011" .
> 2. Pressed Ctrl-Z
> 3. Typed bg
> 4. Typed "q" 20 times in a row exactly
> 5. Pressed Ctrl-C
> 6. Typed "fg" and pressed Enter
> 7. Typed "ffgg" and pressed Enter
> 8. Pressed Ctrl-Z and Enter
> 9. Pressed Ctrl-C
> 
> What's going on here?  Where is the famous "Suspended (tty input)"?  It
> gets more interesting.  Here's another one:
> 
> % bsdgrep -r "-2011" .
> ^Z
> Suspended
> % bg
> [1]bsdgrep -r -2011 . &
> % g(standard input):f
> fg
> gfg: Command not found.
> [1]  + Suspended (tty input) bsdgrep -r -2011 .
> % jobs
> [1]  + Suspended (tty input) bsdgrep -r -2011 .
> % fg
> bsdgrep -r -2011 .
> %
> 
> And what transpired input-wise:
> 
> 1. Ran bsdgrep -r "-2011" .
> 2. Pressed Ctrl-Z
> 3. Typed bg
> 4. Typed "fg" and Enter
> 5. Typed "fg" again and pressed Enter
> 6. Typed "jobs" and pressed Enter
> 7. Typed "fg" and pressed Enter
> 8. Pressed Ctrl-D
> 
> Some facts:
> 
> - Fully 100% reproducible
> - Tested only on RELENG_9 (source from 2012/10/21)
> - Happens regardless of shell (bash and csh tested; csh w/out dot files)
> - Similar behaviour happens with our base system grep (GNU grep) but
>   sometimes it manifests itself in a weirder way
> - bsdgrep and GNU grep are both in state "ttyin" when this happens
> 
> CC'ing some folks who might have some ideas or can explain how to
> troubleshoot this one.  For the signal part, I believe this would be
> SIGTTIN.

This is reproducible with cat(1) as well. The telling part is that
the backgrounded process stays on the "ttyin" cv. The code for e.g.
tty read is currently structured as follows:
    check for background process reading from CTTY, send SIGTTIN
    loop {
        sleep waiting for input
        process input
    }
The problem is that SIGCONT does not remove the sleeping process from
the sleep queue, so the sleep is not interrupted with an error. Instead, the
process is woken up later, when input is available.

The old tty code rechecked the background state inside the loop, after
the sleep.
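
To make the difference between the two structures concrete, here is a minimal
user-space sketch. It is only a model under stated assumptions: model_tty,
model_sleep() and model_read() are hypothetical names, not kernel interfaces,
and the "sleep" simply simulates the reader being suspended, resumed in the
background, and then woken by input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_tty {
	bool reader_in_background;	/* reader is not in the foreground pgrp */
	int  pending_input;		/* bytes waiting in the input queue */
};

enum read_result { READ_GOT_INPUT, READ_STOPPED };

/*
 * Stand-in for the sleep on the "ttyin" cv.  While the reader sleeps it
 * is suspended and resumed in the background (^Z, then bg); only after
 * that does input arrive and wake it up.
 */
static void
model_sleep(struct model_tty *tp)
{
	tp->reader_in_background = true;
	tp->pending_input = 1;
}

static enum read_result
model_read(struct model_tty *tp, bool recheck_after_sleep)
{
	assert(tp != NULL);

	/* Background check done once, before the loop. */
	if (tp->reader_in_background)
		return (READ_STOPPED);	/* the real code would send SIGTTIN */

	for (;;) {
		if (tp->pending_input > 0) {
			tp->pending_input = 0;	/* "process input" */
			return (READ_GOT_INPUT);
		}
		model_sleep(tp);
		/* Old structure: recheck the background state after the sleep. */
		if (recheck_after_sleep && tp->reader_in_background)
			return (READ_STOPPED);
	}
}
```

With recheck_after_sleep unset, the model lets a backgrounded reader consume
the input, which matches the stray "(standard input):" output in the report;
with it set, the reader is stopped and the input stays queued for the shell.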

Below is the hacky change that seemingly helped for exactly your case.
The new code is structured so that a proper fix requires moving large blocks
around; e.g. even if my patch is kept, the for (;;) loops in tty_ttydisc.c
no longer serve any purpose.

Hope Ed will comment.

diff --git a/sys/kern/tty.c b/sys/kern/tty.c
index e9c0fb6..3785e81 100644
--- a/sys/kern/tty.c
+++ b/sys/kern/tty.c
@@ -434,6 +434,7 @@ ttydev_read(struct cdev *dev, struct uio *uio, int ioflag)
if (error)
goto done;
 
+check_bg:
error = tty_wait_background(tp, curthread, SIGTTIN);
if (error) {
tty_unlock(tp);
@@ -441,6 +442,8 @@ ttydev_read(struct cdev *dev, struct uio *uio, int ioflag)
}
 
error = ttydisc_read(tp, uio, ioflag);
+   if (error == EJUSTRETURN)
+   goto check_bg;
tty_unlock(tp);
 
/*
@@ -462,6 +465,7 @@ ttydev_write(struct cdev *dev, struct uio *uio, int ioflag)
if (error)
return (error);
 
+check_bg:
if (tp->t_termios.c_lflag & TOSTOP) {
error = tty_wait_background(tp, curthread, SIGTTOU);
if (error)
@@ -484,6 +488,8 @@ ttydev_write(struct cdev *dev, struct uio *uio, int ioflag)
tp->t_flags &= ~TF_BUSY_OUT;
cv_signal(&tp->t_outserwait);
}
+   if (error == EJUSTRETURN)
+   goto check_bg;
 
 done:  tty_unlock(tp);
return (error);
diff --git a/sys/kern/tty_ttydisc.c b/sys/kern/tty_ttydisc.c
index 52a5df2..d8b8f53 100644
--- a/sys/kern/tty_ttydisc.c
+++ b/sys/kern/tty_ttydisc.c
@@ -157,7 +157,7 @@ ttydisc_read_canonical(struct tty *tp, struct uio *uio, int ioflag)
error = tty_wait(tp, &tp->t_inwait);
if (error)
return (error);
-   continue;
+   return (EJUSTRETURN);
}
 
/* Don't send the EOF char back to userspace. */
@@ -208,6 +208,7 @@ ttydisc_read_raw_no_timer(struct tty *tp, struct uio *uio, int ioflag)
error = tty_wait(tp, &tp->t_inwait);
if (error)
return (error);
+   return (EJUSTRETURN);
}
 }
 
@@ -256,6 +257,7 @@ ttydisc_read_raw_read_timer(struct tty *tp, struct uio *uio, int ioflag,
error = tty_timedwait(

Re: wine, gcc and clang with CPUTYPE

2012-10-24 Thread Konstantin Belousov
On Wed, Oct 24, 2012 at 10:57:22AM +0300, Volodymyr Kostyrko wrote:
> Hi all.
> 
> I just have taken some time to inspect CPUTYPE support for clang. It 
> seems to me that clang generates incorrect code in some cases.
> 
> The first failure point I discovered was inability to build gcc from 
> sources or compile something with gcc. Code produced by gcc seem to fail 
> whether this was gcc compiled from bootstrap or anything else:
> 
> http://lists.freebsd.org/pipermail/freebsd-multimedia/2012-October/013469.html
> 
> I started testing by commenting out CPUTYPE in make.conf. After first 
> rebuild I also updated the ports and installed new version of 
> wine-devel. And to my surprise it works like a charm. Rolling back to 
> the world built with CPUTYPE=native makes wine break again.
> 
> To my surprise CPUTYPE was not the cause of wine failure per se. Wine 
> continues to work for k6, k6-3, athlon and athlon-tbird. But it 
> completely fails when the world was built with athlon-4 and athlon-xp.
> 
> Trying to recompile gcc I also found that everything works and yet again 
> up to the athlon-tbird.
> 
> My conclusion is: clang incorrectly produces code within one of core 
> libraries (I haven't tested which one yet, but I suspect libgcc_s.so) 
> when optimizing for athlon-4 or athlon-xp.

I am not versed in the AMD marketing monikers. I guess that athlon-{4,xp}
turns on SSE, and maybe SSE2, while the previous selections leave it off.
Can you confirm or deny this?

BTW, did you test on i386 or amd64?




Re: wine, gcc and clang with CPUTYPE

2012-10-24 Thread Konstantin Belousov
On Wed, Oct 24, 2012 at 01:34:16PM +0300, Volodymyr Kostyrko wrote:
> 24.10.2012 13:05, Dimitry Andric wrote:
> >> I just have taken some time to inspect CPUTYPE support for clang. It
> >> seems to me that clang generates incorrect code in some cases.
> >>
> >> The first failure point I discovered was inability to build gcc from
> >> sources or compile something with gcc. Code produced by gcc seem to fail
> >> whether this was gcc compiled from bootstrap or anything else:
> >>
> >> http://lists.freebsd.org/pipermail/freebsd-multimedia/2012-October/013469.html
> >
> > Can you attempt to figure out what the illegal instruction was, in that
> > case?
> 
> How can I do that? I'm not very familiar with gdb.
Load the coredump in gdb, like
    gdb /path/to/the/binary binary.core
then, at the gdb prompt, do
    info registers
    disassemble
(if the latter command emits an error, do disassemble x,x+10
where x is the content of the %eip register).

Post the verbatim results of the whole gdb session.




Re: pty/tty or signal strangeness, or grep/bsdgrep bug?

2012-10-25 Thread Konstantin Belousov
On Thu, Oct 25, 2012 at 11:06:09AM +0200, Ed Schouten wrote:
> 2012/10/25 Jeremy Chadwick :
> > I assume a commit to HEAD + MFC in 2 weeks is in order?
> 
> Yes. We're far too late to get this into 9.1, so I'll MFC it after the 
> release.
> 
> Patch committed as r242078!

The release is performed on a separate branch, which currently has nothing
to do with stable/9. The latter is open for normal MFC procedures, so I
do not see how the merge of a bugfix to stable is related to the release.




Re: tmpfs nfs exports?

2012-10-30 Thread Konstantin Belousov
On Tue, Oct 30, 2012 at 02:38:16AM -0700, Alfred Perlstein wrote:
> Hey folks, any reason why not to include the following patch in 9.1? It 
> would be nice to have tmpfs be exportable.
> 
> I'm good to commit it, I can also wait until post 9.1.
It is too late for 9.1. Patch is fine for stable/9, but you merged at
the wrong point. Merge at sys/, not at the root of the sources.

> 
> $ svn diff
> Index: .
> ===
> --- .(revision 242331)
> +++ .(working copy)
> 
> Property changes on: .
> ___
> Modified: svn:mergeinfo
> Merged /head:r234346
> Index: sys
> ===
> --- sys(revision 242331)
> +++ sys(working copy)
> 
> Property changes on: sys
> ___
> Modified: svn:mergeinfo
> Merged /head/sys:r234346
> Index: sys/fs
> ===
> --- sys/fs(revision 242331)
> +++ sys/fs(working copy)
> 
> Property changes on: sys/fs
> ___
> Modified: svn:mergeinfo
> Merged /head/sys/fs:r234346
> Index: sys/fs/tmpfs/tmpfs.h
> ===
> --- sys/fs/tmpfs/tmpfs.h(revision 242331)
> +++ sys/fs/tmpfs/tmpfs.h(working copy)
> @@ -387,6 +387,9 @@
>* tmpfs_pool.c. */
>   uma_zone_t tm_dirent_pool;
>   uma_zone_t tm_node_pool;
> +
> +/* Read-only status. */
> +int tm_ronly;
>   };
>   #define TMPFS_LOCK(tm) mtx_lock(&(tm)->allnode_lock)
>   #define TMPFS_UNLOCK(tm) mtx_unlock(&(tm)->allnode_lock)
> Index: sys/fs/tmpfs/tmpfs_vfsops.c
> ===
> --- sys/fs/tmpfs/tmpfs_vfsops.c(revision 242331)
> +++ sys/fs/tmpfs/tmpfs_vfsops.c(working copy)
> @@ -82,6 +82,10 @@
>   NULL
>   };
> 
> +static const char *tmpfs_updateopts[] = {
> +"from", "export", NULL
> +};
> +
>   /* 
> - */
> 
>   static int
> @@ -193,10 +197,13 @@
>   return (EINVAL);
> 
>   if (mp->mnt_flag & MNT_UPDATE) {
> -/* XXX: There is no support yet to update file system
> - * settings.  Should be added. */
> -
> -return EOPNOTSUPP;
> +/* Only support update mounts for certain options. */
> +if (vfs_filteropt(mp->mnt_optnew, tmpfs_updateopts) != 0)
> +return (EOPNOTSUPP);
> +if (vfs_flagopt(mp->mnt_optnew, "ro", NULL, 0) !=
> +((struct tmpfs_mount *)mp->mnt_data)->tm_ronly)
> +return (EOPNOTSUPP);
> +return (0);
>   }
> 
>   vn_lock(mp->mnt_vnodecovered, LK_SHARED | LK_RETRY);
> @@ -269,6 +276,7 @@
>   tmpfs_node_ctor, tmpfs_node_dtor,
>   tmpfs_node_init, tmpfs_node_fini,
>   UMA_ALIGN_PTR, 0);
> +tmp->tm_ronly = (mp->mnt_flag & MNT_RDONLY) != 0;
> 
>   /* Allocate the root node. */
>   error = tmpfs_alloc_node(tmp, VDIR, root_uid,




Re: thread taskq / unp_gc() using 100% cpu and stalling unix socket IPC

2012-11-13 Thread Konstantin Belousov
On Wed, Nov 14, 2012 at 01:41:04AM +0100, Markus Gebert wrote:
> 
> On 13.11.2012, at 19:30, Markus Gebert  wrote:
> 
> > To me it looks like the unix socket GC is triggered way too often and/or 
> > running too long, which uses cpu and worse, causes a lot of contention 
> > around the unp_list_lock which in turn causes delays for all processes 
> > relying on unix sockets for IPC.
> > 
> > I don't know why the unp_gc() is called so often and what's triggering this.
> 
> I have a guess now. Dovecot and relayd both use unix sockets heavily. 
> According to dtrace uipc_detach() gets called quite often by dovecot closing 
> unix sockets. Each time uipc_detach() is called unp_gc_task is 
> taskqueue_enqueue()d if fds are inflight.
> 
> in uipc_detach():
>     if (local_unp_rights)
>         taskqueue_enqueue(taskqueue_thread, &unp_gc_task);
> 
> We use relayd in a way that keeps the source address of the client when 
> connecting to the backend server (transparent load balancing). This requires 
> IP_BINDANY on the socket which cannot be set by unprivileged processes, so 
> relayd sends the socket fd to the parent process just to set the socket 
> option and send it back. This means an fd gets transferred twice for every 
> new backend connection.
> 
> So we have dovecot calling uipc_detach() often and relayd making it likely 
> that fds are inflight (unp_rights > 0). With a certain amount of load this 
> could cause unp_gc_task to be added to the thread taskq too often, slowing 
> everything unix socket related down by holding global locks in unp_gc().
> 
> I don't know if the slowdown can even cause a negative feedback loop at some 
> point by increasing the chance of fds being inflight. This would explain why 
> sometimes the condition goes away by itself and sometimes requires 
> intervention (taking load away for a moment).
> 
> I'll look into a way to (dis)prove all this tomorrow. Ideas still welcome :-).
> 

If the only issue is indeed overly aggressive scheduling of the taskqueue,
then postponing the gc until the next tick could do it. The patch below
schedules the gc task for the next tick if it is not already
scheduled. Could you try it?

diff --git a/sys/kern/subr_taskqueue.c b/sys/kern/subr_taskqueue.c
index 90c6ffc..3bf62f9 100644
--- a/sys/kern/subr_taskqueue.c
+++ b/sys/kern/subr_taskqueue.c
@@ -252,9 +252,13 @@ taskqueue_enqueue_timeout(struct taskqueue *queue,
 		} else {
 			queue->tq_callouts++;
 			timeout_task->f |= DT_CALLOUT_ARMED;
+			if (ticks < 0)
+				ticks = -ticks; /* Ignore overflow. */
+		}
+		if (ticks > 0) {
+			callout_reset(&timeout_task->c, ticks,
+			    taskqueue_timeout_func, timeout_task);
 		}
-		callout_reset(&timeout_task->c, ticks, taskqueue_timeout_func,
-		    timeout_task);
 	}
 	TQ_UNLOCK(queue);
 	return (res);
diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
index cc5360f..ed92e90 100644
--- a/sys/kern/uipc_usrreq.c
+++ b/sys/kern/uipc_usrreq.c
@@ -131,7 +131,7 @@ static const struct sockaddr sun_noname = { sizeof(sun_noname), AF_LOCAL };
  * reentrance in the UNIX domain socket, file descriptor, and socket layer
  * code.  See unp_gc() for a full description.
  */
-static struct task unp_gc_task;
+static struct timeout_task unp_gc_task;
 
 /*
  * The close of unix domain sockets attached as SCM_RIGHTS is
@@ -672,7 +672,7 @@ uipc_detach(struct socket *so)
 	if (vp)
 		vrele(vp);
 	if (local_unp_rights)
-		taskqueue_enqueue(taskqueue_thread, &unp_gc_task);
+		taskqueue_enqueue_timeout(taskqueue_thread, &unp_gc_task, -1);
 }
 
 static int
@@ -1783,7 +1783,7 @@ unp_init(void)
 	LIST_INIT(&unp_shead);
 	LIST_INIT(&unp_sphead);
 	SLIST_INIT(&unp_defers);
-	TASK_INIT(&unp_gc_task, 0, unp_gc, NULL);
+	TIMEOUT_TASK_INIT(taskqueue_thread, &unp_gc_task, 0, unp_gc, NULL);
 	TASK_INIT(&unp_defer_task, 0, unp_process_defers, NULL);
 	UNP_LINK_LOCK_INIT();
 	UNP_LIST_LOCK_INIT();
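
For illustration only (this is not part of the patch): a minimal user-space sketch of the coalescing behaviour the patch relies on, assuming that a negative tick count means "arm the timer for |ticks| only if it is not armed yet". The names timeout_task_model, model_enqueue_timeout and model_tick are hypothetical, and the ticks == 0 path of the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a timeout task; not the kernel structure. */
struct timeout_task_model {
	bool armed;     /* models the DT_CALLOUT_ARMED flag */
	int  deadline;  /* absolute tick at which the task body runs */
	int  resets;    /* how often the timer was (re)programmed */
	int  runs;      /* how often the task body ran */
};

/*
 * ticks > 0: always (re)program the timer.
 * ticks < 0: program the timer for -ticks only if it is not armed yet,
 *            so repeated requests collapse into one pending run.
 */
static void
model_enqueue_timeout(struct timeout_task_model *t, int now, int ticks)
{
	if (!t->armed) {
		t->armed = true;
		if (ticks < 0)
			ticks = -ticks;
	}
	if (ticks > 0) {
		t->deadline = now + ticks;
		t->resets++;
	}
}

/* Advance the model clock; run the task once if its deadline passed. */
static void
model_tick(struct timeout_task_model *t, int now)
{
	if (t->armed && now >= t->deadline) {
		t->armed = false;
		t->runs++;
	}
}
```

With this model, a thousand calls to model_enqueue_timeout(&t, 0, -1) within one tick program the timer once and lead to a single run at the next tick, which is the effect hoped for with unp_gc().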




Re: Increasing the DMESG buffer....

2012-11-21 Thread Konstantin Belousov
On Wed, Nov 21, 2012 at 06:08:12PM +0200, Andriy Gapon wrote:
> on 21/11/2012 18:01 Ian Lepore said the following:
> > You know what would be great?  Have this value auto-tune itself upwards
> > if bootverbose is true.
> 
> This sounds /potentially/ neat.
I do not want the bootverbose knob to suddenly change the kernel memory layout.

> 
> > The sound drivers now spit out so much stuff
> > with bootverbose true that you need like a 128k buffer to see the early
> > boot messages.
> 
> I'd argue that snd_hda should not do that.  It should use a different knob.
> 
> -- 
> Andriy Gapon




Re: buildworld with clang breaks because no cc

2012-11-24 Thread Konstantin Belousov
On Sat, Nov 24, 2012 at 01:55:50PM +0100, Dimitry Andric wrote:
> On 2012-11-23 15:14, Beeblebrox wrote:
> > Thanks for the suggestion. Build progressed a little further then had other
> > problem:
> >
> > ===> gnu/lib/libstdc++ (all)
> > building shared library libstdc++.so.6
> > /usr/obj/asp/src/tmp/usr/bin/ld: warning: creating a DT_TEXTREL in a shared
> > object.
> 
> I am not sure what causes this.  Maybe strange CFLAGS in make.conf?  Or
> is this still with ccache?
There are two usual causes for this error:
1. a missing -fPIC when compiling some .c file
2. wrong assembler which uses non-pic safe relocations.

I very much doubt that #2 is the cause.




nullfs changes MFC

2012-12-07 Thread Konstantin Belousov
Hi,
I am going to merge the latest batch of nullfs improvements into
stable/9. This will bring significant performance enhancements due
to the use of shared locks for lookups if the lower layer supports it,
much better caching in the nullfs layer, and proper handling of text
segments on nullfs. It should also improve error recovery and
some corner cases with locking.

Unfortunately, the merge would break the KBI for VFS, since it needs 5 new
VOP slots and only three spares are left. We are already very liberal
with the VFS KBI, so I think the merge is acceptable, given
the benefits it brings to nullfs.

The merge is available at 
http://people.freebsd.org/~kib/misc/nullfs_9.1.patch
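
To illustrate why running out of spare slots breaks the KBI, here is a minimal sketch with a made-up operations table. The structures ops_released, ops_three_new and ops_five_new are hypothetical and far smaller than the real VOP vector; the point is only that three new entries fit into three spares, while five change the size of the structure that already compiled modules were built against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*op_fn)(void *);

/* Hypothetical table as shipped in a release, with three spare slots. */
struct ops_released {
	op_fn lookup;
	op_fn open;
	op_fn close;
	op_fn spare1;
	op_fn spare2;
	op_fn spare3;
};

/* Three new operations reuse the spares: same size, same offsets. */
struct ops_three_new {
	op_fn lookup;
	op_fn open;
	op_fn close;
	op_fn new1;
	op_fn new2;
	op_fn new3;
};

/* Five new operations: two of them no longer fit into the old layout. */
struct ops_five_new {
	op_fn lookup;
	op_fn open;
	op_fn close;
	op_fn new1;
	op_fn new2;
	op_fn new3;
	op_fn new4;
	op_fn new5;
};

/*
 * Simplified check: a module built against the old table can only keep
 * working if the table did not change size. (Equal size is necessary,
 * not sufficient; the offsets of existing members must match as well.)
 */
static bool
layout_compatible(size_t old_size, size_t new_size)
{
	return (old_size == new_size);
}
```

Here layout_compatible(sizeof(struct ops_released), sizeof(struct ops_three_new)) holds, while the same check against struct ops_five_new fails, so modules would have to be rebuilt.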




Re: nullfs changes MFC

2012-12-07 Thread Konstantin Belousov
On Sat, Dec 08, 2012 at 03:58:16AM +0100, Baptiste Daroussin wrote:
> On Sat, Dec 08, 2012 at 03:01:09AM +0200, Konstantin Belousov wrote:
...
> > The merge is available at 
> > http://people.freebsd.org/~kib/misc/nullfs_9.1.patch
> 
> 
> Sorry I haven't checked the latest zfs related MFC, but for some time
> there was an issue with nullfs improvements and zfs, will this mfc be
> synchronized with the mfc of the related zfs fixes?

Corresponding zfs fixes were already merged to stable/9, as I was told.
Cc:ed Andrey to confirm it once more.




Re: FS hang with suspfs when creating snapshot on a UFS + GJOURNAL setup

2012-12-27 Thread Konstantin Belousov
On Thu, Dec 27, 2012 at 12:28:54PM +0100, Andreas Longwitz wrote:
> On a FreeBSD 8-Stable machine with UFS + GJOURNAL (no SU) I observed the
> same behaviour as described for UFS+SU+J in
>   lists.freebsd.org/pipermail/freebsd-current/2012-January/030937.html.
> 
> The snapshot was initiated by amanda with
>   dump 3ubLshf 64 1048576 0 - /dev/mirror/gm0p10.journal (pid 50347)
> and the process creating the snapshot is
>   /sbin/mksnap_ffs /home /home/.snap/dump_snapshot (pid 50350).
> 
> The process 50350 hangs and also all following processes that tried to
> access the home partition, some of them work, but don't come to an end,
> e.g. a shell after (Ctrl T):
>   load: 0.61  cmd: sh 52670 [suspfs] 43.46r 0.00u 0.00s 0% 2272k.
> 
> All write I/O's to all gjournaled partitions are stopped. Under normal
> circumstances the snapshot is taken in five seconds, so I definitely
> do not have the problem I have described in
>   lists.freebsd.org/pipermail/freebsd-geom/2012-May/005246.html.
> 
> My system disk with root,var,usr and home is completely mirrored and
> gjournaled with journals in extra partitions:
> gpart show mirror/gm0 -->
> =>       34  286749420  mirror/gm0  GPT  (136G)
>          34        128   1  freebsd-boot  (64k)
>         162    8601600   2  freebsd-swap  (4.1G)
>     8601762    2097152   3  freebsd-swap  (1.0G)
>    10698914    4194304   4  freebsd-swap  (2.0G)
>    14893218    4194304   5  freebsd-swap  (2.0G)
>    19087522    4194304   6  freebsd-swap  (2.0G)
>    23281826    2097152   7  freebsd-ufs  (1.0G)
>    25378978    8388608   8  freebsd-ufs  (4.0G)
>    33767586   67108864   9  freebsd-ufs  (32G)
>   100876450  185873004  10  freebsd-ufs  (88G)
> df -h -t ufs -->
> Filesystem                    Size    Used   Avail Capacity  Mounted on
> /dev/mirror/gm0p7.journal     989M    313M    596M    34%    /
> /dev/mirror/gm0p8.journal     3.9G    2.2G    1.4G    61%    /var
> /dev/mirror/gm0p9.journal      31G    8.6G     19G    30%    /usr
> /dev/mirror/gm0p10.journal     85G     17G     62G    22%    /home
> gmirror status -->
>       Name    Status  Components
> mirror/gm0  COMPLETE  da6 (ACTIVE)
>                       da7 (ACTIVE)
> gjournal status -->
>                  Name  Status  Components
>  mirror/gm0p7.journal     N/A  mirror/gm0p3
>                                mirror/gm0p7
>  mirror/gm0p8.journal     N/A  mirror/gm0p4
>                                mirror/gm0p8
>  mirror/gm0p9.journal     N/A  mirror/gm0p5
>                                mirror/gm0p9
> mirror/gm0p10.journal     N/A  mirror/gm0p6
>                                mirror/gm0p10
> 
> I got some information from the hanging system with DDB:
> KDB: enter: Break to debugger
> [thread pid 11 tid 14 ]
> Stopped at  kdb_enter+0x3b: movq$0,0x483332(%rip)
> db> show pcpu
> cpuid= 2
> dynamic pcpu = 0xff807f7d0080
> curthread= 0xff000235c000: pid 11 "idle: cpu2"
> curpcb   = 0xff851d00
> fpcurthread  = none
> idlethread   = 0xff000235c000: tid 14 "idle: cpu2"
> curpmap  = 0x80889170
> tssp = 0x808f65d0
> commontssp   = 0x808f65d0
> rsp0 = 0xff851d00
> gs32p= 0x808f5408
> ldt  = 0x808f5448
> tss  = 0x808f5438
> db> show allpcpu
> Current CPU: 2
> 
> cpuid= 0
> dynamic pcpu = 0x449080
> curthread= 0xff0002368470: pid 11 "idle: cpu0"
> curpcb   = 0xff85bd00
> fpcurthread  = none
> idlethread   = 0xff0002368470: tid 16 "idle: cpu0"
> curpmap  = 0x80889170
> tssp = 0x808f6500
> commontssp   = 0x808f6500
> rsp0 = 0xff85bd00
> gs32p= 0x808f5338
> ldt  = 0x808f5378
> tss  = 0x808f5368
> 
> cpuid= 1
> dynamic pcpu = 0xff807f7c9080
> curthread= 0xff00023688e0: pid 11 "idle: cpu1"
> curpcb   = 0xff856d00
> fpcurthread  = none
> idlethread   = 0xff00023688e0: tid 15 "idle: cpu1"
> curpmap  = 0x80889170
> tssp = 0x808f6568
> commontssp   = 0x808f6568
> rsp0 = 0xff856d00
> gs32p= 0x808f53a0
> ldt  = 0x808f53e0
> tss  = 0x808f53d0
> 
> cpuid= 2
> dynamic pcpu = 0xff807f7d0080
> curthread= 0xff000235c000: pid 11 "idle: cpu2"
> curpcb   = 0xff851d00
> fpcurthread  = none
> idlethread   = 0xff000235c000: tid 14 "idle: cpu2"
> curpmap  = 0x80889170
> tssp = 0x808f65d0
> commontssp   = 0x808f65d0
> rsp0 = 0xff851d00
> gs32p= 0x808f5408
> ldt  = 0x808f5448
> tss  = 0x808f5438
> 
> cpuid= 3
> dynamic pcpu = 0xff807f7d7080
> curthread= 0xff000235c470: pid 11 "idle: cpu3"
> curpcb 

Re: FS hang with suspfs when creating snapshot on a UFS + GJOURNAL setup

2012-12-27 Thread Konstantin Belousov
On Thu, Dec 27, 2012 at 06:47:05PM +0100, Andreas Longwitz wrote:
> Konstantin Belousov wrote:
> > On Thu, Dec 27, 2012 at 12:28:54PM +0100, Andreas Longwitz wrote:
> >> On a FreeBSD 8-Stable machine with UFS + GJOURNAL (no SU) I observed the
> >> same behaviour as described for UFS+SU+J in
> >>   lists.freebsd.org/pipermail/freebsd-current/2012-January/030937.html.
> >>
> >> The snapshot was initiated by amanda with
> >>   dump 3ubLshf 64 1048576 0 - /dev/mirror/gm0p10.journal (pid 50347)
> >> and the process creating the snapshot is
> >>   /sbin/mksnap_ffs /home /home/.snap/dump_snapshot (pid 50350).
> >>
> >> The process 50350 hangs and also all following processes that tried to
> >> access the home partition, some of them work, but don't come to an end,
> >> e.g. a shell after (Ctrl T):
> >>   load: 0.61  cmd: sh 52670 [suspfs] 43.46r 0.00u 0.00s 0% 2272k.
> >>
> >> All write I/O's to all gjournaled partitions are stopped. Under normal
> >> circumstances the snapshot is taken in five seconds, so I definitely
> >> do not have the problem I have described in
> >>   lists.freebsd.org/pipermail/freebsd-geom/2012-May/005246.html.
> >>
> >> .
> >>
> >> It seems there is a deadlock on the suspfs lock, but I could not figure
> >> out who holds this lock.
> >> Any hints how to get better diagnostic information for next time the
> >> error occurs are welcome.
> > 
> > The
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > provides the instructions.
> > 
> > The suspfs is owned by the snapshot creator. The question is, where is it
> > blocked.
> 
> Thanks for answer.
> 
> In the meantime I was able to reproduce the problem and got some more
> information. It looks like there is a deadlock between the two processes
> with pid 18 (g_journal switcher) and pid 7126 (/sbin/mksnap_ffs /home
> /home/.snap/my_snapshot):
> 
> 0    18     0   0  45  0     0    16 snaplk DL  ??  4:40,34
>  [g_journal switcher]
> 0  7126  1933   0  50  0  5836  1176 suspfs D   ??  0:00,44
>  /sbin/mksnap_ffs /home /home/.snap/my_snapshot
> 
> procstat -t 18 -->
>   PID    TID COMM               TDNAME           CPU  PRI STATE   WCHAN
>    18 100076 g_journal switcher g_journal switch   0  129 sleep   snaplk
> procstat -t 7126 -->
>   PID    TID COMM               TDNAME           CPU  PRI STATE   WCHAN
>  7126 100157 mksnap_ffs         -                  1  134 sleep   suspfs
> procstat -kk 18 -->
>   PID    TID COMM               TDNAME           KSTACK
>    18 100076 g_journal switcher g_journal switch mi_switch+0x186
>   sleepq_wait+0x42 __lockmgr_args+0x49b ffs_copyonwrite+0x19a
>   ffs_geom_strategy+0x1b5 bufwrite+0xe9 ffs_sbupdate+0x12a
>   g_journal_ufs_clean+0x3e g_journal_switcher+0xe5e fork_exit+0x11f
>   fork_trampoline+0xe
> procstat -kk 7126 -->
>   PID    TID COMM               TDNAME           KSTACK
>  7126 100157 mksnap_ffs         -                mi_switch+0x186
>   sleepq_wait+0x42 _sleep+0x373 vn_start_write+0xdf ffs_snapshot+0xe2b
>   ffs_mount+0x65a vfs_donmount+0xdc5 nmount+0x63
>   amd64_syscall+0x1f4 Xfast_syscall+0xfc
> 
> From DDB:
> db> show lockedvnods
> Locked vnodes
> 
> 0xff012281a938: tag ufs, type VREG
> usecount 1, writecount 0, refcount 3339 mountedhere 0
> flags (VV_SYSTEM)
> lock type snaplk: EXCL by thread 0xff000807a470 (pid 7126)
>  with exclusive waiters pending
> ino 23552, on dev mirror/gm0p10.journal
...
> db> alltrace (pid 18 and 7126)
> 
> Tracing command g_journal switcher pid 18 tid 100076 td 0xff0002bd5000
> sched_switch() at sched_switch+0xde
> mi_switch() at mi_switch+0x186
> sleepq_wait() at sleepq_wait+0x42
> __lockmgr_args() at __lockmgr_args+0x49b
> ffs_copyonwrite() at ffs_copyonwrite+0x19a
> ffs_geom_strategy() at ffs_geom_strategy+0x1b5
> bufwrite() at bufwrite+0xe9
> ffs_sbupdate() at ffs_sbupdate+0x12a
> g_journal_ufs_clean() at g_journal_ufs_clean+0x3e
> g_journal_switcher() at g_journal_switcher+0xe5e
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xff8242ca8cf0, rbp = 0 ---
> 
> Tracing command mksnap_ffs pid 7126 tid 100157 td 0xff000807a470
> sched_switch() at sched_switch+0xde
> mi_switch() at mi_switch+0x186
> sleepq_wait() at sleepq_wait+0x42
> _sleep() at _sleep+0x373
> vn_start_write() at vn_start_write+0xdf
> ffs_snapshot() at ffs_snapshot+0xe2b
Can you look up the line number 

Re: FS hang with suspfs when creating snapshot on a UFS + GJOURNAL setup

2012-12-28 Thread Konstantin Belousov
On Fri, Dec 28, 2012 at 10:19:31AM +0100, Andreas Longwitz wrote:
> Konstantin Belousov wrote:
> >>> On Thu, Dec 27, 2012 at 12:28:54PM +0100, Andreas Longwitz wrote:
> >> db> alltrace (pid 18 and 7126)
> >>
> >> Tracing command g_journal switcher pid 18 tid 100076 td 0xff0002bd5000
> >> sched_switch() at sched_switch+0xde
> >> mi_switch() at mi_switch+0x186
> >> sleepq_wait() at sleepq_wait+0x42
> >> __lockmgr_args() at __lockmgr_args+0x49b
> >> ffs_copyonwrite() at ffs_copyonwrite+0x19a
> >> ffs_geom_strategy() at ffs_geom_strategy+0x1b5
> >> bufwrite() at bufwrite+0xe9
> >> ffs_sbupdate() at ffs_sbupdate+0x12a
> >> g_journal_ufs_clean() at g_journal_ufs_clean+0x3e
> >> g_journal_switcher() at g_journal_switcher+0xe5e
> >> fork_exit() at fork_exit+0x11f
> >> fork_trampoline() at fork_trampoline+0xe
> >> --- trap 0, rip = 0, rsp = 0xff8242ca8cf0, rbp = 0 ---
> >>
> >> Tracing command mksnap_ffs pid 7126 tid 100157 td 0xff000807a470
> >> sched_switch() at sched_switch+0xde
> >> mi_switch() at mi_switch+0x186
> >> sleepq_wait() at sleepq_wait+0x42
> >> _sleep() at _sleep+0x373
> >> vn_start_write() at vn_start_write+0xdf
> >> ffs_snapshot() at ffs_snapshot+0xe2b
> > Can you look up the line number for the ffs_snapshot+0xe2b ?
> 
> (kgdb) list *ffs_snapshot+0xe2b
> 0x8056287b is in ffs_snapshot
> (/usr/src/sys/ufs/ffs/ffs_snapshot.c:676).
> 671     /*
> 672      * Resume operation on filesystem.
> 673      */
> 674     vfs_write_resume(vp->v_mount);
> 675     vn_start_write(NULL, &wrtmp, V_WAIT);
> 676     if (collectsnapstats && starttime.tv_sec > 0) {
> 677             nanotime(&endtime);
> 678             timespecsub(&endtime, &starttime);
> 679             printf("%s: suspended %ld.%03ld sec, redo %ld of %d\n",
> 680                 vp->v_mount->mnt_stat.f_mntonname, (long)endtime.tv_sec,
> 
> > I think the bug is that vn_start_write() is called while the snaplock
> > is owned, after the out1 label in ffs_snapshot() (I am looking at the
> > HEAD code).
> 
> You are right, the vn_start_write() is just after the out1 label.
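
The situation in the traces above can be read as a cycle in a wait-for graph. The sketch below is a toy model, not kernel code, and it rests on an assumption the thread does not state outright: that the g_journal switcher holds the filesystem suspension while it sleeps on snaplk, and that mksnap_ffs holds snaplk while it sleeps in vn_start_write() on "suspfs". The names SWITCHER, MKSNAP, NOBODY and in_wait_cycle are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NOBODY (-1)

/* Hypothetical thread indices for the two processes in the traces. */
enum { SWITCHER = 0, MKSNAP = 1 };

/*
 * waits_for[i] is the index of the thread that owns the resource
 * thread i sleeps on, or NOBODY if thread i is not blocked.
 * Returns true if following the chain from 'start' leads back to it.
 */
static bool
in_wait_cycle(const int waits_for[], int n, int start)
{
	int cur = start;

	for (int steps = 0; steps < n; steps++) {
		cur = waits_for[cur];
		if (cur == NOBODY)
			return (false);
		if (cur == start)
			return (true);
	}
	return (false);
}
```

Under the stated assumption the graph is waits_for[SWITCHER] = MKSNAP (snaplk) and waits_for[MKSNAP] = SWITCHER (suspfs), and in_wait_cycle() reports a cycle for either thread. Removing one of the two edges breaks the cycle.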

Please try the following patch. It is against HEAD and might need some
adjustments for 8. I do the resume and the write accounting atomically,
not allowing another suspension to intervene in between.

diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index 3f65b05..cf49ecb 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -1434,6 +1434,40 @@ vn_closefile(fp, td)
  * proceed. If a suspend request is in progress, we wait until the
  * suspension is over, and then proceed.
  */
+static int
+vn_start_write_locked(struct mount *mp, int flags)
+{
+	int error;
+
+	mtx_assert(MNT_MTX(mp), MA_OWNED);
+	error = 0;
+
+	/*
+	 * Check on status of suspension.
+	 */
+	if ((curthread->td_pflags & TDP_IGNSUSP) == 0 ||
+	    mp->mnt_susp_owner != curthread) {
+		while ((mp->mnt_kern_flag & MNTK_SUSPEND) != 0) {
+			if (flags & V_NOWAIT) {
+				error = EWOULDBLOCK;
+				goto unlock;
+			}
+			error = msleep(&mp->mnt_flag, MNT_MTX(mp),
+			    (PUSER - 1) | (flags & PCATCH), "suspfs", 0);
+			if (error)
+				goto unlock;
+		}
+	}
+	if (flags & V_XSLEEP)
+		goto unlock;
+	mp->mnt_writeopcount++;
+unlock:
+	if (error != 0 || (flags & V_XSLEEP) != 0)
+		MNT_REL(mp);
+	MNT_IUNLOCK(mp);
+	return (error);
+}
+
 int
 vn_start_write(vp, mpp, flags)
 	struct vnode *vp;
@@ -1470,30 +1504,7 @@ vn_start_write(vp, mpp, flags)
 	if (vp == NULL)
 		MNT_REF(mp);
 
-	/*
-	 * Check on status of suspension.
-	 */
-	if ((curthread->td_pflags & TDP_IGNSUSP) == 0 ||
-	    mp->mnt_susp_owner != curthread) {
-		while ((mp->mnt_kern_flag & MNTK_SUSPEND) != 0) {
-			if (flags & V_NOWAIT) {
-				error = EWOULDBLOCK;
-				goto unlock;
-			}
-			error = msleep(&mp->mnt_flag, MNT_MTX(mp),
-			    (PUSER - 1) | (flags & PCATCH), "suspfs", 0);
-			if (error)
-				goto unlock;
-		}
-	}
-	if (flags & V_XSLEEP)
-		goto unlock;
-	mp->mnt_writeopcount++;
-unlock:
-	if (

Re: Post 9.1 stable file system problems

2012-12-31 Thread Konstantin Belousov
On Tue, Jan 01, 2013 at 02:05:11AM +0100, Dominic Fandrey wrote:
> On 01/01/2013 01:49, Dominic Fandrey wrote:
> > On 01/01/2013 01:29, Chris Rees wrote:
> >> On 1 Jan 2013 00:01, "Dominic Fandrey"  wrote:
> >>>
> >>> I have a Tinderbox that I just updated to the current RELENG_9.
> >>> Following the update build times for packages have increased by a
> >>> factor between 5 and 20. I.e. I have packages that used to build in
> >>> 5 minutes and now take an hour.
> >>>
> >>> I'm suspecting the file system ever since I saw that the majority of CPU
> >>> load was caused by ls when I looked at top (more than 2 minutes of CPU
> >>> time were counted that moment). The majority of the time most of the CPU
> >>> load is caused by bsdtar, pkg_add, qmake-qt4, etc. Without exception
> >>> tools that access a lot of files.
> >>>
> >>> The file system on which packages are built is nullfs mounted from
> >>> an async mounted UFS. I turned async off, to no avail.
> >>>
> >>> /usr/src/UPDATING says that there were nullfs optimisations. So I
> >>> think this is where the problem originates. I might hack the tinderbox to
> >>> use 'ln -s' or set it up for NFS to verify this.
> >>
> >> Is your kernel newer than the Jail?  The converse causes problems.
> > 
> > I ran makeJail for all jails after updating.
> > 
> > I also seem to have similar problems when building in the host-system.
> > The unzip for openjdk-7 has just passed the 11 minutes CPU time mark.
> > On my notebook it takes less than 10 seconds.
> 
> Just set WRKOBJDIRPREFIX to a tmpfs on the Tinderbox host system
> and the extract takes less than a second. Originally WRKOBJDIRPREFIX
> also pointed to a nullfs mount.
> 
> Afterwards I pointed WRKOBJDIRPREFIX to a UFS file system (without
> nullfs involvement). The entire make extract took 20s.
> 
> So still faster by at least factor 30 than running it on a nullfs mount
> (I eventually SIGINTed so I don't know how long it would've run).

Please start providing some useful debugging information.

At least dmesg, mount -v, and the output of sysctl kern.maxvnodes and
'sysctl vfs | grep vnodes'.

What is shown when you press ^T while the slow process runs on nullfs?
Did the process terminate instantly when you pressed ^C?




Re: Post 9.1 stable file system problems

2013-01-01 Thread Konstantin Belousov
On Tue, Jan 01, 2013 at 02:39:44PM +0100, Dominic Fandrey wrote:
> On 01/01/2013 07:51, Konstantin Belousov wrote:
> > On Tue, Jan 01, 2013 at 02:05:11AM +0100, Dominic Fandrey wrote:
> >> On 01/01/2013 01:49, Dominic Fandrey wrote:
> >>> On 01/01/2013 01:29, Chris Rees wrote:
> >>>> On 1 Jan 2013 00:01, "Dominic Fandrey"  wrote:
> >>>>>
> >>>>> I have a Tinderbox that I just updated to the current RELENG_9.
> >>>>> Following the update build times for packages have increased by a
> >>>>> factor between 5 and 20. I.e. I have packages that used to build in
> >>>>> 5 minutes and now take an hour.
> >>>>>
> >>>>> I'm suspecting the file system ever since I saw that the majority of CPU
> >>>>> load was caused by ls when I looked at top (more than 2 minutes of CPU
> >>>>> time were counted that moment). The majority of the time most of the CPU
> >>>>> load is caused by bsdtar, pkg_add, qmake-qt4, etc. Without exception
> >>>>> tools that access a lot of files.
> >>>>>
> >>>>> The file system on which packages are built is nullfs mounted from
> >>>>> an async mounted UFS. I turned async off, to no avail.
> >>>>>
> >>>>> /usr/src/UPDATING says that there were nullfs optimisations. So I
> >>>>> think this is where the problem originates. I might hack the tinderbox 
> >>>>> to
> >>>>> use 'ln -s' or set it up for NFS to verify this.
> >>>>
> >>>> Is your kernel newer than the Jail?  The converse causes problems.
> >>>
> >>> I ran makeJail for all jails after updating.
Did you rebuild your modules together with the new kernel ?

> >>>
> >>> I also seem to have similar problems when building in the host-system.
> >>> The unzip for openjdk-7 has just passed the 11 minutes CPU time mark.
> >>> On my notebook it takes less than 10 seconds.
> >>
> >> Just set WRKOBJDIRPREFIX to a tmpfs on the Tinderbox host system
> >> and the extract takes less than a second. Originally WRKOBJDIRPREFIX
> >> also pointed to a nullfs mount.
> >>
> >> Afterwards I pointed WRKOBJDIRPREFIX to a UFS file system (without
> >> nullfs involvement). The entire make extract took 20s.
> >>
> >> So still faster by at least factor 30 than running it on a nullfs mount
> >> (I eventually SIGINTed so I don't know how long it would've run).
> > 
> > Start providing some useful debugging information ?
> 
> That one might be interesting. It's all system time:
> 
> # time -lh make extract
> ===>  License GPLv2 accepted by the user
> ===>  Found saved configuration for openjdk-7.9.05_1
> ===>  Extracting for openjdk-7.9.05_2
> => SHA256 Checksum OK for openjdk-7u6-fcs-src-b24-09_aug_2012.zip.
> => SHA256 Checksum OK for apache-ant-1.8.4-bin.zip.
> ===>   openjdk-7.9.05_2 depends on file: /usr/local/bin/unzip - found
> ^Ctime: command terminated abnormally
> 4m29.30s real   3.03s user  4m22.55s sys
>   5008  maximum resident set size
>135  average shared memory size
>   2932  average unshared data size
>127  average unshared stack size
>   7772  page reclaims
>  0  page faults
>  0  swaps
> 19  block input operations
>101  block output operations
>  0  messages sent
>  0  messages received
> 41  signals received
>   1597  voluntary context switches
>  16590  involuntary context switches

Ok, from your mount -v output, are the three nullfs mounts the only
nullfs mounts ever used?

Is it only unzip that demonstrates the silly behaviour? Or does it
happen with any program? E.g., are ls(1) or sha1 on the nullfs mount
also slow?

Could you try some low-tech profiling on the slow program? For instance,
you could run ktrace/kdump -R to see which syscalls are slow.

The most puzzling part of your report, for me, is that I also use
nullfs-backed jails on both HEAD and stable/9, at a bigger scale, and I do
not have an issue. I just did
pooma32% time unzip -q 
/usr/local/arch/freebsd/distfiles/openjdk-7u6-fcs-src-b24-09_aug_2012.zip
unzip -q   3.25s user 23.77s system 78% cpu 34.482 total
over nullfs mount of
/usr/home on /usr/sfw/local8/opt/pooma32/usr/home (nullfs, local).

Please try the following patch, which changes nullfs behaviour to be
non-cached by default. You could turn on the caching with the 'mount -t
nullfs -o cach

Re: NFS-exported ZFS instability

2013-01-02 Thread Konstantin Belousov
On Wed, Jan 02, 2013 at 08:24:39AM -0500, Rick Macklem wrote:
> Hiroki Sato wrote:
> > Hello,
> > 
> > I have been having trouble with my NFS server for a long time. The
> > symptom is that it stops working in one or two weeks after a boot. I
> > could not track down the cause yet, but it is reproducible and only
> > occurred under a very high I/O load.
> > 
> > It did not panic, just stopped working---while it responded to ping,
> > userland programs seemed not working. I could break it into DDB and
> > get a kernel dump. The following URLs are a log of ps, trace, and
> > etc.:
> > 
> > http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
> > http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102
> > 
> > Does anyone see how to debug this? I guess this is due to a deadlock
> > somewhere. I have suffered from this problem for almost two years.
> > The above log is from stable/9 as of Dec 19, but this has persisted
> > since 8.X.
> > 
> Well, I took a quick glance at the log and there are a lot of processes
> sleeping on "pfault" (in vm_waitpfault() in sys/vm/vm_page.c). I'm no
> vm guy, so I'm not sure when/why that will happen. The comment on the
> function suggests they are waiting for free pages.
> 
> Maybe something as simple as running out of swap space or a problem
> talking to the disk(s) that has the swap partition(s) or ???
> (I'm talking through my hat here, because I'm not conversant with
>  the vm side of things.)
> 
> I might take a closer look this evening and see if I can spot anything
> in the log, rick
> ps: I hope Alan and Kostik don't mind being added to the cc list.

What I see in the log is that the lock cascade is rooted in the thread
100838, which owns the system map mutex. I believe this prevents malloc(9)
from making progress in other threads, which e.g. own the ZFS vnode
locks. As a result, the whole system wedged.

Looking back at the thread 100838, we can see that it executes
smp_tlb_shootdown(). It is impossible to tell from the static dump
whether the appearance of smp_tlb_shootdown() in the backtrace is
transient, or the thread is spinning there, waiting for other CPUs to
acknowledge the request. But, since the system wedged, most likely
smp_tlb_shootdown() spins.

Under this hypothesis, the situation most likely occurs due to
some other core running with interrupts disabled. Inspection of the
backtraces of the processes running on all cores does not show any which
could legitimately own a spinlock or otherwise run with interrupts
disabled.

One thing you could try is to enable WITNESS for the spinlocks,
to try to catch a leaked spinlock. I very much doubt that this is
the case.

Another thing to try is to switch the CPU idle method to something
else. Look at the machdep.idle* sysctls. It could be some CPU erratum
which blocks wakeup due to the interrupt under some conditions in C1?
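
As a toy model of the hypothesis above (not the real smp_tlb_shootdown() logic): the initiating CPU waits until every other CPU has acknowledged the request, and a CPU can only acknowledge while its interrupts are enabled. The names cpu_model, shootdown_round and shootdown_completes are hypothetical, and the bounded number of rounds stands in for "spins forever".

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4

/* Hypothetical per-CPU state for the model. */
struct cpu_model {
	bool intr_enabled;  /* can this CPU take the IPI at all? */
	bool acked;         /* has it acknowledged the request? */
};

/*
 * One polling round of the initiator: every other CPU with interrupts
 * enabled handles the IPI and acknowledges. Returns the number of CPUs
 * that still have not acknowledged.
 */
static int
shootdown_round(struct cpu_model cpus[], int ncpu, int initiator)
{
	int pending = 0;

	for (int i = 0; i < ncpu; i++) {
		if (i == initiator)
			continue;
		if (cpus[i].intr_enabled)
			cpus[i].acked = true;
		if (!cpus[i].acked)
			pending++;
	}
	return (pending);
}

/* True if all acknowledgements arrive within max_rounds polling rounds. */
static bool
shootdown_completes(struct cpu_model cpus[], int ncpu, int initiator,
    int max_rounds)
{
	for (int r = 0; r < max_rounds; r++) {
		if (shootdown_round(cpus, ncpu, initiator) == 0)
			return (true);
	}
	return (false);
}
```

In this model a single CPU that never re-enables interrupts is enough to keep the initiator waiting indefinitely, and everything queued behind the initiator's locks waits with it.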




Re: Post 9.1 stable file system problems

2013-01-05 Thread Konstantin Belousov
On Tue, Jan 01, 2013 at 05:58:06PM +0200, Konstantin Belousov wrote:
> On Tue, Jan 01, 2013 at 02:39:44PM +0100, Dominic Fandrey wrote:
> > On 01/01/2013 07:51, Konstantin Belousov wrote:
> > > On Tue, Jan 01, 2013 at 02:05:11AM +0100, Dominic Fandrey wrote:
> > >> On 01/01/2013 01:49, Dominic Fandrey wrote:
> > >>> On 01/01/2013 01:29, Chris Rees wrote:
> > >>>> On 1 Jan 2013 00:01, "Dominic Fandrey"  wrote:
> > >>>>>
> > >>>>> I have a Tinderbox that I just updated to the current RELENG_9.
> > >>>>> Following the update build times for packages have increased by a
> > >>>>> factor between 5 and 20. I.e. I have packages that used to build in
> > >>>>> 5 minutes and now take an hour.
> > >>>>>
> > >>>>> I'm suspecting the file system ever since I saw that the majority of 
> > >>>>> CPU
> > >>>>> load was caused by ls when I looked at top (more than 2 minutes of CPU
> > >>>>> time were counted that moment). The majority of the time most of the 
> > >>>>> CPU
> > >>>>> load is caused by bsdtar, pkg_add, qmake-qt4, etc. Without exception
> > >>>>> tools that access a lot of files.
> > >>>>>
> > >>>>> The file system on which packages are built is nullfs mounted from
> > >>>>> an async mounted UFS. I turned async off, to no avail.
> > >>>>>
> > >>>>> /usr/src/UPDATING says that there were nullfs optimisations. So I
> > >>>>> think this is where the problem originates. I might hack the 
> > >>>>> tinderbox to
> > >>>>> use 'ln -s' or set it up for NFS to verify this.
> > >>>>
> > >>>> Is your kernel newer than the Jail?  The converse causes problems.
> > >>>
> > >>> I ran makeJail for all jails after updating.
> Did you rebuild your modules together with the new kernel ?
> 
> > >>>
> > >>> I also seem to have similar problems when building in the host-system.
> > >>> The unzip for openjdk-7 has just passed the 11 minutes CPU time mark.
> > >>> On my notebook it takes less than 10 seconds.
> > >>
> > >> Just set WRKOBJDIRPREFIX to a tmpfs on the Tinderbox host system
> > >> and the extract takes less than a second. Originally WRKOBJDIRPREFIX
> > >> also pointed to a nullfs mount.
> > >>
> > >> Afterwards I pointed WRKOBJDIRPREFIX to a UFS file system (without
> > >> nullfs involvement). The entire make extract took 20s.
> > >>
> > >> So still faster by at least factor 30 than running it on a nullfs mount
> > >> (I eventually SIGINTed so I don't know how long it would've run).
> > > 
> > > Start providing some useful debugging information ?
> > 
> > That one might be interesting. It's all system time:
> > 
> > # time -lh make extract
> > ===>  License GPLv2 accepted by the user
> > ===>  Found saved configuration for openjdk-7.9.05_1
> > ===>  Extracting for openjdk-7.9.05_2
> > => SHA256 Checksum OK for openjdk-7u6-fcs-src-b24-09_aug_2012.zip.
> > => SHA256 Checksum OK for apache-ant-1.8.4-bin.zip.
> > ===>   openjdk-7.9.05_2 depends on file: /usr/local/bin/unzip - found
> > ^Ctime: command terminated abnormally
> > 4m29.30s real   3.03s user  4m22.55s sys
> >   5008  maximum resident set size
> >135  average shared memory size
> >   2932  average unshared data size
> >127  average unshared stack size
> >   7772  page reclaims
> >  0  page faults
> >  0  swaps
> > 19  block input operations
> >101  block output operations
> >  0  messages sent
> >  0  messages received
> > 41  signals received
> >   1597  voluntary context switches
> >  16590  involuntary context switches
> 
> Ok, from your mount -v output, are the three nullfs mounts the only
> nullfs mounts ever used?
> 
> Is it only unzip that demonstrates the silly behaviour? Or does it
> happen with any program? E.g., are ls(1) or sha1 on the nullfs mount
> also slow?
> 
> Could you try some low-tech profiling on the slow program? For instance,
> you could run ktrace/kdump -R to see which syscalls are slow.
> 
> The most puzzling part of your report, for me, is that I also use
> nullfs-backed jails on both HEAD and stable/9, at a bigger scale, and I do
> not have an issue. I just did
> pooma32% time unzip -q /usr/local/arch/freebsd/distfiles/openjdk-7u6-fcs-src-b24-09_aug_2012.zip
> unzip -q   3.25s user 23.77s system 78% cpu 34.482 total
> over nullfs mount of
> /usr/home on /usr/sfw/local8/opt/pooma32/usr/home (nullfs, local).
> 
> Please try the following patch, which changes nullfs behaviour to be
> non-cached by default. You could turn on the caching with the 'mount -t
> nullfs -o cache from to' mounting command. I am interested if use/non-use
> of -o cache makes a difference for you.

Ping. Any update?




Re: zio_done panic on unadulterated FreeBSD Release 9.1

2013-01-09 Thread Konstantin Belousov
On Wed, Jan 09, 2013 at 08:03:38PM +, Po-Li Soong wrote:
> Hi,
> 
> My name is Po-Li Soong. I ran into a crash not long after installing the 9.1 
> release on my home machine. I was performing a test run of file transfer with 
> samba server running on the FreeBSD installation. The transfer rate was about 
> 70-80 MB/sec. The core.txt is attached. If there are other crash dumps 
> needed, please let me know.
> 
> I first discussed this panic with Justin Gibbs, a coworker of mine at Spectra 
> Logic. He referred me to this email address, suggesting that the information 
> should be relevant to you. Thanks for the help.
> 
> Regards,
> 
> Po-Li Soong
> 

> maestoso dumped core - see /var/crash/vmcore.0
> 
> Sat Jan  5 19:53:24 MST 2013
> 
> FreeBSD maestoso 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 
> 09:23:10 UTC 2012 
> r...@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> panic: page fault
> 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address = 0xfffb812815d8
> fault code= supervisor read data, page not present
> instruction pointer   = 0x20:0x80b50597
> stack pointer = 0x28:0xff80fa3bc8d0
> frame pointer = 0x28:0xff80fa3bc900
> code segment  = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 0 (zio_write_intr_5)
> trap number   = 12
> panic: page fault
> cpuid = 3
> KDB: stack backtrace:
> #0 0x809208a6 at kdb_backtrace+0x66
> #1 0x808ea8be at panic+0x1ce
> #2 0x80bd8240 at trap_fatal+0x290
> #3 0x80bd857d at trap_pfault+0x1ed
> #4 0x80bd8b9e at trap+0x3ce
> #5 0x80bc315f at calltrap+0x8
> #6 0x80b506f5 at vm_page_free_toq+0x45
> #7 0x80b4f276 at vm_object_page_remove+0x196
> #8 0x80b46b06 at vm_map_delete+0x316
> #9 0x80b46c11 at vm_map_remove+0x51
> #10 0x80b3a70a at uma_large_free+0x3a
> #11 0x808d589a at free+0x5a
> #12 0x8169b4ce at zio_done+0x2ee
> #13 0x81699063 at zio_execute+0xc3
> #14 0x8092cf55 at taskqueue_run_locked+0x85
> #15 0x8092ded6 at taskqueue_thread_loop+0x46
> #16 0x808bb9ef at fork_exit+0x11f
> #17 0x80bc368e at fork_trampoline+0xe
> Uptime: 3h19m34s
> Dumping 571 out of 3561 MB:..3%..12%..23%..31%..42%..51%..62%..73%..82%..93%
> 
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
> /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
> /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> 224   pcpu.h: No such file or directory.
>   in pcpu.h
> (kgdb) #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> #1  0x808ea3a1 in kern_reboot (howto=260)
> at /usr/src/sys/kern/kern_shutdown.c:448
> #2  0x808ea897 in panic (fmt=0x1 )
> at /usr/src/sys/kern/kern_shutdown.c:636
> #3  0x80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is not 
> available.
> )
> at /usr/src/sys/amd64/amd64/trap.c:857
> #4  0x80bd857d in trap_pfault (frame=0xff80fa3bc820, usermode=0)
> at /usr/src/sys/amd64/amd64/trap.c:773
> #5  0x80bd8b9e in trap (frame=0xff80fa3bc820)
> at /usr/src/sys/amd64/amd64/trap.c:456
> #6  0x80bc315f in calltrap ()
> at /usr/src/sys/amd64/amd64/exception.S:228
> #7  0x80b50597 in vm_page_remove (m=0xfe00cd733ab0)
> at /usr/src/sys/vm/vm_page.c:975
> #8  0x80b506f5 in vm_page_free_toq (m=0xfe00cd733ab0)
> at /usr/src/sys/vm/vm_page.c:1872
> #9  0x80b4f276 in vm_object_page_remove (object=0x81281580, 
> start=477512, end=477539, options=Variable "options" is not available.
> ) at /usr/src/sys/vm/vm_object.c:1899
> #10 0x80b46b06 in vm_map_delete (map=0xfe0002e8, 
> start=Variable "start" is not available.
> )
> at /usr/src/sys/vm/vm_map.c:2739
> #11 0x80b46c11 in vm_map_remove (map=0xfe0002e8, 
> start=18446743525909626880, end=18446743525909737472)
> at /usr/src/sys/vm/vm_map.c:2871
> #12 0x80b3a70a in uma_large_free (slab=0xfe0

Re: zio_done panic on unadulterated FreeBSD Release 9.1

2013-01-13 Thread Konstantin Belousov
On Fri, Jan 11, 2013 at 03:09:58PM +, Po-Li Soong wrote:
> (kgdb) p/x *(struct vm_object *)0x81281580
> $1 = {mtx = {lock_object = {lo_name = 0x80e54bbd,
>   lo_flags = 0x143, lo_data = 0x0, lo_witness = 0x0},
> mtx_lock = 0xfe0006f44000}, object_list = {
> tqe_next = 0x81281240, tqe_prev = 0x812814a0},
>   shadow_head = {lh_first = 0x0}, shadow_list = {le_next = 0x0,
> le_prev = 0x0}, memq = {tqh_first = 0xfe00cfd3f880,
> tqh_last = 0xfe00c9cac398}, root = 0xfe00cd733ab0,
>   size = 0x7ff, generation = 0x1, ref_count = 0x3f8, shadow_count = 0x0,
>   memattr = 0x6, type = 0x4, flags = 0x1000, pg_color = 0x0, pad1 = 0x0,
>   resident_page_count = 0x9b729, backing_object = 0x0,
>   backing_object_offset = 0x0, pager_object_list = {tqe_next = 0x0,
> tqe_prev = 0x0}, rvq = {lh_first = 0xfe00c7dd2140}, cache = 0x0,
>   handle = 0x0, un_pager = {vnp = {vnp_size = 0x0, writemappings = 0x0},
> devp = {devp_pglist = {tqh_first = 0x0, tqh_last = 0x0}, ops = 0x0},
> sgp = {sgp_pglist = {tqh_first = 0x0, tqh_last = 0x0}}, swp = {
>   swp_bcount = 0x0}}, cred = 0x0, charge = 0x0, paging_in_progress = 0x1}
> 
> (kgdb)  p/x *(struct vm_page *)0xfe00cd733ab0
> $2 = {pageq = {tqe_next = 0x0, tqe_prev = 0xfe00c7e7d678}, listq = {
> tqe_next = 0xfe00cd733b28, tqe_prev = 0xfe00cd7331d8},
>   left = 0xfe00c9b31c38, right = 0xfe00cd735c70,
>   object = 0xfffb81281580, pindex = 0x7495a, phys_addr = 0xbe95a000, md = 
> {
> pv_list = {tqh_first = 0x0, tqh_last = 0xfe00cd733af8},
> pat_mode = 0x6}, queue = 0xff, segind = 0x2, hold_count = 0x0,
>   order = 0xd, pool = 0x0, cow = 0x0, wire_count = 0x0, aflags = 0x0,
>   flags = 0x0, oflags = 0x4, act_count = 0x0, busy = 0x0, valid = 0xff,
>   dirty = 0x0}
> 
> (kgdb) list *vm_page_free_toq+0x45
> 0x80b506f5 is in vm_page_free_toq (/usr/src/sys/vm/vm_page.c:1878).
> warning: Source file is more recent than executable.
> 
> 1873
> 1874/*
> 1875 * If fictitious remove object association and
> 1876 * return, otherwise delay object association removal.
> 1877 */
> 1878if ((m->flags & PG_FICTITIOUS) != 0) {
> 1879return;
> 1880}
> 1881
> 1882m->valid = 0;
> (kgdb)
This is strange. Can you disassemble your instance of
vm_page_free_toq() and show me the assembler listing? The line
you show contains nothing that could cause a page fault if the m pointer
itself is valid.

> 
> 
> -Original Message-
> From: Konstantin Belousov [mailto:kostik...@gmail.com] 
> Sent: Wednesday, January 09, 2013 4:49 PM
> To: Po-Li Soong
> Cc: sta...@freebsd.org
> Subject: Re: zio_done panic on unadulterated FreeBSD Release 9.1
> 
> On Wed, Jan 09, 2013 at 08:03:38PM +, Po-Li Soong wrote:
> > Hi,
> > 
> > My name is Po-Li Soong. I ran into a crash not long after installing the 
> > 9.1 release on my home machine. I was performing a test run of file 
> > transfer with samba server running on the FreeBSD installation. The 
> > transfer rate was about 70-80 MB/sec. The core.txt is attached. If there 
> > are other crash dumps needed, please let me know.
> > 
> > I first discussed this panic with Justin Gibbs, a coworker of mine at 
> > Spectra Logic. He referred me to this email address, suggesting that the 
> > information should be relevant to you. Thanks for the help.
> > 
> > Regards,
> > 
> > Po-Li Soong
> > 
> 
> > maestoso dumped core - see /var/crash/vmcore.0
> > 
> > Sat Jan  5 19:53:24 MST 2013
> > 
> > FreeBSD maestoso 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 
> > 09:23:10 UTC 2012 
> > r...@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
> > 
> > panic: page fault
> > 
> > GNU gdb 6.1.1 [FreeBSD]
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and 
> > you are welcome to change it and/or distribute copies of it under certain 
> > conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > This GDB was configured as "amd64-marcel-freebsd"...
> > 
> > Unread portion of the kernel message buffer:
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01
> > fault virtual address   = 0xfffb812815d8
> > fault code  = supervisor read data, page not present
> > instruction point

Re: 9.1-stable crashes while copying data from a NFS mounted directory

2013-01-24 Thread Konstantin Belousov
On Thu, Jan 24, 2013 at 06:05:57PM +0100, Christian Gusenbauer wrote:
> Hi!
> 
> I'm using 9.1 stable svn revision 245605 and I get the panic below if I 
> execute the following commands (as single user):
> 
> # swapon -a
> # dumpon /dev/ada0s3b
> # mount -u /
> # ifconfig age0 inet 192.168.2.2 mtu 6144 up
> # mount -t nfs -o rsize=32768 data:/multimedia /mnt
> # cp /mnt/Movies/test/a.m2ts /tmp
> 
> then the system panics almost immediately. I'll attach the stack trace.
> 
> Note, that I'm using jumbo frames (6144 byte) on a 1Gbit network, maybe 
> that's the cause for the panic, because the bcopy (see 
> stack frame #15) fails.
> 
> Any clues?
I tried a similar operation with an nfs mount using rsize=32768 and mtu
6144, but my machine runs HEAD and uses em instead of age. I was unable to
reproduce the panic when copying a 5GB file from the nfs mount.

Show the output of "p *(struct uio *)0xff81b2da95a0" in kgdb.

> 
> Ciao,
> Christian.
> 
> #0  doadump (textdump=0)
> at /spare/tmp/src-stable9/sys/kern/kern_shutdown.c:265
> 265 if (textdump && textdump_pending) {
> (kgdb) #0  doadump (textdump=0)
> at /spare/tmp/src-stable9/sys/kern/kern_shutdown.c:265
> #1  0x802a8ba0 in db_dump (dummy=,
> dummy2=, dummy3=,
> dummy4=)
> at /spare/tmp/src-stable9/sys/ddb/db_command.c:538
> #2  0x802a84ce in db_command (last_cmdp=0x808bc5c0,
> cmd_table=, dopager=1)
> at /spare/tmp/src-stable9/sys/ddb/db_command.c:449
> #3  0x802a8720 in db_command_loop ()
> at /spare/tmp/src-stable9/sys/ddb/db_command.c:502
> #4  0x802aa859 in db_trap (type=,
> code=)
> at /spare/tmp/src-stable9/sys/ddb/db_main.c:231
> #5  0x803c4918 in kdb_trap (type=3, code=0, tf=0xff81b2da8a80)
> at /spare/tmp/src-stable9/sys/kern/subr_kdb.c:649
> #6  0x805a02cf in trap (frame=0xff81b2da8a80)
> at /spare/tmp/src-stable9/sys/amd64/amd64/trap.c:579
> #7  0x8058992f in calltrap ()
> at /spare/tmp/src-stable9/sys/amd64/amd64/exception.S:228
> #8  0x803c43cb in kdb_enter (why=0x806145f3 "panic",
> msg=0x80 ) at cpufunc.h:63
> #9  0x8038f407 in panic (fmt=)
> at /spare/tmp/src-stable9/sys/kern/kern_shutdown.c:627
> #10 0x80568049 in vm_fault_hold (map=0xfe000200,
> vaddr=18446743530148802560, fault_type=2 '\002', fault_flags=0,
> m_hold=0x0) at /spare/tmp/src-stable9/sys/vm/vm_fault.c:285
> #11 0x80568753 in vm_fault (map=0xfe000200,
> vaddr=18446743530148802560, fault_type=,
> fault_flags=0) at /spare/tmp/src-stable9/sys/vm/vm_fault.c:229
> #12 0x805a00c7 in trap_pfault (frame=0xff81b2da9170, usermode=0)
> at /spare/tmp/src-stable9/sys/amd64/amd64/trap.c:771
> #13 0x805a051e in trap (frame=0xff81b2da9170)
> at /spare/tmp/src-stable9/sys/amd64/amd64/trap.c:463
> #14 0x8058992f in calltrap ()
> at /spare/tmp/src-stable9/sys/amd64/amd64/exception.S:228
> #15 0x8059d7b5 in bcopy ()
> at /spare/tmp/src-stable9/sys/amd64/amd64/support.S:134
> #16 0x81c5963b in nfsm_mbufuio (nd=0xff81b2da9320,
> uiop=, siz=32768)
> at 
> /spare/tmp/src-stable9/sys/modules/nfscommon/../../fs/nfs/nfs_commonsubs.c:212
> #17 0x81c19571 in nfsrpc_read (vp=0xfe0005ca2000,
> uiop=0xff81b2da95a0, cred=,
> p=0xfe0005f28000, nap=0xff81b2da9480,
> attrflagp=0xff81b2da954c, stuff=0x0)
> at 
> /spare/tmp/src-stable9/sys/modules/nfscl/../../fs/nfsclient/nfs_clrpcops.c:1343
> #18 0x81c3aff0 in ncl_readrpc (vp=0xfe0005ca2000,
> uiop=0xff81b2da95a0, cred=)
> at 
> /spare/tmp/src-stable9/sys/modules/nfscl/../../fs/nfsclient/nfs_clvnops.c:1366
> #19 0x81c2fed3 in ncl_doio (vp=0xfe0005ca2000,
> bp=0xff816fabca20, cr=0xfe0002d59e00, td=0xfe0005f28000,
> called_from_strategy=0)
> at 
> /spare/tmp/src-stable9/sys/modules/nfscl/../../fs/nfsclient/nfs_clbio.c:1605
> #20 0x81c32aaf in ncl_bioread (vp=0xfe0005ca2000,
> uio=0xff81b2da9ad0, ioflag=,
> cred=0xfe0002d59e00)
> at 
> /spare/tmp/src-stable9/sys/modules/nfscl/../../fs/nfsclient/nfs_clbio.c:541
> #21 0x804379c3 in vn_read (fp=0xfe0005f3e960,
> uio=0xff81b2da9ad0, active_cred=,
> flags=, td=) at vnode_if.h:384
> #22 0x80434d40 in vn_io_fault (fp=0xfe0005f3e960,
> uio=0xff81b2da9ad0, active_cred=0xfe0002d59e00, flags=0,
> td=0xfe0005f28000) at /spare/tmp/src-stable9/sys/kern/vfs_vnops.c:903
> #23 0x803d7bd1 in dofileread (td=0xfe0005f28000, fd=3,
> fp=0xfe0005f3e960, auio=0xff81b2da9ad0,
> offset=, flags=0) at file.h:287
> #24 0x803d7f7c in kern_readv (td=0xfe0005f28000, fd=3,
> auio=0xff81b2da9ad0)
> at /spare/tmp/src-stable9/sys/kern/sys_generic.c:250
> #25 0x803d8074 in sys_read (td=,
> uap=)
> at /spare/tmp/

Re: 9.1-stable crashes while copying data from a NFS mounted directory

2013-01-24 Thread Konstantin Belousov
On Thu, Jan 24, 2013 at 08:03:59PM +0200, Konstantin Belousov wrote:
> On Thu, Jan 24, 2013 at 06:05:57PM +0100, Christian Gusenbauer wrote:
> > Hi!
> > 
> > I'm using 9.1 stable svn revision 245605 and I get the panic below if I 
> > execute the following commands (as single user):
> > 
> > # swapon -a
> > # dumpon /dev/ada0s3b
> > # mount -u /
> > # ifconfig age0 inet 192.168.2.2 mtu 6144 up
> > # mount -t nfs -o rsize=32768 data:/multimedia /mnt
> > # cp /mnt/Movies/test/a.m2ts /tmp
> > 
> > then the system panics almost immediately. I'll attach the stack trace.
> > 
> > Note, that I'm using jumbo frames (6144 byte) on a 1Gbit network, maybe 
> > that's the cause for the panic, because the bcopy (see 
> > stack frame #15) fails.
> > 
> > Any clues?
> I tried a similar operation with the nfs mount of rsize=32768 and mtu
> 6144, but the machine runs HEAD and em instead of age. I was unable to
> reproduce the panic on the copy of the 5GB file from nfs mount.
> 
> Show the output of "p *(struct uio *)0xff81b2da95a0" in kgdb.

And the output of "p *(struct buf *)0xff816fabca20".




Re: 9.1-stable crashes while copying data from a NFS mounted directory

2013-01-24 Thread Konstantin Belousov
On Thu, Jan 24, 2013 at 07:50:49PM +0100, Christian Gusenbauer wrote:
> On Thursday 24 January 2013 19:07:23 Konstantin Belousov wrote:
> > On Thu, Jan 24, 2013 at 08:03:59PM +0200, Konstantin Belousov wrote:
> > > On Thu, Jan 24, 2013 at 06:05:57PM +0100, Christian Gusenbauer wrote:
> > > > Hi!
> > > > 
> > > > I'm using 9.1 stable svn revision 245605 and I get the panic below if I
> > > > execute the following commands (as single user):
> > > > 
> > > > # swapon -a
> > > > # dumpon /dev/ada0s3b
> > > > # mount -u /
> > > > # ifconfig age0 inet 192.168.2.2 mtu 6144 up
> > > > # mount -t nfs -o rsize=32768 data:/multimedia /mnt
> > > > # cp /mnt/Movies/test/a.m2ts /tmp
> > > > 
> > > > then the system panics almost immediately. I'll attach the stack trace.
> > > > 
> > > > Note, that I'm using jumbo frames (6144 byte) on a 1Gbit network, maybe
> > > > that's the cause for the panic, because the bcopy (see stack frame
> > > > #15) fails.
> > > > 
> > > > Any clues?
> > > 
> > > I tried a similar operation with the nfs mount of rsize=32768 and mtu
> > > 6144, but the machine runs HEAD and em instead of age. I was unable to
> > > reproduce the panic on the copy of the 5GB file from nfs mount.
> 
> Hmmm, I did a quick test. If I do not change the MTU, so just configuring
> age0 with
> 
> # ifconfig age0 inet 192.168.2.2 up
> 
> then I can copy all files from the mounted directory without any problems, 
> too. So it's probably age0 related?
From your backtrace and the buffer printout, I see a somewhat strange thing.
The buffer data address is 0xff8171418000, while the kernel faulted
on an attempt to write at 0xff8171413000, which is lower than
the buffer data pointer, during the bcopy to the buffer.

The other data suggests that there was no overflow of the data from the
server response. So it might be that mbuf_len(mp) returned a negative number?
I am not sure whether that is possible at all.
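
To make the arithmetic above concrete, here is a small standalone sketch. It is
not kernel code: the helper names fault_delta() and pick_xfer() are made up for
illustration, and the two addresses are used exactly as printed in this
message. It shows that the faulting address sits 0x5000 bytes below the buffer
data pointer, and that the "xfer = (left > len) ? len : left" selection in
nfsm_mbufuio() would pass a negative len straight through as the transfer size.

```c
#include <stdint.h>

/*
 * Signed distance from the start of the buffer data to the faulting
 * address. A negative result means the fault is below the buffer.
 */
int64_t
fault_delta(uint64_t buf_data, uint64_t fault_addr)
{
	return ((int64_t)(fault_addr - buf_data));
}

/*
 * Same selection as "xfer = (left > len) ? len : left". If len were
 * ever negative, it would always be chosen, because left > len holds.
 */
int
pick_xfer(int left, int len)
{
	return ((left > len) ? len : left);
}
```

With the addresses above, fault_delta(0xff8171418000, 0xff8171413000) is
-0x5000, i.e. 20480 bytes below the start of the buffer data.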

Try this debugging patch, please. You need to add INVARIANTS etc to the
kernel config.

diff --git a/sys/fs/nfs/nfs_commonsubs.c b/sys/fs/nfs/nfs_commonsubs.c
index efc0786..9a6bda5 100644
--- a/sys/fs/nfs/nfs_commonsubs.c
+++ b/sys/fs/nfs/nfs_commonsubs.c
@@ -218,6 +218,7 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
 				}
 				mbufcp = NFSMTOD(mp, caddr_t);
 				len = mbuf_len(mp);
+				KASSERT(len > 0, ("len %d", len));
 			}
 			xfer = (left > len) ? len : left;
 #ifdef notdef
@@ -239,6 +240,8 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
 			uiop->uio_resid -= xfer;
 		}
 		if (uiop->uio_iov->iov_len <= siz) {
+			KASSERT(uiop->uio_iovcnt > 1, ("uio_iovcnt %d",
+			    uiop->uio_iovcnt));
 			uiop->uio_iovcnt--;
 			uiop->uio_iov++;
 		} else {

I thought that the server had returned too long a response, but from your
data that seems not to be the case. Still, I think the patch below might be
due.

diff --git a/sys/fs/nfsclient/nfs_clrpcops.c b/sys/fs/nfsclient/nfs_clrpcops.c
index be0476a..a89b907 100644
--- a/sys/fs/nfsclient/nfs_clrpcops.c
+++ b/sys/fs/nfsclient/nfs_clrpcops.c
@@ -1444,7 +1444,7 @@ nfsrpc_readrpc(vnode_t vp, struct uio *uiop, struct ucred *cred,
 			NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED);
 			eof = fxdr_unsigned(int, *tl);
 		}
-		NFSM_STRSIZ(retlen, rsize);
+		NFSM_STRSIZ(retlen, len);
 		error = nfsm_mbufuio(nd, uiop, retlen);
 		if (error)
 			goto nfsmout;




Re: 9-STABLE -> NFS -> NetAPP:

2013-02-13 Thread Konstantin Belousov
On Tue, Feb 12, 2013 at 08:50:39PM -0500, Rick Macklem wrote:
> Marc Fournier wrote:
> > Just reset server, so any further details will have to be 'next time'
> > ... but, just did a csup and am rebuilding ... the following three files
> > were modified since last build:
> > 
> > grep nfs /tmp/output
> > Edit src/sys/fs/nfs/nfs_commonsubs.c
> > Edit src/sys/fs/nfsclient/nfs_clrpcops.c
> > Edit src/sys/fs/nfsserver/nfs_nfsdserv.c
> > 
> > 
> > On 2013-02-10, at 4:56 PM, Marc Fournier  wrote:
> > 
> > >
> > > On 2013-02-10, at 4:31 PM, Rick Macklem 
> > > wrote:
> > >
> > >> Marc Fournier wrote:
> > >>> Hi John ...
> > >>>
> > >>> Does this help?
> > >>>
> > >>> root@io:~ # ps auxl | grep du
> > >>> root 1054 0.0 0.1 16176 6600 ?? D 3:15AM 0:05.38 du -skx /vm/2799
> > >>> 0
> > >>> 81426 0 20 0 newnfs
> > >>> root 12353 0.0 0.1 16176 5104 ?? D Sat03AM 0:05.41 du -skx
> > >>> /vm/2799 0
> > >>> 91597 0 20 0 newnfs
> > >>> root 64529 0.0 0.1 16176 5164 ?? D Fri03AM 0:05.40 du -skx
> > >>> /vm/2799 0
> > >>> 43227 0 20 0 newnfs
> > >>> root 12855 0.0 0.0 16308 1988 0 S+ 5:26AM 0:00.00 grep du 0 12847
> > >>> 0 20
> > >>> 0 piperd
> > >> It is probably too late, but all the lines (without the | grep du)
> > >> would be
> > >> more useful. I also include the "H" flag, so it lists threads as
> > >> well as
> > >> processes. The above just says the "du" command is waiting for a
> > >> vnode lock.
> > >> The interesting process/thread is the one that is holding a vnode
> > >> lock
> > >> while waiting for something else.
> > >
> > > As requested, 'ps auxlH' attached ...
> > >
> > >
> > > 
> > >
> Well, I took a look at the ps output and I didn't see anything that would
> identify what the hang is. There are a lot of processes sleeping on "newnfs"
> (waiting for a vnode lock) and many sleeping on "vofflock" (waiting for the
>  f_offset lock).
I never got any attachments on the thread.

See
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
for the description of what is needed to start debugging.
> 
> Unfortunately, I can't spot any process/thread that is blocked on something
> else, where it would seem likely to be holding either an nfs vnode lock or
> f_offset lock that isn't one of these.
> 
> There were changes about 5 months ago which it appears fixed a deadlock race
> between vnode locks and offset locks for paging (r236321 and friends).
No, I do not think that the description of the changes is right.

> 
> I am wondering if there could be other similar races, possibly specific to
> paging in over NFS? (I can't see any case where there is a LOR, so I can't
> think of what it might be?)
> 
> If you just want the hangs to go away, I'd suggest moving the executable
> in /usr/local/sbin (httpd maybe) to a local file system on the server,
> since it does seem to be related to paging this executable in over NFS.
> 
> rick
> ps: I've added kib@ to the cc, in case he is aware of other related races?
> 
> > >>
> > >> Are you still getting the:
> > >> nfs_getpages: error 13
> > >> vm_fault: pager read error, pid 11355 (https)
> > >
> > > Fairly quiet:
> > >
> > > 
> > >
> > > And that is it since last reboot ~20 days ago ...
> > >
> > >>
> > >> messages logged?
> > >>
> > >> With John's recent patch, the error# would no longer be 13 if it
> > >> was
> > >> caused by the "intr" flag resulting in a Read RPC terminating with
> > >> EINTR.
> > >> If you are still getting the above with "error 13", it suggests
> > >> that
> > >> the server is replying EACCES for the Read RPC.
> > >> I suggested before that you check to make sure that the executable
> > >> had
> > >> read access for everyone on the file server. Since I didn't hear
> > >> back,
> > >> I'll assume this is the case.
> > >
> > > Don't understand this question ... I have 34 VPSs running off of this
> > > server right now ... that 'du process' runs against each of those VPSs
> > > every night, and this problem started happening on Friday night's
> > > run ... ~18 days into uptime ... so the same process has run repeatedly,
> > > with no issues, 18 times before it hung on Friday ... also, the hang,
> > > once 'triggered', only seems to recur against the same directory ...
> > > the same directory doesn't necessarily trigger it, but once it
> > > starts, it appears to do it for the same directory ... I'm not sure if
> > > I've ever seen it happening to two different directories at the same
> > > time ...
> > >
> > > Also, please note that the du command is run from the physical
> > > server, as root ...
> > >
> > >> rick
> > >> ps: If it is still up and hasn't been rebooted, you could:
> > >>   sysctl debug.kdb.break_to_debugger=1
> > >>   - then type  at the console and do the following
> > >> from the debugger
> > >>   
> > >> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > >>   How well this work depends on what options your kernel was built
> > >>   with.
> > >
> 

Re: 9-STABLE -> NFS -> NetAPP:

2013-02-13 Thread Konstantin Belousov
On Wed, Feb 13, 2013 at 05:50:13PM -0500, Rick Macklem wrote:
> I got it resent from him. I've attached it to this post, just in case you
> are interested in taking a look at it.

I do not find the vofflock wchans surprising. All of them seem to occur
in the multithreaded process.  The usual reason for blocking on vofflock
is the use of the same file (as in struct file *) to perform operations
from several threads in parallel.  One thread locked the file offset by
calling read() or write(), and is sleeping waiting for the vnode lock.
All other threads performing read or write on the same file, e.g. by
using the same file descriptor, block on the file offset lock before
even trying to lock the vnode.
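
The lock ordering described above can be modelled with a toy state machine.
This is only a sketch under simplifying assumptions: the types vnode_model and
file_model and the function start_read() are hypothetical, there are no real
threads or sleeps, and each call just reports where a reader would end up
waiting.

```c
#include <stdbool.h>

/* Where a reader ends up after trying to start a read(). */
enum wait_state { RUNNING, WAIT_VNODE, WAIT_OFFSET };

/* Hypothetical stand-ins for a vnode and a struct file. */
struct vnode_model {
	bool locked;
};

struct file_model {
	bool offset_locked;
	struct vnode_model *vp;
};

/*
 * The file offset lock is taken first, the vnode lock second. A reader
 * that gets the offset lock but not the vnode lock keeps the offset
 * lock while waiting, so later readers of the same file wait on the
 * offset lock and never reach the vnode.
 */
enum wait_state
start_read(struct file_model *fp)
{
	if (fp->offset_locked)
		return (WAIT_OFFSET);
	fp->offset_locked = true;
	if (fp->vp->locked)
		return (WAIT_VNODE);
	fp->vp->locked = true;
	return (RUNNING);
}
```

With the vnode already locked by someone else, the first reader on a shared
file reports WAIT_VNODE and every later reader on that same file reports
WAIT_OFFSET, which is the pattern of one vnode-lock sleeper plus many vofflock
sleepers within one process.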

What I find interesting in the output you mailed is pid 93636. Note
that several of its threads are in the 'T' state, which means stopped, while
other threads are obviously doing file i/o, given their vofflock state. I
wonder if some stopped thread owns the nfs vnode lock. It could be some
omission in the handling of PBDRY/TDF_SBDRY, or another bug.

It is absolutely impossible to say anything definitive without proper
diagnostics.  At least the procstat -kk output is needed.




Re: 9-STABLE -> NFS -> NetAPP:

2013-02-15 Thread Konstantin Belousov
On Fri, Feb 15, 2013 at 08:44:43AM -0500, John Baldwin wrote:
> On Thursday, February 14, 2013 10:05:56 pm Rick Macklem wrote:
> > Marc Fournier wrote:
> > > On 2013-02-13, at 3:54 PM, Rick Macklem  wrote:
> > > 
> > > >>
> > > > The pid that is in "T" state for the "ps auxlH".
> > > 
> > > Different server, last kernel update on Jan 22nd, https process this
> > > time instead of du last time.
> > > 
> > > I've attached:
> > > 
> > > ps auxlH
> > > ps auxlH of just the processes that are in TJ state (6 httpd servers)
> > > procstat output for each of the 6 process
> > > 
> > > 
> > > 
> > > 
> > > > They are included as attachments ... if these don't make it through, let
> > > me know, just figured I'd try and keep it compact ...
> > Well, I've looked at this call path a little closer:
> > 16693 104135 httpd-mi_switch+0x186 
> thread_suspend_check+0x19f sleepq_catch_signals+0x1c5
> >   sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763 
> clnt_reconnect_call+0xfb newnfs_request+0xadb
> >   nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56 
> nfs_access+0x306 vn_open_cred+0x5a8
> >   kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7 
> > 
> > I am probably way off, since I am not familiar with this stuff, but it
> > seems to me that thread_suspend_check() should just return 0 for the
> > case where stop_allowed == SIG_STOP_NOT_ALLOWED (TDF_SBDRY flag set)
> > instead of sitting in the loop and doing a mi_switch(). I'm not even
> > sure if it should call thread_suspend_check() for this case, but there
> > are cases in thread_suspend_check() that I don't understand.
> > 
> > Although I don't really understand thread_suspend_check(), I've attached
> > a simple patch that might be a starting point for fixing this?
> > 
> > I wouldn't recommend trying the patch until kib and/or jhb weigh in
> > on whether it makes any sense.
> 
> I think this is the right idea, but in HEAD with the sigdeferstop() changes
> it should just check for TDF_SBDRY instead of adding a new parameter.  I think
> checking for TDF_SBDRY will work even in 9 (and will make the patch smaller).
> Also, I think this is only needed for stop signals.  Other suspend requests
> will eventually resume the thread, it is only stop signals that can cause the
> thread to get stuck indefinitely (since it depends on the user sending
> SIGCONT).
> 
> Marc, are you using SIGSTOP?
> 
> Index: kern_thread.c
> ===
> --- kern_thread.c (revision 246122)
> +++ kern_thread.c (working copy)
> @@ -795,6 +795,17 @@ thread_suspend_check(int return_instead)
>  			return (ERESTART);
>  
>  		/*
> +		 * Ignore suspend requests for stop signals if they
> +		 * are deferred.
> +		 */
> +		if (P_SHOULDSTOP(p) == P_STOPPED_SIG &&
> +		    td->td_flags & TDF_SBDRY) {
> +			KASSERT(return_instead,
> +			    ("TDF_SBDRY set for unsafe thread_suspend_check"));
> +			return (0);
> +		}
> +
> +		/*
>  		 * If the process is waiting for us to exit,
>  		 * this thread should just suicide.
>  		 * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE.

This looks correct.
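
For readers following along, the condition added by the quoted patch can be
written out as a standalone predicate. This is only an illustration: the enum
values and MODEL_TDF_SBDRY below are hypothetical stand-ins, not the kernel's
real constants, and ignore_suspend() is a made-up name.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel's real values differ. */
enum stop_kind { NOT_STOPPED, STOPPED_SIG, STOPPED_SINGLE };
#define	MODEL_TDF_SBDRY	0x1

/*
 * Mirrors the test in the quoted patch: the suspend request is ignored
 * only when the process is stopped by a stop signal AND the thread has
 * asked for stops to be deferred. Any other suspend reason is honoured.
 */
bool
ignore_suspend(enum stop_kind kind, int td_flags)
{
	return (kind == STOPPED_SIG && (td_flags & MODEL_TDF_SBDRY) != 0);
}
```

Note that in the patch the expression "a == b && flags & TDF_SBDRY" parses as
"(a == b) && (flags & TDF_SBDRY)", which is what the sketch spells out with
explicit parentheses.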




Re: IPMI serial console

2013-02-21 Thread Konstantin Belousov
On Thu, Feb 21, 2013 at 02:07:57PM -0800, Navdeep Parhar wrote:
> On 02/21/13 13:56, Daniel O'Connor wrote:
> > 
> > On 22/02/2013, at 2:19, John Baldwin  wrote:
> >>> Does anyone have any hints?
> >>
> >> Rather than using all these hints, just use these three in loader.conf:
> >>
> >> console="comconsole vidconsole"
> >> console_speed=115200
> >> console_port="0x"  (where  is the correct I/O port for COM3, 
> >> 0x3e8 
> >> maybe?)
> > 
> > 
> > No dice :(
> > 
> > I also tried booting with '-D -h -S 115200' but nothing either.
> 
> What does "dmesg | grep uart" show?  I have a PCI serial card whose
> serial port I'm using as a console.  I had to setup comconsole_pcidev,
> comconsole_port, and comconsole_speed properly in loader.conf to get it
> to work.

Do you need the comconsole_port if comconsole_pcidev is set properly ?
comconsole_port should be set automatically (i.e., read from the BAR)
if _pcidev is correct.




Re: IPMI serial console

2013-02-21 Thread Konstantin Belousov
On Fri, Feb 22, 2013 at 09:18:51AM +1030, Daniel O'Connor wrote:
> 
> On 22/02/2013, at 9:15, Navdeep Parhar  wrote:
> >> uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 on acpi0
> >> uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
> >> uart2: <16550 or compatible> port 0x3e8-0x3ef irq 5 flags 0x30 on acpi0
> >> 
> >> The loader talks on the serial console fine, it's the kernel that doesn't 
> >> use it which is the problem.

It might not be the serial port that the loader talks to. The Supermicro
boards I dealt with have a feature of VGA text mode redirection to the
serial port. This is how BIOS redirection usually works.

Look for a BIOS knob which controls the point at which the
said redirection stops. It should be something like 'after the OS takes
control', and not 'forever'. For the BIOS, the loader is the OS.




Re: gdb broken on 9.1/amd64?

2013-03-06 Thread Konstantin Belousov
On Wed, Mar 06, 2013 at 07:02:22PM +0100, Jeremie Le Hen wrote:
> root@ingwe:~ # gdb -p 521

Try to specify the executable binary on the command line.




Re: Core Dump / panic sleeping thread

2013-03-19 Thread Konstantin Belousov
On Tue, Mar 19, 2013 at 07:45:56PM +0200, Andriy Gapon wrote:
> on 19/03/2013 19:35 Jeremy Chadwick said the following:
> > On Tue, Mar 19, 2013 at 06:18:06PM +0100, Michael Landin Hostbaek wrote:
> [snip]
> >> Unread portion of the kernel message buffer:
> >> Sleeping thread (tid 100256, pid 85641) owns a non-sleepable lock
> >> KDB: stack backtrace of thread 100256:
> >> #0 0x808f2d46 at mi_switch+0x186
> >> #1 0x8092bb52 at sleepq_wait+0x42
> >> #2 0x808f34d6 at _sleep+0x376
> >> #3 0x80b4f3ae at vm_object_page_remove+0x2ce
> >> #4 0x80b5ac7d at vnode_pager_setsize+0x17d
> >> #5 0x8082102c at nfscl_loadattrcache+0x2cc
> >> #6 0x80818d37 at nfs_getattr+0x287
> >> #7 0x8098f1c0 at vn_stat+0xb0
> >> #8 0x809869d9 at kern_statat_vnhook+0xf9
> >> #9 0x80986b55 at kern_statat+0x15
> >> #10 0x80986c1a at sys_lstat+0x2a
> >> #11 0x80bd7ae6 at amd64_syscall+0x546
> >> #12 0x80bc3447 at Xfast_syscall+0xf7
> >> panic: sleeping thread
> >> cpuid = 0
> >> KDB: stack backtrace:
> >> #0 0x809208a6 at kdb_backtrace+0x66
> >> #1 0x808ea8be at panic+0x1ce
> >> #2 0x8092ed22 at propagate_priority+0x1d2
> >> #3 0x8092fa4e at turnstile_wait+0x1be
> >> #4 0x808d8d48 at _mtx_lock_sleep+0xd8
> >> #5 0x80820fa4 at nfscl_loadattrcache+0x244
> >> #6 0x8081758c at ncl_readrpc+0xac
> >> #7 0x80824c45 at ncl_getpages+0x485
> >> #8 0x80b5aa0c at vnode_pager_getpages+0x9c
> >> #9 0x80b3fc93 at vm_fault_hold+0x673
> >> #10 0x80b41cc3 at vm_fault+0x73
> >> #11 0x80bd84b4 at trap_pfault+0x124
> >> #12 0x80bd8c6c at trap+0x49c
> >> #13 0x80bc315f at calltrap+0x8
> [snip]
> 
> I think that the regular mutex which is acquired via NFSLOCKNODE() in
> nfscl_loadattrcache() can not be held across vnode_pager_setsize.
> I am not sure though when vap->va_size != np->n_size case is triggered.

When the file is modified on the server outside the client's control?
E.g., by direct access on the server, or from another client.

The only possible solution is to move the vnode_pager_setsize() call outside
the scope of n_mtx. This is somewhat problematic because the nfsiod
threads never bother to lock the vnode, so the truncation of the VM
cache becomes racy. Still, this is probably the best cure.
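
To illustrate the ordering, here is a minimal user-space sketch, not kernel
code. It assumes a hypothetical struct node_model in place of struct nfsnode,
models the mutex as a plain flag rather than a real lock, and uses invented
helper names (node_lock, node_unlock, pager_setsize, load_attr). The decision
is recorded while the lock is held, and the call that may sleep is made only
after the lock is dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nfsnode; not the kernel structure. */
struct node_model {
	int		locked;		/* models n_mtx being held */
	uint64_t	size;		/* models np->n_size */
	uint64_t	pager_size;	/* models the size known to the VM object */
};

/* Models NFSLOCKNODE() as a plain flag (assumption, no real mutex). */
static void
node_lock(struct node_model *np)
{
	assert(!np->locked);
	np->locked = 1;
}

/* Models NFSUNLOCKNODE(). */
static void
node_unlock(struct node_model *np)
{
	assert(np->locked);
	np->locked = 0;
}

/* Models vnode_pager_setsize(): it may sleep, so the lock must be dropped. */
static void
pager_setsize(struct node_model *np, uint64_t nsize)
{
	assert(!np->locked);
	np->pager_size = nsize;
}

/* Models the attribute load: decide under the lock, act after the unlock. */
static void
load_attr(struct node_model *np, uint64_t server_size)
{
	uint64_t nsize = 0;
	int setnsize = 0;

	node_lock(np);
	if (server_size != np->size) {
		np->size = server_size;
		nsize = server_size;	/* remember the new size */
		setnsize = 1;		/* defer the sleepable call */
	}
	node_unlock(np);
	if (setnsize)
		pager_setsize(np, nsize);
}
```

Calling load_attr() with a size that differs from the cached one updates both
fields, and the assertion in pager_setsize() would fire if the call were ever
made with the lock still held. The sketch says nothing about the nfsiod race
mentioned above.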

Another issue I see there is that the vnode_pager_setsize() call is only
performed for VREG nodes. I believe that pages can be cached for
directories as well.

Would you work out a patch?




Re: Core Dump / panic sleeping thread

2013-03-20 Thread Konstantin Belousov
On Tue, Mar 19, 2013 at 07:37:43PM -0400, Rick Macklem wrote:
> Andriy Gapon wrote:
> > on 19/03/2013 19:35 Jeremy Chadwick said the following:
> > > On Tue, Mar 19, 2013 at 06:18:06PM +0100, Michael Landin Hostbaek
> > > wrote:
> > [snip]
> > >> Unread portion of the kernel message buffer:
> > >> Sleeping thread (tid 100256, pid 85641) owns a non-sleepable lock
> > >> KDB: stack backtrace of thread 100256:
> > >> #0 0x808f2d46 at mi_switch+0x186
> > >> #1 0x8092bb52 at sleepq_wait+0x42
> > >> #2 0x808f34d6 at _sleep+0x376
> > >> #3 0x80b4f3ae at vm_object_page_remove+0x2ce
> > >> #4 0x80b5ac7d at vnode_pager_setsize+0x17d
> > >> #5 0x8082102c at nfscl_loadattrcache+0x2cc
> > >> #6 0x80818d37 at nfs_getattr+0x287
> > >> #7 0x8098f1c0 at vn_stat+0xb0
> > >> #8 0x809869d9 at kern_statat_vnhook+0xf9
> > >> #9 0x80986b55 at kern_statat+0x15
> > >> #10 0x80986c1a at sys_lstat+0x2a
> > >> #11 0x80bd7ae6 at amd64_syscall+0x546
> > >> #12 0x80bc3447 at Xfast_syscall+0xf7
> > >> panic: sleeping thread
> > >> cpuid = 0
> > >> KDB: stack backtrace:
> > >> #0 0x809208a6 at kdb_backtrace+0x66
> > >> #1 0x808ea8be at panic+0x1ce
> > >> #2 0x8092ed22 at propagate_priority+0x1d2
> > >> #3 0x8092fa4e at turnstile_wait+0x1be
> > >> #4 0x808d8d48 at _mtx_lock_sleep+0xd8
> > >> #5 0x80820fa4 at nfscl_loadattrcache+0x244
> > >> #6 0x8081758c at ncl_readrpc+0xac
> > >> #7 0x80824c45 at ncl_getpages+0x485
> > >> #8 0x80b5aa0c at vnode_pager_getpages+0x9c
> > >> #9 0x80b3fc93 at vm_fault_hold+0x673
> > >> #10 0x80b41cc3 at vm_fault+0x73
> > >> #11 0x80bd84b4 at trap_pfault+0x124
> > >> #12 0x80bd8c6c at trap+0x49c
> > >> #13 0x80bc315f at calltrap+0x8
> > [snip]
> > 
> > I think that the regular mutex which is acquired via NFSLOCKNODE() in
> > nfscl_loadattrcache() can not be held across vnode_pager_setsize.
> > I am not sure though when vap->va_size != np->n_size case is
> > triggered.
> > 
> Yep, I'd agree to that. The same bug is in the old NFS client and
> the new NFS client cribbed the code from there.
> 
> I have attached a simple patch that unlocks the mutex for the
> vnode_pager_setsize() call. Maybe you could test it?
> 
> Thanks for reporting this, rick
> ps: Hopefully "patch" can apply this patch (there have been
> recent changes to this file, so the line#s could be off).
> It should be easy to do manually if not. The change is
> in nfscl_loadattrcache() in sys/fs/nfsclient/nfs_clport.c.
> 
> 
> > > You're going to need to provide the following details:
> > >
> > > 1. Contents of /etc/rc.conf
> > > 2. Contents of /etc/sysctl.conf (if modified)
> > > 3. Contents of /etc/fstab
> > > 4. ifconfig -a
> > > 5. OS used by the NFS server, and all configuration details
> > > pertaining
> > > to that system
> > >
> > > You may also be asked to upgrade to 9.1-STABLE, as there may be
> > > fixes
> > > for whatever this is in base/stable/9 that are not in -RELEASE, but
> > > this
> > > is speculative on my part.
> > >
> > I do not see a need for any of these.
> > 
> > --
> > Andriy Gapon

> --- fs/nfsclient/nfs_clport.c.savit   2013-03-19 18:37:33.0 -0400
> +++ fs/nfsclient/nfs_clport.c 2013-03-19 18:44:21.0 -0400
> @@ -444,7 +444,9 @@ nfscl_loadattrcache(struct vnode **vpp, 
>   np->n_size = vap->va_size;
>   np->n_flag |= NSIZECHANGED;
>   }
> + NFSUNLOCKNODE(np);
>   vnode_pager_setsize(vp, np->n_size);
> + NFSLOCKNODE(np);
>   } else {
>   np->n_size = vap->va_size;
>   }

I do not like it. As I said in my previous response to Andriy,
I think that moving the vnode_pager_setsize() call after the unlock is
better, since it reduces races with other threads seeing a half-done
attribute update or changing attributes simultaneously.




Re: Core Dump / panic sleeping thread

2013-03-20 Thread Konstantin Belousov
On Wed, Mar 20, 2013 at 12:13:05PM +0100, Michael Landin Hostbaek wrote:
> 
> On Mar 20, 2013, at 10:49 AM, Konstantin Belousov  wrote:
> > 
> > I do not like it. As I said in the previous response to Andrey,
> > I think that moving the vnode_pager_setsize() after the unlock is
> > better, since it reduces races with other thread seeing half-done
> > attribute update or making attribute change simultaneously.
> 
> OK - so should I wait for another patch - or? 

I think the following is what I mean. As an additional note, why does the
NFS client not trim the buffers when the server reports a node size change?

diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
index a07a67f..4fe2e35 100644
--- a/sys/fs/nfsclient/nfs_clport.c
+++ b/sys/fs/nfsclient/nfs_clport.c
@@ -361,6 +361,8 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	struct nfsnode *np;
 	struct nfsmount *nmp;
 	struct timespec mtime_save;
+	u_quad_t nsize;
+	int setnsize;
 
 	/*
 	 * If v_type == VNON it is a new node, so fill in the v_type,
@@ -418,6 +420,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	} else
 		vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
 	np->n_attrstamp = time_second;
+	setnsize = 0;
 	if (vap->va_size != np->n_size) {
 		if (vap->va_type == VREG) {
 			if (dontshrink && vap->va_size < np->n_size) {
@@ -444,10 +447,13 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 				np->n_size = vap->va_size;
 				np->n_flag |= NSIZECHANGED;
 			}
-			vnode_pager_setsize(vp, np->n_size);
 		} else {
 			np->n_size = vap->va_size;
 		}
+		if (vap->va_type == VREG || vap->va_type == VDIR) {
+			setnsize = 1;
+			nsize = vap->va_size;
+		}
 	}
 	/*
 	 * The following checks are added to prevent a race between (say)
@@ -480,6 +486,8 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	KDTRACE_NFS_ATTRCACHE_LOAD_DONE(vp, vap, 0);
 #endif
 	NFSUNLOCKNODE(np);
+	if (setnsize)
+		vnode_pager_setsize(vp, nsize);
 	return (0);
 }
 




Re: Core Dump / panic sleeping thread

2013-03-20 Thread Konstantin Belousov
On Wed, Mar 20, 2013 at 11:37:56AM -0400, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Wed, Mar 20, 2013 at 12:13:05PM +0100, Michael Landin Hostbaek
> > wrote:
> > >
> > > On Mar 20, 2013, at 10:49 AM, Konstantin Belousov
> > >  wrote:
> > > >
> > > > I do not like it. As I said in the previous response to Andrey,
> > > > I think that moving the vnode_pager_setsize() after the unlock is
> > > > better, since it reduces races with other thread seeing half-done
> > > > attribute update or making attribute change simultaneously.
> > >
> > > OK - so should I wait for another patch - or?
> > 
> > I think the following is what I mean. As an additional note, why nfs
> > client does not trim the buffers when server reported node size change
> > ?
> > 
> > diff --git a/sys/fs/nfsclient/nfs_clport.c
> > b/sys/fs/nfsclient/nfs_clport.c
> > index a07a67f..4fe2e35 100644
> > --- a/sys/fs/nfsclient/nfs_clport.c
> > +++ b/sys/fs/nfsclient/nfs_clport.c
> > @@ -361,6 +361,8 @@ nfscl_loadattrcache(struct vnode **vpp, struct
> > nfsvattr *nap, void *nvaper,
> > struct nfsnode *np;
> > struct nfsmount *nmp;
> > struct timespec mtime_save;
> > + u_quad_t nsize;
> > + int setnsize;
> > 
> > /*
> > * If v_type == VNON it is a new node, so fill in the v_type,
> > @@ -418,6 +420,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct
> > nfsvattr *nap, void *nvaper,
> > } else
> > vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
> > np->n_attrstamp = time_second;
> > + setnsize = 0;
> > if (vap->va_size != np->n_size) {
> > if (vap->va_type == VREG) {
> > if (dontshrink && vap->va_size < np->n_size) {
> > @@ -444,10 +447,13 @@ nfscl_loadattrcache(struct vnode **vpp, struct
> > nfsvattr *nap, void *nvaper,
> > np->n_size = vap->va_size;
> > np->n_flag |= NSIZECHANGED;
> > }
> > - vnode_pager_setsize(vp, np->n_size);
> > } else {
> > np->n_size = vap->va_size;
> > }
> > + if (vap->va_type == VREG || vap->va_type == VDIR) {
> > + setnsize = 1;
> > + nsize = vap->va_size;
> I might have used np->n_size here, since that is what is given
> as the argument for the pre-patched version, but since
> np->n_size should equal vap->va_size (it is set the same for
> all cases in the code at this point), it doesn't really matter.
> 
> I have no idea what the implications of doing vnode_pager_setsize()
> for VDIR is, but Kostik would be much more conversant that I on this,
> so if he thinks it's ok, that's fine with me.
> 
> > + }
> > }
> > /*
> > * The following checks are added to prevent a race between (say)
> > @@ -480,6 +486,8 @@ nfscl_loadattrcache(struct vnode **vpp, struct
> > nfsvattr *nap, void *nvaper,
> > KDTRACE_NFS_ATTRCACHE_LOAD_DONE(vp, vap, 0);
> > #endif
> > NFSUNLOCKNODE(np);
> > + if (setnsize)
> > + vnode_pager_setsize(vp, nsize);
> > return (0);
> > }
> Yes, I think Kostik's version of the patch is good. I had thought
> of doing it that way, but want for the "minimal change" version.
> I agree that avoiding unlocking/relocking the mutex is a good idea,
> although I didn't see anything after the relock that I thought
> might be an issue if something changed while unlocked.
If parallel calls to nfscl_loadattrcache() are possible, then
IMHO at least n_attrstamp could be cleared needlessly.

> 
> Kostik, thanks for posting this version, rick
> ps: Michael, I'd suggest you try this patch instead of mine.
Still, my patch has the same issue I noted for head: the buffers
are not destroyed if the size of the vnode is decreased. I would be
inclined to suggest the following change on top of my patch, but I am
fairly sure that it does not work, since the vnode is generally not
locked in nfscl_loadattrcache():

diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
index 4fe2e35..3a08424 100644
--- a/sys/fs/nfsclient/nfs_clport.c
+++ b/sys/fs/nfsclient/nfs_clport.c
@@ -487,7 +487,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 #endif
 	NFSUNLOCKNODE(np);
 	if (setnsize)
-		vnode_pager_setsize(vp, nsize);
+		vtruncbuf(vp, NOCRED, nsize, vp->v_bufobj.bo_bsize);
 	return (0);
 }
 




Re: Core Dump / panic sleeping thread

2013-03-20 Thread Konstantin Belousov
On Wed, Mar 20, 2013 at 09:43:20AM -0400, John Baldwin wrote:
> On Wednesday, March 20, 2013 9:22:22 am Konstantin Belousov wrote:
> > On Wed, Mar 20, 2013 at 12:13:05PM +0100, Michael Landin Hostbaek wrote:
> > > 
> > > On Mar 20, 2013, at 10:49 AM, Konstantin Belousov  
> wrote:
> > > > 
> > > > I do not like it. As I said in the previous response to Andrey,
> > > > I think that moving the vnode_pager_setsize() after the unlock is
> > > > better, since it reduces races with other thread seeing half-done
> > > > attribute update or making attribute change simultaneously.
> > > 
> > > OK - so should I wait for another patch - or? 
> > 
> > I think the following is what I mean. As an additional note, why nfs
> > client does not trim the buffers when server reported node size change ?
> 
> Will changing the size always result in an mtime change forcing the client to
> throw away the data on the next read or fault anyway (or does it only affect
> ctime)?

UFS only modifies ctime on truncation, it seems.




Re: Core Dump / panic sleeping thread

2013-03-20 Thread Konstantin Belousov
On Wed, Mar 20, 2013 at 08:58:08PM +0200, Konstantin Belousov wrote:
> On Wed, Mar 20, 2013 at 09:43:20AM -0400, John Baldwin wrote:
> > On Wednesday, March 20, 2013 9:22:22 am Konstantin Belousov wrote:
> > > On Wed, Mar 20, 2013 at 12:13:05PM +0100, Michael Landin Hostbaek wrote:
> > > > 
> > > > On Mar 20, 2013, at 10:49 AM, Konstantin Belousov  
> > wrote:
> > > > > 
> > > > > I do not like it. As I said in the previous response to Andrey,
> > > > > I think that moving the vnode_pager_setsize() after the unlock is
> > > > > better, since it reduces races with other thread seeing half-done
> > > > > attribute update or making attribute change simultaneously.
> > > > 
> > > > OK - so should I wait for another patch - or? 
> > > 
> > > I think the following is what I mean. As an additional note, why nfs
> > > client does not trim the buffers when server reported node size change ?
> > 
> > Will changing the size always result in an mtime change forcing the client 
> > to
> > throw away the data on the next read or fault anyway (or does it only affect
> > ctime)?
> 
> UFS only modifies ctime on truncation, it seems.

No, I was wrong. ffs_truncate() sets both the IN_CHANGE and IN_UPDATE
flags on the inode, and IN_UPDATE causes an mtime update in ufs_itimes(),
called from UFS_UPDATE().
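
A simplified user-space model of that flag handling, as a sketch only. The
flag names follow UFS, but the numeric values, struct inode_model, and the
two *_model functions are assumptions made for illustration and are not the
kernel's definitions.

```c
/* Flag names follow UFS; the numeric values here are illustrative only. */
#define IN_ACCESS	0x1	/* access time update requested */
#define IN_CHANGE	0x2	/* inode change time update requested */
#define IN_UPDATE	0x4	/* modification time update requested */

/* Hypothetical stand-in for the in-core inode. */
struct inode_model {
	int	i_flag;
	long	i_atime;
	long	i_mtime;
	long	i_ctime;
};

/* Models ufs_itimes(): turn pending flags into timestamp updates. */
static void
itimes_model(struct inode_model *ip, long now)
{
	if (ip->i_flag & IN_ACCESS)
		ip->i_atime = now;
	if (ip->i_flag & IN_UPDATE)
		ip->i_mtime = now;
	if (ip->i_flag & IN_CHANGE)
		ip->i_ctime = now;
	ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
}

/* Models the tail of a truncation: mark both flags, then apply them. */
static void
truncate_model(struct inode_model *ip, long now)
{
	ip->i_flag |= IN_CHANGE | IN_UPDATE;
	itimes_model(ip, now);
}
```

In this model a truncation moves both mtime and ctime forward and leaves
atime alone, which is why a client that watches mtime would notice it.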





Re: Core Dump / panic sleeping thread

2013-03-21 Thread Konstantin Belousov
On Wed, Mar 20, 2013 at 09:14:37PM -0400, Rick Macklem wrote:
> Well, read/write sharing of files over NFS is pretty rare, so I suspect
> a truncation of a file by another client (or locally in the NFS server)
> is a rare event. As such, not invalidating the buffers here doesn't seem
> like a big issue? (The client uses np->n_size to determine EOF.)
> 
> Also, I think close-to-open consistency will typically throw away the
> buffers on the next open when it sees the mtime changed. (Yes, there
> won't necessarily be another open, but...)
NFS buffers are VMIO buffers, and each VMIO buffer wires the pages it
references. Wired pages cannot be freed by vnode_pager_setsize() when the
file is truncated.
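
A toy model of the consequence, as a sketch only. struct page_model and
trim_pages() are hypothetical names, and the model ignores everything the
kernel does besides the wired check (for example, any invalidation of page
contents).

```c
#include <stddef.h>

/* Hypothetical stand-in for a cached page of the file. */
struct page_model {
	int	wire_count;	/* > 0 while some buffer references the page */
	int	resident;	/* 1 while the page is still cached */
};

/*
 * Models trimming the page cache after a truncation: every page at index
 * first_gone or later should go, but wired pages have to be skipped.
 * Returns the number of stale pages that could not be freed.
 */
static size_t
trim_pages(struct page_model *pages, size_t npages, size_t first_gone)
{
	size_t i, left;

	left = 0;
	for (i = first_gone; i < npages; i++) {
		if (!pages[i].resident)
			continue;
		if (pages[i].wire_count > 0) {
			left++;		/* still wired by a buffer */
			continue;
		}
		pages[i].resident = 0;	/* unwired, so it can be freed */
	}
	return (left);
}
```

A non-zero return value corresponds to the case described above: stale pages
past the new end of file remain cached because a buffer still wires them.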



