Re: sysctl add macros

2013-11-25 Thread Matthew Fleming
On Mon, Nov 25, 2013 at 3:35 AM, Venkata Duvvuru 
venkatkumar.duvv...@emulex.com wrote:

 Hi,
 I'm unable to figure out how to add an unsigned short or an unsigned char
 values to a sysctl node.
 SYSCTL_ADD_INT, SYSCTL_ADD_UINT, etc., are present but to add a char or a
 short values I couldn't find any macros.

 Could you please let me know how to add them?


FreeBSD does not have any sysctl handlers for 1 or 2 byte values.  FreeBSD
code that wants to do this has used int or u_int instead of a smaller type.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: RFC: support for first boot rc.d scripts

2013-10-15 Thread Matthew Fleming
On Sun, Oct 13, 2013 at 3:58 PM, Colin Percival cperc...@freebsd.orgwrote:

 Hi all,

 I've attached a very simple patch which makes /etc/rc:

 1. Skip any rc.d scripts with the firstboot keyword if /var/db/firstboot
 does not exist,

 2. If /var/db/firstboot and /var/db/firstboot-reboot exist after running
 rc.d
 scripts, reboot.

 3. Delete /var/db/firstboot (and firstboot-reboot) after the first boot.


We use something like this at work.  However, our version creates a file
after the firstboot scripts have run, and doesn't run if the file exists.

Is there a reason to prefer one choice over the other?  Naively I'd expect
it to be better to run when the file doesn't exist, creating when done; it
solves the problem of making sure the magic file exists before first boot,
for the other polarity.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: lock order reversals on 10.0-ALPHA4

2013-10-06 Thread Matthew Fleming
On Sun, Oct 6, 2013 at 6:44 AM, Julian H. Stacey j...@berklix.com wrote:

 With:
 FreeBSD lapr.js.berklix.net 10.0-ALPHA4 FreeBSD 10.0-ALPHA4 #0
 r255933: Sun Sep 29 02:50:54 UTC 2013
 r...@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

 At boot dmesg shows several lock order reversals, eg

 --
 Trying to mount root from ufs:/dev/ad4s2a [rw]...
 ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny,
 logging disabled
 lock order reversal:
  1st 0xfe00a6e26b48 bufwait (bufwait) @
 /usr/src/sys/kern/vfs_bio.c:3059
  2nd 0xf80005cf8400 dirhash (dirhash) @
 /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
 0xfe00c8f3d3f0
 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe00c8f3d4a0
 witness_checkorder() at witness_checkorder+0xd23/frame 0xfe00c8f3d530
 _sx_xlock() at _sx_xlock+0x75/frame 0xfe00c8f3d570
 ufsdirhash_add() at ufsdirhash_add+0x3b/frame 0xfe00c8f3d5b0
 ufs_direnter() at ufs_direnter+0x688/frame 0xfe00c8f3d670
 ufs_makeinode() at ufs_makeinode+0x573/frame 0xfe00c8f3d830
 ufs_symlink() at ufs_symlink+0x32/frame 0xfe00c8f3d880
 VOP_SYMLINK_APV() at VOP_SYMLINK_APV+0xf0/frame 0xfe00c8f3d8b0
 kern_symlinkat() at kern_symlinkat+0x23e/frame 0xfe00c8f3dae0
 amd64_syscall() at amd64_syscall+0x265/frame 0xfe00c8f3dbf0
 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe00c8f3dbf0
 --- syscall (57, FreeBSD ELF64, sys_symlink), rip = 0x800888ffa, rsp =
 0x7fffca58, rbp = 0x7fffdc10 ---
 lock order reversal:
  1st 0xf800881b9d50 ufs (ufs) @ /usr/src/sys/kern/vfs_mount.c:851
  2nd 0xf800881b99a0 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2099
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
 0xfe00c8ed93d0
 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe00c8ed9480
 witness_checkorder() at witness_checkorder+0xd23/frame 0xfe00c8ed9510
 __lockmgr_args() at __lockmgr_args+0x6f2/frame 0xfe00c8ed9640
 vop_stdlock() at vop_stdlock+0x3c/frame 0xfe00c8ed9660
 VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xfe00c8ed9690
 _vn_lock() at _vn_lock+0xab/frame 0xfe00c8ed9700
 vget() at vget+0x70/frame 0xfe00c8ed9750
 devfs_allocv() at devfs_allocv+0xfd/frame 0xfe00c8ed97a0
 devfs_root() at devfs_root+0x43/frame 0xfe00c8ed97d0
 vfs_donmount() at vfs_donmount+0x115e/frame 0xfe00c8ed9aa0
 sys_nmount() at sys_nmount+0x72/frame 0xfe00c8ed9ae0
 amd64_syscall() at amd64_syscall+0x265/frame 0xfe00c8ed9bf0
 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe00c8ed9bf0
 --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x800a9dd7a, rsp =
 0x7fffccb8, rbp = 0x7fffd220 ---
 

 Those two LORs are well-known and at least the fist is definitely a false
positive.  They're rather tricky to fix; there's been previous discussion.

The first one is #261 at http://sources.zabbadoz.net/freebsd/lor.html .
 The second one is probably #280.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: -ffunction-sections, -fdata-sections and -Wl,--gc-sections

2013-09-15 Thread Matthew Fleming
On Sun, Sep 15, 2013 at 2:24 PM, Ed Schouten e...@80386.nl wrote:

 Hi all,

 Today I did a tiny experiment and I am not entirely sure what to do.
 Throw away the patch or eventually push it into the tree.

 GCC and Clang support the -ffunction-sections and -fdata-sections
 flags. Essentially, these flags force the compiler to put every
 function and variable in its own section. Though this will blow up the
 size of the .o files a lot, it has a nice advantage. In combination
 with ld's --gc-sections flag, it can garbage collect functions and
 variables during the linking phase more accurately, thereby reducing
 the size of the resulting binary. This will of course not work for
 binaries where you want to use dlsym(), so this would only be safely
 usable in cases where you want to do static linking.

 I've written a tiny patch for share/mk, crunchgen and /rescue, to set
 these flags, only for .o files that are used for static linking (e.g.
 for .a files) and when linking statically linked executables.

 http://80386.nl/pub/gc-sections.txt

 The results are interesting. On amd64:

 - devd suddenly becomes 500 KB in size, instead of a megabyte,
 - init's size drops from 900 KB to 600 KB,
 - clang becomes a megabyte smaller.

 There is a downside, though. The total size of the system becomes 8 MB
 larger, because the .a files in /usr/lib are a bit bigger than before.
 Still, I think that this can be justified:

 - Systems short on disk-space are typically not used for software
 development. As /usr/include and /usr/lib/*.a together already account
 for ~30% of the disk space used, these files will likely be removed on
 a disk-space constrained system anyway.


I can confirm that at least one vendor who is constrained on the size of
the root partition does not install /usr/include or any *.a files.  On
shipping hardware.  We do use the hardware for developer builds where we do
need include files (though not .a files as we only use shared libraries).


 - init and devd are so fundamental that they will likely be installed
 on basically any embedded FreeBSD system. Reducing the size of these
 binaries makes sense.
 - It could also reduce the size of statically linked binaries outside
 of the base system, depending on base libraries.

 What I also like about this, is that at least for the base system, it
 will become less important to spread out functions in libraries over
 separate source files, in an attempt to reduce the size of the
 resulting binary. We may now leave functions that are related to each
 other in the same source file, thus improving readability.


Would it be possible to enable this only for devd, init, and clang
binaries?  Or is it a matter of enabling it for library builds that are
linked statically with the mentioned binaries?  Could init/devd be made
smaller by finding out which functions they do/don't use and separating
those into separate .c files?

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: GCC withdraw

2013-08-30 Thread Matthew Fleming
On Fri, Aug 30, 2013 at 6:47 AM, Ian Lepore i...@freebsd.org wrote:

 On Fri, 2013-08-30 at 07:39 -0600, Warner Losh wrote:
  I had a long, rambling reply to this that corrected many of the factual
 errors made in it. But why bother. You have your world view, it doesn't
 match what people are doing today and this mismatch is going to cause
 people pain and suffering in the embedded world far beyond what you think.
 And you've shown an extreme reluctance to accept that your world view isn't
 quite right, and listen to people. This makes me sad, but I recognize a
 lost cause when I see it.
 
  Do whatever the fuck you want, but it won't make your arguments right.
 And it won't keep me from saying I told you so when your optimistic
 timelines don't come to fruition, or the people processors you dismiss as
 being too weak to run a full FreeBSD (despite the fact they are doing it
 today) complain about the needless pain they are going through.
 
  Warner

 Actually, I have to put a +1 on this.  I also had a long reply full of
 reality-based refutations of various facts from this thread, and I
 also just deleted it because clearly the discussion has become
 pointless.


I don't really have any skin in this game; the vendor I work for uses x86
hardware, and we're not ready to be running on FreeBSD 10 yet.  But as an
old guy I really don't see why we'd change the plan of record so late.
 Nor do I think prioritizing ports over the base system on alternate
architectures is the right play -- there's a lot of vendors who use FreeBSD
on !x86.  And there's a lot of vendors who don't use very many ports.

And there's a lot of vendors putting money into the FreeBSD foundation, and
into the hands of FreeBSD committers, to make it better.  Vendors who,
while it would be painful to switch, do have a choice of which OS to build
their product around.

Just my 2c.  Actual value may differ.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Build error on CURRENT

2012-07-08 Thread Matthew Fleming
I'm getting a compile error during buildworld here:



=== libexec/atrun (all)
cc -g  -DATJOB_DIR=\/var/at/jobs/\
-DLFILE=\/var/at/jobs/.lockfile\  -DLOADAVG_MX=1.5
-DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\ -DDAEMON_UID=1
-DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'  -DDEFAULT_AT_QUEUE=\'c\'
-DPERM_PATH=\/var/at/\
-I/usr/data/sb/head/libexec/atrun/../../usr.bin/at
-I/usr/data/sb/head/libexec/atrun -DLOGIN_CAP -DPAM -std=gnu99
-fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k
-Wno-uninitialized -Wno-pointer-sign -c
/usr/data/sb/head/libexec/atrun/atrun.c
${CTFCONVERT_CMD} expands to empty string
cc -g  -DATJOB_DIR=\/var/at/jobs/\
-DLFILE=\/var/at/jobs/.lockfile\  -DLOADAVG_MX=1.5
-DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\ -DDAEMON_UID=1
-DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'  -DDEFAULT_AT_QUEUE=\'c\'
-DPERM_PATH=\/var/at/\
-I/usr/data/sb/head/libexec/atrun/../../usr.bin/at
-I/usr/data/sb/head/libexec/atrun -DLOGIN_CAP -DPAM -std=gnu99
-fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k
-Wno-uninitialized -Wno-pointer-sign -c
/usr/data/sb/head/libexec/atrun/gloadavg.c
${CTFCONVERT_CMD} expands to empty string
cc -g  -DATJOB_DIR=\/var/at/jobs/\
-DLFILE=\/var/at/jobs/.lockfile\  -DLOADAVG_MX=1.5
-DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\ -DDAEMON_UID=1
-DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'  -DDEFAULT_AT_QUEUE=\'c\'
-DPERM_PATH=\/var/at/\
-I/usr/data/sb/head/libexec/atrun/../../usr.bin/at
-I/usr/data/sb/head/libexec/atrun -DLOGIN_CAP -DPAM -std=gnu99
-fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k
-Wno-uninitialized -Wno-pointer-sign  -o atrun atrun.o gloadavg.o
-lpam -lutil
/usr/obj/usr/data/sb/head/tmp/usr/lib/libc.so: undefined reference to `log'
*** [atrun] Error code 1

Stop in /usr/data/sb/head/libexec/atrun.
*** [all] Error code 1

Stop in /usr/data/sb/head/libexec.
*** [libexec.all__D] Error code 1

Stop in /usr/data/sb/head.
*** [everything] Error code 1

Stop in /usr/data/sb/head.
*** [buildworld] Error code 1


At first I thought maybe it was because of an old install (CURRENT as
of December 2011) so I built and installed stable/9.  That went off
just fine.

I thought it may be because my svn checkout of CURRENT got
interrupted, so I re-extracted from scratch and see the same thing.

I've also tried completely removing /usr/obj.  I've build with no -j flag.

Looking at the sources I don't see the symbol log in libc anywhere
except as a DEBUG local variable that is only used in one file, always
under DEBUG.  The only thing I see that's of interested is the
libexec.all__D target which I haven't yet grepped for to see what it
is.

My make.conf and src.conf are pretty sparse:

root@:/data/sb/head # cat /etc/make.conf
#LIBC_EXTRAMK=/usr/data/sb/ino64/tools/tools/shlib-compat/Makefile.sysfake
CFLAGS=-g

# added by use.perl 2011-09-17 07:04:17
PERL_VERSION=5.12.4

root@:/data/sb/head # cat /etc/src.conf
#WITHOUT_CLANG=yes

Clearly the tinderbox machines aren't hitting this, and I didn't see
anything in the mailing list either.  So it's almost certainly
something on my end, but I'm at a loss as to what it could be.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Adding bool, true, and false to the kernel

2011-12-08 Thread Matthew Fleming
Hello -current!

Recently on -arch@, we discussed adding the C99 keywords bool, true,
and false to the kernel.  I now have patches to do this as well as fix
up some build issues.

The original thread was here:

http://lists.freebsd.org/pipermail/freebsd-arch/2011-November/011937.html

I split the patches in three:

http://people.freebsd.org/~mdf/0001-e1000-ixgbe-fix-code-to-not-define-bool-true-false-w.patch

fixes up the e1000 and ixgbe code.  Jack, can you please let me know
if there are any issues.  This should work to build whether or not
sys/types.h has the new defines; I am testing make universe right now.

http://people.freebsd.org/~mdf/0002-Fix-code-to-not-define-bool-true-false-when-already-.patch

fixes the other code in the sys/ directory that gives build conflicts.
 Since I wasn't sure of the origin of the drivers, I conservatively
left all their defines alone to allow the same driver code to build on
both CURRENT and 9.0.  If some of the drivers are wholly-owned by
FreeBSD (unlike e1000 and ixgbe) and are not expected to be built on
older releases, then the use of the old defines can be removed.  If
anyone knows the provenance of these files, please advise.

http://people.freebsd.org/~mdf/0003-Define-bool-true-and-false-in-types.h-for-_KERNEL-co.patch

actually defines bool, true, and false, and adds some extra paranoia
in stdbool.h for anyone who has hacked their local build system and
project repo to include stdbool.h in a kernel file.  I also bumped
__FreeBSD_version, though this is probably not necessary since
__bool_true_false_are_defined is a better check than the
__FreeBSD_version.

This code should be MFC-able to stable/9 after 9.0 is released.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [TESTING]: one more boot2 shrinking patch

2011-03-10 Thread Matthew Fleming
On Thu, Mar 10, 2011 at 8:37 AM, Roman Divacky rdiva...@freebsd.org wrote:
 On Thu, Mar 10, 2011 at 09:20:58AM -0500, John Baldwin wrote:
 On Wednesday, March 09, 2011 6:24:36 pm Dimitry Andric wrote:
  On 2011-03-09 14:23, John Baldwin wrote:
   gcc nor clang emits any code to initialize static type foo = 0;
   because it's expected that BSS is zeroed, which is not the case
   in boot2 so we have to initialize that explicitly
   It used to be that if you explicitly initialized a variable to 0, it was
   initialized to 0 in .data, but now gcc and clang recognize it is set to 
   0 and
   move it to .bss.  There appears to be no way to turn this feature off,
 
  Yes, there is; both gcc and clang have this option to turn it off:
 
  -fno-zero-initialized-in-bss
       If the target supports a BSS section, GCC by default puts variables
       that are initialized to zero into BSS. This can save space in the
       resulting code.
 
       This option turns off this behavior because some programs
       explicitly rely on variables going to the data section. E.g., so
       that the resulting executable can find the beginning of that
       section and/or make assumptions based on that.
 
       The default is -fzero-initialized-in-bss.

 Hah, that is better then.  Thanks! I should have searched about this more
 myself. :(  Roman, can you try reverting the kname changes and adding this
 to CFLAGS instead for both compilers?

 when I put -fno-zero-initialized-in-bss clang does not fit by 1.7K and
 gcc by 0.5K, we dont want this :)

If there's that many variables explicitly initialized to zero, does
boot2 then need to explicitly bzero all of bss, not just kname, since
it's not handled by the loader?  Otherwise it sounds like either
there's a lot of explicitly initialized variables that didn't need to
be, or there's a lot of potential for uninitialized variable nonsense
to happen.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: acpi_resource bug?

2011-02-14 Thread Matthew Fleming
On Mon, Feb 14, 2011 at 6:24 AM, John Baldwin j...@freebsd.org wrote:
 On Sunday, February 13, 2011 2:46:07 pm Matthew Fleming wrote:
 I'm not very familiar with the acpi code, but we have seen an
 intermittent issue on boot:

 1) should the length of the bcopy() be changed to either respect
 res-Length or the actual length of the ACPI_RESOURCE_DATA for the
 type?

 It should just use res-Length:

Is there a guarantee that res-Length is = sizeof(ACPI_RESOURCE) ?

Thanks,
matthew


 Index: acpi_resource.c
 ===
 --- acpi_resource.c     (revision 218554)
 +++ acpi_resource.c     (working copy)
 @@ -82,7 +82,7 @@ acpi_lookup_irq_handler(ACPI_RESOURCE *res, void *
        req-found = 1;
        KASSERT(irq == rman_get_start(req-res),
            (IRQ resources do not match));
 -       bcopy(res, req-acpi_res, sizeof(ACPI_RESOURCE));
 +       bcopy(res, req-acpi_res, res-Length);
        return (AE_CTRL_TERMINATE);
     }
     return (AE_OK);

 --
 John Baldwin

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: acpi_resource bug?

2011-02-14 Thread Matthew Fleming
On Mon, Feb 14, 2011 at 10:37 AM, John Baldwin j...@freebsd.org wrote:
 On Monday, February 14, 2011 1:30:18 pm Jung-uk Kim wrote:
 On Monday 14 February 2011 10:29 am, Matthew Fleming wrote:
  On Mon, Feb 14, 2011 at 6:24 AM, John Baldwin j...@freebsd.org
 wrote:
   On Sunday, February 13, 2011 2:46:07 pm Matthew Fleming wrote:
   I'm not very familiar with the acpi code, but we have seen an
   intermittent issue on boot:
  
   1) should the length of the bcopy() be changed to either respect
   res-Length or the actual length of the ACPI_RESOURCE_DATA for
   the type?
  
   It should just use res-Length:
 
  Is there a guarantee that res-Length is = sizeof(ACPI_RESOURCE) ?

 No.  Please try the attached patch (after your r218685).

 I think your patch is correct, but are you saying that ACPICA will return a
 resource with a size that doesn't match its type?

 ACPI_RESOURCE_DATA is a union of all the various resource types, and it does
 contain both ACPI_RESOURCE_IRQ and ACPI_RESOURCE_EXTENDED_IRQ, so it's hard
 to see how res-Length would be greater than the size of ACPI_RESOURCE.

Jung-uk Kim's patch makes acpi_resource match the other bcopy's in the
acpica directory, and in the case of what I saw it would bcopy 8+5
bytes instead of res-Length == 16.

My concern was that res-Length seemed primarily to be an offset from
the current resource to the next one, and I didn't see why that may
not include a lot of padding (including more than the target of the
bcopy was prepared for).  However, my code will also copy bytes we
don't care about any time res-Length is rounded up from the actual
struct size.

The patch looks fine to me.  I don't have direct access to the machine
that was intermittently crashing so I can't really try the new patch,
but my change resolved the issue on it.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


acpi_resource bug?

2011-02-13 Thread Matthew Fleming
I'm not very familiar with the acpi code, but we have seen an
intermittent issue on boot:

Panic occurred in module kernel loaded at 0x8010:

Stack: --
kernel:trap_fatal+0xac
kernel:trap_pfault+0x24c
kernel:trap+0x42e
kernel:bcopy+0x16
kernel:AcpiWalkResources+0xdf
kernel:acpi_lookup_irq_resource+0x9e
kernel:acpi_alloc_resource+0x249
kernel:bus_alloc_resource+0x97
kernel:sioattach+0x446
kernel:device_attach+0x63
kernel:bus_generic_attach+0x27
kernel:acpi_probe_children+0x50
kernel:acpi_attach+0x836
kernel:device_attach+0x63
kernel:bus_generic_attach+0x27
kernel:nexus_attach+0x25
kernel:device_attach+0x63
kernel:root_bus_configure+0x2d
kernel:configure+0x1a
kernel:mi_startup+0x64
--
cpuid = 0; apic id = 00
fault virtual address   = 0xff8003abe000
fault code  = supervisor read data, page not present

acpi_lookup_irq_handler() is trying to bcopy an entire ACPI_RESOURCE
(68 bytes) from the input pointer, even though in this case the
resource was a ACPI_RESOURCE_TYPE_IRQ (5 bytes), and the loop in
AcpiWalkResourcessaw is seeing res-Length == 0x10.

In this case, I found the following resouces on the list:

(gdb) x/2wx  0xff8003abdfb0
0xff8003abdfb0: 0x0004  0x0010
(gdb) x/2wx  0xff8003abdfc0
0xff8003abdfc0: 0x0004  0x0010
(gdb) x/2wx  0xff8003abdfd0
0xff8003abdfd0: 0x  0x0010
(gdb) x/2wx  0xff8003abdfe0
0xff8003abdfe0: 0x0001  0x0010
(gdb) x/2wx  0xff8003abdff0
0xff8003abdff0: 0x0007  0x0010

So copying 68 bytes from 0xff8003abdfd0 will always fault.

What I wonder is the following:

1) should the length of the bcopy() be changed to either respect
res-Length or the actual length of the ACPI_RESOURCE_DATA for the
type?

2) why would there be no memory mapped at the next virtual page on
some boots, but not others?  I *think* that a reboot doesn't clear the
issue, but booting into a different kernel with no relevant changes
will change whether the panic on boot is hit.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: amd64 build fails within ESXi guest

2011-01-24 Thread Matthew Fleming
On Mon, Jan 24, 2011 at 8:25 AM, Eric Crist ecr...@secure-computing.net wrote:
 I'm trying to build HEAD within an ESXi guest system, and the build errors 
 while building the boot code.  I've attached the tail end of the log.  The 
 host is a Dell Vostro 230 with CPU: Intel(R) Core(TM)2 Quad CPU    Q8400  @ 
 2.66GHz (2659.61-MHz K8-class CPU) and the guest is allocated 256MB of RAM.  
 This is ESXi 4.1.0.


Locally we've been building the amd64 kernel with a few different
flags and ran into this in our kernel build.

Try this definition of do_cpuid instead:


static __inline void
do_cpuid(u_int ax, u_int *p)
{
#if 0
/*
 * Isilon: get a compile error on a new hwpmc file:

/data/sb/BR_HAB_BSDMERGE_STABLE7/src/sys/modules/hwpmc/../../dev/hwpmc/hwpmc_core.c:
In function 'pmc_core_initialize':
./machine/cpufunc.h:111: error: can't find a register in class 'BREG'
while reloading 'asm'
./machine/cpufunc.h:111: error: 'asm' operand has impossible constraints

This presumably has to do with -fPIC.  See
http://sam.zoy.org/blog/2007-04-13-shlib-with-non-pic-code-have-inline-assembly-and-pic-mix-well
for this workaround.
 */
__asm __volatile(cpuid
 : =a (p[0]), =b (p[1]), =c (p[2]), =d (p[3])
 :  0 (ax));
#else
__asm __volatile(push %%ebx   \n\t /* save %ebx */
 cpuid\n\t
 movl %%ebx, %1   \n\t /* save what cpuid just put in 
%ebx */
 pop %%ebx\n\t /* restore the old %ebx */
 : =a (p[0]), =m (p[1]), =c (p[2]), =d (p[3])
 : a(ax)
 : cc);
#endif
}

Note that using =r for the constraint on p[1] isn't sufficient as once
in a while the compiler has chosen %ebx, which then leads to garbage
in the register after the pop.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [head tinderbox] failure on amd64/amd64

2011-01-20 Thread Matthew Fleming
As far as I can tell this is another cvsup / tinderbox bug.  Both
sysctl.h and tsc.c were modified in r217616 but somehow tsc.c is
seeing the old version of sysctl.h.  This happened on another of my
commits a few weeks ago.

Hmm, does bumping __FreeBSD_version have anything to do with this?  I
belatedly realize that I should have done it for that rev since the
name of a kernel interface changed.

Thanks,
matthew

On Wed, Jan 19, 2011 at 10:18 PM, FreeBSD Tinderbox
tinder...@freebsd.org wrote:
 TB --- 2011-01-20 03:55:00 - tinderbox 2.6 running on 
 freebsd-current.sentex.ca
 TB --- 2011-01-20 03:55:00 - starting HEAD tinderbox run for amd64/amd64
 TB --- 2011-01-20 03:55:00 - cleaning the object tree
 TB --- 2011-01-20 03:55:27 - cvsupping the source tree
 TB --- 2011-01-20 03:55:27 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
 /tinderbox/HEAD/amd64/amd64/supfile
 TB --- 2011-01-20 03:55:39 - building world
 TB --- 2011-01-20 03:55:39 - MAKEOBJDIRPREFIX=/obj
 TB --- 2011-01-20 03:55:39 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2011-01-20 03:55:39 - TARGET=amd64
 TB --- 2011-01-20 03:55:39 - TARGET_ARCH=amd64
 TB --- 2011-01-20 03:55:39 - TZ=UTC
 TB --- 2011-01-20 03:55:39 - __MAKE_CONF=/dev/null
 TB --- 2011-01-20 03:55:39 - cd /src
 TB --- 2011-01-20 03:55:39 - /usr/bin/make -B buildworld
 World build started on Thu Jan 20 03:55:43 UTC 2011
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
 stage 4.1: building includes
 stage 4.2: building libraries
 stage 4.3: make dependencies
 stage 4.4: building everything
 stage 5.1: building 32 bit shim libraries
 World build completed on Thu Jan 20 06:04:21 UTC 2011
 TB --- 2011-01-20 06:04:21 - generating LINT kernel config
 TB --- 2011-01-20 06:04:21 - cd /src/sys/amd64/conf
 TB --- 2011-01-20 06:04:21 - /usr/bin/make -B LINT
 TB --- 2011-01-20 06:04:21 - building LINT kernel
 TB --- 2011-01-20 06:04:21 - MAKEOBJDIRPREFIX=/obj
 TB --- 2011-01-20 06:04:21 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2011-01-20 06:04:21 - TARGET=amd64
 TB --- 2011-01-20 06:04:21 - TARGET_ARCH=amd64
 TB --- 2011-01-20 06:04:21 - TZ=UTC
 TB --- 2011-01-20 06:04:21 - __MAKE_CONF=/dev/null
 TB --- 2011-01-20 06:04:21 - cd /src
 TB --- 2011-01-20 06:04:21 - /usr/bin/make -B buildkernel KERNCONF=LINT
 Kernel build for LINT started on Thu Jan 20 06:04:21 UTC 2011
 stage 1: configuring the kernel
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3.1: making dependencies
 stage 3.2: building everything
 [...]
 cc -c -O2 -frename-registers -pipe -fno-strict-aliasing  -std=c99  -Wall 
 -Wredundant-decls -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes 
 -Wpointer-arith -Winline -Wcast-qual  -Wundef -Wno-pointer-sign 
 -fformat-extensions -nostdinc  -I. -I/src/sys -I/src/sys/contrib/altq 
 -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
 -finline-limit=8000 --param inline-unit-growth=100 --param 
 large-function-growth=1000 -DGPROF -falign-functions=16 -DGPROF4 -DGUPROF 
 -fno-builtin -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone  
 -mfpmath=387 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3  -msoft-float 
 -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -Werror -pg 
 -mprofiler-epilogue /src/sys/x86/x86/nexus.c
 cc -c -O2 -frename-registers -pipe -fno-strict-aliasing  -std=c99  -Wall 
 -Wredundant-decls -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes 
 -Wpointer-arith -Winline -Wcast-qual  -Wundef -Wno-pointer-sign 
 -fformat-extensions -nostdinc  -I. -I/src/sys -I/src/sys/contrib/altq 
 -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
 -finline-limit=8000 --param inline-unit-growth=100 --param 
 large-function-growth=1000 -DGPROF -falign-functions=16 -DGPROF4 -DGUPROF 
 -fno-builtin -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone  
 -mfpmath=387 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3  -msoft-float 
 -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -Werror -pg 
 -mprofiler-epilogue /src/sys/x86/x86/tsc.c
 cc1: warnings being treated as errors
 /src/sys/x86/x86/tsc.c: In function 'sysctl_machdep_tsc_freq':
 /src/sys/x86/x86/tsc.c:266: warning: implicit declaration of function 
 'sysctl_handle_64'
 /src/sys/x86/x86/tsc.c:266: warning: nested extern declaration of 
 'sysctl_handle_64'
 /src/sys/x86/x86/tsc.c: At top level:
 /src/sys/x86/x86/tsc.c:274: error: 'CTLTYPE_U64' undeclared here (not in a 
 function)
 *** Error code 1

 Stop in /obj/src/sys/LINT.
 *** Error code 1

 Stop in /src.
 *** Error code 1

 Stop in /src.
 TB --- 2011-01-20 06:18:49 - WARNING: /usr/bin/make returned exit code  1
 TB --- 2011-01-20 06:18:49 - ERROR: failed to build lint kernel
 TB --- 2011-01-20 06:18:49 - 6881.73 user 1196.77 

Re: [head tinderbox] failure on arm/arm

2011-01-12 Thread Matthew Fleming
This is me, but I'm rather puzzled why it's failing.

SYSCTL_UQUAD is in the sys/sysctl.h for the image we're building.  And
why is there a file with SYSCTL_FOO being built as a library?

Any help?

Thanks,
matthew

On Wed, Jan 12, 2011 at 2:20 PM, FreeBSD Tinderbox
tinder...@freebsd.org wrote:
 TB --- 2011-01-12 22:00:00 - tinderbox 2.6 running on 
 freebsd-current.sentex.ca
 TB --- 2011-01-12 22:00:00 - starting HEAD tinderbox run for arm/arm
 TB --- 2011-01-12 22:00:00 - cleaning the object tree
 TB --- 2011-01-12 22:00:09 - cvsupping the source tree
 TB --- 2011-01-12 22:00:09 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
 /tinderbox/HEAD/arm/arm/supfile
 TB --- 2011-01-12 22:00:30 - building world
 TB --- 2011-01-12 22:00:30 - MAKEOBJDIRPREFIX=/obj
 TB --- 2011-01-12 22:00:30 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2011-01-12 22:00:30 - TARGET=arm
 TB --- 2011-01-12 22:00:30 - TARGET_ARCH=arm
 TB --- 2011-01-12 22:00:30 - TZ=UTC
 TB --- 2011-01-12 22:00:30 - __MAKE_CONF=/dev/null
 TB --- 2011-01-12 22:00:30 - cd /src
 TB --- 2011-01-12 22:00:30 - /usr/bin/make -B buildworld
 World build started on Wed Jan 12 22:00:30 UTC 2011
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
 stage 4.1: building includes
 stage 4.2: building libraries
 [...]
 /src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:678:
  error: expected ')' before '(' token
 /src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:682:
  error: expected ')' before '(' token
 /src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:684:
  error: expected ')' before '(' token
 /src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:686:
  error: expected ')' before '(' token
 /src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:689:
  error: expected ')' before '(' token
 /src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:691:
  error: expected ')' before '(' token
 /src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:694:
  error: expected ')' before '(' token
 /src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:698:
  error: expected ')' before '(' token
 *** Error code 1

 Stop in /src/cddl/lib/libzpool.
 *** Error code 1

 Stop in /src/cddl/lib.
 *** Error code 1

 Stop in /src.
 *** Error code 1

 Stop in /src.
 *** Error code 1

 Stop in /src.
 *** Error code 1

 Stop in /src.
 TB --- 2011-01-12 22:20:50 - WARNING: /usr/bin/make returned exit code  1
 TB --- 2011-01-12 22:20:50 - ERROR: failed to build world
 TB --- 2011-01-12 22:20:50 - 926.31 user 233.05 system 1249.77 real


 http://tinderbox.freebsd.org/tinderbox-head-HEAD-arm-arm.full
 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [head tinderbox] failure on ia64/ia64

2010-12-21 Thread Matthew Fleming
This one has me puzzled.  The line numbers don't match up with the
current version of the code, so it may be from in between my two drops
today.

I will start a make universe on my test box meanwhile.

Thanks,
matthew

On Tue, Dec 21, 2010 at 12:45 PM, FreeBSD Tinderbox
tinder...@freebsd.org wrote:
 TB --- 2010-12-21 19:12:26 - tinderbox 2.6 running on 
 freebsd-current.sentex.ca
 TB --- 2010-12-21 19:12:26 - starting HEAD tinderbox run for ia64/ia64
 TB --- 2010-12-21 19:12:26 - cleaning the object tree
 TB --- 2010-12-21 19:12:37 - cvsupping the source tree
 TB --- 2010-12-21 19:12:37 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
 /tinderbox/HEAD/ia64/ia64/supfile
 TB --- 2010-12-21 19:12:53 - building world
 TB --- 2010-12-21 19:12:53 - MAKEOBJDIRPREFIX=/obj
 TB --- 2010-12-21 19:12:53 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2010-12-21 19:12:53 - TARGET=ia64
 TB --- 2010-12-21 19:12:53 - TARGET_ARCH=ia64
 TB --- 2010-12-21 19:12:53 - TZ=UTC
 TB --- 2010-12-21 19:12:53 - __MAKE_CONF=/dev/null
 TB --- 2010-12-21 19:12:53 - cd /src
 TB --- 2010-12-21 19:12:53 - /usr/bin/make -B buildworld
 World build started on Tue Dec 21 19:12:54 UTC 2010
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
 stage 4.1: building includes
 stage 4.2: building libraries
 stage 4.3: make dependencies
 stage 4.4: building everything
 World build completed on Tue Dec 21 20:33:09 UTC 2010
 TB --- 2010-12-21 20:33:09 - generating LINT kernel config
 TB --- 2010-12-21 20:33:09 - cd /src/sys/ia64/conf
 TB --- 2010-12-21 20:33:09 - /usr/bin/make -B LINT
 TB --- 2010-12-21 20:33:09 - building LINT kernel
 TB --- 2010-12-21 20:33:09 - MAKEOBJDIRPREFIX=/obj
 TB --- 2010-12-21 20:33:09 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2010-12-21 20:33:09 - TARGET=ia64
 TB --- 2010-12-21 20:33:09 - TARGET_ARCH=ia64
 TB --- 2010-12-21 20:33:09 - TZ=UTC
 TB --- 2010-12-21 20:33:09 - __MAKE_CONF=/dev/null
 TB --- 2010-12-21 20:33:09 - cd /src
 TB --- 2010-12-21 20:33:09 - /usr/bin/make -B buildkernel KERNCONF=LINT
 Kernel build for LINT started on Tue Dec 21 20:33:09 UTC 2010
 stage 1: configuring the kernel
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3.1: making dependencies
 stage 3.2: building everything
 [...]
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c: In function 'clear_entries':
 /src/sys/kern/kern_fail.c:569: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:569: warning: left-hand operand of comma expression 
 has no effect
 *** Error code 1

 Stop in /obj/ia64.ia64/src/sys/LINT.
 *** Error code 1

 Stop in /src.
 *** Error code 1

 Stop in /src.
 TB --- 2010-12-21 20:45:06 - WARNING: /usr/bin/make returned exit code  1
 TB --- 2010-12-21 20:45:06 - ERROR: failed to build lint kernel
 TB --- 2010-12-21 20:45:06 - 4476.16 user 685.79 system 5560.20 real


 http://tinderbox.freebsd.org/tinderbox-head-HEAD-ia64-ia64.full
 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [head tinderbox] failure on powerpc64/powerpc

2010-12-21 Thread Matthew Fleming
On Tue, Dec 21, 2010 at 2:37 PM, FreeBSD Tinderbox
tinder...@freebsd.org wrote:
 TB --- 2010-12-21 20:56:55 - tinderbox 2.6 running on 
 freebsd-current.sentex.ca
 TB --- 2010-12-21 20:56:55 - starting HEAD tinderbox run for powerpc64/powerpc
 TB --- 2010-12-21 20:56:55 - cleaning the object tree
 TB --- 2010-12-21 20:57:12 - cvsupping the source tree
 TB --- 2010-12-21 20:57:12 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
 /tinderbox/HEAD/powerpc64/powerpc/supfile
 TB --- 2010-12-21 20:57:29 - building world
 TB --- 2010-12-21 20:57:29 - MAKEOBJDIRPREFIX=/obj
 TB --- 2010-12-21 20:57:29 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2010-12-21 20:57:29 - TARGET=powerpc
 TB --- 2010-12-21 20:57:29 - TARGET_ARCH=powerpc64
 TB --- 2010-12-21 20:57:29 - TZ=UTC
 TB --- 2010-12-21 20:57:29 - __MAKE_CONF=/dev/null
 TB --- 2010-12-21 20:57:29 - cd /src
 TB --- 2010-12-21 20:57:29 - /usr/bin/make -B buildworld
 World build started on Tue Dec 21 20:57:29 UTC 2010
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
 stage 4.1: building includes
 stage 4.2: building libraries
 stage 4.3: make dependencies
 stage 4.4: building everything
 stage 5.1: building 32 bit shim libraries
 World build completed on Tue Dec 21 22:29:06 UTC 2010
 TB --- 2010-12-21 22:29:06 - generating LINT kernel config
 TB --- 2010-12-21 22:29:06 - cd /src/sys/powerpc/conf
 TB --- 2010-12-21 22:29:06 - /usr/bin/make -B LINT
 TB --- 2010-12-21 22:29:06 - building LINT kernel
 TB --- 2010-12-21 22:29:06 - MAKEOBJDIRPREFIX=/obj
 TB --- 2010-12-21 22:29:06 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2010-12-21 22:29:06 - TARGET=powerpc
 TB --- 2010-12-21 22:29:06 - TARGET_ARCH=powerpc64
 TB --- 2010-12-21 22:29:06 - TZ=UTC
 TB --- 2010-12-21 22:29:06 - __MAKE_CONF=/dev/null
 TB --- 2010-12-21 22:29:06 - cd /src
 TB --- 2010-12-21 22:29:06 - /usr/bin/make -B buildkernel KERNCONF=LINT
 Kernel build for LINT started on Tue Dec 21 22:29:06 UTC 2010
 stage 1: configuring the kernel
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3.1: making dependencies
 stage 3.2: building everything
 [...]
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:557: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c: In function 'clear_entries':
 /src/sys/kern/kern_fail.c:569: error: dereferencing pointer to incomplete type
 /src/sys/kern/kern_fail.c:569: warning: left-hand operand of comma expression 
 has no effect
 *** Error code 1

I think Tinderbox has a bad source tree.  Lines 557 and 569 make sense
from the old version of kern_fail.c before either of my commits.  So
is Tinderbox somehow building with an old kern_fail.c but an updated
sys/fail.h?  That would explain the build error, but I have no idea
how it could have gotten into such a situation.

But line 557 is a comment in between functions after both r216616 and r216620.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Lock order reversal .

2010-12-07 Thread Matthew Fleming
On Tue, Dec 7, 2010 at 9:18 AM, Julian Elischer jul...@freebsd.org wrote:
 On 12/7/10 3:41 AM, Attilio Rao wrote:

 2010/12/7 Erik Cederstrande...@cederstrand.dk:

 Den 07/12/2010 kl. 10.20 skrev Garrett Cooper:

 On Dec 7, 2010, at 12:26 AM, Mehmet Erol
 Sanliturkm.e.sanlit...@gmail.com  wrote:

 A Dmesg.TXT is attached having a lock order reversal .

    The mount LOR is well known.

 I see that this is the standard response to lot's of LOR reports. It
 seems to be one of the most-reported errors on CURRENT (and it's certainly a
 loud one), but I think a lot of people waste time researching the error and
 browsing Bjoerns LOR page, only to get the above response (not picking on
 you, Garrett).

 Do we have the possibility of silencing well-known and presumably
 harmless LOR's if there isn't sufficient motivation to fix the source?

 Witness has an 'internal blessing list' we never wanted to use in
 order to keep them popping up as reminder.
 Actually, the fact the LOR is 'known' doesn't mean it is 'analyzed'.
 The very few 'Analyzed but harmless' cases in the past have been
 handled via _NOWITNESS flags I guess.

 the problem is that the witness output tells you the second case (the
 reversed case)
 but it doesn't have any clues about the first case (the one that wsa the
 other way around).

 An extended witness might use a lot of memory but associate with each lock a
 'last place called when a lock was already held'
 that might give a clue as to where the other instance was. I'm not
 volunteering to write it,
 but it might be very worth while.. I'd certainly like to hear other ideas as
 well.

I have a small patch against stable/7 that adds a single bit to each
witness structure so that, if the normal lock order is ever
encountered after a reversal, the stack is printed.  It doesn't help
when the order is defined statically, though.

I could try to roll this up against -CURRENT this weekend.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: kern_sysctl.c compilation failure

2010-11-29 Thread Matthew Fleming
On Mon, Nov 29, 2010 at 12:07 PM, Michael Butler
i...@protected-networks.net wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Seems that 'treat warnings as errors' snags on this .. patch attached,

Which compiler are you using?  I didn't have any trouble with this
file on a make universe last night...

There's nothing wrong with the patch; I'd just like to understand why
you see an error and I do not.

Thanks,
matthew



        imb


 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.11 (FreeBSD)

 iEYEARECAAYFAkz0B/YACgkQQv9rrgRC1JKLBgCeNhKn2W6Z2XFN/zt70PbFhKbP
 eHcAoIwI0Iz0g5TmU/pjbnG8zlcY6a1y
 =a/KQ
 -END PGP SIGNATURE-

 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-08 Thread Matthew Fleming
On Mon, Nov 8, 2010 at 6:47 AM, John Baldwin j...@freebsd.org wrote:
 On Saturday, November 06, 2010 4:33:17 pm Matthew Fleming wrote:
 On Sat, Nov 6, 2010 at 7:22 AM, Hans Petter Selasky hsela...@c2i.net wrote:
  Hi,
 
  On Saturday 06 November 2010 14:57:50 Matthew Fleming wrote:
 
  I think you're misunderstanding the existing taskqueue(9) implementation.
 
  As long as TQ_LOCK is held, the state of ta-ta_pending cannot change,
  nor can the set of running tasks.  So the order of checks is
  irrelevant.
 
  I agree that the order of checks is not important. That is not the problem.
 
  Cut  paste from suggested taskqueue patch from Fleming:
 
    +int
   +taskqueue_cancel(struct taskqueue *queue, struct task *task)
   +{
   +       int rc;
   +
   +       TQ_LOCK(queue);
   +       if (!task_is_running(queue, task)) {
   +               if ((rc = task-ta_pending)  0)
   +                       STAILQ_REMOVE(queue-tq_queue, task, task,
   ta_link); +               task-ta_pending = 0;
   +       } else {
   +               rc = -EBUSY;
 
  What happens in this case if ta_pending  0. Are you saying this is not
  possible? If ta_pending  0, shouldn't we also do a STAILQ_REMOVE() ?

 Ah!  I see what you mean.

 I'm not quite sure what the best thing to do here is; I agree it would
 be nice if taskqueue_cancel(9) dequeued the task, but I believe it
 also needs to indicate that the task is currently running.  I guess
 the best thing would be to return the old pending count by reference
 parameter, and 0 or EBUSY to also indicate if there is a task
 currently running.

 Adding jhb@ to this mail since he has good thoughts on interfacing.

 I agree we should always dequeue when possible.  I think it should return
 -EBUSY in that case.  That way code that uses 'cancel' followed by a
 conditional 'drain' to implement a blocking 'cancel' will DTRT.

Do we not also want the old ta_pending to be returned?  In the case
where a task is pending and is also currently running (admittedly a
narrow window), how would we do this?  This is why I suggested
returning the old ta_pending by reference.  This also allows callers
who don't care about the old pending to pass NULL and ignore it.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-08 Thread Matthew Fleming
On Mon, Nov 8, 2010 at 8:42 AM, John Baldwin j...@freebsd.org wrote:
 On Monday, November 08, 2010 10:34:33 am Matthew Fleming wrote:
 On Mon, Nov 8, 2010 at 6:47 AM, John Baldwin j...@freebsd.org wrote:
  On Saturday, November 06, 2010 4:33:17 pm Matthew Fleming wrote:
  On Sat, Nov 6, 2010 at 7:22 AM, Hans Petter Selasky hsela...@c2i.net 
  wrote:
   Hi,
  
   On Saturday 06 November 2010 14:57:50 Matthew Fleming wrote:
  
   I think you're misunderstanding the existing taskqueue(9) 
   implementation.
  
   As long as TQ_LOCK is held, the state of ta-ta_pending cannot change,
   nor can the set of running tasks.  So the order of checks is
   irrelevant.
  
   I agree that the order of checks is not important. That is not the 
   problem.
  
   Cut  paste from suggested taskqueue patch from Fleming:
  
     +int
+taskqueue_cancel(struct taskqueue *queue, struct task *task)
+{
+       int rc;
+
+       TQ_LOCK(queue);
+       if (!task_is_running(queue, task)) {
+               if ((rc = task-ta_pending)  0)
+                       STAILQ_REMOVE(queue-tq_queue, task, task,
ta_link); +               task-ta_pending = 0;
+       } else {
+               rc = -EBUSY;
  
   What happens in this case if ta_pending  0. Are you saying this is not
   possible? If ta_pending  0, shouldn't we also do a STAILQ_REMOVE() ?
 
  Ah!  I see what you mean.
 
  I'm not quite sure what the best thing to do here is; I agree it would
  be nice if taskqueue_cancel(9) dequeued the task, but I believe it
  also needs to indicate that the task is currently running.  I guess
  the best thing would be to return the old pending count by reference
  parameter, and 0 or EBUSY to also indicate if there is a task
  currently running.
 
  Adding jhb@ to this mail since he has good thoughts on interfacing.
 
  I agree we should always dequeue when possible.  I think it should return
  -EBUSY in that case.  That way code that uses 'cancel' followed by a
  conditional 'drain' to implement a blocking 'cancel' will DTRT.

 Do we not also want the old ta_pending to be returned?  In the case
 where a task is pending and is also currently running (admittedly a
 narrow window), how would we do this?  This is why I suggested
 returning the old ta_pending by reference.  This also allows callers
 who don't care about the old pending to pass NULL and ignore it.

 I would be fine with that then.  I wonder if taskqueue_cancel() could then
 return a simple true/false.  False if the task is running, and true
 otherwise?

Sure, though since we don't really have a bool type in the kernel I'd
still prefer to return an int with EBUSY meaning a task was running.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-08 Thread Matthew Fleming
On Mon, Nov 8, 2010 at 8:46 AM, Matthew Fleming mdf...@gmail.com wrote:
 On Mon, Nov 8, 2010 at 8:42 AM, John Baldwin j...@freebsd.org wrote:
 On Monday, November 08, 2010 10:34:33 am Matthew Fleming wrote:
 On Mon, Nov 8, 2010 at 6:47 AM, John Baldwin j...@freebsd.org wrote:
  On Saturday, November 06, 2010 4:33:17 pm Matthew Fleming wrote:
  On Sat, Nov 6, 2010 at 7:22 AM, Hans Petter Selasky hsela...@c2i.net 
  wrote:
   Hi,
  
   On Saturday 06 November 2010 14:57:50 Matthew Fleming wrote:
  
   I think you're misunderstanding the existing taskqueue(9) 
   implementation.
  
   As long as TQ_LOCK is held, the state of ta-ta_pending cannot change,
   nor can the set of running tasks.  So the order of checks is
   irrelevant.
  
   I agree that the order of checks is not important. That is not the 
   problem.
  
   Cut  paste from suggested taskqueue patch from Fleming:
  
     +int
+taskqueue_cancel(struct taskqueue *queue, struct task *task)
+{
+       int rc;
+
+       TQ_LOCK(queue);
+       if (!task_is_running(queue, task)) {
+               if ((rc = task-ta_pending)  0)
+                       STAILQ_REMOVE(queue-tq_queue, task, task,
ta_link); +               task-ta_pending = 0;
+       } else {
+               rc = -EBUSY;
  
   What happens in this case if ta_pending  0. Are you saying this is not
   possible? If ta_pending  0, shouldn't we also do a STAILQ_REMOVE() ?
 
  Ah!  I see what you mean.
 
  I'm not quite sure what the best thing to do here is; I agree it would
  be nice if taskqueue_cancel(9) dequeued the task, but I believe it
  also needs to indicate that the task is currently running.  I guess
  the best thing would be to return the old pending count by reference
  parameter, and 0 or EBUSY to also indicate if there is a task
  currently running.
 
  Adding jhb@ to this mail since he has good thoughts on interfacing.
 
  I agree we should always dequeue when possible.  I think it should return
  -EBUSY in that case.  That way code that uses 'cancel' followed by a
  conditional 'drain' to implement a blocking 'cancel' will DTRT.

 Do we not also want the old ta_pending to be returned?  In the case
 where a task is pending and is also currently running (admittedly a
 narrow window), how would we do this?  This is why I suggested
 returning the old ta_pending by reference.  This also allows callers
 who don't care about the old pending to pass NULL and ignore it.

 I would be fine with that then.  I wonder if taskqueue_cancel() could then
 return a simple true/false.  False if the task is running, and true
 otherwise?

 Sure, though since we don't really have a bool type in the kernel I'd
 still prefer to return an int with EBUSY meaning a task was running.

I'll commit this later today unless there are objections.

http://people.freebsd.org/~mdf/0001-Add-a-taskqueue_cancel-9-to-cancel-a-pending-task-wi.patch

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-08 Thread Matthew Fleming
On Mon, Nov 8, 2010 at 10:24 AM, John Baldwin j...@freebsd.org wrote:
 On Monday, November 08, 2010 11:46:58 am Matthew Fleming wrote:
 On Mon, Nov 8, 2010 at 8:42 AM, John Baldwin j...@freebsd.org wrote:
  On Monday, November 08, 2010 10:34:33 am Matthew Fleming wrote:
  On Mon, Nov 8, 2010 at 6:47 AM, John Baldwin j...@freebsd.org wrote:
   On Saturday, November 06, 2010 4:33:17 pm Matthew Fleming wrote:
   On Sat, Nov 6, 2010 at 7:22 AM, Hans Petter Selasky hsela...@c2i.net 
   wrote:
Hi,
   
On Saturday 06 November 2010 14:57:50 Matthew Fleming wrote:
   
I think you're misunderstanding the existing taskqueue(9) 
implementation.
   
As long as TQ_LOCK is held, the state of ta-ta_pending cannot 
change,
nor can the set of running tasks.  So the order of checks is
irrelevant.
   
I agree that the order of checks is not important. That is not the 
problem.
   
Cut  paste from suggested taskqueue patch from Fleming:
   
  +int
 +taskqueue_cancel(struct taskqueue *queue, struct task *task)
 +{
 +       int rc;
 +
 +       TQ_LOCK(queue);
 +       if (!task_is_running(queue, task)) {
 +               if ((rc = task-ta_pending)  0)
 +                       STAILQ_REMOVE(queue-tq_queue, task, 
 task,
 ta_link); +               task-ta_pending = 0;
 +       } else {
 +               rc = -EBUSY;
   
What happens in this case if ta_pending  0. Are you saying this is 
not
possible? If ta_pending  0, shouldn't we also do a STAILQ_REMOVE() ?
  
   Ah!  I see what you mean.
  
   I'm not quite sure what the best thing to do here is; I agree it would
   be nice if taskqueue_cancel(9) dequeued the task, but I believe it
   also needs to indicate that the task is currently running.  I guess
   the best thing would be to return the old pending count by reference
   parameter, and 0 or EBUSY to also indicate if there is a task
   currently running.
  
   Adding jhb@ to this mail since he has good thoughts on interfacing.
  
   I agree we should always dequeue when possible.  I think it should 
   return
   -EBUSY in that case.  That way code that uses 'cancel' followed by a
   conditional 'drain' to implement a blocking 'cancel' will DTRT.
 
  Do we not also want the old ta_pending to be returned?  In the case
  where a task is pending and is also currently running (admittedly a
  narrow window), how would we do this?  This is why I suggested
  returning the old ta_pending by reference.  This also allows callers
  who don't care about the old pending to pass NULL and ignore it.
 
  I would be fine with that then.  I wonder if taskqueue_cancel() could then
  return a simple true/false.  False if the task is running, and true
  otherwise?

 Sure, though since we don't really have a bool type in the kernel I'd
 still prefer to return an int with EBUSY meaning a task was running.

 The only reason I would prefer the other sense (false if already running)
 is that is what callout_stop()'s return value is (true if it stopped the
 callout, false if it couldn't stop it)).

I think the symmetry with callout(9) can't go very far, mostly because
callout generally takes a specified mtx before calling the callback,
and taskqueue(9) does not.  That means fundamentally that, when using
a taskqueue(9) the consumer has much less knowledge of the exact state
of a task.

Thanks,
matthew

 However, returning EBUSY is ok.  One difference is that callout_stop()
 returns false if the callout is not pending (i.e. not on softclock).  Right
 now taskqueue_cancel() returns 0 in that case (ta_pending == 0), but that is
 probably fine as long as it is documented.  If you wanted to return EINVAL
 instead, that would be fine, too.

 --
 John Baldwin

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-06 Thread Matthew Fleming
On Sat, Nov 6, 2010 at 1:37 AM, Hans Petter Selasky hsela...@c2i.net wrote:
 On Friday 05 November 2010 20:06:12 John Baldwin wrote:
 On Friday, November 05, 2010 3:00:37 pm Hans Petter Selasky wrote:
  On Friday 05 November 2010 19:48:05 Matthew Fleming wrote:
   On Fri, Nov 5, 2010 at 11:45 AM, Hans Petter Selasky hsela...@c2i.net
 
  wrote:
On Friday 05 November 2010 19:39:45 Matthew Fleming wrote:
True, but no taskqueue(9) code can handle that.  Only the caller can
prevent a task from becoming enqueued again.  The same issue exists
with taskqueue_drain().
   
I find that strange, because that means if I queue a task again while
it is running, then I doesn't get run? Are you really sure?
  
   If a task is currently running when enqueued, the task struct will be
   re-enqueued to the taskqueue.  When that task comes up as the head of
   the queue, it will be run again.
 
  Right, and the taskqueue_cancel has to cancel in that state to, but it
  doesn't because it only checks pending if !running() :-) ??

 You can't close that race in taskqueue_cancel().  You have to manage that
 race yourself in your task handler.  For the callout(9) API we are only
 able to close that race if you use callout_init_mtx() so that the code
 managing the callout wheel can make use of your lock to resolve the races.
 If you use callout_init() you have to explicitly manage these races in your
 callout handler.

 Hi,

 I think you are getting me wrong! In the initial 0001-Implement-
 taskqueue_cancel-9-to-cancel-a-task-from-a.patch you have the following code-
 chunk:

 +int
 +taskqueue_cancel(struct taskqueue *queue, struct task *task)
 +{
 +       int rc;
 +
 +       TQ_LOCK(queue);
 +       if (!task_is_running(queue, task)) {
 +               if ((rc = task-ta_pending)  0)
 +                       STAILQ_REMOVE(queue-tq_queue, task, task, ta_link);
 +               task-ta_pending = 0;
 +       } else
 +               rc = -EBUSY;
 +       TQ_UNLOCK(queue);
 +       return (rc);
 +}

 This code should be written like this, having the if (!task_is_running())
 after checking the ta_pending variable.

 +int
 +taskqueue_cancel(struct taskqueue *queue, struct task *task)
 +{
 +       int rc;
 +
 +       TQ_LOCK(queue);
 +               if ((rc = task-ta_pending)  0) {
 +                       STAILQ_REMOVE(queue-tq_queue, task, task, ta_link);
 +                                 task-ta_pending = 0;
 +                         }
 +       if (!task_is_running(queue, task))
 +               rc = -EBUSY;
 +       TQ_UNLOCK(queue);
 +       return (rc);
 +}

 Or completely skip the !task_is_running() check. Else you are opening up a
 race in this function! Do I need to explain that more? Isn't it obvious?

I think you're misunderstanding the existing taskqueue(9) implementation.

As long as TQ_LOCK is held, the state of ta-ta_pending cannot change,
nor can the set of running tasks.  So the order of checks is
irrelevant.

As John said, the taskqueue(9) implementation cannot protect consumers
of it from re-queueing a task; that kind of serialization needs to
happen at a higher level.

taskqueue(9) is not quite like callout(9); the taskqueue(9)
implementation drops all locks before calling the task's callback
function.  So once the task is running, taskqueue(9) can do nothing
about it until the task stops running.  This is why Jeff's
implementation of taskqueue_cancel(9) slept if the task was running,
and why mine returns an error code.  The only way to know for sure
that a task is about to run is to ask taskqueue(9), because there's
a window where the TQ_LOCK is dropped before the callback is entered.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-06 Thread Matthew Fleming
On Sat, Nov 6, 2010 at 7:22 AM, Hans Petter Selasky hsela...@c2i.net wrote:
 Hi,

 On Saturday 06 November 2010 14:57:50 Matthew Fleming wrote:

 I think you're misunderstanding the existing taskqueue(9) implementation.

 As long as TQ_LOCK is held, the state of ta-ta_pending cannot change,
 nor can the set of running tasks.  So the order of checks is
 irrelevant.

 I agree that the order of checks is not important. That is not the problem.

 Cut  paste from suggested taskqueue patch from Fleming:

   +int
  +taskqueue_cancel(struct taskqueue *queue, struct task *task)
  +{
  +       int rc;
  +
  +       TQ_LOCK(queue);
  +       if (!task_is_running(queue, task)) {
  +               if ((rc = task-ta_pending)  0)
  +                       STAILQ_REMOVE(queue-tq_queue, task, task,
  ta_link); +               task-ta_pending = 0;
  +       } else {
  +               rc = -EBUSY;

 What happens in this case if ta_pending  0. Are you saying this is not
 possible? If ta_pending  0, shouldn't we also do a STAILQ_REMOVE() ?

Ah!  I see what you mean.

I'm not quite sure what the best thing to do here is; I agree it would
be nice if taskqueue_cancel(9) dequeued the task, but I believe it
also needs to indicate that the task is currently running.  I guess
the best thing would be to return the old pending count by reference
parameter, and 0 or EBUSY to also indicate if there is a task
currently running.

Adding jhb@ to this mail since he has good thoughts on interfacing.

Thanks,
matthew


  +       }
  +       TQ_UNLOCK(queue);
  +       return (rc);
  +}



 As John said, the taskqueue(9) implementation cannot protect consumers
 of it from re-queueing a task; that kind of serialization needs to
 happen at a higher level.

 Agreed, but that is not what I'm pointing at. I'm pointing at what happens if
 you re-queue a task and then cancel while it is actually running. Will the
 task still be queued for execution after taskqueue_cancel()?

 taskqueue(9) is not quite like callout(9); the taskqueue(9)
 implementation drops all locks before calling the task's callback
 function.  So once the task is running, taskqueue(9) can do nothing
 about it until the task stops running.

 This is not the problem.


 This is why Jeff's
 implementation of taskqueue_cancel(9) slept if the task was running,
 and why mine returns an error code.  The only way to know for sure
 that a task is about to run is to ask taskqueue(9), because there's
 a window where the TQ_LOCK is dropped before the callback is entered.

 And if you re-queue and cancel in this window, shouldn't this also be handled
 like in the other cases?

 Cut  paste from kern/subr_taskqueue.c:

                task-ta_pending = 0;
                tb.tb_running = task;
                TQ_UNLOCK(queue);

 If a concurrent thread at exactly this point in time calls taskqueue_enqueue()
 on this task, then we re-add the task to the queue-tq_queue. So far we
 agree. Imagine now that for some reason a following call to taskqueue_cancel()
 on this task at same point in time. Now, shouldn't taskqueue_cancel() also
 remove the task from the queue-tq_queue in this case aswell? Because in
 your implementation you only remove the task if we are not running, and that
 is not true when we are at exactly this point in time.

                task-ta_func(task-ta_context, pending);

                TQ_LOCK(queue);
                tb.tb_running = NULL;
                wakeup(task);


 Another issue I noticed is that the ta_pending counter should have a wrap
 protector.

 --HPS

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-05 Thread Matthew Fleming
On Fri, Nov 5, 2010 at 5:58 AM, John Baldwin j...@freebsd.org wrote:
 On Thursday, November 04, 2010 5:49:22 pm Matthew Fleming wrote:
 On Thu, Nov 4, 2010 at 2:22 PM, John Baldwin j...@freebsd.org wrote:
  On Thursday, November 04, 2010 4:15:16 pm Hans Petter Selasky wrote:
  I think that if a task is currently executing, then there should be a 
  drain
  method for that. I.E. two methods: One to stop and one to cancel/drain. 
  Can
  you implement this?
 
  I agree, this would also be consistent with the callout_*() API if you had
  both stop() and drain() methods.

 Here's my proposed code.  Note that this builds but is not yet tested.


 Implement a taskqueue_cancel(9), to cancel a task from a queue.

 Requested by:       hps
 Original code:      jeff
 MFC after:  1 week


 http://people.freebsd.org/~mdf/bsd-taskqueue-cancel.diff

 For FreeBSD taskqueue_cancel() should return EBUSY, not -EBUSY.  However, I
 would prefer that it follow the semantics of callout_stop() and return true
 if it stopped the task and false otherwise.  The Linux wrapper for
 taskqueue_cancel() can convert the return value.

I used -EBUSY since positive return values reflect the old pending
count.  ta_pending was zero'd, and I think needs to be to keep the
task sane, because all of taskqueue(9) assumes a non-zero ta_pending
means the task is queued.

I don't know that the caller often needs to know the old value of
ta_pending, but it seems simpler to return that as the return value
and use -EBUSY than to use an optional pointer to a place to store the
old ta_pending just so we can keep the error return positive.

Note that phk (IIRC) suggested using -error in the returns for
sbuf_drain to indicate the difference between success ( 0 bytes
drained) and an error, so FreeBSD now has precedent.  I'm not entirely
sure that's a good thing, since I am not generally fond of Linux's use
of -error, but for some cases it is convenient.

But, I'll do this one either way, just let me know if the above hasn't
convinced you.

 I'm not sure I like reusing the memory allocation flags (M_NOWAIT / M_WAITOK)
 for this blocking flag.  In the case of callout(9) we just have two functions
 that pass an internal boolean to the real routine (callout_stop() and
 callout_drain() are wrappers for _callout_stop_safe()).  It is a bit
 unfortunate that taskqueue_drain() already exists and has different semantics
 than callout_drain().  It would have been nice to have the two APIs mirror 
 each
 other instead.

 Hmm, I wonder if the blocking behavior cannot safely be provided by just
 doing:

        if (!taskqueue_cancel(queue, task, M_NOWAIT)
                taskqueue_drain(queue, task);

This seems reasonable and correct.  I will add a note to the manpage about this.

Thanks,
matthew


 If that works ok (I think it does), I would rather have taskqueue_cancel()
 always be non-blocking.  Even though there is a race where the task could
 be rescheduled by another thread in between cancel and drain, the race still
 exists since if the task could be scheduled between the two, it could also
 be scheduled just before the call to taskqueue_cancel() (in which case a
 taskqueue_cancel(queue, task, M_WAITOK) would have blocked to wait for it
 matching the taskqueue_drain() above).  The caller still always has to
 provide synchronization for preventing a task's execution outright via their
 own locking.

 --
 John Baldwin

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-05 Thread Matthew Fleming
On Fri, Nov 5, 2010 at 7:18 AM, John Baldwin j...@freebsd.org wrote:
 On Friday, November 05, 2010 9:50:10 am Matthew Fleming wrote:
 On Fri, Nov 5, 2010 at 5:58 AM, John Baldwin j...@freebsd.org wrote:
  On Thursday, November 04, 2010 5:49:22 pm Matthew Fleming wrote:
  On Thu, Nov 4, 2010 at 2:22 PM, John Baldwin j...@freebsd.org wrote:
   On Thursday, November 04, 2010 4:15:16 pm Hans Petter Selasky wrote:
   I think that if a task is currently executing, then there should be a 
   drain
   method for that. I.E. two methods: One to stop and one to 
   cancel/drain. Can
   you implement this?
  
   I agree, this would also be consistent with the callout_*() API if you 
   had
   both stop() and drain() methods.
 
  Here's my proposed code.  Note that this builds but is not yet tested.
 
 
  Implement a taskqueue_cancel(9), to cancel a task from a queue.
 
  Requested by:       hps
  Original code:      jeff
  MFC after:  1 week
 
 
  http://people.freebsd.org/~mdf/bsd-taskqueue-cancel.diff
 
  For FreeBSD taskqueue_cancel() should return EBUSY, not -EBUSY.  However, I
  would prefer that it follow the semantics of callout_stop() and return true
  if it stopped the task and false otherwise.  The Linux wrapper for
  taskqueue_cancel() can convert the return value.

 I used -EBUSY since positive return values reflect the old pending
 count.  ta_pending was zero'd, and I think needs to be to keep the
 task sane, because all of taskqueue(9) assumes a non-zero ta_pending
 means the task is queued.

 I don't know that the caller often needs to know the old value of
 ta_pending, but it seems simpler to return that as the return value
 and use -EBUSY than to use an optional pointer to a place to store the
 old ta_pending just so we can keep the error return positive.

 Note that phk (IIRC) suggested using -error in the returns for
 sbuf_drain to indicate the difference between success ( 0 bytes
 drained) and an error, so FreeBSD now has precedent.  I'm not entirely
 sure that's a good thing, since I am not generally fond of Linux's use
 of -error, but for some cases it is convenient.

 But, I'll do this one either way, just let me know if the above hasn't
 convinced you.

 Hmm, I hadn't considered if callers would want to know the pending count of
 the cancelled task.

  I'm not sure I like reusing the memory allocation flags (M_NOWAIT / 
  M_WAITOK)
  for this blocking flag.  In the case of callout(9) we just have two 
  functions
  that pass an internal boolean to the real routine (callout_stop() and
  callout_drain() are wrappers for _callout_stop_safe()).  It is a bit
  unfortunate that taskqueue_drain() already exists and has different 
  semantics
  than callout_drain().  It would have been nice to have the two APIs mirror 
  each
  other instead.
 
  Hmm, I wonder if the blocking behavior cannot safely be provided by just
  doing:
 
         if (!taskqueue_cancel(queue, task, M_NOWAIT)
                 taskqueue_drain(queue, task);

 This seems reasonable and correct.  I will add a note to the manpage about 
 this.

 In that case, would you be fine with dropping the blocking functionality from
 taskqueue_cancel() completely and requiring code that wants the blocking
 semantics to use a cancel followed by a drain?

New patch is at
http://people.freebsd.org/~mdf/0001-Implement-taskqueue_cancel-9-to-cancel-a-task-from-a.patch

I'll try to set up something to test it today too.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-05 Thread Matthew Fleming
On Fri, Nov 5, 2010 at 10:36 AM, Hans Petter Selasky hsela...@c2i.net wrote:
 On Friday 05 November 2010 18:15:01 Matthew Fleming wrote:
 On Fri, Nov 5, 2010 at 7:18 AM, John Baldwin j...@freebsd.org wrote:
  On Friday, November 05, 2010 9:50:10 am Matthew Fleming wrote:
  On Fri, Nov 5, 2010 at 5:58 AM, John Baldwin j...@freebsd.org wrote:
   On Thursday, November 04, 2010 5:49:22 pm Matthew Fleming wrote:
   On Thu, Nov 4, 2010 at 2:22 PM, John Baldwin j...@freebsd.org wrote:
On Thursday, November 04, 2010 4:15:16 pm Hans Petter Selasky wrote:
I think that if a task is currently executing, then there should
be a drain method for that. I.E. two methods: One to stop and one
to cancel/drain. Can you implement this?
   
I agree, this would also be consistent with the callout_*() API if
you had both stop() and drain() methods.
  
   Here's my proposed code.  Note that this builds but is not yet
   tested.
  
  
   Implement a taskqueue_cancel(9), to cancel a task from a queue.
  
   Requested by:       hps
   Original code:      jeff
   MFC after:  1 week
  
  
   http://people.freebsd.org/~mdf/bsd-taskqueue-cancel.diff
  
   For FreeBSD taskqueue_cancel() should return EBUSY, not -EBUSY.
    However, I would prefer that it follow the semantics of
   callout_stop() and return true if it stopped the task and false
   otherwise.  The Linux wrapper for taskqueue_cancel() can convert the
   return value.
 
  I used -EBUSY since positive return values reflect the old pending
  count.  ta_pending was zero'd, and I think needs to be to keep the
  task sane, because all of taskqueue(9) assumes a non-zero ta_pending
  means the task is queued.
 
  I don't know that the caller often needs to know the old value of
  ta_pending, but it seems simpler to return that as the return value
  and use -EBUSY than to use an optional pointer to a place to store the
  old ta_pending just so we can keep the error return positive.
 
  Note that phk (IIRC) suggested using -error in the returns for
  sbuf_drain to indicate the difference between success ( 0 bytes
  drained) and an error, so FreeBSD now has precedent.  I'm not entirely
  sure that's a good thing, since I am not generally fond of Linux's use
  of -error, but for some cases it is convenient.
 
  But, I'll do this one either way, just let me know if the above hasn't
  convinced you.
 
  Hmm, I hadn't considered if callers would want to know the pending count
  of the cancelled task.
 
   I'm not sure I like reusing the memory allocation flags (M_NOWAIT /
   M_WAITOK) for this blocking flag.  In the case of callout(9) we just
   have two functions that pass an internal boolean to the real routine
   (callout_stop() and callout_drain() are wrappers for
   _callout_stop_safe()).  It is a bit unfortunate that
   taskqueue_drain() already exists and has different semantics than
   callout_drain().  It would have been nice to have the two APIs mirror
   each other instead.
  
   Hmm, I wonder if the blocking behavior cannot safely be provided by
   just doing:
  
          if (!taskqueue_cancel(queue, task, M_NOWAIT)
                  taskqueue_drain(queue, task);
 
  This seems reasonable and correct.  I will add a note to the manpage
  about this.
 
  In that case, would you be fine with dropping the blocking functionality
  from taskqueue_cancel() completely and requiring code that wants the
  blocking semantics to use a cancel followed by a drain?

 New patch is at
 http://people.freebsd.org/~mdf/0001-Implement-taskqueue_cancel-9-to-cancel-
 a-task-from-a.patch

 I think the:

 +       if (!task_is_running(queue, task)) {

 check needs to be omitted. Else you block the possibility of enqueue and
 cancel a task while it is actually executing/running ??

Huh?  If the task is currently running, there's nothing to do except
return failure.  Task running means it can't be canceled, because...
it's running.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-05 Thread Matthew Fleming
On Fri, Nov 5, 2010 at 11:35 AM, Hans Petter Selasky hsela...@c2i.net wrote:
 On Friday 05 November 2010 19:13:08 Matthew Fleming wrote:
 On Fri, Nov 5, 2010 at 10:36 AM, Hans Petter Selasky hsela...@c2i.net
 wrote:
  On Friday 05 November 2010 18:15:01 Matthew Fleming wrote:
  On Fri, Nov 5, 2010 at 7:18 AM, John Baldwin j...@freebsd.org wrote:
   On Friday, November 05, 2010 9:50:10 am Matthew Fleming wrote:
   On Fri, Nov 5, 2010 at 5:58 AM, John Baldwin j...@freebsd.org wrote:
On Thursday, November 04, 2010 5:49:22 pm Matthew Fleming wrote:
On Thu, Nov 4, 2010 at 2:22 PM, John Baldwin j...@freebsd.org
 wrote:
 On Thursday, November 04, 2010 4:15:16 pm Hans Petter Selasky
 wrote:
 I think that if a task is currently executing, then there
 should be a drain method for that. I.E. two methods: One to
 stop and one to cancel/drain. Can you implement this?

 I agree, this would also be consistent with the callout_*() API
 if you had both stop() and drain() methods.
   
Here's my proposed code.  Note that this builds but is not yet
tested.
   
   
Implement a taskqueue_cancel(9), to cancel a task from a queue.
   
Requested by:       hps
Original code:      jeff
MFC after:  1 week
   
   
http://people.freebsd.org/~mdf/bsd-taskqueue-cancel.diff
   
For FreeBSD taskqueue_cancel() should return EBUSY, not -EBUSY.
 However, I would prefer that it follow the semantics of
callout_stop() and return true if it stopped the task and false
otherwise.  The Linux wrapper for taskqueue_cancel() can convert
the return value.
  
   I used -EBUSY since positive return values reflect the old pending
   count.  ta_pending was zero'd, and I think needs to be to keep the
   task sane, because all of taskqueue(9) assumes a non-zero ta_pending
   means the task is queued.
  
   I don't know that the caller often needs to know the old value of
   ta_pending, but it seems simpler to return that as the return value
   and use -EBUSY than to use an optional pointer to a place to store
   the old ta_pending just so we can keep the error return positive.
  
   Note that phk (IIRC) suggested using -error in the returns for
   sbuf_drain to indicate the difference between success ( 0 bytes
   drained) and an error, so FreeBSD now has precedent.  I'm not
   entirely sure that's a good thing, since I am not generally fond of
   Linux's use of -error, but for some cases it is convenient.
  
   But, I'll do this one either way, just let me know if the above
   hasn't convinced you.
  
   Hmm, I hadn't considered if callers would want to know the pending
   count of the cancelled task.
  
I'm not sure I like reusing the memory allocation flags (M_NOWAIT /
M_WAITOK) for this blocking flag.  In the case of callout(9) we
just have two functions that pass an internal boolean to the real
routine (callout_stop() and callout_drain() are wrappers for
_callout_stop_safe()).  It is a bit unfortunate that
taskqueue_drain() already exists and has different semantics than
callout_drain().  It would have been nice to have the two APIs
mirror each other instead.
   
Hmm, I wonder if the blocking behavior cannot safely be provided by
just doing:
   
       if (!taskqueue_cancel(queue, task, M_NOWAIT)
               taskqueue_drain(queue, task);
  
   This seems reasonable and correct.  I will add a note to the manpage
   about this.
  
   In that case, would you be fine with dropping the blocking
   functionality from taskqueue_cancel() completely and requiring code
   that wants the blocking semantics to use a cancel followed by a
   drain?
 
  New patch is at
  http://people.freebsd.org/~mdf/0001-Implement-taskqueue_cancel-9-to-canc
  el- a-task-from-a.patch
 
  I think the:
 
  +       if (!task_is_running(queue, task)) {
 

 If it is running, it is dequeued from the the taskqueue, right? And while it
 is running it can be queued again, which your initial code didn't handle?

True, but no taskqueue(9) code can handle that.  Only the caller can
prevent a task from becoming enqueued again.  The same issue exists
with taskqueue_drain().

BTW, I planned to commit the patch I sent today after testing,
assuming jhb@ has no more issues.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-05 Thread Matthew Fleming
On Fri, Nov 5, 2010 at 11:45 AM, Hans Petter Selasky hsela...@c2i.net wrote:
 On Friday 05 November 2010 19:39:45 Matthew Fleming wrote:
 True, but no taskqueue(9) code can handle that.  Only the caller can
 prevent a task from becoming enqueued again.  The same issue exists
 with taskqueue_drain().

 I find that strange, because that means if I queue a task again while it is
 running, then I doesn't get run? Are you really sure?

If a task is currently running when enqueued, the task struct will be
re-enqueued to the taskqueue.  When that task comes up as the head of
the queue, it will be run again.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-04 Thread Matthew Fleming
On Mon, Nov 1, 2010 at 1:15 PM, Hans Petter Selasky hsela...@c2i.net wrote:
 On Monday 01 November 2010 21:07:29 Matthew Fleming wrote:
 On Mon, Nov 1, 2010 at 12:54 PM, Hans Petter Selasky hsela...@c2i.net
 wrote:
  Hi!
 
  I've wrapped up an outline patch for what needs to be done to integrate
  the USB process framework into the kernel taskqueue system in a more
  direct way that to wrap it.
 
  The limitation of the existing taskqueue system is that it only
  guarantees execution at a given priority level. USB requires more. USB
  also requires a guarantee that the last task queued task also gets
  executed last. This is for example so that a deferred USB detach event
  does not happen before any pending deferred I/O for example in case of
  multiple occurring events.
 
  Mostly this new feature is targeted for GPIO-alike system using slow
  busses like the USB. Typical use case:
 
  2 tasks to program GPIO on.
  2 tasks to program GPIO off.
 
  Example:
 
  a) taskqueue_enqueue_odd(sc-sc_taskqueue, sc-sc_task_on[0], sc-
 
 sc_task_on[1]);
 
  b) taskqueue_enqueue_odd(sc-sc_taskqueue, sc-sc_task_off[0], sc-
 
 sc_task_off[1]);
 
  No matter how the call ordering of code-line a) and b), we are always
  guaranteed that the last queued state on or off is reached before the
  head of the taskqueue empties.
 
 
  In lack of a better name, the new function was called
  taskqueue_enqueue_odd [some people obviously think that USB processes
  are odd, but not taskqueues
 
  :-)]

 I'd like to make sure I understand the USB requirements.

 (1) does USB need the task priority field?  Many taskqueue(9) consumers do
 not.

 No, USB does not need a task priority field, but a sequence field associated
 with the task and task queue head to figure out which task was queued first
 without having to scan all the tasks queued.

 (2) if there was a working taskqueue_remove(9) that removed the task
 if pending or returned error if the task was currently running, would
 that be sufficient to implement the required USB functionality?
 (assuming that taskqueue_enqueue(9) put all tasks with equal priority
 in order of queueing).

 No, not completely. See comment above. I also need information about which
 task was queued first, or else I have to keep this information separately,
 which again, confuse people. The more layers the more confusion?

I don't follow why keeping the information about which task was queued
first privately rather than having taskqueue(9) maintain it is
confusing.  So far, USB seems to be the only taskqueue consumer which
needs this information, so it makes a lot more sense to me for it to
be USB private.

To my mind, there's primary operations, and secondary ones.  A
secondary operation can be built from the primary ones.  It reads to
me that, if there was a taskqueue_cancel(9) (and there is in Jeff's
OFED branch) then you could build the functionality you want (and
maybe you don't need cancel, even).  While there is sometimes an
argument for making secondary operations available in a library, in
this case you need extra data which most other taskqueue consumers do
not.  That would break the KBI.  That is another argument in favor of
keeping the implementation private to USB.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-04 Thread Matthew Fleming
On Thu, Nov 4, 2010 at 11:41 AM, Hans Petter Selasky hsela...@c2i.net wrote:
 On Thursday 04 November 2010 15:29:51 John Baldwin wrote:
  (and there is in Jeff's OFED branch)

 Is there a link to this branch? I would certainly have a look at his work and
 re-base my patch.

It's on svn.freebsd.org:

http://svn.freebsd.org/viewvc/base/projects/ofed/head/sys/kern/subr_taskqueue.c?view=log
http://svn.freebsd.org/viewvc/base?view=revisionrevision=209422

For the purpose of speed, I'm not opposed to breaking the KBI by using
a doubly-linked TAILQ, but I don't think the difference will matter
all that often (perhaps I'm wrong and some taskqueues have dozens of
pending tasks?)

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-04 Thread Matthew Fleming
On Thu, Nov 4, 2010 at 12:11 PM, Hans Petter Selasky hsela...@c2i.net wrote:
 On Thursday 04 November 2010 20:01:57 Matthew Fleming wrote:
 On Thu, Nov 4, 2010 at 11:41 AM, Hans Petter Selasky hsela...@c2i.net
 wrote:
  On Thursday 04 November 2010 15:29:51 John Baldwin wrote:
   (and there is in Jeff's OFED branch)
 
  Is there a link to this branch? I would certainly have a look at his work
  and re-base my patch.

 It's on svn.freebsd.org:

 http://svn.freebsd.org/viewvc/base/projects/ofed/head/sys/kern/subr_taskque
 ue.c?view=log
 http://svn.freebsd.org/viewvc/base?view=revisionrevision=209422

 For the purpose of speed, I'm not opposed to breaking the KBI by using
 a doubly-linked TAILQ, but I don't think the difference will matter
 all that often (perhaps I'm wrong and some taskqueues have dozens of
 pending tasks?)

 Thanks,
 matthew

 At first look I see that I need a non-blocking version of:

 taskqueue_cancel(

 At the point in the code where these functions are called I cannot block. Is
 this impossible to implement?

It depends on whether the queue uses a MTX_SPIN or MTX_DEF.  It is not
possible to determine whether a task is running without taking the
taskqueue lock.  And it is certainly impossible to dequeue a task
without the lock that was used to enqueue it.

However, a variant that dequeued if the task was still pending, and
returned failure otherwise (rather than sleeping) is definitely
possible.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-04 Thread Matthew Fleming
On Thu, Nov 4, 2010 at 2:22 PM, John Baldwin j...@freebsd.org wrote:
 On Thursday, November 04, 2010 4:15:16 pm Hans Petter Selasky wrote:
 I think that if a task is currently executing, then there should be a drain
 method for that. I.E. two methods: One to stop and one to cancel/drain. Can
 you implement this?

 I agree, this would also be consistent with the callout_*() API if you had
 both stop() and drain() methods.

Here's my proposed code.  Note that this builds but is not yet tested.


Implement a taskqueue_cancel(9), to cancel a task from a queue.

Requested by:   hps
Original code:  jeff
MFC after:  1 week


http://people.freebsd.org/~mdf/bsd-taskqueue-cancel.diff

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [RFC] Outline of USB process integration in the kernel taskqueue system

2010-11-01 Thread Matthew Fleming
On Mon, Nov 1, 2010 at 12:54 PM, Hans Petter Selasky hsela...@c2i.net wrote:
 Hi!

 I've wrapped up an outline patch for what needs to be done to integrate the
 USB process framework into the kernel taskqueue system in a more direct way
 that to wrap it.

 The limitation of the existing taskqueue system is that it only guarantees
 execution at a given priority level. USB requires more. USB also requires a
 guarantee that the last task queued task also gets executed last. This is for
 example so that a deferred USB detach event does not happen before any pending
 deferred I/O for example in case of multiple occurring events.

 Mostly this new feature is targeted for GPIO-alike system using slow busses
 like the USB. Typical use case:

 2 tasks to program GPIO on.
 2 tasks to program GPIO off.

 Example:

 a) taskqueue_enqueue_odd(sc-sc_taskqueue, sc-sc_task_on[0], sc-
sc_task_on[1]);


 b) taskqueue_enqueue_odd(sc-sc_taskqueue, sc-sc_task_off[0], sc-
sc_task_off[1]);


 No matter how the call ordering of code-line a) and b), we are always
 guaranteed that the last queued state on or off is reached before the head
 of the taskqueue empties.


 In lack of a better name, the new function was called taskqueue_enqueue_odd
 [some people obviously think that USB processes are odd, but not taskqueues
 :-)]

I'd like to make sure I understand the USB requirements.

(1) does USB need the task priority field?  Many taskqueue(9) consumers do not.

(2) if there was a working taskqueue_remove(9) that removed the task
if pending or returned error if the task was currently running, would
that be sufficient to implement the required USB functionality?
(assuming that taskqueue_enqueue(9) put all tasks with equal priority
in order of queueing).

Thanks,
matthew

 Manpage:

 .Ft int
 .Fn taskqueue_enqueue_odd struct taskqueue *queue struct task *t0 struct
 task *t1

 ..

 The function
 .Fn taskqueue_enqueue_odd
 should be used if it is important that the last queued task is also
 executed last in the task queue at the given priority level. This
 function requires two valid task pointers. Depending on which of the
 tasks passed are queued at the calling moment, the last queued task of
 the two will get dequeued and put last in the task queue, while not
 touching the first queued task. If no tasks are queued at the calling
 moment, this function behaves like
 .Fn taskqueue_enqueue .
 This function returns zero if the first task was queued last, one if
 the second task was queued last.

 Preliminary patch - see e-mail attachment.

 Comments are welcome!

 --HPS

 More docs: Also see talk about the new USB stack in FreeBSD on Youtube.

 === share/man/man9/taskqueue.9
 ==
 --- share/man/man9/taskqueue.9  (revision 214211)
 +++ share/man/man9/taskqueue.9  (local)
 @@ -46,11 +46,15 @@
  typedef void (*taskqueue_enqueue_fn)(void *context);

  struct task {
 -       STAILQ_ENTRY(task)      ta_link;        /* link for queue */
 -       u_short                 ta_pending;     /* count times queued */
 -       u_short                 ta_priority;    /* priority of task in queue 
 */
 -       task_fn_t               ta_func;        /* task handler */
 -       void                    *ta_context;    /* argument for handler */
 +       TAILQ_ENTRY(task) ta_link;      /* link for queue */
 +#define        TASKQUEUE_SEQUENCE_MAX  255
 +       u_char  ta_sequence;            /* sequence number */
 +#define        TASKQUEUE_PENDING_MAX   255
 +       u_char  ta_pending;             /* count times queued */
 +#define        TASKQUEUE_PRIORITY_MAX  65535U
 +       u_short ta_priority;            /* priority of task in queue */
 +       task_fn_t *ta_func;             /* task handler */
 +       void    *ta_context;            /* argument for handler */
  };
  .Ed
  .Ft struct taskqueue *
 @@ -62,6 +66,8 @@
  .Ft int
  .Fn taskqueue_enqueue struct taskqueue *queue struct task *task
  .Ft int
 +.Fn taskqueue_enqueue_odd struct taskqueue *queue struct task *t0 
 struct task *t1
 +.Ft int
  .Fn taskqueue_enqueue_fast struct taskqueue *queue struct task *task
  .Ft void
  .Fn taskqueue_drain struct taskqueue *queue struct task *task
 @@ -134,6 +140,19 @@
  if the queue is being freed.
  .Pp
  The function
 +.Fn taskqueue_enqueue_odd
 +should be used if it is important that the last queued task is also
 +executed last in the task queue at the given priority level. This
 +function requires two valid task pointers. Depending on which of the
 +tasks passed are queued at the calling moment, the last queued task of
 +the two will get dequeued and put last in the task queue, while not
 +touching the first queued task. If no tasks are queued at the calling
 +moment, this function behaves like
 +.Fn taskqueue_enqueue .
 +This function returns zero if the first task was queued last, one if
 +the second task was queued last.
 +.Pp
 +The function
  .Fn taskqueue_enqueue_fast
  should be used in place 

intr_event_destroy(9)

2010-10-26 Thread Matthew Fleming
It looks like a bug in intr_event_destroy(9): I'm trying to unload a
new driver being developed internally for NVRAM, and I get this
WITNESS warning and hang:


# kldunload rnv
Sleeping on ithdty with the following non-sleepable locks held:
exclusive sleep mutex intr event list (intr event list) r = 0
(0x806f9560) locked @
/data/sb/BR_BONNEVILLE_HW/src/sys/kern/kern_intr.c:404
KDB: stack backtrace:
[801a544d] db_trace_self_wrapper+0x3d
[802e7b26] witness_warn+0x2f6
[802a1a43] _sleep+0xc3
[8026dad5] intr_event_destroy+0xe5
[ff87b05ba805] rnv_pci_detach+0xc5
[802c9414] device_detach+0xb4
[802c974f] devclass_delete_driver+0xdf
[802c991d] driver_module_handler+0x11d
[802843a2] module_unload+0x42
[80279f4b] linker_file_unload+0x19b
[8027aa1b] kern_kldunload+0x10b
[802a2609] isi_syscall+0x99
[804dee3e] ia32_syscall+0x1ce
[804a7e50] Xint0x80_syscall+0x60
--- syscall (444, FreeBSD ELF32, kldunloadf), rip = 0x280c1aff, rsp =
0xd44c, rbp = 0xdc98 ---

Looking at intr_event_destroy, I see this snippet from r157728:


#ifndef notyet
if (ie-ie_thread != NULL) {
ithread_destroy(ie-ie_thread);
ie-ie_thread = NULL;
}
#endif

There is an msleep(9) in ithread_destroy(9).  And everywhere else that
uses notyet has #ifdef, not #ifndef.  So... is this a typo?

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: intr_event_destroy(9)

2010-10-26 Thread Matthew Fleming
On Tue, Oct 26, 2010 at 9:46 AM, John Baldwin j...@freebsd.org wrote:
 On Tuesday, October 26, 2010 11:55:10 am Matthew Fleming wrote:
 It looks like a bug in intr_event_destroy(9): I'm trying to unload a
 new driver being developed internally for NVRAM, and I get this
 WITNESS warning and hang:


 # kldunload rnv
 Sleeping on ithdty with the following non-sleepable locks held:
 exclusive sleep mutex intr event list (intr event list) r = 0
 (0x806f9560) locked @
 /data/sb/BR_BONNEVILLE_HW/src/sys/kern/kern_intr.c:404
 KDB: stack backtrace:
 [801a544d] db_trace_self_wrapper+0x3d
 [802e7b26] witness_warn+0x2f6
 [802a1a43] _sleep+0xc3
 [8026dad5] intr_event_destroy+0xe5
 [ff87b05ba805] rnv_pci_detach+0xc5
 [802c9414] device_detach+0xb4
 [802c974f] devclass_delete_driver+0xdf
 [802c991d] driver_module_handler+0x11d
 [802843a2] module_unload+0x42
 [80279f4b] linker_file_unload+0x19b
 [8027aa1b] kern_kldunload+0x10b
 [802a2609] isi_syscall+0x99
 [804dee3e] ia32_syscall+0x1ce
 [804a7e50] Xint0x80_syscall+0x60
 --- syscall (444, FreeBSD ELF32, kldunloadf), rip = 0x280c1aff, rsp =
 0xd44c, rbp = 0xdc98 ---

 Looking at intr_event_destroy, I see this snippet from r157728:


 #ifndef notyet
       if (ie-ie_thread != NULL) {
               ithread_destroy(ie-ie_thread);
               ie-ie_thread = NULL;
       }
 #endif

 There is an msleep(9) in ithread_destroy(9).  And everywhere else that
 uses notyet has #ifdef, not #ifndef.  So... is this a typo?

 No, it's actually on purpose I think as the other bits under notyet destroy
 the thread when the last handler for it goes away.

 However, ithread_destroy() does not block in any of 7.x or later:

 static void
 ithread_destroy(struct intr_thread *ithread)
 {
        struct thread *td;

        CTR2(KTR_INTR, %s: killing %s, __func__, ithread-it_event-ie_name);
        td = ithread-it_thread;
        thread_lock(td);
        ithread-it_flags |= IT_DEAD;
        if (TD_AWAITING_INTR(td)) {
                TD_CLR_IWAIT(td);
                sched_add(td, SRQ_INTR);
        }
        thread_unlock(td);
 }

 Maybe you have a local change?  If so, you can probably unlock the global
 event_list lock before calling ithread_destroy() (but after the
 TAILQ_REMOVE()) in intr_event_destroy().

Gah, yes, it looks like a local change we can probably do without.

And as it turns out the driver can pass NULL to siw_add() and skip the
intr_event_destroy() anyways.

Thanks for the help!
Sorry for the noise.
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [zfs] Mounting from (...) failed with error 19

2010-10-20 Thread Matthew Fleming
On Wed, Oct 20, 2010 at 1:39 PM, Garrett Cooper gcoo...@freebsd.org wrote:
 On Wed, Oct 20, 2010 at 12:45 PM, Peter Jeremy peterjer...@acm.org wrote:
 On 2010-Oct-20 10:50:38 +0400, KOT MATPOCKuH matpoc...@gmail.com wrote:
 I fixed it with attached patch.
Omg... Why You are using strcmp, but not strncmp(fs, zfs, strlen(zfs))?

 Can you explain why you think it should be strncmp() please.

 I'd say that strcmp is perfectly fine because zfs is a 3 character (4
 if you count NUL) string. The comparison logic is dang near the same:

It wouldn't be about the length of the string, but whether you want a
match on other strings like zfs2, zfs_foo, and anything else
prefixed with zfs.  That's the difference between strcmp and
strncmp(... strlen()).

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: dmesg mangled by acd0 probe

2010-10-03 Thread Matthew Fleming
On Sun, Oct 3, 2010 at 7:16 PM, Steve Kargl
s...@troutmask.apl.washington.edu wrote:
 I suspect that 517 ( are not expected during boot.


 acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00
 (((cd0
  at ata0 bus 0 scbus1 target 0 lun 0
 cd0: SONY CDRWDVD CRX880A KD09 Removable CD-ROM SCSI-0 device

Does this always happen with acd0?  There was another instance I think
mav@ saw for a different driver that showed up when I made the sbuf
drain additions.  I can't think of how my code could cause that, but
there's always a possibility.

Thanks,
matthew

 cd0: 33.000MB/s transfers
 cd0: Attempt to query device size failed: NOT READY, Medium not present
 SMP: AP CPU #1 Launched!
 Root mount waiting for: usbus6 usbus2


 --
 Steve
 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: CACHE_LINE_SIZE too small, so struct vpglocks size alignment doesn't work

2010-10-01 Thread Matthew Fleming
On Fri, Oct 1, 2010 at 1:00 PM, Svatopluk Kraus onw...@gmail.com wrote:
 Hallo,

  a size of 'struct vpglocks' is padded to CACHE_LINE_SIZE size in
 'sys/vm/vm_page.h'
 header file. I work on a 'coldfire' port where CACHE_LINE_SIZE is 16 bytes and
 sizeof(struct mtx) is 20 bytes thus size alignment doesn't work.

  I solved it somehow, but I like to learn how to solve it in spirit
 of FreeBSD.
 There are a couple of possibilities:

 A1. Do nothing for small CACHE_LINE_SIZE.
 A2. Pad to multiple of CACHE_LINE_SIZE.

 B1. use #if with CACHE_LINE_SIZE
 B2. use #if with __coldfire__

 When I use B1 solution I need to known sizeof(struct mtx) value in
 preprocessing time.
 So, is it correct to use something like 'assym.s' magic
 (sys/i386/i386/genassym.c)
 in MI code? Or has someone another suggestion?

What about padding to CACHE_LINE_SIZE - (sizeof(struct vpglocks) 
(CACHE_LINE_SIZE-1)) ?

Some compilers will complain about 0-sized arrays, but gcc isn't one of them.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Examining the VM splay tree effectiveness

2010-09-30 Thread Matthew Fleming
On Thu, Sep 30, 2010 at 9:37 AM, Andre Oppermann an...@freebsd.org wrote:
 Just for the kick of it I decided to take a closer look at the use of
 splay trees (inherited from Mach if I read the history correctly) in
 the FreeBSD VM system suspecting an interesting journey.

 The VM system has two major structures:
  1) the VM map which is per process and manages its address space
    by tracking all regions that are allocated and their permissions.
  2) the VM page table which is per VM map and provides the backing
    memory pages to the regions and interfaces with the pmap layer
    (virtual-physical).

 Both of these are very frequently accessed through memory allocations
 from userspace and page faults from the pmap layer.  Their efficiency
 and SMP scalability is crucial for many operations beginning with fork/
 exec, going through malloc/mmap and ending with free/exit of a process.

 Both the vmmap and page table make use of splay trees to manage the
 entries and to speed up lookups compared to long to traverse linked
 lists or more memory expensive hash tables.  Some structures though
 do have an additional linked list to simplify ordered traversals.

 A splay tree is an interesting binary search tree with insertion,
 lookup and removal performed in O(log n) *amortized* time.  With
 the *amortized* time being the crucial difference to other binary trees.
 On every access *including* lookup it rotates the tree to make the
 just found element the new root node.  For all gory details see:
  http://en.wikipedia.org/wiki/Splay_tree

 This behavior has a few important implications:
  1) the root node and constant rotation works like a cache with
    least recent looked up node at the top and less recently ones
    close to the top;
  2) every lookup that doesn't match the root node, ie. a cache miss,
    causes a rotation of the tree to make the newly found node the new
    root;
  3) during the rotate it has to re-arrange and touch possibly a large
    number of other nodes;
  4) in worst case the tree may resemble a linked list.  A splay tree
    makes no guarantee that it is balanced.

 For the kernel and the splay tree usage in the VM map and page table
 some more implications come up:
  a) the amortization effect/cost only balance if the majority of
    lookups are root node (cache) hits, otherwise the cost of
    rebalancing destroys any advantage and quickly turns into a
    large overhead.
  b) the overhead becomes even worse due to touching many nodes and
    causing a lot of CPU cache line dirtying.  For processes with
    shared memory or threads across CPU's this overhead becomes
    almost excessive because the CPU's stomp on each others cached
    nodes and get their own caches invalidated.  The penalty is a
    high-latency memory refetch not only for the root node lookup
    but every hop in the following rebalancing operation = thrashing.

 To quantify the positive and negative effects of the splay tree I
 instrumented the code and added counters to track the behavior of
 the splay tree in the vmmap and page table.

 The test case is an AMD64 kernel build to get a first overview.
 Other system usages may have different results depending on their
 fork and memory usage patters.

 The results are very interesting:

  The page table shows a *very* poor root node hit rate and an excessive
  amount of rotational node modifications representing almost the
  worst case for splay trees.

 http://chart.apis.google.com/chart?chs=700x300chbh=a,23chds=0,127484769chd=t:16946915,16719791,48872230,131057,74858589,105206121cht=bvgchl=insert|remove|lookup|cachehit|rotates|rotatemodifies

  The vmmap shows a high root node hit rate but still a significant
  amount of rotational node modifications.

 http://chart.apis.google.com/chart?chs=700x300chbh=a,23chds=0,21881583chd=t:724293,723701,20881583,19044544,3719582,4553350cht=bvgchl=insert|remove|lookup|cachehit|rotates|rotatemodifies

 From these observations I come to the conclusion that a splay tree
 is suboptimal for the normal case and quickly horrible even on small
 deviations from the normal case in the vmmap.  For the page table it
 is always horrible.  Not to mention the locking complications due to
 modifying the tree on lookups.

 Based on an examination of other binary trees I put forward the
 hypothesis that a Red-Black tree is a much better, if not the optimal,
 fit for the vmmap and page table.  RB trees are balanced binary trees
 and O(log n) in all cases.  The big advantage in this context is that
 lookups are pure reads and do not cause CPU cache invalidations on
 other CPU's and always only require a read lock without the worst
 case properties of the unbalanced splay tree.  The high cache locality
 of the vmmap lookups can be used with the RB tree as well by simply
 adding a pointer to the least recently found node.  To prevent write locking
 this can be done lazily.  More profiling is required to make
 a non-speculative statement on this 

Re: MAXCPU preparations

2010-09-29 Thread Matthew Fleming
On Wed, Sep 29, 2010 at 1:54 PM, Robert N. M. Watson
rwat...@freebsd.org wrote:

 On 29 Sep 2010, at 12:49, John Baldwin wrote:

 On Tuesday, September 28, 2010 6:24:32 pm Robert N. M. Watson wrote:

 On 28 Sep 2010, at 19:40, Sean Bruno wrote:

 If you go fully dynamic you should use mp_maxid + 1 rather than maxcpus.

 I assume that mp_maxid is the new kern.smp.maxcpus?  Can you inject some
 history here so I can understand why one is better than the other?

 So, unlike maxcpus, mp_maxid is in theory susceptible to races in a brave 
 new world in which we support hotplug CPUs -- a brave new world we're
 not yet ready for, however. If you do use mp_maxid, be aware you need to add 
 one to get the size of the array -- maxcpus is the number of CPUs that
 can be used, whereas mp_maxid is the highest CPU ID (counting from 0) that 
 is used.

 Hmm, I'm not sure that is true.  My feeling on mp_maxid is that it is the
 largest supported CPUID.  Platforms that support hotplug would need to set
 mp_maxid such that it has room for hotpluggable CPUs.  You don't want to
 go reallocate the UMA datastructures for every zone when a CPU is hotplugged
 for example.

 Yes, we'll have to break one (or even both) of two current assumptions with 
 the move to hotplug: contiguous in-use CPU IDs and mp_maxid representing the 
 greatest possible CPU ID as a constant value. The former is guaranteed, but I 
 wonder about the latter. On face value, you might say oh, we know how many 
 sockets there are, but if course, we don't know how many threads will arrive 
 when a package is inserted.  For many subsystems, DPCPU will present a more 
 than adequate answer for avoiding resizing, although not for low-level 
 systems (perhaps such as UMA?). Likewise, on virtual machine platforms where 
 hotplug actually might reflect a longer-term scheduling choice by the 
 admin/hypervisor (i.e., resource partitioning), it may be harder to reason 
 about what the future holds.


As a historical note, when AIX added hotplug CPU support, we kept the
MAXCPU define as the largest number of CPUs supported by the OS image.
At the time it was 256; as it shifted to 1024 there was a large
cleanup effort to eliminate as many statically sized arrays as
possible.

AIX also has an mp_maxid equivalent which changed when new higher core
numbers were used.  For various reasons, new CPUs were added only
while a single CPU was running, so any loop that looked like (for i =
0; i = mp_maxid; i++) could get interrupted by an IPI (the handler
spun until the core was added by the coordinating CPU) and find that
it had a stale mp_maxid value.  This was not a bug in 99% of the uses
of the code, since whatever it was doing with each CPU was either not
atomic (e.g. summing per-cpu counters) or was some kind of
initializing work which was also done by the new CPU before it was
fully awake.

I don't necessarily recommend this as an architected plan for adding
CPUs, but I wanted to offer the historical note that it could work.

Also, CPU id's may not be contiguous with hotplug.  Even if they are
contiguous on boot, there is no reason to assume CPUs are offlined
from highest ID down.  For reasons of cache sharing, the best
performance may be obtained by picking a specific CPU to offline.  It
also may be that it makes more sense to reserve CPU ids so that e.g.
CPUs N*T through N*(2T-1) are all HMT threads on the same core, (for
T-way HMT).  In this case CPU IDs become sparse if HMT is completely
disabled, and the best performance for the box overall is probably
obtained by offlining T3 and T2 for each core while keeping T0 and T1
active.

But discontiguous IDs shouldn't be a problem as all the code I've seen
checks CPU_ABSENT(i) in the loop.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: System reboots building mysql51-client with ZFS v28

2010-09-09 Thread Matthew Fleming
On Thu, Sep 9, 2010 at 4:04 PM, Xin LI delp...@delphij.net wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 On 2010/09/09 15:33, Xin LI wrote:
 On 2010/09/09 15:28, Rob Farmer wrote:
 On Thu, Sep 9, 2010 at 15:14, Xin LI delp...@delphij.net wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Hi, Rob,

 On 2010/09/09 15:01, Rob Farmer wrote:
 [...]
 Just a guess - are you compiling under X?  If so what about if you
 compile under text mode?

 Nope - this is a new install and X isn't running yet. It is installed,
 but not started yet. I ran across this building dependencies for KDE
 (why a desktop requires MySQL server is beyond me, but anyways).

 I just tried to compile MySQL 5.1 client but it passed on my test system
 (the compile time option is slightly different as the VFS debugging is
 not yet enabled).  I'll do another round of test with these VFS options
 enabled.

 Ok - so I shut the system down for about 15 minutes and then tried
 again and it worked.

 So maybe it was heat or something - I had reproduced it about 5 times
 back to back at the exact same spot. I didn't think this laptop had
 cooling problems but I will keep a closer eye on it in the future.

 Oh...  If it's so reproducible like this way then I'm more inclined to
 believe that it's a software issue.  I'll try to see if I can reproduce
 the problem myself anyways, thanks for letting us know :)

 Ok mine went through with these recommended debugging options.

 Note that on my kernel I have bumped FULLGRAPH_SBUF_SIZE to 131072 in
 /sys/kern/subr_witness.c.  I'm not sure if it's related but without that
 I used get a few panics in the past.

As of r212370 on CURRENT this is no longer necessary, as the
sbuf_printf() will drain to the sysctl struct.  The define and its use
are gone.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580

2010-08-17 Thread Matthew Fleming
On Tue, Aug 17, 2010 at 4:04 PM, Kostik Belousov kostik...@gmail.com wrote:
 On Tue, Aug 17, 2010 at 07:42:41PM +0400, pluknet wrote:
 2010/8/16 Kostik Belousov kostik...@gmail.com:
  On Mon, Aug 16, 2010 at 09:07:24PM +0400, pluknet wrote:
  On 16 August 2010 21:05, pluknet pluk...@gmail.com wrote:
   Hi.
  
   Seeing on mostly idle, recently updated current, while closing a file.
   Presumably never reported on ML.
 [...]
 
  Both LORs are valid. The fork performed deep inside the VFS call stack
  is obviously problematic. As a workaround, you may fix the number of
  nfsiods.
 
  Proper fix might consist of creating a shepherd thread which only task
  is to act on the requests on creating new nfsiods.
 
  Would you try to implement this ? I will provide the assistance, if needed.

 Hmm.. I tried to move kproc_create() under shepherd thread and now stuck
 with cp process lockup in [bo_wwait] when cp'ing something on nfs: cp a b.
 Did I screw up something?
 See weird draft patch attached (weird, as I have no idea how to nicely
 exchange data between nfs_nfsiodnew() and shep_thread() thread).
 Most likely, you loose the requests to create nfsiods since the
 existing request in the global variable shep_chan can be overwritten
 by new request. You should either sleep till existing request is serviced,
 or form a queue.

If you sleep for the request to be serviced, this presumably has the
same LOR/deadlock possibility (unless locks are released before
sleep), except now WITNESS can't see the LOR.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Interrupt Problems

2010-07-28 Thread Matthew Fleming
David Naylor wrote:
 I have been having interrupt related problems with various subsystems.  I
 suspect this is related to the changes in the event timer infrastructure.

 The subsystems that have experienced interrupt problems:
  - hda: this is the easiest to reproduce and what I used to isolate the
 commits.  I get ``pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt
 timeout, channel dead'' reported and sound no longer plays.
  - nfe: this has happened on occasion with no reliable way to reproduce.
 ``watchdog timeouts'' are reported.  After this happens all network traffic 
 dies
 and doing `ifconfig nfe0 down; ifconfig nfe0 up' panics the computer.
  - dc: same thing as above.
  - nvidia: has reported interrupt timeouts.  This is independent of the
 locking problem (that is fixed with recently published patch).  No reliable 
 way
 to reproduce, appears to happen when under heavy load.  X freezes as a result.
  - ata: I had a HDD detach twice.  I am not sure if this is related.  I have
 two HDD, each attached to a different controller.

 I tested this by using a kernel built from a cvsup date of 2010/06/20 and
 2010/06/22 (at midnight for both, aka 00:00:00).  The former kernel does not
 exhibit any problems while the latter does.  This problem is also present with
 a kernel from today.

 The motherboard is a N650SLI-DS4L with one graphics card.  See attached for
 more system information.

 Is there anything I can do to help diagnose the problem?

When you say midnight on June 22, is that 11:59pm June 22 or 12:01 am June 22?

I don't expect it, but there's a possibility that r210377 affected
this.  Can you check whether sys/sys/_task.h changes between the
kernel that does work and the one that doesn't?

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Matthew Fleming
   As an aside, this is a quad-core in one package CPU (an X3363). On both
 this box and a similar one with an X5470, console messages continue to
 print out after the system has been halted - press any key to reboot -
 in particular, the shutdown makes a bunch of the behind the scenes man-
 agement stuff like the virtual keyboard and monitor appear. Plugging or
 unplugging USB devices will go through the whole deal of detecting and
 making their service available.

Oops, youre right that other CPUs are running.

The stop_cpus() call is only made if kdb is entered.  doadump() is called out 
of boot() which comes later.  At Isilon weve been running with a patch that 
does stop_cpus() pretty close to the front of panic(9).

As an design decision it seems reasonable to call stop_cpus() early in panic(9) 
simply because most causes for panic means something unexpected, and the sooner 
the other CPUs arent running the more likely it is that they dont do more 
damage, leaving the system in a more useful state for dump or {g,d}db analysis. 
 This should be done before dump or entering kdb.

Im ccing -current@ since I would like a small discussion of moving the 
stop_cpus() to earlier in panic.  If this change is agreeable I can roll up a 
patch and test it on CURRENT.  Im not sure yet how much of the other 
panic-related changes we have made at Isilon would be required.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


sleep bug in taskqueue(9)

2010-04-28 Thread Matthew Fleming
It looks to me like taskqueue_drain(taskqueue_thread, foo) will not
correctly detect whether or not a task is currently running.  The check
is against a field in the taskqueue struct, but for the taskqueue_thread
queue with more than one thread, multiple threads can simultaneously be
running a task, thus stomping over the tq_running field.

I have not seen any problem with the code as-is in actual use, so this
is purely an inspection bug.

The following patch should fix the problem.  Because it changes the size
of struct task I'm not sure if it would be suitable for MFC.

Thanks,
matthew

diff --git a/sys/kern/subr_taskqueue.c b/sys/kern/subr_taskqueue.c
index 8405b3d..3b18269 100644
--- a/sys/kern/subr_taskqueue.c
+++ b/sys/kern/subr_taskqueue.c
@@ -51,7 +51,6 @@ __FBSDID($FreeBSD$);
const char  *tq_name;
taskqueue_enqueue_fntq_enqueue;
void*tq_context;
-   struct task *tq_running;
struct mtx  tq_mutex;
struct thread   **tq_threads;
int tq_tcount;
@@ -233,13 +232,13 @@ taskqueue_run(struct taskqueue *queue)
STAILQ_REMOVE_HEAD(queue-tq_queue, ta_link);
pending = task-ta_pending;
task-ta_pending = 0;
-   queue-tq_running = task;
+   task-ta_flags |= TA_FLAGS_RUNNING;
TQ_UNLOCK(queue);
 
task-ta_func(task-ta_context, pending);
 
TQ_LOCK(queue);
-   queue-tq_running = NULL;
+   task-ta_flags = ~TA_FLAGS_RUNNING;
wakeup(task);
}
 

@@ -256,14 +255,16 @@ taskqueue_drain(struct taskqueue *queue, struct
task *task)
 {
if (queue-tq_spin) {   /* XXX */
mtx_lock_spin(queue-tq_mutex);
-   while (task-ta_pending != 0 || task ==
queue-tq_running)
+   while (task-ta_pending != 0 ||
+   (task-ta_flags  TA_FLAGS_RUNNING) != 0)
msleep_spin(task, queue-tq_mutex, -, 0);
mtx_unlock_spin(queue-tq_mutex);
} else {
WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL,
__func__);
 
mtx_lock(queue-tq_mutex);
-   while (task-ta_pending != 0 || task ==
queue-tq_running)
+   while (task-ta_pending != 0 ||
+   (task-ta_flags  TA_FLAGS_RUNNING) != 0)
msleep(task, queue-tq_mutex, PWAIT, -, 0);
mtx_unlock(queue-tq_mutex);
}
diff --git a/sys/sys/_task.h b/sys/sys/_task.h
index 2a51e1b..c57bdd5 100644
--- a/sys/sys/_task.h
+++ b/sys/sys/_task.h
@@ -36,15 +36,21 @@
  * taskqueue_run().  The first argument is taken from the 'ta_context'
  * field of struct task and the second argument is a count of how many
  * times the task was enqueued before the call to taskqueue_run().
+ *
+ * List of locks
+ * (c) const after init
+ * (q) taskqueue lock
  */
 typedef void task_fn_t(void *context, int pending);
 
 struct task {
-   STAILQ_ENTRY(task) ta_link; /* link for queue */
-   u_short ta_pending; /* count times queued */
-   u_short ta_priority;/* Priority */
-   task_fn_t *ta_func; /* task handler */
-   void*ta_context;/* argument for handler */
+   STAILQ_ENTRY(task) ta_link; /* (q) link for queue */
+   u_char  ta_flags;   /* (q) state of this task */
+#defineTA_FLAGS_RUNNING0x01
+   u_short ta_pending; /* (q) count times queued */
+   u_short ta_priority;/* (c) Priority */
+   task_fn_t *ta_func; /* (c) task handler */
+   void*ta_context;/* (c) argument for handler */
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


[PATCH] Fix LOR #185

2010-02-15 Thread Matthew Fleming
We've seen LOR #185 on http://sources.zabbadoz.net/freebsd/lor.html
locally on and off since October 2007.  This patch has been compiled but
I don't have a reliable way to repro the LOR so it's not been properly
tested.

If someone who has seen this LOR can confirm the patch fixes it, and
could commit, that would be nice.

Thanks,
matthew


bug35626.diff
Description: bug35626.diff
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org