Re: panic: ia64 r255811: deadlkres: possible deadlock detected for 0xe000000012d07b00, blocked for 902743 ticks

2013-10-15 Thread Anton Shterenlikht
From davide.itali...@gmail.com Fri Oct 11 15:39:49 2013

If you're not able to get a full dump, a textdump would be enough.
In your DDB scripts just remove the 'call doadump' step and you should
be done, unless I'm missing something.
Please adjust the script as well to include all the information
requested, as mentioned in my previous link; e.g. 'show lockedvnods' is
not mentioned in the example section of the textdump(4) manpage, but it
could be useful to ease debugging.

It seems 'call doadump' is always needed.
At least it is included in textdump(4) examples.

Also, this tutorial includes it:

http://www.etinc.com/122/Using-FreeBSD-Text-Dumps
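For reference, the persistent setup described in textdump(4) and that tutorial
looks roughly like this (a sketch only -- the script bodies and the rc.conf
knobs below are my own guesses, adjust as needed):

    # /etc/ddb.conf -- loaded by the ddb rc script at boot
    script lockinfo=show locks; show alllocks; show lockedvnods
    script kdb.enter.panic=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call doadump

    # /etc/rc.conf
    ddb_enable="YES"    # load /etc/ddb.conf via ddb(8) at boot
    dumpdev="AUTO"      # so savecore has a dump device to check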

I think I still haven't got textdump right, because
instead of

savecore: writing core to textdump.tar.0

I see:

savecore: writing core to /var/crash/vmcore.9

Thanks

Anton


panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock

2013-10-15 Thread Anton Shterenlikht
From davide.itali...@gmail.com Mon Oct 14 12:50:44 2013

This is fair enough -- If you're still at the ddb prompt, please print
the whole panic message (or at least the address of the lock reported
as deadlocked by DEADLKRES), so that we can at least have a candidate.


Here's another one, followed by savecore deadlock.
ia64 r255488

panic: wrong page state m 0xe0027a9adb40
cpuid = 0
KDB: stack backtrace:
db_trace_self(0x9ffc00158380) at db_trace_self+0x40
db_trace_self_wrapper(0x9ffc00607370) at db_trace_self_wrapper+0x70
kdb_backtrace(0x9ffc00ed0e10, 0x9ffc0058e660, 0x40c, 
0x9ffc010a44a0) at kdb_backtrace+0xc0
vpanic(0x9ffc00dd3fe0, 0xa0009de61118, 0x9ffc00ef9670, 
0x9ffc00ed0bc0) at vpanic+0x260
kassert_panic(0x9ffc00dd3fe0, 0xe0027a9adb40, 0x81f, 
0xe002013cf400, 0x9ffc006a0220, 0x2c60, 0xe002013cf400, 
0xe002013cf418) at kassert_panic+0x120
vn_sendfile(0x8df, 0xd, 0x0, 0x0, 0x0, 0x8df, 0x7fffdfe0, 0x0) at 
vn_sendfile+0x15d0
sys_sendfile(0xe00012aef200, 0xa0009de614e8, 0x10, 0xa0009de61360) 
at sys_sendfile+0x2b0
syscall(0xe000154f2940, 0xd, 0x0, 0xe00012aef200, 0x0, 0x0, 
0x9ffc00ab7280, 0x8) at syscall+0x5e0
epc_syscall_return() at epc_syscall_return
KDB: enter: panic
[ thread pid 5989 tid 100111 ]
Stopped at  kdb_enter+0x92: [I2]addl r14=0xffe2c990,gp ;;
db> 

db> scripts
lockinfo=show locks; show alllocks; show lockedvnods
zzz=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call doadump; reset
db> 

db> run zzz

I get to 

db:0:alltrace>  capture off
db:0:off>  call doadump
Dumping 10220 MB (25 chunks)
  chunk 0: 1 pages ... ok
  chunk 1: 159 pages ... ok
  chunk 2: 256 pages ... ok
  chunk 3: 7680 pages ... ok
  chunk 4: 8192 pages ... ok
  chunk 5: 239734 pages ... ok
  chunk 6: 748 pages ... ok
  chunk 7: 533 pages ... ok
  chunk 8: 21 pages ... ok
  chunk 9: 1572862 pages ... ok
  chunk 10: 781683 pages ... ok
  chunk 11: 512 pages ... ok
  chunk 12: 139 pages ... ok
  chunk 13: 484 pages ... ok
  chunk 14: 1565 pages ... ok
  chunk 15: 1 pages ... ok
  chunk 16: 506 pages ... ok
  chunk 17: 1 pages ... ok
  chunk 18: 3 pages ... ok
  chunk 19: 566 pages ... ok
  chunk 20: 66 pages ... ok
  chunk 21: 1 pages ... ok
  chunk 22: 285 pages ... ok
  chunk 23: 6 pages ... ok
  chunk 24: 354 pages ... ok

Dump complete
= 0
db:0:doadump>  reset

So far, so good.

On reboot I get:

Starting ddb.
ddb: sysctl: debug.ddb.scripting.scripts: Invalid argument
/etc/rc: WARNING: failed to start ddb

This probably already indicates some problem?

Eventually I get to:

savecore: reboot after panic: wrong page state m 0xe0027a9adb40
Oct 15 09:05:50 mech-as28 savecore: reboot after panic: wrong page state m 
0xe0027a9adb40
savecore: writing core to /var/crash/vmcore.9

So here I'm confused.

I think I set up textdump as in the man page.
So I think the core should not be written.
Instead I was expecting ddb.txt, config.txt, etc.,
as in textdump(4).
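
For comparison, when the textdump does take effect, savecore should write a
tarball whose members look like this (member names per textdump(4); the
sequence number and /var/crash location are the defaults and may differ):

    # tar -tf /var/crash/textdump.tar.0
    ddb.txt
    config.txt
    msgbuf.txt
    panic.txt
    version.txt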

Anyway, savecore eventually deadlocks:

panic: deadlkres: possible deadlock detected for 0xe000127b7b00, blocked 
for 901401 ticks

cpuid = 0
KDB: stack backtrace:
db_trace_self(0x9ffc00158380) at db_trace_self+0x40
db_trace_self_wrapper(0x9ffc00607370) at db_trace_self_wrapper+0x70
kdb_backtrace(0x9ffc00ed0e10, 0x9ffc0058e660, 0x40c, 
0x9ffc010a44a0) at kdb_backtrace+0xc0
vpanic(0x9ffc00db8a18, 0xa0009dca7518) at vpanic+0x260
panic(0x9ffc00db8a18, 0x9ffc00db8c70, 0xe000127b7b00, 0xdc119) at 
panic+0x80
deadlkres(0xdc119, 0xe000127b7b00, 0x9ffc00dbb648, 0x9ffc00db89a8) 
at deadlkres+0x420
fork_exit(0x9ffc00e0fca0, 0x0, 0xa0009dca7550) at fork_exit+0x120
enter_userland() at enter_userland
KDB: enter: panic
[ thread pid 0 tid 100053 ]
Stopped at  kdb_enter+0x92: [I2]addl r14=0xffe2c990,gp ;;
db> 

db> scripts
lockinfo=show locks; show alllocks; show lockedvnods
db> run lockinfo
db:0:lockinfo> show locks
db:0:locks>  show alllocks
db:0:alllocks>  show lockedvnods
Locked vnodes

0xe000127cbba8: tag devfs, type VCHR
usecount 1, writecount 0, refcount 19 mountedhere 0xe000126ab200
flags (VI_ACTIVE)
v_object 0xe000127c2b00 ref 0 pages 422
lock type devfs: EXCL by thread 0xe0001269 (pid 21, syncer, tid 
100062)
dev da3p1

0xe000127f4ec0: tag ufs, type VREG
usecount 1, writecount 1, refcount 32934 mountedhere 0
flags (VI_ACTIVE)
v_object 0xe000127f7200 ref 0 pages 1242850
lock type ufs: EXCL by thread 0xe000127b7b00 (pid 805, savecore, tid 
100079)
ino 6500740, on dev da3p1
db> 
db> ps
  pid  ppid  pgrp   uid   state   wmesg wchancmd
  805   80324 0  L+ *vm page  0xe00012402fc0 savecore
  8032424 0  DL+ vm map ( 0xe0001285fa88 sh
  801 1   801 0  Ss  select   0xe00010c296c0 syslogd
  792 1   792 0  Ss  

Re: RFC: support for first boot rc.d scripts

2013-10-15 Thread Nick Hibma
 Yes, it's hard to store state on diskless systems... but I figured
 that anyone building a diskless system would know to not create a
 run firstboot scripts marker.  And not all embedded systems are
 diskless...
 
 The embedded systems we create at $work have readonly root and mfs /var,
 but we do have writable storage on another filesystem.  It would work
 for us (not that we need this feature right now) if there were an rcvar
 that pointed to the marker file.  Of course to make it work, something
 would have to get the alternate filesystem mounted early enough to be
 useful (that is something we do already with a custom rc script).
 
 Indeed... the way my patch currently does things, it looks for the
 firstboot sentinel at the start of /etc/rc, which means it *has* to
 be on /.  Making the path an rcvar is a good idea (updated patch
 attached) but we still need some way to re-probe for that file after
 mounting extra filesystems.

In many cases a simple 

test -f /firstboot && bla_enable='YES' || bla_enable='NO'
rm -f /firstboot

in your specific rc.d script would suffice. Or for installing packages:

for pkg in $PKGS; do
if ! pkg_info $pkg-'[0-9]*' > /dev/null 2>&1; then
pkg_add /some/dir/$pkg.txz
fi
done

I am not quite sure why we need /firstboot handling in /etc/rc.

Perhaps it is a better idea to make this more generic, to move the rc.d script 
containing a 'runonce' keyword to a subdirectory as the last step in rc (or 
make that an rc.d script in itself!). That way you could consider moving it 
back if you need to re-run it. Or have an rc.d script set up something like a 
database after installing a package by creating an rc.d runonce script.

Default dir could be ./run-once relative to the rc.d dir it is in, configurable 
through runonce_directory.

Note: The move would need to be done at the very end of rc.d to prevent rcorder 
returning a different ordering and skipping scripts because of that.

Nick


Re: panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock

2013-10-15 Thread Davide Italiano
On Tue, Oct 15, 2013 at 10:43 AM, Anton Shterenlikht me...@bris.ac.uk wrote:

 Anyway, savecore eventually deadlocks:

 panic: deadlkres: possible deadlock detected for 0xe000127b7b00, blocked 
 for 901401 ticks


[trim]


 Tracing command savecore pid 805 tid 100079 td 0xe000127b7b00
 cpu_switch(0xe000127b7b00, 0xe00011178900, 0xe00012402fc0, 
 0x9ffc005e7e80) at cpu_switch+0xd0
 sched_switch(0xe000127b7b00, 0xe00011178900, 0x9ffc00f15698, 
 0x9ffc00f15680) at sched_switch+0x890
 mi_switch(0x103, 0x0, 0xe000127b7b00, 0x9ffc0062d1f0) at 
 mi_switch+0x3f0
 turnstile_wait(0xe00012402fc0, 0xe00012400480, 0x0, 
 0x9ffc00dcb698) at turnstile_wait+0x960
 __mtx_lock_sleep(0x9ffc010f9998, 0xe000127b7b00, 0xe00012402fc0, 
 0x9ffc00dc0558, 0x742) at __mtx_lock_sleep+0x2f0
 __mtx_lock_flags(0x9ffc010f9980, 0x0, 0x9ffc00dd4a90, 0x742) at 
 __mtx_lock_flags+0x1e0
 vfs_vmio_release(0xa0009ebe72f0, 0xe0027ed2ab70, 0x3, 
 0xa0009ebe736c, 0xa0009ebe7498, 0xa0009ebe72f8, 
 0x9ffc00dd4a90, 0x9ffc010f9680) at vfs_vmio_release+0x290
 getnewbuf(0xe000127f4ec0, 0x0, 0x0, 0x8000, 0xa0009ebe99a8, 0x0, 
 0x9ffc010f0798, 0xa0009ebe72f0) at getnewbuf+0x7e0
 getblk(0xe000127f4ec0, 0x4cbaa, 0x8000, 0x0, 0x0, 0x0, 0x0, 0x0) at 
 getblk+0xee0
 ffs_balloc_ufs2(0xe000127f4ec0, 0x4cbaa, 0xa000c60ba000, 
 0xe00011165a00, 0x7f05, 0xa0009dd79160) at ffs_balloc_ufs2+0x2950
 ffs_write(0xa0009dd79248, 0x3000, 0x265d5) at ffs_write+0x5c0
 VOP_WRITE_APV(0x9ffc00e94ac0, 0xa0009dd79248, 0x0, 0x0) at 
 VOP_WRITE_APV+0x330
 vn_write(0xe000129ae820, 0xa0009dd79360, 0xe00011165a00, 0x0, 
 0xe000129ae830, 0xe000127f4ec0) at vn_write+0x450
 vn_io_fault(0xe000129ae820, 0xa0009dd79360, 0xe00011165a00, 0x0, 
 0xe000127b7b00) at vn_io_fault+0x330
 dofilewrite(0xe000127b7b00, 0x7, 0xe000129ae820, 0xa0009dd79360, 
 0x, 0x0) at dofilewrite+0x180
 kern_writev(0xe000127b7b00, 0x7, 0xa0009dd79360) at kern_writev+0xa0
 sys_write(0xe000127b7b00, 0xa0009dd794e8, 0x9ffc00abac80, 0x48d) 
 at sys_write+0x100
 syscall(0xe000129d04a0, 0x140857000, 0x8000, 0xe000127b7b00, 0x0, 
 0x0, 0x9ffc00ab7280, 0x8) at syscall+0x5e0
 --More--

I'm not commenting on the first panic you got -- but on the deadlock
reported by DEADLKRES. I think that's the vm_page lock.
You can run kgdb /boot/${KERNEL}/kernel where ${KERNEL} is the incriminated one
then l *vfs_vmio_release+0x290
to get the exact point where it fails.
I'm unsure here because 'show alllocks' and 'show locks' outputs are
empty -- are you building your kernel with WITNESS etc..?

Thanks,

-- 
Davide

There are no solved problems; there are only problems that are more
or less solved -- Henri Poincare


Re: vmstat -z: zfs related failures on r255173

2013-10-15 Thread Dmitriy Makarov
Please, any idea, thought, help!
Maybe some information that could be useful for digging -- anything...

The system I'm talking about has a huge problem: performance degradation within a
short time period (a day or two). I don't know whether we can somehow relate these
vmstat failures to the degradation.


 
 Hi all
 
 On CURRENT r255173 we have some interesting values from vmstat -z : REQ = FAIL
 
 [server]# vmstat -z
 ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
 ... skipped
 NCLNODE:                528,      0,       0,       0,       0,   0,   0
 space_seg_cache:         64,      0,  289198,  299554,25932081,25932081,   0
 zio_cache:              944,      0,   37512,   50124,1638254119,1638254119,   0
 zio_link_cache:          48,      0,   50955,   38104,1306418638,1306418638,   0
 sa_cache:                80,      0,   63694,      56,  198643,198643,   0
 dnode_t:                864,      0,  128813,       3,  184863,184863,   0
 dmu_buf_impl_t:         224,      0, 1610024,  314631,157119686,157119686,   0
 arc_buf_hdr_t:          216,      0,82949975,   56107,156352659,156352659,   0
 arc_buf_t:               72,      0, 1586866,  314374,158076670,158076670,   0
 zil_lwb_cache:          192,      0,    6354,    7526, 2486242,2486242,   0
 zfs_znode_cache:        368,      0,   63694,      16,  198643,198643,   0
 . skipped ..
 
 Can anybody explain these strange failures in the zfs-related parameters in 
 vmstat? Can we do something about this, and is it a really bad signal?
 
 Thanks! 



Re: panic: wrong page state m 0xe00000027a9adb40 + savecore deadlock

2013-10-15 Thread Anton Shterenlikht
From davide.itali...@gmail.com Tue Oct 15 11:30:07 2013

On Tue, Oct 15, 2013 at 10:43 AM, Anton Shterenlikht me...@bris.ac.uk wrote:

 Anyway, savecore eventually deadlocks:

 panic: deadlkres: possible deadlock detected for 0xe000127b7b00, blocked 
 for 901401 ticks


[trim]


 Tracing command savecore pid 805 tid 100079 td 0xe000127b7b00
 cpu_switch(0xe000127b7b00, 0xe00011178900, 0xe00012402fc0, 
 0x9ffc005e7e80) at cpu_switch+0xd0
 sched_switch(0xe000127b7b00, 0xe00011178900, 0x9ffc00f15698, 
 0x9ffc00f15680) at sched_switch+0x890
 mi_switch(0x103, 0x0, 0xe000127b7b00, 0x9ffc0062d1f0) at 
 mi_switch+0x3f0
 turnstile_wait(0xe00012402fc0, 0xe00012400480, 0x0, 
 0x9ffc00dcb698) at turnstile_wait+0x960
 __mtx_lock_sleep(0x9ffc010f9998, 0xe000127b7b00, 0xe00012402fc0, 
 0x9ffc00dc0558, 0x742) at __mtx_lock_sleep+0x2f0
 __mtx_lock_flags(0x9ffc010f9980, 0x0, 0x9ffc00dd4a90, 0x742) at 
 __mtx_lock_flags+0x1e0
 vfs_vmio_release(0xa0009ebe72f0, 0xe0027ed2ab70, 0x3, 
 0xa0009ebe736c, 0xa0009ebe7498, 0xa0009ebe72f8, 
 0x9ffc00dd4a90, 0x9ffc010f9680) at vfs_vmio_release+0x290
 getnewbuf(0xe000127f4ec0, 0x0, 0x0, 0x8000, 0xa0009ebe99a8, 0x0, 
 0x9ffc010f0798, 0xa0009ebe72f0) at getnewbuf+0x7e0
 getblk(0xe000127f4ec0, 0x4cbaa, 0x8000, 0x0, 0x0, 0x0, 0x0, 0x0) at 
 getblk+0xee0
 ffs_balloc_ufs2(0xe000127f4ec0, 0x4cbaa, 0xa000c60ba000, 
 0xe00011165a00, 0x7f05, 0xa0009dd79160) at ffs_balloc_ufs2+0x2950
 ffs_write(0xa0009dd79248, 0x3000, 0x265d5) at ffs_write+0x5c0
 VOP_WRITE_APV(0x9ffc00e94ac0, 0xa0009dd79248, 0x0, 0x0) at 
 VOP_WRITE_APV+0x330
 vn_write(0xe000129ae820, 0xa0009dd79360, 0xe00011165a00, 0x0, 
 0xe000129ae830, 0xe000127f4ec0) at vn_write+0x450
 vn_io_fault(0xe000129ae820, 0xa0009dd79360, 0xe00011165a00, 0x0, 
 0xe000127b7b00) at vn_io_fault+0x330
 dofilewrite(0xe000127b7b00, 0x7, 0xe000129ae820, 0xa0009dd79360, 
 0x, 0x0) at dofilewrite+0x180
 kern_writev(0xe000127b7b00, 0x7, 0xa0009dd79360) at kern_writev+0xa0
 sys_write(0xe000127b7b00, 0xa0009dd794e8, 0x9ffc00abac80, 0x48d) 
 at sys_write+0x100
 syscall(0xe000129d04a0, 0x140857000, 0x8000, 0xe000127b7b00, 0x0, 
 0x0, 0x9ffc00ab7280, 0x8) at syscall+0x5e0
 --More--

I'm not commenting on the first panic you got -- but on the deadlock
reported by DEADLKRES. I think that's the vm_page lock.
You can run kgdb /boot/${KERNEL}/kernel where ${KERNEL} is the incriminated one
then l *vfs_vmio_release+0x290
to get the exact point where it fails.

Like this?

# kgdb /boot/kernel/kernel
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as ia64-marcel-freebsd...
(kgdb) l *vfs_vmio_release+0x290
0x9ffc006b8830 is in vfs_vmio_release (/usr/src/sys/kern/vfs_bio.c:1859).
1854            /*
1855             * In order to keep page LRU ordering consistent, put
1856             * everything on the inactive queue.
1857             */
1858            vm_page_lock(m);
1859            vm_page_unwire(m, 0);
1860
1861            /*
1862             * Might as well free the page if we can and it has
1863             * no valid data.  We also free the page if the
(kgdb) 


I'm unsure here because 'show alllocks' and 'show locks' outputs are
empty -- are you building your kernel with WITNESS etc..?

I think so:

# Debugging support.  Always need this:
options KDB # Enable kernel debugger support.
options KDB_TRACE   # Print a stack trace for a panic.
# For full debugger support use (turn off in stable branch):
options DDB # Support DDB
options GDB # Support remote GDB
options DEADLKRES   # Enable the deadlock resolver
options INVARIANTS  # Enable calls of extra sanity checking
options INVARIANT_SUPPORT # required by INVARIANTS
options WITNESS # Enable checks to detect deadlocks and cycles
options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
# textdump(4)
options TEXTDUMP_PREFERRED
options TEXTDUMP_VERBOSE
# http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-deadlocks.html
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC

Also, does this look right:

$ sysctl -a | grep kdb
debug.ddb.scripting.scripts: kdb.enter.panic=textdump set; capture on; run 
lockinfo; show pcpu; bt; ps; alltrace; capture off; call 

Re: claws-mail deadlocking in iconv

2013-10-15 Thread Fabian Keil
Fabian Keil freebsd-lis...@fabiankeil.de wrote:

 After the iconv import claws-mail started to deadlock in iconv every now
 and then on my system, which prevented claws-mail from rendering windows
 or reacting to input.
[...] 
 Did anyone else run into this or can comment on the patch or
 the backtraces?

Thanks for the feedback, everyone. This is now bin/182994:
http://www.freebsd.org/cgi/query-pr.cgi?pr=182994

Fabian




Re: vmstat -z: zfs related failures on r255173

2013-10-15 Thread Allan Jude
On 2013-10-15 07:53, Dmitriy Makarov wrote:
 Please, any idea, thought, help!
 Maybe some information that could be useful for digging -- anything...

 The system I'm talking about has a huge problem: performance degradation within a
 short time period (a day or two). I don't know whether we can somehow relate these
 vmstat failures to the degradation.


  
 Hi all

 On CURRENT r255173 we have some interesting values from vmstat -z : REQ = 
 FAIL

 [server]# vmstat -z
 ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
 ... skipped
 NCLNODE:                528,      0,       0,       0,       0,   0,   0
 space_seg_cache:         64,      0,  289198,  299554,25932081,25932081,   0
 zio_cache:              944,      0,   37512,   50124,1638254119,1638254119,   0
 zio_link_cache:          48,      0,   50955,   38104,1306418638,1306418638,   0
 sa_cache:                80,      0,   63694,      56,  198643,198643,   0
 dnode_t:                864,      0,  128813,       3,  184863,184863,   0
 dmu_buf_impl_t:         224,      0, 1610024,  314631,157119686,157119686,   0
 arc_buf_hdr_t:          216,      0,82949975,   56107,156352659,156352659,   0
 arc_buf_t:               72,      0, 1586866,  314374,158076670,158076670,   0
 zil_lwb_cache:          192,      0,    6354,    7526, 2486242,2486242,   0
 zfs_znode_cache:        368,      0,   63694,      16,  198643,198643,   0
 . skipped ..

 Can anybody explain these strange failures in the zfs-related parameters in 
 vmstat? Can we do something about this, and is it a really bad signal?

 Thanks! 

I am guessing those 'failures' are failures to allocate memory. I'd
recommend you install sysutils/zfs-stats and send the list the output of
'zfs-stats -a'
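
If it helps while you collect that, a quick way to watch only the zones with a
non-zero FAIL count is something like this (a sketch; the field number assumes
the comma-separated layout shown above):

    # vmstat -z | awk -F, '$6+0 > 0 { print }'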

-- 
Allan Jude



Re[3]: vmstat -z: zfs related failures on r255173

2013-10-15 Thread Dmitriy Makarov
[:~]# zfs-stats -a


ZFS Subsystem ReportTue Oct 15 16:48:43 2013


System Information:

Kernel Version: 151 (osreldate)
Hardware Platform:  amd64
Processor Architecture: amd64

ZFS Storage pool Version:   5000
ZFS Filesystem Version: 5

FreeBSD 10.0-CURRENT #3 r255173: Fri Oct 11 17:15:50 EEST 2013 root
16:48  up 16:27, 1 user, load averages: 12,58 12,51 14,44



System Memory:

15.05%  18.76   GiB Active, 0.05%   61.38   MiB Inact
83.42%  103.98  GiB Wired,  0.55%   702.44  MiB Cache
0.92%   1.14GiB Free,   0.01%   16.93   MiB Gap

Real Installed: 128.00  GiB
Real Available: 99.96%  127.95  GiB
Real Managed:   97.41%  124.65  GiB

Logical Total:  128.00  GiB
Logical Used:   98.52%  126.11  GiB
Logical Free:   1.48%   1.89GiB

Kernel Memory:  91.00   GiB
Data:   99.99%  90.99   GiB
Text:   0.01%   13.06   MiB

Kernel Memory Map:  124.65  GiB
Size:   69.88%  87.11   GiB
Free:   30.12%  37.54   GiB



ARC Summary: (HEALTHY)
Memory Throttle Count:  0

ARC Misc:
Deleted:30.38m
Recycle Misses: 25.16m
Mutex Misses:   7.45m
Evict Skips:444.42m

ARC Size:   100.00% 90.00   GiB
Target Size: (Adaptive) 100.00% 90.00   GiB
Min Size (Hard Limit):  44.44%  40.00   GiB
Max Size (High Water):  2:1 90.00   GiB

ARC Size Breakdown:
Recently Used Cache Size:   92.69%  83.42   GiB
Frequently Used Cache Size: 7.31%   6.58GiB

ARC Hash Breakdown:
Elements Max:   14.59m
Elements Current:   99.70%  14.54m
Collisions: 71.31m
Chain Max:  25
Chains: 2.08m



ARC Efficiency: 1.11b
Cache Hit Ratio:93.89%  1.04b
Cache Miss Ratio:   6.11%   67.70m
Actual Hit Ratio:   91.73%  1.02b

Data Demand Efficiency: 90.56%  294.97m
Data Prefetch Efficiency:   9.64%   7.07m

CACHE HITS BY CACHE LIST:
  Most Recently Used:   8.80%   91.66m
  Most Frequently Used: 88.89%  925.41m
  Most Recently Used Ghost: 0.50%   5.16m
  Most Frequently Used Ghost:   2.97%   30.95m

CACHE HITS BY DATA TYPE:
  Demand Data:  25.66%  267.11m
  Prefetch Data:0.07%   681.36k
  Demand Metadata:  72.04%  749.94m
  Prefetch Metadata:2.24%   23.31m

CACHE MISSES BY DATA TYPE:
  Demand Data:  41.15%  27.86m
  Prefetch Data:9.43%   6.38m
  Demand Metadata:  48.71%  32.98m
  Prefetch Metadata:0.71%   478.11k



L2 ARC Summary: (HEALTHY)
Passed Headroom:1.38m
Tried Lock Failures:403.24m
IO In Progress: 1.19k
Low Memory Aborts:  6
Free on Write:  1.69m
Writes While Full:  3.48k
R/W Clashes:608.58k
Bad Checksums:  0
IO Errors:  0
SPA Mismatch:   321.48m

L2 ARC Size: (Adaptive) 268.26  GiB
Header Size:0.85%   2.27GiB

L2 ARC Breakdown:   67.70m
Hit Ratio:  54.97%  37.21m
Miss Ratio: 45.03%  30.48m
Feeds:  62.45k

L2 ARC Buffer:
Bytes Scanned:  531.83  TiB
Buffer Iterations:  

WITNESS: unable to allocate a new witness object

2013-10-15 Thread Anton Shterenlikht
I'm trying to set up textdump(4).

On boot I see:

WITNESS: unable to allocate a new witness object

also

Expensive timeout(9) function: 0x9ffc00e222e0(0xa09ed320) 
0.002434387 s
 kickstart.

Does the first indicate I haven't set up
WITNESS correctly?

What does the second tell me?

Thanks

Anton


Re: WITNESS: unable to allocate a new witness object

2013-10-15 Thread Davide Italiano
On Tue, Oct 15, 2013 at 4:17 PM, Anton Shterenlikht me...@bris.ac.uk wrote:
 I'm trying to set up textdump(4).

 On boot I see:

 WITNESS: unable to allocate a new witness object

 also

It means that you ran out of WITNESS objects on the free list.


 Expensive timeout(9) function: 0x9ffc00e222e0(0xa09ed320) 
 0.002434387 s
  kickstart.


It's output from DIAGNOSTIC, it's triggered when the callout handler
execution time exceeds a given threshold. You can safely ignore it.


Also, please stop spamming the mailing lists with new posts. They all
more or less refer to the same problem. Continuing to post doesn't
encourage people to look at it, nor does it help get it solved
more quickly.

Thanks,

-- 
Davide

There are no solved problems; there are only problems that are more
or less solved -- Henri Poincare


Re: Buildworld with ccache fails

2013-10-15 Thread Bryan Drewery
On Tue, Oct 15, 2013 at 09:52:44AM +0400, Sevan / Venture37 wrote:
 Hi,
 I noticed that back in April changes had been committed to allow ccache
 to build -HEAD world, so I gave it a try again on r256380.
 The build process now fails at building libc.so.7 with error
 /usr/bin/ld: this linker was not configured to use sysroots
 cc: error: linker command failed with exit code 1 (use -v to see invocation)

This is a known failure. I haven't had a chance to look into what the
reason is, but ccache simply doesn't work with clang right now anyway.
Despite the CCACHE_CPP2 change I committed to devel/ccache, there are
still some other issues as well.

 
 
 Sevan / Venture37





Re: ia64: panic: wrong page state m 0xe00000027fcc1900

2013-10-15 Thread Konstantin Belousov
On Tue, Oct 15, 2013 at 04:10:04PM +0100, Anton Shterenlikht wrote:
 This panic is always reproducible by
 starting nginx, and directing the browser
 to poudriere logs/bulk/ia64-default/latest/.

From ddb, do 'show pginfo <address of the page from the panic message>'.
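
For instance, with the page address from the subject:

    db> show pginfo 0xe00000027fcc1900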




Re: [rfc] small bioq patch

2013-10-15 Thread Maksim Yevmenkin
On Fri, Oct 11, 2013 at 5:14 PM, John-Mark Gurney j...@funkthat.com wrote:
 Maksim Yevmenkin wrote this message on Fri, Oct 11, 2013 at 15:39 -0700:
  On Oct 11, 2013, at 2:52 PM, John-Mark Gurney j...@funkthat.com wrote:
 
  Maksim Yevmenkin wrote this message on Fri, Oct 11, 2013 at 11:17 -0700:
  i would like to submit the attached bioq patch for review and
  comments. this is proof of concept. it helps with smoothing disk read
  service times and appears to eliminate outliers. please see attached 
  pictures (about a week worth of data)
 
  - c034 control unmodified system
  - c044 patched system
 
  Can you describe how you got this data?  Were you using the gstat
  code or some other code?

 Yes, it's basically gstat data.

 The reason I ask this is that I don't think the data you are getting
 from gstat is what you think you are...  It accumulates time for a set
 of operations and then divides by the count...  So I'm not sure if the
 stat improvements you are seeing are as meaningful as you might think
 they are...

yes, i'm aware of it. however, i'm not aware of better tools. we
also use dtrace and PCM/PMC. ktrace is not particularly useable for us
because it does not really work well when we push system above 5 Gbps.
in order to actually see any issues we need to push system to 10
Gbps range at least.

  graphs show max/avg disk read service times for both systems across 36
  spinning drives. both systems are relatively busy serving production
  traffic (about 10 Gbps at peak). grey shaded areas on the graphs
  represent time when systems are refreshing their content, i.e. disks
  are both reading and writing at the same time.
 
  Can you describe why you think this change makes an improvement?  Unless
  you're running 10k or 15k RPM drives, 128 seems like a large number.. as
  that's about half the number of IOPs that a normal HD handles in a second..

 Our (Netflix) load is basically random disk io. We have tweaked the system 
 to ensure that our io path is wide enough, I.e. We read 1mb per disk io 
 for majority of the requests. However offsets we read from are all over the 
 place. It appears that we are getting into situation where larger offsets 
 are getting delayed because smaller offsets are jumping ahead of them. 
 Forcing bioq insert tail operation and effectively moving insertion point 
 seems to help avoiding getting into this situation. And, no. We don't use 
 10k or 15k drives. Just regular enterprise 7200 sata drives.

 I assume that the 1mb reads are then further broken up into 8 128kb
 reads? so it's more like every 16 reads in your work load that you
 insert the ordered io...

i'm not sure where 128kb comes from. are you referring to
MAXPHYS/DFLTPHYS? if so, then, no, we have increased *PHYS to 1MB.

 I want to make sure that we choose the right value for this number..
 What number of IOPs are you seeing?

generally we see  100 IOPs per disk on a system pushing 10+ Gbps.
i've experimented with different numbers on our system and i did not
see much of a difference on our workload. i'm up a value of 1024 now.
higher numbers seem to produce slightly bigger difference between
average and max time, but i do not think it's statistically meaningful.
general shape of the curve remains smooth for all tried values so far.

[...]

  Also, do you see a similar throughput of the system?

 Yes. We do see almost identical throughput from both systems.  I have not 
 pushed the system to its limit yet, but having much smoother disk read 
 service time is important for us because we use it as one of the components 
 of system health metrics. We also need to ensure that disk io request is 
 actually dispatched to the disk in a timely manner.

 Per above, have you measured at the application layer that you are
 getting better latency times on your reads?  Maybe by doing a ktrace
 of the io, and calculating times between read and return or something
 like that...

ktrace is not particularly useful. i can see if i can come up with
dtrace probe or something. our application (or rather clients) are
_very_ sensitive to latency. having read service times outliers is not
very good for us.
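
fwiw, something like the untested sketch below might be a starting point
(io provider, time from io start to done keyed on the raw bio argument;
whether the probes behave exactly like this on our builds is an assumption
on my part):

    # dtrace -n '
        io:::start { ts[arg0] = timestamp; }
        io:::done /ts[arg0]/ {
            @svc["io svc time (us)"] = quantize((timestamp - ts[arg0]) / 1000);
            ts[arg0] = 0;
        }'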

 Have you looked at the geom disk schedulers work that Luigi did a few
 years back?  There have been known issues w/ our io scheduler for a
 long time...  If you search the mailing lists, you'll see lots of
 reports from some processes starving out others, probably due to a
 similar issue...  I've seen similar unfair behavior between processes,
 but haven't spent time tracking it down...

yes, we have looked at it. it makes things worse for us, unfortunately.

 It does look like a good improvement though...

 Thanks for the work!

ok :) i'm interested to hear from people who have different workload
profile. for example lots of iops, i.e. very small files reads or
something like that.

thanks,
max

Re: RFC: support for first boot rc.d scripts

2013-10-15 Thread Tim Kientzle
Wonderful!  This capability is long overdue.

On Oct 13, 2013, at 3:58 PM, Colin Percival cperc...@freebsd.org wrote:
 As examples of what such scripts could do:

More examples:

I've been experimenting with putting gpart resize and growfs
into rc.d scripts to construct images that can be dd'ed onto some medium
and then automatically grow to fill the medium.
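
Roughly along these lines, for the archives (a sketch only -- it assumes the
proposed 'firstboot' keyword and an image whose root lives on ada0p2, both of
which are assumptions here):

    #!/bin/sh
    #
    # Hypothetical firstboot script: grow the root partition and filesystem
    # to fill whatever medium the image was dd'ed onto.
    #
    # PROVIDE: growroot
    # REQUIRE: fsck
    # KEYWORD: firstboot

    . /etc/rc.subr

    name="growroot"
    start_cmd="growroot_start"
    stop_cmd=":"

    growroot_start()
    {
            gpart recover ada0        # repair the backup GPT header after dd
            gpart resize -i 2 ada0    # grow partition index 2 to the end of the disk
            growfs -y /dev/ada0p2     # grow the UFS filesystem into the new space
    }

    load_rc_config $name
    run_rc_command "$1"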

When cross-installing ports, there are certain operations
(e.g., updating 'info' database) that can really only be
done after the system next boots.

 I'd like to get this into HEAD in the near future in the hope that I can
 convince re@ that this is a simple enough (and safe enough) change to merge
 before 10.0-RELEASE.

Please.

Tim




Re: RFC: support for first boot rc.d scripts

2013-10-15 Thread Allan Jude
On 2013-10-15 15:33, Tim Kientzle wrote:
 Wonderful!  This capability is long overdue.

 On Oct 13, 2013, at 3:58 PM, Colin Percival cperc...@freebsd.org wrote:
 As examples of what such scripts could do:
 More examples:

 I've been experimenting with putting gpart resize and growfs
 into rc.d scripts to construct images that can be dd'ed onto some medium
 and then automatically grow to fill the medium.
I didn't think of that, that is a 'killer app' for rpi and other such
devices, or any kind of 'embedded' image really

 When cross-installing ports, there are certain operations
 (e.g., updating 'info' database) that can really only be
 done after the system next boots.

 I'd like to get this into HEAD in the near future in the hope that I can
 convince re@ that this is a simple enough (and safe enough) change to merge
 before 10.0-RELEASE.
 Please.

 Tim




-- 
Allan Jude



Re: RFC: support for first boot rc.d scripts

2013-10-15 Thread Matthew Fleming
On Sun, Oct 13, 2013 at 3:58 PM, Colin Percival cperc...@freebsd.orgwrote:

 Hi all,

 I've attached a very simple patch which makes /etc/rc:

 1. Skip any rc.d scripts with the firstboot keyword if /var/db/firstboot
 does not exist,

 2. If /var/db/firstboot and /var/db/firstboot-reboot exist after running
 rc.d
 scripts, reboot.

 3. Delete /var/db/firstboot (and firstboot-reboot) after the first boot.


We use something like this at work.  However, our version creates a file
after the firstboot scripts have run, and doesn't run if the file exists.

Is there a reason to prefer one choice over the other?  Naively I'd expect
it to be better to run when the file doesn't exist, creating when done; it
solves the problem of making sure the magic file exists before first boot,
for the other polarity.

Thanks,
matthew


Re: RFC: support for first boot rc.d scripts

2013-10-15 Thread Colin Percival
On 10/15/13 01:58, Nick Hibma wrote:
 Indeed... the way my patch currently does things, it looks for the
 firstboot sentinel at the start of /etc/rc, which means it *has* to
 be on /.  Making the path an rcvar is a good idea (updated patch
 attached) but we still need some way to re-probe for that file after
 mounting extra filesystems.
 
 In many cases a simple 
 
   test -f /firstboot && bla_enable='YES' || bla_enable='NO'
   rm -f /firstboot
 
 in your specific rc.d script would suffice. [...]
 I am not quite sure why we need /firstboot handling in /etc/rc.

Your suggestion wouldn't work if you have several scripts doing it;
the first one would remove the sentinel and the others wouldn't run.
In my EC2 code I have a single script which runs after all the others
and removes the sentinel file, but that still means that every script
has to be executed on every boot (even if just to check if it should
do anything); putting the logic into /etc/rc would allow rcorder to
skip those scripts entirely.
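
In other words, roughly this shape in /etc/rc (a sketch of the idea, not the
actual patch; the variable names here are made up):

    skip_firstboot="-s firstboot"
    if [ -e ${firstboot_sentinel} ]; then
            skip_firstboot=""
    fi
    files=`rcorder ${skip_firstboot} /etc/rc.d/* 2>/dev/null`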

 Perhaps it is a better idea to make this more generic, to move the rc.d 
 script containing a 'runonce' keyword to a subdirectory as the last step in 
 rc (or make that an rc.d script in itself!). That way you could consider 
 moving it back if you need to re-run it. Or have an rc.d script setup 
 something like a database after installing a package by creating a rc.d 
 runonce script.
 
 Default dir could be ./run-once relative to the rc.d dir it is in, 
 configurable through runonce_directory .
 
 Note: The move would need to be done at the very end of rc.d to prevent 
 rcorder returning a different ordering and skipping scripts because of that.

I considered this, but decided that the most common use of 'run once'
would be 'run when the system is first booted', and it
would be much simpler to provide just the firstboot functionality.

-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid


Re: RFC: support for first boot rc.d scripts

2013-10-15 Thread Colin Percival
On 10/15/13 13:09, Matthew Fleming wrote:
 We use something like this at work.  However, our version creates a file after
 the firstboot scripts have run, and doesn't run if the file exists.
 
 Is there a reason to prefer one choice over the other?  Naively I'd expect it 
 to
 be better to run when the file doesn't exist, creating when done; it solves 
 the
 problem of making sure the magic file exists before first boot, for the other
 polarity.

I don't see that making sure that the magic file exists is a problem, since
you'd also need to make sure you have knobs turned on in /etc/rc.conf and/or
extra rc.d scripts installed.

In a very marginal sense, deleting a file is safer than creating one, since if
the filesystem is full you can delete but not create.  It also seems to me that
the sensible polarity is that having something extra lying around makes extra
things happen rather than inhibiting them.

But probably the best argument has to do with upgrading systems -- if you update
a 9.2-RELEASE system to 10.1-RELEASE and there's a first boot script in that
new release, you don't want to have it accidentally get run simply because you
failed to create a /firstboot file during the upgrade process.

-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid



Re: panic: uma_zfree: Freeing to non free bucket index.

2013-10-15 Thread John Baldwin
On Monday, October 14, 2013 4:44:28 am Anton Shterenlikht wrote:
 
 BTW, I see in dmesg:
 
 Starting ddb.
 ddb: sysctl: debug.ddb.scripting.scripts: Invalid argument
 /etc/rc: WARNING: failed to start ddb
 
 What is that about?
 
 
 panic: uma_zfree: Freeing to non free bucket index.
 cpuid = 0
 KDB: stack backtrace:
 db_trace_self(0x9ffc00158380) at db_trace_self+0x40
 db_trace_self_wrapper(0x9ffc00607370) at db_trace_self_wrapper+0x70
 kdb_backtrace(0x9ffc00ed0e10, 0x9ffc0058e660, 0x40c, 
0x9ffc010a44a0) at kdb_backtrace+0xc0
 vpanic(0x9ffc00dfc468, 0xa000e26e0fd8, 0x9ffc00ef9670, 
0x9ffc00ed0bc0) at vpanic+0x260
 kassert_panic(0x9ffc00dfc468, 0xe00015e25f90, 0xe00015e243e0, 
0xe0027ffd5200) at kassert_panic+0x120
 uma_zfree_arg(0xe0027ffccfc0, 0xe00015e243e0, 0x0) at 
uma_zfree_arg+0x2d0
 g_destroy_bio(0xe00015e243e0, 0x9ffc004ad4a0, 0x30a, 0x30a) at 
g_destroy_bio+0x30
 g_disk_done(0xe00015e243e0, 0xe00015e15d10, 0xe00012672700, 
0x9ffc006b18c0) at g_disk_done+0x140
 biodone(0xe00015e243e0, 0x9ffc00e0e150, 0xe00010c24030, 0x0, 
0x0, 0x0, 0x9ffc00066890, 0x614) at biodone+0x180
 dadone(0xe00012672600, 0xe00012541000, 0xe00015e243e0, 0x7) at 
dadone+0x620
 camisr_runqueue(0xe00011a2dc00, 0xe00012541054, 0xe00012541000, 
0x135d) at camisr_runqueue+0x6c0
 camisr(0xe00011a2dc20, 0xe00011a2dc00, 0x9ffc00bee9d0, 
0xa000e26e1548) at camisr+0x260
 intr_event_execute_handlers(0xe000111764a0, 0xe0001118d998, 
0xe00011191c00, 0x0) at intr_event_execute_handlers+0x280
 ithread_loop(0xe00011192f00, 0xa000e26e1550, 0xe00011192f14, 
0xe0001118d99c) at ithread_loop+0x1b0
 fork_exit(0x9ffc00e12a90, 0xe00011192f00, 0xa000e26e1550) at 
fork_exit+0x120
 enter_userland() at enter_userland
 KDB: enter: panic
 [ thread pid 12 tid 100015 ]
 Stopped at  kdb_enter+0x92: [I2]addl r14=0xffe2c990,gp ;;
 db> 
 
 db> scripts
 lockinfo=show locks; show alllocks; show lockedvnods
 db> run lockinfo
 db:0:lockinfo> show locks
 db:0:locks>  show alllocks
 db:0:alllocks>  show lockedvnods
 Locked vnodes
 
 0xe0001ab39ba8: tag ufs, type VDIR
 usecount 1, writecount 0, refcount 3 mountedhere 0
 flags (VI_ACTIVE)
 v_object 0xe0001cd30900 ref 0 pages 0
 lock type ufs: EXCL by thread 0xe000183d9680 (pid 41389, cpdup, tid 
100121)
 ino 5467932, on dev da5p1
 
 0xe00015ed3ba8: tag ufs, type VDIR
 usecount 1, writecount 0, refcount 3 mountedhere 0
 flags (VI_ACTIVE)
 v_object 0xe0001cd33e00 ref 0 pages 0
 lock type ufs: EXCL by thread 0xe00012a28900 (pid 41421, cpdup, tid 
100092)
 ino 5467948, on dev da5p1
 
 0xe0001ab16938: tag ufs, type VREG
 usecount 1, writecount 0, refcount 3 mountedhere 0
 flags (VI_ACTIVE)
 v_object 0xe0001cd98a00 ref 0 pages 1
 lock type ufs: EXCL by thread 0xe00018494000 (pid 41337, cpdup, tid 
100137)
 ino 5469420, on dev da5p1
 
 0xe0001b2503b0: tag ufs, type VREG
 usecount 1, writecount 0, refcount 1 mountedhere 0
 flags (VI_ACTIVE)
 lock type ufs: EXCL by thread 0xe00012a28900 (pid 41421, cpdup, tid 
100092)
 ino 5469421, on dev da5p1
 
 0xe0001ab2a760: tag ufs, type VREG
 usecount 1, writecount 0, refcount 1 mountedhere 0
 flags (VI_ACTIVE)
 lock type ufs: EXCL by thread 0xe000183d9680 (pid 41389, cpdup, tid 
100121)
 ino 5469422, on dev da5p1
 db> 
 db> script zzz=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; reset
 db> run zzz

I think 'reset' is going to reset without doing a dump?
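
i.e. per the earlier textdump thread, the script probably wants a 'call doadump'
before the reset, something like (a sketch):

    script zzz=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call doadump; reset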

-- 
John Baldwin


Please shorten ZFS disk names.

2013-10-15 Thread James R. Van Artsdalen
BLACKIE:/root# uname -a
FreeBSD BLACKIE.housenet.jrv 10.0-BETA1 FreeBSD 10.0-BETA1 #0 r256428M:
Sun Oct 13 23:46:54 CDT 2013
r...@clank.housenet.jrv:/usr/obj/usr/src/sys/GENERIC  amd64

This pool is on da{0,1,2,3,4,5,6,7} - I think, only da4 is sure

NAME                                                          STATE READ WRITE CKSUM
z03                                                           ONLINE   0 0 0
  raidz2-0                                                    ONLINE   0 0 0
    diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z300HSTK  ONLINE   0 0 0
    da4                                                       ONLINE   0 0 0
    diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z300HTCQ  ONLINE   0 0 0
    diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z300JDT5  ONLINE   0 0 0
    diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z300HTCE  ONLINE   0 0 0
    diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z300HTS7  ONLINE   0 0 0
    diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z300JBN1  ONLINE   0 0 0
    diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z300HTAP  ONLINE   0 0 0

another example:

BLACKIE:/usr/src# zpool status
  pool: BLACKIE
 state: ONLINE
  scan: none requested
config:

NAME  STATE READ WRITE CKSUM
BLACKIE   ONLINE   0 0 0
  gptid/3d882ab0-3588-11e3-b6bc-002590c08004  ONLINE   0 0 0

Based on the hardware config that's either ada0p3 or ada1p3.  Whichever
it is I want to mirror it onto the other, but I don't know the names to use
for src and dst.


Re: Please shorten ZFS disk names.

2013-10-15 Thread Glen Barber
On Tue, Oct 15, 2013 at 05:49:06PM -0500, James R. Van Artsdalen wrote:
 BLACKIE:/root# uname -a
 FreeBSD BLACKIE.housenet.jrv 10.0-BETA1 FreeBSD 10.0-BETA1 #0 r256428M:
 Sun Oct 13 23:46:54 CDT 2013
 r...@clank.housenet.jrv:/usr/obj/usr/src/sys/GENERIC  amd64
 
 This pool is on da{0,1,2,3,4,5,6,7} - I think, only da4 is sure
 
 NAME 
 STATE READ WRITE CKSUM
 z03  
 ONLINE   0 0 0
   raidz2-0   
 ONLINE   0 0 0
 diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z300HSTK 

 [...]

 Based on the hardware config that's either ada0p3 or ada1p3.  Whichever
 it is I want to mirror it onto the other, but I don't know the names to use
 for src and dst.

You can set kern.geom.label.gptid.enable=0 in loader.conf(5), which will
use the gptid.

Glen





Re: Please shorten ZFS disk names.

2013-10-15 Thread Glen Barber
On Tue, Oct 15, 2013 at 07:07:47PM -0400, Glen Barber wrote:
  Based on the hardware config that's either ada0p3 or ada1p3.  Whichever
  it is I want to mirror it onto the other, but I don't know the names to use
  for src and dst.
 
 You can set kern.geom.label.gptid.enable=0 in loader.conf(5), which will
 use the gptid.
 

Which will *not* use the gptid...

Glen





Re: Please shorten ZFS disk names.

2013-10-15 Thread Allan Jude
On 2013-10-15 19:07, Glen Barber wrote:
 On Tue, Oct 15, 2013 at 05:49:06PM -0500, James R. Van Artsdalen wrote:
 BLACKIE:/root# uname -a
 FreeBSD BLACKIE.housenet.jrv 10.0-BETA1 FreeBSD 10.0-BETA1 #0 r256428M:
 Sun Oct 13 23:46:54 CDT 2013
 r...@clank.housenet.jrv:/usr/obj/usr/src/sys/GENERIC  amd64

 This pool is on da{0,1,2,3,4,5,6,7} - I think, only da4 is sure

 NAME 
 STATE READ WRITE CKSUM
 z03  
 ONLINE   0 0 0
   raidz2-0   
 ONLINE   0 0 0
 diskid/DISK-%20%20%20%20%20%20%20%20%20%20%20%20Z300HSTK 

 [...]

 Based on the hardware config that's either ada0p3 or ada1p3.  Whichever
  it is I want to mirror it onto the other, but I don't know the names to use
 for src and dst.
 You can set kern.geom.label.gptid.enable=0 in loader.conf(5), which will
 use the gptid.

 Glen

In this case, it is the disk_ident that is being used, not the GPT
label, so you want to set: kern.geom.label.disk_ident.enable=0 in
/boot/loader.conf and then zfs won't see that device alias, and will
show the expected device name
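
In /boot/loader.conf terms, combining the two suggestions in this thread
(a sketch):

    # hide diskid/DISK-... aliases
    kern.geom.label.disk_ident.enable="0"
    # hide gptid/... aliases
    kern.geom.label.gptid.enable="0"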

-- 
Allan Jude



Re: vi drop outs with Bus error

2013-10-15 Thread Julian H. Stacey
Hi, Reference:
 From: Julian H. Stacey j...@berklix.com 
 Date: Mon, 14 Oct 2013 08:33:35 +0200 

Julian H. Stacey wrote:
 Anyone else seeing vi dropping out after a while with Bus error & no core?
 
 Seen on 10.0-ALPHA4 & now on 10.0-ALPHA5 
 (after buildkernel installkernel buildworld installworld )
 
 It's not hardware, the laptop is stable & has compiled 594 ports so far,
 cd /usr/bin ; ls -l nvi*
 -r-xr-xr-x  1 root  wheel  402064 Oct 12 02:42 nvi.4*
 -r-xr-xr-x  1 root  wheel  402432 Oct 12 02:42 nvi.5*
 file nvi*
 nvi.4: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically \
linked (uses shared libs), for FreeBSD 10.0 (155), stripped
 nvi.5: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically \
linked (uses shared libs), for FreeBSD 10.0 (155), stripped

I'm no longer seeing drop outs with Bus error, instead, after
xterm -sl 1024 -g 80x24 -j -n lapr  -e rlogin -D 10beta1host 
vi freezes within the xterm after I do an X11 mouse resize
(maybe that same SIGWINCH was causing Bus Error before, as resizes are something
I tend to do a lot without remembering :-)

Anyone else see it ?

Cheers,
Julian
-- 
Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com
 Interleave replies below like a play script.  Indent old text with > .
 Send plain text, not quoted-printable, HTML, base64, or multipart/alternative.


amd64 minidump slowness

2013-10-15 Thread Bryan Venteicher
Hi,

At $JOB, we have machines with 400GB RAM where even the smallest
15GB amd64 minidump takes well over an hour. The major cause of
the slowness is that in minidumpsys(), blk_write() is called
PAGE_SIZE at a time. This causes blk_write() to poll the console
for the Ctrl-C abort once per page.

The attached patch changes blk_write() to be called with a run of
physically contiguous pages. This reduced the dump time by over a
magnitude. Of course, blk_write() could also be changed to poll
the console less frequently (like only on every IO).

If anybody else dumps on machines with lots of RAM, it would be
nice to know the difference this patch makes. I've got a second
set of patches that further reduces the dump time by over half that
I'll try to clean up soon.

http://people.freebsd.org/~bryanv/patches/minidump.patch

commit 25f9e82e4ac93e71c6cf06fe2faa1899967db725
Author: Bryan Venteicher bryanventeic...@gmail.com
Date:   Sun Sep 29 13:56:42 2013 -0500

Call blk_write() with a run of physically contiguous pages

Previously, blk_write() was being called one page at a time, which
would cause it to poll the console for every page. This change makes
dumping an order of magnitude faster, and is especially useful on large memory
machines.

diff --git a/sys/amd64/amd64/minidump_machdep.c b/sys/amd64/amd64/minidump_machdep.c
index f14c539..26b2b31 100644
--- a/sys/amd64/amd64/minidump_machdep.c
+++ b/sys/amd64/amd64/minidump_machdep.c
@@ -221,7 +221,8 @@ minidumpsys(struct dumperinfo *di)
 	vm_offset_t va;
 	int error;
 	uint64_t bits;
-	uint64_t *pml4, *pdp, *pd, *pt, pa;
+	uint64_t *pml4, *pdp, *pd, *pt, start_pa, pa;
+	size_t sz;
 	int i, ii, j, k, n, bit;
 	int retry_count;
 	struct minidumphdr mdhdr;
@@ -412,18 +413,29 @@ minidumpsys(struct dumperinfo *di)
 	}
 
 	/* Dump memory chunks */
-	/* XXX cluster it up and use blk_dump() */
-	for (i = 0; i < vm_page_dump_size / sizeof(*vm_page_dump); i++) {
+	for (i = 0, start_pa = 0, sz = 0;
+	    i < vm_page_dump_size / sizeof(*vm_page_dump); i++) {
 		bits = vm_page_dump[i];
 		while (bits) {
 			bit = bsfq(bits);
 			pa = (((uint64_t)i * sizeof(*vm_page_dump) * NBBY) + bit) * PAGE_SIZE;
-			error = blk_write(di, 0, pa, PAGE_SIZE);
-			if (error)
-				goto fail;
+			if (sz == 0 || start_pa + sz == pa) {
+				if (sz == 0)
+					start_pa = pa;
+				sz += PAGE_SIZE;
+			} else {
+				error = blk_write(di, 0, start_pa, sz);
+				if (error)
+					goto fail;
+				start_pa = pa;
+				sz = PAGE_SIZE;
+			}
 			bits &= ~(1ul << bit);
 		}
 	}
+	error = blk_write(di, 0, start_pa, sz);
+	if (error)
+		goto fail;
 
 	error = blk_flush(di);
 	if (error)

Re: What happened to nslookup?

2013-10-15 Thread Kevin Oberman
On Sun, Oct 13, 2013 at 5:47 PM, Julian Elischer jul...@freebsd.org wrote:

 On 10/12/13 10:28 AM, David Wolfskill wrote:

 On Sat, Oct 12, 2013 at 02:14:28AM +, Thomas Mueller wrote:

 ...
 Thanks for info!

 Glad to help.

  I saw that bind was removed from the current branch because of security
 problems,

 It was removed, but I believe that there was a bit more to it than
 security problems.

 I think it was just a personal preference that managed to get communicated
 as important, and no-one had the energy or will to argue about it.
 (that's the way software projects often work.. loudest and most persistent
 voice wins).


  but didn't know nslookup was part of BIND.

 Now I see in $PORTSDIR/dns/bind-tools/pkg-plist

 bin/dig
 bin/host
 bin/nslookup

 so host is also part of BIND?

 :-}  The version of host we had when BIND was part of base was part of
 BIND, yes.  Looking in src/usr.bin/host/Makefile, I see:

 # $FreeBSD: head/usr.bin/host/Makefile 255949 2013-09-30 17:23:45Z des $

 LDNSDIR=${.CURDIR}/../../contrib/ldns
 LDNSHOSTDIR=${.CURDIR}/../../contrib/ldns-host
 ...

 which indicates that this is a re-implementation of host as
 provided by contrib/ldns.

  I will remember to use host in the future.

 I have found it generally easy to use (easier by far than nslookup).

 Peace,
 david



nslookup(1) was deprecated about a decade ago because it often provides
misleading results when used for DNS troubleshooting. It generally works
fine for simply turning a name to an address or vice-versa.

People should really use host(1) for simple lookups. It provides the same
information and does it in a manner that will not cause misdirection when
things are broken.

If you REALLY want to dig (sorry) into DNS behavior or problems, learn to
use dig(1). It does the same as host(1) or nslookup(1) in its simplest
form but has an extremely large number of options to adjust the query in a
variety of ways to allow real analysis of DNS behavior.
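
For example (names and server picked arbitrarily for illustration):

    $ host www.freebsd.org
    $ dig +short www.freebsd.org AAAA
    $ dig @8.8.8.8 freebsd.org MX +norecurse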

I'd love to see nslookup simply vanish, but I expect it to be around and
causing grief until the day I die (which I hope will still be at least a
couple of decades down the road.)

-- 
R. Kevin Oberman, Network Engineer
E-mail: rkober...@gmail.com