Re: About panic: bufwrite: buffer is not busy???

2011-03-08 Thread Freddie Cash
Not sure if the CC: line needs to be trimmed, leaving it as is for now.


On Sun, Feb 20, 2011 at 7:46 AM, Jeremy Chadwick
free...@jdc.parodius.com wrote:
 On Sun, Feb 20, 2011 at 10:30:52AM -0500, Mike Tancsa wrote:
 On 2/20/2011 9:33 AM, Andrey Smagin wrote:
  On week -current I have the same problem: my box panicked every 2-15 min. I
  resolved the problem by unplugging the network connectors from 2 Intel em
  (82574L) cards. Last time I thought it was an mpd5-related panic, but mpd5
  works with another re interface integrated on the MB. I think it may be an
  em-related panic, or em+mpd5.

 The latest panic I saw didn't have anything to do with em.  Are you sure
 your crashes are because of the NIC driver?

 Not to mention, the error string the OP provided (see Subject) is only
 contained in one file: sys/ufs/ffs/ffs_vfsops.c, function
 ffs_bufwrite().  So, that would be some kind of weird filesystem-related
 issue, not NIC-specific.  I have no idea how to debug said problem.

I can semi-reliably reproduce this panic message on a 9-CURRENT box,
with sources from March 7.

On this box, it happens every other time I start hastd.

hastd creates the 12 GEOM providers used to create a ZFS pool.  A
simple "service hastd onestart" will generate the panic.

Is there any extra info needed to help track this down?

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: About panic: bufwrite: buffer is not busy???

2011-02-21 Thread Kostik Belousov
On Sun, Feb 20, 2011 at 10:30:52AM -0500, Mike Tancsa wrote:
 On 2/20/2011 9:33 AM, Andrey Smagin wrote:
  On week -current I have the same problem: my box panicked every 2-15 min. I
  resolved the problem by unplugging the network connectors from 2 Intel em
  (82574L) cards. Last time I thought it was an mpd5-related panic, but mpd5
  works with another re interface integrated on the MB. I think it may be an
  em-related panic, or em+mpd5.
 
 The latest panic I saw didn't have anything to do with em.  Are you sure
 your crashes are because of the NIC driver?
 
 The latest I saw was on Friday.
 
 # kgdb /usr/obj/usr/src/sys/router/kernel.debug vmcore.11
 (kgdb) bt
 #0  doadump () at pcpu.h:231
 #1  0xc04a51f9 in db_fncall (dummy1=1, dummy2=0, dummy3=-106856,
 dummy4=0xc6b9696c ) at /usr/src/sys/ddb/db_command.c:548
 #2  0xc04a55f1 in db_command (last_cmdp=0xc096f73c, cmd_table=0x0,
 dopager=1) at /usr/src/sys/ddb/db_command.c:445
 #3  0xc04a574a in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
 #4  0xc04a764d in db_trap (type=12, code=0) at
 /usr/src/sys/ddb/db_main.c:229
 #5  0xc068ba7e in kdb_trap (type=12, code=0, tf=0xc6b96b94) at
 /usr/src/sys/kern/subr_kdb.c:546
 #6  0xc088056f in trap_fatal (frame=0xc6b96b94, eva=52) at
 /usr/src/sys/i386/i386/trap.c:937
 #7  0xc0880830 in trap_pfault (frame=0xc6b96b94, usermode=0, eva=52) at
 /usr/src/sys/i386/i386/trap.c:859
 #8  0xc0880d4a in trap (frame=0xc6b96b94) at
 /usr/src/sys/i386/i386/trap.c:532
 #9  0xc086716c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
 #10 0xc0657a16 in uihold (uip=0x0) at /usr/src/sys/kern/kern_resource.c:1248
 #11 0xc0654ec9 in crcopy (dest=0xce3ee800, src=0xce3ee600) at
 /usr/src/sys/kern/kern_prot.c:1873
 #12 0xc0654fd1 in crcopysafe (p=0xc90cc810, cr=0xce3ee800) at
 /usr/src/sys/kern/kern_prot.c:1950
 #13 0xc0656d7f in seteuid (td=0xc9196b80, uap=0xc6b96cec) at
 /usr/src/sys/kern/kern_prot.c:615
 #14 0xc06985ff in syscallenter (td=0xc9196b80, sa=0xc6b96ce4) at
 /usr/src/sys/kern/subr_trap.c:315
 #15 0xc0880884 in syscall (frame=0xc6b96d28) at
 /usr/src/sys/i386/i386/trap.c:1061
 #16 0xc08671d1 in Xint0x80_syscall () at
 /usr/src/sys/i386/i386/exception.s:264
 #17 0x0033 in ?? ()
 
 (kgdb) frame 10
 #10 0xc0657a16 in uihold (uip=0x0) at /usr/src/sys/kern/kern_resource.c:1248
 1248	{
 (kgdb) list
 1243	 * Place another refcount on a uidinfo struct.
 1244	 */
 1245	void
 1246	uihold(uip)
 1247		struct uidinfo *uip;
 1248	{
 1249	
 1250		refcount_acquire(&uip->ui_ref);
 1251	}
 1252	
 (kgdb) p *uip
 Cannot access memory at address 0x0
 (kgdb) p uip
 $1 = (struct uidinfo *) 0x0
 (kgdb)
Is this reproducible?
What system version is it?

Could you please go to frame 12 and show the output of p *p and
p *(p->p_ucred)?




Re: About panic: bufwrite: buffer is not busy???

2011-02-21 Thread Mike Tancsa
On 2/21/2011 4:10 PM, Kostik Belousov wrote:
 Is this reproducible?

The box seems to have been triggering a number of bugs.
g...@freebsd.org's netgraph patch seems to have fixed one of them, and Max seems
to have fixed two others.  This one, I am not sure about. I can re-enable memguard to
randomly sample again, which is what seemed to have caught / triggered it.

 What system version is it?
 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #11: Thu Feb 17 i386, 4G of RAM

 
 Could you please go to frame 12 and show the output of p *p and
 p *(p->p_ucred)?


(kgdb) frame 12
#12 0xc0654fd1 in crcopysafe (p=0xc90cc810, cr=0xce3ee800) at 
/usr/src/sys/kern/kern_prot.c:1950
1950		crcopy(cr, oldcred);
(kgdb) list
1945			PROC_UNLOCK(p);
1946			crextend(cr, groups);
1947			PROC_LOCK(p);
1948			oldcred = p->p_ucred;
1949		}
1950		crcopy(cr, oldcred);
1951	
1952		return (oldcred);
1953	}
1954	
(kgdb) p *(p->p_ucred)
$1 = {cr_ref = 3373030400, cr_uid = 3460374784, cr_ruid = 3231313392, cr_svuid 
= 7196, cr_ngroups = 0, cr_rgid = 503415038, cr_svgid = 0, cr_uidinfo = 0x0, 
cr_ruidinfo = 0x0, 
  cr_prison = 0x0, cr_pspare = 0x, cr_flags = 4294967295, cr_pspare2 = 
{0x0, 0x0}, cr_label = 0x, cr_audit = {ai_auid = 0, ai_mask = 
{am_success = 0, 
  am_failure = 1298034100}, ai_termid = {at_port = 3, at_type = 1, at_addr 
= {0, 64, 0, 0}}, ai_asid = 0, ai_flags = 0}, cr_groups = 0xc9e37900, 
cr_agroups = 16}
(kgdb) p *p
$2 = {p_list = {le_next = 0xc93ed560, le_prev = 0xc9187ac0}, p_threads = 
{tqh_first = 0xc9196b80, tqh_last = 0xc9196b88}, p_slock = {lock_object = {
  lo_name = 0xc08efca2 "process slock", lo_flags = 720896, lo_data = 0, 
lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xce3ee600, p_fd = 0xc9559100, 
p_fdtol = 0x0, 
  p_stats = 0xc90cd600, p_limit = 0xc912d600, p_limco = {c_links = {sle = 
{sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 
0x0, c_func = 0, 
c_lock = 0xc90cc898, c_flags = 0, c_cpu = 0}, p_sigacts = 0xc911f000, 
p_flag = 268435713, p_state = PRS_NORMAL, p_pid = 565, p_hash = {le_next = 0x0, 
le_prev = 0xc8d148d4}, 
  p_pglist = {le_next = 0x0, le_prev = 0xc90c85c8}, p_pptr = 0xc8d2b000, 
p_sibling = {le_next = 0xc93ed560, le_prev = 0xc9187b3c}, p_children = 
{lh_first = 0x0}, p_mtx = {
lock_object = {lo_name = 0xc08efc95 "process lock", lo_flags = 21168128, 
lo_data = 0, lo_witness = 0x0}, mtx_lock = 3373886336}, p_ksi = 0xc908f9b0, 
p_sigqueue = {
sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, 
sq_list = {tqh_first = 0x0, tqh_last = 0xc90cc8d0}, sq_proc = 0xc90cc810, 
sq_flags = 1}, 
  p_oppid = 0, p_vmspace = 0xc93f0e80, p_swtick = 6600, p_realtimer = 
{it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 
0}}, p_ru = {ru_utime = {
  tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss 
= 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, 
ru_nswap = 0, 
ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 
0, ru_nvcsw = 0, ru_nivcsw = 0}, p_rux = {rux_runtime = 109046064880, 
rux_uticks = 1368, 
rux_sticks = 5393, rux_iticks = 0, rux_uu = 10366008, rux_su = 40860399, 
rux_tu = 51225136}, p_crux = {rux_runtime = 0, rux_uticks = 0, rux_sticks = 0, 
rux_iticks = 0, 
rux_uu = 0, rux_su = 0, rux_tu = 0}, p_profthreads = 0, p_exitthreads = 0, 
p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0xc95bf96c, 
p_lock = 0, 
  p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, 
p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 
0x0, p_aioinfo = 0x0, 
  p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0x0, p_boundary_count = 0, 
p_pendingcnt = 0, p_itimers = 0x0, p_magic = 3203398350, p_osrel = 802500, 
  p_comm = "zebra", '\0' <repeats 14 times>, p_pgrp = 0xc90c85c0, p_sysent = 
0xc095c800, p_args = 0xc90c8440, p_cpulimit = 9223372036854775807, p_nice = 0 
'\0', p_fibnum = 0, 
  p_xstat = 0, p_klist = {kl_list = {slh_first = 0x0}, kl_lock = 0xc062a990 
<knlist_mtx_lock>, kl_unlock = 0xc062a940 <knlist_mtx_unlock>, 
kl_assert_locked = 0xc06275f0 <knlist_mtx_assert_locked>, 
kl_assert_unlocked = 0xc0627600 <knlist_mtx_assert_unlocked>, kl_lockarg = 
0xc90cc898}, p_numthreads = 1, p_md = {
md_ldt = 0x0}, p_itcallout = {c_links = {sle = {sle_next = 0x0}, tqe = 
{tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock 
= 0x0, c_flags = 16, 
c_cpu = 0}, p_acflag = 1, p_peers = 0x0, p_leader = 0xc90cc810, p_emuldata 
= 0x0, p_label = 0x0, p_sched = 0xc90ccac0, p_ktr = {stqh_first = 0x0, 
stqh_last = 0xc90ccaa0}, 
  p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0x0, p_pwait = {cv_description = 
0xc08f00ef "ppwait", cv_waiters = 0}, p_dbgwait = {cv_description = 0xc08f00f6 
"dbgwait", 
cv_waiters = 0}}
(kgdb) 

-- 
---
Mike 
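
For readers following frames 10-12: the dumped credential shows cr_uidinfo = 0x0, and crcopy() passes that pointer on to uihold(), which dereferences it. The following is only a minimal userland model in C of that chain, not the kernel code. The struct layouts are heavily simplified and every model_* name is made up for illustration; unlike the kernel routines, the model reports the NULL pointer instead of faulting on it.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; real layouts differ. */
struct model_uidinfo {
	unsigned int ui_ref;			/* reference count */
};

struct model_ucred {
	struct model_uidinfo *cr_uidinfo;	/* 0x0 in the dumped credential */
	struct model_uidinfo *cr_ruidinfo;	/* 0x0 in the dumped credential */
};

/*
 * Model of uihold(): take another reference on a uidinfo.
 * Returns 0 on success, -1 if uip is NULL (where the kernel would fault).
 */
static int
model_uihold(struct model_uidinfo *uip)
{
	if (uip == NULL)
		return (-1);
	uip->ui_ref++;
	return (0);
}

/*
 * Model of the part of crcopy() that copies the credential and takes
 * references on its uidinfo pointers.
 * Returns 0 on success, -1 if the source credential has a NULL
 * uidinfo pointer, i.e. looks corrupted or already freed.
 */
static int
model_crcopy(struct model_ucred *dest, const struct model_ucred *src)
{
	if (src->cr_uidinfo == NULL || src->cr_ruidinfo == NULL)
		return (-1);
	*dest = *src;
	(void)model_uihold(dest->cr_uidinfo);
	(void)model_uihold(dest->cr_ruidinfo);
	return (0);
}
```

With a credential whose two uidinfo pointers are NULL, as in the dump above, model_crcopy() returns -1 without touching dest. The implausible cr_ref and cr_uid values in the dump suggest the whole struct ucred was overwritten or freed, so a NULL check like this would hide the symptom rather than fix the cause.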

Re: About panic: bufwrite: buffer is not busy???

2011-02-20 Thread Mike Tancsa
On 2/20/2011 9:33 AM, Andrey Smagin wrote:
 On week -current I have the same problem: my box panicked every 2-15 min. I
 resolved the problem by unplugging the network connectors from 2 Intel em
 (82574L) cards. Last time I thought it was an mpd5-related panic, but mpd5 works
 with another re interface integrated on the MB. I think it may be an em-related
 panic, or em+mpd5.

The latest panic I saw didn't have anything to do with em.  Are you sure
your crashes are because of the NIC driver?

The latest I saw was on Friday.

# kgdb /usr/obj/usr/src/sys/router/kernel.debug vmcore.11
(kgdb) bt
#0  doadump () at pcpu.h:231
#1  0xc04a51f9 in db_fncall (dummy1=1, dummy2=0, dummy3=-106856,
dummy4=0xc6b9696c ) at /usr/src/sys/ddb/db_command.c:548
#2  0xc04a55f1 in db_command (last_cmdp=0xc096f73c, cmd_table=0x0,
dopager=1) at /usr/src/sys/ddb/db_command.c:445
#3  0xc04a574a in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xc04a764d in db_trap (type=12, code=0) at
/usr/src/sys/ddb/db_main.c:229
#5  0xc068ba7e in kdb_trap (type=12, code=0, tf=0xc6b96b94) at
/usr/src/sys/kern/subr_kdb.c:546
#6  0xc088056f in trap_fatal (frame=0xc6b96b94, eva=52) at
/usr/src/sys/i386/i386/trap.c:937
#7  0xc0880830 in trap_pfault (frame=0xc6b96b94, usermode=0, eva=52) at
/usr/src/sys/i386/i386/trap.c:859
#8  0xc0880d4a in trap (frame=0xc6b96b94) at
/usr/src/sys/i386/i386/trap.c:532
#9  0xc086716c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#10 0xc0657a16 in uihold (uip=0x0) at /usr/src/sys/kern/kern_resource.c:1248
#11 0xc0654ec9 in crcopy (dest=0xce3ee800, src=0xce3ee600) at
/usr/src/sys/kern/kern_prot.c:1873
#12 0xc0654fd1 in crcopysafe (p=0xc90cc810, cr=0xce3ee800) at
/usr/src/sys/kern/kern_prot.c:1950
#13 0xc0656d7f in seteuid (td=0xc9196b80, uap=0xc6b96cec) at
/usr/src/sys/kern/kern_prot.c:615
#14 0xc06985ff in syscallenter (td=0xc9196b80, sa=0xc6b96ce4) at
/usr/src/sys/kern/subr_trap.c:315
#15 0xc0880884 in syscall (frame=0xc6b96d28) at
/usr/src/sys/i386/i386/trap.c:1061
#16 0xc08671d1 in Xint0x80_syscall () at
/usr/src/sys/i386/i386/exception.s:264
#17 0x0033 in ?? ()

(kgdb) frame 10
#10 0xc0657a16 in uihold (uip=0x0) at /usr/src/sys/kern/kern_resource.c:1248
1248	{
(kgdb) list
1243	 * Place another refcount on a uidinfo struct.
1244	 */
1245	void
1246	uihold(uip)
1247		struct uidinfo *uip;
1248	{
1249	
1250		refcount_acquire(&uip->ui_ref);
1251	}
1252	
(kgdb) p *uip
Cannot access memory at address 0x0
(kgdb) p uip
$1 = (struct uidinfo *) 0x0
(kgdb)

 
 
 Wed, 16 Feb 2011 12:08:30 -0500, message from Andrew Boyer 
 abo...@averesystems.com:
 
 Moving this to -current and -stable and following up...

 Something is broken with coredumps on stable/8 amd64.  I tried a vanilla
 8.2-RC3 and yesterday's csup of stable/8; neither can dump a core with 
 'sysctl
 debug.kdb.panic=1'.

 For the 8.2-RC3 / amd64 / GENERIC install, I used the memstick image,
 installed on ad7 (a 250GB SATA drive), used the default partition map, and 
 set
 dumpdev to AUTO.

 I added enough tracing to show that the second panic is due to the syncer
 process flushing buffers to the other filesystems in parallel with the dump. 
 I've seen this panic and a similar one 'buffer not locked' coming from
 ffs_write().  One time out of about 30 the core ran to completion, but slowly
 (~1MB/sec).  Other times the dump just locks up completely with no other
 output.

 Does anyone know what might have changed to expose this problem?

 I don't ever see it under 7.1.

 Thanks,
 Andrew

 On Feb 3, 2011, at 12:11 AM, Eugene Grosbein wrote:

 On 02.02.2011 00:50, Gleb Smirnoff wrote:

 E Uptime: 8h3m51s
 E Dumping 4087 MB (3 chunks)
 E   chunk 0: 1MB (150 pages) ... ok
 E   chunk 1: 3575MB (915088 pages) 3559 3543panic: bufwrite: buffer is not
 busy???
 E cpuid = 3
 E Uptime: 8h3m52s
 E Automatic reboot in 15 seconds - press a key on the console to abort
 Can you add KDB_TRACE option to kernel? Your boxes for some reason can't
 dump core, but with this option we will have at least trace.

 I see Mike Tancsa's box has bufwrite: buffer is not busy??? problem too.
 Has anyone a thought how to fix generation of crashdumps?

 Eugene Grosbein


 ___
 freebsd-...@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-net
 To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org

 --
 Andrew Boyer abo...@averesystems.com




 
 


-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

Re: About panic: bufwrite: buffer is not busy???

2011-02-20 Thread Andrew Boyer

On Feb 20, 2011, at 10:46 AM, Jeremy Chadwick wrote:

 On Sun, Feb 20, 2011 at 10:30:52AM -0500, Mike Tancsa wrote:
 On 2/20/2011 9:33 AM, Andrey Smagin wrote:
 On week -current I have the same problem: my box panicked every 2-15 min. I
 resolved the problem by unplugging the network connectors from 2 Intel em
 (82574L) cards. Last time I thought it was an mpd5-related panic, but mpd5
 works with another re interface integrated on the MB. I think it may be an
 em-related panic, or em+mpd5.
 
 The latest panic I saw didn't have anything to do with em.  Are you sure
 your crashes are because of the NIC driver?
 
 Not to mention, the error string the OP provided (see Subject) is only
 contained in one file: sys/ufs/ffs/ffs_vfsops.c, function
 ffs_bufwrite().  So, that would be some kind of weird filesystem-related
 issue, not NIC-specific.  I have no idea how to debug said problem.
 

The issue is the file system activity occurring in parallel with the coredump, 
which is strange.  It seems like everything else should be halted before the 
dump begins but I couldn't find a place in the code that actually tries to stop 
the other CPUs.

My question isn't about the initial panic (I was using the sysctl to provoke 
one), but about the secondary panic.

This is on 8-core systems.

-Andrew

--
Andrew Boyer abo...@averesystems.com
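
The idea that "everything else should be halted before the dump begins" can be pictured with a small userland model in C. This is only a sketch under stated assumptions, not how the FreeBSD dump path works or was later fixed: the flag and every model_* name below are hypothetical, and a single flag check is not race-free on its own, because a flusher that passed the check just before the dump started would still run in parallel with it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: set once the crash dump has started. */
static atomic_bool model_dump_in_progress;

/* Number of flushes that actually reached the (pretend) disk. */
static int model_flushes_issued;

/* Called by the (pretend) panic path right before dumping memory. */
static void
model_dump_begin(void)
{
	atomic_store(&model_dump_in_progress, true);
}

/*
 * Model of a background flusher such as the syncer.
 * Returns true if it wrote a buffer, false if it backed off
 * because a dump is running.
 */
static bool
model_syncer_flush_one(void)
{
	if (atomic_load(&model_dump_in_progress))
		return (false);
	model_flushes_issued++;
	return (true);
}
```

Before model_dump_begin() is called the flusher writes normally; afterwards it backs off. Actually stopping the other CPUs, which is what the message above was looking for in the code, would close the remaining window that a flag alone leaves open.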






Re: About panic: bufwrite: buffer is not busy???

2011-02-20 Thread Jeremy Chadwick
On Sun, Feb 20, 2011 at 10:30:52AM -0500, Mike Tancsa wrote:
 On 2/20/2011 9:33 AM, Andrey Smagin wrote:
  On week -current I have the same problem: my box panicked every 2-15 min. I
  resolved the problem by unplugging the network connectors from 2 Intel em
  (82574L) cards. Last time I thought it was an mpd5-related panic, but mpd5
  works with another re interface integrated on the MB. I think it may be an
  em-related panic, or em+mpd5.
 
 The latest panic I saw didn't have anything to do with em.  Are you sure
 your crashes are because of the NIC driver?

Not to mention, the error string the OP provided (see Subject) is only
contained in one file: sys/ufs/ffs/ffs_vfsops.c, function
ffs_bufwrite().  So, that would be some kind of weird filesystem-related
issue, not NIC-specific.  I have no idea how to debug said problem.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP 4BD6C0CB |
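
As a rough illustration of what the panic string is complaining about: a buffer handed to the write routine is expected to be locked ("busy") by the caller already. The sketch below is a userland model in C, not the code from sys/ufs/ffs/ffs_vfsops.c; the struct and the model_* functions are invented for illustration, and the exact condition the kernel tests is an assumption here.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct buf. */
struct model_buf {
	bool	b_locked;	/* true while some thread holds the buffer lock */
	int	b_writes;	/* writes issued through this model */
};

/* The caller takes ownership of the buffer before doing I/O on it. */
static void
model_buf_lock(struct model_buf *bp)
{
	bp->b_locked = true;
}

/* The caller gives the buffer back once the I/O is done. */
static void
model_buf_unlock(struct model_buf *bp)
{
	bp->b_locked = false;
}

/*
 * Model of the sanity check behind the panic string.
 * Returns 0 if the write was issued, -1 where the kernel would
 * panic with "bufwrite: buffer is not busy???".
 */
static int
model_bufwrite(struct model_buf *bp)
{
	if (!bp->b_locked)
		return (-1);	/* buffer was not locked by the caller */
	bp->b_writes++;
	return (0);
}
```

In this model a write on an unlocked buffer is rejected, while the lock, write, unlock sequence succeeds. A filesystem writer that runs while the normal locking rules no longer hold would trip the same kind of check.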



Re: About panic: bufwrite: buffer is not busy???

2011-02-16 Thread Andrew Boyer
Moving this to -current and -stable and following up...

Something is broken with coredumps on stable/8 amd64.  I tried a vanilla 
8.2-RC3 and yesterday's csup of stable/8; neither can dump a core with 'sysctl 
debug.kdb.panic=1'.

For the 8.2-RC3 / amd64 / GENERIC install, I used the memstick image, installed 
on ad7 (a 250GB SATA drive), used the default partition map, and set dumpdev to 
AUTO.

I added enough tracing to show that the second panic is due to the syncer 
process flushing buffers to the other filesystems in parallel with the dump.  
I've seen this panic and a similar one 'buffer not locked' coming from 
ffs_write().  One time out of about 30 the core ran to completion, but slowly 
(~1MB/sec).  Other times the dump just locks up completely with no other output.

Does anyone know what might have changed to expose this problem?

I don't ever see it under 7.1.

Thanks,
 Andrew

On Feb 3, 2011, at 12:11 AM, Eugene Grosbein wrote:

 On 02.02.2011 00:50, Gleb Smirnoff wrote:
 
 E Uptime: 8h3m51s
 E Dumping 4087 MB (3 chunks)
 E   chunk 0: 1MB (150 pages) ... ok
 E   chunk 1: 3575MB (915088 pages) 3559 3543panic: bufwrite: buffer is not 
 busy???
 E cpuid = 3
 E Uptime: 8h3m52s
 E Automatic reboot in 15 seconds - press a key on the console to abort
 Can you add KDB_TRACE option to kernel? Your boxes for some reason can't
 dump core, but with this option we will have at least trace.
 
 I see Mike Tancsa's box has bufwrite: buffer is not busy??? problem too.
 Has anyone a thought how to fix generation of crashdumps?
 
 Eugene Grosbein
 
 

--
Andrew Boyer abo...@averesystems.com



