Re: Help debugging stable/10

2015-06-17 Thread Shane Ambler

On 17/06/2015 17:22, Hans Petter Selasky wrote:

On 06/17/15 09:24, Shane Ambler wrote:

On 24/03/2015 01:55, Shane Ambler wrote:

On 25/12/2014 23:03, Andriy Gapon wrote:

On 25/12/2014 11:29, Hans Petter Selasky wrote:

The cam_sim_free() is stuck, blocking the rest of that controller from
enumerating. It might look like a non-USB stack issue.

MAV: Do you have some ideas where to start looking, now we have a dump? Any
refcounts to check in particular?


Apparently sim->refcount > 0.
Not sure how to check who has the reference(s).



Can anyone think of something I can try?

To recap, to save you going back through the history:
After running 9.0 - 9.2 for 3 years I upgraded to 10.1RC3 and started
getting a locking issue. Most new processes fail to start; top and ps
failing are the usual indicators. The most info I have got is a back
trace using kgdb, and in 4 instances I got output from procstat -kk -a.

On several occasions I have found that after inserting a usb memstick
the device failed to be created, leaving me unable to mount the
filesystem without a restart.

I then switched to stable/10 in hopes of a fix finding its way in.

The back traces I have been able to collect (and a dmesg) are listed at
http://shaneware.biz/freebsddebugdata/

This is my everyday desktop machine. I am now running

FreeBSD leader.local 10.1-STABLE FreeBSD 10.1-STABLE #11 r283839: Thu Jun  4 17:41:28 ACST 2015 root@leader.local:/usr/obj/usr/src/sys/GENERIC  amd64

I can only say it appears to be getting worse, though I may just be
getting sick of having to restart nearly every day. Lately it seems that
the less I do the quicker it locks up. Twice this week I have restarted
and then let it sit while I went out; after returning I get maybe
10-15 mins before I have to restart. One of those was less than an hour
of uptime.

While running poudriere I have got past 1 day uptime but it locks up
harder and I don't usually get a chance to record any data.




Hi,

One solution is to use fuse instead of the native fs, until the CAM/SCSI
refcount issues are resolved.

--HPS


If you refer to the usb stick, it locks up without inserting one.
I'll try with the extra fs kmods unloaded and see how it goes.


--
FreeBSD - the place to B...Software Developing

Shane Ambler

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Help debugging stable/10

2015-06-17 Thread Shane Ambler

On 24/03/2015 01:55, Shane Ambler wrote:

On 25/12/2014 23:03, Andriy Gapon wrote:

On 25/12/2014 11:29, Hans Petter Selasky wrote:

The cam_sim_free() is stuck, blocking the rest of that controller from
enumerating. It might look like a non-USB stack issue.

MAV: Do you have some ideas where to start looking, now we have a dump? Any
refcounts to check in particular?


Apparently sim->refcount > 0.
Not sure how to check who has the reference(s).



Can anyone think of something I can try?

To recap, to save you going back through the history:
After running 9.0 - 9.2 for 3 years I upgraded to 10.1RC3 and started
getting a locking issue. Most new processes fail to start; top and ps
failing are the usual indicators. The most info I have got is a back
trace using kgdb, and in 4 instances I got output from procstat -kk -a.
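
A saved procstat -kk -a dump can be sifted for threads sitting in one kernel function, for example cam_sim_free. The following is only a minimal sketch: the helper name threads_in is made up for illustration, and it assumes each thread line starts with a numeric PID and TID and that stack frames are printed as name+0xoffset.

```python
import re

# One kernel stack frame as assumed above, e.g. "cam_sim_free+0x5a".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+")


def threads_in(procstat_text, function):
    """Return (pid, tid, frames) for each thread whose stack contains `function`.

    Hypothetical helper, not part of procstat or of FreeBSD.
    """
    hits = []
    for line in procstat_text.splitlines():
        fields = line.split()
        # Skip the header and any line that does not begin with PID and TID.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        frames = FRAME_RE.findall(line)
        if function in frames:
            hits.append((int(fields[0]), int(fields[1]), frames))
    return hits
```

After something like procstat -kk -a > procstat.txt, calling threads_in(open("procstat.txt").read(), "cam_sim_free") would list the threads whose kernel stack passes through that function, which may help narrow down who is waiting on the sim.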

On several occasions I have found that after inserting a usb memstick
the device failed to be created, leaving me unable to mount the
filesystem without a restart.

I then switched to stable/10 in hopes of a fix finding its way in.

The back traces I have been able to collect (and a dmesg) are listed at
http://shaneware.biz/freebsddebugdata/

This is my everyday desktop machine. I am now running

FreeBSD leader.local 10.1-STABLE FreeBSD 10.1-STABLE #11 r283839: Thu Jun  4 17:41:28 ACST 2015 root@leader.local:/usr/obj/usr/src/sys/GENERIC  amd64

I can only say it appears to be getting worse, though I may just be
getting sick of having to restart nearly every day. Lately it seems that
the less I do the quicker it locks up. Twice this week I have restarted
and then let it sit while I went out; after returning I get maybe
10-15 mins before I have to restart. One of those was less than an hour
of uptime.

While running poudriere I have got past 1 day uptime but it locks up
harder and I don't usually get a chance to record any data.
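
Since the harder lock-ups leave no time to collect anything by hand, one option is to leave a small logger running that saves procstat output at intervals, so the last snapshot before a hang is already on disk. This is a rough sketch under assumptions: the function names, the /var/tmp/procstat-log directory and the 60 second interval are all arbitrary choices, and if the hang prevents new processes from starting, the logger may well stall along with everything else.

```python
import os
import subprocess
import time
from pathlib import Path


def snapshot(cmd, outdir, timeout=30):
    """Run cmd once, write its output to a timestamped file, return the path."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    path = outdir / time.strftime("procstat-%Y%m%d-%H%M%S.txt")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout)
        body = result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        body = "command timed out after %d seconds\n" % timeout
    with open(path, "w") as f:
        f.write(body)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before the next hang
    return path


def watch(cmd=("procstat", "-kk", "-a"),
          outdir="/var/tmp/procstat-log",  # assumed location, adjust to taste
          interval=60):
    """Take a snapshot every `interval` seconds until interrupted."""
    while True:
        snapshot(list(cmd), outdir)
        time.sleep(interval)
```

Running watch() as root from a terminal or at boot would leave a trail of files; the newest one before a restart is the interesting one. Old snapshots are never pruned in this sketch.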


--
FreeBSD - the place to B...Software Developing

Shane Ambler



Re: Help debugging stable/10

2015-06-17 Thread Hans Petter Selasky

On 06/17/15 09:24, Shane Ambler wrote:

On 24/03/2015 01:55, Shane Ambler wrote:

On 25/12/2014 23:03, Andriy Gapon wrote:

On 25/12/2014 11:29, Hans Petter Selasky wrote:

The cam_sim_free() is stuck, blocking the rest of that controller from
enumerating. It might look like a non-USB stack issue.

MAV: Do you have some ideas where to start looking, now we have a dump? Any
refcounts to check in particular?


Apparently sim->refcount > 0.
Not sure how to check who has the reference(s).



Can anyone think of something I can try?

To recap, to save you going back through the history:
After running 9.0 - 9.2 for 3 years I upgraded to 10.1RC3 and started
getting a locking issue. Most new processes fail to start; top and ps
failing are the usual indicators. The most info I have got is a back
trace using kgdb, and in 4 instances I got output from procstat -kk -a.

On several occasions I have found that after inserting a usb memstick
the device failed to be created, leaving me unable to mount the
filesystem without a restart.

I then switched to stable/10 in hopes of a fix finding its way in.

The back traces I have been able to collect (and a dmesg) are listed at
http://shaneware.biz/freebsddebugdata/

This is my everyday desktop machine. I am now running

FreeBSD leader.local 10.1-STABLE FreeBSD 10.1-STABLE #11 r283839: Thu Jun  4 17:41:28 ACST 2015 root@leader.local:/usr/obj/usr/src/sys/GENERIC  amd64

I can only say it appears to be getting worse, though I may just be
getting sick of having to restart nearly every day. Lately it seems that
the less I do the quicker it locks up. Twice this week I have restarted
and then let it sit while I went out; after returning I get maybe
10-15 mins before I have to restart. One of those was less than an hour
of uptime.

While running poudriere I have got past 1 day uptime but it locks up
harder and I don't usually get a chance to record any data.




Hi,

One solution is to use fuse instead of the native fs, until the CAM/SCSI 
refcount issues are resolved.


--HPS


Re: Help debugging stable/10

2015-03-23 Thread Shane Ambler

On 25/12/2014 23:03, Andriy Gapon wrote:

On 25/12/2014 11:29, Hans Petter Selasky wrote:

The cam_sim_free() is stuck, blocking the rest of that controller from
enumerating. It might look like a non-USB stack issue.

MAV: Do you have some ideas where to start looking, now we have a dump? Any
refcounts to check in particular?


Apparently sim->refcount > 0.
Not sure how to check who has the reference(s).



I am now running
FreeBSD leader.local 10.1-STABLE FreeBSD 10.1-STABLE #4 r279865: Thu Mar 12 14:25:28 ACDT 2015 root@leader.local:/usr/obj/usr/src/sys/GENERIC  amd64

I have just tried running with a custom kernel, GENERIC plus DDB GDB
DEADLKRES INVARIANTS INVARIANT_SUPPORT WITNESS WITNESS_SKIPSPIN

When starting Xorg I got a duplicate lock message coming from nvidia.
After running for maybe 20 mins it just reset without warning. I then
decided to go back to the GENERIC kernel, and on restarting I got some
lock reversal messages.
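
The WITNESS reports quoted further down ("acquiring duplicate lock" and "lock order reversal") can be tallied by lock pair, which makes it easier to see whether the same two locks keep turning up. What follows is a sketch only: lock_pairs is a made-up helper, and it assumes each kernel message sits on a single line, as in /var/log/messages, rather than wrapped as in a mail.

```python
import re
from collections import Counter

# Matches the "1st ..." / "2nd ..." lines of a WITNESS report, with or
# without a lock address and with or without a syslog prefix.
LOCK_RE = re.compile(
    r"\b(1st|2nd)\s+(?:0x[0-9a-fA-F]+\s+)?(.+?)\s+@\s+(\S+):(\d+)\s*$"
)


def lock_pairs(log_text):
    """Count (first, second) lock sites seen in WITNESS reports in log_text."""
    pairs = Counter()
    first = None
    for line in log_text.splitlines():
        m = LOCK_RE.search(line)
        if not m:
            continue
        site = "%s @ %s:%s" % (m.group(2), m.group(3), m.group(4))
        if m.group(1) == "1st":
            first = site  # remember it until the matching "2nd" line
        elif first is not None:
            pairs[(first, site)] += 1
            first = None
    return pairs
```

Feeding it the contents of the log and printing lock_pairs(text).most_common() gives the pairs ordered by how often they were reported. It says nothing about whether a given reversal is harmful.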

nvidia-driver-346.47

nvidia0: GeForce GT 520 on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: Boot video device
hdac0: NVIDIA GF119 HDA Controller mem 0xfb08-0xfb083fff irq 17 at device 0.1 on pci1


Full dmesg and other kgdb outputs I have collected are at -
http://shaneware.biz/freebsddebugdata/


Trying to mount root from zfs:zrpleader []...
ums0: Logitech USB-PS2 Optical Mouse, class 0/0, rev 2.00/11.10, addr 4 on usbus2
ums0: 3 buttons and [XYZ] coordinates ID=0
ums1: Wacom Co.,Ltd. CTH-470, class 0/0, rev 2.00/1.00, addr 5 on usbus2
ums1: 6 buttons and [XY] coordinates ID=1
uhid0: Wacom Co.,Ltd. CTH-470, class 0/0, rev 2.00/1.00, addr 5 on usbus2
uhid1: vendor 0x05af USB Keyboard, class 0/0, rev 2.00/1.92, addr 6 on usbus2
uhid0: at uhub5, port 2, addr 5 (disconnected)
ums1: at uhub5, port 2, addr 5 (disconnected)
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny, logging disabled

acquiring duplicate lock of same type: os.lock_sx
 1st os.lock_sx @ nvidia_os.c:609
 2nd os.lock_sx @ nvidia_os.c:609
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0238b8d400
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe0238b8d4b0
witness_checkorder() at witness_checkorder+0xdc2/frame 0xfe0238b8d540
_sx_xlock() at _sx_xlock+0x75/frame 0xfe0238b8d580
os_acquire_mutex() at os_acquire_mutex+0x32/frame 0xfe0238b8d5a0
_nv010785rm() at _nv010785rm+0x18/frame 0xfe000f2fee90
dmapbase() at 0xf8001cc40e80/frame 0xf8001cc40e18
kernphys() at 0xc1d1/frame 0xf8001cc40e00
(null)() at 0xfec5e000/frame 0xc1d10001
acquiring duplicate lock of same type: os.lock_mtx
 1st os.lock_mtx @ nvidia_os.c:783
 2nd os.lock_mtx @ nvidia_os.c:783
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0238b8d0e0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe0238b8d190
witness_checkorder() at witness_checkorder+0xdc2/frame 0xfe0238b8d220
__mtx_lock_flags() at __mtx_lock_flags+0xa8/frame 0xfe0238b8d270
os_acquire_spinlock() at os_acquire_spinlock+0x1b/frame 0xfe0238b8d280
_nv012385rm() at _nv012385rm+0xd75/frame 0xfebceef0
pid 3568 (gsettings-data-conv), uid 1001: exited on signal 5




Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done
Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `syncer' to stop...
Mar 24 00:24:25 leader kernel: Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 done
Mar 24 00:24:25 leader kernel: All buffers synced.
Mar 24 00:24:25 leader kernel: lock order reversal:
Mar 24 00:24:25 leader kernel: 1st 0xf800224555f0 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229
Mar 24 00:24:25 leader kernel: 2nd 0xf800222d67c8 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2268
Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe022df6e4c0
Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe022df6e570
Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfe022df6e600
Mar 24 00:24:25 leader kernel: __lockmgr_args() at __lockmgr_args+0x9ea/frame 0xfe022df6e740
Mar 24 00:24:25 leader kernel: vop_stdlock() at vop_stdlock+0x3c/frame 0xfe022df6e760
Mar 24 00:24:25 leader kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 0xfe022df6e790
Mar 24 00:24:25 leader kernel: _vn_lock() at _vn_lock+0xaa/frame 0xfe022df6e800
Mar 24 00:24:25 leader kernel: vputx() at vputx+0x232/frame 0xfe022df6e860
Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x301/frame 0xfe022df6e8e0
Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfe022df6e910
Mar 24