On 10/21/2014 08:54 AM, Nick via smartos-discuss wrote:
> Server running SmartOS 20140904T175324Z. Rebooted last night with a system
> panic -- bad trap error. Here is some mdb info:
> 
> mdb unix.2 vmcore.2
> Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix
> scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs sd lofs idm
> mpt_sas crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs sata ]
>> ::status
> debugging crash dump vmcore.2 (64-bit) from xxxxx
> operating system: 5.11 joyent_20140904T175324Z (i86pc)
> image uuid: (not set)
> panic message: BAD TRAP: type=d (#gp General protection)
> rp=ffffff0021b8c1c0 addr=ffffff0021b8c3a8
> dump content: kernel pages only
>> $C
> ffffff0021b8c360 mutex_enter+0xb()
> ffffff0021b8c390 dnode_hold+0x28(ffffff04f9530040, 846ce, fffffffff7e07041,
> ffffff0021b8c3a8)
> ffffff0021b8c3f0 dmu_bonus_hold+0x37(ffffff04f9530040, 846ce, 0,
> ffffff0021b8c468)
> ffffff0021b8c420 sa_buf_hold+0x1d(ffffff04f9530040, 846ce, 0,
> ffffff0021b8c468)
> ffffff0021b8c4c0 zfs_zget+0x64(ffffff04e9e6d800, 846ce, ffffff0021b8c5f0)
> ffffff0021b8c5a0 zfs_dirent_lock+0x516(ffffff0021b8c5f8, ffffff04eeea5010,
> ffffff0021b8c9d0, ffffff0021b8c5f0, 6, 0, 0)
> ffffff0021b8c660 zfs_dirlook+0x94(ffffff04eeea5010, ffffff0021b8c9d0,
> ffffff0021b8c808, 0, 0, 0)
> ffffff0021b8c700 zfs_lookup+0x3da(ffffff0570bd2e80, ffffff0021b8c9d0,
> ffffff0021b8c808, ffffff0021b8cca0, 0, ffffff04f3598780, ffffff061b420ee0,
> 0, 0, 0)
> ffffff0021b8c7b0 fop_lookup+0xa2(ffffff0570bd2e80, ffffff0021b8c9d0,
> ffffff0021b8c808, ffffff0021b8cca0, 0, ffffff04f3598780, ffffff061b420ee0,
> 0, 0, 0)
> ffffff0021b8c870 lo_lookup+0xbc(ffffff0554aa1240, ffffff0021b8c9d0,
> ffffff0021b8cb18, ffffff0021b8cca0, 0, ffffff04f3598780, ffffff061b420ee0,
> 0, 0, 0)
> ffffff0021b8c920 fop_lookup+0xa2(ffffff0554aa1240, ffffff0021b8c9d0,
> ffffff0021b8cb18, ffffff0021b8cca0, 0, ffffff04f3598780, ffffff061b420ee0,
> 0, 0, 0)
> ffffff0021b8cb80 lookuppnvp+0x1f6(ffffff0021b8cca0, 0, 0, 0,
> ffffff0021b8ce48, ffffff04f3598780, ffffff05fae1d980, ffffff061b420ee0)
> ffffff0021b8cc20 lookuppnatcred+0x15e(ffffff0021b8cca0, 0, 0, 0,
> ffffff0021b8ce48, 0, ffffff061b420ee0)
> ffffff0021b8cd20 lookupnameatcred+0xe9(fffffd7fffdf4e50, 0, 0, 0,
> ffffff0021b8ce48, 0, ffffff061b420ee0)
> ffffff0021b8cd70 lookupnameat+0x39(fffffd7fffdf4e50, 0, 0, 0,
> ffffff0021b8ce48, 0)
> ffffff0021b8ce10 cstatat_getvp+0x107(ffd19553, fffffd7fffdf4e50, 0,
> ffffff0021b8ce48, ffffff0021b8ce40)
> ffffff0021b8ceb0 cstatat+0x6f(ffd19553, fffffd7fffdf4e50, fffffd7fffdf4dd0,
> 1000, 0)
> ffffff0021b8cee0 fstatat+0x42(ffd19553, fffffd7fffdf4e50, fffffd7fffdf4dd0,
> 1000)
> ffffff0021b8cf00 lstat+0x25(fffffd7fffdf4e50, fffffd7fffdf4dd0)
> ffffff0021b8cf10 sys_syscall+0x17a()
>>
>> ::panicinfo
>              cpu                6
>           thread ffffff050125c180
>          message BAD TRAP: type=d (#gp General protection)
> rp=ffffff0021b8c1c0 addr=ffffff0021b8c3a8
>              rdi fffbff05e2a52250
>              rsi                e
>              rdx ffffff050125c180
>              rcx                0
>               r8   200db1e5970aea
>               r9              150
>              rax                0
>              rbx            846ce
>              rbp ffffff0021b8c360
>              r10 fffffffffb8542b8
>              r11                1
>              r12                0
>              r13 fffbff05e2a52250
>              r14                1
>              r15 ffffff0021b8c3a8
>           fsbase fffffd7fff122a40
>           gsbase ffffff04e6561500
>               ds               4b
>               es               4b
>               fs                0
>               gs                0
>           trapno                d
>              err                0
>              rip fffffffffb85ef5b
>               cs               30
>           rflags            10246
>              rsp ffffff0021b8c2b8
>               ss               38
>           gdt_hi                0
>           gdt_lo         2000ffff
>           idt_hi                0
>           idt_lo         1000ffff
>              ldt                0
>             task               70
>              cr0         80050033
>              cr2          4ab8000
>              cr3        17bbf6000
>              cr4            426f8
>> ffffff050125c180::thread -p
>             ADDR             PROC              LWP             CRED
> ffffff050125c180 ffffff04f6eae090 ffffff0509c15040 ffffff061b420ee0
>> ffffff04f6eae090::ps -ft
> S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
> R  13538   6531   6531   6531      0 0x42000000 ffffff04f6eae090
> /opt/local/bin/rsync --daemon --config /opt/local/etc/rsync/rsyncd.conf
>         T  0xffffff050125c180 <TS_ONPROC>
>> ffffff04f6eae090::ptree
> fffffffffbc30440  sched
>      ffffff04e66d0010  init
>           ffffff0502a1d058  rsync
>                ffffff04f6eae090  rsync
>                     ffffff05f1581088  rsync
> 
> 
> 
> So it looks like the crash happened in this rsync daemon, which is running
> within an OS zone. It looks like rsync was actively syncing, so there was
> high I/O going on at the time. How can a crash in rsync within an OS zone
> take down the entire server? zpool scrub reports clean, all memory has been
> stress tested fine (and is ECC). Is there anything else I can try in mdb to
> debug this further? Thanks,

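The panic happened in ZFS kernel code, not in rsync itself: the stack
shows the fault in mutex_enter, called from dnode_hold, while the kernel
was servicing rsync's lstat call. A fault in the kernel panics the whole
machine, regardless of which zone made the system call. If you can make
the dump available, we can start looking at it and get folks in the ZFS
community involved in hunting it down.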
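
One thing that may be worth checking in the ::panicinfo output while the
dump is being shared: on amd64, rdi normally carries the first argument,
which for mutex_enter would be the lock address, and here it is
fffbff05e2a52250. A #gp on a memory access is what x86-64 raises for a
non-canonical address. The sketch below assumes 48-bit virtual addressing
and uses two hypothetical helper names (is_canonical and
single_bit_repairs, which are not mdb commands) to test the value and to
list the canonical addresses that differ from it by exactly one bit. A
one-bit difference would be consistent with corruption of the pointer,
but it does not say where the corruption came from.

```python
# Sketch only: assumes 48-bit virtual addresses, where bits 63..47 of a
# canonical address must be all zeros or all ones.

def is_canonical(addr):
    """Return True if addr is a canonical 48-bit x86-64 address."""
    top = addr >> 47              # the 17 bits that must all agree
    return top == 0 or top == 0x1FFFF

def single_bit_repairs(addr):
    """Return canonical addresses that differ from addr in exactly one bit."""
    candidates = (addr ^ (1 << bit) for bit in range(64))
    return [c for c in candidates if is_canonical(c)]

rdi = 0xfffbff05e2a52250          # rdi (and r13) from ::panicinfo above

print(is_canonical(rdi))
print([hex(a) for a in single_bit_repairs(rdi)])
```

If a candidate address comes back, it could be inspected in the dump with
the existing mdb dcmd ::whatis to see what kind of buffer it belongs to.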

Robert

