On 27.08.09 12:14, noguaran wrote:
> Thank you so much for your reply! 
> Here are the outputs:
> 
>> 1. Find PID of the hanging 'zpool import', e.g. with 'ps -ef | grep zpool'
> root at mybox:~# ps -ef|grep zpool
>     root   915   908   0 03:34:46 pts/3       0:00 grep zpool
>     root   901   874   1 03:34:09 pts/2       0:00 zpool import drowning
> 
>> 2. Substitute PID with actual number in the below command
>> echo "0tPID::pid2proc|::walk thread|::findstack -v" | mdb -k
> 
> root at mybox:~# echo "0t901::pid2proc|::walk thread|::findstack -v" | mdb -k
> stack pointer for thread ffffff02ed8c7880: ffffff0010191a10
> [ ffffff0010191a10 _resume_from_idle+0xf1() ]
>   ffffff0010191a40 swtch+0x147()
>   ffffff0010191a70 cv_wait+0x61(ffffff02eb010dda, ffffff02eb010d98)
>   ffffff0010191ac0 txg_wait_synced+0x7f(ffffff02eb010c00, 31983c5)
>   ffffff0010191b00 dsl_sync_task_group_wait+0xee(ffffff02f1d11bd8)
>   ffffff0010191b80 dsl_sync_task_do+0x65(ffffff02eb010c00, fffffffff78be1f0, 
>   fffffffff78be250, ffffff02edc38400, ffffff0010191b98, 0)
>   ffffff0010191bd0 dsl_dataset_rollback+0x53(ffffff02edc38400, 2)
>   ffffff0010191c00 dmu_objset_rollback+0x46(ffffff02eb674b20)
>   ffffff0010191c40 zfs_ioc_rollback+0x10d(ffffff02f2b58000)
>   ffffff0010191cc0 zfsdev_ioctl+0x10b(b600000000, 5a1a, 803e240, 100003, 
>   ffffff02ee813338, ffffff0010191de4)
>   ffffff0010191d00 cdev_ioctl+0x45(b600000000, 5a1a, 803e240, 100003, 
>   ffffff02ee813338, ffffff0010191de4)
>   ffffff0010191d40 spec_ioctl+0x83(ffffff02df6a7480, 5a1a, 803e240, 100003, 
>   ffffff02ee813338, ffffff0010191de4, 0)
>   ffffff0010191dc0 fop_ioctl+0x7b(ffffff02df6a7480, 5a1a, 803e240, 100003, 
>   ffffff02ee813338, ffffff0010191de4, 0)
>   ffffff0010191ec0 ioctl+0x18e(3, 5a1a, 803e240)
>   ffffff0010191f10 _sys_sysenter_post_swapgs+0x14b()

This tells us that it is doing some snapshot rollbacks as part of an import.

Output of 'zpool history' would help to show which commands were executed and 
give a better idea of why it is doing that.
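For reference, using the pool name from your earlier 'zpool import' output, that would be:

```shell
# Show the administrative command history recorded in the pool
zpool history drowning

# With -i, also show internally-logged events (e.g. rollbacks),
# which is likely what matters here
zpool history -i drowning
```

The history is stored in the pool itself, so it survives export/import.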

>> 3. Do
>> echo "::spa" | mdb -k
> 
> root at mybox:~# echo "::spa" | mdb -k
> ADDR                 STATE NAME                                               
>  
> ffffff02f2b8b800    ACTIVE mypool
> ffffff02d5890000    ACTIVE rpool
> 
>> 4. Find address of your pool in the output of stage 3 and replace ADDR with 
>> it
>> in the below command (it is single line):
>> echo "ADDR::print spa_t spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | 
>> mdb -k
> 
> root at mybox:~# echo "ffffff02f2b8b800::print spa_t 
> spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | mdb -k
> mdb: spa_t is not a struct or union type
> 
> So I decided to remove "spa_t" to see what would happen:

Actually, you need to replace 'spa_t' with 'struct spa', but it does not 
matter much anyway.
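In other words, the corrected one-liner (single line, using the pool address from your '::spa' output) would be:

```shell
# Print the sync thread pointer of the pool's DSL pool and
# pipe it to ::findstack to get that thread's kernel stack
echo "ffffff02f2b8b800::print struct spa spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | mdb -k
```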

> root at mybox:~# echo "ffffff02f2b8b800::print 
> spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | mdb -k
> mdb: failed to look up type spa_dsl_pool->dp_tx.tx_sync_thread: no symbol 
> corresponds to address
> 
>> What do you mean by halt here? Are you able to interrupt 'zpool import' with 
>> CTRL-C?
> Yes

OK, this is another confirmation that the import is not hung in-kernel. It looks 
like it's doing some housecleaning as part of mounting the filesystems.

>> Does 'zfs list' provide any output?
> JACKPOT!!!!!  When I run "zfs list", the import completes!  Instead, "zfs 
> list" hangs just like "zpool import" did.

It looks like it is working, but slowly (or repeating the same operation 
constantly), hence it appears to be hung.
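One way to check whether it keeps issuing the same ioctl over and over (just a generic suggestion, not something the stack traces alone prove) is to trace the process while it "hangs", substituting the PID of the stuck 'zfs list':

```shell
# Trace only ioctl system calls of the running process (PID 936 here)
truss -t ioctl -p 936
```

If the same ZFS_IOC_* ioctl scrolls by repeatedly, that would confirm the repeated-rollback theory.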

> 
> root at mybox:~# ps -ef | grep zfs
>     root   940   874   0 03:49:15 pts/2       0:00 grep zfs
>     root   936   908   0 03:44:28 pts/3       0:01 zfs list
> 
> root at mybox:~# echo "0t936::pid2proc|::walk thread|::findstack -v" | mdb -k
> stack pointer for thread ffffff02d72ea020: ffffff000fdeaa10
> [ ffffff000fdeaa10 _resume_from_idle+0xf1() ]
>   ffffff000fdeaa40 swtch+0x147()
>   ffffff000fdeaa70 cv_wait+0x61(ffffff02eb010dda, ffffff02eb010d98)
>   ffffff000fdeaac0 txg_wait_synced+0x7f(ffffff02eb010c00, 31990da)
>   ffffff000fdeab00 dsl_sync_task_group_wait+0xee(ffffff02f1d11bd8)
>   ffffff000fdeab80 dsl_sync_task_do+0x65(ffffff02eb010c00, fffffffff78be1f0, 
>   fffffffff78be250, ffffff02f1d0ce00, ffffff000fdeab98, 0)
>   ffffff000fdeabd0 dsl_dataset_rollback+0x53(ffffff02f1d0ce00, 2)
>   ffffff000fdeac00 dmu_objset_rollback+0x46(ffffff02eb3322a8)
>   ffffff000fdeac40 zfs_ioc_rollback+0x10d(ffffff02ebf4e000)
>   ffffff000fdeacc0 zfsdev_ioctl+0x10b(b600000000, 5a1a, 8043a20, 100003, 
>   ffffff02ee813e78, ffffff000fdeade4)
>   ffffff000fdead00 cdev_ioctl+0x45(b600000000, 5a1a, 8043a20, 100003, 
>   ffffff02ee813e78, ffffff000fdeade4)
>   ffffff000fdead40 spec_ioctl+0x83(ffffff02df6a7480, 5a1a, 8043a20, 100003, 
>   ffffff02ee813e78, ffffff000fdeade4, 0)
>   ffffff000fdeadc0 fop_ioctl+0x7b(ffffff02df6a7480, 5a1a, 8043a20, 100003, 
>   ffffff02ee813e78, ffffff000fdeade4, 0)
>   ffffff000fdeaec0 ioctl+0x18e(3, 5a1a, 8043a20)
>   ffffff000fdeaf10 _sys_sysenter_post_swapgs+0x14b()
> 
> 
>> Apparently as you have 5TB of data there, it worked fine some time ago. What
>> happened to the pool before this issue was noticed?
> A reboot?
> This box acts as network storage for all of my computers.  All of the PCs in 
> the house are set to back up to it daily, and it is like an extra hard drive 
> for my wife's netbook and laptop.  We dump all of the pictures off of the 
> camera there as well as any HD video we capture.  I NEVER reboot this box 
> unless I am prompted to.  I'm running OpenSolaris (uname -a: SunOS mybox 5.11 
> snv_111b i86pc i386 i86pc Solaris), and if I remember right, I was prompted 
> to update.  I did so, and needed to reboot.  Rebooted, and the box would not 
> start.  I used another PC to find out how to start in single user mode and 
> tried that.  No dice.  I had to physically remove the drives to get to a 
> login prompt.  BTW, I just stopped the "zfs list" after about 30 minutes 
> running, and it was constantly writing to my drives. (used 'zpool iostat 1' 
> to check)  I am by no means an expert, but whatever "zfs list" is trying to 
> do, it is hanging.
> 
> Right now, my goal is to back up all of my important data.  Once I do that, I 
> will delete this pool and start over from scratch.  My biggest concern is to 
> keep this from happening again.  Any suggestions?
