Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-07-06 Thread Victor Latushkin

On Jul 4, 2010, at 4:58 AM, Andrew Jones wrote:

 Victor,
 
 The zpool import succeeded on the next attempt following the crash that I 
 reported to you by private e-mail! 

From the threadlist it looked like the system was pretty low on memory, with 
stacks of userland stuff swapped out, hence the system was not responsive; but 
it was able to complete the inconsistent dataset processing in the end.

 
 For completeness, this is the final status of the pool:
 
 
  pool: tank
 state: ONLINE
 scan: resilvered 1.50K in 165h28m with 0 errors on Sat Jul  3 08:02:30 2010
 config:
 
         NAME        STATE     READ WRITE CKSUM
         tank        ONLINE       0     0     0
           raidz2-0  ONLINE       0     0     0
             c0t0d0  ONLINE       0     0     0
             c0t1d0  ONLINE       0     0     0
             c0t2d0  ONLINE       0     0     0
             c0t3d0  ONLINE       0     0     0
             c0t4d0  ONLINE       0     0     0
             c0t5d0  ONLINE       0     0     0
             c0t6d0  ONLINE       0     0     0
             c0t7d0  ONLINE       0     0     0
         cache
           c2t0d0    ONLINE       0     0     0
 
 errors: No known data errors
 

Good. Run 'zpool scrub' to make sure there are no other errors.
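
For reference, a minimal sketch of that check (pool name taken from the status 
output above; the scrub runs in the background and its progress shows up in the 
'scan:' line):

   # zpool scrub tank
   # zpool status -v tank

Re-run the status command until the scan line reports the scrub completed with 
0 errors.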

regards
victor

 Thank you very much for your help. We did not need to add additional RAM to 
 solve this, in the end. Instead, we needed to persist with the import through 
 several panics to finally work our way through the large inconsistent 
 dataset; it is unclear whether the resilvering caused additional processing 
 delay. Unfortunately, the delay made much of the data quite stale, now that 
 it's been recovered.
 
 It does seem that zfs would benefit tremendously from a better (quicker and 
 more intuitive?) set of recovery tools that are available to a wider range 
 of users. It's really a shame, because the features and functionality in zfs 
 are otherwise absolutely second to none.
 
 /Andrew


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-07-06 Thread Andrew Jones
 
 Good. Run 'zpool scrub' to make sure there are no
 other errors.
 
 regards
 victor
 

Yes, scrubbed successfully with no errors. Thanks again for all of your 
generous assistance.

/AJ


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-07-04 Thread Roy Sigurd Karlsbakk

- Original Message -
 Victor,
 
 The zpool import succeeded on the next attempt following the crash
 that I reported to you by private e-mail!
 
 For completeness, this is the final status of the pool:
 
 
 pool: tank
 state: ONLINE
 scan: resilvered 1.50K in 165h28m with 0 errors on Sat Jul 3 08:02:30

Out of curiosity, what sort of drives are you using here? Resilvering in 
165h28m is close to a week, which is rather bad imho.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. 
It is an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-07-04 Thread Andrew Jones
 
 - Original Message -
  Victor,
  
  The zpool import succeeded on the next attempt
 following the crash
  that I reported to you by private e-mail!
  
  For completeness, this is the final status of the
 pool:
  
  
  pool: tank
  state: ONLINE
  scan: resilvered 1.50K in 165h28m with 0 errors on
 Sat Jul 3 08:02:30
 
 Out of curiosity, what sort of drives are you using
 here? Resilvering in 165h28m is close to a week,
 which is rather bad imho.

I think the resilvering statistic is quite misleading, in this case. We're 
using very average 1TB retail Hitachi disks, which perform just fine when the 
pool is healthy.

What happened here is that the zpool-tank process was performing a resilvering 
task in parallel with the processing of a very large inconsistent dataset, 
which took the overwhelming majority of the time to complete.

Why it actually took over a week to process the 2TB volume in an inconsistent 
state is my primary concern with the performance of ZFS, in this case.

 


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-07-03 Thread Andrew Jones
Victor,

The zpool import succeeded on the next attempt following the crash that I 
reported to you by private e-mail! 

For completeness, this is the final status of the pool:


  pool: tank
 state: ONLINE
 scan: resilvered 1.50K in 165h28m with 0 errors on Sat Jul  3 08:02:30 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
        cache
          c2t0d0    ONLINE       0     0     0

errors: No known data errors

Thank you very much for your help. We did not need to add additional RAM to 
solve this, in the end. Instead, we needed to persist with the import through 
several panics to finally work our way through the large inconsistent dataset; 
it is unclear whether the resilvering caused additional processing delay. 
Unfortunately, the delay made much of the data quite stale, now that it's been 
recovered.

It does seem that zfs would benefit tremendously from a better (quicker and 
more intuitive?) set of recovery tools that are available to a wider range of 
users. It's really a shame, because the features and functionality in zfs are 
otherwise absolutely second to none.

/Andrew


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-07-02 Thread Victor Latushkin

On Jul 1, 2010, at 10:28 AM, Andrew Jones wrote:

 Victor,
 
 I've reproduced the crash and have vmdump.0 and dump device files. How do I 
 query the stack on crash for your analysis? What other analysis should I 
 provide?

Output of 'echo ::threadlist -v | mdb 0' can be a good start in this case I 
think. It can be rather big to send to the list, so please find some other way 
to provide it.
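
A sketch of how that output is typically captured from the saved crash dump 
(savecore directory path assumed; the mdb invocation is the one given above):

   # cd /var/crash/HL-SAN
   # echo '::threadlist -v' | mdb 0 > threadlist-0.txt
   # gzip threadlist-0.txt

Here '0' makes mdb open the unix.0/vmcore.0 pair in the current directory, and 
the gzipped text file can be uploaded somewhere rather than mailed to the list.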


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-07-02 Thread Andrew Jones
 Andrew,
 
 Looks like the zpool is telling you the devices are
 still doing work of 
 some kind, or that there are locks still held.
 

Agreed; it appears the CSV1 volume is in a fundamentally inconsistent state 
following the aborted zfs destroy attempt. See later in this thread where 
Victor has identified this to be the case. I am awaiting his analysis of the 
latest crash.

 From the man page for intro(2) the errors are listed. Number 16 looks to be
 an EBUSY.
 
      16 EBUSY          Device busy
 
           An attempt was made to mount a  dev-
           ice  that  was already mounted or an
           attempt was made to unmount a device
           on  which  there  is  an active file
           (open   file,   current   directory,
           mounted-on  file,  active  text seg-
           ment). It  will  also  occur  if  an
           attempt is made to enable accounting
           when it  is  already  enabled.   The
           device or resource is currently una-
           vailable.   EBUSY is  also  used  by
           mutexes, semaphores, condition vari-
           ables, and r/w  locks,  to  indicate
           that   a  lock  is held,  and by the
           processor  control  function
           P_ONLINE.
 
 On 06/28/10 01:50 PM, Andrew Jones wrote:
  Just re-ran 'zdb -e tank' to confirm the CSV1 volume is still exhibiting
  error 16:
 
  snip
  Could not open tank/CSV1, error 16
  snip
 
  Considering my attempt to delete the CSV1 volume led to the failure in the
  first place, I have to think that if I can either 1) complete the deletion
  of this volume or 2) roll back to a transaction prior to this based on
  logging or 3) repair whatever corruption has been caused by this partial
  deletion, that I will then be able to import the pool.
 
  What does 'error 16' mean in the ZDB output, any suggestions?
 
 



Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-07-01 Thread Andrew Jones
Victor,

I've reproduced the crash and have vmdump.0 and dump device files. How do I 
query the stack on crash for your analysis? What other analysis should I 
provide?

Thanks


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-07-01 Thread Andrew Jones
Victor,

A little more info on the crash, from the messages file, is included below. I 
have also run savecore on the compressed dump (vmdump.0) to generate unix.0 and 
vmcore.0.
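
For reference, that extraction step usually looks something like this 
(directory path assumed; -f points savecore at an existing compressed dump 
rather than the dump device):

   # savecore -vf /var/crash/HL-SAN/vmdump.0

which writes the matching unix.0 and vmcore.0 alongside it for use with mdb.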


Jun 30 19:39:10 HL-SAN unix: [ID 836849 kern.notice] 
Jun 30 19:39:10 HL-SAN ^Mpanic[cpu3]/thread=ff0017909c60: 
Jun 30 19:39:10 HL-SAN genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf 
Page fault) rp=ff0017909790 addr=0 occurred in module unknown due to a 
NULL pointer dereference
Jun 30 19:39:10 HL-SAN unix: [ID 10 kern.notice] 
Jun 30 19:39:10 HL-SAN unix: [ID 839527 kern.notice] sched: 
Jun 30 19:39:10 HL-SAN unix: [ID 753105 kern.notice] #pf Page fault
Jun 30 19:39:10 HL-SAN unix: [ID 532287 kern.notice] Bad kernel fault at 
addr=0x0
Jun 30 19:39:10 HL-SAN unix: [ID 243837 kern.notice] pid=0, pc=0x0, 
sp=0xff0017909880, eflags=0x10002
Jun 30 19:39:10 HL-SAN unix: [ID 211416 kern.notice] cr0: 
8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de
Jun 30 19:39:10 HL-SAN unix: [ID 624947 kern.notice] cr2: 0
Jun 30 19:39:10 HL-SAN unix: [ID 625075 kern.notice] cr3: 336a71000
Jun 30 19:39:10 HL-SAN unix: [ID 625715 kern.notice] cr8: c
Jun 30 19:39:10 HL-SAN unix: [ID 10 kern.notice] 
Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]rdi:  282 
rsi:15809 rdx: ff03edb1e538
Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]rcx:5  
r8:0  r9: ff03eb2d6a00
Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]rax:  202 
rbx:0 rbp: ff0017909880
Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]r10: f80d16d0 
r11:4 r12:0
Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]r13: ff03e21bca40 
r14: ff03e1a0d7e8 r15: ff03e21bcb58
Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]fsb:0 
gsb: ff03e25fa580  ds:   4b
Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice] es:   4b  
fs:0  gs:  1c3
Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice]trp:e 
err:   10 rip:0
Jun 30 19:39:10 HL-SAN unix: [ID 592667 kern.notice] cs:   30 
rfl:10002 rsp: ff0017909880
Jun 30 19:39:10 HL-SAN unix: [ID 266532 kern.notice] ss:   38
Jun 30 19:39:10 HL-SAN unix: [ID 10 kern.notice] 
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909670 
unix:die+dd ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909780 
unix:trap+177b ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909790 
unix:cmntrap+e6 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 802836 kern.notice] ff0017909880 0 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff00179098a0 
unix:debug_enter+38 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff00179098c0 
unix:abort_sequence_enter+35 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909910 
kbtrans:kbtrans_streams_key+102 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909940 
conskbd:conskbdlrput+e7 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff00179099b0 
unix:putnext+21e ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff00179099f0 
kbtrans:kbtrans_queueevent+7c ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909a20 
kbtrans:kbtrans_queuepress+7c ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909a60 
kbtrans:kbtrans_untrans_keypressed_raw+46 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909a90 
kbtrans:kbtrans_processkey+32 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909ae0 
kbtrans:kbtrans_streams_key+175 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909b10 
kb8042:kb8042_process_key+40 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909b50 
kb8042:kb8042_received_byte+109 ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909b80 
kb8042:kb8042_intr+6a ()
Jun 30 19:39:10 HL-SAN genunix: [ID 655072 kern.notice] ff0017909bb0 
i8042:i8042_intr+c5 ()
Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0017909c00 
unix:av_dispatch_autovect+7c ()
Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0017909c40 
unix:dispatch_hardint+33 ()
Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff00183552f0 
unix:switch_sp_and_call+13 ()
Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0018355340 
unix:do_interrupt+b8 ()
Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0018355350 
unix:_interrupt+b8 ()
Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff00183554a0 
unix:htable_steal+198 ()
Jun 30 19:39:11 HL-SAN genunix: [ID 655072 kern.notice] ff0018355510 
unix:htable_alloc+248 ()
Jun 30 19:39:11 

Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-29 Thread Victor Latushkin

On Jun 29, 2010, at 8:30 PM, Andrew Jones wrote:

 Victor,
 
 The 'zpool import -f -F tank' failed at some point last night. The box was 
 completely hung this morning; no core dump, no ability to SSH into the box to 
 diagnose the problem. I had no choice but to reset, as I had no diagnostic 
 ability. I don't know if there would be anything in the logs?

It sounds like it might have run out of memory. Is it an option for you to add 
more memory to the box temporarily?

Even if it is an option, it is good to prepare for such an outcome and have 
kmdb loaded, either at boot time by adding -k to the 'kernel$' line in the GRUB 
menu, or by loading it from the console with 'mdb -K' before attempting the 
import (type ':c' at the mdb prompt to continue). In case it hangs again, you 
can press 'F1-A' on the keyboard, drop into kmdb and then use '$systemdump' to 
force a crash dump.

If your hardware has a physical or virtual NMI button, you can use that too to 
drop into kmdb, but you'll need to set a kernel variable for that to work:

http://blogs.sun.com/darren/entry/sending_a_break_to_opensolaris
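
A sketch of that preparation, assuming a ZFS-root OpenSolaris box and a local 
console; the $< prefix below is kmdb's macro-invocation syntax for the 
systemdump macro mentioned above:

   (1) Load kmdb at every boot: append -k to the existing kernel$ line in
       /rpool/boot/grub/menu.lst (path and line shown as they typically
       appear), e.g.

       kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -k

   (2) Or load it on demand from a root shell on the console, then resume:

       # mdb -K
       [0]> :c

   (3) If the box wedges during the import, press F1-A on the console
       keyboard to drop into kmdb, then force a crash dump with:

       [0]> $<systemdump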

 Earlier I ran 'zdb -e -bcsvL tank' in write mode for 36 hours and gave up to 
 try something different. Now the zpool import has hung the box.

What do you mean by running zdb in write mode? zdb is normally a read-only 
tool. Did you change it in some way?

 Should I try zdb again? Any suggestions?

It sounds like zdb is not going to be helpful, as inconsistent dataset 
processing happens only in read-write mode. So you need to try the above 
suggestions with more memory and kmdb/NMI.

victor


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-29 Thread Andrew Jones
 
 On Jun 29, 2010, at 8:30 PM, Andrew Jones wrote:
 
  Victor,
  
  The 'zpool import -f -F tank' failed at some point
 last night. The box was completely hung this morning;
 no core dump, no ability to SSH into the box to
 diagnose the problem. I had no choice but to reset,
 as I had no diagnostic ability. I don't know if there
 would be anything in the logs?
 
 It sounds like it might run out of memory. Is it an
 option for you to add more memory to the box
 temporarily?

I'll place the order for more memory or transfer some from another machine. 
Seems quite likely that we did run out of memory.

 
 Even if it is an option, it is good to prepare for
 such outcome and have kmdb loaded either at boot time
 by adding -k to 'kernel$' line in GRUB menu, or by
 loading it from console with 'mdb -K' before
 attempting import (type ':c' at mdb prompt to
 continue). In case it hangs again, you can press
 'F1-A' on the keyboard, drop into kmdb and then use
 '$systemdump' to force a crashdump.

I'll prepare the machine this way and repeat the import to reproduce the hang, 
then break into the kernel and capture the core dump.

 
 If your hardware has a physical or virtual NMI button, you can use that too
 to drop into kmdb, but you'll need to set a kernel variable for that to work:
 
 http://blogs.sun.com/darren/entry/sending_a_break_to_opensolaris
 
  Earlier I ran 'zdb -e -bcsvL tank' in write mode for 36 hours and gave up
  to try something different. Now the zpool import has hung the box.
 
 What do you mean by running zdb in write mode? zdb is normally a read-only
 tool. Did you change it in some way?

I had read elsewhere that 'set zfs:zfs_recover=1' and 'set aok=1' placed zdb 
into some kind of a write/recovery mode. I have set these in /etc/system. Is 
this a bad idea in this case?
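
For reference, these are normally written as the following lines in 
/etc/system (they affect the whole kernel, not just zdb: aok is generally 
described as making failed assertions non-fatal, and zfs:zfs_recover lets ZFS 
press on past some otherwise-fatal errors; both are usually removed again once 
the recovery is over, and a reboot is needed for them to take effect):

   set aok=1
   set zfs:zfs_recover=1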

 
  Should I try zdb again? Any suggestions?
 
 It sounds like zdb is not going to be helpful, as
 inconsistent dataset processing happens only in
 read-write mode. So you need to try above suggestions
 with more memory and kmdb/nmi.

Will do, thanks!

 
 victor


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Andrew Jones
Now at 36 hours since zdb process start and:


  PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  827 root     4936M 4931M sleep   59    0   0:50:47 0.2% zdb/209

Idling at 0.2% processor for nearly the past 24 hours... feels very stuck. 
Thoughts on how to determine where and why?
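
A couple of ways one might check where it is actually sitting, sketched here 
with the PID taken from the prstat line above:

   # pstack 827 | head -40
   # echo '::pgrep zdb | ::walk thread | ::findstack -v' | mdb -k

The first prints the userland stacks of the zdb threads; the second prints 
their kernel-side stacks via mdb -k, which is usually more revealing when a 
process idles at 0.2% CPU waiting on I/O or locks.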


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Andrew Jones
Update: I have given up on the zdb write-mode repair effort, at least for now. 
Hoping for any guidance / direction anyone's willing to offer...

Re-running 'zpool import -F -f tank' with some stack trace debug, as suggested 
in similar threads elsewhere. Note that this appears hung at near idle.


ff03e278c520 ff03e9c60038 ff03ef109490   1  60 ff0530db4680
  PC: _resume_from_idle+0xf1    CMD: zpool import -F -f tank
  stack pointer for thread ff03e278c520: ff00182bbff0
  [ ff00182bbff0 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
zio_wait+0x5d()
dbuf_read+0x1e8()
dnode_next_offset_level+0x129()
dnode_next_offset+0xa2()
get_next_chunk+0xa5()
dmu_free_long_range_impl+0x9e()
dmu_free_object+0xe6()
dsl_dataset_destroy+0x122()
dsl_destroy_inconsistent+0x5f()
findfunc+0x23()
dmu_objset_find_spa+0x38c()
dmu_objset_find_spa+0x153()
dmu_objset_find+0x40()
spa_load_impl+0xb23()
spa_load+0x117()
spa_load_best+0x78()
spa_import+0xee()
zfs_ioc_pool_import+0xc0()
zfsdev_ioctl+0x177()
cdev_ioctl+0x45()
spec_ioctl+0x5a()
fop_ioctl+0x7b()
ioctl+0x18e()
dtrace_systrace_syscall32+0x11a()
_sys_sysenter_post_swapgs+0x149()


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Roy Sigurd Karlsbakk
- Original Message -
 Now at 36 hours since zdb process start and:
 
 
 PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
 827 root 4936M 4931M sleep 59 0 0:50:47 0.2% zdb/209
 
 Idling at 0.2% processor for nearly the past 24 hours... feels very
 stuck. Thoughts on how to determine where and why?

Just a hunch, is this pool using dedup?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. 
It is an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Malachi de Ælfweald
I had a similar issue on boot after upgrade in the past and it was due to
the large number of snapshots I had...  don't know if that could be related
or not...


Malachi de Ælfweald
http://www.google.com/profiles/malachid


On Mon, Jun 28, 2010 at 8:59 AM, Andrew Jones <andrewnjo...@gmail.com> wrote:

 Now at 36 hours since zdb process start and:


   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   827 root     4936M 4931M sleep   59    0   0:50:47 0.2% zdb/209

 Idling at 0.2% processor for nearly the past 24 hours... feels very stuck.
 Thoughts on how to determine where and why?


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Andrew Jones
Dedup had been turned on in the past for some of the volumes, but I had turned 
it off altogether before entering production due to performance issues. GZIP 
compression was turned on for the volume I was trying to delete.


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Andrew Jones
Malachi,

Thanks for the reply. There were no snapshots for the CSV1 volume that I 
recall... very few snapshots on any volume in the tank.


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Roy Sigurd Karlsbakk
- Original Message -
 Dedup had been turned on in the past for some of the volumes, but I
 had turned it off altogether before entering production due to
 performance issues. GZIP compression was turned on for the volume I
 was trying to delete.

Was there a lot of deduped data still on disk before it was put into 
production? Turning off dedup won't un-dedup the existing data, it just 
inhibits deduplication of new data...
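
For what it's worth, one way to check whether a dedup table is still hanging 
around on the exported pool is zdb's dedup-statistics option (a sketch only; 
given the damaged dataset it may well stop at the same error 16):

   # zdb -e -DD tank

An empty DDT histogram would suggest the previously deduped data has since 
been rewritten.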

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. 
It is an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Andrew Jones
Just re-ran 'zdb -e tank' to confirm the CSV1 volume is still exhibiting error 
16:

snip
Could not open tank/CSV1, error 16
snip

Considering my attempt to delete the CSV1 volume led to the failure in the 
first place, I have to think that if I can either 1) complete the deletion of 
this volume or 2) roll back to a transaction prior to this based on logging or 
3) repair whatever corruption has been caused by this partial deletion, that I 
will then be able to import the pool.

What does 'error 16' mean in the ZDB output, any suggestions?


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Victor Latushkin

On Jun 28, 2010, at 9:32 PM, Andrew Jones wrote:

 Update: I have given up on the zdb write-mode repair effort, at least for now. 
 Hoping for any guidance / direction anyone's willing to offer...
 
 Re-running 'zpool import -F -f tank' with some stack trace debug, as 
 suggested in similar threads elsewhere. Note that this appears hung at near 
 idle.

It looks like it is processing the huge inconsistent dataset that was destroyed 
previously. So you need to wait a bit longer.
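
If it helps to tell "still grinding" apart from "truly hung", here is a rough 
DTrace sketch using one of the functions visible in the stack below (probe 
availability may vary by build):

   # dtrace -n 'fbt::dnode_next_offset:entry { @c = count(); }
       tick-10s { printa(@c); trunc(@c); }'

A non-zero count printed every 10 seconds suggests the destroy of the 
inconsistent dataset is still moving forward.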

regards
victor 

 
 
 ff03e278c520 ff03e9c60038 ff03ef109490   1  60 ff0530db4680
 PC: _resume_from_idle+0xf1    CMD: zpool import -F -f tank
  stack pointer for thread ff03e278c520: ff00182bbff0
  [ ff00182bbff0 _resume_from_idle+0xf1() ]
swtch+0x145()
cv_wait+0x61()
zio_wait+0x5d()
dbuf_read+0x1e8()
dnode_next_offset_level+0x129()
dnode_next_offset+0xa2()
get_next_chunk+0xa5()
dmu_free_long_range_impl+0x9e()
dmu_free_object+0xe6()
dsl_dataset_destroy+0x122()
dsl_destroy_inconsistent+0x5f()
findfunc+0x23()
dmu_objset_find_spa+0x38c()
dmu_objset_find_spa+0x153()
dmu_objset_find+0x40()
spa_load_impl+0xb23()
spa_load+0x117()
spa_load_best+0x78()
spa_import+0xee()
zfs_ioc_pool_import+0xc0()
zfsdev_ioctl+0x177()
cdev_ioctl+0x45()
spec_ioctl+0x5a()
fop_ioctl+0x7b()
ioctl+0x18e()
dtrace_systrace_syscall32+0x11a()
_sys_sysenter_post_swapgs+0x149()


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Andrew Jones
Thanks Victor. I will give it another 24 hrs or so and will let you know how it 
goes...

You are right; the large 2TB volume (CSV1) was indeed in the process of being 
deleted, as described above. It is showing error 16 on 'zdb -e'.


Re: [zfs-discuss] zpool import hangs indefinitely (retry post in parts; too long?)

2010-06-28 Thread Geoff Shipman

Andrew,

Looks like the zpool is telling you the devices are still doing work of 
some kind, or that there are locks still held.


From the man page for intro(2) the errors are listed. Number 16 looks to be 
an EBUSY.



 16 EBUSYDevice busy

 An attempt was made to mount a  dev-
 ice  that  was already mounted or an
 attempt was made to unmount a device
 on  which  there  is  an active file
 (open   file,   current   directory,
 mounted-on  file,  active  text seg-
 ment). It  will  also  occur  if  an
 attempt is made to enable accounting
 when it  is  already  enabled.   The
 device or resource is currently una-
 vailable.   EBUSY is  also  used  by
 mutexes, semaphores, condition vari-
 ables, and r/w  locks,  to  indicate
 that   a  lock  is held,  and by the
 processor  control  function
 P_ONLINE.
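
A quick way to confirm the errno-to-name mapping on the box itself (a sketch; 
standard header location assumed):

   $ grep -w EBUSY /usr/include/sys/errno.h

which shows EBUSY defined as 16, matching the 'error 16' that zdb reports for 
the half-destroyed dataset.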


On 06/28/10 01:50 PM, Andrew Jones wrote:

Just re-ran 'zdb -e tank' to confirm the CSV1 volume is still exhibiting error 
16:

snip
Could not open tank/CSV1, error 16
snip

Considering my attempt to delete the CSV1 volume led to the failure in the 
first place, I have to think that if I can either 1) complete the deletion of 
this volume or 2) roll back to a transaction prior to this based on logging or 
3) repair whatever corruption has been caused by this partial deletion, that I 
will then be able to import the pool.

What does 'error 16' mean in the ZDB output, any suggestions?
   


--
Geoff Shipman | Senior Technical Support Engineer
Phone: +13034644710
Oracle Global Customer Services
500 Eldorado Blvd. UBRM-04 | Broomfield, CO 80021
Email: geoff.ship...@sun.com | Hours:9am-5pm MT,Monday-Friday
