Re: ZFS import panic with r219703

2011-03-17 Thread Freddie Cash
On Thu, Mar 17, 2011 at 10:00 AM, Olivier Smedts  wrote:
> 2011/3/17 Freddie Cash :
>>> Hrm, it looks like the "pool roll-back on import" feature is working.
>>>
>>> # zpool import -F -d /dev/hast storage
>>>
>>> The above command imported the pool successfully.  No dmu_free_range()
>>> errors.  No solaris assert.  No kernel panic.  Will try hammering on
>>> the system a bit to see if that sticks or whether the space_map errors
>>> show up again.
>>
>> Damn, of course that would be too easy.  :(  Adding or removing any
>> data from the pool still causes it to panic with the dmu_free_range()
>> assertion.
>
> Does resilvering help after the forced import ?

I think this pool is hooped.  :(  It won't import in any way now, no
matter what combination of options I use: readonly, force, roll-back,
without the corrupted hast device so it's in a degraded state, etc.

The latest panic is:

solaris assert: zio->io_type != ZIO_TYPE_WRITE || spa_writable(spa),
file: 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c,
line 2321

In case anyone is interested in the results, I've put core.txt.12 up
at http://www.sd73.bc.ca/downloads/crash/ which is the core file
relating to the above panic.

I think after lunch I'm going to destroy the pool and start over.
This box went through a lot of crashes and hangs while finding the
right loader.conf tunables for hast/zfs and issues with CompactFlash
for the OS.  Now that I've got those set and figured out, I'm going to
start over and see how things go.

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ZFS import panic with r219703

2011-03-17 Thread Olivier Smedts
2011/3/17 Freddie Cash :
> On Thu, Mar 17, 2011 at 9:24 AM, Freddie Cash  wrote:
>> On Wed, Mar 16, 2011 at 4:03 PM, Freddie Cash  wrote:
>>> Anytime I try to import my pool built using 24x HAST devices, I get
>>> the following message, and the system reboots:
>>>
>>> panic: solaris assert: dmu_free_range(os, smo->smo_object, 0, -1ULL,
>>> tx) == 0, file:
>>> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/space_map.c,
>>> line: 484
>>>
>>> Everything runs nicely if I don't import the pool.
>>>
>>> Doing a "zpool import" shows that one of the HAST devices is FAULTED
>>> "corrupted data".
>>>
>>> Haven't tried anything to remove/replace the faulted device, just
>>> wanted to see if anyone knew what the above error meant.
>>>
>>> Pool was created using r219523 and successfully copied over 1 TB of
>>> data from another ZFS system.  Had some issues with gptboot this
>>> morning and the system locking up and rebooting a bunch, and now the
>>> pool won't import.
>>
>> Hrm, it looks like the "pool roll-back on import" feature is working.
>>
>> # zpool import -F -d /dev/hast storage
>>
>> The above command imported the pool successfully.  No dmu_free_range()
>> errors.  No solaris assert.  No kernel panic.  Will try hammering on
>> the system a bit to see if that sticks or whether the space_map errors
>> show up again.
>
> Damn, of course that would be too easy.  :(  Adding or removing any
> data from the pool still causes it to panic with the dmu_free_range()
> assertion.

Does resilvering help after the forced import ?



-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: oliv...@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."


Re: ZFS import panic with r219703

2011-03-17 Thread Freddie Cash
On Thu, Mar 17, 2011 at 9:24 AM, Freddie Cash  wrote:
> On Wed, Mar 16, 2011 at 4:03 PM, Freddie Cash  wrote:
>> Anytime I try to import my pool built using 24x HAST devices, I get
>> the following message, and the system reboots:
>>
>> panic: solaris assert: dmu_free_range(os, smo->smo_object, 0, -1ULL,
>> tx) == 0, file:
>> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/space_map.c,
>> line: 484
>>
>> Everything runs nicely if I don't import the pool.
>>
>> Doing a "zpool import" shows that one of the HAST devices is FAULTED
>> "corrupted data".
>>
>> Haven't tried anything to remove/replace the faulted device, just
>> wanted to see if anyone knew what the above error meant.
>>
>> Pool was created using r219523 and successfully copied over 1 TB of
>> data from another ZFS system.  Had some issues with gptboot this
>> morning and the system locking up and rebooting a bunch, and now the
>> pool won't import.
>
> Hrm, it looks like the "pool roll-back on import" feature is working.
>
> # zpool import -F -d /dev/hast storage
>
> The above command imported the pool successfully.  No dmu_free_range()
> errors.  No solaris assert.  No kernel panic.  Will try hammering on
> the system a bit to see if that sticks or whether the space_map errors
> show up again.

Damn, of course that would be too easy.  :(  Adding or removing any
data from the pool still causes it to panic with the dmu_free_range()
assertion.

-- 
Freddie Cash
fjwc...@gmail.com


Re: ZFS import panic with r219703

2011-03-17 Thread Freddie Cash
On Wed, Mar 16, 2011 at 4:03 PM, Freddie Cash  wrote:
> Anytime I try to import my pool built using 24x HAST devices, I get
> the following message, and the system reboots:
>
> panic: solaris assert: dmu_free_range(os, smo->smo_object, 0, -1ULL,
> tx) == 0, file:
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/space_map.c,
> line: 484
>
> Everything runs nicely if I don't import the pool.
>
> Doing a "zpool import" shows that one of the HAST devices is FAULTED
> "corrupted data".
>
> Haven't tried anything to remove/replace the faulted device, just
> wanted to see if anyone knew what the above error meant.
>
> Pool was created using r219523 and successfully copied over 1 TB of
> data from another ZFS system.  Had some issues with gptboot this
> morning and the system locking up and rebooting a bunch, and now the
> pool won't import.

Hrm, it looks like the "pool roll-back on import" feature is working.

# zpool import -F -d /dev/hast storage

The above command imported the pool successfully.  No dmu_free_range()
errors.  No solaris assert.  No kernel panic.  Will try hammering on
the system a bit to see if that sticks or whether the space_map errors
show up again.



-- 
Freddie Cash
fjwc...@gmail.com


Re: ZFS import panic with r219703

2011-03-16 Thread Volodymyr Kostyrko

17.03.2011 01:03, Freddie Cash wrote:

> Anytime I try to import my pool built using 24x HAST devices, I get
> the following message, and the system reboots:
>
> panic: solaris assert: dmu_free_range(os, smo->smo_object, 0, -1ULL,
> tx) == 0, file:
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/space_map.c,
> line: 484
>
> Everything runs nicely if I don't import the pool.
>
> Doing a "zpool import" shows that one of the HAST devices is FAULTED
> "corrupted data".
>
> Haven't tried anything to remove/replace the faulted device, just
> wanted to see if anyone knew what the above error meant.
>
> Pool was created using r219523 and successfully copied over 1 TB of
> data from another ZFS system.  Had some issues with gptboot this
> morning and the system locking up and rebooting a bunch, and now the
> pool won't import.


Oh, the garbled space_map issue.  The system will assert any time the 
pool is imported read/write.  ZFS v28 is able to import the pool 
read-only, and as far as I know that is the only way to recover data 
from such a pool.  Either way, the pool will have to be recreated.
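For reference, a read-only v28 import along those lines might look like 
the sketch below.  The pool name "storage" and the /dev/hast device 
directory are taken from earlier in this thread; the commands are only 
echoed, not executed, since they need a live HAST/ZFS setup.

```shell
#!/bin/sh
# Sketch only: each command is echoed rather than executed.
run() { echo "# $*"; }

# ZFS v28 can open a damaged pool read-only, which avoids the space map
# rewrite that trips the dmu_free_range() assertion on a r/w import.
run zpool import -o readonly=on -d /dev/hast storage

# Files can then be copied off with cp/rsync before the pool is
# destroyed and recreated.  New snapshots cannot be taken on a
# read-only pool, but pre-existing ones can still be zfs send'd.
```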


--
Sphinx of black quartz judge my vow.


Re: ZFS import panic with r219703

2011-03-16 Thread Freddie Cash
On Wed, Mar 16, 2011 at 4:03 PM, Freddie Cash  wrote:
> Anytime I try to import my pool built using 24x HAST devices, I get
> the following message, and the system reboots:
>
> panic: solaris assert: dmu_free_range(os, smo->smo_object, 0, -1ULL,
> tx) == 0, file:
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/space_map.c,
> line: 484
>
> Everything runs nicely if I don't import the pool.
>
> Doing a "zpool import" shows that one of the HAST devices is FAULTED
> "corrupted data".
>
> Haven't tried anything to remove/replace the faulted device, just
> wanted to see if anyone knew what the above error meant.
>
> Pool was created using r219523 and successfully copied over 1 TB of
> data from another ZFS system.  Had some issues with gptboot this
> morning and the system locking up and rebooting a bunch, and now the
> pool won't import.

Along with this ZFS import issue, it seems that hastd doesn't like it
when you fire up all 24 hast devices at once (via a for loop), each with
over 100 MB of dirty data in it.  hastd dumps core, the kernel panics,
and the system reboots.

If I do 1 hast device every 2 seconds (or however long it takes to
manually type "hastctl role primary disk-a1") then it starts up fine.
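That staggered start-up can be scripted rather than typed by hand; a 
sketch follows.  The device names beyond disk-a1 are guesses based on 
this thread, and the commands are echoed rather than executed.

```shell
#!/bin/sh
# Sketch only: each command is echoed rather than executed.
run() { echo "# $*"; }

# Bring the hast devices up one at a time, pausing between each so
# hastd is not handling 24 devices' worth of dirty data at once.
for disk in disk-a1 disk-a2 disk-a3; do    # ...and so on for all 24
    run hastctl role primary "$disk"
    sleep 2
done
```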

So, I can now panic my 9-CURRENT system by either:
  - starting 24 hast devices at once, or
  - importing a ZFS pool made up of those 24 hast devices, with 1
corrupted device

Isn't testing fun?  :)

I have a bunch of vmcore files from the hast crashes, not really sure
what to do with them, though.
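For what it's worth, the usual way to poke at a vmcore on FreeBSD is 
kgdb; a sketch is below.  The paths assume the default /var/crash dump 
directory, the dump number is hypothetical, and the command is echoed 
rather than executed.

```shell
#!/bin/sh
# Sketch only: the command is echoed rather than executed.
run() { echo "# $*"; }

# Load the running kernel and the matching dump; typing "bt" at the
# (kgdb) prompt then shows the backtrace of the thread that panicked.
run kgdb /boot/kernel/kernel /var/crash/vmcore.0
```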

-- 
Freddie Cash
fjwc...@gmail.com