Re: [zfs-discuss] new labelfix needed

2010-09-02 Thread Benjamin Brumaire
your point has only a rhetorical meaning. Systems break regardless of the
resources you put into building them: bad hardware, typos, human mistakes,
bugs... this mailing list is full of examples. Having tools like zdb, mdb,
zfs import -fFX and labelfix for analysis and repair is always a good thing.
BTW, a zfsck would be a great improvement to ZFS.

bbr


[zfs-discuss] Algorithm of block size estimation

2010-09-02 Thread Zhu Han
Hi,

Can anybody give me a link to the code snippet for block size estimation?

I want to know when ZFS decides on the block size used for a file. Does ZFS
estimate it based on the length of the file when the file's create event is
committed to disk during the txg commit?

If so, is the block size of a file ever changed during the lifetime of the
file? I suppose the answer is no.

Thank you for your kind help!

best regards,
hanzhu


Re: [zfs-discuss] Algorithm of block size estimation

2010-09-02 Thread Darren J Moffat

On 02/09/2010 11:18, Zhu Han wrote:

Can anybody give me a link to the code snippet for block size
estimation?


See the zfs_write() function.

--
Darren J Moffat


Re: [zfs-discuss] Algorithm of block size estimation

2010-09-02 Thread Zhu Han
Thank you!

Here is my understanding; I'll leave it here as a reference for others. If
it's not correct, please point it out.

ZFS adjusts the block size of a file only while the file consists of a single
block and that block is being extended.  This is because
dmu_object_set_blocksize() only sets the block size when the object contains
a single block.  Once the file grows beyond a single block, the block size is
fixed and is never shrunk.

ZFS always sets the block size to the power-of-two ceiling of the file size.
The largest block size is specified by the file system's recordsize property.
One special case: if the current file block size is not a power of two, its
new block size can be greater than the file system recordsize property. Does
anybody know why there is such a short circuit?
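
For illustration, here is a simplified sketch (my own pseudo-C, not the
actual OpenSolaris source; the helper name is made up and the
not-power-of-two special case above is ignored) of the growth rule I am
describing:

  #include <stdint.h>

  /*
   * Sketch: choose the block size for a file that still fits in a
   * single block.  The block grows to the power-of-two ceiling of the
   * new file length, capped at the dataset recordsize.
   */
  static uint64_t
  pick_single_block_size(uint64_t filelen, uint64_t recordsize)
  {
          uint64_t bs = 512;                 /* SPA_MINBLOCKSIZE */

          while (bs < filelen && bs < recordsize)
                  bs <<= 1;                  /* power-of-two ceiling */

          return (bs > recordsize ? recordsize : bs);
  }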

best regards,
hanzhu


On Thu, Sep 2, 2010 at 6:30 PM, Darren J Moffat darr...@opensolaris.org wrote:

 On 02/09/2010 11:18, Zhu Han wrote:

 Can anybody give me a link to the code snippet for block size
 estimation?


 See the zfs_write() function.

 --
 Darren J Moffat



Re: [zfs-discuss] new labelfix needed

2010-09-02 Thread Mark J Musante

On Wed, 1 Sep 2010, Benjamin Brumaire wrote:


your point has only a rhetorical meaning.


I'm not sure what you mean by that.  I was asking specifically about your 
situation.  You want to run labelfix on /dev/rdsk/c0d1s4 - what happened 
to that slice that requires a labelfix?  Is there something that zfs might 
be doing to cause the problem?  Is there something that zfs could be doing 
to mitigate the problem?



BTW, a zfsck would be a great improvement to ZFS.


What specifically would zfsck do that is not done by scrub?


Re: [zfs-discuss] pool died during scrub

2010-09-02 Thread Jeff Bacon

 looks similar to a crash I had here at our site a few months ago. Same
 symptoms, no actual solution. We had to recover from an rsync backup
 server.

Thanks Carsten. And on Sun hardware, too. Boy, that's comforting...

Three-way mirrors, anyone?


[zfs-discuss] what is zfs doing during a log resilver?

2010-09-02 Thread Jeff Bacon
So, when you add a log device to a pool, it initiates a resilver. 

What is it actually doing, though? Isn't the slog a copy of the
in-memory intent log? Wouldn't it simply replicate the data that's in
the other log, checked against what's in RAM? And presumably there
isn't that much data in the slog, so there isn't that much to check?

Or is it just doing a generic resilver for the sake of argument because
you changed something? 


[zfs-discuss] What forum to use with a ZFS how-to question

2010-09-02 Thread Dominik Hoffmann
Is this the right forum to post a ZFS how-to question? If not, which forum
would you suggest I go to?


Re: [zfs-discuss] What forum to use with a ZFS how-to question

2010-09-02 Thread Cindy Swearingen

This is the right forum, fire away...

Feel free to review ZFS information in advance:

http://hub.opensolaris.org/bin/view/Community+Group+zfs/docs

ZFS Administration Guide (Solaris 10):

http://docs.sun.com/app/docs/doc/819-5461

ZFS Best Practices Guide:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

cs



On 09/02/10 07:42, Dominik Hoffmann wrote:

Is this the right forum to post a ZFS how-to question? If not, which forum
would you suggest I go to?



Re: [zfs-discuss] 4k block alignment question (X-25E)

2010-09-02 Thread Ray Van Dolson
On Tue, Aug 31, 2010 at 12:47:49PM -0700, Brandon High wrote:
 On Mon, Aug 30, 2010 at 3:05 PM, Ray Van Dolson rvandol...@esri.com wrote:
  I want to fix (as much as is possible) a misalignment issue with an
  X-25E that I am using for both OS and as an slog device.
 
 It's pretty easy to get the alignment right.
 
 fdisk uses a default geometry of 63/255/*, which isn't easy to change. This
 makes each cylinder (63 * 255 * 512 B). You want ($cylinder_offset) *
 (63 * 255 * 512 B) / ($block_alignment_size) to be evenly divisible. For a
 4k alignment you want the offset to be 8.
 
 With fdisk, create your SOLARIS2 partition that uses the entire disk.
 The partition will be from cylinder 1 to whatever. Cylinder 0 is used
 for the MBR, so it's automatically un-aligned.
 
 When you create slices in format, the MBR cylinder isn't visible, so
 you have to subtract 1 from the offset, so your first slice should
 start on cylinder 7. Each additional slice should start on a cylinder
 that is a multiple of 8, minus 1, e.g. 63, 1999, etc.
 
 It doesn't matter if the end of a slice is unaligned, other than to
 make aligning the next slice easier.
 
 -B

Thanks Brandon.
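
Just to double-check the arithmetic above (assuming the default 63/255
geometry):

  1 cylinder  = 63 * 255  = 16065 sectors   (not divisible by 8)
  8 cylinders = 8 * 16065 = 128520 sectors
  128520 sectors * 512 B  = 65802240 B
  65802240 B / 4096 B     = 16065 (an integer)

so an offset of 8 cylinders does indeed land on a 4 KB boundary.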

Just a follow-up to my original post... unfortunately I couldn't try
aligning the slice on the SSD I was also using for slog/ZIL.  The
slog/ZIL slice was too small to be added to the ZIL mirror as the disk
we'd thrown in the system bypassing the expander was being used
completely (via EFI label).

Still wanted to test, however, so I pulled one of the drives from my
rpool, and added the entire disk to my mirror.  This uses the EFI label
and aligns everything correctly.

Unit Attention errors immediately began showing up.

I pulled that drive from the ZIL mirror and then used one of my two
L2ARC drives (also X-25E's) in the same fashion.

Same problem.

So I believe the problem is still expander-related rather than
alignment-related.

Too bad.

Ray


Re: [zfs-discuss] How to rebuild raidz after system reinstall

2010-09-02 Thread Mark J Musante


What does 'zpool import' show?  If that's empty, what about 'zpool import 
-d /dev'?



Re: [zfs-discuss] How to rebuild raidz after system reinstall

2010-09-02 Thread Dominik Hoffmann
I just tried

admin$ zpool replace BackupRAID /dev/disk0 /dev/disk1 /dev/disk2
too many arguments

As you can see, it didn't do what I needed to accomplish.


Re: [zfs-discuss] How to rebuild raidz after system reinstall

2010-09-02 Thread Dominik Hoffmann
I think I just destroyed the information on the old raidz members by doing

zpool create BackupRAID raidz /dev/disk0s2 /dev/disk1s2 /dev/disk2s2

The pool mounted fine after that, but is empty. None of the old information is 
present. Am I right?


Re: [zfs-discuss] new labelfix needed

2010-09-02 Thread Benjamin Brumaire
my company changed our SAN and I migrated our zpool to the new LUNs with
attach/detach while the zpool stayed online. Once finished, we had a cluster
crash and the zpool (on the new LUNs) got corrupted. There was no way to
import it; zpool import -fFX failed.

The old LUNs are detached and probably sane from a data point of view... and
I've already successfully recovered a detached disk with help from Jeff
Bonwick.

I guess a zfsck or zpoolck could help when a zpool can't be imported.


Re: [zfs-discuss] How to rebuild raidz after system reinstall

2010-09-02 Thread Mark J Musante

On Thu, 2 Sep 2010, Dominik Hoffmann wrote:


I think I just destroyed the information on the old raidz members by doing

zpool create BackupRAID raidz /dev/disk0s2 /dev/disk1s2 /dev/disk2s2


It should have warned you that two of the disks were already formatted 
with a zfs pool.  Did it not do that?  If there was no warning, perhaps 
these aren't the same disks you were using in your pool.



Re: [zfs-discuss] How to rebuild raidz after system reinstall

2010-09-02 Thread Dominik Hoffmann
There was no warning. This is the output:

admin$ sudo zpool create BackupRAID raidz /dev/disk0s2 /dev/disk1s2 /dev/disk2s2
Password:
admin$ zpool status
  pool: BackupRAID
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
BackupRAID   ONLINE   0 0 0
  raidz1 ONLINE   0 0 0
disk0s2  ONLINE   0 0 0
disk1s2  ONLINE   0 0 0
disk2s2  ONLINE   0 0 0

errors: No known data errors


Re: [zfs-discuss] How to rebuild raidz after system reinstall

2010-09-02 Thread Dominik Hoffmann
I can only tell you what it is now:

admin$ zpool import
no pools available to import
admin$ zpool import -d /dev
no pools available to import


Re: [zfs-discuss] How to rebuild raidz after system reinstall

2010-09-02 Thread Dominik Hoffmann
Also, I am quite sure that I was using the actual drives from the old raidz. 
This is my drive listing:


admin$ diskutil list
/dev/disk0
   #:   TYPE                    NAME         SIZE        IDENTIFIER
   0:   GUID_partition_scheme               *465.8 Gi    disk0
   1:   EFI                                  200.0 Mi    disk0s1
   2:   ZFS                     BackupRAID   465.4 Gi    disk0s2
/dev/disk1
   #:   TYPE                    NAME         SIZE        IDENTIFIER
   0:   GUID_partition_scheme               *465.8 Gi    disk1
   1:   EFI                                  200.0 Mi    disk1s1
   2:   ZFS                     BackupRAID   465.4 Gi    disk1s2
/dev/disk2
   #:   TYPE                    NAME         SIZE        IDENTIFIER
   0:   GUID_partition_scheme               *465.8 Gi    disk2
   1:   EFI                                  200.0 Mi    disk2s1
   2:   ZFS                                  465.4 Gi    disk2s2

/dev/disk2 is the new drive.


[zfs-discuss] Information lost? Does zpool create erase volumes

2010-09-02 Thread Dominik Hoffmann
Those of you who have read my previous post know that I was trying to 
reassemble a raidz after a complete reinstall of the OS on a Mac running 
zfs-119. In a fit of impatience, I executed the zpool create command on the 
three volumes, two of which were part of the old raidz, the third one having 
replaced the third old one, which had died. This did succeed in creating a 
raidz on my system, but all the original information is gone, leading me to 
believe that I caused the old information to be erased or at least inaccessible.

Can any of you please clue me in?


Re: [zfs-discuss] Information lost? Does zpool create erase volumes

2010-09-02 Thread Cindy Swearingen

Dominik,

You overwrote your data when you recreated a pool with the same
name on the same disks with zpool create.

If I try to recreate a pool that already exists (even one that is merely
exported), I see a message similar to the following:

# zpool create tank c3t3d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t3d0s0 is part of exported or potentially active ZFS pool 
tank. Please see zpool(1M).
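
For reference, the usual sequence to check for an importable pool before
running zpool create is something like this (a sketch; zpool import -D also
lists pools that were explicitly destroyed, though I don't know whether the
Mac port supports that option):

# zpool import
# zpool import -d /dev
# zpool import -D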


Did you attempt to import the pool prior to the zpool create?

Thanks,

Cindy

On 09/02/10 14:17, Dominik Hoffmann wrote:

Those of you who have read my previous post know that I was trying to 
reassemble a raidz after a complete reinstall of the OS on a Mac running 
zfs-119. In a fit of impatience, I executed the zpool create command on the 
three volumes, two of which were part of the old raidz, the third one having 
replaced the third old one, which had died. This did succeed in creating a 
raidz on my system, but all the original information is gone, leading me to 
believe that I caused the old information to be erased or at least inaccessible.

Can any of you please clue me in?



Re: [zfs-discuss] Information lost? Does zpool create erase volumes

2010-09-02 Thread Dominik Hoffmann
Yes, I did try to import the pool. However, the response of the command was
"no pools available to import".


Re: [zfs-discuss] Information lost? Does zpool create erase volumes

2010-09-02 Thread Cindy Swearingen
 Yes, I did try to import the pool. However, the
 response of the command was no pools available to
 import.

I'm not sure what happened to your pool, but I think it is possible
that the pool information on these disks was removed accidentally. 
I'm not sure what the diskutil command does but if it was run on
the wrong disk, then possibly all the pool info was removed from
the disks' labels. This is just a guess.

Then, when you reissued the zpool create command, it didn't 
complain that the disks were already in use because the pool info 
was already removed.
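
One way to check (a sketch; the device path is taken from your earlier
diskutil listing) would be to dump the on-disk labels directly with zdb,
for example:

# zdb -l /dev/disk0s2

If the labels were wiped, zdb should fail to unpack all four label copies
instead of printing the pool configuration.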

Cindy


[zfs-discuss] possible ZFS-related panic?

2010-09-02 Thread Marion Hakanson
Folks,

Has anyone seen a panic traceback like the following?  This is Solaris 10u7
on a Thumper, acting as an NFS server.  The machine had been up for nearly a
year; I added a dataset to an existing pool, set compression=on for the
first time on this system, loaded some data into it (via rsync),
then mounted it on the NFS client.

The first data was written by the client itself in a 10pm cron-job, and
the system crashed at 10:02pm as below:

panic[cpu2]/thread=fe8000f5cc60: page_sub: bad arg(s): pp 
872b5610, *ppp 0

fe8000f5c470 unix:mutex_exit_critical_size+20219 ()
fe8000f5c4b0 unix:page_list_sub_pages+161 ()
fe8000f5c510 unix:page_claim_contig_pages+190 ()
fe8000f5c600 unix:page_geti_contig_pages+44b ()
fe8000f5c660 unix:page_get_contig_pages+c2 ()
fe8000f5c6f0 unix:page_get_freelist+1a4 ()
fe8000f5c760 unix:page_create_get_something+95 ()
fe8000f5c7f0 unix:page_create_va+2a1 ()
fe8000f5c850 unix:segkmem_page_create+72 ()
fe8000f5c8b0 unix:segkmem_xalloc+60 ()
fe8000f5c8e0 unix:segkmem_alloc_vn+8a ()
fe8000f5c8f0 unix:segkmem_alloc+10 ()
fe8000f5c9c0 genunix:vmem_xalloc+315 ()
fe8000f5ca20 genunix:vmem_alloc+155 ()
fe8000f5ca90 genunix:kmem_slab_create+77 ()
fe8000f5cac0 genunix:kmem_slab_alloc+107 ()
fe8000f5caf0 genunix:kmem_cache_alloc+e9 ()
fe8000f5cb00 zfs:zio_buf_alloc+1d ()
fe8000f5cb50 zfs:zio_compress_data+ba ()
fe8000f5cba0 zfs:zio_write_compress+78 ()
fe8000f5cbc0 zfs:zio_execute+60 ()
fe8000f5cc40 genunix:taskq_thread+bc ()
fe8000f5cc50 unix:thread_start+8 ()

syncing file systems... done
. . .

Unencumbered by more than a gut feeling, I disabled compression on
the dataset, and we've gotten through two nightly runs of the same
NFS client job without crashing, but of course we would technically
have to wait for nearly a year before we've exactly replicated the
original situation (:-).

Unfortunately the dump slice was slightly too small; we were just short
of enough space to capture the whole 10GB crash dump.  I did get savecore
to write something out, and I uploaded it to the Oracle support site, but it
gives scat too much indigestion to be useful to the engineer I'm working
with.  They have not found any matching bugs so far, so I thought I'd ask a
slightly wider audience here.

Thanks and regards,

Marion

