Re: [zfs-discuss] Advanced format 4K bug in illumos? (was: [zfs-discuss] Advanced Format HDD's - are we there yet? ...)

2012-06-07 Thread Henrik Johansson
Hello,

While we are talking about Advanced Format, does anyone know if bugid 7021758 is an 
issue in illumos?

Synopsis: zpool disk corruption detected on 4k block disks
http://wesunsolve.net/bugid/id/7021758

Regards
Henrik

http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Pool faulted in a bad way

2012-01-08 Thread Henrik Johansson
Hello,

I have been asked to take a look at a pool on an old OSOL 2009.06 host. It has 
been left unattended for a long time and was found in a FAULTED state. Two 
of the disks in the raidz2 pool seem to have failed; one has been replaced 
by a spare, the other one is UNAVAIL. The machine was restarted and the damaged 
disks were removed to make it possible to access the pool without it hanging on 
I/O errors.

Now, I have no indication that more than two disks should have failed, and 
one of them seems to have been replaced by the spare. I would therefore have 
expected the pool to be in a working state even with two failed disks and some 
bad data on the remaining disks, since metadata has additional replication.

This is the current state of the pool, unable to be imported (at least with 
2009.06):

  pool: tank
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
tank   FAULTED  0 0 1  corrupted data
  raidz2   DEGRADED 0 0 6
c12t0d0ONLINE   0 0 0
c12t1d0ONLINE   0 0 0
spare  ONLINE   0 0 0
  c12t2d0  ONLINE   0 0 0
  c12t7d0  ONLINE   0 0 0
c12t3d0ONLINE   0 0 0
c12t4d0ONLINE   0 0 0
c12t5d0ONLINE   0 0 0
c12t6d0UNAVAIL  0 0 0  cannot open

If we look at the status there is a mismatch between the status message, which 
states that insufficient replicas are available, and the status of the disks. 
More troublesome is the corrupted data status for the whole pool. I also get 
bad config type 16 for stats from zdb.

What can possibly cause something like this, a faulty controller? Is there any 
way to recover (UB rollback with OI perhaps?) The server has ECC memory and 
another pool that is still working fine. The controller is an ARECA 1280.
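For reference, a rough sketch of what a rollback-style recovery attempt could look
like from a newer live environment (e.g. OpenIndiana) that has the zpool recovery
options; the pool name is taken from the output below, everything else is an
assumption:

# dry run first: -n only reports whether discarding the last few txgs would work
zpool import -f -Fn tank
# if the dry run looks sane, do the actual recovery import
zpool import -f -F tank

Whether this helps depends on whether an older, intact uberblock/txg can still be
found on the remaining disks.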

And some output from zdb:

# zdb tank | more   
zdb: can't open tank: I/O error
version=14
name='tank'
state=0
txg=0
pool_guid=17315487329998392945
hostid=8783846
hostname='storage'
vdev_tree
type='root'
id=0
guid=17315487329998392945
bad config type 16 for stats
children[0]
type='raidz'
id=0
guid=14250359679717261360
nparity=2
metaslab_array=24
metaslab_shift=37
ashift=9
asize=14002698321920
is_log=0
root@storage:~# zdb tank  
version=14
name='tank'
state=0
txg=0
pool_guid=17315487329998392945
hostid=8783846
hostname='storage'
vdev_tree
type='root'
id=0
guid=17315487329998392945
bad config type 16 for stats
children[0]
type='raidz'
id=0
guid=14250359679717261360
nparity=2
metaslab_array=24
metaslab_shift=37
ashift=9
asize=14002698321920
is_log=0
bad config type 16 for stats
children[0]
type='disk'
id=0
guid=5644370057710608379
path='/dev/dsk/c12t0d0s0'
devid='id1,sd@x001b4d23002bb800/a'

phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@0,0:a'
whole_disk=1
DTL=154
bad config type 16 for stats
children[1]
type='disk'
id=1
guid=7134885674951774601
path='/dev/dsk/c12t1d0s0'
devid='id1,sd@x001b4d23002bb810/a'

phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@1,0:a'
whole_disk=1
DTL=153
bad config type 16 for stats
children[2]
type='spare'
id=2
guid=7434068041432431375
whole_disk=0
bad config type 16 for stats
children[0]
type='disk'
id=0
guid=5913529661608977121
path='/dev/dsk/c12t2d0s0'
devid='id1,sd@x001b4d23002bb820/a'


Re: [zfs-discuss] S11 vs illumos zfs compatiblity

2011-12-27 Thread Henrik Johansson

On Dec 27, 2011, at 9:20 PM, Frank Cusack wrote:

 http://sparcv9.blogspot.com/2011/12/solaris-11-illumos-and-source.html
 
 If I upgrade ZFS to use the new features in Solaris 11 I will be unable to 
 import my pool using the free ZFS implementation that is available in illumos 
 based distributions
 
 Is that accurate?  I understand if the S11 version is ahead of illumos, of 
 course I can't use the same pools in both places, but that is the same 
 problem as using an S11 pool on S10.  The author is implying a much worse 
 situation, that there are zfs tracks in addition to versions and that S11 
 is now on a different track and an S11 pool will not be usable elsewhere, 
 ever.  I hope it's just a misrepresentation.

I think the author has a valid point ;)

I probably should have written zpools instead of ZFS in that sentence. It is the 
same as always with different pool versions and features, but in this case we 
don't know if they will be implemented, and implemented in the same way, outside 
of Oracle after zpool version 28, since we do not have the source and Oracle 
doesn't want to play with us.
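For anyone who wants to stay importable on the open implementations, the practical
rule is simply not to run zpool upgrade past version 28. A quick way to check where
a pool stands (the pool name is a placeholder):

$ zpool get version tank

zpool upgrade with no arguments also lists pools that are below the version the
running system supports, and zpool upgrade -V can be used to move a pool to a
specific version instead of the latest one.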

Regards
Henrik

http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] dskinfo utility

2011-06-21 Thread Henrik Johansson
Hello,

I got tired of gathering disk information from different places when working 
with Solaris disks so I wrote a small utility for summarizing the most commonly 
used information.

It is especially tricky to work with a large set of SAN disks using MPxIO; you 
do not even see the logical unit number in the name of the disk, so you have to 
use other commands to acquire that information per disk.

The focus of the first version is ZFS, so it understands which disks are 
part of pools; later versions might add other volume managers or filesystems.

Besides the name of the disk, its size and usage, it can also show the number of FC 
paths to disks, whether it is labeled, driver type, logical unit number, vendor, 
serial and product names.

Examples (mind the format, it looks good with 80 columns):
$ dskinfo list
disk                               size  use      type
c0t600144F8288C50B55BC58DB70001d0  499G  -        iscsi
c5t0d0                             149G  rpool    disk
c5t2d0                             37G   -        disk
c6t0d0                             1.4T  zpool01  disk
c6t1d0                             1.4T  zpool01  disk
c6t2d0                             1.4T  zpool01  disk

# dskinfo list-long
disk                               size  lun  use      p  spd  type  lb
c1t0d0                             136G  -    rpool    -  -    disk  y
c1t1d0                             136G  -    rpool    -  -    disk  y
c6t6879120292610822533095343732d0  100G  0x1  zpool03  4  4Gb  fc    y
c6t6879120292610822533095343734d0  100G  0x3  zpool03  4  4Gb  fc    y
c6t6879120292610822533095343736d0  404G  0x5  zpool03  4  4Gb  fc    y
c6t6879120292610822533095343745d0  5T    0xb  zpool03  4  4Gb  fc    y

# dskinfo list-full
disk   size hex   dec p   spd type  lb
  use  vendor   product  serial  
c0t0d0 68G  - -   -   -   disk  y 
  rpoolFUJITSU  MAP3735N SUN72G  -   
c0t1d0 68G  - -   -   -   disk  y 
  rpoolFUJITSU  MAP3735N SUN72G  -   
c1t1d0 16G  - -   -   -   disk  y 
  storage  SEAGATE  ST318404LSUN18G  -   
c1t2d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAJ3182M SUN18G  -   
c1t3d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAJ3182M SUN18G  -   
c1t4d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAG3182L SUN18G  -   
c1t5d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAJ3182M SUN18G  -   
c1t6d0 16G  - -   -   -   disk  y 
  storage  FUJITSU  MAJ3182M SUN18G  -   

I've been using it myself for a while now and I thought it might fill a need, 
so I am making the current version available for download. Download link and 
some other information can be found here: 
http://sparcv9.blogspot.com/2011/06/solaris-dskinfo-utility.html

Regards

Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Thin devices/reclamation with ZFS?

2010-10-25 Thread Henrik Johansson
Hello,

Does anyone here have experience or thoughts regarding the use of ZFS on thin 
devices?

Since ZFS is COW it will not play nicely with this feature; it will spread its 
blocks all over the space it has been given, and it currently has no way to get 
back in contact with the storage arrays to tell them which blocks have been 
freed, since SCSI UNMAP/TRIM is not implemented in ZFS (but TRIM was added to 
the SATA framework in b146).

Reclaiming disk space also seems a bit problematic since all data is spread 
across the disks, including metadata, so even if you write the whole pool full 
of zeroes it will be mixed with non-zero data in the form of metadata. The 
vendor I am looking at requires 768K of zeroes to do a reclaim.
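For completeness, this is roughly what the zero-fill approach looks like; the pool
and dataset names are placeholders, and it only makes sense with compression off,
otherwise the zeroes never reach the array:

# fill the free space with zeroes, then remove the file and let the txg commit
$ dd if=/dev/zero of=/tank/fs/zerofill bs=1M
$ rm /tank/fs/zerofill
$ sync

Even then the zero runs will be interleaved with metadata as described above, so
the array may only be able to reclaim part of the space.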

I have done some initial quick tests to see if updates that do not increase the 
size of the data on disk end up with ZFS reusing the blocks rather than spreading 
out new blocks all the time, but it seems to continue to claim new blocks. 
(This was with S10U9; it may have changed with zpool recovery. I know ZFS in the 
past was supposed to reuse blocks to take advantage of the fastest parts of the 
disks?)

There is an RFE for this, but I would like to know if someone has had 
experience with this in its current state. 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913905

Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Broadening of ZFS open source license

2010-05-11 Thread Henrik Johansson

On May 11, 2010, at 10:29 PM, Hillel Lubman wrote:

 In the article about MeeGo: 
 http://lwn.net/SubscriberLink/387196/103bbafc9266fd0d/ it is stated, that 
 Oracle (together with RedHat) contributes a bulk part of BTRFS development. 
 Given that ZFS and BTRFS both share many similar goals, wouldn't it be 
 reasonable for Oracle to license ZFS under wider range of FOSS licenses 
 (similar to how Mozilla released their code under triple license, since MPL 
 is incompatible with GPL)? Is there any movement in that direction (or the 
 solid intention not to do so?).


I don't think so, not in the short run at least. Oracle has an edge over the 
competition with Solaris, which is also the primary platform for ZFS development; 
they control Solaris and can use it to their advantage. Why give ZFS away to the 
competition and incorporate it into an OS they do not control? Oracle knows how 
to make money and I don't think broadening the license for ZFS is going to do 
that in the near future.

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedupratio riddle

2010-03-18 Thread Henrik Johansson

Hello,

On 17 mar 2010, at 16.22, Paul van der Zwan paul.vanderz...@sun.com  
wrote:




On 16 mrt 2010, at 19:48, valrh...@gmail.com wrote:

Someone correct me if I'm wrong, but it could just be a  
coincidence. That is, perhaps the data that you copied happens to  
lead to a dedup ratio relative to the data that's already on there.  
You could test this out by copying a few gigabytes of data you know  
is unique (like maybe a DVD video file or something), and that  
should change the dedup ratio.


The first copy of that data was unique and even dedup is switched  
off for the entire pool so it seems a bug in the calculation of the

dedupratio or it used a method that is giving unexpected results.


I wonder if the dedup ratio is calculated from the contents of the DDT  
or from all the data contents of the whole pool; I've only looked at the  
ratio for datasets which had dedup on for their whole lifetime. If the  
former, data added while it's switched off will never alter the ratio  
(until rewritten with dedup on). The source should have the  
answer, but I'm on mail only for a few weeks.


It's probably for the whole dataset, that makes the most sense. Just a  
thought.


Regards

Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedupratio riddle

2010-03-18 Thread Henrik Johansson

On 18 mar 2010, at 18.38, Craig Alder craig.al...@sun.com wrote:

I remembered reading a post about this a couple of months back.   
This post by Jeff Bonwick confirms that the dedupratio is calculated  
only on the data that you've attempted to deduplicate, i.e. only the  
data written whilst dedup is turned on - http://mail.opensolaris.org/pipermail/zfs-discuss/2009-December/034721.html 
.


Ah, I was on the right track with the DDT then :) I guess most  
people have it turned on or off from the beginning until BP rewrite, to  
ensure everything is deduplicated (which is probably a good idea).


Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] why L2ARC device is used to store files ?

2010-03-06 Thread Henrik Johansson
Hello,

On Mar 5, 2010, at 10:46 AM, Abdullah Al-Dahlawi wrote:

 Greeting All
 
 I have created a pool that consists of a hard disk and an ssd as a cache
 
 zpool create hdd c11t0d0p3
 zpool add hdd cache c8t0d0p0 - cache device
 
 I ran an OLTP benchmark to emulate a DBMS
 
 Once I ran the benchmark, the pool started creating the database file on the ssd 
 cache device???
 
 
 can any one explain why this happening ?
 
 isn't the L2ARC used to absorb the evicted data from the ARC?

No, it is not. If we look in the source there is a very good description of the 
L2ARC behavior:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c

1. There is no eviction path from the ARC to the L2ARC.  Evictions from the 
ARC behave as usual, freeing buffers and placing headers on ghost lists.  The 
ARC does not send buffers to the L2ARC during eviction as this would add 
inflated write latencies for all ARC memory pressure.

Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hardware for high-end ZFS NAS file server - 2010 March edition

2010-03-04 Thread Henrik Johansson


Hello,

On 4 mar 2010, at 11.11, Robert Milkowski mi...@task.gda.pl wrote:


On 04/03/2010 09:46, Dan Dascalescu wrote:
Please recommend your up-to-date high-end hardware components for  
building a highly fault-tolerant ZFS NAS file server.





2x M5000 + 4x EMC DMX

Sorry, I couldn't resist :)



I would not recommend that; you can't change boards in anything less  
than an M8000, so your service would have to switch nodes just to  
replace a CPU. ;)


Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to verify ecc for ram is active and enabled?

2010-03-04 Thread Henrik Johansson


Hello,

On 4 mar 2010, at 10.26, ace tojakt...@gmail.com wrote:

A process will continually scrub the memory, and is capable of  
correcting any one error per 64-bit word of memory.

at http://www.stringliterals.com/?tag=opensolaris.

If this is true what is the process and how is it accessed?


No, it's a kernel thread, something like:

# echo "::threadlist" | mdb -k | grep scrub

Or

echo "memscrub_scans_done/U" | mdb -k

This depends on what platform you are on; some platforms do this in  
hardware.


Google for the latter to find some good pages with more info.

I'm not at my workstation so mind minor faults.

Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Fishworks 2010Q1 and dedup bug?

2010-03-04 Thread Henrik Johansson
Hi all,

Now that the Fishworks 2010.Q1 release seems to get deduplication, does anyone 
know if bugid 6924824 (destroying a dedup-enabled dataset bricks system) is 
still valid? It has not been fixed in onnv and it is not mentioned in the 
release notes.

This is one of the bugs I've been keeping my eyes on before using dedup for any 
serious work, so I was a bit surprised to see that it was in the 2010.Q1 release 
but not fixed in ON. It might not be an issue, I am just curious, both from a 
Fishworks perspective and from an OpenSolaris perspective.

Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] flying ZFS pools

2010-03-01 Thread Henrik Johansson
Hello,

On Mar 1, 2010, at 11:57 PM, Ahmad AlTwaijiry wrote:

 Hi everyone,
 
 I'm preparing around 6 Solaris physical servers and I want to see if
 it's possible to create a zfs pool that I can make it as a shared pool
 between all the 6 servers (not concurrent, just active-passive way) is
 that possible? Is there any article that can show me how to do it ?
 
 sorry if this is a basic question but I'm new to ZFS area, in UFS I
 can just create a metaset between all the servers and I just release
 and take over manually and this is what I want to do with ZFS

It's even easier with ZFS: as long as all servers have access to all disks you 
can just do a zpool export of the pool and then a zpool import on another 
node. The easier part is that you do not need to add stuff to vfstab or 
have any local knowledge of the pool layout.
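A minimal sketch of the manual failover, assuming a shared pool called tank (the
name is a placeholder):

node-a# zpool export tank
node-b# zpool import tank
# if node-a died and never exported the pool, the import must be forced
node-b# zpool import -f tank

Just make sure the pool is never imported on two nodes at the same time; ZFS itself
does not arbitrate access, so forcing an import while the other node still has the
pool open will corrupt it.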

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observations about compressability of metadata L2ARC

2010-03-01 Thread Henrik Johansson

On Feb 21, 2010, at 6:40 PM, Andrey Kuzmin wrote:

 I don't see why this couldn't be extended beyond metadata (+1 for the
 idea): if zvol is compressed, ARC/L2ARC could store compressed data.
 The gain is apparent: if user has compression enabled for the volume,
 he/she expects volume's data to be compressable at good ratio,
 yielding significant reduction of ARC memory footprint/L2ARC usable
 capacity boost.

I think something similar was discussed by Jeff and Bill in the ZFS keynote at 
KCA: just-in-time decompression, keeping prefetched data in memory without 
decompressing it. I'd guess you would want the data decompressed if it's going 
to be used, at least frequently. They also discussed that unused data in the ARC 
might be compressed in the future.

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] More performance questions [on zfs over nfs]

2010-02-21 Thread Henrik Johansson

On Feb 21, 2010, at 7:47 PM, Harry Putnam wrote:

 
 Working from a remote linux machine on a zfs fs that is an nfs mounted
 share (set for nfs availability on zfs server, mounted nfs on linux);
 I've been noticing a certain kind of sloth when messing with files.
 
 What I see:  After writing a file it seems to take the fs too long to
 be able to display the size correctly (with du).

You will not see the on-disk size of the file with du before the transaction 
group has been committed, which can take up to 30 seconds. ZFS does not even 
know how much space it will consume before writing out the data to disk, since 
compression might be enabled. You can test this by executing sync(1M) on your 
file server; when it returns you should see the final size of the file.
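A quick way to see this for yourself on the file server; the dataset path is a
placeholder:

$ dd if=/dev/zero of=/tank/fs/testfile bs=1M count=100
$ du -sh /tank/fs/testfile    # may still report a small size
$ sync                        # force the pending transaction group out
$ du -sh /tank/fs/testfile    # now reflects the allocated on-disk size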

Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] server hang with compression on, ping timeouts from remote machine

2010-01-31 Thread Henrik Johansson
Hello Christo,

On Jan 31, 2010, at 4:07 PM, Christo Kutrovsky wrote:

 Hello All,
 
 I am running NTFS over iSCSI on a ZFS ZVOL volume with compression=gzip-9 and 
 blocksize=8K. The server is 2 core P4 3.0 Ghz with 5 GB of RAM.
 
 Whenever I start copying files from Windows onto the ZFS disk, after about 
 100-200 Mb been copied the server starts to experience freezes. I have iostat 
 running, which freezes as well. Even pings on both of the network adapters 
 are reporting either 4000 ms or timeouts for when the freeze is happening.
 
 I have reproduce the same behavior with a 1 GB test ZVOL. Whenever I do 
 sequential writes of 64 Kb with compression=gzip-9 I experience the freezes. 
 With compression=off it's all good.
 
 I've also experienced similar behavior (short freezes) when running zfs 
 send|zfs receive with compression on LOCALLY on ZVOLs again.

I think gzip in ZFS has a reputation for being somewhat heavy on system resources; 
that said, it would be nice if it did not have such a large impact on low-level 
functions. Have a look in the archive, search for example for "death-spiral" or 
"Death-spiral revisited". Have you tried using the default compression algorithm 
as well (lzjb, compression=on)?
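If you want to try that, switching the zvol over is a one-liner (the dataset name
below is a placeholder for your actual zvol):

# the default algorithm, lzjb, is much lighter than gzip-9
zfs set compression=lzjb tank/ntfsvol

Note that only newly written blocks get the new compression; existing blocks stay
gzip-9 compressed until they are rewritten.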

Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Removing large holey file does not free space 6792701 (still)

2010-01-22 Thread Henrik Johansson
Hello,

I mentioned this problem a year ago here and filed 6792701 and I know it has been 
discussed since. It should have been fixed in snv_118, but I can still trigger the 
same problem. This is only triggered if the creation of a large file is aborted, 
for example by loss of power, crash or SIGINT to mkfile(1M). The bug should 
probably be reopened but I post it here since some people were seeing something 
similar.

Example and attached zdb output:

filer01a:/$ uname -a
SunOS filer01a 5.11 snv_130 i86pc i386 i86pc Solaris
filer01a:/$ zpool create zpool01 raidz2 c4t0d0 c4t1d0 c4t2d0 c4t4d0 c4t5d0 c4t6d0
filer01a:/$ zfs list zpool01
NAME     USED  AVAIL  REFER  MOUNTPOINT
zpool01  123K  5.33T  42.0K  /zpool01
filer01a:/$ df -h /zpool01
Filesystem  Size  Used  Avail  Use%  Mounted on
zpool01     5.4T  42K   5.4T   1%    /zpool01
filer01a:/$ mkfile 1024G /zpool01/largefile
^C
filer01a:/$ zfs list zpool01
NAME     USED  AVAIL  REFER  MOUNTPOINT
zpool01  160G  5.17T  160G   /zpool01
filer01a:/$ ls -hl /zpool01/largefile
-rw------- 1 root root 1.0T 2010-01-22 15:02 /zpool01/largefile
filer01a:/$ rm /zpool01/largefile
filer01a:/$ sync
filer01a:/$ zfs list zpool01
NAME     USED  AVAIL  REFER  MOUNTPOINT
zpool01  160G  5.17T  160G   /zpool01
filer01a:/$ df -h /zpool01
Filesystem  Size  Used  Avail  Use%  Mounted on
zpool01     5.4T  161G  5.2T   3%    /zpool01
filer01a:/$ ls -l /zpool01
total 0
filer01a:/$ zfs list -t all zpool01
NAME     USED  AVAIL  REFER  MOUNTPOINT
zpool01  160G  5.17T  160G   /zpool01
filer01a:/$ zpool export zpool01
filer01a:/$ zpool import zpool01
filer01a:/$ zfs list zpool01
NAME     USED  AVAIL  REFER  MOUNTPOINT
zpool01  160G  5.17T  160G   /zpool01
filer01a:/$ zdb -ddd zpool01
<cut>
    Object  lvl   iblk   dblk  dsize  lsize  %full  type
<cut>
         5    5    16K   128K   160G     1T  15.64  ZFS plain file
</cut>

zpool01.zdb
Description: Binary data

Henrik
http://sparcv9.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Zpool is a bit Pessimistic at failures

2010-01-21 Thread Henrik Johansson
Hello,

Anyone else noticed that zpool is kind of negative when reporting back from 
some error conditions?

Like:
cannot import 'zpool01': I/O error
Destroy and re-create the pool from
a backup source.

or even worse:

cannot import 'rpool': pool already exists
Destroy and re-create the pool from
a backup source.

The first one I got when doing some failure testing on my new storage node: 
I pulled several disks from a raidz2 to simulate loss of connectivity, and 
lastly I pulled a third one, which as expected made the pool unusable, and later 
exported the pool. But when I reconnected one of the previous two drives and 
tried an import I got this message. The pool was fine once I reconnected the 
last disk to fail, so the message seems a bit pessimistic.

The second one I got when importing an old rpool with altroot but forgetting to 
specify a new name for the pool; the solution of just adding a new name to the 
pool was much better than recreating the pool and restoring from backup.
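For the record, the import that avoids both problems looks something like this
(the new name and altroot are of course placeholders):

# import the old root pool under an alternate root and a new pool name
zpool import -f -R /mnt/old rpool oldrpool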

I think this could scare new users, or even make them do terrible things, even 
though the errors can be fixed. I think I'll file a bug, agree?

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread Henrik Johansson
Hello,

On Jan 11, 2010, at 6:53 PM, bank kus wrote:

 For example, you could set it to half your (8GB) memory so that 4GB is
 immediately available for other uses.
 
 * Set maximum ZFS ARC size to 4GB
 
 capping max sounds like a good idea.


Are we still trying to solve the starvation problem?

I filed a bug on the non-ZFS-related urandom stall problem yesterday, primarily 
since it can do nasty things from inside a resource-capped zone:
CR 6915579 solaris-cryp/random Large read from /dev/urandom can stall system

Regards
Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Henrik Johansson
Hello again,

On Jan 10, 2010, at 5:39 AM, bank kus wrote:

 Hi Henrik
 I have 16GB Ram on my system on a lesser RAM system dd does cause problems as 
 I mentioned above. My __guess__ dd is probably sitting in some in memory 
 cache since du -sh doesnt show the full file size until I do a sync.
 
 At this point I m less looking for QA type repro questions and/or 
 speculations rather looking for  ZFS design expectations. 
 
 What is the expected behaviour, if one thread queues 100 reads  and another 
 thread comes later with 50 reads are these 50 reads __guaranteed__ to fall 
 behind the first 100 or is timeslice/fairshre done between two streams? 
 
 Btw this problem is pretty serious with 3 users using the system one of them 
 initiating a large copy grinds the other 2 to a halt. Linux doesnt have this 
 problem and this is almost a switch O/S moment for us unfortunately :-(

Have you reproduced the problem without using /dev/urandom? I can only get this 
behavior when using dd from urandom, not when copying files with cp, and not even 
files with dd. This could then be related to the random driver spending kernel 
time in high-priority threads.

So while I agree that this is not optimal, there is a huge difference in how 
bad it is: if it's urandom-generated there is no problem with copying files. 
Since you also found that it's not related to ZFS (it also happens on tmpfs, and 
perhaps only with urandom?) we are on the wrong list. Please isolate the problem: 
if we can put aside any filesystem we are on the wrong list, so I've added 
perf-discuss as well.

Regards

Henrik
http://sparcv9.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Henrik Johansson
Hello Bob,

On Jan 10, 2010, at 4:54 PM, Bob Friesenhahn wrote:

 On Sun, 10 Jan 2010, Phil Harman wrote:
 In performance terms, you'll probably find that block sizes beyond 128K add 
 little benefit. So I'd suggest something like:
 
 dd if=/dev/urandom of=largefile.txt bs=128k count=65536
 
 dd if=largefile.txt of=./test/1.txt bs=128k 
 dd if=largefile.txt of=./test/2.txt bs=128k 
 
 As an interesting aside, on my Solaris 10U8 system (plus a zfs IDR), dd 
 (Solaris or GNU) does not produce the expected file size when using 
 /dev/urandom as input:

Do you feel this is related to the filesystem? Is there any difference between 
putting the data in a file on ZFS and just throwing it away? 

$(dd if=/dev/urandom of=/dev/null bs=1048576k count=16) gives me a quite 
unresponsive system too.

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread Henrik Johansson



Henrik
http://sparcv9.blogspot.com

On 9 jan 2010, at 04.49, bank kus kus.b...@gmail.com wrote:


dd if=/dev/urandom of=largefile.txt bs=1G count=8

cp largefile.txt ./test/1.txt 
cp largefile.txt ./test/2.txt 

Thats it now the system is totally unusable after launching the two  
8G copies. Until these copies finish no other application is able to  
launch completely. Checking prstat shows them to be in the sleep  
state.


Question:
 I'm guessing this is because ZFS doesn't use CFQ and that one process  
is allowed to queue up all its I/O reads ahead of other processes?




What is CFQ, a scheduler? If you are running OpenSolaris, then you do  
not have CFQ.


 Is there a concept of priority among I/O reads? I only ask  
because if root were to launch some GUI application they don't start  
up until both copies are done. So there is no concept of priority?  
Needless to say this does not exist on Linux 2.60...

--


Probably not, but ZFS only runs in userspace on Linux with fuse so it  
will be quite different.






This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I/O Read starvation

2010-01-09 Thread Henrik Johansson
On Jan 9, 2010, at 2:02 PM, bank kus wrote:

 Probably not, but ZFS only runs in userspace on Linux
 with fuse so it  
 will be quite different.
 
 I wasn't clear in my description, I'm referring to ext4 on Linux. In fact on a 
 system with low RAM even the dd command makes the system horribly 
 unresponsive. 
 
 IMHO not having fairshare or timeslicing between different processes issuing 
 reads is frankly unacceptable given a lame user can bring the system to a 
 halt with 3 large file copies. Are there ZFS settings or Project Resource 
 Control settings one can use to limit abuse from individual processes?
 -- 

Are you sure this problem is related to ZFS? I have no problem with multiple 
threads reading and writing to my pools, it's still responsive; if I however put 
urandom with dd into the mix I get much more latency. 

Doesn't, for example, $(dd if=/dev/urandom of=/dev/null bs=1048576k count=8) give 
you the same problem, or using the file you already created from urandom 
as input to dd?

Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 10 and ZFS dedupe status

2010-01-05 Thread Henrik Johansson

On Jan 5, 2010, at 4:38 PM, Bob Friesenhahn wrote:

 On Mon, 4 Jan 2010, Tony Russell wrote:
 
 I am under the impression that dedupe is still only in OpenSolaris and that 
 support for dedupe is limited or non existent.  Is this true?  I would like 
 to use ZFS and the dedupe capability to store multiple virtual machine 
 images.  The problem is that this will be in a production environment and 
 would probably call for Solaris 10 instead of OpenSolaris.  Are my 
 statements on this valid or am I off track?
 
 If dedup gets scheduled for Solaris 10 (I don't know), it would surely not be 
 available until at least a year from now.
 
 Dedup in OpenSolaris still seems risky to use other than for experimental 
 purposes.  It has only recently become available.

I've just written an entry about update 9. I think it will contain zpool version 
19, so no dedup for this release if that's correct.

Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help on Mailing List

2009-12-31 Thread Henrik Johansson
http://mail.opensolaris.org/pipermail/zfs-discuss/

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] best way to configure raidz groups

2009-12-30 Thread Henrik Johansson
Hello,

On Dec 30, 2009, at 2:08 PM, Thomas Burgess wrote:

 I'm about to build a ZFS based NAS and i'd like some suggestions about how to 
 set up my drives.
 
 The case i'm using holds 20 hot swap drives, so i plan to use either 4 vdevs 
 with 5 drives or 5 vdevs with 4 drives each (and a hot spare inside the 
 machine)
 
 
 The motherboard i'm getting has 4 pci-x slots 2 @ 133 Mhz and 2 @ 100 Mhz
 
 I was planning on buying 3 of the famous AOC-SAT2-MV8 cards which would give 
 me more than enough sata slots.  I'll also have 6 onboard slots.
 
 I also plan on using 2 sata= compact flash adapters with 16 gb compact flash 
 cards for the os.
 
 My main question is what is the best way to lay out the vdevs? 
 
 Does it really matter how i lay them out considering i only have gigabit 
 network?  

It depends, random I/O and resilver/scrubbing should be a bit faster with 5 
vdevs but for sequential data access it should not matter over gigabit.  It all 
comes down to what you want out of the configuration, redundancy versus usable 
space and price.

raidz2 might be a better choice than raidz, especially if you have large 
disks. For most of my storage needs I would probably build a pool out of 4 
raidz2 vdevs.
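For 20 bays that would look roughly like this; the device names are placeholders
and the spare stands in for the extra disk inside the machine:

zpool create tank \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
  raidz2 c1t5d0 c1t6d0 c1t7d0 c2t0d0 c2t1d0 \
  raidz2 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
  raidz2 c2t7d0 c3t0d0 c3t1d0 c3t2d0 c3t3d0 \
  spare c3t4d0

That gives four 5-disk raidz2 vdevs, so 12 disks worth of usable space, and any
two disks per vdev can fail.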

Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Dedupe reporting incorrect savings

2009-12-15 Thread Henrik Johansson
Hello,

On Dec 15, 2009, at 8:02 AM, Giridhar K R wrote:

 Hi,
 Created a zpool with 64k recordsize and enabled dedupe on it.
 zpool create -O recordsize=64k TestPool device1
 zfs set dedup=on TestPool
 
 I copied files onto this pool over nfs from a windows client.
 
 Here is the output of zpool list
 Prompt:~# zpool list
 NAME  SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
 TestPool   696G  19.1G   677G 2%  1.13x  ONLINE  -
 
 When I ran a dir /s command on the share from a windows client cmd, I see 
 the file size as 51,193,782,290 bytes. The alloc size reported by zpool along 
 with the DEDUP of 1.13x does not addup to 51,193,782,290 bytes.
 
 According to the DEDUP (Dedupe ratio) the amount of data copied is 21.58G 
 (19.1G * 1.13) 

Are you sure this problem is related to ZFS and not a Windows, link or CIFS issue? 
Have you looked at the filesystem locally on the OpenSolaris host? Are you sure 
there are no links in the filesystem that the Windows client also counts? 

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Space not freed?

2009-12-14 Thread Henrik Johansson

Hello,

On 14 dec 2009, at 14.16, Markus Kovero markus.kov...@nebula.fi wrote:

Hi, if someone running 129 could try this out, turn off compression  
in your pool, mkfile 10g /pool/file123, see used space and then  
remove the file and see if it makes used space available again. I’m  
having trouble with this, reminds me of similar bug that occurred in 
 111-release.



I filed a bug about a year ago on a similar issue, bugid 6792701, but  
it should have been fixed in snv_118.


Regards

Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub differs in execute time?

2009-11-13 Thread Henrik Johansson

How do you do,

On 13 nov 2009, at 11.07, Orvar Korvar  
knatte_fnatte_tja...@yahoo.com wrote:


I have a raidz2 and did a scrub, it took 8h. Then I reconnected some  
drives to other SATA ports, and now it takes 15h to scrub??


Why is that?


Could you perhaps provide some more info?

Which OSOL release? Are the new disks utilized? Has the pool data  
changed? Is there a difference in how much data is read from the  
disks? Is the system otherwise idle? Which SATA controller? Does  
iostat show any errors?
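For the last question, something like this is usually enough (plain Solaris
iostat; the 5-second interval is just an example):

# per-device error counters since boot
iostat -En
# extended stats with error columns while the scrub is running
iostat -xne 5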


Regards

Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ARC cache and ls query

2009-10-30 Thread Henrik Johansson

Hello John,

On Oct 30, 2009, at 9:03 PM, John wrote:


Hi,

On an idle server, when I do a recursive '/usr/bin/ls' on a folder,  
I see a lot of disk activity. This makes sense because the results  
(metadata/data) may not have been cached.
When I do a second ls on the same folder right after the first one  
finished, I do see disk activity again.


Can someone explain why the results are not cached in ARC?


You would have disk access again unless you have set atime to  
off for that filesystem. I posted something similar a few days  
back and wrote a summary of the ARC part of my findings: http://sparcv9.blogspot.com/2009/10/curious-case-of-strange-arc.html
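Turning access-time updates off is a single property change (the dataset name is a
placeholder):

zfs set atime=off tank/home
zfs get atime tank/home

With atime=on, every directory read during the recursive ls schedules an
access-time update, which is why even a fully cached listing can still generate
disk activity.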


Here is the whole thread: 
http://opensolaris.org/jive/thread.jspa?messageID=430385

If that does not explain it you should probably provide some more  
data, how many files, some ARC statistics etc.


Regards
Henrik

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ARC cache and ls query

2009-10-30 Thread Henrik Johansson


On Oct 30, 2009, at 10:20 PM, John wrote:



thanks Henrik. This makes perfect sense. More questions.
arc_meta_limit is set to a quarter of the ARC size.
what is arc_meta_max ?
On some systems, I have arc_meta_max > arc_meta_limit.

Example:
arc_meta_used = 29427 MB
arc_meta_limit= 16125 MB
arc_meta_max  = 29427 MB

Example 2:
arc_meta_used =  5885 MB
arc_meta_limit=  5885 MB
arc_meta_max  = 17443 MB
--  


That looks very strange, the source says:

if (arc_meta_max < arc_meta_used)
        arc_meta_max = arc_meta_used;

So arc_meta_max should be the maximum value that arc_meta_used has  
ever reached.


The limit on the metadata is not enforced synchronously, but that  
seems to be quite a bit over the limit. What are these machines doing,  
are they quickly processing large numbers of files/directories? I do not  
know the exact implementation of this, but perhaps new metadata is  
added to the cache faster than it gets purged. Maybe someone else  
knows more exactly how this works?


Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sub-optimal ZFS performance

2009-10-29 Thread Henrik Johansson


On Oct 29, 2009, at 5:23 PM, Bob Friesenhahn wrote:


On Thu, 29 Oct 2009, Orvar Korvar wrote:

So the solution is to never get more than 90% full disk space, för  
fan?


Right.  While UFS created artificial limits to keep the filesystem  
from getting so full that it became sluggish and sick, ZFS does  
not seem to include those protections.  Don't ever run a ZFS pool  
for a long duration of time at very close to full since it will  
become excessively fragmented.


Setting quotas for all datasets could perhaps be of use for some of us.  
An überquota property for the whole pool would have been nice until a  
real solution is available.
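In the meantime a crude workaround is to park some space in a reservation that
nothing else can touch; the names and the 10% figure are just an example:

# keep roughly 10% of the pool permanently free
zfs create -o reservation=200G tank/headroom

Dropping the reservation later instantly gives the space back if the pool ever
needs it.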


Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sub-optimal ZFS performance

2009-10-28 Thread Henrik Johansson


On Oct 16, 2009, at 1:01 AM, Henrik Johansson wrote:



My guess would be that this is due to fragmentation during some  
time when the filesystem might have been close to full, but it is  
still pretty terrible numbers even with 0.5M files in the  
structure. And while this is very bad I would at least expect the  
ARC to cache data and make a second run go faster:


I solved this. The second run was also slow since the metadata part of  
the ARC was too small; raising arc_meta_limit helped, and turning off  
atime also helped a lot since this directory seems to be terribly  
fragmented. With these changes the ARC helps so that the second run goes  
as fast as it should. The fragmentation can be solved by a copy if I  
want to keep the files. I wrote some more details about what I  
did if anyone is interested:


http://sparcv9.blogspot.com/2009/10/curious-case-of-strange-arc.html
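For reference, the limit can be raised persistently in /etc/system (the value is
only an example; the default is a quarter of the ARC maximum):

* /etc/system
set zfs:zfs_arc_meta_limit = 0x40000000

or, at your own risk, poked on a live system with something like
echo "arc_meta_limit/Z 0x40000000" | mdb -kw.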

I'll make sure to keep some more free space in my pools at all times  
now ;)


Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] sub-optimal ZFS performance

2009-10-15 Thread Henrik Johansson

Hello,

ZFS is behaving strange on a OSOL laptop, your thoughts are welcome.

I am running OSOL on my laptop, currently b124, and I found that the  
performance of ZFS is not optimal in all situations. If I check how  
much space the package cache for pkg(1) uses, it takes a bit  
longer on this host than on a comparable machine to which I transferred  
all the data.


u...@host:/var/pkg$ time du -hs download
6.4Gdownload
real87m5.112s
user0m6.820s
sys 1m46.111s

My guess would be that this is due to fragmentation from some time  
when the filesystem might have been close to full, but these are still  
pretty terrible numbers even with 0.5M files in the structure. And  
while this is very bad I would at least expect the ARC to cache data  
and make a second run go faster:


u...@host:/var/pkg$ time du -hs download
6.4Gdownload
real94m14.688s
user0m6.708s
sys 1m27.105s

Two runs on the machine to which i have transferred the directory  
structure:


$ time du -hs download
6.4Gdownload
real2m59.60s
user0m3.83s
sys 0m18.87s

This goes a bit faster after the initial run also:

$ time du -hs download
6.4Gdownload
real0m15.40s
user0m3.40s
sys 0m11.43s

The disks are of course very busy during the first runs on both  
machines, but on the slow machine they have to do all the work again  
while the disk in the fast machine gets to rest on the second run.


Slow system (OSOL b124, T61 Intel c2d laptop root pool on 2.5 disk):
memstat pre first run:
Page Summary            Pages       MB  %Tot
-----------------  ----------  -------  ----
Kernel                 162685      635   16%
ZFS File Data           81284      317    8%
Anon                    57323      223    6%
Exec and libs            3248       12    0%
Page cache              14924       58    1%
Free (cachelist)         7881       30    1%
Free (freelist)        700315     2735   68%

Total                 1027660     4014
Physical              1027659     4014

memstat post first run:
Page Summary            Pages       MB  %Tot
-----------------  ----------  -------  ----
Kernel                 461153     1801   45%
ZFS File Data           83598      326    8%
Anon                    58389      228    6%
Exec and libs            3215       12    0%
Page cache              14958       58    1%
Free (cachelist)         6849       26    1%
Free (freelist)        399498     1560   39%

Total                 1027660     4014
Physical              1027659     4014

arcstat first run:
Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%   
arcsz c
21:02:31   27919  7114 7   3011   10   439M 
3G
21:12:31   19060 3152   28 8   9760   32   734M 
3G
21:22:31   22558 2557   25 0   9458   25   873M 
3G
21:32:31   20651 2451   24 0   2450   24   985M 
3G
21:42:31   17543 2443   24 0   2942   24 1G 
3G
21:52:31   16248 2948   29 0   5448   29 1G 
3G
22:02:31   15955 3454   34 0   9055   34 1G 
3G
22:12:31   16441 2541   24 0   6141   25 1G 
3G
22:22:31   16140 2440   24 0   6840   24 1G 
3G


arcstat second run:
Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%   
arcsz c
22:35:521K   447 24   429   2317   47   436   26 1G 
3G
22:45:52   16340 2440   24 0   7540   24 1G 
3G
22:55:52   16140 2540   24 0   8640   25 1G 
3G
23:05:52   15940 2539   25 0   7140   25 1G 
3G
23:15:52   15840 2540   25 0   8640   25 1G 
3G
23:25:52   15840 2540   25 0  10040   25 1G 
3G
23:35:52   15740 2540   25 0  10040   25 1G 
3G
23:45:52   15840 2540   25 0  10040   25 1G 
3G
23:55:52   16040 2540   25 0  10040   25 1G 
3G
00:05:52   15640 2540   25 0  10040   25 1G 
3G



Fast system (OSOL b124, AMD Athlon X2 server, tested on root pool on  
2.5 SATA disk)

Memstat pre run:
Page Summary            Pages       MB  %Tot
-----------------  ----------  -------  ----
Kernel                 160338      626    8%
ZFS File Data           44875      175    2%
Anon                    24388       95    1%
Exec and libs            1295        5    0%
Page cache               6490       25    0%
Free (cachelist)         4786       18    0%
Free (freelist)       1753978     6851   88%
Balloon                     0 

Re: [zfs-discuss] Hot Spares spin down?

2009-10-08 Thread Henrik Johansson

Hi there,

On Oct 8, 2009, at 9:46 PM, bjbm wrote:

Sorry if this is a noob question but I can't seem to find this info  
anywhere.


Are hot spares generally spun down until they are needed?


No, but have a look at power.conf(4) and the device-thresholds keyword  
to spin down disks.


Here is a bigadmin article also: 
http://www.sun.com/bigadmin/features/articles/disk_power_saving.jsp
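A hedged example of what such an entry could look like in /etc/power.conf; the
device path below is purely a placeholder, power.conf(4) wants the physical path
of the actual disk:

# spin the disk down after 30 minutes of inactivity
device-thresholds    /pci@0,0/pci1022,7458@11/pci11ab,11ab@1/disk@5,0    30m

Run pmconfig afterwards to make the power daemon reread the file.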

Regards

Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] KCA ZFS keynote available

2009-09-29 Thread Henrik Johansson

Hello everybody,

The KCA ZFS keynote by Jeff and Bill seems to be available online now: 
http://blogs.sun.com/video/entry/kernel_conference_australia_2009_jeff

It should probably be mentioned here; I might have missed it.

Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Error recovery with replicated metadata

2009-09-13 Thread Henrik Johansson

Hello all,

I have managed to get my hands on an OSOL 2009.06 root disk which has  
three failed blocks on it; these three blocks make it impossible to  
boot from the disk and to import the pool on another machine. I  
have checked the disk and three blocks are inaccessible, quite close  
to each other. Now, should this not have a good chance of being  
saved by the replicated metadata? The data on the disk is usable: I did a  
block copy of the whole disk to a new one, and a scrub works out  
flawlessly. I guess this could be a timeout issue, but the disk is at  
least a WD RE2 disk with error recovery of 7 seconds. The failing  
system's release was 111a, and I have tried to import it into 122.


The disk was used by one of my friends whom I have converted to  
using Solaris and ZFS for his company storage needs, and he is a bit  
skeptical when three blocks make the whole pool unusable. The good  
part is that he uses mirrors for his rpool even on this non-critical  
system now ;)


Anyway, can someone help explain this? Are there any timeouts that  
can be tuned to import the pool, or is this a feature? Obviously all  
data that is needed is intact on the disk, since the block copy of the  
pool worked fine.


Also, don't we need a force option for the -e option to zdb, so that we  
can use it with pools that have not been exported correctly from a  
failing machine?


The import times out after 41 seconds:

r...@arne:/usr/sbin# zpool import -f 2934589927925685355 dpool
cannot import 'rpool' as 'dpool': one or more devices is currently  
unavailable


r...@arne:/usr/sbin# zpool import
  pool: rpool
id: 2934589927925685355
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier  
and

the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

rpool   ONLINE
  c1t4d0s0  ONLINE

Damaged blocks as reported by format:
Medium error during read: block 8646022 (0x83ed86) (538/48/28)
ASC: 0x11   ASCQ: 0x0
Medium error during read: block 8650804 (0x840034) (538/124/22)
ASC: 0x11   ASCQ: 0x0
Medium error during read: block 8651987 (0x8404d3) (538/143/8)
ASC: 0x11   ASCQ: 0x0

What i managed to get out of zdb:

r...@arne:/usr/sbin# zdb -e 2934589927925685355
WARNING: pool '2934589927925685355' could not be loaded as it was last  
accessed by another system (host: keeper hostid: 0xc34967). See: http://www.sun.com/msg/ZFS-8000-EY

zdb: can't open 2934589927925685355: No such file or directory

r...@arne:/usr/sbin# zdb -l /dev/dsk/c1t4d0s0

LABEL 0

version=14
name='rpool'
state=0
txg=269696
pool_guid=2934589927925685355
hostid=12798311
hostname='keeper'
top_guid=9161928630964440615
guid=9161928630964440615
vdev_tree
type='disk'
id=0
guid=9161928630964440615
path='/dev/dsk/c7t1d0s0'
devid='id1,s...@sata_wdc_wd5000ys-01m_wd-wcanu2080316/a'
phys_path='/p...@0,0/pci8086,2...@1c,4/pci1043,8...@0/ 
d...@1,0:a'

whole_disk=0
metaslab_array=23
metaslab_shift=32
ashift=9
asize=500067467264
is_log=0

LABEL 1

version=14
name='rpool'
state=0
txg=269696
pool_guid=2934589927925685355
hostid=12798311
hostname='keeper'
top_guid=9161928630964440615
guid=9161928630964440615
vdev_tree
type='disk'
id=0
guid=9161928630964440615
path='/dev/dsk/c7t1d0s0'
devid='id1,s...@sata_wdc_wd5000ys-01m_wd-wcanu2080316/a'
phys_path='/p...@0,0/pci8086,2...@1c,4/pci1043,8...@0/ 
d...@1,0:a'

whole_disk=0
metaslab_array=23
metaslab_shift=32
ashift=9
asize=500067467264
is_log=0

LABEL 2

version=14
name='rpool'
state=0
txg=269696
pool_guid=2934589927925685355
hostid=12798311
hostname='keeper'
top_guid=9161928630964440615
guid=9161928630964440615
vdev_tree
type='disk'
id=0
guid=9161928630964440615
path='/dev/dsk/c7t1d0s0'
devid='id1,s...@sata_wdc_wd5000ys-01m_wd-wcanu2080316/a'
phys_path='/p...@0,0/pci8086,2...@1c,4/pci1043,8...@0/ 
d...@1,0:a'

whole_disk=0
metaslab_array=23
metaslab_shift=32
ashift=9
asize=500067467264
is_log=0

LABEL 3

version=14
name='rpool'
state=0
txg=269696
pool_guid=2934589927925685355
hostid=12798311
hostname='keeper'
top_guid=9161928630964440615
guid=9161928630964440615
vdev_tree

Re: [zfs-discuss] Raid-Z Issue

2009-09-12 Thread Henrik Johansson


On Sep 11, 2009, at 10:41 PM, Frank Middleton wrote:


On 09/11/09 03:20 PM, Brandon Mercer wrote:


They are so well known that simply by asking if you were using them
suggests that they suck.  :)  There are actually pretty hit or miss
issues with all 1.5TB drives but that particular manufacturer has had
a few more than others.


FWIW I have a few of them in mirrored pools and they have been
working flawlessly for several months now with LSI controllers.
The workload is bursty - mostly MDA driven code generation and
compilation of  1M KLoC applications and they work well enough
for that. Also by now probably a PetaByte of zfs send/recvs and
many scrubs, never a timeout and never a checksum error. They
are all rev CC1H. So your mileage may vary, as they say...


I've also been running three of them with SD17 in a raidz for about a  
year without any problems at all.


Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] b122 and fake checksum errors

2009-09-10 Thread Henrik Johansson

Hello Brian,

On Sep 10, 2009, at 9:21 PM, Brian Hechinger wrote:

I've hit google and it looks like this is still an issue in b122.   
Does this
look like it will be fixed any time soon?  If so, what build will it  
be fixed

in and is there an ETA for the build to be released?



Adam has integrated the fix, so if everything goes as planned it will be  
part of snv_124, which is probably about a month away. I'm running  
with the fix and so far it looks good.


http://hg.genunix.org/onnv-gate.hg/rev/c383b4d6980f

Regards

Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] This is the scrub that never ends...

2009-09-07 Thread Henrik Johansson

Hello Will,

On Sep 7, 2009, at 3:42 PM, Will Murnane wrote:



What can cause this kind of behavior, and how can I make my pool
finish scrubbing?



No idea what is causing this, but did you try to stop the scrub? If so,  
what happened? (It might not be a good idea since this is not a normal  
state.) What release of OpenSolaris are you running?
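If you do want to try it, stopping a scrub is its own subcommand (the pool name is
a placeholder):

# stop the running scrub; it can be restarted later with a plain 'zpool scrub'
zpool scrub -s pool01
zpool status -v pool01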


Maybe this could be of interest, but it is a duplicate and it should  
have been fixed in snv_110: running zpool scrub twice hangs the scrub


Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 - snv_121 produces checksum errors on Raid-Z pool

2009-09-02 Thread Henrik Johansson

Hi Adam,


On Sep 2, 2009, at 1:54 AM, Adam Leventhal wrote:


Hi James,

After investigating this problem a bit I'd suggest avoiding  
deploying RAID-Z
until this issue is resolved. I anticipate having it fixed in build  
124.


For those of us who have already upgraded and written data to our  
raidz pools, are there any risks of inconsistency or wrong checksums in  
the pool? Is there a bug id?


Regards

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 - snv_121 produces checksum errors on Raid-Z pool

2009-09-02 Thread Henrik Johansson

Hello all,

I have backed down to snv_117; when scrubbing this pool I got my first  
checksum errors ever on any build except snv_121. I wonder if this is  
a coincidence or if bad checksums have been generated by snv_121.


So I had been running for 10 months without any checksum errors, I  
installed snv_121 and got plenty of them, and now I also get them after  
backing down to snv_117. I will check my hardware after the scrub is  
completed.


Someone asked what hardware we were using; I have an Asus M3N78-VM  
(nforce 8200) with ECC-protected memory (and I think HT uses CRC?), and the  
pool is a 3-disk raidz.


Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Expanding a raidz pool?

2009-09-02 Thread Henrik Johansson

On Sep 2, 2009, at 7:14 PM, rarok wrote:

 I'm just a casual ZFS user but you want something that doesn't  
 exist today. Most of the consumers want this but Sun is not  
 interested in that market. To grow an existing RAIDZ just by adding more  
 disks to the RAIDZ would be great but at this moment there isn't  
 anything like that.


I would change customers to users; many people who use ZFS for  
their home server would like this, but they are often not customers.


Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 - snv_121 produces checksum errors on Raid-Z pool

2009-08-25 Thread Henrik Johansson

Hello,

On 25 aug 2009, at 14.29, Gary Gendel g...@genashor.com wrote:

I have a 5-500GB disk Raid-Z pool that has been producing checksum  
errors right after upgrading SXCE to build 121.  They seem to be  
randomly occurring on all 5 disks, so it doesn't look like a disk  
failure situation.


Repeatedly running a scrub on the pools randomly repairs between 20  
and a few hundred checksum errors.


Since I hadn't physically touched the machine, it seems a very  
strong coincidence that it started right after I upgraded to 121.


I had my first checksum errors in almost  a year yesterday after  
upgrading to snv_121 on my filer. I blamed an esata device that was  
not part of the pool. I will do some testing tonight and see if I  
still get errors.


The machine that got the errors has a Asus M3N78-VM MB (GF8200).

Henrik
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Shrinking a zpool?

2009-08-06 Thread Henrik Johansson


On 6 aug 2009, at 23.52, Bob Friesenhahn  
bfrie...@simple.dallas.tx.us wrote:
I still have not seen any formal announcement from Sun regarding  
deduplication.  Everything has been based on remarks from code  
developers.




To be fair, the official what's new document for 2009.06 states that  
dedup will be part of the next OSOL release in 2010, or at least that  
we should look out for it ;)

"We're already looking forward to the next release due in 2010. Look  
out for great new features like an interactive installation for SPARC,  
the ability to install packages directly from the repository during  
the install, offline IPS support, a new version of the GNOME desktop,  
ZFS deduplication and user quotas, cloud integration and plenty more!  
As always, you can follow active development by adding the dev/  
repository."



Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] No files but pool is full?

2009-07-24 Thread Henrik Johansson

On 24 jul 2009, at 09.33, Markus Kovero markus.kov...@nebula.fi wrote:

During our tests we noticed very disturbing behavior; what could be
causing this?


The system is running the latest stable OpenSolaris.

Are there any other means to remove the ghost files, other than
destroying the pool and restoring from backups?



This looks like a bug I filed a while ago, CR 6792701: removing large
holey files does not free space.


The only solution I found to clean the pool when isolating the bug was
to recreate it. The fix was integrated in a build after OSOL 2009.06.


Running mkfile with a certain size will trigger this.
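
A minimal sketch of how one might check for the problem on a scratch
pool; the pool name and file size below are placeholders, and whether
it reproduces at all depends on the build:

# Only on a scratch pool: compare usage before and after removing a large mkfile-created file
zfs list -o name,used,avail scratch
mkfile 64g /scratch/testfile
rm /scratch/testfile
sync
zfs list -o name,used,avail scratch   # on affected builds, 'used' never drops back down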

Henrik
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] prstat -Z and load average values in different zones give same numeric results

2009-04-23 Thread Henrik Johansson

Hello Nobel,

On Apr 23, 2009, at 1:53 AM, Nobel Shelby wrote:


Folks,
A perplexing question about the load average display with prstat -Z on
Solaris 10 OS U4 (08/07).
We have 4 zones with very different processes and workloads.
The prstat -Z command issued within each of the zones correctly
displays the number of processes and LWPs, but the load average values
look exactly the same on all non-global zones. I mean all three values
(the 1, 5 and 15 minute load averages) are the same, which is all but
impossible given the different workloads.
Is there a bug here?
Thanks,


No, this is correct: unless you have defined resource pools with
separate processor sets on the system, all zones share the same CPU
resources and thus all report the same load average.


If you bind a zone to a pool you will see the load average of that
pool from inside the zone; it can also be observed from the global
zone with poolstat(1M).
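
A minimal sketch of such a setup; the pool, pset and zone names are
placeholders, and the exact syntax should be checked against
poolcfg(1M) and zonecfg(1M) on your release:

# Enable the pools facility and save the current configuration
pooladm -e
pooladm -s
# Create a dedicated two-CPU processor set and a pool associated with it
poolcfg -c 'create pset web-pset (uint pset.min = 2; uint pset.max = 2)'
poolcfg -c 'create pool web-pool'
poolcfg -c 'associate pool web-pool (pset web-pset)'
pooladm -c                              # activate the configuration

# Bind the zone to the new pool (takes effect when the zone is rebooted)
zonecfg -z webzone 'set pool=web-pool'

# Observe per-pool utilization from the global zone
poolstat 5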


Hope this helps.

Regards

Henrik Johansson
http://sparcv9.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs-crypto on OpenSolaris 2009.06?

2009-03-05 Thread Henrik Johansson
So onnv_111 is no longer the target for crypto integration, since that
build is supposed to be included in OSOL 2009.06?


Regards
Henrik

On 5 mar 2009, at 11.06, Darren J Moffat darr...@opensolaris.org  
wrote:



Luca Morettoni wrote:
A lot of people ask me about the crypto layer for ZFS and its future
integration in OpenSolaris (I read it was around snv_111); might it be
ready for the next stable release (2009.06)?


See:

http://opensolaris.org/os/project/zfs-crypto/

No it won't be in 2009.06.  To be in 2009.06 it would have to be  
finished by now and it is not.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Two zvol devices one volume?

2009-02-16 Thread Henrik Johansson

Thanks for the info, Dave; I filed a bug on this: 6805659.

Regards
Henrik
On Feb 13, 2009, at 1:30 AM, Dave wrote:




Henrik Johansson wrote:
I also tried to export the zpool, and I got this; the strange part
is that it sometimes still thinks that the ubuntu-01-dsk01 dataset
exists:

# zpool export zpool01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist
cannot unmount '/zpool01/dump': Device busy
But:
# zfs destroy zpool01/xvm/dsk/ubuntu-01-dsk01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist
Regards


I have seen this 'phantom dataset' with a pool on nv93. I created a  
zpool, created a dataset, then destroyed the zpool. When creating a  
new zpool on the same partitions/disks as the destroyed zpool, upon  
export I receive the same message as you describe above, even though  
I never created the dataset in the new pool.


Creating a dataset of the same name and then destroying it doesn't  
seem to get rid of it, either.


I never did remember to file a bug for it...


Henrik Johansson
http://sparcv9.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on SAN?

2009-02-16 Thread Henrik Johansson

Hi all,

OK, this might stir some things up again, but I would like to make
this clearer.


I have been reading this and other threads regarding ZFS on SAN
storage and how well ZFS can recover from a serious error, such as a
cached disk array going down or the connection to the SAN being lost.
What I am hearing (Miles, ZFS-8000-72) is that sometimes you can end
up in an unrecoverable state that forces you to restore the whole
pool. I have been operating quite large deployments of SVM/UFS and
VxVM/VxFS for some years, and while you are sometimes forced to run a
filesystem check and some files might end up in lost+found, I have
never lost a whole filesystem. This is despite whole arrays crashing,
split-brain scenarios, etc. In the previous discussion a lot of
fingers were pointed at hardware and USB connections, but then some
people in this thread mentioned losing pools located on a SAN.


We are currently evaluating whether we should begin to deploy ZFS on
our SAN. I can see great opportunities with ZFS, but if we have a
higher risk of losing entire pools, that is a serious issue. I am
aware that the other filesystems might not be in a correct state after
a serious failure, but as stated before, that can be much better than
restoring a multi-terabyte filesystem from yesterday's backup.


So, what is the opinion: is this an existing problem even when using
enterprise arrays? If I understand this correctly, there should be no
risk of losing an entire pool as long as DKIOCFLUSHWRITECACHE is
honored by the array?


If it is a problem, will the worst-case scenario be at least on par
with UFS/VxFS once 6667683 is fixed?


Grateful for any additional information.

Regards

Henrik Johansson
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Two zvol devices one volume?

2009-02-12 Thread Henrik Johansson

Hi,

Can anyone explain the following to me?

Two zvol devices point at the same data. I was installing OSOL
2008.11 in xVM when I saw that there was already a partition on the
installation disk. An old dataset that I deleted, since I gave it a
slightly different name than I intended, has not been removed under
/dev. I should not have used that name, but two device links should
perhaps not point to the same device either.


zfs list |grep xvm/dsk
zpool01/xvm/dsk 25.0G  2.63T  24.0K  /zpool01/xvm/dsk
zpool01/xvm/dsk/osol01-dsk01  10G  2.64T  2.53G  -
zpool01/xvm/dsk/ubuntu01-dsk01    10G  2.64T  21.3K  -

# ls -l /dev/zvol/dsk/zpool01/xvm/dsk
total 3
lrwxrwxrwx   1 root root  41 Feb 10 18:19 osol01-dsk01 -> ../../../../../../devices/pseudo/z...@0:4c
lrwxrwxrwx   1 root root  41 Feb 10 18:14 ubuntu-01-dsk01 -> ../../../../../../devices/pseudo/z...@0:4c
lrwxrwxrwx   1 root root  41 Feb 10 18:19 ubuntu01-dsk01 -> ../../../../../../devices/pseudo/z...@0:5c


# zpool history |grep xvm
2009-02-08.22:42:12 zfs create zpool01/xvm
2009-02-08.22:42:23 zfs create zpool01/xvm/media
2009-02-08.22:42:45 zfs create zpool01/xvm/dsk
2009-02-10.18:14:41 zfs create -V 10G zpool01/xvm/dsk/ubuntu-01-dsk01
2009-02-10.18:15:10 zfs destroy zpool01/xvm/dsk/ubuntu-01-dsk01
2009-02-10.18:15:21 zfs create -V 10G zpool01/xvm/dsk/ubuntu01-dsk01
2009-02-10.18:15:33 zfs create -V 10G zpool01/xvm/dsk/osol01-dsk01

# uname -a
SunOS ollespappa 5.11 snv_107 i86pc i386 i86xpv

While I am writing: are there any known issues with sharemgr and ZFS
in this release? svc:/network/shares/group:zfs hangs when going down,
since 'sharemgr stop zfs' never returns...


Thanks

Henrik Johansson
http://sparcv9.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Two zvol devices one volume?

2009-02-12 Thread Henrik Johansson
I also tried to export the zpool, and I got this; the strange part is
that it sometimes still thinks that the ubuntu-01-dsk01 dataset exists:


# zpool export zpool01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist
cannot unmount '/zpool01/dump': Device busy

But:
# zfs destroy zpool01/xvm/dsk/ubuntu-01-dsk01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist
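
In case anyone wants to poke at the stale /dev entries themselves,
something like this might be worth trying. devfsadm -C is the
documented way to clean up dangling /dev links, but whether it helps
with these zvol entries is an assumption on my part:

# Compare what ZFS knows about with what is linked under /dev
zfs list -t volume
ls -l /dev/zvol/dsk/zpool01/xvm/dsk
# Ask devfsadm to remove dangling links for devices that no longer exist
devfsadm -Cv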

Regards


On Feb 12, 2009, at 11:51 PM, Henrik Johansson wrote:


Hi,

Can anyone explain the following to me?

Two zvol devices point at the same data. I was installing OSOL
2008.11 in xVM when I saw that there was already a partition on the
installation disk. An old dataset that I deleted, since I gave it a
slightly different name than I intended, has not been removed under
/dev. I should not have used that name, but two device links should
perhaps not point to the same device either.


zfs list |grep xvm/dsk
zpool01/xvm/dsk 25.0G  2.63T  24.0K  /zpool01/xvm/dsk
zpool01/xvm/dsk/osol01-dsk01  10G  2.64T  2.53G  -
zpool01/xvm/dsk/ubuntu01-dsk01    10G  2.64T  21.3K  -

# ls -l /dev/zvol/dsk/zpool01/xvm/dsk
total 3
lrwxrwxrwx   1 root root  41 Feb 10 18:19 osol01-dsk01 -> ../../../../../../devices/pseudo/z...@0:4c
lrwxrwxrwx   1 root root  41 Feb 10 18:14 ubuntu-01-dsk01 -> ../../../../../../devices/pseudo/z...@0:4c
lrwxrwxrwx   1 root root  41 Feb 10 18:19 ubuntu01-dsk01 -> ../../../../../../devices/pseudo/z...@0:5c


# zpool history |grep xvm
2009-02-08.22:42:12 zfs create zpool01/xvm
2009-02-08.22:42:23 zfs create zpool01/xvm/media
2009-02-08.22:42:45 zfs create zpool01/xvm/dsk
2009-02-10.18:14:41 zfs create -V 10G zpool01/xvm/dsk/ubuntu-01-dsk01
2009-02-10.18:15:10 zfs destroy zpool01/xvm/dsk/ubuntu-01-dsk01
2009-02-10.18:15:21 zfs create -V 10G zpool01/xvm/dsk/ubuntu01-dsk01
2009-02-10.18:15:33 zfs create -V 10G zpool01/xvm/dsk/osol01-dsk01

# uname -a
SunOS ollespappa 5.11 snv_107 i86pc i386 i86xpv

While I am writing: are there any known issues with sharemgr and ZFS
in this release? svc:/network/shares/group:zfs hangs when going down,
since 'sharemgr stop zfs' never returns...


Thanks

Henrik Johansson
http://sparcv9.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Henrik Johansson
http://sparcv9.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best SXCE version for ZFS Home Server

2008-11-16 Thread Henrik Johansson

On Nov 16, 2008, at 11:23 AM, Vincent Boisard wrote:

 I just found this: http://www.sun.com/software/solaris/whats_new.jsp

 It lists Solaris 10 features and is a first hint at what features
 are in.

 Another question: my motherboard has a JMicron (JMB363, I think) SATA
 controller. I know support is now included in SXCE, but I don't know
 about S10U6.

 Is there a changelog for S10U6 somewhere, like there is for SXCE?

Have a look at the bugids in the patches for S10U6, like the kernel  
patch 137137-09. There are lists of all the new patches in the  
documentation for the release at docs.sun.com.

Henrik Johansson
http://sparcv9.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lost space in empty pool (no snapshots)

2008-11-15 Thread Henrik Johansson
:64652d400:400 DVA[1]=0:c80032dc00:400 fletcher4  
lzjb LE con
tiguous birth=28 fill=5  
cksum=ae2a07052:491babbaac1:f96aca21cd61:2405acfae00513

    Object  lvl   iblk   dblk  lsize  asize  type
         0    7    16K    16K    16K  20.0K  DMU dnode

    Object  lvl   iblk   dblk  lsize  asize  type
         1    1    16K    512    512  1.50K  ZFS master node
microzap: 512 bytes, 3 entries

ROOT = 3
DELETE_QUEUE = 2
VERSION = 3

    Object  lvl   iblk   dblk  lsize  asize  type
         2    1    16K    512    512  1.50K  ZFS delete queue
microzap: 512 bytes, 1 entries

5 = 5

    Object  lvl   iblk   dblk  lsize  asize  type
         3    1    16K    512    512  1.50K  ZFS directory
                                 264   bonus  ZFS znode
        path    /
        uid     0
        gid     0
        atime   Sun Nov 16 01:11:53 2008
        mtime   Sun Nov 16 01:14:18 2008
        ctime   Sun Nov 16 01:14:18 2008
        crtime  Sun Nov 16 01:11:53 2008
        gen     4
        mode    40755
        size    2
        parent  3
        links   2
        xattr   0
        rdev    0x
        microzap: 512 bytes, 0 entries


    Object  lvl   iblk   dblk  lsize  asize  type
         5    5    16K   128K   750G  12.2G  ZFS plain file
                                 264   bonus  ZFS znode
        path    ???object#5
        uid     0
        gid     0
        atime   Sun Nov 16 01:13:12 2008
        mtime   Sun Nov 16 01:14:07 2008
        ctime   Sun Nov 16 01:14:07 2008
        crtime  Sun Nov 16 01:13:12 2008
        gen     16
        mode    100600
        size    805306368000
        parent  3
        links   0
        xattr   0
        rdev    0x

Henrik Johansson
http://sparcv9.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenStorage GUI

2008-11-13 Thread Henrik Johansson


On 13 nov 2008, at 15.15, Darren J Moffat [EMAIL PROTECTED]  
wrote:


 I believe the issue is that VirtualBox doesn't understand the
 multi-file format VMDK files that are used for the boot disk
 (Sun Storage VMware*.vmdk).  I believe from googling that this could
 be fixed, if you have access to VMware Server, by combining them back
 into a single vmdk file - I don't have easy access to VMware Server
 so I can't try this.

 --

It's even possible to transfer the image to bare metal; the same
procedure should be usable to move the image to other virtualization
software. Once booted, it's only ordinary block devices...
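
If someone does have the VMware tooling around, the combining step
mentioned above might look roughly like this; the tool name, flags and
file names are from memory rather than tested here, so check the
tool's usage output first (the vmdk names below are just placeholders):

# Untested sketch: convert a split/multi-file VMDK into a single growable file
vmware-vdiskmanager -r "Sun Storage VMware.vmdk" -t 0 combined.vmdk
# Then point VirtualBox (or another hypervisor) at combined.vmdk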

Regards
Henrik Johansson
http://sparcv9.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lost space in empty pool (no snapshots)

2008-11-10 Thread Henrik Johansson

On Nov 11, 2008, at 1:56 AM, Victor Latushkin wrote:

 Henrik Johansson wrote:
 Hello,
 I have a snv_101 machine with a three-disk raidz pool which has an
 allocation of about 1 TB for no obvious reason: no snapshots, no
 files, nothing. I tried to run zdb on the pool to see if I got any
 useful info, but it has been working for over two hours without any
 more output.
 I know when the allocation occurred: I issued a 'mkfile 1024G'
 command in the background, but changed my mind and killed the
 process; after that the 912G was missing (I don't remember if I
 actually removed the test file or what happened). If I copy a file
 to the /tank filesystem it uses even more space, but that space is
 reclaimed after I remove the file.
 I could recreate the pool, it is empty, but I created it to test the
 system in the first place, so I would like to know what's going on.
 I have tried to export and import the pool, but it stays the same.
 Any ideas?

 You can try to increase zdb verbosity by adding some -v switches.
 Also try dumping all the objects with 'zdb -dd tank' (add even more
 'd' for extra verbosity).

Ah, that did provide some more output; I can see the reserved space is
indeed meant for the file I created earlier:

Dataset tank [ZPL], ID 16, cr_txg 1, 912G, 5 objects

     Object  lvl   iblk   dblk  lsize  asize  type
          0    7    16K    16K    16K  20.0K  DMU dnode
          1    1    16K    512    512  1.50K  ZFS master node
          2    1    16K    512    512  1.50K  ZFS delete queue
          3    1    16K    512    512  1.50K  ZFS directory
          6    5    16K   128K     1T   912G  ZFS plain file

cut

     Object  lvl   iblk   dblk  lsize  asize  type
          6    5    16K   128K     1T   912G  ZFS plain file
                                  264   bonus  ZFS znode
        path    ???object#6
uid 0
gid 0
atime   Sun Nov  9 20:12:30 2008
mtime   Sun Nov  9 21:50:10 2008
ctime   Sun Nov  9 21:50:10 2008
crtime  Sun Nov  9 20:12:30 2008
gen 69
 mode   100600
 size   1099511627776
 parent 3
 links  0
 xattr  0
 rdev   0x

[deferred free] [L0 SPA space map] 1000L/200P  
DVA[0]=0:70ce8a5400:400 DVA[1]=
0:1f800421c00:400 DVA[2]=0:36800067c00:400 fletcher4 lzjb LE  
contiguous birth
=2259 fill=0 cksum=0:0:0:0
[deferred free] [L0 SPA space map] 1000L/400P  
DVA[0]=0:70ce8a6000:800 DVA[1]=
0:1f800422800:800 DVA[2]=0:36800061800:800 fletcher4 lzjb LE  
contiguous birth
=2259 fill=0 cksum=0:0:0:0
[deferred free] [L0 SPA space map] 1000L/200P  
DVA[0]=0:70ce8a8c00:400 DVA[1]=
0:1f800423c00:400 DVA[2]=0:36800069c00:400 fletcher4 lzjb LE  
contiguous birth
=2259 fill=0 cksum=0:0:0:0
[deferred free] [L0 DMU dnode] 4000L/800P DVA[0]=0:70ce8a8000:c00  
DVA[1]=0:1f
800423000:c00 DVA[2]=0:36800069000:c00 fletcher4 lzjb LE contiguous  
birth=225
9 fill=0 cksum=0:0:0:0
[deferred free] [L0 DMU dnode] 4000L/a00P DVA[0]=0:70ce8a7000:1000  
DVA[1]=0:1
f800295000:1000 DVA[2]=0:36800068000:1000 fletcher4 lzjb LE  
contiguous birth=
2259 fill=0 cksum=0:0:0:0
objset 0 object 0 offset 0x0 [L0 DMU objset] 400L/200P  
DVA[0]=0:70ce8ad800:400
  DVA[1]=0:1f800429000:400 DVA[2]=0:3680006e800:400 fletcher4 lzjb  
LE contigu
ous birth=2260 fill=74 cksum=1309351a7b:687cd8ec06d: 
12b694ebbc4e8:253a3515eb9248
objset 0 object 0 offset 0x0 [L0 DMU dnode] 4000L/c00P  
DVA[0]=0:70ce8ac400:1400
  DVA[1]=0:1f800427c00:1400 DVA[2]=0:3680006d400:1400 fletcher4  
lzjb LE cont
iguous birth=2260 fill=27 cksum=bbcf0aa9db:13ea5e4dc8e7d: 
1425e68263d46ff:f14c2da
e18c61e93

cut

objset 16 object 6 offset 0x12f73c [L0 ZFS plain file] 2L/ 
2P DVA[0]=0:c749c:3 fletcher2 uncompressed LE contiguous  
birth=164 fill=1 cksum=0:0:0:0
objset 16 object 6 offset 0x12f73e [L0 ZFS plain file] 2L/ 
2P DVA[0]=0:c749f:3 fletcher2 uncompressed LE contiguous  
birth=164 fill=1 cksum=0:0:0:0
objset 16 object 6 offset 0x12f740 [L0 ZFS plain file] 2L/ 
2P DVA[0]=0:c74a2:3 fletcher2 uncompressed LE contiguous  
birth=164 fill=1 cksum=0:0:0:0
objset 16 object 6 offset 0x12f742 [L0 ZFS plain file] 2L/ 
2P DVA[0]=0:c74a5:3 fletcher2 uncompressed LE contiguous  
birth=164 fill=1 cksum=0:0:0:0
objset 16 object 6 offset 0x12f744 [L0 ZFS plain file] 2L/ 
2P DVA[0]=0:c74a8:3 fletcher2 uncompressed LE contiguous  
birth=164 fill=1 cksum=0:0:0:0
objset 16 object 6 offset 0x12f746 [L0 ZFS plain file] 2L/ 
2P DVA[0]=0:c74ab:3 fletcher2 uncompressed LE contiguous  
birth=164 fill=1 cksum=0:0:0:0
continue for more than 100MB of output

But why has this happened; is it a known issue?
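
For the record, the commands I found most useful when chasing this
kind of leaked space were roughly the following; the pool name is a
placeholder and the amount of -d verbosity needed seems to vary
between builds:

# Compare dataset-level and pool-level accounting
zfs list -o name,used,referenced tank
zpool list tank
# Dump all objects; the leaked file shows up with links = 0 and no path
zdb -dddd tank > /var/tmp/zdb-tank.out      # this can produce a lot of output
grep -i "delete queue" /var/tmp/zdb-tank.out
grep -c "ZFS plain file" /var/tmp/zdb-tank.out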

Regards

Henrik Johansson
http://sparcv9.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo

[zfs-discuss] Fully supported 12-port SATA cards?

2008-11-09 Thread Henrik Johansson
I am helping a friend who is building a storage server for his
company, and I have advocated for Solaris and ZFS.

But now we are having trouble finding any good SATA controller for
12+ disks. There are Areca cards, but they seem very flaky (24-port
support, but they hang with more than 12 disks in JBOD mode, hang on
disk failures, etc.).

The AOC-SAT2-MV8 and AOC-USAS-L8i have been mentioned, but they had
problems with device numbering and/or, as I understood it, no support
for hot plugging.

Isn't there any supported PCI.* SATA card with at least 12 ports that
works and has a real driver supporting hot-plugging/cfgadm operations?

The HCL does not tell you everything, and scanning the lists did not
show any consensus regarding this. This seems to have been a problem
for years; I tried to find a card for myself a while back.

Henrik Johansson
http://sparcv9.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Disks errors not shown by zpool?

2008-07-08 Thread Henrik Johansson
OK, this is not an OpenSolaris question, but it is a Solaris and ZFS
question.

I have a pool with three mirrored vdevs. I just got an error message
from FMD that a read failed on one of the disks (c1t6d0), complete
with instructions on how to handle the problem and replace the device;
so far everything is good. But the zpool still thinks everything is
fine. Shouldn't zpool also show errors in this state?

This was run on S10U4 with 127127-11.

# zpool status -x
all pools are healthy

# zpool status
   pool: storage
  state: ONLINE
  scrub: scrub completed with 0 errors on Sun Jun 29 23:16:34 2008
config:

 NAMESTATE READ WRITE CKSUM
 storage ONLINE   0 0 0
   mirrorONLINE   0 0 0
 c1t1d0  ONLINE   0 0 0
 c1t2d0  ONLINE   0 0 0
   mirrorONLINE   0 0 0
 c1t3d0  ONLINE   0 0 0
 c1t4d0  ONLINE   0 0 0
   mirrorONLINE   0 0 0
 c1t5d0  ONLINE   0 0 0
 c1t6d0  ONLINE   0 0 0

errors: No known data errors

# fmdump -v
TIME UUID SUNW-MSG-ID
Jul 08 20:14:42.6951 3780a675-96ea-6fa4-bd55-cb078a539f08 ZFS-8000-D3
   100%  fault.fs.zfs.device

 Problem in: zfs://pool=storage/vdev=83de319aad25c131
Affects: zfs://pool=storage/vdev=83de319aad25c131
FRU: -
   Location: -
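
For completeness, a few commands that might help correlate the FMA
diagnosis with what ZFS itself has counted; a sketch only, using the
same pool as above:

# Currently diagnosed faults, raw error telemetry, and ZFS's own per-vdev counters
fmadm faulty
fmdump -eV | tail -50
zpool status -v storage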

 From my message log:
Jul  8 20:11:53 fortress scsi: [ID 107833 kern.warning] WARNING: / 
[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0 (sd19):
Jul  8 20:11:53 fortressSCSI transport failed: reason  
'incomplete': retrying command
Jul  8 20:12:56 fortress scsi: [ID 365881 kern.info] fas:   6.0:  
cdb=[ 0xa 0x0 0x1 0xda 0x2 0x0 ]
Jul  8 20:12:56 fortress scsi: [ID 365881 kern.info] fas:   6.0:  
cdb=[ 0xa 0x0 0x3 0xda 0x2 0x0 ]
Jul  8 20:12:56 fortress scsi: [ID 107833 kern.warning] WARNING: / 
[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880 (fas1):
Jul  8 20:12:56 fortressDisconnected tagged cmd(s) (2) timeout  
for Target 6.0
Jul  8 20:12:56 fortress scsi: [ID 107833 kern.warning] WARNING: / 
[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0 (sd19):
Jul  8 20:12:56 fortressSCSI transport failed: reason  
'timeout': retrying command
Jul  8 20:12:56 fortress scsi: [ID 107833 kern.warning] WARNING: / 
[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0 (sd19):
Jul  8 20:12:56 fortressSCSI transport failed: reason 'reset':  
retrying command
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.warning] WARNING: / 
[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0 (sd19):
Jul  8 20:12:59 fortressError for Command:  
write(10)   Error Level: Retryable
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.notice]  Requested  
Block: 17672154  Error Block: 17672154
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.notice]  Vendor:  
SEAGATESerial Number: 9946626576
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.notice]  Sense Key:  
Unit Attention
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.notice]  ASC: 0x29  
(power on occurred), ASCQ: 0x1, FRU: 0x1
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.warning] WARNING: / 
[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0 (sd19):
Jul  8 20:12:59 fortressError for Command:  
write(10)   Error Level: Retryable
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.notice]  Requested  
Block: 17672154  Error Block: 17672154
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.notice]  Vendor:  
SEAGATESerial Number: 9946626576
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.notice]  Sense Key: Not  
Ready
Jul  8 20:12:59 fortress scsi: [ID 107833 kern.notice]  ASC: 0x4 (LUN  
is becoming ready), ASCQ: 0x1, FRU: 0x2
Jul  8 20:13:04 fortress scsi: [ID 107833 kern.warning] WARNING: / 
[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0 (sd19):
Jul  8 20:13:04 fortressError for Command:  
write(10)   Error Level: Retryable
Jul  8 20:13:04 fortress scsi: [ID 107833 kern.notice]  Requested  
Block: 17672154  Error Block: 17672154
Jul  8 20:13:04 fortress scsi: [ID 107833 kern.notice]  Vendor:  
SEAGATESerial Number: 9946626576
Jul  8 20:13:04 fortress scsi: [ID 107833 kern.notice]  Sense Key: Not  
Ready
Jul  8 20:13:04 fortress scsi: [ID 107833 kern.notice]  ASC: 0x4 (LUN  
is becoming ready), ASCQ: 0x1, FRU: 0x2
Jul  8 20:13:09 fortress scsi: [ID 107833 kern.warning] WARNING: / 
[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0 (sd19):
Jul  8 20:13:09 fortressError for Command:  
write(10)   Error Level: 

Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-04 Thread Henrik Johansson
On Jun 5, 2008, at 12:05 AM, Rich Teer wrote:

 On Wed, 4 Jun 2008, Henrik Johansson wrote:

 Does anyone know what the deal with /export/home is? I thought /home
 was the default home directory in Solaris?

 Nope, /export/home has always been the *physical* location for
 users' home directories.  They're usually automounted under /home,
 though.

You are right; it was my own old habit of creating a physical
directory under /home for stand-alone machines that got me confused.
But filesystem(5) says: "/home Default root of a subtree for user
directories.", and useradd's base_dir defaults to /home, where it
tries to create a directory if used with the -m flag.

I know this doesn't work when the automounter is running, but it can
be disabled or reconfigured. When I think about it, /export/home has
been created in earlier releases as well, with UFS.
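
For reference, the standard setup looks roughly like this; the
wildcard entry in auto_home is a common addition rather than
necessarily the shipped default, and 'myhost' is a placeholder
hostname:

# /etc/auto_master: hand /home over to the automounter
/home   auto_home   -nobrowse

# /etc/auto_home: map each user name to a physical directory under /export/home
*       myhost:/export/home/&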

It's fun how old things can get one confused in a new context ;)

Regards
Henrik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss