Re: [zfs-discuss] Disable ZIL - persistent
On 05 August, 2011 - Darren J Moffat sent me these 0,9K bytes:

> On 08/05/11 13:11, Edward Ned Harvey wrote:
>> After a certain rev, I know you can set the sync property, and it takes
>> effect immediately, and it's persistent across reboots. But that doesn't
>> apply to Solaris 10. My question: Is there any way to make "Disabled ZIL"
>> a normal mode of operations in Solaris 10? Particularly: If I do this
>>
>>   echo zil_disable/W0t1 | mdb -kw
>>
>> then I have to remount the filesystem. It's kind of difficult to do this
>> automatically at boot time, and impossible (as far as I know) for rpool.
>> The only solution I see is to write some startup script which applies it
>> to filesystems other than rpool. Which feels kludgy. Is there a better way?

echo "set zfs:zil_disable = 1" >> /etc/system

Or use the mdb poke above if you don't want to zap /etc/system..

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
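For reference, the persistent route suggested above, spelled out as an /etc/system fragment (the comment lines are mine; a reboot is needed, after which it covers everything, rpool included):

```
* /etc/system fragment (Solaris 10): disable the ZIL globally at next boot.
* Unlike the runtime mdb poke, no per-filesystem remount is needed.
set zfs:zil_disable = 1
```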
Re: [zfs-discuss] SSD vs hybrid drive - any advice?
On 25 July, 2011 - Erik Trimble sent me these 2,0K bytes:

> On 7/25/2011 3:32 AM, Orvar Korvar wrote:
>> How long have you been using a SSD? Do you see any performance
>> decrease? I mean, ZFS does not support TRIM, so I wonder about long
>> term effects...
>
> Frankly, for the kind of use that ZFS puts on a SSD, TRIM makes no
> impact whatsoever. TRIM is primarily useful for low-volume changes -
> that is, for a filesystem that generally has few deletes over time
> (i.e. rate of change is low). Using a SSD as a ZIL or L2ARC device
> puts a very high write load on the device (even as an L2ARC, there is
> a considerably higher write load than a typical filesystem use). SSDs
> in such a configuration can't really make use of TRIM, and depend on
> the internal SSD controller block re-allocation algorithms to improve
> block layout.
>
> Now, if you're using the SSD as primary media (i.e. in place of a Hard
> Drive), there is a possibility that TRIM could help. I honestly can't
> be sure that it would help, however, as ZFS's Copy-on-Write nature
> means that it tends to write entire pages of blocks, rather than just
> small blocks. Which is fine from the SSD's standpoint.

You still need the flash erase cycle.

> On a related note: I've been using a OCZ Vertex 2 as my primary drive
> in a laptop, which runs Windows XP (no TRIM support). I haven't
> noticed any dropoff in performance in the year it's been in service.
> I'm doing typical productivity laptop-ish things (no compiling, etc.),
> so it appears that the internal SSD controller is more than smart
> enough to compensate even without TRIM.
>
> Honestly, I think TRIM isn't really useful for anyone. It took too
> long to get pushed out to the OSes, and the SSD vendors seem to have
> just compensated by making a smarter controller able to do better
> reallocation. Which, to me, is the better ideal, in any case.

Bullshit. I just got a OCZ Vertex 3, and the first fill was 450-500MB/s.
Second and subsequent fills are at half that speed.

I'm quite confident that it's due to the flash erase cycle that's
needed, and if stuff can be TRIM:ed (and thus flash erased as well),
speed would be regained. Overwriting a previously used block requires a
flash erase, and if that can be done in the background when the timing
is not critical, instead of just before you can actually write the
block you want, performance will increase.

/Tomas
Re: [zfs-discuss] monitoring ops
> Matt Harrison iwasinnamuk...@genestate.com wrote:
>> Hi list, I want to monitor the read and write ops/bandwidth for a
>> couple of pools and I'm not quite sure how to proceed. I'm using
>> rrdtool so I either want an accumulated counter or a gauge. According
>> to the ZFS admin guide, running zpool iostat without any parameters
>> should show the activity since boot. On my system (OSOL snv_133) it's
>> only showing ops in the single digits for a system with a month's
>> uptime and many GB of transfers. So, is there a way to get this
>> output correctly, or is there a better way to do this? Thanks

Average activity since boot...

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
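Since the no-argument `zpool iostat` prints the (here apparently broken) since-boot average, a gauge for rrdtool is easier to get from interval samples: ask for two samples and keep only the second, which covers just the last interval. A sketch; the pool name, interval, rrd path, and column positions are assumptions:

```shell
#!/bin/sh
# Feed one zpool iostat interval sample into rrdtool as a GAUGE.
# The first sample from "zpool iostat <pool> <secs> 2" is the since-boot
# average, so only the second (tail -1) is used.

# Convert zpool iostat's human-readable suffixes (K/M/G) to plain numbers.
to_bytes() {
    echo "$1" | awk '
        /K$/ { printf "%.0f\n", $1 * 1024; next }
        /M$/ { printf "%.0f\n", $1 * 1048576; next }
        /G$/ { printf "%.0f\n", $1 * 1073741824; next }
        { printf "%.0f\n", $1 }'
}

# sample="$(zpool iostat tank 10 2 | tail -1)"
# set -- $sample   # fields: name alloc free rops wops rbw wbw
# rrdtool update /var/rrd/tank.rrd "N:$(to_bytes $6):$(to_bytes $7)"
```

The commented lines show the intended wiring; `to_bytes` is the only part that runs without a pool.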
Re: [zfs-discuss] Zpool with data errors
On 21 June, 2011 - Todd Urie sent me these 5,9K bytes:

> I have a zpool that shows the following from a zpool status -v:
>
>   brsnnfs0104:[/var/spool/cron/scripts]# zpool status -v ABC0101
>     pool: ABC0101
>    state: ONLINE
>   status: One or more devices has experienced an error resulting in
>           data corruption. Applications may be affected.
>   action: Restore the file in question if possible. Otherwise restore
>           the entire pool from backup.
>      see: http://www.sun.com/msg/ZFS-8000-8A
>    scrub: none requested
>   config:
>
>       NAME                              STATE   READ WRITE CKSUM
>       ABC0101                           ONLINE     0     0    10
>         /dev/vx/dsk/ABC01dg/ABC0101_01  ONLINE     0     0     2
>         /dev/vx/dsk/ABC01dg/ABC0101_02  ONLINE     0     0     8
>         /dev/vx/dsk/ABC01dg/ABC0101_03  ONLINE     0     0    10
>
>   errors: Permanent errors have been detected in the following files:
>
>     /clients/ABC0101/rep/local/bfm/web/htdocs/tmp/rscache/717b52282ea059452621587173561360
>     /clients/ABC0101/rep/local/bfm/web/htdocs/tmp/rscache/6e6a9f37c4d13fdb3dcb8649272a2a49
>     /clients/ABC0101/rep/d0/prod1/reports/ReutersCMOLoad/ReutersCMOLoad.ABCntss001.20110620.141330.26496.ROLLBACK_FOR_UPDATE_COUPONS.html
>     /clients/ABC0101/rep/local/bfm/web/htdocs/tmp/G2_0.related_detail_loader.1308593666.54643.n5cpoli3355.data
>     /clients/ABC0101/rep/d0/prod1/reports/gp_reports/ALLMNG/20110429/F_OLPO82_A.gp.ABCIM_GA.nlaf.xml.gz
>     /clients/ABC0101/rep/d0/prod1/reports/gp_reports/ALLMNG/20110429/UNVLXCIAFI.gp.ABCIM_GA.nlaf.xml.gz
>     /clients/ABC0101/rep/d0/prod1/reports/gp_reports/ALLMNG/20110429/UNIVLEXCIA.gp.BARCRATING_ABC.nlaf.xml.gz
>
> I think that a scrub at least has the possibility to clear this up. A
> quick search suggests that others have had some good experience with
> using scrub in similar circumstances. I was wondering if anyone could
> share some of their experiences, good and bad, so that I can assess
> the risk and probability of success with this approach. Also, any
> other ideas would certainly be appreciated.

As you have no ZFS based redundancy, it can only detect that some
blocks delivered from the devices (SAN I guess?) were broken according
to the checksum. If you had raidz/mirror in zfs, it would have
corrected the problems and written back correct data to the
malfunctioning device. Now it does not. A scrub only reads the data and
verifies that data matches checksums.

/Tomas
Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?
On 21 June, 2011 - Nomen Nescio sent me these 0,4K bytes:

> Hello Marty!
>
>> With four drives you could also make a RAIDZ3 set, allowing you to
>> have the lowest usable space, poorest performance and worst resilver
>> times possible.
>
> That's not funny. I was actually considering this :p

4-way mirror would be way more useful. But you have to admit, it would
probably be somewhat reliable!

/Tomas
Re: [zfs-discuss] Wired write performance problem
On 08 June, 2011 - Donald Stahl sent me these 0,6K bytes:

>> One day, the write performance of zfs degraded. The write performance
>> decreased from 60MB/s to about 6MB/s in sequential write. Command:
>> date;dd if=/dev/zero of=block bs=1024*128 count=1;date
>
> See this thread:
> http://www.opensolaris.org/jive/thread.jspa?threadID=139317&tstart=45
> And search in the page for: metaslab_min_alloc_size
> Try adjusting the metaslab size and see if it fixes your performance
> problem.

And if pool usage is 90%, then there's another problem (the algorithm
for finding free space changes).

/Tomas
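For reference, the tunable from that thread is usually poked at runtime along these lines. The 0x1000 (4 KB) value is the one commonly cited in that discussion, not something verified against this poster's system, and mdb is stubbed to a plain echo here so the sketch runs anywhere; drop the stub and run as root on a real box:

```shell
#!/bin/sh
# Runtime tweak sketch (Solaris, not persistent across reboots): shrink
# metaslab_min_alloc_size so a near-full pool stops hunting for large
# free regions. mdb's /Z write format takes the value in hex, so "1000"
# means 0x1000 = 4096 bytes.
mdb() { echo "mdb $*"; }    # stub for illustration; remove to run for real

echo 'metaslab_min_alloc_size/Z 1000' | mdb -kw
```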
Re: [zfs-discuss] RealSSD C300 - Crucial CT064M4SSD2
On 08 June, 2011 - Eugen Leitl sent me these 0,5K bytes:

> Anyone running a Crucial CT064M4SSD2? Any good, or should I try
> getting a RealSSD C300, as long as these are still available?

Haven't tried any of those, but how about one of these:

OCZ Vertex3 (Sandforce SF-2281, sataIII, MLC, to be used for l2arc):

  shazoo:~# gdd if=/dev/rdsk/c0t5E83A97F98CEFE5Dd0s0 of=/dev/null bs=1024k count=1024
  1024+0 records in
  1024+0 records out
  1073741824 bytes (1.1 GB) copied, 2.21005 s, 486 MB/s

OCZ Vertex2 EX (Sandforce SF-1500, sataII, SLC and supercap, to be used
for zil):

  shazoo:~# gdd if=/dev/rdsk/c0t5E83A97F1471E0A4d0s0 of=/dev/null bs=1024k count=1024
  1024+0 records in
  1024+0 records out
  1073741824 bytes (1.1 GB) copied, 3.93114 s, 273 MB/s

This is in a x4170m2 with Solaris10.

/Tomas
Re: [zfs-discuss] SATA disk perf question
On 01 June, 2011 - Paul Kraus sent me these 0,9K bytes:

> I figure this group will know better than any other I have contact
> with: is 700-800 I/Ops reasonable for a 7200 RPM SATA drive (1 TB Sun
> badged Seagate ST31000N in a J4400)? I have a resilver running and am
> seeing about 700-800 writes/sec. on the hot spare as it resilvers.
> There is no other I/O activity on this box, as this is a remote
> replication target for production data. I have the replication
> disabled until the resilver completes.

700-800 seq ones perhaps.. for random, you can divide by 10.

/Tomas
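The divide-by-10 rule of thumb falls out of simple latency arithmetic (my numbers, not from the thread): a random write pays an average seek plus half a rotation, while sequential writes pay almost neither.

```shell
# Back-of-the-envelope random IOPS for a spinning disk: a 7200 RPM drive
# spins 120 rev/s, so average rotational latency is half a revolution,
# ~4.17 ms; add a typical ~8.5 ms average seek and one random IOP costs
# ~12.7 ms, i.e. ~79 IOPS. 700-800 ops/s therefore has to be largely
# sequential or coalesced I/O.
random_iops() {  # args: rpm, average seek time in ms
    awk -v rpm="$1" -v seek="$2" 'BEGIN {
        rot = (60000 / rpm) / 2          # half-rotation latency, ms
        printf "%d\n", 1000 / (seek + rot)
    }'
}

random_iops 7200 8.5     # prints 78
random_iops 15000 3.5    # prints 181 -- why fast SAS disks cost more
```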
Re: [zfs-discuss] changing vdev types
On 01 June, 2011 - Eric Sproul sent me these 0,8K bytes:

> On Wed, Jun 1, 2011 at 2:54 PM, Matt Harrison
> iwasinnamuk...@genestate.com wrote:
>> Hi list, I've got a pool that's got a single raidz1 vdev. I've just
>> got some more disks in and I want to replace that raidz1 with a
>> three-way mirror. I was thinking I'd just make a new pool and copy
>> everything across, but then of course I've got to deal with the name
>> change. Basically, what is the most efficient way to migrate the pool
>> to a completely different vdev?
>
> Since you can't mix vdev types in a single pool, you'll have to create
> a new pool. But you can use zfs send/recv to move the datasets, so

You can mix as much as you want to, but you can't remove a vdev (yet).

/Tomas
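The "deal with the name change" part has a standard answer: export the new pool and re-import it under the old name. A sketch of the whole copy-then-rename sequence; pool names "tank"/"newtank" are placeholders, and zpool/zfs are stubbed to echo so the steps can be traced without real disks (drop the two stub lines to run it for real):

```shell
#!/bin/sh
# Migrate a pool's datasets to a pool built on different vdevs, then take
# over the old pool's name.
zpool() { echo "zpool $*"; }    # stubs for illustration only
zfs()   { echo "zfs $*"; }

zfs snapshot -r tank@migrate
# zfs send -R tank@migrate | zfs recv -Fdu newtank   # copy all datasets
zpool destroy tank              # only after verifying the copy!
zpool export newtank
zpool import newtank tank       # re-import under the old name
```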
Re: [zfs-discuss] Experiences with 10.000+ filesystems
On 31 May, 2011 - Khushil Dep sent me these 4,5K bytes:

> The adage that I adhere to with ZFS features is "just because you can
> doesn't mean you should!". I would suspect that with that many
> filesystems the normal zfs-tools would also take an inordinate length
> of time to complete their operations - scale according to size.

I've done a not too scientific test on reboot times for Solaris 10 vs
11 with regard to many filesystems... Quad Xeon machines with single
raid10 and one boot environment. Using more BEs with LU in sol10 will
make the situation even worse, as it's LU that's taking time
(re)mounting all filesystems over and over and over and over again.

http://www8.cs.umu.se/~stric/tmp/zfs-many.png

As the picture shows, don't try 10000 filesystems with nfs on sol10.
Creating more filesystems gets slower and slower the more you have as
well.

> Generally snapshots are quick operations but 10,000 such operations
> would I believe take enough time to complete as to present operational
> issues - breaking these into sets would alleviate some? Perhaps if you
> are starting to run into many thousands of filesystems you would need
> to re-examine your rationale in creating so many.

On a different setup, we have about 750 datasets where we would like to
use a single recursive snapshot, but when doing that all file access
will be frozen for varying amounts of time (sometimes half an hour or
way more). Splitting it up into ~30 subsets, doing recursive snapshots
over those instead, has decreased the total snapshot time greatly and
cut the frozen time down to single digit seconds instead of minutes or
hours.

> My 2c. YMMV.
>
> -- Khush
>
> On Tuesday, 31 May 2011 at 11:08, Gertjan Oude Lohuis wrote:
>> "Filesystems are cheap" is one of ZFS's mottos. I'm wondering how far
>> this goes. Does anyone have experience with having more than 10.000
>> ZFS filesystems? I know that mounting this many filesystems during
>> boot will take considerable time. Are there any other disadvantages
>> that I should be aware of? Are zfs-tools still usable, like 'zfs
>> list', 'zfs get/set'? Would I run into any problems when snapshots
>> are taken (almost) simultaneously from multiple filesystems at once?
>>
>> Regards, Gertjan Oude Lohuis

/Tomas
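The "~30 subsets" split described above can be sketched as one recursive snapshot per subtree instead of a single pool-wide one, so each call only freezes its own subtree. Dataset names and the snapshot naming scheme are placeholders, and zfs is stubbed to echo so the flow can be traced without a real pool (drop the stub to run it):

```shell
#!/bin/sh
# Take per-subtree recursive snapshots instead of one pool-wide
# "zfs snapshot -r tank@...".
zfs() { echo "zfs $*"; }    # stub for illustration only

SNAP="backup-$(date -u +%Y%m%d%H%M)"

for ds in tank/home tank/mail tank/www; do
    zfs snapshot -r "$ds@$SNAP"
done
```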
Re: [zfs-discuss] Experiences with 10.000+ filesystems
On 31 May, 2011 - Gertjan Oude Lohuis sent me these 0,9K bytes:

> On 05/31/2011 03:52 PM, Tomas Ögren wrote:
>> I've done a not too scientific test on reboot times for Solaris 10 vs
>> 11 with regard to many filesystems...
>> http://www8.cs.umu.se/~stric/tmp/zfs-many.png
>> As the picture shows, don't try 10000 filesystems with nfs on sol10.
>> Creating more filesystems gets slower and slower the more you have as
>> well.
>
> Since all filesystems would be shared via NFS, this clearly is a nogo
> :). Thanks!
>
>> On a different setup, we have about 750 datasets where we would like
>> to use a single recursive snapshot, but when doing that all file
>> access will be frozen for varying amounts of time
>
> What version of ZFS are you using? Like Matthew Ahrens said: version
> 27 has a fix for this.

22, Solaris 10.

/Tomas
Re: [zfs-discuss] Monitoring disk seeks
On 19 May, 2011 - Sašo Kiselkov sent me these 0,6K bytes:

> Hi all, I'd like to ask whether there is a way to monitor disk seeks.
> I have an application where many concurrent readers (50) sequentially
> read a large dataset (10T) at a fairly low speed (8-10 Mbit/s). I can
> monitor read/write ops using iostat, but that doesn't tell me how
> contiguous the data is, i.e. when iostat reports 500 read ops, does
> that translate to 500 seeks + 1 read per seek, or 50 seeks + 10 reads,
> etc? Thanks!

Get DTraceToolkit and check out the various things under Disk and FS,
might help.

/Tomas
Re: [zfs-discuss] fuser vs. zfs
On 23 November, 2005 - Benjamin Lewis sent me these 3,0K bytes:

> Hello, I'm running Solaris Express build 27a on an amd64 machine and
> fuser(1M) isn't behaving as I would expect for zfs filesystems.
> Various google and ...
>
>   # fuser -c /
>   /: [lots of other PIDs] 20617tm [others] 20412cm [others]
>   # fuser -c /opt
>   /opt:
>
> Nothing at all for /opt. So it's safe to unmount? Nope: ...
>
> Has anyone else seen something like this?

Try something less ancient, Solaris 10u9 reports it just fine for
example. ZFS was pretty new-born when snv27 got out..

/Tomas
Re: [zfs-discuss] fuser vs. zfs
On 10 May, 2011 - Tomas Ögren sent me these 0,9K bytes:

> On 23 November, 2005 - Benjamin Lewis sent me these 3,0K bytes:
>> Hello, I'm running Solaris Express build 27a on an amd64 machine and
>> fuser(1M) isn't behaving as I would expect for zfs filesystems.
>> Various google and ...
>>
>>   # fuser -c /
>>   /: [lots of other PIDs] 20617tm [others] 20412cm [others]
>>   # fuser -c /opt
>>   /opt:
>>
>> Nothing at all for /opt. So it's safe to unmount? Nope: ...
>>
>> Has anyone else seen something like this?
>
> Try something less ancient, Solaris 10u9 reports it just fine for
> example. ZFS was pretty new-born when snv27 got out..

And for someone who is able to read as well, that mail was from 2005 -
when snv27 actually was less ancient ;) Seems like the moderator queue
from yesteryears just got flushed.. Sorry for the noise from my side..

/Tomas
Re: [zfs-discuss] primarycache=metadata seems to force behaviour of secondarycache=metadata
On 09 May, 2011 - Richard Elling sent me these 5,0K bytes:

> of the pool -- not likely to be a winning combination. This isn't a
> problem for the ARC because it has memory bandwidth, which is, of
> course, always greater than I/O bandwidth.

Slightly off topic, but we had an IBM RS/6000 43P with a PowerPC 604e
cpu, which had about 60MB/s memory bandwidth (which is kind of bad for
a 332MHz cpu) and its disks could do 70-80MB/s or so.. in some other
machine..

/Tomas
Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements
On 06 May, 2011 - Erik Trimble sent me these 1,8K bytes:

> If dedup isn't enabled, snapshot and data deletion is very light on
> RAM requirements, and generally won't need to do much (if any) disk
> I/O. Such deletion should take milliseconds to a minute or so.

.. or hours. We've had problems on an old raidz2 where a recursive
snapshot creation over ~800 filesystems could take quite some time, up
until the sata-scsi disk box ate the pool. Now we're using raid10 on a
scsi box, and it takes 3-15 minutes or so, during which sync writes
(NFS) are almost unusable. Using 2 fast usb sticks as l2arc, waiting
for a Vertex2EX and a Vertex3 to arrive for ZIL/L2ARC testing. IO to
the filesystems is quite low (50 writes, 500k data per sec average),
but snapshot times go waay up during backups.

/Tomas
Re: [zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
On 27 April, 2011 - Matthew Anderson sent me these 3,2K bytes:

> Hi All, I've run into a massive performance problem after upgrading to
> Solaris 11 Express from oSol 134. Previously the server was performing
> a batch write every 10-15 seconds and the client servers (connected
> via NFS and iSCSI) had very low wait times. Now I'm seeing constant
> writes to the array with very low throughput and high wait times on
> the client servers. ZIL is currently disabled. There is currently one
> failed disk that is being replaced shortly.
>
> Is there any ZFS tunable to revert Solaris 11 back to the behaviour of
> oSol 134? I attempted to remove Sol 11 and reinstall 134 but it keeps
> freezing during install, which is probably another issue entirely...
>
> IOstat output is below. When running iostat -v 2, that level of write
> OPs and throughput is very constant.
>
>                  capacity     operations    bandwidth
>  pool          alloc   free   read  write   read  write
>  ------------  -----  -----  -----  -----  -----  -----
>  MirrorPool    12.2T  4.11T    153  4.63K  6.06M  33.6M
>    mirror      1.04T   325G     11    416   400K  2.80M
>      c7t0d0        -      -      5    114   163K  2.80M
>      c7t1d0        -      -      6    114   237K  2.80M
>    mirror      1.04T   324G     10    374   426K  2.79M
>      c7t2d0        -      -      5    108   190K  2.79M
>      c7t3d0        -      -      5    107   236K  2.79M
>    mirror      1.04T   324G     15    425   537K  3.15M
>      c7t4d0        -      -      7    115   290K  3.15M
>      c7t5d0        -      -      8    116   247K  3.15M
>    mirror      1.04T   325G     13    412   572K  3.00M
>      c7t6d0        -      -      7    115   313K  3.00M
>      c7t7d0        -      -      6    116   259K  3.00M
>    mirror      1.04T   324G     13    381   580K  2.85M
>      c7t8d0        -      -      7    111   362K  2.85M
>      c7t9d0        -      -      5    111   219K  2.85M
>    mirror      1.04T   325G     15    408   654K  3.10M
>      c7t10d0       -      -      7    122   336K  3.10M
>      c7t11d0       -      -      7    123   318K  3.10M
>    mirror      1.04T   325G     14    461   681K  3.22M
>      c7t12d0       -      -      8    130   403K  3.22M
>      c7t13d0       -      -      6    132   278K  3.22M
>    mirror       749G   643G      1    279   140K  1.07M
>      c4t14d0       -      -      0      0      0      0
>      c7t15d0       -      -      1     83   140K  1.07M
>    mirror      1.05T   319G     18    333   672K  2.74M
>      c7t16d0       -      -     11     96   406K  2.74M
>      c7t17d0       -      -      7     96   266K  2.74M
>    mirror      1.04T   323G     13    353   540K  2.85M
>      c7t18d0       -      -      7     98   279K  2.85M
>      c7t19d0       -      -      6    100   261K  2.85M
>    mirror      1.04T   324G     12    459   543K  2.99M
>      c7t20d0       -      -      7    118   285K  2.99M
>      c7t21d0       -      -      4    119   258K  2.99M
>    mirror      1.04T   324G     11    431   465K  3.04M
>      c7t22d0       -      -      5    116   195K  3.04M
>      c7t23d0       -      -      6    117   272K  3.04M
>    c8t2d0          0  29.5G      0      0      0      0

Btw, this disk seems alone, unmirrored and a bit small..?

>  cache             -      -      -      -      -      -
>    c8t3d0      59.4G  3.88M    113     64  6.51M  7.31M
>    c8t1d0      59.5G    48K     95     69  5.69M  8.08M
>
> Thanks
> -Matt

/Tomas
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On 27 April, 2011 - Edward Ned Harvey sent me these 0,6K bytes:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Erik Trimble
>>
>>> (BTW, is there any way to get a measurement of number of blocks
>>> consumed per zpool? Per vdev? Per zfs filesystem?)
>>
>> *snip*. you need to use zdb to see what the current block usage is
>> for a filesystem. I'd have to look up the particular CLI usage for
>> that, as I don't know what it is off the top of my head.
>
> Anybody know the answer to that one?

zdb -bb pool

/Tomas
Re: [zfs-discuss] Read-only vdev
On 08 April, 2011 - Karl Wagner sent me these 3,5K bytes:

> Hi everyone. I was just wondering if there was a way for a specific
> vdev in a pool to be read-only? I can think of several uses for this,
> but would need to know if it was possible before thinking them through
> properly.

I can't think of any, so what are your uses?

/Tomas
Re: [zfs-discuss] Assessing health/performance of individual drives in ZFS pool
On 07 April, 2011 - Russ Price sent me these 0,7K bytes:

> On 04/05/2011 03:01 PM, Tomas Ögren wrote:
>> On 05 April, 2011 - Joe Auty sent me these 5,9K bytes:
>>> Has this changed, or are there any other techniques I can use to
>>> check the health of an individual SATA drive in my pool short of
>>> what ZFS itself reports?
>>
>> Through scsi compat layer..
>> socker:~# smartctl -a -d scsi /dev/rdsk/c0t0d0s0
>
> Note that you can get more complete information by using -d sat,12
> than by using -d scsi. This works for me on both the onboard AHCI
> ports and the SATA drives connected to my Intel SASUC8I (LSI-based)
> HBA.

Excellent. Works fine on the internal disks of a HP DL160G6 for example.

/Tomas
Re: [zfs-discuss] Assessing health/performance of individual drives in ZFS pool
On 05 April, 2011 - Joe Auty sent me these 5,9K bytes:

> Hello, A while ago I was exploring running smartmontools in Solaris 10
> on my SATA drives, but at the time SATA drives were not supported. Has
> this changed, or are there any other techniques I can use to check the
> health of an individual SATA drive in my pool short of what ZFS itself
> reports?

Through scsi compat layer..

  socker:~# smartctl -a -d scsi /dev/rdsk/c0t0d0s0
  smartctl 5.40 2010-10-16 r3189 [i386-pc-solaris2.10] (local build)
  Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

  Serial number: WCAT26836798
  Device type: disk
  Local Time is: Tue Apr  5 22:00:23 2011 MEST
  Device supports SMART and is Enabled
  Temperature Warning Disabled or Not Supported
  SMART Health Status: OK

  Current Drive Temperature: 29 C

  Error Counter logging not supported

  SMART Self-test log
  Num  Test        Status      segment  LifeTime  LBA_first_err [SK ASC ASQ]
       Description             number   (hours)
  # 1  Default     Completed      -        293    -             [-   -    -]
  ...

/Tomas
Re: [zfs-discuss] Investigating a hung system
On 25 February, 2011 - Mark Logan sent me these 0,6K bytes:

> Hi, I'm investigating a hung system. The machine is running snv_159
> and was running a full build of Solaris 11. You cannot get any
> response from the console and you cannot ssh in, but it responds to
> ping. The output from ::arc shows:
>
>   arc_meta_used  = 3836 MB
>   arc_meta_limit = 3836 MB
>   arc_meta_max   = 3951 MB
>
> Is it normal for arc_meta_used == arc_meta_limit?

It means that it has cached as much metadata as it's allowed to during
the current circumstances (arc size).

> Does this explain the hang?

No..

/Tomas
Re: [zfs-discuss] ZFS Performance
On 25 February, 2011 - David Blasingame Oracle sent me these 2,6K bytes:

> Hi All, In reading the ZFS Best Practices, I'm curious if this
> statement is still true about 80% utilization.

It happens at about 90% for me.. all of a sudden, the mail server got
butt slow.. killed an old snapshot to get to 85% free or so, then it
got snappy again. S10u9 sparc.

> from:
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
>
> Storage Pool Performance Considerations: Keep pool space under 80%
> utilization to maintain pool performance. Currently, pool performance
> can degrade when a pool is very full and file systems are updated
> frequently, such as on a busy mail server. Full pools might cause a
> performance penalty, but no other issues.
>
> Dave

/Tomas
Re: [zfs-discuss] Best way/issues with large ZFS send?
On 16 February, 2011 - Richard Elling sent me these 1,3K bytes:

> On Feb 16, 2011, at 6:05 AM, Eff Norwood wrote:
>> I'm preparing to replicate about 200TB of data between two data
>> centers using zfs send. We have ten 10TB zpools that are further
>> broken down into zvols of various sizes in each data center. One DC
>> is primary and the other will be the replication target and there is
>> plenty of bandwidth between them (10 gig dark fiber). Are there any
>> gotchas that I should be aware of? Also, at what level should I be
>> taking the snapshot to do the zfs send? At the primary pool level or
>> at the zvol level? Since the targets are to be exact replicas, I
>> presume at the primary pool level (e.g. tank) rather than for every
>> zvol (e.g. tank/prod/vol1)?
>
> There is no such thing as a pool snapshot. There are only dataset
> snapshots.

.. but you can make a single recursive snapshot call that affects all
datasets.

> The trick to a successful snapshot+send strategy at this size is to
> start snapping early and often. You don't want to send 200TB, you want
> to send 2TB, 100 times :-) The performance tends to be bursty, so the
> fixed record size of the zvols can work to your advantage for capacity
> planning. Also, a buffer of some sort can help smooth out the
> utilization, see the threads on ZFS and mbuffer.
>
> -- richard

/Tomas
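The "send 2TB, 100 times" strategy boils down to a periodic incremental-send loop. A sketch; pool and host names are placeholders, the zfs/ssh lines are left commented since they need real pools, and mbuffer plays the smoothing-buffer role Richard mentions:

```shell
#!/bin/sh
# One replication round: snapshot, then send only the delta since the
# previous snapshot. Run from cron at whatever interval keeps deltas small.
PREV="$1"                          # last snapshot the target already has
CUR="repl-$(date -u +%Y%m%d%H%M)"

# zfs snapshot -r "tank@$CUR"
# zfs send -R -i "tank@$PREV" "tank@$CUR" \
#     | mbuffer -s 128k -m 1G \
#     | ssh replica-host "mbuffer -s 128k -m 1G | zfs recv -Fdu tank"
echo "$CUR"
```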
Re: [zfs-discuss] Sil3124 Sata controller for ZFS on Sparc OpenSolaris Nevada b130
On 08 February, 2011 - Robert Soubie sent me these 1,1K bytes:

> On 08/02/2011 07:10, Jerry Kemp wrote:
>> As part of a small home project, I have purchased a SIL3124 hba in
>> hopes of attaching an external drive/drive enclosure via eSATA. The
>> host in question is an old Sun Netra T1 currently running OpenSolaris
>> Nevada b130. The card in question is this Sil3124 card:
>> http://www.newegg.com/product/product.aspx?item=N82E16816124003
>> although I did not purchase it from Newegg. I specifically purchased
>> this card as I have seen specific reports of it working under
>> Solaris/OpenSolaris distros on several Solaris mailing lists.
>
> I use a non-eSata version of this card under Solaris Express 11 for a
> boot mirrored ZFS pool. And another one for a Windows 7 machine that
> does backups of the server. Bios and drivers are available from the
> Silicon Image site, but nothing for Solaris.

The problem itself is sparc vs x86 and firmware for the card. AFAIK,
there is no sata card with drivers for solaris sparc. Use a SAS card.

/Tomas
Re: [zfs-discuss] mix 4K WD drives with 512 WD drives?
On 26 January, 2011 - Benji sent me these 0,8K bytes: Those WD20EARS emulate 512 bytes sectors, so yes you can freely mix and match them with other regular 512 bytes drives. Some have reported slower read/write speeds but nothing catastrophic. For some workloads, 3x slower than it should be. Or you can create a new 4K aligned pool (composed of only 4K drives!) to really take advantage of them. For that, you will need a modified zpool command to sets the ashift value of the pool to 12. A 4k aligned pool will work perfectly on a 512b aligned disk, it's just the other way that's bad. I guess ZFS could start defaulting to 4k, but ideally it should do the right thing depending on content (although that's hard for disks that are lying). /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
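[Editor's sketch: on ZFS ports that later grew an explicit ashift option (illumos, FreeBSD, ZFS on Linux), the 4K-aligned pool can be created without a patched binary. The pool name "tank" and disk names are invented; stock OpenSolaris zpool of this era needed the modified command mentioned above instead.]

```shell
# Create a pool whose minimum block size is 2^12 = 4096 bytes
zpool create -o ashift=12 tank mirror c0t0d0 c0t1d0

# Verify the ashift actually used
zdb -C tank | grep ashift     # expect: ashift: 12
```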
Re: [zfs-discuss] stupid ZFS question - floating point operations
On 22 December, 2010 - Jerry Kemp sent me these 1,0K bytes: I have a coworker, whose primary expertise is in another flavor of Unix. This coworker lists floating point operations as one of ZFS's detriments. I'm not really sure what he means specifically, or where he got this reference from. Then maybe ask him first? Guilty until proven innocent isn't the regular path... In an effort to refute what I believe is an error or misunderstanding on his part, I have spent time on Yahoo, Google, the ZFS section of OpenSolaris.org, etc. I really haven't turned up much of anything that would prove or disprove his comments. The one thing I haven't done is to go through the ZFS source code, but it's been years since I have done any serious programming. If someone from Oracle, or anyone on this mailing list could point me towards any documentation, or give me a definitive word, I would sure appreciate it. If there were floating point operations going on within ZFS, at this point I am uncertain as to what they would be. TIA for any comments, Jerry ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How many files/directories in a ZFS filesystem?
On 09 December, 2010 - David Strom sent me these 0,7K bytes: Looking for a little help, please. A contact from Oracle (Sun) suggested I pose the question to this email list. We're using ZFS on Solaris 10 in an application where there are so many directory-subdirectory layers, and a lot of small files (~1-2Kb), that we ran out of inodes (over 30 million!). So, the ZFS question is, how can we see how many files/directories have been created in a ZFS filesystem? Equivalent to df -o i on a UFS filesystem. Short of doing a find zfs-mount-point | wc. GNU df can show it, and regular Solaris could too but chooses not to. statvfs() should be able to report it as well. In ZFS, you will run out of inodes at the same time as you run out of space. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
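[Editor's illustration of the find-and-count fallback mentioned above, run against a throwaway temp directory; on a filesystem with fixed inode tables, GNU df -i would report the used-inode count directly.]

```shell
# Count every object (directories included) below a mount point with find(1)
d=$(mktemp -d)
touch "$d/a" "$d/b" "$d/c"
find "$d" | wc -l    # prints 4: the directory itself plus three files
rm -rf "$d"
```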
Re: [zfs-discuss] accidentally added a drive?
On 05 December, 2010 - Chris Gerhard sent me these 0,3K bytes: Alas you are hosed. There is at the moment no way to shrink a pool which is what you now need to be able to do. back up and restore I am afraid. .. or add a mirror to that drive, to keep some redundancy. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
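[Editor's sketch of the mirror-the-stray-drive suggestion above; the pool name "tank" and disk names are invented. zpool attach turns a single-disk top-level vdev into a mirror.]

```shell
# Attach a second disk to the accidentally-added vdev so it gains redundancy
zpool attach tank c9t8d0 c9t9d0

# Watch the resilver onto the new mirror half complete
zpool status tank
```

This restores redundancy but does not undo the add: the pool keeps the extra top-level vdev forever.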
Re: [zfs-discuss] How to create a checkpoint?
On 08 November, 2010 - Peter Taps sent me these 0,7K bytes: Folks, My understanding is that there is a way to create a zfs checkpoint before doing any system upgrade or installing new software. If there is a problem, one can simply roll back to the stable checkpoint. I am familiar with snapshots and clones. However, I am not clear on how to manage checkpoints. I would appreciate your help in how I can create, destroy and roll back to a checkpoint, and how I can list all the checkpoints. You are probably referring to snapshots, as ZFS does not have checkpoints (a checkpoint is pretty much the same thing as a snapshot). /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
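[Editor's sketch of the snapshot lifecycle the poster is after; the dataset name is invented.]

```shell
zfs snapshot rpool/export@pre-upgrade   # create the "checkpoint"
zfs list -t snapshot                    # list all snapshots
zfs rollback rpool/export@pre-upgrade   # roll back (discards changes made since)
zfs destroy rpool/export@pre-upgrade    # drop it when no longer needed
```

Note that rollback only returns to the most recent snapshot unless -r is given to destroy the intervening ones.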
Re: [zfs-discuss] Unknown Space Gain
On 20 October, 2010 - Krunal Desai sent me these 1,5K bytes: Huh, I don't actually ever recall enabling that. Perhaps that is connected to the message I started getting every minute recently in the kernel buffer, Oct 20 12:20:49 megatron pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 3 irq 0xf vector 0x45 ioapic 0x2 intin 0xf is bound to cpu 0 Oct 20 12:21:49 megatron pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 3 irq 0xf vector 0x45 ioapic 0x2 intin 0xf is bound to cpu 1 I just disabled it (zfs set com.sun\:auto-snapshot=false tank, correct?), will see if the log messages disappear. Did the filesystem kill off some snapshots or something in an effort to free up space? Probably. zfs list -t all to see all the snapshots as well. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What is the 1000 bit?
On 19 October, 2010 - Linder, Doug sent me these 1,2K bytes: Nicolas Williams [mailto:nicolas.willi...@oracle.com] wrote: It's the sticky bit. Nowadays it's only useful on directories, and really it's generally only used with 777 permissions. The chmod(1) Thanks. It doesn't seem harmful. But it does make me wonder why it's showing up on my newly-created zpool. I literally created the pool with one command, created a file (mkfile) with the second command, and did an ls with the third. I can't imagine how I could have done anything to set that bit. Is this a ZFS weirdness? It's mkfile. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
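[Editor's demonstration of the "1000 bit" being discussed: mode 1644 is the sticky bit plus rw-r--r--. Solaris mkfile(1M) historically created its files with the sticky bit set, which is why it showed up here; below it is set by hand with GNU tools.]

```shell
f=$(mktemp)
chmod 1644 "$f"
ls -l "$f"        # permission string ends in 'T' (sticky set, not executable)
stat -c %a "$f"   # prints 1644 (GNU stat; octal mode including special bits)
rm -f "$f"
```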
Re: [zfs-discuss] Moving camp, lock stock and barrel
On 11 October, 2010 - Harry Putnam sent me these 0,5K bytes: Harry Putnam rea...@newsguy.com writes: [...] I can't get X up ... it just went to a black screen, after seeing the main login screen, logging in to consol and calling: WHOOPS, omitted some information here... Calling: `startx /usr/bin/dbus-launch --exit-with-session gnome-session' from console. Which is how I've been starting X for some time. This thread started out way off-topic from ZFS discuss (the filesystem) and has continued off course. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
On 06 October, 2010 - Stephan Budach sent me these 2,1K bytes: Hi, I recently discovered some - or at least one - corrupted file on one of my ZFS datasets, which caused an I/O error when trying to send a ZFS snapshot to another host:

zpool status -v obelixData
  pool: obelixData
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        obelixData           ONLINE       4     0     0
          c4t21D023038FA8d0  ONLINE       0     0     0
          c4t21D02305FF42d0  ONLINE       4     0     0

errors: Permanent errors have been detected in the following files:

0x949:0x12b9b9
obelixData/jvmprepr...@2010-10-02_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor ET 10.6.2010/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
obelixData/jvmprepr...@backupsnapshot_2010-10-05-08:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor ET 10.6.2010/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
obelixData/jvmprepr...@2010-09-24_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor 6_210/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
/obelixData/JvMpreprint/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor ET 10.6.2010/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps

Now, a scrub would reveal corrupted blocks on the devices, but is there a way to identify damaged files as well? Is this a trick question or something? The filenames are right over your question...?
/Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] resilver that never finishes
On 19 September, 2010 - Markus Kovero sent me these 0,5K bytes: Hi, The drives and the chassis are fine; what I am questioning is how can it be resilvering more data to a device than the capacity of the device? If data on the pool has changed during the resilver, the resilver counter will not update accordingly, and it will show resilvering 100% for the time needed to catch up. I believe this was fixed recently, by displaying how many blocks it has checked vs how many to check... /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS online device management
On 11 September, 2010 - besson3c sent me these 0,6K bytes: Hello, I found in the release notes for Solaris 10 9/10: Oracle Solaris ZFS online device management, which allows customers to make changes to filesystem configurations without taking data offline. Can somebody kindly clarify what sort of filesystem configuration changes can be made this way? See below. Does this include, say, changing a 6 disk RAID-Z to two 3 disk RAID-Z sets striped? Nope. You can add/remove mirrors of disks online (assuming you start with a non-raidz vdev). You can expand a pool by adding more vdevs. You can not transform a raidz from one form to another. You can not remove a vdev. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] performance leakage when copy huge data
On 08 September, 2010 - Fei Xu sent me these 5,9K bytes: I dug deeper into it and might have found some useful information. I attached an X25 SSD for ZIL to see if it helps, but no luck. I ran iostat -xnz for more details and got interesting results, as below (maybe too long). Some explanation: 1. c2d0 is the SSD for ZIL 2. c0t3d0, c0t20d0, c0t21d0, c0t22d0 is the source pool. ...

                    extended device statistics
   r/s   w/s    kr/s    kw/s wait actv wsvc_t  asvc_t %w %b device
   0.3   0.0     1.2     0.0  0.0  0.0    0.0     0.1  0  0 c2d0
   0.1  17.7     0.1    51.7  0.0  0.1    0.2     4.1  0  7 c3d0
   0.1   2.1     0.0    79.8  0.0  0.0    0.1     4.0  0  0 c0t2d0
   0.2   0.0     7.1     0.0  0.1  2.3  278.5 11365.1  1 46 c0t3d0

Service time here is crap. 11 seconds to reply.

   0.1   2.2     0.0    79.9  0.0  0.0    0.1     3.7  0  0 c0t5d0
   0.1   2.3     0.0    80.0  0.0  0.0    0.1     9.2  0  0 c0t6d0
   0.1   2.5     0.0    80.1  0.0  0.0    0.1     3.8  0  0 c0t10d0
   0.1   2.4     0.0    80.0  0.0  0.0    0.1     9.5  0  0 c0t11d0
   1.9   0.0   133.0     0.0  0.1  2.8   60.2  1520.6  2 51 c0t20d0

1.5 seconds to reply. crap.

                    extended device statistics
   r/s   w/s    kr/s    kw/s wait actv wsvc_t  asvc_t %w %b device
   ...
   0.7   0.0    39.1     0.0  0.0  0.6   64.0   884.1  1 10 c0t3d0
   ...
   2.1   0.0   135.8     0.0  0.1  5.2   67.8  2498.1  3 88 c0t21d0
   ...
                    extended device statistics
   r/s   w/s    kr/s    kw/s wait actv wsvc_t  asvc_t %w %b device
   ...
   3.5   0.0   246.8     0.0  0.0  0.8    6.3   229.8  1 20 c0t3d0
   ...
   0.7   0.0    29.2     0.0  0.0  0.6    0.0   911.0  0 12 c0t21d0
   1.9   0.0   138.7     0.0  0.1  4.7   73.0  2428.6  2 66 c0t22d0
   ...

Service times here are crap. Disks are malfunctioning in some way. If your source disks can take seconds (or 10+ seconds) to reply, then of course your copy will be slow. The disk is probably having a hard time reading the data or something. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris 10u9
On 08 September, 2010 - Edward Ned Harvey sent me these 0,6K bytes: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of David Magda The 9/10 Update appears to have been released. Some of the more noticeable ZFS stuff that made it in: More at: http://docs.sun.com/app/docs/doc/821-1840/gijtg Awesome! Thank you. :-) Log device removal in particular, I feel is very important. (Got bit by that one.) Now when is dedup going to be ready? ;-) It's not in U9 at least:

...
16 stmf property support
17 Triple-parity RAID-Z
18 Snapshot user holds
19 Log device removal
20 Compression using zle (zero-length encoding)
21 Reserved
22 Received properties
...

scratchy:~# zfs create -o dedup=on kaka/kex
cannot create 'kaka/kex': 'dedup' is readonly
scratchy:~# zfs set dedup=on kaka
cannot set property for 'kaka': 'dedup' is readonly

/Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs lists discrepancy after added a new vdev to pool
On 27 August, 2010 - Darin Perusich sent me these 2,1K bytes: Hello All, I'm sure this has been discussed previously but I haven't been able to find an answer to this. I've added another raidz1 vdev to an existing storage pool and the increased available storage isn't reflected in the 'zfs list' output. Why is this? The system in question is running Solaris 10 5/09 s10s_u7wos_08, kernel Generic_139555-08. The system does not have the latest patches, which might be the cure. Thanks! Here's what I'm seeing. zpool create datapool raidz1 c1t50060E800042AA70d0 c1t50060E800042AA70d1 Just FYI, this is an inefficient variant of a mirror: more CPU required and lower performance. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Directory tree renaming -- disk usage
On 09 August, 2010 - David Dyer-Bennet sent me these 1,2K bytes: If I have a directory with a bazillion files in it (or, let's say, a directory subtree full of raw camera images, about 15MB each, totalling say 50GB) on a ZFS filesystem, and take daily snapshots of it (without altering it), the snapshots use almost no extra space, I know. If I now rename that directory, and take another snapshot, what happens? Do I get two copies of the unchanged data now, or does everything still reference the same original data (file content)? Seems like the new directory tree contains the same old files, same inodes and so forth, so it shouldn't be duplicating the data as I understand it; is that correct? The files haven't changed, unless you rename the directory by creating a new one, copying stuff over and removing the old. The only change is the name of the directory. This would, obviously, be fairly easy to test; and, if I removed the snapshots afterward, wouldn't take space permanently (have to make sure that the scheduler doesn't do one of my permanent snapshots during the test). But I'm interested in the theoretical answer in any case. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
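[Editor's demonstration of the point above: a rename leaves the files' inode numbers untouched, which is why snapshots keep sharing the data blocks. GNU stat is used here; on Solaris, ls -i would show the same thing. Paths are invented.]

```shell
d=$(mktemp -d)
mkdir "$d/photos"
touch "$d/photos/img001.raw"
before=$(stat -c %i "$d/photos/img001.raw")
mv "$d/photos" "$d/photos-2010"               # rename the directory
after=$(stat -c %i "$d/photos-2010/img001.raw")
[ "$before" = "$after" ] && echo "same inode"  # prints "same inode"
rm -rf "$d"
```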
Re: [zfs-discuss] vdev using more space
On 04 August, 2010 - Karl Rossing sent me these 5,4K bytes: Hi, We have a server running b134. The server runs xen and uses a vdev as the storage. The xen image is running nevada 134. I took a snapshot last night to move the xen image to another server.

NAME                            USED  AVAIL  REFER  MOUNTPOINT
vpool/host/snv_130             32.8G  11.3G  37.7G  -
vpool/host/snv_...@2010-03-31  3.27G      -  13.8G  -
vpool/host/snv_...@2010-08-03   436M      -  37.7G  -

It's also worth noting that vpool/host/snv_130 is a clone of at least two other snapshots. I then did a zfs send of vpool/host/snv_...@2010-08-03 and got a 39GB file. A zfs send of vpool/host/snv_...@2010-03-31 gave a file of 15GB. This is probably data + metadata or similar. I don't understand why the file is 39GB, since df -h inside of the xen image drive vpool/host/snv_130 shows:

Filesystem           size  used  avail  capacity  Mounted on
rpool/ROOT/snv_130    39G   12G    22G       35%  /

It would be nice if the zfs send file would be roughly the same size as the space used inside of the xen machine. The filesystem on the inside might have touched all the blocks, but not informed the outer ZFS (because it can't) that some blocks are freed. One way of making it smaller is to enable compression on the outer zvol, disable compression on the inner filesystem and then fill the inner filesystem with null bytes (dd if=/dev/zero of=file bs=1024k) and remove that file, then remove compression (if you want). This is just a temporary fix: as the filesystem is used on the inside (with copy-on-write), the outer one will grow back again. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
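[Editor's sketch of the zero-fill trick described above. The dataset name comes from the post; the temp-file path inside the guest is invented. Compression on the outer zvol collapses the zeroed blocks to (almost) nothing.]

```shell
# On the host: compress the outer zvol
zfs set compression=on vpool/host/snv_130

# Inside the xen guest: overwrite all free space with zeros, then free it
dd if=/dev/zero of=/var/tmp/zero bs=1024k   # runs until the disk is full
rm /var/tmp/zero

# Back on the host, optionally:
zfs set compression=off vpool/host/snv_130
```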
Re: [zfs-discuss] When is the L2ARC refreshed if on a separate drive?
On 03 August, 2010 - valrh...@gmail.com sent me these 1,2K bytes: I'm running a mirrored pair of 2 TB SATA drives as my data storage drives on my home workstation, a Core i7-based machine with 10 GB of RAM. I recently added a sandforce-based 60 GB SSD (OCZ Vertex 2, NOT the pro version) as an L2ARC to the single mirrored pair. I'm running B134, with ZFS pool version 22, with dedup enabled. If I understand correctly, the dedup table should be in the L2ARC on the SSD, and I should have enough RAM to keep the references to that table in memory, and that this is therefore a well-performing solution. My question is what happens at power off. Does the cache device essentially get cleared, and the machine has to rebuild it when it boots? Or is it persistent? That is, should performance improve after a little while following a reboot, or is it always constant once it builds the L2ARC once? L2ARC is currently cleared at boot. There is an RFE to make it persistent. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] never ending resilver
On 05 July, 2010 - Roy Sigurd Karlsbakk sent me these 1,9K bytes: - Original Message - If you have one zpool consisting of only one large raidz2, then you have a slow raid. To reach high speed, you need maximum 8 drives in each raidz2. So one of the reasons it takes time, is because you have too many drives in your raidz2. Everything would be much faster if you split your zpool into two raidz2, each consisting of 7 or 8 drives. Then it would be fast. Keeping the VDEVs small is one thing, but this is about resilvering spending far more time than reported. The same applies to scrubbing at times. Would it be hard to rewrite the reporting mechanisms in ZFS to report something more likely, than just a first guess? ZFS scrub reports tremendous times at start, but slows down after it's worked its way through the metadata. What ZFS is doing when the system still scrubs after 100 hours at 100% is beyond my knowledge. I believe it's something like this:
* When starting, it notes the number of blocks to visit
* .. visiting blocks ...
* .. adding more data (which then will be beyond the original 100%) .. and visiting blocks ...
* .. reaching the initial last block, which since then has gotten lots of new friends afterwards.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6899970 /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool status output confusion
On 27 May, 2010 - Per Jorgensen sent me these 1,0K bytes: I get the following output when I run a zpool status, but I am a little confused about why c9t8d0 is more left-aligned than the rest of the disks in the pool. What does it mean? Because someone forced it in without redundancy (or created it as such). Your pool is bad, as c9t8d0 is without redundancy. If it fails, your pool is toast. zpool history should be able to tell when it happened, at least.

$ zpool status blmpool
  pool: blmpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        blmpool     ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0
          c9t8d0    ONLINE       0     0     0

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] creating a fast ZIL device for $200
On 26 May, 2010 - sensille sent me these 4,5K bytes: Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of sensille The basic idea: the main problem when using a HDD as a ZIL device are the cache flushes in combination with the linear write pattern of the ZIL. This leads to a whole rotation of the platter after each write, because after the first write returns, the head is already past the sector that will be written next. My idea goes as follows: don't write linearly. Track the rotation and write to the position the head will hit next. This might be done by a re-mapping layer or integrated into ZFS. This works only because ZIL device are basically write-only. Reads from this device will be horribly slow. The reason why hard drives are less effective as ZIL dedicated log devices compared to such things as SSD's, is because of the rotation of the hard drives; the physical time to seek a random block. There may be a possibility to use hard drives as dedicated log devices, cheaper than SSD's with possibly comparable latency, if you can intelligently eliminate the random seek. If you have a way to tell the hard drive Write this data, to whatever block happens to be available at minimum seek time. Thanks for rephrasing my idea :) The only thing I'd like to point out is that ZFS doesn't do random writes on a slog, but nearly linear writes. This might even be hurting performance more than random writes, because you always hit the worst case of one full rotation. A simple test would be to change write block X write block X+1 write block X+2 into write block X write block X+4 write block X+8 or something, so it might manage to send the command before the head has travelled over to block X+4 etc.. I guess basically, you want to do something like TCQ/NCQ, but without the Q.. placing writes optimally.. 
So you believe you can know the drive geometry, the instantaneous head position, and the next available physical block address in software? No need for special hardware? That's cool. I hope there aren't any gotchas as-yet undiscovered. Yes, I already did a mapping of several drives. I measured at least the track length, the interleave needed between two writes and the interleave if a track-to-track seek is involved. Of course you can always learn more about a disk, but that's a good starting point. Since X, X+1, X+2 seems to be the worst case, try just skipping over a few blocks.. Double (or so) the performance for a single software tweak would surely be welcome. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
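[Editor's back-of-envelope numbers for the full-rotation penalty discussed in this thread: if every ZIL commit waits one complete revolution, the per-write latency floor is 60000/rpm milliseconds. Spindle speeds are the common ones, not specific to any drive in the thread.]

```shell
# Worst-case added latency per sync write: one full platter rotation
for rpm in 5400 7200 15000; do
  awk -v rpm="$rpm" 'BEGIN { printf "%5d rpm: %.2f ms per rotation\n", rpm, 60000/rpm }'
done
```

At 7200 rpm that is 8.33 ms per commit, i.e. roughly 120 sync writes/s from a linear slog; which is exactly why shaving most of the rotation by skipping ahead a few sectors could plausibly multiply slog throughput.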
Re: [zfs-discuss] don't mount a zpool on boot
On 20 May, 2010 - John Andrunas sent me these 0,3K bytes: Can I make a pool not mount on boot? I seem to recall reading somewhere how to do it, but can't seem to find it now. zpool export thatpool zpool import thatpool when you want it back. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
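[Editor's sketch expanding on the export/import answer above; the pool name comes from the post's suggestion. The cachefile=none import is an alternative on builds with the cachefile pool property: the pool stays imported now but is not auto-imported at the next boot.]

```shell
zpool export thatpool                     # pool is gone until imported again
zpool import thatpool                     # bring it back by hand

# Alternative: import without recording it in the cache file
zpool import -o cachefile=none thatpool   # usable now, forgotten at reboot
```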
Re: [zfs-discuss] Very serious performance degradation
On 18 May, 2010 - Philippe sent me these 6,0K bytes: Hi, The 4 disks are Western Digital ATA 1TB (one is slightly different): 1 x ATA-WDC WD10EACS-00D-1A01-931.51GB 3 x ATA-WDC WD10EARS-00Y-0A80-931.51GB I've done lots of tests (speed tests + SMART reports) with each of these 4 disks on another system (another computer, running Windows 2003 x64), and everything was fine! The 4 disks operate well, at 50-100 MB/s (tested with HDTune). And the access time: 14ms. The controller is an LSI Logic SAS 1068-IR (MPT BIOS 6.12.00.00 - 31/10/2006). Here are some stats: 1) cp of a big file to a ZFS filesystem (128K recordsize): = iostat -x 30

                 extended device statistics
device    r/s   w/s    kr/s    kw/s wait actv  svc_t %w %b
sd0       0.0   0.0     0.0     0.0  0.0  0.0    0.0  0  0
sd1       0.3   0.3    17.6     2.3  0.0  0.0   19.5  0  0
sd2      11.5   6.0   350.1   154.5  0.0  0.3   19.5  0  4
sd3      12.5   5.7   351.4   154.5  0.0  0.5   27.1  0  5
sd4      15.9   6.3   615.1   153.8  0.0  1.3   58.2  0  8
sd5      15.1   8.1   600.4   150.7  0.0  7.6  326.7  0 31

                 extended device statistics
device    r/s   w/s    kr/s    kw/s wait actv  svc_t %w %b
sd0       0.0   0.0     0.0     0.0  0.0  0.0    0.0  0  0
sd1      41.3   0.0  5289.7     0.0  0.0  1.3   31.0  0  4
sd2       4.2  24.1   214.0  1183.0  0.0  0.5   19.4  0  4
sd3       3.7  23.6   227.2  1183.0  0.0  2.1   78.5  0 12
sd4       6.6  26.4   374.2  1179.4  0.0 10.1  306.5  0 35
sd5       4.3  31.0   369.6   973.3  0.0 22.0  622.0  0 96

                 extended device statistics
device    r/s   w/s    kr/s    kw/s wait actv  svc_t %w %b
sd0       0.0   0.0     0.0     0.0  0.0  0.0    0.0  0  0
sd1      17.1   0.0  2184.6     0.0  0.0  0.5   30.6  0  2
sd2       1.6  12.3   116.4   570.9  0.0  0.6   41.3  0  3
sd3       1.6  12.1   107.6   570.9  0.0 10.3  754.7  0 33
sd4       2.1  12.6   187.1   569.4  0.0  9.4  634.7  0 28
sd5       0.4  21.7    25.6   700.6  0.0 29.5 1338.1  0 96

Umm.. The service times of sd3..5 are waay too high to be good working disks. 21 writes shouldn't take 1.3 seconds. Some of your disks are not feeling well, possibly doing block reallocation like mad all the time, or block recovery of some form. Service times should be closer to what sd1 and sd2 are doing. sd2, 3 and 4 seem to be getting about the same amount of read+write, but their service times are 15-20 times higher.
This will lead to crap performance (and probably broken array in a while). /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Using WD Green drives?
On 17 May, 2010 - Dan Pritts sent me these 1,6K bytes: On Thu, May 13, 2010 at 06:09:55PM +0200, Roy Sigurd Karlsbakk wrote: 1. even though they're 5900, not 7200, benchmarks I've seen show they are quite good Minor correction: they are 5400rpm. Seagate makes some 5900rpm drives. The green drives have a reasonable raw throughput rate, due to the extremely high platter density nowadays. However, due to their low spin speed, their average access time is significantly slower than 7200rpm drives. For bulk archive data containing large files, this is less of a concern. Regarding slow resilvering times, in the absence of other disk activity, I think that should really be limited by the throughput rate, not the relatively slow random I/O performance... again assuming large files (and low fragmentation, which if the archive is write-and-never-delete is what I'd expect). One test I saw suggests 60MB/sec avg throughput on the 2TB drives. That works out to 9.25 hours to read the entire 2TB. At a conservative 50MB/sec it's 11 hours. This assumes that you have enough I/O bandwidth and CPU on the system to saturate all your disks. If there's other disk activity during a resilver, though, it turns into random I/O. Which is slow on these drives. Resilver does a whole lot of random I/O itself; it reads the filesystem tree, not block 0, block 1, block 2... You won't get 60MB/s sustained, not even close. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
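[Editor's check of the arithmetic quoted above: reading 2TB (taken as 2,000,000 MB) sequentially at 60MB/s and at the conservative 50MB/s.]

```shell
awk 'BEGIN {
  mb = 2e6                                       # 2TB expressed in MB
  printf "60MB/s: %.1f hours\n", mb/60/3600      # prints 9.3 hours
  printf "50MB/s: %.1f hours\n", mb/50/3600      # prints 11.1 hours
}'
```

This matches the post's 9.25/11-hour figures; as the reply notes, it is a best case that a resilver's random I/O will not reach.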
Re: [zfs-discuss] Using WD Green drives?
On 13 May, 2010 - Roy Sigurd Karlsbakk sent me these 2,9K bytes: - Brian broco...@vt.edu wrote: (1) They seem to have a firmware setting (that may not be modifiable depending on revision) that has to do with the drive parking the heads after 8 seconds of inactivity to save power. These drives are rated for a certain number of park/unpark operations -- I think 300,000. Using these drives in a NAS results in a lot of park/unpark. 8 seconds? Is it really that low? Yes. My disk went through 180k in like 2-3 months.. Then I told smartd to poll the disk every 5 seconds to prevent it from falling asleep. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
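[Editor's sketch of the keep-the-drive-awake workaround described above. The device path is invented; smartd's -i flag sets its polling interval in seconds, and smartctl's -n never makes it query the drive regardless of power state. Checking the Load_Cycle_Count attribute shows how fast the parks are accumulating.]

```shell
# Poll all configured disks every 5 seconds so the idle timer never fires
smartd -i 5

# Or a crude keep-alive loop without smartd:
while sleep 5; do smartctl -n never -i /dev/rdsk/c0t0d0 >/dev/null; done &

# Watch the park counter to confirm it has stopped climbing
smartctl -A /dev/rdsk/c0t0d0 | grep Load_Cycle_Count
```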
Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation
On 10 May, 2010 - charles sent me these 0,8K bytes: Hi, This thread refers to Solaris 10, but it was suggested that I post it here as ZFS developers may well be more likely to respond. http://forums.sun.com/thread.jspa?threadID=5438393messageID=10986502#10986502 Basically, after about 1000 ZFS filesystem creations, the creation time slows down to around 4 seconds, and gets progressively worse. This is not the case for a normal mkdir, which creates thousands of directories very quickly. I wanted users' home directories (60,000 of them) all to be individual ZFS filesystems, but there seems to be a bug/limitation due to the prohibitive creation time. If you're going to share them over NFS, you'll be looking at even worse times. In my experience, you don't want to go over 1-2k filesystems due to various scalability problems, especially if you're doing NFS as well. It will be slow to create and slow when (re)booting, but other than that it might be ok.. Look into the ZFS userquota/groupquota instead. That's what I did, and it's partly because of these issues that userquota/groupquota got implemented, I guess. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
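[Editor's sketch of the userquota route suggested above: one filesystem for all homes, with per-user quotas. Pool/filesystem and user names are invented; this needs a ZFS version with userquota support (Solaris 10 10/09 or later).]

```shell
zfs create tank/home                    # one filesystem, not 60,000
zfs set userquota@alice=10G tank/home   # per-user limit
zfs get userquota@alice tank/home       # confirm the setting
zfs userspace tank/home                 # per-user usage report
```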
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes: On Wed, 5 May 2010, Edward Ned Harvey wrote: In the L2ARC (cache) there is no ability to mirror, because cache device removal has always been supported. You can't mirror a cache device, because you don't need it. How do you know that I don't need it? The ability seems useful to me. The gain is quite minimal.. If the first device fails (which doesn't happen too often I hope), then it will be read from the normal pool once and then stored in ARC/L2ARC again. It just behaves like a cache miss for that specific block... If this happens often enough to become a performance problem, then you should throw away that L2ARC device because it's broken beyond usability. /Tomas
Re: [zfs-discuss] Loss of L2ARC SSD Behaviour
On 05 May, 2010 - Michael Sullivan sent me these 0,9K bytes: Hi, I have a question I cannot seem to find an answer to. I know I can set up a stripe of L2ARC SSD's with, say, 4 SSD's. I know if I set up ZIL on SSD and the SSD goes bad, then the ZIL will be relocated back to the pool. I'd probably have it mirrored anyway, just in case. However you cannot mirror the L2ARC, so... Given a new enough OpenSolaris.. Otherwise, your pool is screwed iirc. What I want to know is what happens if one of those SSD's goes bad? What happens to the L2ARC? Is it just taken offline, or will it continue to perform even with one drive missing? L2ARC is a pure cache thing; if it gives bad data (checksum error), it will be ignored, and if you yank it, it will be ignored. It's very safe to have crap hardware there (as long as it doesn't start messing up some bus or similar). Cache devices can be added/removed at any time as well. /Tomas
Re: [zfs-discuss] Performance drop during scrub?
On 29 April, 2010 - Tomas Ögren sent me these 5,8K bytes: On 29 April, 2010 - Roy Sigurd Karlsbakk sent me these 10K bytes: I got this hint from Richard Elling, but haven't had time to test it much. Perhaps someone else could help? roy Interesting. If you'd like to experiment, you can change the limit of the number of scrub I/Os queued to each vdev. The default is 10, but that is too close to the normal limit. You can see the current scrub limit via:

# echo zfs_scrub_limit/D | mdb -k
zfs_scrub_limit:
zfs_scrub_limit:        10

you can change it with:

# echo zfs_scrub_limit/W0t2 | mdb -kw
zfs_scrub_limit:        0xa = 0x2
# echo zfs_scrub_limit/D | mdb -k
zfs_scrub_limit:
zfs_scrub_limit:        2

In theory, this should help your scenario, but I do not believe this has been exhaustively tested in the lab. Hopefully, it will help. -- richard If I'm reading the code right, it's only used when creating a new vdev (import, zpool create, maybe at boot).. So I took an alternate route: http://pastebin.com/hcYtQcJH (spa_scrub_maxinflight used to be 0x46 (70 decimal) due to 7 devices * zfs_scrub_limit(10) = 70..) With these lower numbers, our pool is much more responsive over NFS.. But taking snapshots is quite bad.. A single recursive snapshot over ~800 filesystems took about 45 minutes, with NFS operations taking 5-10 seconds.. Snapshots usually take 10-30 seconds..

scrub: scrub in progress for 0h40m, 0.10% done, 697h29m to go
scrub: scrub in progress for 1h41m, 2.10% done, 78h35m to go

This is chugging along.. The server is a Fujitsu RX300 with a Quad Xeon 1.6GHz, 6G ram, 8x400G SATA through a U320SCSI-SATA box - Infortrend A08U-G1410, Sol10u8. Should have enough oompf, but when you combine a snapshot with a scrub/resilver, sync performance gets abysmal.. Should probably try adding a ZIL when u9 comes, so we can remove it again if performance goes crap.
/Tomas
Re: [zfs-discuss] Performance drop during scrub?
On 29 April, 2010 - Richard Elling sent me these 2,5K bytes: With these lower numbers, our pool is much more responsive over NFS.. But taking snapshots is quite bad.. A single recursive snapshot over ~800 filesystems took about 45 minutes, with NFS operations taking 5-10 seconds.. Snapshots usually take 10-30 seconds.. scrub: scrub in progress for 0h40m, 0.10% done, 697h29m to go scrub: scrub in progress for 1h41m, 2.10% done, 78h35m to go This is chugging along.. The server is a Fujitsu RX300 with a Quad Xeon 1.6GHz, 6G ram, 8x400G SATA through a U320SCSI-SATA box - Infortrend A08U-G1410, Sol10u8. slow disks == poor performance I know they're not fast, but it shouldn't take 10-30 seconds to create a directory. They do perfectly well in all combinations, except when a scrub comes along (or sometimes when a snapshot feels like taking 45 minutes instead of 4.5 seconds). iostat says the disks aren't 100% busy, the storage box itself doesn't seem to be busy, yet with zfs they go downhill in some conditions.. Should have enough oompf, but when you combine a snapshot with a scrub/resilver, sync performance gets abysmal.. Should probably try adding a ZIL when u9 comes, so we can remove it again if performance goes crap. A separate log will not help. Try faster disks. /Tomas
Re: [zfs-discuss] Question about du and compression
On 29 April, 2010 - Roy Sigurd Karlsbakk sent me these 1,2K bytes: Hi all Is there a good way to do a du that tells me how much data is there in case I want to move it to, say, a USB drive? Most filesystems don't have compression, but we're using it on (most of) our zfs filesystems, and it can be troublesome for someone who wants to copy a set of data somewhere to find it's twice as big as reported by du. GNU du has --apparent-size which reports the file size instead of how much disk space it uses.. compression and sparse files will make this differ, and you can't really tell them apart. /Tomas
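The difference between `du` and `du --apparent-size` is easy to see with a sparse file, which under-reports on disk the same way a compressed ZFS dataset does (a small demonstration; any GNU userland will do):

```shell
# A 10 MiB sparse file: large apparent size, almost no blocks allocated.
tmpdir=$(mktemp -d)
truncate -s 10M "$tmpdir/sparse"

# Apparent (logical) size vs actual disk usage, both in KiB.
apparent=$(du --apparent-size -k "$tmpdir/sparse" | cut -f1)
ondisk=$(du -k "$tmpdir/sparse" | cut -f1)

echo "apparent=${apparent}K on-disk=${ondisk}K"
# The apparent size stays 10240K regardless of how few blocks the
# filesystem actually allocated -- that is the number you want when
# sizing a destination without compression.

rm -r "$tmpdir"
```

For a compressed dataset the relationship is inverted (on-disk smaller than apparent), but `--apparent-size` is still the right number for "how big will this be after copying elsewhere".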
Re: [zfs-discuss] Performance drop during scrub?
On 28 April, 2010 - Eric D. Mudama sent me these 1,6K bytes: On Wed, Apr 28 at 1:34, Tonmaus wrote: Zfs scrub needs to access all written data on all disks and is usually disk-seek or disk I/O bound so it is difficult to keep it from hogging the disk resources. A pool based on mirror devices will behave much more nicely while being scrubbed than one based on RAIDz2. Experience seconded entirely. I'd like to repeat that I think we need more efficient load balancing functions in order to keep housekeeping payload manageable. Detrimental side effects of scrub should not be a decision point for choosing certain hardware or redundancy concepts in my opinion. While there may be some possible optimizations, I'm sure everyone would love the random performance of mirror vdevs, combined with the redundancy of raidz3 and the space of a raidz1. However, as in all systems, there are tradeoffs. To scrub a long lived, full pool, you must read essentially every sector on every component device, and if you're going to do it in the order in which your transactions occurred, it'll wind up devolving to random IO eventually. You can choose to bias your workloads so that foreground IO takes priority over scrub, but then you've got the cases where people complain that their scrub takes too long. There may be knobs for individuals to use, but I don't think overall there's a magic answer. We have one system with a raidz2 of 8 SATA disks.. If we start a scrub, then you can kiss any NFS performance goodbye.. A single mkdir or creating a file can take 30 seconds.. Single write()s can take 5-30 seconds.. Without the scrub, it's perfectly fine. Local performance during scrub is fine. NFS performance becomes useless. This means we can't do a scrub, because doing so will basically disable the NFS service for a day or three. If the scrub were less aggressive and took a week to perform, it would probably not kill the performance as badly..
/Tomas
Re: [zfs-discuss] Mac OS X clients with ZFS server
On 22 April, 2010 - Rich Teer sent me these 1,1K bytes: Hi all, I have a server running SXCE b130 and I use ZFS for all file systems. I also have a couple of workstations running the same OS, and all is well. But I also have a MacBook Pro laptop running Snow Leopard (OS X 10.6.3), and I have troubles creating files on exported ZFS file systems. From the laptop, I can read and write existing files on the exported ZFS file systems just fine, but I can't create new ones. My understanding is that Mac OS makes extensive use of file attributes so I was wondering if this might be the cause of the problem (I know ZFS supports file attributes, but I wonder if I have to utter some magic incantation to get them working properly with Mac OS). I've noticed some issues with copying files to an smb share from Mac OS X clients over the last week.. haven't had time to investigate it fully, but it sure seems EA related.. Copying a file from smb to smb (via the client) works as long as the file hasn't gotten any EA yet.. If I for instance set the 'hide file extension' attribute, then it's not working anymore. Enabling EA on an existing file works, but creating a file with EA doesn't.. So it seems like a Finder bug.. Copying via terminal (and cp) works. At the moment I have a workaround: I use sftp to copy the files from the laptop to the server. But this is a pain in the ass and I'm sure there's a way to make this just work properly! /Tomas
Re: [zfs-discuss] Identifying what zpools are exported
On 21 April, 2010 - Justin Lee Ewing sent me these 0,3K bytes: So I can obviously see what zpools I have imported... but how do I see pools that have been exported? Kind of like being able to see deported volumes using vxdisk -o alldgs list. 'zpool import' /Tomas
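Run with no pool argument, `zpool import` only scans and lists importable (e.g. exported) pools without importing anything:

```shell
# List pools that are available for import but not currently imported.
zpool import

# If the disks live somewhere other than the default device directory,
# point the scan at it explicitly:
zpool import -d /dev/dsk
```

Adding a pool name (`zpool import tank`) would actually perform the import, so the bare form is the vxdisk-style listing the poster asked for.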
Re: [zfs-discuss] Secure delete?
On 12 April, 2010 - Bob Friesenhahn sent me these 0,9K bytes: On Sun, 11 Apr 2010, James Van Artsdalen wrote: OpenSolaris needs support for the TRIM command for SSDs. This command is issued to an SSD to indicate that a block is no longer in use and the SSD may erase it in preparation for future writes. There does not seem to be very much `need' since there are other ways that a SSD can know that a block is no longer in use so it can be erased. In fact, ZFS already uses an algorithm (COW) which is friendly for SSDs. Zfs is designed for high throughput, and TRIM does not seem to improve throughput. Perhaps it is most useful for low-grade devices like USB dongles and compact flash. For flash to overwrite a block, it needs to clear it first.. so yes, by clearing it out in the background (right after the block is freed) instead of just before the timing-critical write(), you can make stuff go faster. /Tomas
Re: [zfs-discuss] Secure delete?
On 12 April, 2010 - David Magda sent me these 0,7K bytes: On Mon, April 12, 2010 10:48, Tomas Ögren wrote: On 12 April, 2010 - Bob Friesenhahn sent me these 0,9K bytes: Zfs is designed for high throughput, and TRIM does not seem to improve throughput. Perhaps it is most useful for low-grade devices like USB dongles and compact flash. For flash to overwrite a block, it needs to clear it first.. so yes, clearing it out in the background (after the block is freed) instead of just before the timing-critical write(), you can make stuff go faster. Except that ZFS does not overwrite blocks because it is copy-on-write. So CoW will enable infinite storage, so you never have to write to the same place again? Cool. /Tomas
Re: [zfs-discuss] L2ARC L2_Size kstat fluctuate
On 09 April, 2010 - Abdullah Al-Dahlawi sent me these 27K bytes: Hi all I ran an OLTP-Filebench workload. I set ARC max size = 2 GB, L2ARC SSD device size = 32 GB, working set (dataset) = 10 GB, 10 files, 1 GB each. After running the workload for 6 hours and monitoring kstat, I have noticed that l2_size from kstat has reached 10 GB, which is great. However, l2_size then started to drop all the way to 7 GB, which means that the workload will go back to the HDD to retrieve some data that is no longer on the L2ARC device. I understand that the L2ARC size reflected by zpool iostat is much larger because of COW, and that l2_size from kstat is the actual size of L2ARC data. So can anyone tell me why I am losing my working set from l2_size actual data!!! Maybe the data in the L2ARC was invalidated, because the original data was rewritten? /Tomas
Re: [zfs-discuss] L2ARC L2_Size kstat fluctuate
On 09 April, 2010 - Abdullah Al-Dahlawi sent me these 5,3K bytes: Hi Tomas I understand from a previous post http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg36914.html that if the data gets invalidated, the L2ARC size that is shown by zpool iostat is the one that changes (always growing because of COW), not the actual size shown by kstat, which represents the size of the up-to-date data in L2ARC. My only conclusion for this fluctuation in kstat l2_size is that the data has indeed been invalidated and did not make it back to L2ARC from the tail of the ARC!!! Am I right? Sounds plausible. /Tomas
Re: [zfs-discuss] compression property not received
On 08 April, 2010 - Cindy Swearingen sent me these 2,6K bytes: Hi Daniel, D'oh... I found a related bug when I looked at this yesterday but I didn't think it was your problem because you didn't get a busy message. See this RFE: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6700597 Solaris 10 'man zfs', under 'receive': -u  File system that is associated with the received stream is not mounted. /Tomas
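A sketch of the `-u` flag mentioned above, with hypothetical dataset and snapshot names: receiving without mounting sidesteps mount-time interference with the received stream's properties.

```shell
# Send a snapshot and receive it unmounted on the destination
# (dataset names are made up for illustration).
zfs send tank/data@snap1 | zfs receive -u backup/data

# Properties such as compression arrive with the stream; inspect them
# before ever mounting the received filesystem.
zfs get compression backup/data
```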
Re: [zfs-discuss] L2ARC Workingset Size
On 08 April, 2010 - Abdullah Al-Dahlawi sent me these 12K bytes: Hi Richard Thanks for your comments. OK, ZFS is COW, I understand, but this also means a waste of valuable space on my L2ARC SSD device; more than 60% of the space is consumed by COW!!! I do not get it? The rest can and will be used if L2ARC needs it. It's not wasted, it's just a number that doesn't match what you think it should be. /Tomas
Re: [zfs-discuss] L2ARC Workingset Size
On 02 April, 2010 - Abdullah Al-Dahlawi sent me these 128K bytes: Hi all I ran a workload that reads/writes within 10 files, each file 256M, i.e. (10 * 256M = 2.5GB total Dataset Size). I have set the ARC max size to 1 GB in the /etc/system file. In the worst case, let us assume that the whole dataset is hot, meaning my working set size = 2.5GB. My SSD flash size = 8GB and is being used for L2ARC. No slog is used in the pool. My file system record size = 8K, meaning 2.5% of 8GB is used for the L2ARC directory in ARC, which ultimately means that available ARC is 1024M - 204.8M = 819.2M Available ARC (Am I Right?) Seems about right. Now the Question ... After running the workload for 75 minutes, I have noticed that the L2ARC device has grown to 6 GB !!! No, 6GB of the area has been touched by Copy on Write; not all of it is in use anymore though. What is in L2ARC beyond my 2.5GB working set?? Something else has been added to L2ARC [ snip lots of data ] This is your last one:

module: zfs  instance: 0  name: arcstats  class: misc
c               1073741824
c_max           1073741824
c_min            134217728
[...]
l2_size         2632226304
l2_write_bytes  6486009344
p                775528448

Roughly 6GB has been written to the device, and slightly less than 2.5GB is actually in use. /Tomas
Re: [zfs-discuss] Change a zpool to raidz
On 12 March, 2010 - Erik Trimble sent me these 0,7K bytes: Ian Garbutt wrote: I was wondering if there is any way of converting a zpool which only has one LUN in it to a raidz zpool that has 3 or more LUNs in it? Thanks No. Adding, removing, or otherwise changing disks in a RAIDZ is not possible without destroying data in the pool. You'll have to copy the data from the single LUN pool somewhere else, destroy the pool, then recreate it as a RAIDZ with the 3 LUNs. What you can do is:
1. Create a new raidz pool with lun2, lun3 and a sparse file the same size as lun2/lun3.
2. Get rid of the file.
3. Copy data over from lun1 (old single lun thing) to the degraded raidz (lun2, lun3, missing file).
4. Destroy the old pool.
5. Replace the missing file with lun1.
With this method, the pool is lacking redundancy between step 4 and 5, but it requires no extra space. /Tomas
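The degraded-raidz migration described above might look like this as commands. LUN names and sizes are hypothetical, and note again that the data has no redundancy between the destroy and the replace:

```shell
# Step 1: sparse file as a stand-in third member (size must match the LUNs).
mkfile -n 100g /tmp/fakelun
zpool create newpool raidz c1t2d0 c1t3d0 /tmp/fakelun

# Step 2: take the placeholder out; the raidz keeps running degraded.
zpool offline newpool /tmp/fakelun
rm /tmp/fakelun

# Step 3: copy the data across, e.g. with send/recv:
#   zfs snapshot -r oldpool@move
#   zfs send -R oldpool@move | zfs receive -d newpool

# Step 4: free up the original LUN.
zpool destroy oldpool

# Step 5: resilver the old LUN in place of the missing file.
zpool replace newpool /tmp/fakelun c1t1d0
```

The resilver in step 5 restores full raidz redundancy.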
Re: [zfs-discuss] full backup == scrub?
On 08 March, 2010 - Chris Banal sent me these 0,8K bytes: Assuming no snapshots, do full backups (i.e. tar or cpio) eliminate the need for a scrub? No, a backup won't read redundant copies of the data, which a scrub will. /Tomas
Re: [zfs-discuss] Snapshot recycle freezes system activity
On 08 March, 2010 - Miles Nordin sent me these 1,8K bytes: gm == Gary Mills mi...@cc.umanitoba.ca writes: gm destroys the oldest snapshots and creates new ones, both gm recursively. I'd be curious if you try taking the same snapshots non-recursively instead, does the pause go away? According to my testing, that would give you a much longer period of slightly slower operation, but a shorter period of per-filesystem really-slowness, given recursive snapshots over lots of independent filesystems. Because recursive snapshots are special: they're supposed to atomically synchronize the cut-point across all the filesystems involved, AIUI. I don't see that recursive destroys should be anything special though. From my experiences on a homedir file server with about 700 filesystems and ~65 snapshots on each, giving about 45k snapshots.. In the beginning, the snapshots took zero time to create.. Now when we have snapshots spanning over a year, it's not as fast. We then turned to only doing daily snapshots (for online backups in addition to regular backups), but they could take up to 45 minutes sometimes, with regular nfs work being abysmal. So we started tuning some stuff, and doing hourly snapshots actually helped (probably keeping some data structures warm in ARC). Down to 2-3 minutes or so for a recursive snapshot. So we tried adding 2x 4GB USB sticks (Kingston Data Traveller Mini Slim) as metadata L2ARC, and that seems to have pushed the snapshot times down to about 30 seconds. http://www.acc.umu.se/~stric/tmp/snaptimes.png y axis is mm:ss, so a value of 450 is 4 minutes, 50 seconds.. not all linear ;) x axis is just snapshot number, higher == newer.. Large spikes are snapshots at the same time as daily backups. In snapshots 67..100 in the picture, I removed the L2ARC USB sticks and the times increased and started fluctuating.. I'll give it a few days and put the L2ARC back.. Even cheap $10 USB sticks can help, it seems.
gm Is it destroying old snapshots or creating new ones that gm causes this dead time? sort of seems like you should tell us this, not the other way around. :) Seriously though, isn't that easy to test? And I'm curious myself too. /Tomas
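The metadata-only L2ARC experiment above boils down to two commands. Pool and device names are hypothetical; the `secondarycache` property controls what the cache devices are allowed to hold:

```shell
# Add two cheap USB sticks as cache (L2ARC) devices to the pool.
zpool add tank cache c5t0d0 c6t0d0

# Restrict the L2ARC to metadata only, so the slow sticks never have to
# serve bulk file data -- just the structures that snapshot creation walks.
zfs set secondarycache=metadata tank

# Watch the cache devices fill up over time.
zpool iostat -v tank 10
```

Because L2ARC is a pure cache, the sticks can be yanked again at any time with `zpool remove` without risking pool data.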
Re: [zfs-discuss] Snapshot recycle freezes system activity
On 08 March, 2010 - Bill Sommerfeld sent me these 0,4K bytes: On 03/08/10 12:43, Tomas Ögren wrote: So we tried adding 2x 4GB USB sticks (Kingston Data Traveller Mini Slim) as metadata L2ARC and that seems to have pushed the snapshot times down to about 30 seconds. Out of curiosity, how much physical memory does this system have?

System Memory:
  Physical RAM:  6134 MB
  Free Memory:    190 MB
  LotsFree:        94 MB
ARC Size:
  Current Size:            1890 MB (arcsize)
  Target Size (Adaptive):  2910 MB (c)
  Min Size (Hard Limit):    638 MB (zfs_arc_min)
  Max Size (Hard Limit):   5110 MB (zfs_arc_max)
ARC Size Breakdown:
  Most Recently Used Cache Size:   67%  1959 MB (p)
  Most Frequently Used Cache Size: 32%   950 MB (c-p)

It does some mail server stuff as well. The two added USB sticks grew to about 3.2GB of metadata L2ARC, with about 6.5M files total on the system. /Tomas
Re: [zfs-discuss] Wildcards to zfs list
On 07 March, 2010 - David Dyer-Bennet sent me these 1,1K bytes: There isn't some syntax I'm missing to use wildcards in zfs list to list snapshots, is there? I find nothing in the man page, and nothing I've tried works (yes, I do understand that normally wildcards are expanded by the shell, and I don't expect bash to have zfs-specific stuff like that in it by default). Given that bash passes through wildcards that don't expand to anything (or you can always force it with quoting), zfs list *could* use those to filter the snapshot list; that would be convenient. In the meantime, I can split what the user enters at the @ and use grep to filter the output from zfs list. zfs list -t snapshot ? Add -o name -H if you only want the names.. (I'm running 2009.06, which is based on snv_111b, so if this capability has appeared since then in some form, I'd really like to know; I'll be updating to the next stable release.) -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info /Tomas
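The suggested snapshot listing plus a grep filter, in one line (dataset and snapshot names are hypothetical):

```shell
# All snapshot names, one per line, no header (-H) -- easy to filter.
zfs list -t snapshot -o name -H

# Poor man's wildcard: everything matching a pattern after the @.
zfs list -t snapshot -o name -H | grep '@daily-2010'
```

This is exactly the split-at-@-and-grep approach the poster describes, just without parsing the full `zfs list` table.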
Re: [zfs-discuss] How to verify ecc for ram is active and enabled?
On 03 March, 2010 - casper@sun.com sent me these 0,8K bytes: Is there a method to view the status of the RAM's ECC single or double bit errors? I would like to confirm that ECC on my Xeon E5520 and ECC RAM are performing their role, since memtest is ambiguous. I am running a memory test on a P6T6 WS, E5520 Xeon, 2GB Samsung ECC modules and this is what is on the screen: Chipset: Core IMC (ECC : Detect / Correct) However, further down ECC is identified as being off. Yet there is a column for ECC Errs. I don't know how to interpret this. Is ECC active or not? Off, but only disabled by memtest, I believe. Memtest doesn't want potential errors to be hidden by ECC, so it disables ECC to see them if they occur. You can enable it in the memtest menu. Casper /Tomas
Re: [zfs-discuss] Who is using ZFS ACL's in production?
On 02 March, 2010 - Carson Gaspar sent me these 0,5K bytes: I strongly suggest that folks who are thinking about this examine what NetApp does when exporting NTFS security model qtrees via NFS. It constructs a mostly bogus set of POSIX permission info based on the ACL. All access is enforced based on the actual ACL. Sadly for NFSv3 clients there is no way to see what the actual ACL is, but it is properly enforced. ZFS recently stopped doing something similar to this (faking POSIX draft ACLs), because it can cause data (ACL) corruption: a client sees a faked ACL over NFS, modifies it and sends it back.. /Tomas
Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory
On 26 February, 2010 - Ronny Egner sent me these 0,6K bytes: Dear All, our storage system running opensolaris b133 + ZFS has a lot of memory for caching. 72 GB total. While testing we observed free memory never falls below 11 GB. Even if we create a ram disk, free memory drops below 11 GB but will be back at 11 GB shortly after (I assume the ARC cache is shrunken in this context). As far as I know ZFS is designed to use all memory except 1 GB for caching http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#arc_init http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#arc_reclaim_needed So you have a max limit which it won't try to go past, but also a "keep this much free for the rest of the system". Both are a bit too protective for a pure ZFS/NFS server in my opinion (but can be tuned). You can check most variables with f.ex: echo freemem/D | mdb -k On one server here, I have in /etc/system:

* http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
* about 7.8*1024*1024*1024, must be < physmem*pagesize (206*4096=8446861312 right now)
set zfs:zfs_arc_max = 835000
set zfs:zfs_arc_meta_limit = 70
* some tuning
set ncsize = 50
set nfs:nrnode = 5

And I've done runtime modifications to swapfs_minfree to force usage of another chunk of memory. /Tomas
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
On 26 February, 2010 - Lutz Schumann sent me these 2,2K bytes: Hello list, ZFS can be used for both file level (zfs) and block level access (zvol). When using zvols, those are always thin provisioned (space is allocated on first write). We use zvols with comstar to do iSCSI and FC access - and excuse me in advance - but this may also be a more comstar related question then. When reading from a freshly created zvol, no data comes from disk. All reads are satisfied by ZFS and comstar returns 0's (I guess) for all reads. Now if a virtual machine writes to the zvol, blocks are allocated on disk. Reads are now partially from disk (for all blocks written) and from the ZFS layer (all unwritten blocks). If the virtual machine (which may be vmware / xen / hyperv) deletes blocks / frees space within the zvol, this also means a write - usually in the metadata area only. Thus the underlying storage system does not know which blocks in a zvol are really used. So reducing size in zvols is really difficult / not possible. Even if one deletes everything in the guest, the blocks stay allocated. If one zeros out all blocks, even more space is allocated. For this purpose TRIM (ATA) / PUNCH (SCSI) have been introduced. With these commands the guest can tell the storage which blocks are not used anymore. Those commands are not available in Comstar today :( However I had the idea that comstar could get the same result the way vmware did it some time ago with vmware tools. Idea: - If the guest writes a block with 0's only, the block is freed again - if someone reads this block again, it will get the same 0's it would get if the 0's had been written - The checksum of an all-0 block can be hard-coded for SHA1 / Fletcher, so the comparison for "is this a 0-only block" is easy. With this in place, a host wishing to free thin provisioned zvol space can fill the unused blocks with 0's easily with simple tools (e.g. dd if=/dev/zero of=/MYFILE bs=1M; rm /MYFILE) and the space is freed again on the zvol side.
Does anyone know why this is not incorporated into ZFS? What you can do until then is to enable compression (like lzjb) on the zvol, then do your dd dance in the client, and then disable the compression again. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
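To make the dd dance concrete, here is a hedged sketch of that workaround; the dataset name (tank/vm1) and guest mount point are examples, not from the thread:

```shell
# On the storage host: temporarily enable cheap compression on the zvol.
zfs set compression=lzjb tank/vm1

# Inside the guest: fill the free space with zeros, then remove the filler.
# With compression on, all-zero blocks are stored as (nearly) nothing, so the
# previously allocated blocks are replaced by holes on the backend.
dd if=/dev/zero of=/MYFILE bs=1M    # runs until the guest filesystem is full
rm /MYFILE
sync

# Back on the storage host: turn compression off again and check the result.
zfs set compression=off tank/vm1
zfs get used,compressratio tank/vm1
```

Note the compression only needs to be on while the zeros are being written; blocks written afterwards are stored uncompressed as before.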
[zfs-discuss] Observations about compressability of metadata L2ARC
Hello. I got an idea.. How about creating a ramdisk, making a pool out of it, then making compressed zvols and adding those as l2arc.. Instant compressed arc ;) So I did some tests with secondarycache=metadata...

                                   capacity     operations    bandwidth
pool                             used  avail   read  write   read  write
ftp                             5.07T  1.78T    198     17  11.3M  1.51M
  raidz2                        1.72T   571G     58      5  3.78M   514K
  ...
  raidz2                        1.64T   656G     75      6  3.78M   524K
  ...
  raidz2                        1.70T   592G     64      5  3.74M   512K
  ...
cache                               -      -      -      -      -      -
  /dev/zvol/dsk/ramcache/ramvol   84.4M  7.62M     4     17  45.4K   233K
  /dev/zvol/dsk/ramcache/ramvol2  84.3M  7.71M     4     17  41.5K   233K
  /dev/zvol/dsk/ramcache/ramvol3    84M     8M     4     18  42.0K   236K
  /dev/zvol/dsk/ramcache/ramvol4  84.8M  7.25M     3     17  39.1K   225K
  /dev/zvol/dsk/ramcache/ramvol5  84.9M  7.08M     3     14  38.0K   193K

NAME              RATIO   COMPRESS
ramcache/ramvol   1.00x   off
ramcache/ramvol2  4.27x   lzjb
ramcache/ramvol3  6.12x   gzip-1
ramcache/ramvol4  6.77x   gzip
ramcache/ramvol5  6.82x   gzip-9

This was after 'find /ftp' had been running for about 1h, along with all the background noise of its regular nfs serving tasks. I took an image of the uncompressed one (ramvol) and ran that through regular gzip and got 12-14x compression, probably due to the smaller block size (default 8k) in the zvols.. So I tried with both 8k and 64k.. After not running that long (but at least filled), I got:

NAME              RATIO    COMPRESS  VOLBLOCK
ramcache/ramvol   1.00x    off       8K
ramcache/ramvol2  5.57x    lzjb      8K
ramcache/ramvol3  7.56x    lzjb      64K
ramcache/ramvol4  7.35x    gzip-1    8K
ramcache/ramvol5  11.68x   gzip-1    64K

Not sure how to measure the cpu usage of the various compression levels for (de)compressing this data.. It does show that having metadata in ram compressed could be a big win though, if you have cpu cycles to spare.. Thoughts? /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se - 070-5858487 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
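For anyone wanting to repeat the experiment, a hedged reconstruction of the setup described above; the exact commands are not in the post, so the ramdisk name and sizes are guesses:

```shell
# Create a ramdisk-backed pool and a compressed zvol on it.
ramdiskadm -a l2ram 512m
zpool create ramcache /dev/ramdisk/l2ram
zfs create -V 100m -o volblocksize=64k -o compression=gzip-1 ramcache/ramvol

# Use the compressed zvol as L2ARC for the data pool, metadata only.
zpool add ftp cache /dev/zvol/dsk/ramcache/ramvol
zfs set secondarycache=metadata ftp

# After some traffic, see how well the cached metadata compressed.
zfs get compressratio ramcache/ramvol
```

The compressratio numbers in the tables above come from exactly this kind of query, one per zvol/compression setting.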
Re: [zfs-discuss] l2arc current usage (population size)
On 21 February, 2010 - Felix Buenemann sent me these 0,7K bytes: Am 20.02.10 03:22, schrieb Tomas Ögren: On 19 February, 2010 - Christo Kutrovsky sent me these 0,5K bytes: How do you tell how much of your l2arc is populated? I've been looking for a while now, can't seem to find it. Must be easy, as this blog entry shows it over time: http://blogs.sun.com/brendan/entry/l2arc_screenshots And follow up, can you tell how much of each data set is in the arc or l2arc? kstat -m zfs (p, c, l2arc_size) arc_stat.pl is good, but doesn't show l2arc.. zpool iostat -v poolname would also do the trick for l2arc. No, it will show how much of the disk has been visited (dirty blocks) but not how much it occupies right now. At least very obvious difference if you add a zvol as cache.. If it had supported TRIM or similar, they would probably be about the same though. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] l2arc current usage (population size)
On 21 February, 2010 - Richard Elling sent me these 1,3K bytes: On Feb 21, 2010, at 9:18 AM, Tomas Ögren wrote: On 21 February, 2010 - Felix Buenemann sent me these 0,7K bytes: Am 20.02.10 03:22, schrieb Tomas Ögren: On 19 February, 2010 - Christo Kutrovsky sent me these 0,5K bytes: How do you tell how much of your l2arc is populated? I've been looking for a while now, can't seem to find it. Must be easy, as this blog entry shows it over time: http://blogs.sun.com/brendan/entry/l2arc_screenshots And follow up, can you tell how much of each data set is in the arc or l2arc? kstat -m zfs (p, c, l2arc_size) arc_stat.pl is good, but doesn't show l2arc.. zpool iostat -v poolname would also do the trick for l2arc. No, it will show how much of the disk has been visited (dirty blocks) but not how much it occupies right now. At least very obvious difference if you add a zvol as cache.. If it had supported TRIM or similar, they would probably be about the same though. Don't confuse the ZIL with L2ARC. TRIM will do little for L2ARC devices. I was mostly thinking about the telling the backing device that block X isn't in use anymore, not the performance part.. If I have an L2ARC backed by a zvol without compression, the used size will grow until it's full, even if L2ARC doesn't use all of it currently. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] l2arc current usage (population size)
On 19 February, 2010 - Christo Kutrovsky sent me these 0,5K bytes: Hello, How do you tell how much of your l2arc is populated? I've been looking for a while now, can't seem to find it. Must be easy, as this blog entry shows it over time: http://blogs.sun.com/brendan/entry/l2arc_screenshots And follow up, can you tell how much of each data set is in the arc or l2arc? kstat -m zfs (p, c, l2arc_size) arc_stat.pl is good, but doesn't show l2arc.. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
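A hedged example of pulling those counters; on S10/OpenSolaris the ARC and L2ARC statistics live in the zfs:0:arcstats kstat:

```shell
# Target size (c), MRU/MFU balance point (p), and current L2ARC payload.
kstat -p zfs:0:arcstats:c zfs:0:arcstats:p zfs:0:arcstats:l2_size

# Or dump the whole ARC state (including l2arc lines) via mdb:
echo ::arc | mdb -k
```

l2_hdr_size in the same kstat shows how much main-memory ARC space is spent tracking the L2ARC contents.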
Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance - napp-it + benchmarks
On 18 February, 2010 - Günther sent me these 1,1K bytes: hello, there is a new beta v. 0.220 of napp-it, the free webgui for nexenta(core) 3. new: - bonnie benchmarks included (see screenshot: http://www.napp-it.org/bench.png) - bug fixes. if you look at the benchmark screenshot: - pool daten: zfs3 of 7 x wd 2TB raid edition (WD2002FYPS), dedup and compress enabled - pool z3ssdcache: zfs3 of 4 sas Seagate 15k/s (ST3146855SS), dedup and compress enabled + ssd read cache (supertalent ultradrive 64GB). i was surprised about the sequential write/rewrite result. the wd 2 TB drives perform very well only in sequential write of characters but are horribly bad in blockwise write/rewrite. the 15k sas drives with ssd read cache perform 20x better (10MB/s - 200 MB/s) Most probably due to lack of ram to hold the dedup tables, which your second version fixes with an l2arc. Try the same test without dedup, or with the same l2arc in both, instead of comparing apples to canoes. download: http://www.napp-it.org howto setup: http://www.napp-it.org/napp-it.pdf gea -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] improve meta data performance
On 18 February, 2010 - Chris Banal sent me these 1,8K bytes: We have a SunFire X4500 running Solaris 10U5 which does about 5-8k nfs ops of which about 90% are meta data. In hind sight it would have been significantly better to use a mirrored configuration but we opted for 4 x (9+2) raidz2 at the time. We can not take the downtime necessary to change the zpool configuration. We need to improve the meta data performance with little to no money. Does anyone have any suggestions? Is there such a thing as a Sun supported NVRAM PCI-X card compatible with the X4500 which can be used as an L2ARC? See if it helps sticking a few cheap USB sticks in there, and set secondarycache=metadata.. For instance Kingston DT Slim Mini are not that bad performers and cost close to nothing. I've got two in a server here, and reading random 4k blocks they do 1500 iops each which is probably more than your current disks. Or if you can stick an Intel X25-M/E in there through SATA/SAS. You can add/remove L2ARCs at will and they don't need to be 100% reliable either, so if you add several of them they will be raid0'd for performance. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
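A hedged sketch of the suggestion above; the pool name and device names are examples:

```shell
# Add cheap flash devices as L2ARC. Cache devices may fail without
# endangering the pool, and reads are spread across all of them.
zpool add tank cache c5t0d0 c6t0d0

# Only push metadata to the cache devices, since metadata is what
# this nfs workload is bound by.
zfs set secondarycache=metadata tank

# Watch the cache devices fill up and start serving reads.
zpool iostat -v tank 5
```

Removal is just as easy (zpool remove tank c5t0d0), so trying this carries little risk.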
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On 03 February, 2010 - Frank Cusack sent me these 0,7K bytes: On February 3, 2010 12:04:07 PM +0200 Henu henrik.he...@tut.fi wrote: Is there a possibility to get a list of changed files between two snapshots? Currently I do this manually, using basic file system functions offered by OS. I scan every byte in every file manually and it ^^^ On February 3, 2010 10:11:01 AM -0500 Ross Walker rswwal...@gmail.com wrote: Not a ZFS method, but you could use rsync with the dry run option to list all changed files between two file systems. That's exactly what the OP is already doing ... rsync by default compares metadata first, and only checks through every byte if you add the -c (checksum) flag. I would say rsync is the best tool here. The find -newer blah suggested in other posts won't catch newer files with an old timestamp (which could happen for various reasons, like being copied with kept timestamps from somewhere else). /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
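A hedged illustration of the rsync dry-run approach; here two plain directories stand in for the snapshots (against real ZFS snapshots you would point rsync at /pool/fs/.zfs/snapshot/NAME/ instead):

```shell
# Two throwaway directories playing the role of "old" and "new" snapshots.
old=$(mktemp -d); new=$(mktemp -d)
echo v1 > "$old/kept";    cp "$old/kept" "$new/kept"
echo v1 > "$old/changed"; echo "v2 but longer" > "$new/changed"
echo v1 > "$old/deleted"              # only exists in the old snapshot

# -a archive mode, -n dry run (report only, change nothing),
# -i itemize each difference, --delete so removals are reported too.
rsync -ani --delete "$new/" "$old/"
# expect an itemized line for "changed" and a "*deleting" line for "deleted"

rm -rf "$old" "$new"
```

Without -c, rsync decides by size and mtime, which is exactly the cheap metadata comparison discussed above; add -c only if you need byte-level certainty.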
Re: [zfs-discuss] ZFS compressed ration inconsistency
On 01 February, 2010 - antst sent me these 0,6K bytes: Probably I'm missing something here, but what I see on my system:

zfs list -o used,ratio,compression,name export/home/user
89.6G  2.86x  gzip-4  export/home/user

cmsmaster ~ # du -hs /export/home/user/
90G     /export/home/user/
cmsmaster ~ # du -hsb /export/home/user/
380781942931    /export/home/user/

89.6G*2.86=256.26G is way too far from 354.63G reported by du. What's wrong? From the GNU du man page:

  -b, --bytes           equivalent to `--apparent-size --block-size=1'
  --apparent-size       print apparent sizes, rather than disk usage;

/Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
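The apparent-vs-allocated distinction is easy to demonstrate with a sparse file, where the two sizes diverge wildly even without compression (GNU du; a hedged stand-in for the poster's dataset):

```shell
f=$(mktemp)
# 10 MiB apparent size, but only the final byte is ever written,
# so almost nothing is allocated on disk.
dd if=/dev/zero of="$f" bs=1 count=1 seek=$((10*1024*1024 - 1)) 2>/dev/null

du -b "$f"   # apparent size: 10485760 bytes (what du -hsb reports)
du -k "$f"   # allocated size: a few KiB (what plain du reports)

rm -f "$f"
```

du -b counts logical bytes; the 89.6G used figure counts allocated bytes, so multiplying it by the compressratio only ever approximates the apparent size.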
Re: [zfs-discuss] ZFS ARC
On 01 February, 2010 - tester sent me these 0,4K bytes: Hi, I have heard references to ARC releasing memory when the demand is high. Can someone please point me to the code path from the point of such a detection to ARC release? http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#arc_reclaim_needed /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Filesystem Quotas
On 20 January, 2010 - Mr. T Doodle sent me these 1,0K bytes: I currently have one filesystem / (root), is it possible to put a quota on let's say /var? Or would I have to move /var to it's own filesystem in the same pool? Only filesystems can have different settings. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool
On 20 January, 2010 - Richard Elling sent me these 2,7K bytes: Hi Lutz, On Jan 20, 2010, at 3:17 AM, Lutz Schumann wrote: Hello, we tested clustering with ZFS and the setup looks like this: - 2 head nodes (nodea, nodeb) - head nodes contain l2arc devices (nodea_l2arc, nodeb_l2arc) This makes me nervous. I suspect this is not in the typical QA test plan. - two external jbods - two mirror zpools (pool1, pool2) - each mirror is a mirror of one disk from each jbod - no ZIL (anyone know a well priced SAS SSD?) We want active/active and added the l2arc to the pools. - pool1 has nodea_l2arc as cache - pool2 has nodeb_l2arc as cache Everything is great so far. One thing to note is that nodea_l2arc and nodeb_l2arc are named identically! (c0t2d0 on both nodes). What we found is that during tests, the pool just picked up the device nodeb_l2arc automatically, although it was never explicitly added to the pool pool1. This is strange. Each vdev is supposed to be uniquely identified by its GUID. This is how ZFS can identify the proper configuration when two pools have the same name. Can you check the GUIDs (using zdb) to see if there is a collision? 
Reproducible:

itchy:/tmp/blah# mkfile 64m disk1
itchy:/tmp/blah# zfs create -V 64m rpool/blahcache
itchy:/tmp/blah# zpool create blah /tmp/blah/disk1
itchy:/tmp/blah# zpool add blah cache /dev/zvol/dsk/rpool/blahcache
itchy:/tmp/blah# zpool status blah
  pool: blah
 state: ONLINE
 scrub: none requested
config:
        NAME                             STATE   READ WRITE CKSUM
        blah                             ONLINE     0     0     0
          /tmp/blah/disk1                ONLINE     0     0     0
        cache
          /dev/zvol/dsk/rpool/blahcache  ONLINE     0     0     0
errors: No known data errors
itchy:/tmp/blah# zpool export blah
itchy:/tmp/blah# zdb -l /dev/zvol/dsk/rpool/blahcache
LABEL 0
    version=15
    state=4
    guid=6931317478877305718
itchy:/tmp/blah# zfs destroy rpool/blahcache
itchy:/tmp/blah# zfs create -V 64m rpool/blahcache
itchy:/tmp/blah# dd if=/dev/zero of=/dev/zvol/dsk/rpool/blahcache bs=1024k count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.559299 seconds, 120 MB/s
itchy:/tmp/blah# zpool import -d /tmp/blah
  pool: blah
    id: 16691059548146709374
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
        blah                             ONLINE
          /tmp/blah/disk1                ONLINE
        cache
          /dev/zvol/dsk/rpool/blahcache
itchy:/tmp/blah# zdb -l /dev/zvol/dsk/rpool/blahcache
LABEL 0
LABEL 1
LABEL 2
LABEL 3
itchy:/tmp/blah# zpool import -d /tmp/blah blah
itchy:/tmp/blah# zpool status
  pool: blah
 state: ONLINE
 scrub: none requested
config:
        NAME                             STATE   READ WRITE CKSUM
        blah                             ONLINE     0     0     0
          /tmp/blah/disk1                ONLINE     0     0     0
        cache
          /dev/zvol/dsk/rpool/blahcache  ONLINE     0     0     0
errors: No known data errors
itchy:/tmp/blah# zdb -l /dev/zvol/dsk/rpool/blahcache
LABEL 0
    version=15
    state=4
    guid=6931317478877305718
...

It did indeed overwrite my formerly clean blahcache. Smells like a serious bug. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se -- richard We had a setup stage when pool1 was configured on nodea with nodea_l2arc and pool2 was configured on nodeb without an l2arc. Then we did a failover. 
Then pool1 picked up the (until then) unconfigured nodeb_l2arc. Is this intended? Why is an L2ARC device automatically picked up if the device name is the same? In a later stage we had both pools configured with the corresponding l2arc device (po...@nodea with nodea_l2arc and po...@nodeb with nodeb_l2arc). Then we also did a failover. The l2arc device of the pool failing over was marked as too many corruptions instead of missing. So from these tests it looks like ZFS just picks up the device with the same name and replaces the l2arc without looking at the device signatures to only consider devices being part of a pool. We have not tested with a data disk as c0t2d0 but if the same behaviour
Re: [zfs-discuss] need a few suggestions for a poor man's ZIL/SLOG device
On 06 January, 2010 - Thomas Burgess sent me these 5,8K bytes: I think the confusing part is that the 64gb version seems to use a different controller altogether It does. I couldn't find any SNV125-S2/40's in stock so I got 3 SNV125-S2/64's thinking it would be the same, only bigger. Looks like it was stupid on my part. Now I understand why I got such a good deal. Well, I have yet to try them... maybe they won't be so bad... on newegg they get a lot of good ratings. Either way I doubt using them for the rpool will hurt me... just a little more expensive than the compact flash cards I was going to get. I've ordered a 40G which should be coming in a week or so, I'll do some ZIL/L2ARC testing with it and report back. Random 4k writes seem to be quite alright: http://benchmarkreviews.com/index.php?option=com_content&task=view&id=392&Itemid=60&limit=1&limitstart=6 /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file expiration date/time
On 30 December, 2009 - Dennis Yurichev sent me these 0,7K bytes: Hi. Why can't each file also have an expiration date/time field, i.e. a date/time when the operating system will delete it automatically? This could be usable for backups, camera raw files, internet browser cached files, etc. Using extended attributes + cron, you could provide the same service yourself, and other similar (or not) things people would like to do, without the developers providing it for you in the fs.. Start at 'man fsattr' /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
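A hedged sketch of how that could look with Solaris extended attributes via runat(1); the attribute name "expires", the date format, and the paths are all my invention:

```shell
# Tag a file with an expiry date, stored as a named extended attribute.
echo 2010-06-30 | runat /export/scratch/shot0001.raw 'cat > expires'

# Read the tag back.
runat /export/scratch/shot0001.raw cat expires

# Nightly cron job: delete any tagged file whose date has passed.
today=$(date +%Y-%m-%d)
find /export/scratch -type f | while read -r f; do
    exp=$(runat "$f" cat expires 2>/dev/null) || continue   # untagged: skip
    [ "$exp" \< "$today" ] && rm -f "$f"
done
```

The ISO date format makes the string comparison double as a date comparison; untagged files are simply left alone.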
Re: [zfs-discuss] FW: ARC not using all available RAM?
On 21 December, 2009 - Tristan Ball sent me these 4,5K bytes: Richard Elling wrote: On Dec 20, 2009, at 12:25 PM, Tristan Ball wrote: I've got an opensolaris snv_118 machine that does nothing except serve up NFS and ISCSI. The machine has 8G of ram, and I've got an 80G SSD as L2ARC. The ARC on this machine is currently sitting at around 2G, the kernel is using around 5G, and I've got about 1G free. ... What I'm trying to find out is is my ARC relatively small because... 1) ZFS has decided that that's all it needs (the workload is fairly random), and that adding more wont gain me anything.. 2) The system is using so much ram for tracking the L2ARC, that the ARC is being shrunk (we've got an 8K record size) 3) There's some other memory pressure on the system that I'm not aware of that is periodically chewing up then freeing the ram. 4) There's some other memory management feature that's insisting on that 1G free. My bet is on #4 ... http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#arc_reclaim_needed See line 1956 .. I tried some tuning on a pure nfs server (although s10u8) here, and got it to use a bit more of the last 1GB out of 8G.. I think it was swapfs_minfree that I poked with a sharp stick. No idea if anything else that relies on it could break, but the machine has been fine for a few weeks here now and using more memory for ARC.. ;) /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
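For reference, a hedged sketch of the sharp-stick poke; the value below is an example, not what I used, and lowering swapfs_minfree shrinks the VM system's reserve, so tread carefully:

```shell
echo "swapfs_minfree/E" | mdb -k          # print the current value, in pages
echo "swapfs_minfree/Z0t4096" | mdb -kw   # example: 4096 pages = 16 MB at 4 KB/page

# persistent equivalent, in /etc/system:
#   set swapfs_minfree=4096
```

As with the zil_disable pokes elsewhere in this thread, an mdb write takes effect immediately but does not survive a reboot; only the /etc/system line is persistent.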
Re: [zfs-discuss] zfs allow - internal error
On 09 December, 2009 - Andrew Robert Nicols sent me these 1,6K bytes: I've just done a fresh install of Solaris 10 u8 (2009.10) onto a Thumper. Running zfs allow gives the following delightful output: -bash-3.00$ zfs allow internal error: /usr/lib/zfs/pyzfs.py not found I've confirmed it on a second thumper, also running Solaris 10 u8 installed about 2 months ago. Has anyone else seen this? Yes. You haven't got SUNWPython installed, which is wrongly marked as belonging to the GNOME2 cluster. Install SUNWPython and SUNWPython-share and it'll work. Some ZFS stuff (userspace, allow, ..) started using python in u8. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
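A hedged sketch of the fix; the Product path depends on how your install media is laid out:

```shell
pkginfo SUNWPython SUNWPython-share     # confirm they are missing
pkgadd -d /cdrom/cdrom0/Solaris_10/Product SUNWPython SUNWPython-share

# zfs allow on a dataset should now print delegations
# instead of the pyzfs.py "internal error".
zfs allow rpool
```

The same missing packages break zfs userspace, so checking both after the install is worthwhile.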
[zfs-discuss] ARC Ghost lists, why have them and how much ram is used to keep track of them? [long]
Hello. We have a file server running S10u8 which is a disk backend to a caching ftp/http frontend cluster (homebrew) which currently has about 4.4TB of data, which obviously doesn't fit in the 8GB of ram the machine has. arc_summary currently says:

System Memory:
     Physical RAM:  8055 MB
     Free Memory :  1141 MB
     LotsFree:       124 MB

ARC Size:
     Current Size:             3457 MB (arcsize)
     Target Size (Adaptive):   3448 MB (c)
     Min Size (Hard Limit):     878 MB (zfs_arc_min)
     Max Size (Hard Limit):    7031 MB (zfs_arc_max)

ARC Size Breakdown:
     Most Recently Used Cache Size:   93%  3231 MB (p)
     Most Frequently Used Cache Size:  6%   217 MB (c-p)
...
CACHE HITS BY CACHE LIST:
     Anon:                        3%   377273490             [ New Customer, First Cache Hit ]
     Most Recently Used:          9%  1005243026 (mru)       [ Return Customer ]
     Most Frequently Used:       81%  9113681221 (mfu)       [ Frequent Customer ]
     Most Recently Used Ghost:    2%   284232070 (mru_ghost) [ Return Customer Evicted, Now Back ]
     Most Frequently Used Ghost:  3%   361458550 (mfu_ghost) [ Frequent Customer Evicted, Now Back ]

And some info from echo ::arc | mdb -k:

     arc_meta_used  = 2863 MB
     arc_meta_limit = 3774 MB
     arc_meta_max   = 4343 MB

Now to the questions.. As I've understood it, ARC keeps a list of newly evicted data from the ARC in the ghost lists, for example to be used for L2ARC (or?). In mdb -k:

ARC_mfu_ghost::print
...
    arcs_lsize = [ 0x2341ca00, 0x4b61d200 ]
    arcs_size = 0x6ea39c00
...
ARC_mru_ghost::print
    arcs_lsize = [ 0x65646400, 0xd24e00 ]
    arcs_size = 0x6636b200
ARC_mru::print
    arcs_lsize = [ 0x2b9ae600, 0x38646e00 ]
    arcs_size = 0x758ae800
ARC_mfu::print
    arcs_lsize = [ 0, 0x4d200 ]
    arcs_size = 0x1043a000

Does this mean that currently, 1770MB+1635MB is wasted just for statistics, and 1880+260MB is used for actual cached data, or do these numbers just refer to how much data they keep stats for? So basically, what is the point of the ghost lists and how much ram are they actually using? 
Also, since this machine just has 2 purposes in life - sharing data over nfs and taking backups of the same data, I'd like to get those 1141MB of free memory to be actually used.. Can I set zfs_arc_max (can't find any runtime tunable, only /etc/system one, right?) to 8GB. If it runs out of memory, it'll set no_grow and shrink a little, right? Currently, data can use all of ARC if it wants, but metadata can use a maximum of $arc_meta_max. Since there's no chance of caching all of the data, but there's a high chance of caching a large proportion of the metadata, I'd like reverse limits; limit data size to 1GB or so (due to buffers currently being handled, setting primarycache=metadata will give crap performance in my testing) and let metadata take as much as it'd like.. Is there a chance of getting something like this? /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
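There is no "cap data, free metadata" knob today; the closest existing controls are /etc/system settings that bound total ARC and metadata, not data. Hedged example values for an 8 GB machine (note /etc/system comments start with *):

```
* /etc/system fragment - example values, takes effect at boot
set zfs:zfs_arc_max = 0x1C0000000          * cap total ARC at 7 GB
set zfs:zfs_arc_meta_limit = 0x140000000   * allow metadata up to 5 GB of that
```

Raising arc_meta_limit while leaving arc_max high at least lets metadata win the competition more often, which is the direction this workload wants.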
Re: [zfs-discuss] rquota didnot show userquota (Solaris 10)
On 26 November, 2009 - Willi Burmeister sent me these 1,7K bytes: Hi, we have a new fileserver running on X4275 hardware with Solaris 10U8. On this fileserver we created one test dir with quota and mounted it on another Solaris 10 system. Here the quota command did not show the used quota. Does this feature only work with OpenSolaris, or is it intended to work on Solaris 10? ZFS userspace quota doesn't support rquotad reporting. (.. yet?) /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS user quota, userused updates?
On 20 October, 2009 - Matthew Ahrens sent me these 2,2K bytes: The user/group used can be out of date by a few seconds, same as the used and referenced properties. You can run sync(1M) to wait for these values to be updated. However, that doesn't seem to be the problem you are encountering here. Can you send me the output of: zfs list zpool1/sd01_mail zfs get all zpool1/sd01_mail zfs userspace -t all zpool1/sd01_mail ls -ls /export/sd01/mail zdb -vvv zpool1/sd01_mail On a related note, there is a way to still have quota used even after all files are removed, S10u8/SPARC: # zfs create rpool/quotatest # zfs set userqu...@stric=5m rpool/quotatest # zfs userspace -t all rpool/quotatest TYPE NAME USED QUOTA POSIX Group root 3K none POSIX User root 3K none POSIX User stric 0 5M # chmod a+rwt /rpool/quotatest stric% cd /rpool/quotatest;tar jxvf /somewhere/gimp-2.2.10.tar.bz2 ... wait and it will start getting Disc quota exceeded, might have to help it by running 'sync' in another terminal stric% sync stric% rm -rf gimp-2.2.10 stric% sync ... now it's all empty.. but... # zfs userspace -t all rpool/quotatest TYPE NAME USED QUOTA POSIX Group root 3K none POSIX Group tdb 3K none POSIX User root 3K none POSIX User stric3K 5M Can be repeated for even more lost blocks, I seem to get between 3 and 5 kB each time. I tried this last night, and when I got back in the morning, it had gone down to zero again. Haven't done any more verifying than that. It doesn't seem to trigger if I just write a big file with dd which gets me into DQE, but unpacking a tarball seems to trigger it. My tests has been as above. 
Output from all of the above + zfs list, zfs get all, zfs userspace, ls -l and zdb -vvv is at: http://www.acc.umu.se/~stric/tmp/zfs-userquota.txt /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS user quota, userused updates?
On 20 October, 2009 - Matthew Ahrens sent me these 0,7K bytes: Tomas Ögren wrote: On a related note, there is a way to still have quota used even after all files are removed, S10u8/SPARC: In this case there are two directories that have not actually been removed. They have been removed from the namespace, but they are still open, eg due to some process's working directory being in them. Only a few processes in total were involved in this dir.. cd into the fs, untar the tarball, remove it all, cd out, run sync. Quota usage still remains. This is confirmed by your zdb output, there are 2 directories on the delete queue. You can force it to be flushed by unmounting and re-mounting your filesystem. .. which isn't such a good workaround for a busy home directory server which I will use this in shortly... I have to say a big thank you for this userquota anyway, because I tried the one fs per user way first, and it just didn't scale to our 3-4000 users, but I still want to use ZFS. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
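The remount workaround from the thread, spelled out against the quotatest dataset from the earlier example:

```shell
# Unmount/remount forces the delete queue to be flushed.
zfs umount rpool/quotatest
zfs mount rpool/quotatest

# The leaked 3-5 kB per user should be gone now.
zfs userspace -t all rpool/quotatest
```

On a busy home directory server the umount will fail while files are open, which is exactly why this is a poor workaround.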
[zfs-discuss] Interesting bug with picking labels when expanding a slice where a pool lives
Hi. We've got some test machines which amongst others has zpools in various sizes and placements scribbled all over the disks. 0. HP DL380G3, Solaris10u8, 2x16G disks; c1t0d0 c1t1d0 1. Took a (non-emptied) disk, created a 2GB slice0 and a ~14GB (to the last cyl) slice7. 2. zpool create striclek c1t1d0s0 3. zdb -l /dev/rdsk/c1t1d0s0 shows 4 labels, each with the same guid and only c1t1d0s0 as vdev. All is well. 4. format, increase slice0 from 2G to 16G. remove slice7. label. 5. zdb -l /dev/rdsk/c1t1d0s0 shows 2 labels from the correct guid c1t1d0s0, it also shows 2 labels from some old guid (from an rpool which was abandoned long ago) belonging to a mirror(c1t0d0s0,c1t1d0s0). c1t0d0s0 is current boot disk with other rpool and other guid. 6. zpool export striclek;zpool import shows guid from the working pool, but that it's missing devices (although only lists c1t1d0s0 - ONLINE) 7. zpool import striclek doesn't work. zpool import theworkingguid doesn't work. If I resize the slice back to 2GB, all 4 labels shows the workingguid and import works again. Questions: * Why does 'zpool import' show the guid from label 0/1, but wants vdev conf as specified by label 2/3? * Is there no timestamp or such, so it would prefer label 0/1 as they are brand new and ignore label 2/3 which are waaay old. I can agree to being forced to scribble zeroes/junk all over the slice7 space which we're expanding to in step 4.. But stuff shouldn't fail this way IMO.. Maybe comparing timestamps and see that label 2/3 aren't so hot anymore and ignore them, or something.. zdb -l and zpool import dumps at: http://www.acc.umu.se/~stric/tmp/zdb-dump/ /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
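Until the label-vintage issue is addressed, a hedged sketch of the scribble-zeroes workaround conceded above; device names are from the example, and the dd is destructive, so double-check the target:

```shell
# Before growing slice 0: wipe the region that used to be slice 7 so no
# stale labels survive there (this destroys whatever lived on s7!).
dd if=/dev/zero of=/dev/rdsk/c1t1d0s7 bs=1024k

# After resizing s0 over that space, verify all four labels now carry
# the working guid, then import.
zdb -l /dev/rdsk/c1t1d0s0
zpool import striclek
```

The same precaution applies to the SAN case: zero the newly grown part of the LUN before letting ZFS see it.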
Re: [zfs-discuss] Interesting bug with picking labels when expanding a slice where a pool lives
On 19 October, 2009 - Cindy Swearingen sent me these 2,4K bytes: Hi Tomas, I think you are saying that you are testing what happens when you increase a slice under a live ZFS storage pool and then reviewing the zdb output of the disk labels. Increasing a slice under a live ZFS storage pool isn't supported and might break your pool. It also happens on a non-live pool, that is, if I export, increase the slice and then try to import.

r...@ramses:~# zpool export striclek
r...@ramses:~# format
Searching for disks...done
... increase c1t1d0s0
r...@ramses:~# zpool import striclek
cannot import 'striclek': one or more devices is currently unavailable

.. which is the way to increase a pool within a disk/device if I'm not mistaken.. Like if the storage comes off a SAN and you resize the LUN.. I think you are seeing some remnants of some old pools on your slices with zdb since this is how zpool import is able to import pools that have been destroyed. Yep, that's exactly what I see. The issue is that the new, good labels aren't trusted anymore (it also looks at old ones) and also that zpool import picks information from different labels and presents it as one piece of info. If I was using some SAN and my lun got increased, and the new storage space had some old scrap data on it, I could get hit by the same issue. Maybe I missed the point. Let me know. Cindy On 10/19/09 12:41, Tomas Ögren wrote: Hi. We've got some test machines which amongst others has zpools in various sizes and placements scribbled all over the disks. 0. HP DL380G3, Solaris10u8, 2x16G disks; c1t0d0 c1t1d0 1. Took a (non-emptied) disk, created a 2GB slice0 and a ~14GB (to the last cyl) slice7. 2. zpool create striclek c1t1d0s0 3. zdb -l /dev/rdsk/c1t1d0s0 shows 4 labels, each with the same guid and only c1t1d0s0 as vdev. All is well. 4. format, increase slice0 from 2G to 16G. remove slice7. label. 5. 
zdb -l /dev/rdsk/c1t1d0s0 shows 2 labels from the correct guid c1t1d0s0, it also shows 2 labels from some old guid (from an rpool which was abandoned long ago) belonging to a mirror(c1t0d0s0,c1t1d0s0). c1t0d0s0 is current boot disk with other rpool and other guid. 6. zpool export striclek;zpool import shows guid from the working pool, but that it's missing devices (although only lists c1t1d0s0 - ONLINE) 7. zpool import striclek doesn't work. zpool import theworkingguid doesn't work. If I resize the slice back to 2GB, all 4 labels shows the workingguid and import works again. Questions: * Why does 'zpool import' show the guid from label 0/1, but wants vdev conf as specified by label 2/3? * Is there no timestamp or such, so it would prefer label 0/1 as they are brand new and ignore label 2/3 which are waaay old. I can agree to being forced to scribble zeroes/junk all over the slice7 space which we're expanding to in step 4.. But stuff shouldn't fail this way IMO.. Maybe comparing timestamps and see that label 2/3 aren't so hot anymore and ignore them, or something.. zdb -l and zpool import dumps at: http://www.acc.umu.se/~stric/tmp/zdb-dump/ /Tomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Strange problem with liveupgrade on zfs (10u7 and u8)
On 14 October, 2009 - Brian sent me these 4,3K bytes:

I am having a strange problem with liveupgrade of a ZFS boot environment. I found a similar discussion on zones-discuss, but this happens for me on installs both with and without zones, so I do not think it is related to zones. I have been able to reproduce this on both sparc (ldom) and x86 (physical). I was originally trying to luupgrade to u8, but this is easily reproducible with 3 simple steps: lucreate, luactivate, reboot.

...

[b]lucreate -n sol10alt[/b]

Noticed the following warning during lucreate:
WARNING: split filesystem / file system type zfs cannot inherit mount point options - from parent filesystem / file type - because the two file systems have different types.

Got the same warning and the same end result; I was planning on filing it with Sun yesterday but haven't had time to do that yet. I got it on sparc (physical) too. I didn't install LU from the u8 iso, but it was patched with the latest LU patches through PCA.

[b]luactivate sol10alt[/b]

If you lumount, comment out those rpool/ROOT/ thingies, then luumount here, it'll work too.

[b]/usr/sbin/shutdown -g0 -i6 -y[/b]

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
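A sketch of that lumount workaround. I'm assuming the stray rpool/ROOT/ entries end up in the new BE's /etc/vfstab (the sample vfstab lines below are made up); on the real system you'd lumount sol10alt, fix its vfstab, then luumount before activating:

```shell
# Comment out any rpool/ROOT/ lines, leaving everything else alone.
# The here-doc stands in for <mountpoint>/etc/vfstab after lumount.
cat <<'EOF' | sed 's|^rpool/ROOT/|#&|'
rpool/ROOT/sol10alt	-	/	zfs	-	no	-
/dev/dsk/c1t0d0s1	-	-	swap	-	no	-
EOF
```

The `&` in the sed replacement re-inserts the matched text, so the line is commented rather than rewritten.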
Re: [zfs-discuss] Terrible ZFS performance on a Dell 1850 w/ PERC 4e/Si (Sol10U6)
On 09 October, 2009 - Brandon Hume sent me these 2,0K bytes:

I've got a mail machine here that I built using ZFS boot/root. It's been having some major I/O performance problems, which I posted about once before... but that post seems to have disappeared. Now I've managed to obtain another identical machine, and I've built it the same way as the original. Running Solaris 10 U6, fully patched as of 2009/10/06. It's using a mirrored disk via the PERC (LSI MegaRAID) controller.

The main problem seems to be ZFS. If I do the following on a UFS filesystem:

# /usr/bin/time dd if=/dev/zero of=whee.bin bs=1024000 count=x

... then I get real times of the following:

  x    time
  128  35.4
  256  1:01.8
  512  2:19.8

Is this minutes:seconds.millisecs? If so, you're looking at 3-4MB/s.. I would say something is wrong.

It's all very linear and fairly decent.

Decent?!

However, if I then destroy that filesystem and recreate it using ZFS (no special options or kernel variables set), performance degrades substantially. With the same dd, I get:

  x    time
  128  3:45.3
  256  6:52.7
  512  15:40.4

0.5MB/s .. that's floppy speed :P

So basically a 6.5x loss across the board. I realize that a simple 'dd' is an extremely weak test, but real-world use on these machines shows similar problems... long delays logging in, and running a command that isn't cached can take 20-30 seconds (even something as simple as 'psrinfo -vp'). Ironically, the machine works just fine for simple email, because the files are small and very transient and thus can exist quite easily just in memory. But more complex things, like a local copy of our mailmaps, cripple the machine.

.. because something is messed up, and for some reason ZFS seems to suffer from it worse than UFS does..

I'm about to rebuild the machine with the RAID controller in passthrough mode, and I'll see what that accomplishes. Most of the machines here are Linux and use the hardware RAID1, so I was/am hesitant to break standard that way.
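For reference, those real times convert to throughput as count x bs = count x 1024000 bytes divided by elapsed seconds:

```shell
# Throughput of each dd run, in decimal MB/s (times converted to seconds).
for run in "ufs 128 35.4" "ufs 256 61.8" "ufs 512 139.8" \
           "zfs 128 225.3" "zfs 256 412.7" "zfs 512 940.4"; do
  set -- $run
  awk -v fs="$1" -v n="$2" -v t="$3" \
    'BEGIN { printf "%s count=%d: %.1f MB/s\n", fs, n, n * 1024000 / t / 1e6 }'
done
```

That works out to roughly 3.7-4.2 MB/s on UFS and about 0.6 MB/s on ZFS, matching the 3-4MB/s and ~0.5MB/s figures above.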
Does anyone have any experience or suggestions for trying to make ZFS boot+root work well on this machine?

Check for instance 'iostat -xnzmp 1' while doing this and see if any disk is behaving badly, high service times etc.. Even your speedy 3-4MB/s is nowhere close to what you should be getting, unless you've connected a bunch of floppy drives to your PERC..

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
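One way to eyeball that iostat output is to flag devices with a high average service time; asvc_t is the 8th column of 'iostat -xn' output. The sample lines below are made up for illustration, not real output from the machine in question:

```shell
# Flag devices whose asvc_t exceeds 100 ms. The header line has the text
# "asvc_t" in column 8, which compares as 0 numerically, so it is skipped.
cat <<'EOF' | awk '$8 + 0 > 100 { print $11, "asvc_t=" $8 " ms" }'
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.0   50.0    8.0  400.0  0.0  2.0    0.0  250.3   0  99 c1t0d0
    0.5    1.0    4.0    8.0  0.0  0.0    0.0    5.1   0   1 c1t1d0
EOF
```

On healthy local disks, asvc_t should normally be in the single-digit-to-tens-of-ms range; hundreds of ms sustained points at the controller or a dying disk.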
Re: [zfs-discuss] Best way to convert checksums
On 02 October, 2009 - Ray Clark sent me these 4,4K bytes:

Data security. I migrated my organization from Linux to Solaris, driven away from Linux by the shortfalls of fsck on TB-size file systems, and towards Solaris by the features of ZFS. [...] Before taking rather disruptive actions to correct this, I decided to question my original decision and found schlie's post stating that a bug in fletcher2 makes it essentially a one-bit parity on the entire block: http://opensolaris.org/jive/thread.jspa?threadID=69655&tstart=30 While this is twice as good as any other file system in the world that has NO such checksum, this does not provide the security I migrated for. Especially given that I did not know what caused the original data loss, it is all I have to lean on.

...

That post refers to bug 6740597
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6740597
which also refers to
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=2178540

So it seems like it's fixed in snv_114 and s10u8, which won't help your s10u4 unless you update..

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
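For context on why the fletcher2 bug matters: the fletcher checksums are running sums over the input words, so every accumulator has to be updated correctly for the position mixing to work. The fletcher4 variant updates four accumulators per 32-bit word, roughly like this (toy input words 1, 2, 3; the real implementation runs 64-bit modular arithmetic over the whole block):

```shell
# Fletcher4-style accumulator updates: a is a plain sum, while b, c and d
# each accumulate the previous accumulator, mixing in word positions.
printf '1\n2\n3\n' | awk '
  { a += $1; b += a; c += b; d += c }
  END { printf "a=%d b=%d c=%d d=%d\n", a, b, c, d }'
```

For the words 1, 2, 3 this prints a=6 b=10 c=15 d=21; unlike a plain sum, reordering the words changes b, c and d, which is what gives the checksum its strength when all the accumulators are computed correctly.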
Re: [zfs-discuss] Solaris License with ZFS USER quotas?
On 28 September, 2009 - Jorgen Lundman sent me these 1,7K bytes:

Hello list,

We are unfortunately still experiencing some issues regarding our support license with Sun, or rather our Sun vendor. We need ZFS user quotas (that's not the ZFS file-system quota), which first appeared in snv_114. We would like to run something like snv_117 (we don't really care which version per se; that is just the version we have done the most testing with). But our vendor will only support Solaris 10. After weeks of wrangling, they have reluctantly agreed to let us run OpenSolaris 2009.06 (which does not have ZFS user quotas). When I approach Sun-Japan directly, I just get told that they don't speak English. When my Japanese colleagues approach Sun-Japan directly, it is suggested to us that we stay with our current vendor.

* Will there be official Solaris 10 or OpenSolaris releases with ZFS user quotas? (Will 2010.02 contain ZFS user quotas?)

http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html which is in no way official, says it'll be in 10u8, which should be coming within a month.

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se