Re: [zfs-discuss] Pool vdev imbalance

2010-02-28 Thread Andrew Gabriel

Ian Collins wrote:
I was running zpool iostat on a pool comprising a stripe of raidz2 
vdevs that appears to be writing slowly and I notice a considerable 
imbalance of both free space and write operations.  The pool is 
currently feeding a tape backup while receiving a large filesystem.


Is this imbalance normal?  I would expect a more even distribution as 
the pool configuration hasn't been changed since creation.


The second and third ones are pretty much full, with the others having 
well over 10 times more free space, so I wouldn't expect many writes to 
the full ones.


Have the others ever been in a degraded state? That might explain why 
the fill level has become unbalanced.




The system is running Solaris 10 update 7.

                 capacity     operations    bandwidth
pool            used  avail   read  write   read  write
-------------  -----  -----  -----  -----  -----  -----
tank           15.9T  2.19T     87    119  2.34M  1.88M
  raidz2       2.90T   740G     24     27   762K  95.5K
    c0t1d0         -      -     14     13   273K  18.6K
    c1t1d0         -      -     15     13   263K  18.3K
    c4t2d0         -      -     17     14   288K  18.2K
    spare          -      -     17     20   104K  17.2K
      c5t2d0       -      -     16     13   277K  17.6K
      c7t5d0       -      -      0     14      0  17.6K
    c6t3d0         -      -     15     12   242K  18.7K
    c7t3d0         -      -     15     12   242K  17.6K
    c6t4d0         -      -     16     12   272K  18.1K
    c1t0d0         -      -     15     13   275K  16.8K
  raidz2       3.59T  37.8G     20      0   546K      0
    c0t2d0         -      -     11      0   184K    361
    c1t3d0         -      -     10      0   182K    361
    c4t5d0         -      -     14      0   237K    361
    c5t5d0         -      -     13      0   220K    361
    c6t6d0         -      -     12      0   155K    361
    c7t6d0         -      -     11      0   149K    361
    c7t4d0         -      -     14      0   219K    361
    c4t0d0         -      -     14      0   213K    361
  raidz2       3.58T  44.1G     27      0  1.01M      0
    c0t5d0         -      -     16      0   290K    361
    c1t6d0         -      -     15      0   301K    361
    c4t7d0         -      -     20      0   375K    361
    c5t1d0         -      -     19      0   374K    361
    c6t7d0         -      -     17      0   285K    361
    c7t7d0         -      -     15      0   253K    361
    c0t0d0         -      -     18      0   328K    361
    c6t0d0         -      -     18      0   348K    361
  raidz2       3.05T   587G      7     47  24.9K  1.07M
    c0t4d0         -      -      3     21   254K   187K
    c1t2d0         -      -      3     22   254K   187K
    c4t3d0         -      -      5     22   350K   187K
    c5t3d0         -      -      5     21   350K   186K
    c6t2d0         -      -      4     22   265K   187K
    c7t1d0         -      -      4     21   271K   187K
    c6t1d0         -      -      5     22   345K   186K
    c4t1d0         -      -      5     24   333K   184K
  raidz2       2.81T   835G      8     45  30.9K   733K
    c0t3d0         -      -      5     16   339K   126K
    c1t5d0         -      -      5     16   333K   126K
    c4t6d0         -      -      6     16   441K   127K
    c5t6d0         -      -      6     17   435K   126K
    c6t5d0         -      -      4     18   294K   126K
    c7t2d0         -      -      4     18   282K   124K
    c0t6d0         -      -      7     19   446K   124K
    c5t7d0         -      -      7     21   452K   122K
-------------  -----  -----  -----  -----  -----  -----

--
Andrew

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] device mixed-up while trying to import.

2010-02-28 Thread Yariv Graf
Hi,
Thanks for the reply.
I can arrange for the lost SSD, but I already formatted it.
Second, even though the external HDD is, for instance, /dev/rdsk/c16t0d0, when I 
try to debug using zdb it shows me another “path”:
path='/dev/dsk/c11t0d0s0'
devid='id1,s...@tst31500341as2ger66y7/a'

phys_path='/p...@0,0/pci8086,2...@1e/pci1458,1...@6/u...@00203702003490ab/d...@0,0:a'

Is there a way to fix it?

Regards
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Clear vdev information from disk

2010-02-28 Thread Lutz Schumann
Hello list, 

it is damn difficult to destroy ZFS labels :) 

I try to remove the vdev labels of disks previously used in a pool. According to 
http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/ondiskformat0822.pdf
 I created a script that removes the first 512 KB and the last 512 KB, however 
I always miss the last labels .. 


LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

version=14
name='pool1'
state=1
txg=26


LABEL 3

version=14
name='pool1'
state=1


How can I calculate or determine the location of the superblocks described in 
the document above? Has that changed over the versions of ZFS? (I know that 
vdevs no longer need to be exactly the same size in later OpenSolaris 
releases, so maybe something has changed)

Any hints appreciated .. 

p.s. Clearing the whole disk is troublesome, because those are a bunch of 1 TB 
disks and deletion should be fast.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] slow zfs scrub?

2010-02-28 Thread Roy Sigurd Karlsbakk
hi all

I have a server running snv_131 and scrubs are very slow. I have a cron job 
that starts one every week; the current scrub has been running for a while now, 
and it's very, very slow:

 scrub: scrub in progress for 40h41m, 12.56% done, 283h14m to go

The configuration is listed below, consisting of three raidz2 groups with seven 
2TB drives each. The root fs is on a pair of X25M (gen 1) SSDs, and another set 
of similar SSDs is used for the ZIL and L2ARC (a mirror for the ZIL and a stripe 
for the L2ARC).

Is this correct behaviour? According to the zpool status output, OpenSolaris is 
going to take something like 14 days to scrub the dpool...

roy
NAME STATE READ WRITE CKSUM
dpool        ONLINE   0 0 0
  raidz2-0   ONLINE   0 0 0
c7t2d0   ONLINE   0 0 0
c7t3d0   ONLINE   0 0 0
c7t4d0   ONLINE   0 0 0
c7t5d0   ONLINE   0 0 0
c7t6d0   ONLINE   0 0 0
c7t7d0   ONLINE   0 0 0
c8t0d0   ONLINE   0 0 0
  raidz2-1   ONLINE   0 0 0
c8t1d0   ONLINE   0 0 0
c8t2d0   ONLINE   0 0 0
c8t3d0   ONLINE   0 0 0
c8t4d0   ONLINE   0 0 0
c8t5d0   ONLINE   0 0 0
c8t6d0   ONLINE   0 0 0
c8t7d0   ONLINE   0 0 0
  raidz2-2   ONLINE   0 0 0
c9t0d0   ONLINE   0 0 0
c9t1d0   ONLINE   0 0 0
c9t2d0   ONLINE   0 0 0
c9t3d0   ONLINE   0 0 0
c9t4d0   ONLINE   0 0 0
c9t5d0   ONLINE   0 0 0
c9t6d0   ONLINE   0 0 0
logs
  mirror-3   ONLINE   0 0 0
c10d1s0  ONLINE   0 0 0
c11d0s0  ONLINE   0 0 0
cache
  c10d1s1    ONLINE   0 0 0
  c11d0s1    ONLINE   0 0 0
spares
  c9t7d0 AVAIL   


Kind regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It 
is an elementary imperative for all pedagogues to avoid excessive use of idioms 
of foreign origin. In most cases adequate and relevant synonyms exist in 
Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-28 Thread Kjetil Torgrim Homme
Paul B. Henson hen...@acm.org writes:
 On Fri, 26 Feb 2010, David Dyer-Bennet wrote:
 I think of using ACLs to extend extra access beyond what the
 permission bits grant.  Are you talking about using them to prevent
 things that the permission bits appear to grant?  Because so long as
 they're only granting extended access, losing them can't expose
 anything.

 Consider the example of creating a file in a directory which has an
 inheritable ACL for new files:

why are you doing this?  it's inherently insecure to rely on ACL's to
restrict access.  do as David says and use ACL's to *grant* access.  if
needed, set permission on the file to 000 and use umask 777.

 drwx--s--x+  2 henson   csupomona   4 Feb 27 09:21 .
 owner@:rwxpdDaARWcC--:-di---:allow
 owner@:rwxpdDaARWcC--:--:allow
 group@:--x---a-R-c---:-di---:allow
 group@:--x---a-R-c---:--:allow
  everyone@:--x---a-R-c---:-di---:allow
  everyone@:--x---a-R-c---:--:allow
 owner@:rwxpdDaARWcC--:f-i---:allow
 group@:--:f-i---:allow
  everyone@:--:f-i---:allow

 When the ACL is respected, then regardless of the requested creation
 mode or the umask, new files will have the following ACL:

 -rw-------+  1 henson   csupomona   0 Feb 27 09:26 foo
 owner@:rw-pdDaARWcC--:--:allow
 group@:--:--:allow
  everyone@:--:--:allow

 Now, let's say a legacy application used a requested creation mode of
 0644, and the current umask was 022, and the application calculated
 the resultant mode and explicitly set it with chmod(0644):

why is umask 022 when you want 077?  *that's* your problem.
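
for example, a grant-style inheritable entry (directory name and permissions
here are only illustrative):

    # add an inheritable allow entry so new files pick up group read access,
    # instead of relying on deny entries to take access away
    chmod A+group@:read_data/read_attributes:file_inherit:allow /export/projects
    ls -dv /export/projects    # -v prints the full ACL entries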

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale ZFS deployments out there (200 disks)

2010-02-28 Thread Orvar Korvar
Speaking of long boot times, I've heard that IBM Power servers boot in 90 
minutes or more.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Clear vdev information from disk

2010-02-28 Thread Richard Elling
On Feb 28, 2010, at 5:05 AM, Lutz Schumann wrote:
 Hello list, 
 
 it is damn difficult to destroy ZFS labels :) 

Some people seem to have a knack of doing it accidentally :-)

 I try to remove the vdev labels of disks previously used in a pool. According to 
 http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/ondiskformat0822.pdf
  I created a script that removes the first 512 KB and the last 512 KB, 
 however I always miss the last labels .. 
 
 
 LABEL 0
 
 failed to unpack label 0
 
 LABEL 1
 
 failed to unpack label 1
 
 LABEL 2
 
version=14
name='pool1'
state=1
txg=26
 
 
 LABEL 3
 
version=14
name='pool1'
state=1
 
 
 How can I calculate or determine the location of the superblocks described 
 in the document above? Has that changed over the versions of ZFS? (I know 
 that vdevs no longer need to be exactly the same size in later OpenSolaris 
 releases, so maybe something has changed)

It has not changed.  The labels are aligned to 256KB boundaries, so your
script needs to find the correct end.  You can also measure this directly
using something like iosnoop when running zdb -l.
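
For example, a rough shell sketch of that arithmetic (the device name and size
below are made up; read the real size from format or prtvtoc first):

    DEV=/dev/rdsk/c0t0d0s0          # hypothetical device
    SIZE=1000204886016              # hypothetical size in bytes

    # round the size down to a 256KB boundary, then step back 512KB so the
    # last two labels (L2, L3) fall inside the region being cleared
    END=$(( (SIZE / 262144) * 262144 ))
    SEEK=$(( (END - 524288) / 512 ))

    dd if=/dev/zero of=$DEV bs=512 count=1024               # L0 and L1
    dd if=/dev/zero of=$DEV bs=512 seek=$SEEK count=1024    # L2 and L3
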
 -- richard

 
 Any hints appreciated .. 
 
 p.s. Clearing the whole disk is troublesome, because those are a bunch of 1 
 TB disks and deletion should be fast.
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool vdev imbalance

2010-02-28 Thread Ian Collins

Andrew Gabriel wrote:

Ian Collins wrote:
I was running zpool iostat on a pool comprising a stripe of raidz2 
vdevs that appears to be writing slowly and I notice a considerable 
imbalance of both free space and write operations.  The pool is 
currently feeding a tape backup while receiving a large filesystem.


Is this imbalance normal?  I would expect a more even distribution as 
the pool configuration hasn't been changed since creation.


The second and third ones are pretty much full, with the others having 
well over 10 times more free space, so I wouldn't expect many writes 
to the full ones.


Have the others ever been in a degraded state? That might explain why 
the fill level has become unbalanced.


We had to swap a drive in the second one and I've seen hot spares kick 
in as in the first one here.  These have always been the result of 
phantom errors from that wretched Marvell driver (the box is an x4500).


Nothing has been degraded for long and I stop all copies to the box when 
scrubbing or resilvering is in progress (receives still restart scrubs 
in update 7).







--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Clear vdev information from disk

2010-02-28 Thread Tim Cook
On Sun, Feb 28, 2010 at 1:12 PM, Richard Elling richard.ell...@gmail.com wrote:

 On Feb 28, 2010, at 5:05 AM, Lutz Schumann wrote:
  Hello list,
 
  it is damn difficult to destroy ZFS labels :)

 Some people seem to have a knack of doing it accidentally :-)

  I try to remove the vdev labels of disks previously used in a pool.
 According to
 http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/ondiskformat0822.pdf
 I created a script that removes the first 512 KB and the last 512 KB,
 however I always miss the last labels ..
 
  
  LABEL 0
  
  failed to unpack label 0
  
  LABEL 1
  
  failed to unpack label 1
  
  LABEL 2
  
 version=14
 name='pool1'
 state=1
 txg=26
  
  
  LABEL 3
  
 version=14
 name='pool1'
 state=1
 
 
  How can I calculate or determine the location of the superblocks
 described in the document above? Has that changed over the versions of ZFS?
 (I know that vdevs no longer need to be exactly the same size in later
 OpenSolaris releases, so maybe something has changed)

 It has not changed.  The labels are aligned to 256KB boundaries, so your
 script needs to find the correct end.  You can also measure this directly
 using something like iosnoop when running zdb -l.
  -- richard

 
  Any hints appreciated ..
 
  p.s. Clearing the whole disk is troublesome, because those are a bunch of
 1 TB disks and deletion should be fast.
  --
  This message posted from opensolaris.org


Perhaps this has already been suggested, but it would seem to me to make a
lot more sense to have some sort of zfs label clear type command to
quickly and easily clear labels...

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] device mixed-up while trying to import.

2010-02-28 Thread Cyril Plisko
On Sun, Feb 28, 2010 at 2:06 PM, Yariv Graf ya...@walla.net.il wrote:
 Hi,
 Thanks for the reply.
 I can arrange for the lost SSD, but I already formatted it.
 Second, even though the external HDD is, for instance, /dev/rdsk/c16t0d0, when 
 I try to debug using zdb it shows me another “path”:
 path='/dev/dsk/c11t0d0s0'
        devid='id1,s...@tst31500341as2ger66y7/a'
        
 phys_path='/p...@0,0/pci8086,2...@1e/pci1458,1...@6/u...@00203702003490ab/d...@0,0:a'

 Is there a way to fix it?

Yariv,

In short, you need not. That 'other' path won't fool ZFS, since
it ultimately trusts the device GUIDs. You can reshuffle the disks
in your pool and put them back in random order, and ZFS will find
its way. On a successful import all the recorded paths will be updated.
So, back to square one - you need to import your pool first.
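
For example (using the device name from your mail; the s0 slice suffix is
assumed):

    zpool import                    # scan /dev/dsk and list importable pools
    zdb -l /dev/rdsk/c16t0d0s0      # print the four vdev labels: GUIDs plus the recorded path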


-- 
Regards,
Cyril
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS compression and deduplication on root pool on SSD

2010-02-28 Thread valrh...@gmail.com
I am running my root pool on a 60 GB SLC SSD (OCZ Agility EX). At present, my 
rpool/ROOT has no compression, and no deduplication. I was wondering about 
whether it would be a good idea, from a performance and data integrity 
standpoint, to use one, the other, or both, on the root pool. My current 
problem is that I'm starting to run out of space on the SSD, and based on a 
send|receive I did to a backup server, I should be able to compress by about a 
factor of 1.5x. If I enable both on the rpool filesystem, then clone the boot 
environment, that should enable it on the new BE (which would be a child of 
rpool/ROOT), right?

Also, I don't have the numbers to prove this, but it seems to me that the 
actual size of rpool/ROOT has grown substantially since I did a clean install 
of build 129a (I'm now at build 133). Without compression either, it was 
around 24 GB, but things seem to have accumulated by an extra 11 GB or so. Or 
am I imagining things? Is there a way to get rid of all of the legacy stuff 
that's in there? I already deleted the old snapshots and boot environments that 
were taking up much space.

Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs scrub?

2010-02-28 Thread Bob Friesenhahn

On Sat, 27 Feb 2010, Roy Sigurd Karlsbakk wrote:


hi all

I have a server running svn_131 and the scrub is very slow. I have a 
cron job for starting it every week and now it's been running for a 
while, and it's very, very slow


Have you checked the output of 'iostat -xe' to see if there are 
unusually slow (or overloaded) disks or increasing error counts?  Is 
the CPU load unusually high?
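
For example, something like:

    # 10-second samples, with error counters (-e) and descriptive names (-n);
    # look for one disk with asvc_t or %b far above its peers, or error
    # counts that keep climbing
    iostat -xen 10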


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS compression and deduplication on root pool on SSD

2010-02-28 Thread Bob Friesenhahn

On Sun, 28 Feb 2010, valrh...@gmail.com wrote:

backup server, I should be able to compress by about a factor of 
1.5x. If I enable both on the rpool filesystem, then clone the boot 
environment, that should enable it on the new BE (which would be a 
child of rpool/ROOT), right?


If by 'clone' you are talking about zfs's clone, I don't think that 
this will immediately save you any space since zfs clone is done by 
block references to existing blocks.  You would need to actually copy 
(or re-write) the files in order for them to be compressed.


Using lzjb compression sounds like a good idea.  I doubt that GRUB 
supports gzip compression so take care that you use a compression 
algorithm that GRUB understands or your system won't boot.
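
For example, a minimal sketch (only data written after the change gets
compressed, so the savings show up as the BE is rewritten or upgraded):

    zfs set compression=lzjb rpool/ROOT            # inherited by BEs created under it
    zfs get -r compression,compressratio rpool/ROOT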


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] What's the advantage of using multiple filesystems in a pool

2010-02-28 Thread tomwaters
Hi guys, on my home server I have a variety of directories under a single 
pool/filesystem, Cloud.

Things like
cloud/movies  - 4TB
cloud/music - 100Gig
cloud/winbackups  - 1TB
cloud/data   - 1TB

etc.

After doing some reading, I see recommendations to use separate filesystems to 
improve performance... but I'm not sure how that helps, as it's the same pool?

Can someone help me understand if/why I should use separate file systems for 
these?

ta.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool import as unavailable when mpxio disabled

2010-02-28 Thread mingli
I have a host running Solaris 10 update 8 attached to an STK 6540 array (host 
type set to Traffic Manager). The host has four paths: two to controller A and 
two to controller B. When I disabled MPxIO and rebooted the host, zpool status 
showed the testpool as unavailable, because ZFS still tried to import the pool 
using the MPxIO device files, which were no longer available. If I export the 
pool and re-import it, it comes back with the correct device files.
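
For reference, the export/re-import step that works around it:

    zpool export testpool
    zpool import testpool     # rescans /dev/dsk and picks up the non-MPxIO device names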

But when I ran the same test on a Solaris 10 U8 host attached to an IBM ESS 800, 
the pool imported correctly, with the correct device files, after MPxIO was 
disabled.

This scenario confuses me; does anybody have experience with it? Any reply is 
much appreciated.

Thanks, Ming
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS compression and deduplication on root pool on SSD

2010-02-28 Thread Bill Sommerfeld

On 02/28/10 15:58, valrh...@gmail.com wrote:

Also, I don't have the numbers to prove this, but it seems to me

 that the actual size of rpool/ROOT has grown substantially since I
 did a clean install of build 129a (I'm now at build133). WIthout
 compression, either, that was around 24 GB, but things seem
 to have accumulated by an extra 11 GB or so.

One common source for this is slowly accumulating files under
/var/pkg/download.

Clean out /var/pkg/download and delete all but the most recent boot 
environment to recover space (you need to do this to get the space back 
because the blocks are referenced by the snapshots used by each clone as 
its base version).


To avoid this in the future, set PKG_CACHEDIR in your environment to 
point at a filesystem which isn't cloned by beadm -- something outside 
rpool/ROOT, for instance.
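
For example (dataset name and paths are only illustrative):

    # either give the cache its own dataset outside rpool/ROOT
    # (move any existing contents of /var/pkg/download aside first) ...
    zfs create -o mountpoint=/var/pkg/download rpool/pkg-download
    # ... or point pkg(1) at a directory on a data pool
    export PKG_CACHEDIR=/data/pkg-cache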


On several systems which have two pools (root and data) I've relocated it 
to the data pool - it doesn't have to be part of the root pool.  This 
has significantly slimmed down my root filesystem on systems which are 
chasing the dev branch of opensolaris.


 At present, my rpool/ROOT has no compression, and no deduplication. I 
 was wondering about whether it would be a good idea, from a

 performance and data integrity standpoint, to use one, the other, or
 both, on the root pool.

I've used the combination of copies=2 and compression=on on rpool/ROOT 
for a while and have been happy with the result.


On one system I recently moved to an ssd root, I also turned on dedup 
and it seems to be doing just fine:


NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
r2      37G  14.7G  22.3G    39%  1.31x  ONLINE  -

(the relatively high dedup ratio is because I have one live upgrade BE 
with nevada build 130, and a beadm BE with opensolaris build 130, which 
is mostly the same)


- Bill



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What's the advantage of using multiple filesystems in a pool

2010-02-28 Thread Erik Trimble

tomwaters wrote:

Hi guys, on my home server I have a variety of directories under a single 
pool/filesystem, Cloud.

Things like
cloud/movies  - 4TB
cloud/music - 100Gig
cloud/winbackups  - 1TB
cloud/data   - 1TB

etc.

After doing some reading, I see recommendations to use separate filesystems to 
improve performance... but I'm not sure how that helps, as it's the same pool?

Can someone help me understand if/why I should use separate file systems for 
these?

ta.
  
Obviously, having different filesystems gives you the ability to set 
different values for attributes, which may substantially improve 
performance or storage space depending on the data in that filesystem.  
Taking the example above, I would consider turning compression on for your 
cloud/winbackups and possibly for cloud/data, but definitely not for 
either cloud/movies (assuming mpeg4 or similar files) or cloud/music. 
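
For example, if each of those directories were remade as its own dataset (a
sketch only; the data would need to be moved into the new datasets):

    zfs create -o compression=on cloud/winbackups
    zfs create -o compression=on cloud/data
    # leave compression off for movies/music -- that data is already compressed
    zfs get -r compression,compressratio cloud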




--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] suggested ssd for zil

2010-02-28 Thread rwalists
If anyone has specific SSD drives they would recommend for ZIL use would you 
mind a quick response to the list?  My understanding is I need to look for:

1) Respect cache flush commands (which is my real question...the answer to this 
isn't very obvious in most cases)
2) Fast on small writes

It seems even the smallest sizes should be sufficient.  This is for a home NAS 
where most write work is for iSCSI volumes hosting backups for OS X Time 
Machine.  There is also some small amount of MySQL (InnoDB) shared via NFS.

From what I can gather workable options would be:

- Stec which are in the 7000 series and extremely expensive

- Mtron Pro 7500 16GB SLC which seem to respect the cache flush but aren't 
particularly fast doing it
http://opensolaris.org/jive/thread.jspa?messageID=459872&tstart=0

- Intel X-25E with the cache turned off which seems to be like the Mtron

- Seagate's marketing page for their new SSD implies it has a capacitor to 
protect data in cache like I believe the Stec does.  But I don't think they are 
available at retail yet.
Power loss data protection to ensure against data loss upon power failure
http://www.seagate.com/www/en-us/products/servers/pulsar/pulsar/

And what won't work are:

- Intel X-25M
- Most/all of the consumer drives priced beneath the X-25M

all because they use capacitors to get write speed w/o respecting cache flush 
requests.  Is there anything that is safe to use as a ZIL, faster than the 
Mtron but more appropriate for home than a Stec?  Maybe the answer is to wait 
on Seagate, but I thought maybe someone has other ideas.

Thanks,
Ware
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] suggested ssd for zil

2010-02-28 Thread rwalists

On Feb 28, 2010, at 11:51 PM, rwali...@washdcmail.com wrote:

 And what won't work are:
 
 - Intel X-25M
 - Most/all of the consumer drives priced beneath the X-25M
 
 all because they use capacitors to get write speed w/o respecting cache flush 
 requests. 

Sorry, meant to say they use cache to get write speed w/o respecting cache 
flush requests.

--Ware
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] suggested ssd for zil

2010-02-28 Thread Daniel Carosone
 Is there anything that is safe to use as a ZIL, faster than the
 Mtron but more appropriate for home than a Stec?  

ACARD ANS-9010, as mentioned several times here recently (also sold as
hyperdrive5)  

--
Dan.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS compression and deduplication on root pool on SSD

2010-02-28 Thread Daniel Carosone
On Sun, Feb 28, 2010 at 07:36:30PM -0800, Bill Sommerfeld wrote:
 To avoid this in the future, set PKG_CACHEDIR in your environment to  
 point at a filesystem which isn't cloned by beadm -- something outside  
 rpool/ROOT, for instance.

+1 - I've just used a dataset mounted at /var/pkg/download. I don't know
when this knob appeared, but I only heard about it recently and haven't
yet bothered to rearrange stuff accordingly.

 On one system I recently moved to an ssd root, I also turned on dedup  
 and it seems to be doing just fine:

I have had compress=on,dedup=on on several rpools for some time,
including a little netbook with a 7.5G slow-as ssd.

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] suggested ssd for zil

2010-02-28 Thread rwalists

On Mar 1, 2010, at 12:05 AM, Daniel Carosone wrote:

 Is there anything that is safe to use as a ZIL, faster than the
 Mtron but more appropriate for home than a Stec?  
 
 ACARD ANS-9010, as mentioned several times here recently (also sold as
 hyperdrive5) 

You are right.  I saw that in a recent thread.  In my case I don't have a spare 
bay for it.  I'm similarly constrained on some of the PCI solutions that have 
either battery backup or external power.

But this seems like a good solution if someone has the space.

Thanks,
Ware
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] suggested ssd for zil

2010-02-28 Thread Erik Trimble

rwali...@washdcmail.com wrote:

On Feb 28, 2010, at 11:51 PM, rwali...@washdcmail.com wrote:

  

And what won't work are:

- Intel X-25M
- Most/all of the consumer drives priced beneath the X-25M

all because they use capacitors to get write speed w/o respecting cache flush requests. 



Sorry, meant to say they use cache to get write speed w/o respecting cache flush 
requests.

--Ware
  
Actually, the bigger strike against the X-25M and similar MLC-based SSDs 
is their relatively poor small random writes performance.


I'm pretty sure that all SandForce-based SSDs don't use DRAM as their 
cache, but take a hunk of flash to use as scratch space instead. Which 
means that they'll be OK for ZIL use.  OCZ's Vertex 2 EX and Vertex 2 
both use that controller, but they'll not be available for another month 
or so, in all likelihood.


http://www.techspot.com/review/242-ocz-vertex2-pro-ssd/


Also, it looks like the Vertex Limited Edition is SandForce-based, too.

http://www.legitreviews.com/article/1222/2/

Though, according to the article, without the capacitor, you still might 
lose some data stored in the SandForce controller's internal buffer.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] device mixed-up while trying to import.

2010-02-28 Thread Yariv Graf
Hi Cyril,
Thanks for the response.
In simple words, this is what has been done:
1- zpool import HD (external HDD [single drive])
2- zpool add HD log c0t4d0 (SSD drive)
3- play with it a bit
4- zpool export HD
5- reinstall OpenSolaris on the SSD drive (the ex-slog above)

Is there any chance to recover the HD zpool?
I can use the SSD drive as slog for recovery if needed.

Many thanks

Yariv
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sizing for L2ARC and dedup...

2010-02-28 Thread Richard Elling
On Feb 28, 2010, at 7:11 PM, Erik Trimble wrote:
 I'm finally at the point of adding an SSD to my system, so I can get 
 reasonable dedup performance.
 
 The question here goes to sizing of the SSD for use as an L2ARC device.
 
 Noodling around, I found Richard's old posting on ARC-L2ARC memory 
 requirements, which is mighty helpful in making sure I don't overdo the L2ARC 
 side.
 
 (http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg34677.html)

I don't know of an easy way to see the number of blocks, which is what
you need to complete a capacity plan.  OTOH, it doesn't hurt to have an 
L2ARC, just beware of wasting space if you have a small RAM machine.

 What I haven't found is a reasonable way to determine how big I'll need an 
 L2ARC to fit all the relevant data for dedup.  I've seen several postings 
 back in Jan about this, and there wasn't much help, as was acknowledged at 
 the time.
 
 What I'm after is exactly what needs to be stored extra for DDT?  I'm looking 
 at the 200-byte header in ARC per L2ARC entry, and assuming that is for all 
 relevant info stored in the L2ARC, whether it's actual data or metadata.  My 
 question is this: the metadata for a slab (record) takes up how much space?  
 With DDT turned on, I'm assuming that this metadata is larger than with it 
 off (or, is it the same now for both)?
 
 There has to be some way to do a back-of-the-envelope calc that says  (X) 
 pool size = (Y) min L2ARC size = (Z) min ARC size

If you know the number of blocks and the size distribution you can
calculate this. In other words, it isn't very easy to do in advance unless
you have a fixed-size workload (eg database that doesn't grow :-)
For example, if you have a 10 GB database with 8KB blocks, then
you can calculate how much RAM would be required to hold the
headers for a 10 GB L2ARC device:
headers = 10 GB / 8 KB
RAM needed ~ 200 bytes * headers
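
Plugging the example numbers in:

    headers    = 10 GB / 8 KB        = 1,310,720
    RAM needed ~ 1,310,720 * 200 B   ~ 250 MB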

for media, you can reasonably expect 128KB blocks.

The DDT size can be measured with zdb -D poolname  but you 
can expect that to grow over time, too.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sizing for L2ARC and dedup...

2010-02-28 Thread Erik Trimble

Richard Elling wrote:

On Feb 28, 2010, at 7:11 PM, Erik Trimble wrote:
  

I'm finally at the point of adding an SSD to my system, so I can get reasonable 
dedup performance.

The question here goes to sizing of the SSD for use as an L2ARC device.

Noodling around, I found Richard's old posting on ARC-L2ARC memory 
requirements, which is mighty helpful in making sure I don't overdo the L2ARC side.

(http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg34677.html)



I don't know of an easy way to see the number of blocks, which is what
you need to complete a capacity plan.  OTOH, it doesn't hurt to have an 
L2ARC, just beware of wasting space if you have a small RAM machine.
  

I haven't found a good way, either.  And I've looked.  ;-)





What I haven't found is a reasonable way to determine how big I'll need an 
L2ARC to fit all the relevant data for dedup.  I've seen several postings back 
in Jan about this, and there wasn't much help, as was acknowledged at the time.

What I'm after is exactly what needs to be stored extra for DDT?  I'm looking 
at the 200-byte header in ARC per L2ARC entry, and assuming that is for all 
relevant info stored in the L2ARC, whether it's actual data or metadata.  My 
question is this: the metadata for a slab (record) takes up how much space?  
With DDT turned on, I'm assuming that this metadata is larger than with it off 
(or, is it the same now for both)?

There has to be some way to do a back-of-the-envelope calc that says  (X) pool 
size = (Y) min L2ARC size = (Z) min ARC size



If you know the number of blocks and the size distribution you can
calculate this. In other words, it isn't very easy to do in advance unless
you have a fixed-size workload (eg database that doesn't grow :-)
For example, if you have a 10 GB database with 8KB blocks, then
you can calculate how much RAM would be required to hold the
headers for a 10 GB L2ARC device:
headers = 10 GB / 8 KB
RAM needed ~ 200 bytes * headers

for media, you can reasonably expect 128KB blocks.

The DDT size can be measured with zdb -D poolname  but you 
can expect that to grow over time, too.

 -- richard
That's good, but I'd like a way to pre-calculate my potential DDT size 
(which, I'm assuming, will sit in L2ARC, right?)  Once again, I'm 
assuming that each DDT entry corresponds to a record (slab), so to be 
exact, I would need to know the number of slabs (which doesn't currently 
seem possible).  I'd be satisfied with a guesstimate based on what my 
expected average block size is.  But what I need to know is how big a 
DDT entry is for each record. I'm trying to parse the code, and I don't 
have it in a sufficiently intelligent IDE right now to find all the 
cross-references. 


I've got as far as this (in ddt.h):

struct ddt_entry {
        ddt_key_t       dde_key;
        ddt_phys_t      dde_phys[DDT_PHYS_TYPES];
        zio_t           *dde_lead_zio[DDT_PHYS_TYPES];
        void            *dde_repair_data;
        enum ddt_type   dde_type;
        enum ddt_class  dde_class;
        uint8_t         dde_loading;
        uint8_t         dde_loaded;
        kcondvar_t      dde_cv;
        avl_node_t      dde_node;
};

Any idea what these structure sizes actually are?




--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss