Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express

2010-11-17 Thread Kyle McDonald
The question that has occurred to me is:

I *must* choose one of those support options for how long?

I mean if I buy support for a machine for a year and put S11 Express
in production on it, then I don't renew the support, am I now
violating the license?

That's bogus. I could be wrong, but I don't think Sun ever did this. As
far as I knew when I worked at Sun, buying a machine gave you a 'right
to use' Solaris (even future versions, as I understood it) on that
machine without any extra charge.

Is there an option to just buy a license outright without paying for
support?

This is as bad as some application software companies are: license
ends, app stops running.
Actually this is worse, since it's not just one app, it's the whole OS.
At least it doesn't refuse to run or cripple itself like some other OS
does. ;)

  -Kyle

 Licensing and Support for Oracle Solaris 11 Express

 11-Can I get support for Oracle Solaris 11 Express?

 Yes. Oracle Solaris 11 Express is covered under the Oracle Premier
 Support for Operating Systems or Oracle Premier Support for Systems
 support option for Oracle hardware, and Oracle Solaris Premier
 Subscription for non-Oracle hardware. Customers must choose either
 of these support options should they wish to deploy Oracle Solaris
 11 Express into a production environment.

 [1]
 http://www.oracle.com/technetwork/server-storage/solaris11/overview/faqs-oraclesolaris11express-185609.pdf



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Any opinions on the Brocade 825 Dual port 8Gb FC HBA?

2010-11-16 Thread Kyle McDonald
Does OpenSolaris/Solaris11 Express have a driver for it already?

Anyone used one already?

 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Growing the swap vol?

2010-11-13 Thread Kyle McDonald


Hi all,

I'd like to give my machine a little more swap.

I ran:

zfs get volsize rpool/swap

and saw it was 2G

So I ran:

zfs set volsize=4G rpool/swap

to double it. zfs get shows it took effect, but swap -l doesn't show
any change.
I ran swap -d to remove the device, and then swap -a to re-add it, and
it still shows 2G (about 4 million blocks).
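
For reference, the full sequence I ran, spelled out (the zvol device path
is the default one for rpool's swap volume - adjust if yours differs):

  zfs set volsize=4G rpool/swap        # double the volume
  swap -d /dev/zvol/dsk/rpool/swap     # drop the device from swap
  swap -a /dev/zvol/dsk/rpool/swap     # add it back
  swap -l                              # still reports roughly 4 million 512-byte blocks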

How do I make the change take effect?

  -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-12 Thread Kyle McDonald



On 11/12/2010 10:03 AM, Edward Ned Harvey wrote:

 Since combining ZFS storage backend, via nfs or iscsi, with ESXi
 heads, I'm in love. But for one thing. The interconnect between
 the head & storage.



 1G Ether is so cheap, but not as fast as desired. 10G ether is
 fast enough, but it's overkill and why is it so bloody expensive?
 Why is there nothing in between? Is there something in between?

I suppose you could try multiple 1G interfaces bonded together - Does
the ESXi hypervisor support LACP aggregations?  I'm not sure it will
help though, given the algorithms that LACP can use to distribute the
traffic.
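
On the Solaris side the aggregation itself is easy enough to sketch (link
names here are hypothetical, and whether ESXi will negotiate LACP with it
is exactly the open question):

  dladm create-aggr -L active -l e1000g0 -l e1000g1 aggr1   # LACP, active mode
  dladm show-aggr aggr1                                     # check ports and policy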

  -Kyle

 Is there a better option? I mean, sata is cheap, and it's 3g or
 6g, but it's not suitable for this purpose. But the point
 remains, there isn't a fundamental limitation that **requires** 10G
 to be expensive, or **requires** a leap directly from 1G to 10G. I
 would very much like to find a solution which is a good fit... to
 attach ZFS storage to vmware.



 What are people using, as interconnect, to use ZFS storage on
 ESX(i)?



 Any suggestions?





 ___ zfs-discuss mailing
 list zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Any opinions on these SSD's?

2010-11-11 Thread Kyle McDonald

I'm shopping for an SSD for a ZIL.

Looking around on NewEgg, at the claimed (not sure I believe them)
IOPS, these caught my attention:


Corsair Force 80GB          CSSD-F80GBP2-BRKT     50K 4K-aligned random write IOPS
OCZ Vertex 2 120GB          OCZSSD3-2VTX120G      50K 4K-aligned random write IOPS
A-DATA S599 128GB           AS599S0128GM          50K 4K-aligned write IOPS
Crucial RealSSD C300 128GB  CTFDDAC128MAG-1G1CCA  60K/30K 4K read/write IOPS

Any opinions? stories? other models I missed?

Other questions:

1) The ZIL will be small compared to the size of these, can I use the
rest as L2ARC or is that not such a good idea?

2) Will ZFS align the ZIL writes in such a way that those IOPS numbers
will be close to attainable?

 -Kyle





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS

2010-10-28 Thread Kyle McDonald



On 8/7/2010 4:11 PM, Terry Hull wrote:

 It is just that lots of the PERC controllers do not do JBOD very well. I've
 done it several times making a RAID 0 for each drive. Unfortunately, that
 means the server has lots of RAID hardware that is not utilized very well.
Doing that lets you use the cache, which is the only part of the RAID
HW that I'd worry about wasting.
 Also, ZFS loves to see lots of spindles, and Dell boxes tend not to have
 lots of drive bays in comparison to what you can build at a given price
 point.
I've found the R515 (the R510's cousin with AMD processors) to be
very interesting in this regard. It has many more drive bays than most
Dell boxes.

I've also priced out the IBM x3630 M3 - even more drive bays in this one,
for about 20% more.

 Of course then you have warranty / service issues to consider.

I don't know what your needs are, but I found Dell's 5yr onsite 10x5
NBD support to be priced very attractively. But I can live with a
machine being down till the next day, or through a weekend.

 -Kyle

 --
 Terry Hull
 Network Resource Group, Inc.


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread Kyle McDonald



On 10/25/2010 3:39 AM, Markus Kovero wrote:

 Any other feasible alternatives for Dell hardware? Wondering, are these
issues mostly related to Nehalem-architectural problems, eg. c-states.
 So is there anything good in switching hw vendor? HP anyone?

Note that while it was a Dell I was asking about, it's an AMD Opteron
system (the R515.)

With an architecture that different, I doubt the same 'c-states'
corner case will appear. Aren't there too many variables changing
between AMD and Intel to have the exact same problem?

Not that there won't be a different problem though. :)

  -Kyle

 Yours
 Markus Kovero


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-22 Thread Kyle McDonald

Hi All,

I'm currently considering purchasing 1 or 2 Dell R515's.

With up to 14 drives, and up to 64GB of RAM, it seems like it's well
suited for a low-end ZFS server.

I know this box is new, but I wonder if anyone out there has any
experience with it?

How about the H700 SAS controller?

Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I
want to put some SSD's in a box like this, but there's no way I'm
going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
they kidding?

  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to avoid striping ?

2010-10-18 Thread Kyle McDonald



On 10/18/2010 4:28 AM, Habony, Zsolt wrote:

 I worry about head thrashing.
Why?

If your SAN group gives you a LUN that is at the opposite end of the
array, I would think that was because they had already assigned the
space in the middle to other customers (other groups like yours, or
other hosts of yours.)

If so, don't you think that all those other hosts and customers will
be reading and writing from that array all the time anyway? I mean if
the heads are going to 'thrash', then they'll be doing so even before
you request your second LUN, right?

Adding your second LUN to the mix isn't going to seriously change the
workload on the disks in the array.

 Though memory cache of large storage should make the problem
 easier, I would be more happy if I can be sure that zpool will not
 be handled as a stripe.

 Is there a way to avoid it, or can we be sure that the problem does
 not exist at all ?

As I think the logic above suggests, if the problem exists, it exists
even when you only have 1 LUN.

  -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to avoid striping ?

2010-10-18 Thread Kyle McDonald



On 10/18/2010 5:40 AM, Habony, Zsolt wrote:
 (I do not mirror, as the storage gives redundancy behind LUNs.)

By not enabling redundancy (Mirror or RAIDZ[123]) at the ZFS level,
you are opening yourself to corruption problems that the underlying
SAN storage can't protect you from. The SAN array won't even notice
the problem.

ZFS will notice the problem, and (if you don't give it redundancy to
work with) it won't be able to repair it for you.

You'd be better off getting unprotected LUNs from the array, and
letting ZFS handle the redundancy.

  -Kyle
 Online LUN expansion seems promising, and answering my question.
 Thank You for that.

 Zsolt


 ___ zfs-discuss mailing
 list zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RaidzN blocksize ... or blocksize in general ... and resilver

2010-10-17 Thread Kyle McDonald



On 10/17/2010 9:38 AM, Edward Ned Harvey wrote:

 The default blocksize is 128K. If you are using mirrors, then
 each block on disk will be 128K whenever possible. But if you're
 using raidzN with a capacity of M disks (M disks useful capacity +
 N disks redundancy) then the block size on each individual disk
 will be 128K / M. Right?


If I understand things correctly, I think this is why it is
recommended that you pick an M that divides into 128K evenly. I
believe powers of 2 are recommended.

I think increasing the block size to 128K*M would be overkill, but
that idea does make me wonder:

In cases where M can't be a power of 2, would it make sense to adjust
the block size so that M still divides evenly?

If M were 4 then the data written to each drive would be 32K. So if
you really wanted M to be 5 drives, is there an advantage to making
the block size 160K, or if that's too big, how about 80K?

Likewise, if you really wanted M to be 3 drives, would adjusting the
block size to 96K make sense?
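
To make the divisibility point concrete, the rough per-disk arithmetic
(default 128K records, ignoring parity and padding overhead) works out to:

  M = 2:  128K / 2 = 64K per disk
  M = 3:  128K / 3 = ~42.7K (uneven); with 96K records, 96K / 3 = 32K
  M = 4:  128K / 4 = 32K per disk
  M = 5:  128K / 5 = ~25.6K (uneven); with 160K, 160K / 5 = 32K; with 80K, 80K / 5 = 16K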

  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS, IPS (IBM ServeRAID) driver, and a kernel panic...

2010-07-09 Thread Kyle McDonald
Hi,


I have been trying out the latest NexentaCore and NexentaStor Community
ed. builds (they have the driver I need built in) on the hardware I have
with this controller.

The only difference between the 2 machines is that the 'Core' machine
has 16GB of RAM and the 'Stor' one has 12GB.

On both machines I did the following:

1) Created zpool consisting of a single RaidZ from 5 300GB U320 10K
   drives.
2) Created 4 filesystems in the pool.
3) On the 4 filesystems I set the dedup and compression properties
   to cover all the combinations. (off/off, off/on, on/off, and
   on/on)

On the 'Stor' machine I elected to Disable the ZIL and cacheflush
through the web GUI. I didn't do this on the 'Core' machine.

On the 'Core' machine I mounted the 4 Filesystems from the 'Stor'
machine via NFSv4.

Now for a bit of history.

I tried out the 'Stor' machine in this exact config (but with ZIL and
Cache flushes on) about a month ago with version 3.0.2. At that time I
used a Linux NFS client to time untar'ing the GCC sources to each of the
4 filesystems. This test repeatedly failed on the first filesystem by
bringing the machine to its knees to the point that I had to power
cycle it.

This time around I decided to use the 'Core' machine as the client so I
could also time the same test to its local ZFS filesystems.

At first I got my hopes up, because the test ran to completion (and
rather quickly) locally on the core machine. I then added running it
over NFS to the 'Stor' machine to the testing. In the beginning I was
untarring it once on each filesystem, and even over NFS this worked
(though slower than I'd hoped, given that the ZIL and cacheflush were disabled.)

So I thought I'd push the DeDup a little harder, and I expanded the test
to untar the sources 4 times per filesystem. This ran fine until the 4th
NFS filesystem, where the 'Stor' machine panic'd. The client waited
while it rebooted, and then resumed the test causing it to panic a
second time. For some reason it hung so badly the second time it didn't
even reboot - I'll have to power cycle it Monday when I get to work.

The 2 stack traces are identical:

 panic[cpu3]/thread=ff001782fc60: BAD TRAP: type=e (#pf Page fault) 
 rp=ff001782f9c0 addr=18 occurred in module unix due to a NULL pointer 
 dereference
 
 sched: #pf Page fault
 Bad kernel fault at addr=0x18
 pid=0, pc=0xfb863374, sp=0xff001782fab8, eflags=0x10286
 cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
 cr2: 18  cr3: 500  cr8: c
 
 rdi: ff03dc84fcfc rsi: ff03e1d03d98 rdx:2
 rcx:2  r8:0  r9: ff0017a51c60
 rax: ff001782fc60 rbx:2 rbp: ff001782fb10
 r10:   e10377c748 r11: ff00 r12: ff03dc84fcfc
 r13: ff00 r14: ff00 r15:   10
 fsb:0 gsb: ff03e1d03ac0  ds:   4b
  es:   4b  fs:0  gs:  1c3
 trp:e err:0 rip: fb863374
  cs:   30 rfl:10286 rsp: ff001782fab8
  ss:   38
 
 ff001782f8a0 unix:die+dd ()
 ff001782f9b0 unix:trap+177b ()
 ff001782f9c0 unix:cmntrap+e6 ()
 ff001782fb10 unix:mutex_owner_running+14 ()
 ff001782fb40 ips:ips_remove_busy_command+27 ()
 ff001782fb80 ips:ips_finish_io_request+a8 ()
 ff001782fbb0 ips:ips_intr+7b ()
 ff001782fc00 unix:av_dispatch_autovect+7c ()
 ff001782fc40 unix:dispatch_hardint+33 ()
 ff0018517580 unix:switch_sp_and_call+13 ()
 ff00185175d0 unix:do_interrupt+b8 ()
 ff00185175e0 unix:_interrupt+b8 ()
 ff00185176e0 genunix:kmem_free+34 ()
 ff0018517710 zfs:zio_pop_transforms+86 ()
 ff0018517780 zfs:zio_done+152 ()
 ff00185177b0 zfs:zio_execute+8d ()
 ff0018517810 zfs:zio_notify_parent+a6 ()
 ff0018517880 zfs:zio_done+3e2 ()
 ff00185178b0 zfs:zio_execute+8d ()
 ff0018517910 zfs:zio_notify_parent+a6 ()
 ff0018517980 zfs:zio_done+3e2 ()
 ff00185179b0 zfs:zio_execute+8d ()
 ff0018517a10 zfs:zio_notify_parent+a6 ()
 ff0018517a80 zfs:zio_done+3e2 ()
 ff0018517ab0 zfs:zio_execute+8d ()
 ff0018517b50 genunix:taskq_thread+248 ()
 ff0018517b60 unix:thread_start+8 ()
 
 syncing file systems... done
 dumping to /dev/zvol/dsk/syspool/dump, offset 65536, content: kernel + curproc
   0% done: 0 pages dumped, dump failed: error 5
 rebooting...
 

As I read this, it's probably a bug in the IPS driver. But I really
don't know anything about kernel panics.

This seems 100% reproducible, so I'm happy to run more tests in KDB if
it will help. As I've mentioned before I'd be happy to try to work on
the code myself if it were available.

Anyone have any ideas?

   -Kyle



On 7/7/2010 3:12 PM, Kyle McDonald wrote:
 On 6/24/2010 6:31 PM, James C. McPherson wrote:
 
 
 hi Kyle

Re: [zfs-discuss] Announce: zfsdump

2010-06-29 Thread Kyle McDonald

On 6/28/2010 10:30 PM, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Tristram Scott

 If you would like to try it out, download the package from:
 http://www.quantmodels.co.uk/zfsdump/
 
 I haven't tried this yet, but thank you very much!
 
 Other people have pointed out bacula is able to handle multiple tapes, and
 individual file restores.  However, the disadvantage of
 bacula/tar/cpio/rsync etc is that they all have to walk the entire
 filesystem searching for things that have changed.

A compromise here might be to feed those tools the output from the new
ZFS diff command (which 'diffs' 2 snapshots), when it arrives.

That might get something close to the best of both worlds.
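
Something along these lines, once the command arrives (pool and snapshot
names are made up), with the resulting file list fed to tar/cpio/bacula:

  zfs diff tank/home@monday tank/home@tuesday   # list files added/removed/changed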

 -Kyle

 
 The advantage of zfs send (assuming incremental backups) is that it
 already knows what's changed, and it can generate a continuous datastream
 almost instantly.  Something like 1-2 orders of magnitude faster per
 incremental backup.
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSDs adequate ZIL devices?

2010-06-16 Thread Kyle McDonald

I've very infrequently seen the RAMSAN devices mentioned here. Probably
due to price.

However, a long time ago I think I remember someone suggesting a
build-it-yourself RAMSAN.

Where is the downside of one or 2 OS boxes with a whole lot of RAM
(and/or SSD's) exporting either RAMdisks or zVOLs out over iSCSI, FCoE,
or direct FC (can OS do that?)

If the RAM and/or SSD's (or even HD's) were large enough, this box might
be able to serve several other ZFS servers. A dedicated network, or
direct connections if there are enough ports, should keep the net
from being the bottleneck.

A sub $100 UPS (or 2) could protect the whole thing.
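
A rough sketch of what I'm picturing (sizes and names are made up, and
shareiscsi is the old built-in iSCSI target property rather than COMSTAR -
I haven't actually tried this):

  ramdiskadm -a ramlun0 4g                    # carve out a 4GB ramdisk
  zpool create ramtank /dev/ramdisk/ramlun0   # build a pool on it
  zfs create -V 2g ramtank/slog0              # a zvol to hand out
  zfs set shareiscsi=on ramtank/slog0         # export it; another box adds it as a log device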

I'm sure I'm missing something, but I'm not seeing it at the moment.
Anyone else have any ideas?

 -Kyle
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Native ZFS for Linux

2010-06-11 Thread Kyle McDonald

On 6/11/2010 12:32 AM, Erik Trimble wrote:
 On 6/10/2010 9:04 PM, Rodrigo E. De León Plicet wrote:
 On Tue, Jun 8, 2010 at 7:14 PM, Anurag Agarwalanu...@kqinfotech.com 
 wrote:
   
 We at KQInfotech, initially started on an independent port of ZFS to
 linux.
 When we posted our progress about port last year, then we came to
 know about
 the work on LLNL port. Since then we started working on to re-base our
 changing on top Brian's changes.

 We are working on porting ZPL on that code. Our current status is that
 mount/unmount is working. Most of the directory operations and
 read/write is
 also working. There is still lot more development work and testing that
 needs to be going in this. But we are committed to make this happen so
 please stay tuned.
  

 Good times ahead!

 I don't mean to be a PITA, but I'm assuming that someone lawyerly has
 had the appropriate discussions with the porting team about how linking
 against the GPL'd Linux kernel means your kernel module has to be
 GPL-compatible.  It doesn't matter if you distribute it outside the
 general kernel source tarball, what matters is that you're linking
 against a GPL program, and the old GPL v2 doesn't allow for a
 non-GPL-compatibly-licensed module to do that.
 
 As a workaround, take a look at what nVidia did for their X driver - it
 uses a GPL'd kernel module as a shim, which their codebase can then call
 from userland. Which is essentially what the ZFS FUSE folks have been
 reduced to doing.
 
 
 If the new work is a whole new implementation of the ZFS *design*
 intended for the linux kernel, then Yea! Great!  (fortunately, it does
 sound like this is what's going on)  Otherwise, OpenSolaris CDDL'd code
 can't go into a Linux kernel, module or otherwise.
 

Actually my understanding of this is that it revolves around
distribution (copying - since it's based on copyright) of the code.

If the developers distribute source code, which is then compiled and
linked to the GPL code by the *end-user*, then there are no issues, since
the person combining the 2 codebases is not distributing the combined
work further.

The greyer area (though it can still be OK, if I understand correctly)
is when the code is distributed pre-compiled. On one hand presumably GPL
headers were used to do the compiling, but on the other it is still the
*end-user* that links the 2 'programs' together, and that's what really
matters.

I believe this is how all the proprietary binary drivers for Linux get
around this issue.

All the licenses do is hamper distribution. The vendors using shims may
do so to make it easier to be included in major linux distributions?

   -Kyle
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks

2010-06-09 Thread Kyle McDonald

On 6/9/2010 5:04 PM, Edward Ned Harvey wrote:

 
 Everything is faster with more ram.  There is no limit, unless the total
 used disk in your system is smaller than the available ram in your system
 ... which seems very improbable.


Off topic, but...

When I managed a build/simulation farm for one of Sun's ASIC design
teams, we had several 24 CPU machines with 96GB or 192GB of RAM and only
36GB or maybe 73GB of disk.

Probably a special case though. ;)

  -Kyle
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs/lofi/share panic

2010-05-27 Thread Kyle McDonald
On 5/27/2010 2:45 PM, Jan Kryl wrote:
 Hi Frank,

 On 24/05/10 16:52 -0400, Frank Middleton wrote:
   
  Many many moons ago, I submitted a CR into bugs about a
  highly reproducible panic that occurs if you try to re-share
  a  lofi mounted image. That CR has AFAIK long since
  disappeared - I even forget what it was called.

  This server is used for doing network installs. Let's say
  you have a 64 bit iso lofi-mounted and shared. You do the
  install, and then wish to switch to a 32 bit iso. You unshare,
  umount, delete the loopback, and then lofiadm the new iso,
  mount it and then share it. Panic, every time.

  Is this such a rare use-case that no one is interested? I have
  the backtrace and cores if anyone wants them, although
  such were submitted with the original CR. This is pretty
  frustrating since you start to run out of ideas for mountpoint
  names after a while unless you forget and get the panic.

  FWIW (even on a freshly booted system after a panic)
  # lofiadm zyzzy.iso /dev/lofi/1
  # mount -F hsfs /dev/lofi/1 /mnt
  mount: /dev/lofi/1 is already mounted or /mnt is busy
  # mount -O -F hsfs /dev/lofi/1 /mnt
  # share /mnt
  #

  If you unshare /mnt and then do this again, it will panic.
  This has been a bug since before Open Solaris came out.

  It doesn't happen if the iso is originally on UFS, but
  UFS really isn't an option any more.  FWIW the dataset
  containing the isos has the sharenfs attribute set,
  although it doesn;t have to be actually mounted by
  any remote NFS for this panic to occur.

  Suggestions for a workaround most welcome!

 
 the bug (6798273) has been closed as incomplete with following
 note:

 I cannot reproduce any issue with the given testcase on b137.

 So you should test this with b137 or newer build. There have
 been some extensive changes going to treeclimb_* functions,
 so the bug is probably fixed or will be in near future.

 Let us know if you can still reproduce the panic on
 recent build.

   
I don't know if the code path is similar enough, but you should also try
it like this:

# mount -F hsfs zyzzy.iso /mnt

For many builds now, (Open)Solaris hasn't needed the 'lofiadm' step for
ISO's (and possibly other FS's that can be guessed)

I now put ISO's (for installs just like you) directly in my /etc/vfstab.
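
For example, a vfstab entry along these lines (paths are hypothetical):

  /export/isos/sol-dvd.iso  -  /mnt/sol-dvd  hsfs  -  yes  ro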

  -Kyle

 thanks
 -jan
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nfs share of nested zfs directories?

2010-05-27 Thread Kyle McDonald
On 5/27/2010 9:30 PM, Reshekel Shedwitz wrote:
 Some tips…

 (1) Do a zfs mount -a and a zfs share -a. Just in case something didn't get 
 shared out correctly (though that's supposed to automatically happen, I think)

 (2) The Solaris automounter (i.e. in a NIS environment) does not seem to 
 automatically mount descendent filesystems (i.e. if the NIS automounter has a 
 map for /public pointing to myserver:/mnt/zfs/public but on myserver, I 
 create a descendent filesystem in /mnt/zfs/public/folder1, browsing to 
 /public/folder1 on another computer will just show an empty directory all the 
 time).
   
The automounter behaves the same regardless of whether NIS is
involved or not (or LDAP for that matter.) The automounter can be
configured with files locally, and that won't change its behavior.

The behavior you're describing has been the behavior of all flavors of NFS
since it was born, and also doesn't have anything to do with the
automounter - it was by design. No automounter I'm aware of is capable
of learning on its own that 'folder1' is a new filesystem (not a new
directory) and mounting it. So this isn't limited to Solaris.

 If you're in that sort of environment, you need to add another map on NIS.
   
Your example doesn't specify if /public is a direct or indirect mount;
being in / kind of implies it's direct, and those mounts can be more
limiting (more so in the past), so most admins avoid using the
auto.direct map for these reasons.

If the example was /import/public, with /import being defined by the
auto.import map, then the solution to this problem is not an entirely
new entry in the map for /import/public/folder1, but to convert the
entry for public to a hierarchical mount entry, specifying
explicitly the folder1 sub-mount. A hierarchical mount can even mount
folder1 from a different server than public came from.
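
For example, to cover the /mnt/zfs/public case mentioned earlier, an
auto.import entry along these lines (an untested sketch):

  public  /         myserver:/mnt/zfs/public \
          /folder1  myserver:/mnt/zfs/public/folder1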

In the past (SunOS 4 and early Solaris timeframe) hierarchical mounts had
some limitations (mainly issues with unmounting them) that made people
wary of them. Most if not all of those have been eliminated.

In general the Solaris automounter is very reliable and flexible, and can
be configured to do almost anything you want. Recent Linux automounters
(autofs4??) have come very close to the Solaris ones; however, earlier
ones had some missing features, buggy features, and some different
interpretations of the maps.

But the issue described in this thread is not an automounter issue,
it's a design issue of NFS - at least for all versions of NFS before v4.
Version 4 has a feature that others have mentioned, called mirror
mounts, that tries to pass along the information required for the
client to re-create the sub-mount - even if the original fileserver
mounted the sub-filesystem from another server! It's a cool feature, but
NFS v4 support in clients isn't complete yet, so specifying the full
hierarchical mount tree in the automount maps is still required.

 (3) Try using /net mounts. If you're not aware of how this works, you can 
 browse to /net/computer name to see all the NFS mounts. On Solaris, /net 
 *will* automatically mount descendent filesystems (unlike NIS).
   
In general /net mounts are a bad idea. While it will basically scan the
output of 'showmount -e' for everything the server exports, and mount it
all, that's not exactly what you always want. It will only pick up
sub-filesystems that are explicitly shared (which NFSv4 might also only
do, I'm not sure), and it will miss branches of the tree if they are
mounted from another server.

Also most automounters that I'm aware of will only mount all the
exported filesystems at the time of the access to /net/hostname, and
(unless it's unused long enough to be unmounted) will miss all changes in
what is exported on the server until the mount is triggered again.

On top of that, /net/hostname mounts encourage embedding the hostname of
the server in config files, scripts, and binaries (-R path for shared
libraries), and that's not good: you then can't move a filesystem
from one host to another, since you need to maintain that /net/hostname
path forever - or edit many files and recompile programs. (If I recall
correctly, this was once used as one of the arguments against shared
libraries by some.)

Because of this, by using /net/hostname, you give up one of the biggest
benefits of the automounter - redirection. By making an auto.import map
that has an entry for 'public' you allow yourself to be able to clone
public to a new server, and modify the map to (over time as it is
unmounted and remounted) migrate the clients to the new server.

Lastly, using /net also disables the load-sharing and failover abilities
of read-only automounts, since you are by definition limiting yourself
to one hostname.

That was longer than I expected, but hopefully it will help some. :)

 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org

[zfs-discuss] USB Flashdrive as SLOG?

2010-05-25 Thread Kyle McDonald
Hi,

I know the general discussion is about flash SSD's connected through
SATA/SAS or possibly PCI-E these days. So excuse me if I'm askign
something that makes no sense...

I have a server that can hold 6 U320 SCSI disks. Right now I put in 5
300GB for a data pool, and 1 18GB for the root pool.

I've been thinking lately that I'm not sure I like the root pool being
unprotected, but I can't afford to give up another drive bay. So
recently the idea occurred to me to go the other way. If I were to get 2
USB flash thumb drives, say 16 or 32 GB each, not only would I be able to
mirror the root pool, but I'd also be able to put a 6th 300GB drive into
the data pool.

That led me to wonder whether partitioning out 8 or 12 GB on a 32GB
thumb drive would be beneficial as a slog? I bet the USB bus won't be
as good as SATA or SAS, but will it be better than the internal ZIL on
the U320 drives?
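
In zpool terms, what I'm picturing is roughly this (device names are made
up):

  zpool attach rpool c1t0d0s0 c5t0d0s0    # mirror the root pool onto a thumb drive
  zpool add datapool log c5t0d0s3         # spare slice of a thumb drive as the slog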

This seems like at least a win-win, and possibly a win-win-win.
Is there some other reason I'm insane to consider this?

  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] USB Flashdrive as SLOG?

2010-05-25 Thread Kyle McDonald
On 5/25/2010 11:39 AM, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Kyle McDonald

 I've been thinking lately that I'm not sure I like the root pool being
 unprotected, but I can't afford to give up another drive bay. 
 
 I'm guessing you won't be able to use the USB thumbs as a boot device.  But
 that's just a guess.
   
No, I've installed to an 8GB one on my laptop and booted from it. And
this server offers USB drives as a boot option, so I don't see why it
wouldn't work. But I won't know till I try it.
 However, I see nothing wrong with mirroring your primary boot device to the
 USB.  At least in this case, if the OS drive fails, your system doesn't
 crash.  You're able to swap the OS drive and restore your OS mirror.

   
True. If nothing else I may do at least that.
   
 That led me to wonder whether partitioning out 8 or 12 GB on a 32GB
 thumb drive would be beneficial as an slog?? 
 
 I think the only way to find out is to measure it.  I do have an educated
 guess though.  I don't think, even the fastest USB flash drives are able to
 work quickly, with significantly low latency.  Based on measurements I made
 years ago, so again I emphasize, only way to find out is to test it.

   
Yes, I guess I'll have to try some benchmarks. The thing that got me
thinking was that many of these drives support a Windows feature called
'ReadyBoost' - which I think is just Windows swapping to the USB drive
instead of HD - but Windows does a performance test on the device to
see that it's fast enough. I thought maybe if it's faster to swap to than a
HD it might be faster for a slog too.

But you're right the only way to know is to measure it.
 One thing you could check, which does get you a lot of mileage for free
 is:  Make sure your HBA has a BBU, and enable the WriteBack.  In my
 measurements, this gains about 75% of the benefit that log devices would
 give you.

   
My HBA's have 256MB of BBC (battery-backed cache), and it's enabled on all
6 drives, so that should help. However, I may have hit a bug in the 'isp'
driver (still have to debug and see if that's the root cause), and I may
need to yank the RAID enabler and go back to straight SCSI.

  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread Kyle McDonald
SNIP a whole lot of ZIL/SLOG discussion

Hi guys.

Yep, I know about the ZIL and SSD slogs.

While setting Nexenta up, it offered to disable the ZIL entirely. For
now I left it on. In the end (hopefully for only specific filesystems,
once that feature is released) I'll end up disabling the ZIL for our
software builds, since:

1) The builds are disposable - We only need to save them if they finish,
and we can restart them if needed.
2) The build servers are not on UPS so a power failure is likely to make
the clients lose all state and need to restart anyway.

But this issue I've seen with Nexenta is not due to the ZIL. It runs
until it literally crashes the machine. It's not just slow, it brings
the machine to its knees. I believe it does have something to do with
exhausting memory though. As Erast says, it may be the IPS driver (though
I've used that on b130 of SXCE without issues), or who knows what else.

I did download some updates from Nexenta yesterday. I'm going to try to
retest today or tomorrow.

 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-20 Thread Kyle McDonald
Hi all,

I recently installed Nexenta Community 3.0.2 on one of my servers:

IBM eSeries X346
2.8Ghz Xeon
12GB DDR2 RAM
1 builtin BGE interface for management
4 port Intel GigE card aggregated for Data
IBM ServeRAID 7k with 256MB BB Cache (isp driver)
  6 RAID0 single-drive LUNs (so I can use the cache)
    1 18GB LUN for the rpool
    5 300GB LUNs for the data pool
1 RAIDZ1 pool from the 5 300GB drives
  4 test filesystems
    1 No DeDup, No Compression
    1 DeDup, No Compression
    1 No DeDup, Compression
    1 DeDup, Compression

This is pretty old hardware, so I wasn't expecting miracles, but I
thought I'd give it a shot.
My workload is NFS service to software build servers (cvs checkouts,
untarring files, compiling, etc.) I'm hoping the many CVS checkout trees
will lend themselves to DeDup well, and I know source code should
compress easily.

I set up one client with a single GigE connection, mounted the four
filesystems (plus one from the NetApp we have here), and proceeded to write
a loop to time both untarring the gcc-4.3.3 sources to those 5
filesystems and to 1 local directory, and to rm -rf the sources too.

The tar took 28 seconds (and 10 seconds to remove) in the local dir; then
on the first ZFS/NFS filesystem mount, it took basically forever and
hung the Nexenta server. I was watching it go on the web admin page and
it all looked fine for a while, then the client started reporting 'NFS
Server not responding, still trying...' For a while, there were also
'NFS Server OK' messages too, and the web GUI remained responsive.
Eventually the OK messages stopped, and the web GUI froze.

I went and rebooted the NFS client, thinking that if the requests stopped
the server might catch up, but it never started responding again.

I was only untarring a file... How did this bring the machine down?
I hadn't even gotten to the FS's that had DeDup or Compression turned
on, so those shouldn't have affected things - yet.

Any ideas?

  -Kyle



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-05-04 Thread Kyle McDonald
On 3/2/2010 10:15 AM, Kjetil Torgrim Homme wrote:
 valrh...@gmail.com valrh...@gmail.com writes:

   
 I have been using DVDs for small backups here and there for a decade
 now, and have a huge pile of several hundred. They have a lot of
 overlapping content, so I was thinking of feeding the entire stack
 into some sort of DVD autoloader, which would just read each disk, and
 write its contents to a ZFS filesystem with dedup enabled. [...] That
 would allow me to consolidate a few hundred CDs and DVDs onto probably
 a terabyte or so, which could then be kept conveniently on a hard
 drive and archived to tape.
 
 it would be inconvenient to make a dedup copy on harddisk or tape, you
 could only do it as a ZFS filesystem or ZFS send stream.  it's better to
 use a generic tool like hardlink(1), and just delete files afterwards
 with

   
There is a perl script that has been floating around on the internet for
years that will convert copies of files on the same FS to hardlinks (sorry,
I don't have the name handy). So you don't need ZFS. Once this is done you
can even recreate an ISO and burn it back to DVD (possibly merging hundreds
of CD's into one DVD (or BD!)). The script can also delete the
duplicates, but there isn't much control over which one it keeps - for
backups you may really want to keep the earliest (or latest?) backup the
file appeared in.

Using ZFS Dedup is an interesting way of doing this. However, archiving
the result may be hard. If you use different datasets (FS's) for each
backup, can you only send 1 dataset at a time (since you can only
snapshot at the dataset level)? Won't that 'undo' the deduping?
 
If you instead put all the backups in one dataset, then the snapshot can
theoretically contain the deduped data. I'm not clear on whether
'send'ing it will preserve the deduping or not - or if it's up to the
receiving dataset to recognize matching blocks. If the dedup is in the
stream, then you may be able to write the stream to a DVD or BD.
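
If I remember right, recent builds also grew a dedup'd send stream option
(zfs send -D); that would be the thing to test here (dataset and snapshot
names are made up):

  zfs snapshot tank/dvdpile@archive
  zfs send -D tank/dvdpile@archive > /somewhere/dvdpile-archive.zsend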

Still if you save enough space so that you can add the required level of
redundancy, you could just leave it on disk and chuck the DVD's. Not
sure I'd do that, but it might let me put the media in the basement,
instead of the closet, or on the desk next to me.

  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs directory symlink owner

2010-05-03 Thread Kyle McDonald
On 5/3/2010 7:41 AM, Michelle Knight wrote:
 The long ls command worked, as in it created the links, but they didn't work 
 properly under the ZFS SMB share.
   
I'm guessing you meant the 'long ln' command?

If you look at what those 2 commands create, you'll notice (in the output
of ls -l) that the target the link points to has been recorded in the
link differently. One will be relative (../a/foo) and the other absolute
(/mirror/audio-Cd-Tracks/a/foo). This can affect how the SMB server
processes these links when requests for them are made, depending on how the
parent directories are shared (or not shared.) The relative links should
work, I would think, since they don't 'leave' the SMB share.
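
To illustrate, using the paths above (the link names are arbitrary):

  ln -s ../a/foo rel-link                        # relative target
  ln -s /mirror/audio-Cd-Tracks/a/foo abs-link   # absolute target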

 They didn't work as in, on a remote Linux box, I could execute ls and see 
 them, but I couldn't change in; permission issues. (despite having the 
 correct ownership) and also on the remote linux box, the GUI file browser 
 couldn't even see the folders.

   
Are you also sharing these files to Windows machines?

If you're only sharing them to Linux machines, then NFS would be so much
easier to use. You'll still want relative links though.

  -Kyle

 By changing in to the directory and then executing the ls command relative to 
 that point, everything worked.

 Odd.
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs directory symlink owner

2010-05-03 Thread Kyle McDonald
On 5/3/2010 4:56 PM, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Kyle McDonald

 If you're only sharing them to Linux machines, then NFS would be so
 much
 easier to use. You'll still want relative links though.
 
 Only if you have infrastructure to sanitize the UID's.

 If you have disjoint standalone machines, then samba winbind works pretty
 well to map usernames to locally generated unique UID's.  In which case,
 IMHO, samba is easier than NFS.  However, if you do have some kind of
 domains LDAP, NIS, etc... then I agree 1,000% NFS is easier than samba.

   
True, using local passwd files on more than a handful of machines can
make adding and removing users and changing passwords a pain.

But (and I could be wrong these days) in my experience, while the Samba
server is great, the SMB client on Linux can only mount the share as a
single specific user, and all accesses to files in the share are
performed as that user. Right?

That to me makes SMB a less desirable filesystem than NFS, where you
can't really tell the difference between that and UFS or whatever.

  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)

2010-04-24 Thread Kyle McDonald
On 3/9/2010 1:55 PM, Matt Cowger wrote:
 That's a very good point - in this particular case, there is no option to
 change the blocksize for the application.

   
I have no way of guessing the effect it would have, but is there a
reason that the filesystem blocks can't be a multiple of the application
block size? I mean, 4 4KB app blocks to 1 16KB FS block sounds like it
might be a decent compromise to me. Decent enough to make it worth
testing, anyway.
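
A quick way to test that (the 'ram' pool name comes from your zpool
create; the dataset and file names are made up):

  zfs create -o recordsize=16k ram/test16k
  iozone -e -i 0 -i 1 -i 2 -n 5120 -O -q 4k -r 4k -s 5g -f /ram/test16k/iofile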

  -Kyle

 On 3/9/10 10:42 AM, Roch Bourbonnais roch.bourbonn...@sun.com wrote:

   
 I think This is highlighting that there is extra CPU requirement to
 manage small blocks in ZFS.
 The table would probably turn over if you go to 16K zfs records and
 16K reads/writes form the application.

 Next step for you is to figure how much reads/writes IOPS do you
 expect to take in the real workloads and whether or not the filesystem
 portion
 will represent a significant drain of CPU resource.

 -r


 On 8 March 2010 at 17:57, Matt Cowger wrote:

 
 Hi Everyone,

 It looks like I've got something weird going with zfs performance on
 a ramdisk... ZFS is performing not even a 3rd of what UFS is doing.

 Short version:

 Create 80+ GB ramdisk (ramdiskadm), system has 96GB, so we aren't
 swapping
 Create zpool on it (zpool create ram...)
 Change zfs options to turn off checksumming (don't want it or need
 it), atime, compression, 4K block size (this is the application's
 native blocksize) etc.
 Run a simple iozone benchmark (seq. write, seq. read, rndm write,
 rndm read).

 Same deal for UFS, replacing the ZFS stuff with newfs stuff and
 mounting the UFS forcedirectio (no point in using a buffer cache
 memory for something that's already in memory)

 Measure IOPs performance using iozone:

 iozone  -e -i 0 -i 1 -i 2 -n 5120 -O -q 4k -r 4k -s 5g

 With the ZFS filesystem I get around:
 ZFS: (seq write) 42360  (seq read) 31010   (random read) 20953   (random write) 32525
 Not SOO bad, but here's UFS:
 UFS: (seq write) 42853  (seq read) 100761  (random read) 100471  (random write) 101141

 For all tests besides the seq write, UFS utterly destroys ZFS.

 I'm curious if anyone has any clever ideas on why this huge
 disparity in performance exists.  At the end of the day, my
 application will run on either filesystem, it just surprises me how
 much worse ZFS performs in this (admittedly edge case) scenario.

 --M
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making ZFS better: zfshistory

2010-04-19 Thread Kyle McDonald
On 4/17/2010 9:03 AM, Edward Ned Harvey wrote:

 It would be cool to only list files which are different.
 
 Know of any way to do that?
   
 cmp
 
 Oh, no.  Because cmp and diff require reading both files, it could take
 forever, especially if you have a lot of snapshots to check, with a large
 file or set of files...  Well, what the heck.  Might as well make it
 optional.  Sometimes people will just want to check a single small file.

   
I think I saw an ARC case go by recently for a new 'zfs diff' command. I
think it allows you to compare 2 snapshots, or maybe the live filesystem
and a snapshot, and see what's changed.

It sounds really useful. Hopefully it will integrate soon.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Secure delete?

2010-04-16 Thread Kyle McDonald
On 4/16/2010 10:30 AM, Bob Friesenhahn wrote:
 On Thu, 15 Apr 2010, Eric D. Mudama wrote:

 The purpose of TRIM is to tell the drive that some # of sectors are no
 longer important so that it doesn't have to work as hard in its
 internal garbage collection.

 The sector size does not typically match the FLASH page size so the
 SSD still has to do some heavy lifting.  It has to keep track of many
 small holes in the FLASH pages.  This seems pretty complicated since
 all of this information needs to be well-preserved in non-volatile
 storage.

But doesn't the TRIM command help here? If, as the OS goes along, it marks
sectors as unused, then the SSD has a lighter lift: it only needs to read,
for example, 1 out of 8 sectors (assuming sectors of 512 bytes and
4K FLASH pages) before writing a new page with that 1 sector and 7 new
ones.

Additionally, in the background I would think it would be able to find a
page with 3 in-use sectors and another with 5, for example, write all 8 to
a new page, remap those sectors to the new location, and then pre-erase
the 2 pages just freed up.

How doesn't that help?

 -Kyle

 Bob
 -- 
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us,
 http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-06 Thread Kyle McDonald
On 4/6/2010 3:41 PM, Erik Trimble wrote:
 On Tue, 2010-04-06 at 08:26 -0700, Anil wrote: 
   
 Seems a nice sale on Newegg for SSD devices. Talk about choices. What's the 
 latest recommendations for a log device?

 http://bit.ly/aL1dne
 
 The Vertex LE models should do well as ZIL  (though not as well as an
 X25-E or a Zeus) for all non-enterprise users.

 The X25-M is still the best choice for a L2ARC device, but the Vertex
 Turbo or Cosair Nova are good if you're on a budget.

 If you really want an SSD a boot drive, or just need something for
 L2ARC, the various Intel X25-V models are cheap, if not a really great
 performers. I'd recommend one of these if you want an SSD for rpool, or
 if you need a large L2ARC for dedup (or similar) and can't afford
 anything in the X25-M price range.  You should also be OK with a Corsair
 Reactor in this performance category.

   
What about if you want to get one that you can use for both the rpool
and a ZIL (for another data pool)?
What if you want one for all 3 (rpool, ZIL, L2ARC)?

 -Kyle


   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-05 Thread Kyle McDonald
On 4/4/2010 11:04 PM, Edward Ned Harvey wrote:
 Actually, It's my experience that Sun (and other vendors) do exactly
 that for you when you buy their parts - at least for rotating drives, I
 have no experience with SSD's.

 The Sun disk label shipped on all the drives is setup to make the drive
 the standard size for that sun part number. They have to do this since
 they (for many reasons) have many sources (diff. vendors, even diff.
 parts from the same vendor) for the actual disks they use for a
 particular Sun part number.
 
 Actually, if there is a fdisk partition and/or disklabel on a drive when it
 arrives, I'm pretty sure that's irrelevant.  Because when I first connect a
 new drive to the HBA, of course the HBA has to sign and initialize the drive
 at a lower level than what the OS normally sees.  So unless I do some sort
 of special operation to tell the HBA to preserve/import a foreign disk, the
 HBA will make the disk blank before the OS sees it anyway.

   
That may be true. Though these days they may be spec'ing the drives to
the manufacturers at an even lower level.

So does your HBA have newer firmware now than it did when the first disk
was connected?
Maybe it's the HBA that is handling the new disks differently now than
it did when the first one was plugged in?

Can you down-rev the HBA FW? Do you have another HBA that might still
have the older rev you could test it on?

  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-05 Thread Kyle McDonald
I've seen the Nexenta and EON webpages, but I'm not looking to build my own.

Is there anything out there I can just buy?

 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Kyle McDonald
On 4/2/2010 8:08 AM, Edward Ned Harvey wrote:
 I know it is way after the fact, but I find it best to coerce each
 drive down to the whole GB boundary using format (create Solaris
 partition just up to the boundary). Then if you ever get a drive a
 little smaller it still should fit.
 
 It seems like it should be unnecessary.  It seems like extra work.  But
 based on my present experience, I reached the same conclusion.

 If my new replacement SSD with identical part number and firmware is 0.001
 Gb smaller than the original and hence unable to mirror, what's to prevent
 the same thing from happening to one of my 1TB spindle disk mirrors?
 Nothing.  That's what.

   
Actually, It's my experience that Sun (and other vendors) do exactly
that for you when you buy their parts - at least for rotating drives, I
have no experience with SSD's.

The Sun disk label shipped on all the drives is setup to make the drive
the standard size for that sun part number. They have to do this since
they (for many reasons) have many sources (diff. vendors, even diff.
parts from the same vendor) for the actual disks they use for a
particular Sun part number.

This isn't new; I believe IBM, EMC, HP, etc. all do it too, for the same
reasons.
I'm a little surprised that the engineers would suddenly stop doing it
only on SSD's. But who knows.

  -Kyle

 I take it back.  Me.  I am to prevent it from happening.  And the technique
 to do so is precisely as you've said.  First slice every drive to be a
 little smaller than actual.  Then later if I get a replacement device for
 the mirror, that's slightly smaller than the others, I have no reason to
 care.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] *SPAM* Re: zfs send/receive - actual performance

2010-03-31 Thread Kyle McDonald
On 3/27/2010 3:14 AM, Svein Skogen wrote:
 On 26.03.2010 23:55, Ian Collins wrote:
  On 03/27/10 09:39 AM, Richard Elling wrote:
  On Mar 26, 2010, at 2:34 AM, Bruno Sousa wrote:

  Hi,
 
  The jumbo-frames in my case give me a boost of around 2 mb/s, so it's
  not that much.
   
  That is about right.  IIRC, the theoretical max is about 4%
  improvement, for MTU of 8KB.
 

  Now i will play with link aggregation and see how it goes, and of
  course i'm counting that incremental replication will be slower...but
  since the amount of data would be much less probably it will still
  deliver a good performance.
   
  Probably won't help at all because of the brain dead way link
  aggregation has to
  work.  See Ordering of frames at
 
 http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol#Link_Aggregation_Control_Protocol
 
 
 
  Arse, thanks for reminding me Richard! A single stream will only use one
  path in a LAG.

 Doesn't (Open)Solaris have the option of setting the aggregate up as a
 FEC or in roundrobin mode?

Solaris does offer what the Wiki describes as  L4 or port number based
hashing.
I'm not sure what FEC is, but when I asked, round-robin isn't available
as preserving packet ordering wouldn't be easy (possible?) that way.
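
If you want to play with the hash policy, dladm is where it lives;
roughly something like this (link and aggregation names are made up, and
the create-aggr syntax differs a bit between Solaris 10 and the newer
Crossbow-based builds):

  # aggregate two NICs and hash on L4 (TCP/UDP port) headers
  dladm create-aggr -P L4 -l e1000g0 -l e1000g1 aggr0
  # or change the policy on an existing aggregation
  dladm modify-aggr -P L4 aggr0
  dladm show-aggr aggr0      # confirm what's in effect

Even with L4 hashing, any single stream still rides a single link, so a
lone zfs send won't go any faster.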

  -Kyle


 //Svein

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-30 Thread Kyle McDonald
On 3/30/2010 2:44 PM, Adam Leventhal wrote:
 Hey Karsten,

 Very interesting data. Your test is inherently single-threaded so I'm not 
 surprised that the benefits aren't more impressive -- the flash modules on 
 the F20 card are optimized more for concurrent IOPS than single-threaded 
 latency.

   

Yes it would be interesting to see the Avg numbers for 10 or more
clients (or jobs on one client) all performing that same test.

 -Kyle

 Adam

 On Mar 30, 2010, at 3:30 AM, Karsten Weiss wrote:

   
 Hi, I did some tests on a Sun Fire x4540 with an external J4500 array 
 (connected via two
 HBA ports). I.e. there are 96 disks in total configured as seven 12-disk 
 raidz2 vdevs
 (plus system, spares, unused disks) providing a ~ 63 TB pool with fletcher4 
 checksums.
 The system was recently equipped with a Sun Flash Accelerator F20 with 4 FMod
 modules to be used as log devices (ZIL). I was using the latest snv_134 
 software release.

 Here are some first performance numbers for the extraction of an 
 uncompressed 50 MB
 tarball on a Linux (CentOS 5.4 x86_64) NFS-client which mounted the test 
 filesystem
 (no compression or dedup) via NFSv3 (rsize=wsize=32k,sync,tcp,hard).

 standard ZIL:   7m40s  (ZFS default)
 1x SSD ZIL:  4m07s  (Flash Accelerator F20)
 2x SSD ZIL:  2m42s  (Flash Accelerator F20)
 2x SSD mirrored ZIL:   3m59s  (Flash Accelerator F20)
 3x SSD ZIL:  2m47s  (Flash Accelerator F20)
 4x SSD ZIL:  2m57s  (Flash Accelerator F20)
 disabled ZIL:   0m15s
 (local extraction0m0.269s)

 I was not so much interested in the absolute numbers but rather in the 
 relative
 performance differences between the standard ZIL, the SSD ZIL and the 
 disabled
 ZIL cases.

 Any opinions on the results? I wish the SSD ZIL performance was closer to the
 disabled ZIL case than it is right now.

 ATM I tend to use two F20 FMods for the log and the two other FMods as L2ARC 
 cache
 devices (although the system has lots of system memory i.e. the L2ARC is not 
 really
 necessary). But the speedup of disabling the ZIL altogether is appealing 
 (and would
 probably be acceptable in this environment).
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

 --
 Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect

2010-03-10 Thread Kyle McDonald

On 3/10/2010 3:27 PM, Robert Thurlow wrote:

As said earlier, it's the string returned from the reverse DNS lookup 
that needs to be matched.





So, to make a long story short, if you log into the server
from the client and do who am i, you will get the host
name you need for the share.
Another test (for a server configured as a DNS client, LDAP would be 
different) is to run 'nslookup client-ip' (or the dig equivalent.) The 
name returned is the one that needs to be in the share config.
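
For example (addresses and names here are made up), on the server:

  nslookup 192.168.10.25     # what name does the client's IP reverse-resolve to?
  # use exactly that name in the share options:
  zfs set sharenfs='rw,root=client1.example.com' tank/export/data
  # or the legacy share equivalent:
  share -F nfs -o rw,root=client1.example.com /export/data

If the reverse lookup hands back the FQDN, a short hostname in root=
won't match, and vice versa.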


  -Kyle





Rob T
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharemgr

2009-11-25 Thread Kyle McDonald

dick hoogendijk wrote:

glidic anthony wrote:

  

I have a solution using zfs set sharenfs=rw,nosuid zpool, but I prefer
to use the sharemgr command.



Then you prefer wrong.

To each their own.

 ZFS filesystems are not shared this way.
  
They can be. I do it all the time. There's nothing technical that 
dictates that sharemgr can't be used on ZFS filesystems.
Just because ZFS provides an alternate way, that doesn't make it the 
only way, or even the 'one true way.'


About the only advantage I can see of using zfs share is inheritance. 
If you don't need that, then sharemgr is just as good, and there are 
cases where it may be simpler. For instance, I loopback mount many, many 
ISOs and need to use sharemgr to share those anyway, so I find it much 
more convenient to manage all my shares in one place with one tool.
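
For example, something like this (group and path names made up; check
sharemgr(1M) for the exact options):

  sharemgr create -P nfs jumpstart
  sharemgr add-share -s /export/isos/sol10u8 jumpstart    # a lofi-mounted ISO
  sharemgr add-share -s /tank/install/media jumpstart     # a ZFS filesystem
  sharemgr show -vp jumpstart

One group, one tool, no matter what filesystem is underneath.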


If sharemgr could (optionally) manage inherited sharing on ZFS 
filesystems, then I think it'd be cleaner to suggest that users use the 
one system-wide sharing tool, rather than one that only works for one 
filesystem. I can't remember them right now, but I think there are other 
commands where ZFS seems to have done the same thing, and I can't figure 
out why that's the trend. As great as ZFS is, it won't ever be the only 
filesystem around; ISOs (at least) will be around for a long time 
still.  Why start forcing users to learn new tools for each filesystem 
type?

Read up on ZFS and NFS.

  

What make you think he didn't?

While the docs do describe how you can optionally use zfs share (which 
he clearly read about since he mentioned it), they don't prohibit using 
sharemgr. I read his question as: How can I get sharemgr to set up 
sharing so that it gets inherited on child filesystems?


Apparently the answer to that question is You can't. If you want to 
set it up only once you need zfs share, and if you really want to use 
sharemgr you need to share each filesystem separately. Maybe someday 
that will change.


   -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS directory and file quota

2009-11-18 Thread Kyle McDonald

Darren J Moffat wrote:

Jozef Hamar wrote:

Hi all,

I can not find any instructions on how to set the file quota (i.e. 
maximum number of files per filesystem/directory) or directory quota 
(maximum size that files in particular directory can consume) in ZFS.


That is because it doesn't exist.

I understand ZFS has no support for this. Am I right? If I am, are 
there any plans to include this in the next releases of 
OpenSolaris/Solaris?


Why would you want to do that rather than set a maximum amount of space
a filesystem, user or group can consume?
Last I checked NetApp had a 'directory quota' concept, but I don't know 
if it could be used on just any directory, or only on upper level 
directories.


Granted, with ZFS you can just make any directory at any level a new FS 
and get the same effect, but that can be heavyweight and have undesired 
side effects.
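
e.g. the ZFS way to get a 'directory quota' is to turn the directory into
a dataset (names and numbers made up):

  zfs create -o quota=10G tank/home/dick/photos

And if the directory already has data in it, you get to copy that data
into the new dataset first - which is part of what I mean by heavyweight.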


What is the real problem you are trying to solve by restricting
the number of files that can be created ?

I imagine it's one that was previously solved with older unix/ufs file 
quotas. Though I can't imagine a use for that now, since running out of 
inodes is not likely to be a problem any time soon.


 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] More Dedupe Questions...

2009-11-03 Thread Kyle McDonald

Hi Darren,

More below...

Darren J Moffat wrote:

Tristan Ball wrote:

Obviously sending it deduped is more efficient in terms of bandwidth 
and CPU time on the recv side, but it may also be more complicated to 
achieve?


A stream can be deduped even if the on disk format isn't and vice versa.

Is the send dedup'ing more efficient if the filesystem is already 
dedup'd? If both are enabled, do they share anything?


 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs-discuss gone from web?

2009-10-28 Thread Kyle McDonald

Jacob Ritorto wrote:


With the web redesign, how does one get to zfs-discuss via the 
opensolaris.org website?


Sorry for the ot question, but I'm becoming desperate after 
clicking circular links for the better part of the last hour :(


You can get the web pages to load? All I get are The connection has 
timed out. The server at opensolaris.org is taking too long to respond.


Something is messed up.

 -Kyle




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apple cans ZFS project

2009-10-25 Thread Kyle McDonald

David Magda wrote:

On Oct 24, 2009, at 08:53, Joerg Schilling wrote:


The article that was mentioned a few hours ago did mention
licensing problems without giving any kind of evidence for
this claim. If there is evidence, I would be interested in
knowing the background, otherwise it looks to me like FUD.



I'm guessing that you'll never see direct evidence given the 
sensitivity that these negotiations can take. All you'll get is 
rumours and leaks of various levels of reliability.


Apple can currently just take the ZFS CDDL code and incorporate it 
(like they did with DTrace), but it may be that they wanted a private 
license from Sun (with appropriate technical support and 
indemnification), and the two entities couldn't come to mutually 
agreeable terms.
Indemnification, I think, really could have been a sticking point. I 
believe that the NetApp - Sun legal disputes are still working their 
way through the legal process. If I were Apple I would have wanted some 
protection in case Sun loses. I don't think I'd want to be 
target #2 with precedent already set.


That said, from what I've read, I don't believe NetApp has a leg to 
stand on. But then again I'm not a lawyer. ;)


  -Kyle



Oh well. I'm sure Apple can come up something good in the FS team, but 
it's a shame that the wheel has to be re-invented when there's a 
production-ready option available.




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] moving files from one fs to another, splittin/merging

2009-10-23 Thread Kyle McDonald

Mike Bo wrote:

Once data resides within a pool, there should be an efficient method of moving 
it from one ZFS file system to another. Think Link/Unlink vs. Copy/Remove.

Here's my scenario... When I originally created a 3TB pool, I didn't know the 
best way carve up the space, so I used a single, flat ZFS file system. Now that 
I'm more familiar with ZFS, managing the sub-directories as separate file 
systems would have made a lot more sense (seperate policies, snapshots, etc.). 
The problem is that some of these directories contain tens of thousands of 
files and many hundreds of gigabytes. Copying this much data between file 
systems within the same disk pool just seems wrong.

I hope such a feature is possible and not too difficult to implement, because 
I'd like to see this capability in ZFS.

  
Alternatively (and I don't know if this is feasible), it might be 
easier and/or better to be able to set those properties on, and 
independently snapshot, regular old subdirectories.


Just an idea

 -Kyle


Regards,
mikebo
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Kyle McDonald

Bob Friesenhahn wrote:

On Fri, 23 Oct 2009, Anand Mitra wrote:


One of the biggest questions around this effort would be “licensing”.
As far as our understanding goes; CDDL doesn’t restrict us from
modifying ZFS code and releasing it. However GPL and CDDL code cannot
be mixed, which implies that ZFS cannot be compiled into Linux Kernel
which is GPL. But we believe the way to get around this issue is to
build ZFS as a module with a CDDL license, it can still be loaded in
the Linux kernel. Though it would be restricted to use the non-GPL
symbols, but as long as that rule is adhered to there is no problem of
legal issues.


The legal issues surrounding GPLv2 is what constitutes the Program 
and work based on the Program.  In the case of Linux, the Program 
is usually the Linux kernel, and things like device drivers become a 
work based on the Program.


Conjoining of source code is not really the issue.  The issue is what 
constitutes the Program.


About 10 years ago I had a long discussion with RMS and the 
(presumably) injured party related to dynamically loading a module 
linked to GPLv2 code into our application.  RMS felt that loading that 
module caused the entire work to become a work based on the Program 
while I felt that the module was the work based on the Program but 
that the rest of our application was not since that module could be 
deleted without impact to the application.


Regardless, it has always seemed to me that (with sufficient care), a 
loadable module can be developed which has no linkages to other code, 
yet can still be successfully loaded and used.  In this case it seems 
that the module could be loaded into the Linux kernel without itself 
being distributed under GPL terms.


Disclaimer: I am not a lawyer, nor do I play one on TV. I could be very 
wrong about this.


Along these lines, it's always struck me that most of the restrictions 
of the GPL fall on the entity who distrbutes the 'work' in question.


I would think that distributing the source to a separate original work 
for a module leaves that responsibility up to whoever compiles it and 
loads it. This means the end-users, as long as they never distribute 
what they create, are (mostly?) unaffected by the Kernel's GPL, and if 
they do distribute it, the burden is on them.


Arguably that line might even be shifted from the act of compiling it, 
to the act of actually loading (linking) it into the Kernel, so that 
distributing a compiled module might even work the same way. I'm not so 
sure about this though. Presumably compiling it before distribution 
would require the use of include files from the kernel, and that seems a 
grey area to me. Maybe clean room include files could be created?


 -Kyle



Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Export, Import = Windows sees wrong groups in ACLs

2009-09-17 Thread Kyle McDonald

Owen Davies wrote:

Thanks.  I took a look and that is exactly what I was looking for.  Of course I 
have since just reset all the permissions on all my shares but it seems that 
the proper way to swap UIDs for users with permissions on CIFS shares is to:

Edit /etc/passwd
Edit /var/smb/smbpasswd

And to change GIDs for groups used on CIFS shares you need to both:

Edit /etc/group
Edit /var/smb/smbgroup.db

Is there a better way to do this than manually editing each file (or db)? 

I've just started reading the CIFS docs recently, so I could be wrong...

But I think the smb files were populated when you added the mappings 
(back when /etc/passwd and /etc/group were wrong). I bet that if you 
removed the mappings, fixed the UNIX files, and recreated the mappings, 
then the SMB files would be 'fixed'.
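
If those mappings were created with idmap, the cycle would look roughly
like this (names made up, and I'm going from the docs here):

  idmap list                                        # see what's there now
  idmap remove winuser:owen@WORKGROUP unixuser:owen
  # ...fix /etc/passwd and /etc/group here...
  idmap add winuser:owen@WORKGROUP unixuser:owen    # recreate against the new UID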


It may not be easier, but it probably is better in the case that there 
are other housekeeping things the map commands do.


  -Kyle



 I don't think there is much of this sort of integration yet so that tools 
update things in a consistent way on both the UNIX side and the CIFS side.

Thanks,
Owen Davies
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Kyle McDonald

Scott Meilicke wrote:

I am still not buying it :) I need to research this to satisfy myself.

I can understand that the writes come from memory to disk during a txg write 
for async, and that is the behavior I see in testing.

But for sync, data must be committed, and a SSD/ZIL makes that faster because 
you are writing to the SSD/ZIL, and not to spinning disk. Eventually that data 
on the SSD must get to spinning disk.

  
But the txg (which may contain more data than just the sync data that 
was written to the ZIL) is still written from memory. Just because the 
sync data was written to the ZIL, doesn't mean it's not still in memory.


 -Kyle


To the books I go!

-Scott
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool Layout Advice Needed

2009-08-06 Thread Kyle McDonald

Adam Sherman wrote:

On 6-Aug-09, at 11:32 , Thomas Burgess wrote:
i've seen some people use usb sticks, and in practice it works on 
SOME machines.  The biggest difference is that the bios has to allow 
for usb booting.  Most of todays computers DO.  Personally i like 
compact flash because it is fairly easy to use as a cheap alternative 
to a hard drive.  I mirror the cf drives exactly like they are hard 
drives so if one fails i just replace it.  USB is a little harder to 
do that with because they are just not as consistent as compact 
flash.  But honestly it should work and many people do this.



This product looks really interesting:

http://www.addonics.com/products/flash_memory_reader/ad2sahdcf.asp

But I can't confirm it will show both cards as separate disks…
My read is that it won't (which is supported by the single SATA data 
connector,) but it will do the mirroring for you.


I know that I generally prefer to let ZFS handle the redundancy for me, 
but for you it may be enough to let this do the mirroring for the root pool.


It seems too expensive to get 2.   Do they have a cheaper one that takes 
only 1 CF card?


 -Kyle



A.

--
Adam Sherman
CTO, Versature Corp.
Tel: +1.877.498.3772 x113






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool Layout Advice Needed

2009-08-06 Thread Kyle McDonald

Adam Sherman wrote:

On 6-Aug-09, at 11:50 , Kyle McDonald wrote:
i've seen some people use usb sticks, and in practice it works on 
SOME machines.  The biggest difference is that the bios has to 
allow for usb booting.  Most of todays computers DO.  Personally i 
like compact flash because it is fairly easy to use as a cheap 
alternative to a hard drive.  I mirror the cf drives exactly like 
they are hard drives so if one fails i just replace it.  USB is a 
little harder to do that with because they are just not as 
consistent as compact flash.  But honestly it should work and many 
people do this.


This product looks really interesting:

http://www.addonics.com/products/flash_memory_reader/ad2sahdcf.asp

But I can't confirm it will show both cards as separate disks…
My read is that it won't (which is supported by the single SATA data 
connector,) but it will do the mirroring for you.


Turns out the FAQ page explains that it will not, too bad.

I know that I generally prefer to let ZFS handle the redundancy for 
me, but for you it may be enough to let this do the mirroring for the 
root pool.


I'm with you there.

It seems too expensive to get 2.   Do they have a cheaper one that 
takes only 1 CF card?


I just ordered a pair of the Syba units, cheap enough to test out 
anyway.
Oh. I was looking and if you have an IDE socket, this will do separate 
master/slave devices:
(no IDE cable needed, it plugs right into the MB - There's another that 
uses a cable if you prefer.)


http://www.addonics.com/products/flash_memory_reader/adeb44idecf.asp

And 2 of these (which look remarkably like the Syba ones) would work too:

http://www.addonics.com/products/flash_memory_reader/adsahdcf.asp

They're only 30 each so 2 of those are less than the dual one.


-Kyle




Now to find some reasonably priced 8GB CompactFlash cards…

Thanks,

A.

--
Adam Sherman
CTO, Versature Corp.
Tel: +1.877.498.3772 x113






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Shrinking a zpool?

2009-08-05 Thread Kyle McDonald

Martin wrote:

C,

I appreciate the feedback and like you, do not wish to start a side rant, but 
rather understand this, because it is completely counter to my experience.

Allow me to respond based on my anecdotal experience.

  

What's wrong with make a new pool.. safely copy the data. verify data
and then delete the old pool..



You missed a few steps.  The actual process would be more like the following.
1. Write up the steps and get approval from all affected parties
-- In truth, the change would not make it past step 1.
  

Maybe, but maybe not see below...

2. Make a new pool
3. Quiesce the pool and cause a TOTAL outage during steps 4 through 9
  
That's not entirely true. You can use ZFS send/recv to do the major 
first pass of #4 (and #5 against the snapshot) live, before the total 
outage.
Then, after you quiesce everything, you could use an incremental 
send/recv to copy the changes since then quickly, reducing down time.


I'd probably run a second full verify anyway, but in theory, I believe 
the ZFS checksums are used in the send/recv process to ensure that there 
isn't any corruption, so after enough positive experience, I might start 
to skip the second verify.


This should greatly reduce the length of the down time.
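
Roughly (pool and snapshot names made up):

  # pass 1: live, while the old pool is still in service
  zfs snapshot -r oldpool@mig1
  zfs send -R oldpool@mig1 | zfs recv -F -d newpool
  # ...quiesce the applications, then send only what changed since...
  zfs snapshot -r oldpool@mig2
  zfs send -R -i @mig1 oldpool@mig2 | zfs recv -F -d newpool

The outage then only has to cover the incremental pass and the verify.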


Everyone.

  

and then one day [months or years later] wants to shrink it...



Business needs change.  Technology changes.  The project was a pilot and 
canceled.  The extended pool didn't meet verification requirements, e,g, 
performance and the change must be backed out.
In an Enterprise, a change for performance should have been tested on 
another identical non-production system before being implemented on the 
production one.


I'd have to concur there's more useful things out there. OTOH... 



That's probably true and I have not seen the priority list.  I was merely amazed at the 
number of Enterprises don't need this functionality posts.

  
All that said, as a personal home user, this is a feature I'm hoping for 
all the time. :)


 -Kyle


Thanks again,
Marty
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Shrinking a zpool?

2009-08-05 Thread Kyle McDonald

Jacob Ritorto wrote:

Is this implemented in OpenSolaris 2008.11?  I'm moving my filer's rpool 
to an ssd mirror to free up bigdisk slots currently used by the os and need to 
shrink rpool from 40GB to 15GB (only using 2.7GB for the install).

  
Your best bet would be to install the new ssd drives, create a new pool, 
snapshot the existing pool, and use ZFS send/recv to migrate the data to 
the new pool. There are docs around about how to install grub and the boot 
blocks on the new devices, too. After that, remove (export! don't destroy 
yet!) the old drives, and reboot to see how it works.
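
The broad strokes, from memory (device and BE names made up - the ZFS
boot docs cover the fiddly bits):

  zpool create newrpool mirror c1t4d0s0 c1t5d0s0       # root pools want slices, not whole disks
  zfs snapshot -r rpool@move
  zfs send -R rpool@move | zfs recv -F -d newrpool
  zpool set bootfs=newrpool/ROOT/opensolaris newrpool  # point the pool at the boot environment
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t4d0s0
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t5d0s0

Plus whatever vfstab/dumpadm/swap cleanup applies to your setup.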

If you have no problems, (and I don't think there's anything technical 
that would keep this from working,) then you're good. Otherwise put the 
old pool back in. :)



 -Kyle


thx
jake
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sol10u7: can't zpool remove missing hot spare

2009-08-05 Thread Kyle McDonald

Will Murnane wrote:

I'm using Solaris 10u6 updated to u7 via patches, and I have a pool
with a mirrored pair and a (shared) hot spare.  We reconfigured disks
a while ago and now the controller is c4 instead of c2.  The hot spare
was originally on c2, and apparently on rebooting it didn't get found.
 So, I looked up what the new name for the hot spare was, then added
it to the pool with zpool add home1 spare c4t19d0.  I then tried to
remove the original name for the hot spare:

r...@box:~# zpool remove home1 c2t0d8
r...@box:~# zpool status home1
  pool: home1
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
home1ONLINE   0 0 0
  mirror ONLINE   0 0 0
c4t17d0  ONLINE   0 0 0
c4t24d0  ONLINE   0 0 0
spares
  c2t0d8 UNAVAIL   cannot open
  c4t19d0AVAIL

errors: No known data errors

So, how can I convince the pool to release its grasp on c2t0d8?

  
Have you tried making a sparse file with mkfile in /tmp and then ZFS 
replace'ing c2t0d8 with the file, and then zfs remove'ing the file?


I don't know if it will work, but at least at the time of the remove, 
the device will exist.
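
i.e. something like this (the size just has to be at least what the old
spare claimed to be):

  mkfile -n 73g /tmp/fakespare      # -n makes it sparse, so no real space is used
  zpool replace home1 c2t0d8 /tmp/fakespare
  zpool remove home1 /tmp/fakespare
  rm /tmp/fakespare

Again, untested - but cheap to try.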


 -Kyle


Thanks!
Will


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Supported Motherboard SATA controller chipsets?

2009-08-04 Thread Kyle McDonald

Volker A. Brandt wrote:

I'm currently trying to decide between a MB with that chipset and
another that uses the nVidia 780a and nf200 south bridge.

Is the nVidia SATA controller well supported? (in AHCI mode?)



Be careful with nVidia if you want to use Samsung SATA disks.
There is a problem with the disk freezing up.   This bit me with
our X2100M2 and X2200M2 systems.

  
I don't know if it's related to your issue, but I have also seen 
comments around about the nv-sata windows drivers hanging up when 
formatting drives larger than 1024GB. But that's been fixed in the latest 
nvidia windows drivers.


Does that sound related, or like something different?

 -Kyle


Regards -- Volker
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Supported Motherboard SATA controller chipsets?

2009-08-03 Thread Kyle McDonald

Hi all,

I think I've read that the AMD 790FX/750SB chipset's SATA controller is 
supported, but may have recently had bugs?


I'm currently trying to decide between a MB with that chipset and 
another that uses the nVidia 780a and nf200 south bridge.


Is the nVidia SATA controller well supported? (in AHCI mode?)

At the moment I'm leaning toward that MB (ASUS M3N-HT) since it seems to 
still be available, whereas the AMD one (ASUS M3A79-T) seems harder to 
find. There is the ASUS M4A79T, which is almost the same board, but it 
has 1 less SATA port - which is also the reason I'm not looking at the 
M4N82 nVidia board.


I wanted to run all this through something like the driver detection 
tool, but since I haven't bought the boards yet, that's kind of tough.


-Kyle



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] feature proposal

2009-07-31 Thread Kyle McDonald

dick hoogendijk wrote:

On Fri, 31 Jul 2009 18:38:16 +1000
Tristan Ball tristan.b...@leica-microsystems.com wrote:

  
Because it means you can create zfs snapshots from a non solaris/non 
local client...


Like a linux nfs client, or a windows cifs client.



So if I want a snapshot of i.e. rpool/export/home/dick I can do a zfs
snapshot rpool/export/home/dick, 

But your command requires that it be run on the NFS/CIFS *server* directly.

The 'mkdir' command version can be run on the server or on any NFS or 
CIFS client.


It's possible (likely even) that regular users would not be allowed to 
login to server machines, but if given the right access, they can still 
use the mkdir  version to create their own snapshots from a client.

but what is the exact syntax for the
same snapshot using this other method?
  
As I understand it, if rpool/export/home/dick is mounted on /home/dick, 
then the syntax would be


cd /home/dick/.zfs/snapshot
mkdir mysnapshot

 -Kyle




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] shrinking a zpool - roadmap

2009-07-30 Thread Kyle McDonald

Ralf Gans wrote:


Jumpstart puts a loopback mount into the vfstab,
and the next boot fails.

The Solaris will do the mountall before ZFS starts,
so the filesystem service fails and you have not even
an sshd to login over the network.
  
This is why I don't use the mountpoint settings in ZFS. I set them all 
to 'legacy', and put them in the /etc/vfstab myself.
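
e.g. (names made up):

  zfs set mountpoint=legacy tank/isos

and then in /etc/vfstab:

  #device     device to fsck  mount point   FS type  fsck pass  mount at boot  options
  tank/isos   -               /export/isos  zfs      -          yes            -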


I keep many .ISO files on a ZFS filesystem, and I LOFI mount them onto 
subdirectories of the same ZFS tree, and then (since they are for 
Jumpstart) loopback mount parts of each of the ISOs into /tftpboot.


When you've got to manage all this other stuff in /etc/vfstab anyway, 
it's easier to manage ZFS there too. I don't see it as a hardship, and I 
don't see the value of doing it in ZFS, to be honest (unless every 
filesystem you have is in ZFS, maybe).


The same goes for sharing this stuff through NFS. Since the LOFI mounts 
are separate filesystems, I have to share them with share (or sharemgr), 
and it's easier to share the ZFS directories through those commands at 
the same time.


I must be missing something, but I'm not sure I get the rationale behind 
duplicating all this admin stuff inside ZFS.


 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] feature proposal

2009-07-29 Thread Kyle McDonald

Andriy Gapon wrote:

What do you think about the following feature?

Subdirectory is automatically a new filesystem property - an administrator 
turns
on this magic property of a filesystem, after that every mkdir *in the root* of
that filesystem creates a new filesystem. The new filesystems have
default/inherited properties except for the magic property which is off.

Right now I see this as being mostly useful for /home. Main benefit in this case
is that various user administration tools can work unmodified and do the right
thing when an administrator wants a policy of a separate fs per user
But I am sure that there could be other interesting uses for this.

  
But now that quotas are working properly, Why would you want to continue 
the hack of 1 FS per user?


I'm seriously curious here. In my view it's just more work: more 
cluttered zfs list and share output, and a lot less straightforward and 
simple too.

Why bother? What's the benefit?
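
To be concrete about what I mean by quotas working now - one home
filesystem with per-user quotas, something like (names and sizes made up):

  zfs create tank/home
  zfs set userquota@dick=20G tank/home
  zfs set userquota@jane=20G tank/home
  zfs userspace tank/home      # per-user usage report

One filesystem, one share, and the per-user limits still hold.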

-Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] feature proposal

2009-07-29 Thread Kyle McDonald

Darren J Moffat wrote:

Kyle McDonald wrote:

Andriy Gapon wrote:

What do you think about the following feature?

Subdirectory is automatically a new filesystem property - an 
administrator turns
on this magic property of a filesystem, after that every mkdir *in 
the root* of

that filesystem creates a new filesystem. The new filesystems have
default/inherited properties except for the magic property which is 
off.


Right now I see this as being mostly useful for /home. Main benefit 
in this case
is that various user administration tools can work unmodified and do 
the right

thing when an administrator wants a policy of a separate fs per user
But I am sure that there could be other interesting uses for this.

  
But now that quotas are working properly, Why would you want to 
continue the hack of 1 FS per user?


hack ?  Different usage cases!


Why bother? What's the benefit?


The benefit is that users can control their own snapshot policy, they 
can create and destroy their own sub datasets, send and recv them etc.

We can also delegate specific properties to users if we want as well.

This is exactly how I have the builds area setup on our ONNV build 
machines for the Solaris security team.Sure the output of zfs list 
is long - but I don't care about that.
I can imagine a use for a builds area. 1 FS per build - I don't know. But why 
link it to the mkdir? Why not make the build scripts do the zfs create 
outright?
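
And if the point is letting users drive their own snapshots and datasets,
delegation already covers that without any mkdir magic; something like
(user and dataset names made up):

  zfs create tank/builds/dick
  zfs allow dick create,destroy,mount,snapshot,send,receive tank/builds/dick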


When encryption comes along having a separate filesystem per user is 
an useful deployment case because it means we can deploy with separate 
keys for each user (granted may be less interesting if they only 
access their home dir over NFS/CIFS but still useful).  I have a 
prototype PAM module
that uses the users login password as the ZFS dataset wrapping key and 
keeps that in sync with the users login password on password change.


Encryption is an interesting case. User Snapshots I'd need to think 
about more.

Couldn't the other properties be delegated on directories?

Maybe I'm just getting old. ;) I still think having the zpool not 
automatically include a filesystem, and having ZFS containers was a 
useful concept. And I still use share (and now sharemgr) to manage my 
shares, and not ZFS share. Oh well. :)


 -Kyle



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-24 Thread Kyle McDonald

Tristan Ball wrote:

It just so happens I have one of the 128G and two of the 32G versions in
my drawer, waiting to go into our DR disk array when it arrives. 

  

Hi Tristan,

Just so I can be clear, what model/brand are the drives you were testing?

 -Kyle


I dropped the 128G into a spare Dell 745 (2GB ram) and used a Ubuntu
liveCD to run some simple iozone tests on it. I had some stability
issues with Iozone crashing however I did get some results...

Attached are what I've got. I intended to do two sets of tests, one for
each of sequential reads, writes, and a random IO mix. I also wanted
to do a second set of tests, running a streaming read or streaming write
in parallel with the random IO mix, as I understand many SSD's have
trouble with those kind of workloads.

As it turns out, so did my test PC. :-) 


I've used 8K IO sizes for all the stage one tests - I know I might get
it to go faster with a larger size, but I like to know how well systems
will do when I treat them badly!

The Stage_1_Ops_thru_run is interesting. 2000+ ops/sec on random writes,
5000 on reads.


The Streaming write load and random over writes were started at the
same time - although I didn't see which one finished first, so it's
possible that the stream finished first and allowed the random run to
finish strong. Basically take these numbers with several large grains of
salt!

Interestingly, the random IO mix doesn't slow down much, but the
streaming writes are hurt a lot.

Regards,
Tristan.



-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of thomas
Sent: Friday, 24 July 2009 5:23 AM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SSD's and ZFS...

  

I think it is a great idea, assuming the SSD has good write


performance.
  

This one claims up to 230MB/s read and 180MB/s write and it's only


$196.
  

http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393

Compared to this one (250MB/s read and 170MB/s write) which is $699.

Are those claims really trustworthy? They sound too good to be true!




MB/s numbers are not a good indication of performance. What you should
pay attention to are usually random IOPS write and read. They tend to
correlate a bit, but those numbers on newegg are probably just best case
from the manufacturer.

In the world of consumer grade SSDs, Intel has crushed everyone on IOPS
performance.. but the other manufacturers are starting to catch up a
bit.
  





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The importance of ECC RAM for ZFS

2009-07-24 Thread Kyle McDonald

Michael McCandless wrote:

I've read in numerous threads that it's important to use ECC RAM in a
ZFS file server.

My question is: is there any technical reason, in ZFS's design, that
makes it particularly important for ZFS to require ECC RAM?
  
I think, basically the idea is, that if you're going to use ZFS to 
protect your data from this sort of thing through the path to the stable 
storage, then it seems  like a shame (or a waste?)  not to equally 
protect the data both before it's given to ZFS for writing, and after 
ZFS reads it back and returns it to you.


 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog writing patterns vs SSD tech. (was SSD's and ZFS...)

2009-07-24 Thread Kyle McDonald

Bob Friesenhahn wrote:


Of course, it is my understanding that the zfs slog is written 
sequentially so perhaps this applies instead:


Actually, reading up on these drives I've started to wonder about the 
slog writing pattern. While these drives do seem to do a great job at 
random writes, most of the promise shows at sequential writes, so does 
the slog attempt to write sequentially through the space given to it?


Also, there are all sorts of analyses out there about how the drives 
always attempt to write new data to the pages and blocks they know are 
empty, since they can't overwrite one page (usually 4k) without erasing 
the whole (512k) block the page is in. This leads to a drop in write 
performance after all the space (both the space you paid for, and any 
extra space the vendor put in to work around this issue) has been used 
once. This shows up with regular filesystems because when a file is 
deleted the drive only sees a new (over)write of some metadata so the 
OS can record that the file is gone, but the drive is never told that 
the blocks the file was occupying are now free and can be pre-erased at 
the drive's convenience.


The Drive vendors have come up with a new TRIM command, which some OS's 
(Win7) are talking about supporting in their Filesystems. Obviously for 
use only as an sLog device ZFS itself doesn't need (until people start 
using SSD's as regular pool devices) to know how to use TRIM, but I 
would think that the slog code would need to use it in order to keep 
write speeds up and latencies down. No?


If so, what's the current concensus, thoughts, plans, etc. on if and 
when TRIM will be usable in Solaris/ZFS?


-Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog writing patterns vs SSD tech.

2009-07-24 Thread Kyle McDonald

Miles Nordin wrote:

km == Kyle McDonald kmcdon...@egenera.com writes:



km hese drives do seem to do a great job at random writes, most
km of the promise shows at sequential writes, so Does the slog
km attempt to write sequentially through the space given to it?


thwack NO!  Everyone who is using the code, writing the code, and
building the systems says, io/s is the number that matters.  If you've
got some experience otherwise, fine, odd things turn up all the time.
but AFAICT the consensus is clear right now.

  
Yeah, I know. I get it. I screwed up and used the wrong term. OK? I 
agree with you.


Still when all the previously erased pages are gone, write latencies go 
up (drastically - in some cases worse than a spinning HD,) and io/s goes 
down. So what I really wanted to get into was the question below.

km they can't overwrite one page (usually 4k) without erasing the
km whole (512k) block the page is in.

don't presume to get into the business of their black box so far.
  

I'm not.

Guys like this are:

http://www.anandtech.com/storage/showdoc.aspx?i=3531p=8

That's almost certainly not what they do.  They probably do COW like
ZFS and (yaffs and jffs2 and ubifs), so they will do the 4k writes to
partly-empty pages until the page is full.  In the background a gc
thread will evacuate and rewrite pages that have become spattered with
unreferenced sectors. 
That's where the problem comes in. They have no knowledge of the upper 
filesystem, and don't know what previously written blocks are still 
referenced. When the OS FS rewrites a directory to remove a pointer to 
the string of blocks the file used to use, and updates its list of 
which LBA sectors are now free vs. in use, it probably happens pretty 
much exactly like you say.


But that doesn't let the SSD mark the sectors the file used as 
unreferenced, so the gc thread can't evacuate them ahead of time and 
add them to the empty page pool.

km The Drive vendors have come up with a new TRIM command, which
km some OS's (Win7) are talking about supporting in their
km Filesystems.

this would be useful for VM's with thin-provisioned disks, too.
  
True. Keeping or Putting the 'holes' back in the 'holey' disk files when 
the VM frees up space would be very useful.

km I would think that the slog code would need to use it in order
km to keep write speeds up and latencies down. No?

read the goofy gamer site review please.  No, not with the latest
intel firmware, it's not needed.
  
I did read at least one review that compared old and new firmware on the 
Intel M model. In that I'm pretty sure they still saw a performance hit
(in latency) when the entire drive had been written to. It may have 
taken longer to hit, and it may have not been as drastic but it was 
still there.


Which review are you talking about?

So what if Intel has fixed it. Not everyone is going to use the intel 
drives. If the TRIM command (assuming it can help at all) can keep the 
other brands and models performing close to how they performed when new, 
then I'd say it's useful in the ZFS slogs too - Just because one vendor 
might have made it unnecessary, doesn't mean it is for everyone.


Does it?

 -Kyle








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-23 Thread Kyle McDonald

F. Wessels wrote:

Thanks posting this solution.

But I would like to point out that bug 6574286 removing a slog doesn't work 
still isn't resolved. A solution is under it's way, according to George Wilson. But in 
the mean time, IF something happens you might be in a lot of trouble. Even without some 
unfortunate incident you cannot for example export your data pool, pull the drives and 
leave the root pool.
  
In my case the slog slice wouldn't be the slog for the root pool, it 
would be the slog for a second data pool.


If the device went bad, I'd have to replace it, true. But if the device 
goes bad, then so did a good part of my root pool, and I'd have to 
replace that too.

Don't get me wrong I would like such a setup a lot. But I'm not going to 
implement it until the slog can be removed or the pool be imported without the 
slog.

In the mean time can someone confirm that in such a case, root pool and zil in 
two slices and mirrored, that the write cache can be enabled with format? Only 
zfs is using the disk, but perhaps I'm wrong on this. There have been post's 
regarding enabling the write_cache. But I couldn't find a conclusive answer for 
the above scenario.

  
When you have just the root pool on a disk, ZFS won't enable the write 
cache by default. I think you can manually enable it but I don't know 
the dangers. Adding the slog shouldn't be any different. To be honest, I 
don't know how closely the write caching on a SSD matches what a moving 
disk has.


 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-23 Thread Kyle McDonald

Brian Hechinger wrote:

On Thu, Jul 23, 2009 at 10:28:38AM -0400, Kyle McDonald wrote:
  
 
  
In my case the slog slice wouldn't be the slog for the root pool, it 
would be the slog for a second data pool.



I didn't think you could add a slog to the root pool anyway.  Or has that
changed in recent builds?  I'm a little behind on my SXCE versions, been
too busy to keep up. :)
  
I don't know either. It's not really what I was looking to do so I never 
even thought of it. :)
  
When you have just the root pool on a disk, ZFS won't enable the write 
cache by default.



I don't think this is limited to root pools.  None of my pools (root or
non-root) seem to have the write cache enabled.  Now that I think about
it, all my disks are hidden behind an LSI1078 controller so I'm not
sure what sort of impact that would have on the situation.

  
When you give the full disk (deivce name 'cWtXdY' - with no 'sZ' ) then 
ZFS  will usually instruct the drive to enable write caching.
You're right though if youre drives are really something like single 
drive RAID 0 LUNs, then who knows what happens.
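
If you do want to poke at it by hand, format's expert mode has a cache
menu; from memory the session looks roughly like this (disk name made up):

  format -e c1t2d0
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> enable

Whether that menu even shows up depends on what driver the disk sits
behind.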


 -Kyle


-brian
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-23 Thread Kyle McDonald

Richard Elling wrote:


On Jul 23, 2009, at 7:28 AM, Kyle McDonald wrote:


F. Wessels wrote:

Thanks posting this solution.

But I would like to point out that bug 6574286 removing a slog 
doesn't work still isn't resolved. A solution is under it's way, 
according to George Wilson. But in the mean time, IF something 
happens you might be in a lot of trouble. Even without some 
unfortunate incident you cannot for example export your data pool, 
pull the drives and leave the root pool.


In my case the slog slice wouldn't be the slog for the root pool, it 
would be the slog for a second data pool.


If the device went bad, I'd have to replace it, true. But if the 
device goes bad, then so did a good part of my root pool, and I'd 
have to replace that too.


Mirror the slog to match your mirrored root pool.
Yep. That was the plan. I was just explaining that not being able to 
remove the slog wasn't an issue for me since I planned on always having 
that device available.


I was more curious about whether there were any downsides to sharing 
the SSD between the root pool and the slog?


Thanks for the valuable input, Richard.

 -Kyle



Don't get me wrong I would like such a setup a lot. But I'm not 
going to implement it until the slog can be removed or the pool be 
imported without the slog.


In the mean time can someone confirm that in such a case, root pool 
and zil in two slices and mirrored, that the write cache can be 
enabled with format? Only zfs is using the disk, but perhaps I'm 
wrong on this. There have been post's regarding enabling the 
write_cache. But I couldn't find a conclusive answer for the above 
scenario.



When you have just the root pool on a disk, ZFS won't enable the 
write cache by default. I think you can manually enable it but I 
don't know the dangers. Adding the slog shouldn't be any different. 
To be honest, I don't know how closely the write caching on a SSD 
matches what a moving disk has.


Write caches only help hard disks.  Most (all?) SSDs do not have 
volatile write buffers.
Volatile write buffers are another bad thing you can forget when you 
go to SSDs :-)

 -- richard



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-23 Thread Kyle McDonald

Richard Elling wrote:

On Jul 23, 2009, at 9:37 AM, Kyle McDonald wrote:


Richard Elling wrote:


On Jul 23, 2009, at 7:28 AM, Kyle McDonald wrote:


F. Wessels wrote:

Thanks posting this solution.

But I would like to point out that bug 6574286 removing a slog 
doesn't work still isn't resolved. A solution is under it's way, 
according to George Wilson. But in the mean time, IF something 
happens you might be in a lot of trouble. Even without some 
unfortunate incident you cannot for example export your data pool, 
pull the drives and leave the root pool.


In my case the slog slice wouldn't be the slog for the root pool, 
it would be the slog for a second data pool.


If the device went bad, I'd have to replace it, true. But if the 
device goes bad, then so did a good part of my root pool, and I'd 
have to replace that too.


Mirror the slog to match your mirrored root pool.
Yep. That was the plan. I was just explaining that not being able to 
remove the slog wasn't an issue for me since I planned on always 
having that device available.


I was more curious about whether there were any downsides to 
sharing the SSD between the root pool and the slog?


I think it is a great idea, assuming the SSD has good write performance.

This one claims up to 230MB/s read and 180MB/s write and it's only $196.

http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393

Compared to this one (250MB/s read and 170MB/s write) which is $699.

Are those claims really trustworthy? They sound too good to be true!

 -Kyle


 -- richard



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-23 Thread Kyle McDonald

Kyle McDonald wrote:

Richard Elling wrote:

On Jul 23, 2009, at 9:37 AM, Kyle McDonald wrote:


Richard Elling wrote:


On Jul 23, 2009, at 7:28 AM, Kyle McDonald wrote:


F. Wessels wrote:

Thanks posting this solution.

But I would like to point out that bug 6574286 removing a slog 
doesn't work still isn't resolved. A solution is under it's way, 
according to George Wilson. But in the mean time, IF something 
happens you might be in a lot of trouble. Even without some 
unfortunate incident you cannot for example export your data 
pool, pull the drives and leave the root pool.


In my case the slog slice wouldn't be the slog for the root pool, 
it would be the slog for a second data pool.


If the device went bad, I'd have to replace it, true. But if the 
device goes bad, then so did a good part of my root pool, and I'd 
have to replace that too.


Mirror the slog to match your mirrored root pool.
Yep. That was the plan. I was just explaining that not being able to 
remove the slog wasn't an issue for me since I planned on always 
having that device available.


I was more curious about whether there were any downsides to 
sharing the SSD between the root pool and the slog?


I think it is a great idea, assuming the SSD has good write performance.

This one claims up to 230MB/s read and 180MB/s write and it's only $196.

http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393

Compared to this one (250MB/s read and 170MB/s write) which is $699.


Oops. Forgot the link:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820167014

Are those claims really trustworthy? They sound too good to be true!

 -Kyle


 -- richard





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-23 Thread Kyle McDonald

Greg Mason wrote:

I think it is a great idea, assuming the SSD has good write performance.


This one claims up to 230MB/s read and 180MB/s write and it's only $196.

http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393

Compared to this one (250MB/s read and 170MB/s write) which is $699.

  

Oops. Forgot the link:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820167014


Are those claims really trustworthy? They sound too good to be true!

 -Kyle
  


Kyle-

The less expensive SSD is an MLC device. The Intel SSD is an SLC device.
That right there accounts for the cost difference. The SLC device (Intel
X25-E) will last quite a bit longer than the MLC device.
  
I understand that. That's why I picked that one to compare. It was my 
understanding that the MLC drives weren't even close performance-wise to 
the SLC ones. This one seems pretty close. How can that be?


 -Kyle

 
-Greg


  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-23 Thread Kyle McDonald

Adam Sherman wrote:
In the context of a low-volume file server, for a few users, is the 
low-end Intel SSD sufficient?


You're right, it supposedly has less than half the write speed, and 
that probably won't matter for me, but I can't find a 64GB version of it 
for sale, and the 80GB version is over 50% more at $314.


 -Kyle




A.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SSD's and ZFS...

2009-07-22 Thread Kyle McDonald
I've started reading up on this, and I know I have a lot more reading to 
do, but I've already got some questions... :)



I'm not sure yet that it will help for my purposes, but I was 
considering buying 2 SSD's for mirrored boot devices anyway.


My main question is: Can a pair of, say, 60GB SSDs be shared between 
the root pool and an SSD ZIL? 

Can the installer be configured to make the slice for the root pool 
something less than the whole disk, leaving another slice for the 
ZIL? Or would a zVOL in the root pool be a better idea?


I doubt 60GB will leave enough space, but would doing this for the L2ARC 
be useful also?
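
Roughly what I'm picturing, with made-up device names, and assuming the
installer can be told to leave spare slices (s4/s5) on each SSD:

   # zpool status rpool                           (installer-built mirror on c1t0d0s0/c1t1d0s0)
   # zpool add datapool log mirror c1t0d0s4 c1t1d0s4
   # zpool add datapool cache c1t0d0s5            (optional L2ARC, if space allows)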



 -Kyle




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motherboard for home zfs/solaris file server

2009-07-21 Thread Kyle McDonald

chris wrote:

Thanks for your reply.
What if I wrap the ram in a sheet of lead?;-)
(hopefully the lead itself won't be radioactive)

  

I've been looking at the same thing recently.

I found these 4 AM3 motherboard with optional ECC memory support. I don't 
know whether this means ECC works, or ECC memory can be used but ECC will not. Do you?

  
That's a good question. The ASUS specs definitely say unbuffered ECC 
memory is compatible, but until you mentioned it I never thought about 
whether the ECC functionality would actually be used.

Asus  M4N78 SE, Nvidia nForce 720D Chipset, 4xsata
Asus  M4N78-VM, Nvidia GeForce 8200 Chipset, 6xsata, onboard video
Asus  M4N82 Deluxe,  NVIDIA nForce 980a Chipset, 6xsata
Gigabyte  GA-MA770T-UD3P, AMD 770 Chipset, 6xsata
  

I hadn't located the Gigabyte board yet. I'll have to look at that.

The ASUS boards with the AMD chipsets (the models that start with M4A - 
like the M4A79T) are all true AM3 boards - they take DDR3 memory. All 
the nVidia chipset boards (even the 980a one) are AM2+/AM3 boards, and 
(as far as I know) only take DDR2 memory, but that may not matter to you 
since this will only be a server for you. The chipset isn't supposed to 
dictate the memory type (that's up to the CPU), but the MB does need to 
support it in other ways.


DDR3 doesn't appear (in any reviews I've seen) to give much benefits 
with the current processors anyway. What I find more discouraging (since 
I'm trying to build a desktop/workstation) is that when you go to look 
for RAM the only ECC memory available (doesn't matter if it's DDR2 or 3) 
is rated much slower than what is available for non-ECC. For example you 
can find DDR2 at 1066MHz, or even 1200MHz, but the fastest ECC DDR2 you 
can get is 800MHz. It's cheap though, unless you want 4GB DIMMs, then 
it's outrageous!
The 2nd one looks the most promising, and GeForce 8200 seems somewhat supported by solaris except for sound (don't care) and network (can add another card). 
I don't see the 1st or the 2nd one at usa.asus.com. The 3rd is the 
one I've been considering hard lately. In my searching the other brands 
don't seem to support ECC memory at all.


Another thing to remember is the expansion slots. You mentioned putting 
in a SATA controller for more drives, so you'll want to make sure the board 
has a slot that can handle the card you want. If you're not using 
graphics then any board with a single PCI-E x16 slot should handle 
anything. But if you do put in a graphics board you'll want to look at 
what other slots are available. Not many consumer boards have PCI-X 
slots, and only some have PCI-E x4 slots. PCI-E x1 slots are getting 
scarce too. Most of the PCI-E SATA controllers I've seen want a slot at 
least x4, and many are x8.


 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 8 SATA drives ?

2009-06-23 Thread Kyle McDonald

Erik Ableson wrote:


Just a side note on the PERC labelled cards: they don't have a JBOD 
mode so you _have_ to use hardware RAID. This may or may not be an 
issue in your configuration but it does mean that moving disks between 
controllers is no longer possible. The only way to do a pseudo JBOD is 
to create broken RAID 1 volumes which is not ideal.



It won't even let you make single drive RAID 0 LUNs? That's a shame.

The lack of portability is disappointing. The trade-off though is 
battery backed cache if the card supports it.


 -Kyle



Cordialement,

Erik Ableson

+33.6.80.83.58.28
Envoyé depuis mon iPhone

On 23 juin 2009, at 04:33, Eric D. Mudama 
edmud...@bounceswoosh.org wrote:


 On Mon, Jun 22 at 15:46, Miles Nordin wrote:
 edm == Eric D Mudama edmud...@bounceswoosh.org writes:

  edm We bought a Dell T610 as a fileserver, and it comes with an
  edm LSI 1068E based board (PERC6/i SAS).

 which driver attaches to it?

 pciids.sourceforge.net says this is a 1078 board, not a 1068 board.

 please, be careful.  There's too much confusion about these cards.

 Sorry, that may have been confusing.  We have the cheapest storage
 option on the T610, with no onboard cache.  I guess it's called the
 Dell SAS6i/R while they reserve the PERC name for the ones with
 cache.  I had understood that they were basically identical except for
 the cache, but maybe not.

 Anyway, this adapter has worked great for us so far.


 snippet of prtconf -D:


 i86pc (driver name: rootnex)
pci, instance #0 (driver name: npe)
pci8086,3411, instance #6 (driver name: pcie_pci)
pci1028,1f10, instance #0 (driver name: mpt)
sd, instance #1 (driver name: sd)
sd, instance #6 (driver name: sd)
sd, instance #7 (driver name: sd)
sd, instance #2 (driver name: sd)
sd, instance #4 (driver name: sd)
sd, instance #5 (driver name: sd)


 For this board the mpt driver is being used, and here's the prtconf
 -pv info:


  Node 0x1f
assigned-addresses:  
 81020010..fc00..0100.83020014..

 df2ec000..4000.8302001c.
 .df2f..0001
reg:  
 
0002.....01020010....0100.03020014....4000.0302001c.

 ...0001
compatible: 'pciex1000,58.1028.1f10.8' + 'pciex1000,58.1028.1f10' 
 + 'pciex1000,58.8' + 'pciex1000,58' + 'pciexclass,01' + 
 'pciexclass,0100' + 'pci1000,58.1028.1f10.8' + 
 'pci1000,58.1028.1f10' + 'pci1028,1f10' + 'pci1000,58.8' + 
 'pci1000,58' + 'pciclass,01' + 'pciclass,0100'

model:  'SCSI bus controller'
power-consumption:  0001.0001
devsel-speed:  
interrupts:  0001
subsystem-vendor-id:  1028
subsystem-id:  1f10
unit-address:  '0'
class-code:  0001
revision-id:  0008
vendor-id:  1000
device-id:  0058
pcie-capid-pointer:  0068
pcie-capid-reg:  0001
name:  'pci1028,1f10'


 --eric


 --
 Eric D. Mudama
 edmud...@mail.bounceswoosh.org

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS attributes for CIFS.

2009-06-22 Thread Kyle McDonald

Hi all,

I'm setting up a new fileserver, and while I'm not planning on enabling 
CIFS right away, I know I will in the future.


I know there are several ZFS properties or attributes that affect how 
CIFS behaves. I seem to recall that at least one of those needs to be 
set early (like when the filesystem [or pool?] is created?


Which properties might those be?

Where can I find more info on the CIFS/ZFS interaction?
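
(For example, I gather casesensitivity is one of those create-time-only
properties, so something like:

   # zfs create -o casesensitivity=mixed -o nbmand=on tank/winshare

is what I'd expect to need, but I'd like to confirm that, and whether
there are others.)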

 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread Kyle McDonald

Bob Friesenhahn wrote:

On Mon, 15 Jun 2009, Thommy M. wrote:


In most cases compression is not desireable.  It consumes CPU and
results in uneven system performance.


IIRC there was a blog about I/O performance with ZFS stating that it was
faster with compression ON as it didn't have to wait for so much data
from the disks and that the CPU was fast at unpacking data. But sure, it
uses more CPU (and probably memory).


I'll believe this when I see it. :-)

With really slow disks and a fast CPU it is possible that reading data 
the first time is faster.  However, Solaris is really good at caching 
data so any often-accessed data is highly likely to be cached and 
therefore read just one time.

One thing I'm curious about...

When reading compressed data, is it cached before or after it is 
uncompressed?


If before, then while you've saved re-reading it from the disk, there is 
still (redundant) overhead for uncompressing it over and over.


If the uncompressed data is cached, then I agree it sounds like a total 
win for read-mostly filesystems.


  -Kyle

  The main point of using compression for the root pool would be so 
that the OS can fit on an abnormally small device such as a FLASH 
disk.  I would use it for a read-mostly device or an archive (backup) 
device.


On desktop systems the influence of compression on desktop response is 
quite noticeable when writing, even with very fast CPUs and multiple 
cores.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-16 Thread Kyle McDonald

Darren J Moffat wrote:

Kyle McDonald wrote:

Bob Friesenhahn wrote:

On Mon, 15 Jun 2009, Thommy M. wrote:


In most cases compression is not desireable.  It consumes CPU and
results in uneven system performance.


IIRC there was a blog about I/O performance with ZFS stating that 
it was

faster with compression ON as it didn't have to wait for so much data
from the disks and that the CPU was fast at unpacking data. But 
sure, it

uses more CPU (and probably memory).


I'll believe this when I see it. :-)

With really slow disks and a fast CPU it is possible that reading 
data the first time is faster.  However, Solaris is really good at 
caching data so any often-accessed data is highly likely to be 
cached and therefore read just one time.

One thing I'm curious about...

When reading compressed data, is it cached before or after it is 
uncompressed?


The decompressed (and decrypted) data is what is cached in memory.

Currently the L2ARC stores decompressed (but encrypted) data on the 
cache devices.


So the cache saves not only the time to access the disk but also the CPU 
time to decompress. Given this, I think it could be a big win.


 -Kyle



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Much room for improvement for zfs destroy -r ...

2009-04-17 Thread Kyle McDonald

Joep Vesseur wrote:

All,

I was wondering why zfs destroy -r is so excruciatingly slow compared to
parallel destroys.

  

 SNIP

while a little handy-work with

  # time for i in `zfs list | awk '/blub2\// {print $1}'` ;\
   do ( zfs destroy $i & ) ; done

yields

  real0m8.191s
  user0m6.037s
  sys 0m16.096s

An 38.8 time improvement (at the cost of some extra CPU load)

Why is there so much overhead in the sequential case? Or have I oversimplified
the issues at hand with this simple test?

  
One reason is that you're not timing how long it takes for the destroys 
to complete. You're only timing how long it takes to start all the jobs 
in the background.
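
To time the whole thing you'd need to wait for the background jobs,
something along these lines (untested sketch):

  # time sh -c 'for i in `zfs list -H -o name | grep "^blub2/"` ; do zfs destroy $i & done ; wait'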


 -Kyle


Joep
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE for two-level ZFS

2009-02-20 Thread Kyle McDonald

On 2/20/2009 9:33 AM, Gary Mills wrote:

On Thu, Feb 19, 2009 at 09:59:01AM -0800, Richard Elling wrote:
   

Gary Mills wrote:
 

Should I file an RFE for this addition to ZFS?  The concept would be
to run ZFS on a file server, exporting storage to an application
server where ZFS also runs on top of that storage.  All storage
management would take place on the file server, where the physical
disks reside.  The application server would still perform end-to-end
error checking but would notify the file server when it detected an
error.
   

Currently, this is done as a retry. But retries can suffer from cached
badness.
 


So, ZFS on the application server would retry the read from the
storage server.  This would be the same as it does from a physical
disk, I presume.  However, if the checksum failure persisted, it
would declare an error.  That's where the RFE comes in, because it
would then notify the file server to utilize its redundant data
source.  Perhaps this could be done as part of the retry, using
existing protocols.
   
I'm no expert, but I think not only would this have been taken care of 
by the retry, but if the error is being introduced by any HW or SW on 
the storage server's end, then the storage server will already be 
checking its checksums.


The main place the new errors could be introduced will be after the data 
left ZFS's control, heading out the network interface across the wires, 
and into the application server... While not impossible for the same 
error to creep in on every retry, I think it'd be rarer than different 
errors each time, and the retries would have a very good chance of 
eventually getting good copies of every block.


Even if the application server could notify the storage server of the 
problem, there isn't anything more the storage server can do. If there 
was a problem that its redundancy could fix, its checksums would have 
identified that, and it would have fixed it even before the data was 
sent to the application server.
   

There are several advantages to this configuration.  One current
recommendation is to export raw disks from the file server.  Some
storage devices, including I assume Sun's 7000 series, are unable to
do this.  Another is to build two RAID devices on the file server and
to mirror them with ZFS on the application server.  This is also
sub-optimal as it doubles the space requirement and still does not
take full advantage of ZFS error checking.  Splitting the
responsibilities works around these problems
   

I'm not convinced, but here is how you can change my mind.

1. Determine which faults you are trying to recover from.
 


I don't think this has been clearly identified, except that they are
``those faults that are only detected by end-to-end checksums''.

   
Adding ZFS on the appserver will add a new set of checksums for the 
data's journey over the wire and back again. Nothing will be checking 
those checksums on the storage server to see if corruption happened to 
writes on the way there (which might be a place for improvement - but 
I'm not sure how that can even be done), but those same checksums will 
be sent back to the appserver on a read, so the appserver will be able 
to determine the problem then. Of course if the corruption happened 
while sending the write, then no amount of retries will help. Only ZFS 
redundancy on the app server can (currently) help with that.


  -Kyle


2. Prioritize these faults based on their observability, impact,
and rate.
 


Perhaps the project should be to extend end-to-end checksums in
situations that don't have end-to-end redundancy.  Redundancy at the
storage layer would be required, of course.

   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Kyle McDonald

On 2/13/2009 5:58 AM, Ross wrote:

huh?  but that looses the convenience of USB.

I've used USB drives without problems at all, just remember to zpool export 
them before you unplug.
   
I think there is a subcommand of cfgadm you should run to notify 
Solaris that you intend to unplug the device. I don't use USB, and my 
familiarity with cfgadm (for FC and SCSI) is limited.
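
From memory it's something like:

   # cfgadm -al                       (find the usbN/M attachment point for the drive)
   # cfgadm -c unconfigure usb0/3     (hypothetical attachment point)

but check the cfgadm_usb man page before trusting that.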


  -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Kyle McDonald

On 2/10/2009 3:37 PM, D. Eckert wrote:

(...)
Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0
spanning removable drives, you probably wouldn't have been so lucky.
(...)

we are not talking about a RAID 5 array or an LVM. We are talking about a 
single FS setup as a zpool over the entire available disk space on an external 
USB HDD.

   
Ok then the parallel on linux would still be something like running 
reiserfs on a single disk LVM (which I think redhat still installs with 
by default?)


And my real point is that with ZFS even though you only want a single FS 
on a single disk, you can't treat it like the LVM/RAID level of software 
isn't there just because you only have one disk. It is still there, and 
you need to understand its commands and how to use them when you want 
to disconnect the disk.

I decided to do so due to the read/write speed performance of zfs comparing to 
UFS/ReiserFS.

   
That's fine. If you have reasons to use a single disk that option is 
still available. Again that doesn't mean you can treat it like a FS on a 
raw device.


   -Kyle


Regards,

DE.
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Kyle McDonald

On 2/10/2009 4:48 PM, Roman V. Shaposhnik wrote:

On Wed, 2009-02-11 at 09:49 +1300, Ian Collins wrote:
   

These posts do sound like someone who is blaming their parents after
breaking a new toy before reading the instructions.
 


It looks like there's a serious denial of the fact that bad things
do happen to even the best of people on this thread.
   

No one is denying that that can happen.

However there are many things that were done here that increased the 
chance (or things that weren't done that could have decreased the 
chance) of this happening.


I'm not saying the OP should have known better. Everyone learns from 
mistakes. I'm just trying to explain to him both why what happened 
might have happened, and what he could have done that might have 
avoided it.


Is it still possible that something like this could have happened? 
Sure. Should there be a better way to handle it when it does? You bet!


  -Kyle


Thanks,
Roman.

   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] where did my 400GB space go?

2009-02-11 Thread Kyle McDonald

On 2/11/2009 12:11 PM, Bob Friesenhahn wrote:


My understanding is that 1TB is the maximum bootable disk size since 
EFI boot is not supported.  It is good that you were allowed to use 
the larger disk, even if its usable space is truncated.



I don't dispute that, but I don't understand it either.

If EFI is not being used (ZFS boot doesn't use EFI on the root pool 
since the BIOS doesn't (usually) understand the EFI label) then what is 
it that has a 1TB limit?


I believe Linux (and I'd guess NTFS) can use the whole disk past 1TB, so 
my guess is the old fashioned PC/DOS/FDisk partition tables can handle 
sizes over 1TB now (though I know they couldn't in the past.)


Anyone know what the bottleneck is?

  -Kyle


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Kyle McDonald

On 2/11/2009 12:35 PM, Toby Thain wrote:


On 11-Feb-09, at 11:19 AM, Tim wrote:


...
And yes, I do keep checksums of all the data sitting on them and 
periodically check it.  So, for all of your ranting and raving, the 
fact remains even a *crappy* filesystem like fat32 manages to handle 
a hot unplug without any prior notice without going belly up.


By chance, certainly not design.
Yep. I've never unplugged a USB drive on purpose, but I have left a 
drive plugged into the docking station, hibernated Windows XP 
Professional, undocked the laptop, and then woken it up later undocked. 
It routinely would pop up windows saying that a 'delayed write' was not 
successful on the now missing drive.


I've always counted myself lucky that any new data written to that drive 
was written long, long before I hibernated, because I have yet to find any 
problems with that data (but I don't read it very often, if at all). But 
it is luck only!


  -Kyle



--Toby




--Tim


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] where did my 400GB space go?

2009-02-11 Thread Kyle McDonald

On 2/11/2009 12:57 PM, Tomas Ögren wrote:

On 11 February, 2009 - Kyle McDonald sent me these 1,2K bytes:

   

On 2/11/2009 12:11 PM, Bob Friesenhahn wrote:
 

My understanding is that 1TB is the maximum bootable disk size since
EFI boot is not supported.  It is good that you were allowed to use
the larger disk, even if its usable space is truncated.

   

I don't dispute that, but I don't understand it either.

If EFI is not being used (ZFS boot doesn't use EFI on the root pool
since the BIOS doesn't (usually) uinderstand the EFI label) then what is
it that has a 1TB limit?
 


SMI/VTOC, the original label (partition table format:ish) system used.

EFI can use larger, but EFI tables for boot isn't supported right now.
I guess you should be able to put the rpool on a 50GB slice or so, then
put the other 1450GB in an EFI data pool..
   
OK, so while the fdisk Solaris partition could be made to use the whole 
disk, the solaris label/vtoc inside the solaris fdisk partition can only 
use 1TB of that.


Since you can't mix EFI and FDisk partition tables, and you can't have 
more than one Solaris fdisk partition (that I'm aware of anyway) it 
looks like 1TB is all you can give Solaris at the moment.


But you could give that other 400GB to some other OS or Filesystem I 
suppose.


Since EFI boot requires (IIRC) x86 HW vendors to improve the BIOS 
support, EFI boot isn't going to be useful for a while even if it 
appeared tomorrow. Is there any hope, or plan, to improve/fix the Solaris 
VTOC?


  -Kyle




It's just that you can't have the rpool > 1TB due to boot limits.

/Tomas
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] where did my 400GB space go?

2009-02-11 Thread Kyle McDonald

On 2/11/2009 1:03 PM, Kyle McDonald wrote:


Since you can't mix EFI and FDisk partition tables, and you can't have 
more than one Solaris fdisk partition (that I'm aware of anyway) it 
looks like 1TB is all you can give Solaris at the moment.

I should have qualified that with "If you need to boot from it."

Of course if you don't need to boot from it, Solaris can just put an EFI 
label  on it and use the whole thing.


If it were me I'd find some small drive to put the OS on and save that 
nice big drive for a second (non root) pool that can use the whole 
thing. Also since those drives are generally under $200 now, I'd 
probably pick up a second and mirror the 2.


 -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] where did my 400GB space go?

2009-02-11 Thread Kyle McDonald

On 2/11/2009 1:50 PM, Richard Elling wrote:


Solaris can now (as of b105) use extended partitions.
http://www.opensolaris.org/os/community/on/flag-days/pages/2008120301/


That's interesting, but I'm not sure how it helps.

It's my understanding that Solaris doesn't like it if more than one of 
the fdisk partitions (primary or extended) are of type 'Solaris[2]'


Has that changed?

  -Kyle


-- richard



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-10 Thread Kyle McDonald

On 2/10/2009 2:50 PM, D. Eckert wrote:

(..)
Dave made a mistake pulling out the drives with out exporting them first.
For sure also UFS/XFS/EXT4/.. doesn't like that kind of operations but only 
with ZFS you risk to loose ALL your data.
that's the point!
(...)

I did that many times after performing the umount cmd with ufs/reiserfs 
filesystems on USB external drives. And they never complainted or got corrupted.
   
Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0 
spanning removable drives, you probably wouldn't have been so lucky.


Just because you only create a single ZFS filesystem inside your zpool, 
doesn't mean that when that single filesystem is unmounted it is safe to 
remove the drive. When you consider the extra layer of the zpool (like 
LVM or sw RAID) it's not surprising there are other things you have to 
do before you remove the disk.


  -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-10 Thread Kyle McDonald

On 2/10/2009 2:54 PM, D. Eckert wrote:

I disagree, see posting above.

ZFS just accepts it 2 or 3 times. after that, your data are passed away to 
nirvana for no reason.

And it should be legal, to have an external USB drive with a ZFS. with all 
respect, why should a user always care for redundancy, e. g. setup a mirror on 
a single HDD between the slices??

   
You don't have to have redundancy. But if you don't, then I don't know 
how you can expect the 'repair' features of ZFS to bail you out when 
something bad happens.

This reduces half your available space you have on your drive.
   
Mirroring between slices does more than that. It will ruin your 
performance also. It'd be much better to set 'copies=2', though that 
will still reduce your space by half.
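
For example, something like this (assuming the pool is named usbhdd1):

   # zfs set copies=2 usbhdd1

though note it only applies to data written after the property is set.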


  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Kyle McDonald
Hi Dave,

Having read through the whole thread, I think there are several things 
that could all be adding to your problems, at least some of which are 
not related to ZFS at all.

You mentioned the ZFS docs not warning you about this, and yet I know 
the docs explicitly tell you that:

1. While a ZFS pool that has no redundancy (Mirroring or Parity), like 
yours, can still *detect* errors in the data read from the 
drive, it can't *repair* those errors. Repairing errors requires that 
ZFS be performing (at least) the (top-most level of) Mirroring or Parity 
functions. Since you have no Mirroring or Parity ZFS cannot 
automatically recover this data.

2. As others have said, a zpool can contain many filesystems. 'zfs 
umount' only unmounts a single filesystem. Removing a full pool from a 
machine requires a 'zpool export' no matter what disk technology is 
being used (USB, SCSI, SATA, FC, etc.)  On the new system you would use 
'zpool import' to bring the pool into the new system.

I'm sure this next one is documented by Sun also, though not in the ZFS 
docs, probably in some other part of the system dealing with removable 
devices:

3. In addition, according to Casper's message you need to 'off-line' USB 
(and probably other types too) storage in Solaris (just like in 
Windows) before pulling the plug. This has nothing to do with ZFS. This 
will have corrupted (possibly even past the point of repair) most other 
filesystems also.

Still, I had an idea on something you might try. I don't know how long 
it's been  since you pulled the drive, or what else you've done since.

Which machine is reporting the errors you've shown us? The machine you 
pulled the drives from, or the machine you moved them to? Were you 
successful in importing ('zpool import') the pool on the other machine? This idea 
might work either way, but if you haven't successfully imported it into 
another machine there's probably more of a chance.

If the output is from the machine you pulled them out of, then basically 
that machine still thinks the pool is connected to it, and it thinks the 
one and only disk in the pool is now not responding. In this case the 
errors you see in the tables are the errors from trying to contact a 
drive that no longer exists.

Have you reconnected the disk to the original machine yet? If not I'd 
attempt a 'zpool export' now (though that may not work) and then shut 
the machine down fully, and connect the disk. Then boot it all up. 
Depending on what you've tried to do with this disk to fix the problem 
since it happened I have no idea exactly how the machine will come up.

If you couldn't do the 'zpool export', then the machine will try to 
mount the FS's in the pool on boot. This may or may not work.
If you were successful in doing the export with the disks disconnected, 
then it won't try, and you'll need to 'zpool import' them after the 
machine is booted.

Depending on how the import goes, you might still see errors in the 
'zpool status' output. If so, I know a 'zpool clear' will clear those 
errors, and I doubt it can make the situation any worse than it is now. 
You'd have to give us info about what the machine tells you after this 
before I can advise you more. But (and the experts can correct me if I'm 
wrong) this might 'just work(tm)'.

My theory here is that ZFS may have been successful in keeping the 
state of the (meta)data on the disk consistent after all. The checksum 
and I/O errors listed may be from ZFS trying to access the non-existent 
drive after you removed it, which (in theory) are all bogus errors, and 
don't really point to errors in the data on the drive.

Of course there are many things that all have to be true for this theory 
to turn out to be true. Depending on what has happened to the machines 
and the disks since they were originally unplugged from each other, all 
bets might be off. And then there's the possibility that my idea 
never could work at all. People much more expert than I can chime in on 
that.

  -Kyle




D. Eckert wrote:
 Hi,

 after working for 1 month with ZFS on 2 external USB drives I have 
 experienced, that the all new zfs filesystem is the most unreliable FS I have 
 ever seen.

 Since working with the zfs, I have lost datas from:

 1 80 GB external Drive
 1 1 Terrabyte external Drive

 It is a shame, that zfs has no filesystem management tools for repairing e. 
 g. being able to repair those errors:

NAMESTATE READ WRITE CKSUM
 usbhdd1 ONLINE   0 0 8
   c3t0d0s0  ONLINE   0 0 8

 errors: Permanent errors have been detected in the following files:

 usbhdd1: 0x0


 It is indeed very disappointing that moving USB zpools between computers ends 
 in 90 % with a massive loss of data.

 This is to the not reliable working command zfs umount poolname, even if 
 the output of mount shows you, that the pool is no longer mounted and ist 
 removed from mntab.

 It works only 1 

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Kyle McDonald
D. Eckert wrote:
 too many words wasted, but not a single word, how to restore the data.

 I have read the man pages carefully. But again: there's nothing said, that on 
 USB drives zfs umount pool is not allowed.
   
It is allowed. But it's not enough. You need to read both the 'zpool' 
and 'zfs' manpages. The 'zpool' manpage will tell you that the way to 
move the 'whole pool' to another machine is to run 'zpool export 
poolname'. The 'zpool export' will actually run the 'zfs umount' for 
you, though it's not a problem if it's already been done.

Note, this isn't USB specific, you won't see anything in the docs about 
USB. This condition applies to SCSI and others too. You need to export 
the pool to move it to another machine. If the machine crashed before 
you could export it, 'zpool import -f' on the new machine can help 
import it anyway.
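
In other words, the sequence is roughly:

   # zpool export usbhdd1       (on the old machine, before unplugging)
   ... move the drive ...
   # zpool import usbhdd1       (on the new machine; add -f if it was never cleanly exported)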

With USB, there are probably other commands you'll also need to use to 
notify Solaris that you are going to unplug the drive, just like the 
'Safely remove hardware' tool on Windows. Or you need to remove it only 
when the system is shut down. These commands will be documented 
somewhere else, not in the ZFS docs because they don't apply to just ZFS.
 So how on earth should a simple user know that, if he knows that filesystems 
 properly unmounted using the umount cmd??
   
You need to understand that the filesystems are all contained in a 
'pool' (more than one filesystem can share the disk space in the same 
pool). Unmounting the filesystem *does not* prepare the *pool* to be 
moved from one machine to another.
 And again: Why should a 2 weeks old Seagate HDD suddenly be damaged, if there 
 was no shock, hit or any other event like that?
   
Who knows? Some harddrives are manufactured with problems. Remember that 
ZFS is designed to catch problems that even the ECC on the drive doesn't 
catch. So it's not impossible for it to catch errors even the 
manufacturer's QA tests missed.
 It is of course easier to blame the stupid user instead of having proper 
 documentation and emergency tools to handle that.

   
I believe that between the man pages, the administration docs on the 
web, the best practices pages, and all the other blogs and web pages, 
ZFS is documented well enough. It's not like other filesystems, so 
there is more to learn, and you need to review all the docs, not just 
the ones that cover the operations (like unmount) that you're familiar 
with. Understanding pools (and the commands that manage pools,) is also 
important. Man pages and command references are good when you understand 
the architecture and need to learn about the details of a command you 
know you need to use. It's the other documentation that will fill you in 
you on how the system parts work together, and advise you on the best 
way to setup or do what you want.

As I said in my other email ZFS can't repair errors without a way to 
reconstruct the data. It needs mirroring, parity (or the copies=x 
setting) to be able to repair the data. By setting up a pool with no 
redundancy you gave that ability up. So your email subject line is a little 
backwards, since any 'professional' usage would incorporate redundancy 
(Mirror, Parity, etc.) What you're trying to do is more 'home/hobbyist' 
usage. Though most home/hobbyist users decide to incorporate redundancy for any data they 
really care about.
 The list of malfunctions of SNV Builts gets longer and longer with every 
 version released.

   
I'm sure new things are added every release, but many are also fixed. 
sNV is pre-release software after all. Overall the problems found aren't 
around long, and I believe the list gets shorter as often as it gets 
longer. If you want production level Solaris, ZFS is available in 
Solaris 10.
 e. g. on SNV 107

 - installation script is unable to write properly the boot blocks for grub
 - you choose German locale, but have an American Keyboard style in the gnome 
 (since SNV 103) 
 - in SNV 107 adding these lines to xorg.conf:

 Option XkbRules xorg
 Option XkbModel pc105
 Option XkbLayout de

 (was working in SNV 103)

 lets crash the Xserver.

 - latest Nvidia Driver (Vers. 180) for GeForce 8400M doesn't work with 
 OpenSolaris SNV 107
 - nwam and iwk0: not solved, no DHCP responses

   
Yes there was a major update of the X server sources to catch up to the 
latest(?) X.org release. Workarounds are known, and I bet this will be 
working again in b108 (or not long after.)
 it seems better, to stay focused on having a colourfull gui with hundreds of 
 functions no one needs instead providing a stable core.

   
The core of Solaris is much more stable than anything else I've used. 
The windowing system is not a part of the core of an operating system 
in my book.
 I am looking forward the day booting OpenSolaris and see a greeting Windows 
 XP Logo surrounded by the blue bubbles of OpenSolaris.

   
roll-eyes

Note that sNV (aka SXCE - or Solaris eXpress Community Edition) 

[zfs-discuss] Should I report this as a bug?

2009-02-04 Thread Kyle McDonald
I jumpstarted my machine with sNV b106, and installed with ZFS root/boot.
It left me at a shell prompt in the JumpStart environment, with my ZFS 
root on /a.

I wanted to try out some things that I planned on scripting for the 
JumpStart to run, one of these was creating a new ZFS pool from the 
remaining disks. I looked at the zpool create manpage, and saw this it 
had a -R altroot option, and the exact same thing had just worked for 
me with 'dladm aggr-create' so I thought I'd give that a try.

If the machine had been booted normally, my ZFS root would have been /, 
and a 'zpool create zdata0 ...' would have defaulted to mounting the new 
pool as /zdata0 right next to my ZFS root pool /zroot0. So I expected 
'zpool create -R /a zdata0 ...' to set the default mountpoint for the 
pool to /zdata0 with a temporary altroot=/a.

I gave it a try, and while it created the pool it failed to mount it at 
all. It reported that /a wasn't empty.

'zpool list', and 'zpool get all' show the altroot=/a. But 'zfs  get 
all  zdata0' shows the mountpoint=/a also, not the default of /zdata0.

Am I expecting the wrong thing here? or is this a bug?

 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mount race condition?

2009-01-28 Thread Kyle McDonald

On 1/28/2009 12:16 PM, Nicolas Williams wrote:

On Wed, Jan 28, 2009 at 09:07:06AM -0800, Frank Cusack wrote:
   

On January 28, 2009 9:41:20 AM -0600 Bob Friesenhahn
bfrie...@simple.dallas.tx.us  wrote:
 

On Tue, 27 Jan 2009, Frank Cusack wrote:
   

i was wondering if you have a zfs filesystem that mounts in a subdir
in another zfs filesystem, is there any problem with zfs finding
them in the wrong order and then failing to mount correctly?
 

I have not encountered that problem here and I do have a multilevel mount
heirarchy so I assume that ZFS orders the mounting intelligently.
   

well, the thing is, if the two filesystems are in different pools (let
me repeat the example):
 


Then weird things happen I think.  You run into the same problems if you
want to mix ZFS and non-ZFS filesystems in a mount hierarchy.  You end
up having to set the mountpoint property so the mounts don't happen at
boot and then write a service to mount all the relevant things in order.

   

Or set them all to legacy, and put them in /etc/vfstab.

That's what I do. I have a directory on ZFS that holds ISO images, and a 
peer directory that contains mountpoints for loopback mounts of all 
those ISO's.


I set the ZFS to legacy, and then in /etc/vfstab I put the FS containing 
the ISO files before I list all the ISO's to be mounted.
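
Roughly like this (names made up for the example):

   # zfs set mountpoint=legacy tank/isos

and then in /etc/vfstab the tank/isos line goes first, followed by the 
individual ISO loopback mounts:

   tank/isos   -   /export/isos   zfs   -   yes   -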


  -Kyle


Nico
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS filesystem creation during JumpStart

2008-12-15 Thread Kyle McDonald
Brad Hudson wrote:
 Thanks for the response Peter.  However, I'm not looking to create a 
 different boot environment (bootenv).  I'm actually looking for a way within 
 JumpStart to separate out the ZFS filesystems from a new installation to have 
 better control over quotas and reservations for applications that usually run 
 rampant later.  In particular, I would like better control over the following 
 (e.g. the ability to explicitly create them at install time):

   
Whether you want a bootenv or not, that command (and syntax) is the only 
way to specify to JumpStart both to use ZFS instead of UFS, and to 
customize how it's installed (its option to split out /var is, 
unfortunately, the only FS that can be split at the moment.)

You're not the first to lament over this fact, but I wouldn't hold your 
breath for any improvements, since JumpStart is not really being 
actively improved any longer. Sun is instead focusing on its replacement 
'AI', which is currently being developed and used on OpenSolaris, and I 
believe is intended to replace JS on Sun Solaris at some undefined time 
in the future.

At the moment I don't believe that AI has the features you're looking 
for either. It has quite a few other differences from JS too; if you 
think you'll use it, you should keep tabs on the project pages, and 
mailing lists.
 rpool/opt - /opt
 rpool/usr - /usr
 rpool/var - /var
 rpool/home - /home

 Of the above /home can easily be created post-install, but the others need to 
 have the flexibility of being explicitly called out in the JumpStart profile 
 from the initial install to provide better ZFS accounting/controls.
   
It's not hard to create /opt, or /var/xyz ZFS filesystems, and move 
files into them during post install, or first boot even, then move the 
originals, and set the zfs mountpoints to where the originals are. This 
even gives you the advantage of enabling compression (since all the data 
will be rewritten and thus compressed.) /usr is harder. Might not be 
impossible in a finish script, but probably much harder in a first-boot 
script.
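
For /opt, a rough first-boot sketch (untested, dataset name invented) 
would be something like:

   zfs create -o compression=on -o mountpoint=/opt.new rpool/opt
   (cd /opt && find . -print | cpio -pdum /opt.new)
   mv /opt /opt.orig
   zfs set mountpoint=/opt rpool/opt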

All that said, if you're planning on using live upgrade (or snap upgrade 
on OS) after installation is done, I'm not sure if they'll just 'Do the 
right thing' (or even work at all) with these other filesystems as they 
clone and upgrade the new BE's. My bet would be no.


   -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs is a co-dependent parent and won't let children leave home

2008-12-09 Thread Kyle McDonald
Tim Haley wrote:
 Ross wrote:
   
 While it's good that this is at least possible, that looks horribly 
 complicated to me.  
 Does anybody know if there's any work being done on making it easy to remove 
 obsolete 
 boot environments?
 

 If the clones were promoted at the time of their creation the BEs would 
 stay independent and individually deletable.  Promotes can fail, though, 
 if there is not enough space.

 I was told a little while back when I ran into this myself on an Nevada 
 build where ludelete failed, that beadm *did* promote clones.  This 
 thread appears to be evidence to the contrary.  I think it's a bug, we 
 should either promote immediately on creation, or perhaps beadm destroy 
 could do the promotion behind the covers.
   
If I understand this right, the latter option looks better to me. Why 
consume the disk space before you have to?
What does LU do?

  -Kyle

 -tim

   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Custom Jumpstart and RAID-10 ZFS rpool

2008-10-29 Thread Kyle McDonald
Ian Collins wrote:
 Stephen Le wrote:
   
 Is it possible to create a custom Jumpstart profile to install Nevada
 on a RAID-10 rpool? 
 

 No, simple mirrors only.
   
Though a finish script could add additional simple mirrors to create 
the config his example would have created.
Pretty sure that's still not RAID10 though.

And any files laid down by the installer would be constrained to the 
first mirrored pair; only new files would have a chance at being 
distributed over the additional pairs.

 -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Custom Jumpstart and RAID-10 ZFS rpool

2008-10-29 Thread Kyle McDonald
kristof wrote:
 I don't think this is possible.

 I already tried to add extra vdevs after install, but I got an error message 
 telling me that multiple vdevs for rpool are not allowed.

 K
   
Oh. Ok. Good to know.

I always put all my 'data' diskspace in a separate pool anyway to make 
migration to another host easier, so I haven't actually tried it.

  -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, NFS and Auto Mounting

2008-10-01 Thread Kyle McDonald
Douglas R. Jones wrote:
 4) I change the auto.ws map thusly:
 Integration chekov:/mnt/zfs1/GroupWS/
 Upgradeschekov:/mnt/zfs1/GroupWS/
 cstools chekov:/mnt/zfs1/GroupWS/
 com chekov:/mnt/zfs1/GroupWS

   
This is standard NFS behavior (prior to NFSv4).  Child Filesystems have 
to be mounted on the NFS client explicitly.
As someone else mentioned, NFSv4 has a feature called 'mirror-mounts' 
that is supposed to automate this for you.

For now try this:

Integration   chekov:/mnt/zfs1/GroupWS/
Upgrades  chekov:/mnt/zfs1/GroupWS/
cstools   chekov:/mnt/zfs1/GroupWS/
com  /            chekov:/mnt/zfs1/GroupWS   \
 /Integration chekov:/mnt/zfs1/GroupWS/Integration


Note the \ line continuation character. The last 2 lines are really all 
one line.

If you had had 'Integration' on its own ufs or ext2fs filesystem in the 
past, but still mounted below 'GroupWS' you would have seen this in the 
past. It's not a ZFS thing, or a Solaris thing.

   -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dump and swap

2008-09-24 Thread Kyle McDonald
Darren J Moffat wrote:
 John Cecere wrote:
   
 The man page for dumpadm says this:

 A given ZFS volume cannot be configured for both the swap area and the dump 
 device.

 And indeed when I try to use a zvol as both, I get:

 zvol cannot be used as a swap device and a dump device

 My question is, why not ?
 

 Swap is a normal ZVOL and subject to COW, checksum, compression (and 
 coming soon encryption).

   
Would there be no performance benefits from having swap read/write from 
contiguous preallocated space also?

I do realize that nifty features like encryption might be lost in that 
case, but I'm wondering if there's any performance to be gained?

Then again if you're concerned about performance you need to just buy 
RAM till you stop swapping altogether, huh?

   -Kyle

 Dump ZVOLs are preallocated contiguous space that are written to 
 directly by the ldi_dump routines, they aren't written to by normal ZIO 
 transactions, they aren't checksum'd - the compression is done by the 
 dump layer not by ZFS.  This is needed because when we are writing a 
 crash dump we want as little as possible in IO the stack.

 --
 Darren J Moffat
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Greenbytes/Cypress

2008-09-23 Thread Kyle McDonald
Richard Elling wrote:
 Bob Friesenhahn wrote:
   
 On Tue, 23 Sep 2008, Eric Schrock wrote:

   
 
 See:

 http://www.opensolaris.org/jive/thread.jspa?threadID=73740tstart=0
 
   
 I must apologize for anoying everyone.  When Richard Elling posted the 
 GreenBytes link without saying what it was I completely ignored it. 
 I assumed that it would be Windows-centric content that I can not view 
 since of course I am a dedicated Solaris user.  I see that someone 
 else mentioned that the content does not work for Solaris users.  As a 
 result I ignored the entire discussion as being about some silly 
 animation of gumballs.
   
 

 So you admit that you didn't grok it? :-)
 Dude poured in a big bag of gumballs, but they were de-duped,
 so the gumball machine only had a few gumballs.
   
I won't admit I didn't grok it. I will admit, however (and this may be 
worse), that even though I do have a Windows laptop with QuickTime 
installed, I couldn't get the damn thing to work in Firefox. So I 
couldn't see it.

 -Kyle

  -- richard

   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Error: value too large for defined data type

2008-09-05 Thread Kyle McDonald
Paul Raines wrote:
 I am having a very odd problem on one of our ZFS filesystems

 On certain files, when accessed on the Solaris server itself locally
 where the zfs fs sits, we get an error like the following:

 [EMAIL PROTECTED] # ls -l
 ./README: Value too large for defined data type
 total 36
 -rw-r-   1 mreuter  mreuter 1019 Sep 25  2006 Makefile
 -rw-r-   1 mreuter  mreuter 3185 Feb 22  2000 lcompgre.cc
 -rw-r-   1 mreuter  mreuter 3238 Feb 22  2000 lcompgsh.cc
 -rw-r-   1 mreuter  mreuter 2485 Feb 22  2000 lcompreg.cc
 -rw-r-   1 mreuter  mreuter 2774 Feb 22  2000 lcompshf.cc

   
Do you by chance have /usr/gnu/bin, or any directory with a Gnu 'ls' in 
your path before /usr/bin?
(what does 'which ls' show?)

I've seen this with Gnu ls that I have compiled myself as far back as 
Solaris 9, maybe earlier. By default Gnu ls compiled on Solaris doesn't 
know how to handle large files (and therefore probably 64-bit dates either.)

When I've seen this, explicitly running /usr/bin/ls -l worked fine, and 
I suspect it will for you too.
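
(And if it does turn out to be a home-built GNU ls, rebuilding it with 
largefile support should cure it for good; something like

   $ CFLAGS=`getconf LFS_CFLAGS` ./configure && make

though I haven't tried that against recent coreutils myself.)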

   -Kyle

 The odd thing is that when the filesystem is accessed from our
 Linux boxes over NFS, there is no error access the same file


 vader:complex[84] ls -l
 total 24
 drwxr-x---+ 2 mreuter mreuter8 Sep 25  2006 .
 drwxr-x---+ 5 mreuter mreuter5 Mar 31  1997 ..
 -rw-r-+ 1 mreuter mreuter 3185 Feb 22  2000 lcompgre.cc
 -rw-r-+ 1 mreuter mreuter 3238 Feb 22  2000 lcompgsh.cc
 -rw-r-+ 1 mreuter mreuter 2485 Feb 22  2000 lcompreg.cc
 -rw-r-+ 1 mreuter mreuter 2774 Feb 22  2000 lcompshf.cc
 -rw-r-+ 1 mreuter mreuter 1019 Sep 25  2006 Makefile
 -rw-r-+ 1 mreuter mreuter 1435 Jan  4  1945 README
 vader:mreuter:complex[85] wc README
40  181 1435 README

 The file is obvious small so this is not a large file problem.

 Anyone have an idea what gives?


   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Pools 1+TB

2008-08-28 Thread Kyle McDonald
Daniel Rock wrote:

 Kenny wrote:
 2. c6t600A0B800049F93C030A48B3EA2Cd0 
 SUN-LCSM100_F-0670-931.01GB
/scsi_vhci/[EMAIL PROTECTED]
 3. c6t600A0B800049F93C030D48B3EAB6d0 
 SUN-LCSM100_F-0670-931.01MB
/scsi_vhci/[EMAIL PROTECTED]

 Disk 2: 931GB
 Disk 3: 931MB

 Do you see the difference?

Not just disk 3:

 AVAILABLE DISK SELECTIONS:
3. c6t600A0B800049F93C030D48B3EAB6d0 SUN-LCSM100_F-0670-931.01MB
   /scsi_vhci/[EMAIL PROTECTED]
4. c6t600A0B800049F93C031C48B3EC76d0 SUN-LCSM100_F-0670-931.01MB
   /scsi_vhci/[EMAIL PROTECTED]
8. c6t600A0B800049F93C031048B3EB44d0 SUN-LCSM100_F-0670-931.01MB
   /scsi_vhci/[EMAIL PROTECTED]
   
This all makes sense now, since a RAIDZ (or RAIDZ2) vdev is limited by 
the size of its *smallest* component device.

   -Kyle



 Daniel


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Pools 1+TB

2008-08-28 Thread Kyle McDonald
Kenny wrote:

 How did you determine from the format output the GB vs MB amount??

 Where do you compute 931 GB vs 932 MB from this??

 2. c6t600A0B800049F93C030A48B3EA2Cd0 /scsi_vhci/[EMAIL PROTECTED]

 3. c6t600A0B800049F93C030D48B3EAB6d0
 /scsi_vhci/[EMAIL PROTECTED]

It's in the part you didn't cut and paste:

AVAILABLE DISK SELECTIONS:
3. c6t600A0B800049F93C030D48B3EAB6d0 SUN-LCSM100_F-0670-931.01MB
   /scsi_vhci/[EMAIL PROTECTED]
4. c6t600A0B800049F93C031C48B3EC76d0 SUN-LCSM100_F-0670-931.01MB
   /scsi_vhci/[EMAIL PROTECTED]
8. c6t600A0B800049F93C031048B3EB44d0 SUN-LCSM100_F-0670-931.01MB
   /scsi_vhci/[EMAIL PROTECTED]
   

Look at the label:

SUN-LCSM100_F-0670-931.01MB

The last field.


 Please educate me!!  grin

No problem. Things like this have happened to me from time to time.

   -Kyle

 Thanks again!

 --Kenny
 --
 This message posted from opensolaris.org
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best layout for 15 disks?

2008-08-22 Thread Kyle McDonald
mike wrote:


 Sorry :)

 Okay, so you can create a zpool from multiple vdevs. But you cannot
 add more vdevs to a zpool once the zpool is created. Is that right?
Nope. That's exactly what you *CAN* do.

So say today you only really need 6TB usable, you could go buy 8 of your 
1TB disks,
and setup a pool with a single 7 disk RAIDZ1 vDev, and a single spare 
today. Later
when disks are cheaper, and you need the space you could add a second 7 
disk RAIDZ1
to the pool. This way you'd gradually grow into exactly the example you 
gave earlier.
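
Something like this, with made-up device names:

   # zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 spare c1t7d0
   ... a year or two later ...
   # zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0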

Also while it makes sense to use the same size drives in the same vDev, 
additional vDevs you add later can easily be made from different size 
drives. For the exaple above, when you got around to adding the second 
vDev, 2TB disks might be out, for the same space, you could create a 
vDev with fewer 2TB drives, or a vDev with the same number of drives and 
add twice the space, or some combo inbetween - Just because oyur first 
vDev had 7 disks doesn't mean the others have to.

Another note, as someone said earlier, if you can go to 16 drives, you 
should consider 2 8disk RAIDZ2 vDevs, over 2 7disk RAIDZ vDevs with a 
spare, or (I would think) even a 14disk RAIDZ2 vDev with a spare.

If you can (now or later) get room to have 17 drives, 2 8disk RAIDZ2 
vDevs with a spare would be your best bet. And remember you can grow 
into it... 1 vDev and spare now, second vDev later.

  -Kyle

 That's what it sounded like someone said earlier.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best layout for 15 disks?

2008-08-22 Thread Kyle McDonald
mike wrote:
 Or do smaller groupings of raidz1's (like 3 disks) so I can remove
 them and put 1.5TB disks in when they come out for instance?
   

I wouldn't reduce it to 3 disks (you should almost mirror if you go that low).

Remember, while you can't take a drive out of a vDev, or a vDev out of a 
pool, you can *replace* the drives in a vDev.

For example if you have 8 1TB drives in a RAIDZ (1 or 2) vDev, and buy 8 
1.5TB drives, instead of adding a second vDev which is always an option, 
you can replace 1 drive at a time, and as soon as the last drive in the 
vDev is swapped, you'll see the space in the pool jump.
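
That is (hypothetical device names), one disk at a time, letting each 
resilver finish before starting the next:

   # zpool replace tank c1t0d0 c2t0d0
   # zpool status tank        (watch for the resilver to complete)
   ... repeat for the remaining disks in the vDev ...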

Granted, if you need to buy drives gradually, swapping out 3 at a time 
(with 3 drive vDevs) is easier than 8 at a time, but you'll lose 33% of 
your space to parity, instead of 25% and you'll only be able to lose one 
disk (of each set of 3) at a time.

  -Kyle

   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

