Re: [zfs-discuss] how to replace failed vdev on non redundant pool?

2010-10-15 Thread Scott Meilicke
If the pool is non-redundant and your vdev has failed, you have lost your data. 
Just rebuild the pool, but consider a redundant configuration. 

On Oct 15, 2010, at 3:26 PM, Cassandra Pugh wrote:

 Hello, 
 
 I would like to know how to replace a failed vdev in a non redundant pool?
 
 I am using fiber attached disks, and cannot simply place the disk back into 
 the machine, since it is virtual.  
 
 I have the latest kernel from sept 2010 that includes all of the new ZFS 
 upgrades.
 
 Please, can you help me?
 -
 Cassandra
 (609) 243-2413
 Unix Administrator
 
 
 From a little spark may burst a mighty flame.
 -Dante Alighieri 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke





Re: [zfs-discuss] Optimal raidz3 configuration

2010-10-13 Thread Scott Meilicke
Hello Peter, 

Read the ZFS Best Practices Guide to start. If you still have questions, post 
back to the list.

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pool_Performance_Considerations

-Scott
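
For the capacity side of the question quoted below, here is a minimal sketch of the arithmetic. It assumes the disks are split evenly across vdevs and that each raidz3 vdev gives up three disks' worth of space to parity; metadata and padding overhead are ignored. The function name is illustrative only.

```python
def raidz_data_disks(total_disks: int, vdevs: int, parity: int = 3) -> int:
    """Data-disk count when total_disks are split evenly into raidz vdevs."""
    if total_disks % vdevs:
        raise ValueError("disks do not divide evenly across vdevs")
    per_vdev = total_disks // vdevs
    if per_vdev <= parity:
        raise ValueError("each vdev needs more disks than parity devices")
    # Every vdev spends `parity` disks' worth of space on redundancy.
    return vdevs * (per_vdev - parity)


for vdevs in (1, 2):
    print(vdevs, "vdev(s):", raidz_data_disks(20, vdevs), "data disks")
```

One wide vdev keeps more usable space, while two narrower vdevs trade capacity for more parity overall. Random IO generally scales with the number of vdevs, which is the trade-off the guide linked above discusses.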

On Oct 13, 2010, at 3:21 PM, Peter Taps wrote:

 Folks,
 
 If I have 20 disks to build a raidz3 pool, do I create one big raidz vdev or 
 do I create multiple raidz3 vdevs? Is there any advantage of having multiple 
 raidz3 vdevs in a single pool?
 
 Thank you in advance for your help.
 
 Regards,
 Peter
 -- 
 This message posted from opensolaris.org

Scott Meilicke





Re: [zfs-discuss] Bursty writes - why?

2010-10-12 Thread Scott Meilicke
On Oct 12, 2010, at 3:31 PM, Bob Friesenhahn wrote:
 
 For obvious reasons, the SLOG is designed to write sequentially. Otherwise it 
 would offer much less benefit.  Maybe this random-write issue with Sandforce 
 would not be a problem?


Isn't writing from cache to disk designed to be sequential, while writes to the 
ZIL/SLOG will be more random (in order to commit quickly)?

Scott Meilicke





Re: [zfs-discuss] [RFC] Backup solution

2010-10-08 Thread Scott Meilicke

On Oct 8, 2010, at 8:25 AM, Bob Friesenhahn wrote:
 
 It also does not include the human factor which is still the most 
 significant contributor to data loss.  This is the most difficult factor to 
 diminish.  If the humans have difficulty understanding the system or the 
 hardware, then they are more likely to do something wrong which damages the 
 data.

This is often overlooked during system design. It is very easy to lose your 
head during a high-stress moment and pull the wrong drive (I, of course, have 
never done that... ahem). Having z2(3) / triple mirrors, graphical pictures 
of which disk has failed, working failure LEDs, and letting a hot spare 
finish resilvering before replacing a disk are all good countermeasures.

 It also does not account for an OS kernel which caches quite a lot of data in 
 memory (relying on ECC for reliability), and which may have bugs.

At some point you have to rely on your backups for the unexpected and 
unforeseen. Make sure they are good!

Michael, nice reliability write up!

--

Scott Meilicke





Re: [zfs-discuss] [RFC] Backup solution

2010-10-07 Thread Scott Meilicke
Those must be pretty busy drives. I had a recent failure of a 1.5T disk in a 
7-disk raidz2 vdev that took about 16 hours to resilver. There was very little IO 
on the array, and it had maybe 3.5T of data to resilver.
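
To put those figures in perspective, here is a rough sketch of the average rate they imply. It assumes "T" means TiB and that the whole 3.5T was walked during the 16 hours; both are readings of the numbers above, not measurements.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2


def resilver_throughput_mib_s(data_tib: float, hours: float) -> float:
    """Average MiB/s needed to walk data_tib TiB in the given number of hours."""
    return data_tib * TIB / MIB / (hours * 3600)


rate = resilver_throughput_mib_s(3.5, 16)
print(f"{rate:.1f} MiB/s")
```

That works out to somewhere in the low tens of MiB/s, which is plausible for a lightly loaded array of 7200 rpm disks.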

On Oct 7, 2010, at 3:17 PM, Ian Collins wrote:  
 I would seriously consider raidz3, given I typically see 80-100 hour resilver 
 times for 500G drives in raidz2 vdevs.  If you haven't already, read Adam 
 Leventhal's paper:
 
 http://queue.acm.org/detail.cfm?id=1670144
 
 -- 
 Ian.
 

Scott Meilicke





Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Scott Meilicke
Scrub?

On Oct 6, 2010, at 6:48 AM, Stephan Budach wrote:

 No, not a trick question, but maybe I didn't make myself clear.
 Is there a way to discover such bad files other than trying to actually read 
 from them one by one, say using cp or by sending a snapshot elsewhere?
 
 I am well aware that the file shown in  zpool status -v is damaged and I have 
 already restored it, but I wanted to know, if there're more of them.
 
 Regards,
 budy

Scott Meilicke





Re: [zfs-discuss] When is it okay to turn off the verify option.

2010-10-04 Thread Scott Meilicke
Why do you want to turn verify off? If performance is the reason, is the 
difference between on and off significant?

On Oct 4, 2010, at 2:28 PM, Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Peter Taps
 
 As I understand, the hash generated by sha256 is almost guaranteed
 not to collide. I am thinking it is okay to turn off verify property
 on the zpool. However, if there is indeed a collision, we lose data.
 Scrub cannot recover such lost data.
 
 I am wondering in real life when is it okay to turn off verify
 option? I guess for storing business critical data (HR, finance, etc.),
 you cannot afford to turn this option off.
 
 Right on all points.  It's a calculated risk.  If you have a hash collision,
 you will lose data undetected, and backups won't save you unless *you* are
 the backup.  That is, if the good data, before it got corrupted by your
 system, happens to be saved somewhere else before it reached your system.
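
To give the "calculated risk" a number, here is a minimal sketch using the standard birthday-bound approximation. It assumes sha256 behaves like an ideal uniform 256-bit hash and that no one is deliberately constructing collisions. The pool size is hypothetical.

```python
import math


def log2_collision_probability(num_blocks: int, hash_bits: int = 256) -> float:
    """log2 of the birthday-bound estimate n*(n-1)/2 / 2**hash_bits."""
    pairs_log2 = math.log2(num_blocks) + math.log2(num_blocks - 1) - 1
    return pairs_log2 - hash_bits


# Hypothetical pool: 2**35 unique 128K blocks, i.e. 4 PiB of unique data.
print(log2_collision_probability(2 ** 35))
```

For that pool the estimate is on the order of 2^-187. It is working in log2 because the probability itself is far too small to be useful as a plain float.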
 

Scott Meilicke





Re: [zfs-discuss] Is there any way to stop a resilver?

2010-09-29 Thread Scott Meilicke
Has it been running long? Initially the numbers are way off. After a while
it settles down into something reasonable.

How many disks, and what size, are in your raidz2?

-Scott

On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote:

 Is there any way to stop a resilver?
 
 We gotta stop this thing - at minimum, completion time is 300,000 hours, and
 maximum is in the millions.
 
 Raidz2 array, so it has the redundancy, we just need to get data off.



We value your opinion!  How may we serve you better? 
Please click the survey link to tell us how we are doing:
http://www.craneae.com/ContactUs/VoiceofCustomer.aspx
Your feedback is of the utmost importance to us. Thank you for your time.

Crane Aerospace & Electronics Confidentiality Statement:
The information contained in this email message may be privileged and is 
confidential information intended only for the use of the recipient, or any 
employee or agent responsible to deliver it to the intended recipient. Any 
unauthorized use, distribution or copying of this information is strictly 
prohibited and may be unlawful. If you have received this communication in 
error, please notify the sender immediately and destroy the original message 
and all attachments from your electronic files.



Re: [zfs-discuss] Is there any way to stop a resilver?

2010-09-29 Thread Scott Meilicke
What version of OS?
Are snapshots running? (If so, turn them off.)

So are there eight disks?


On 9/29/10 8:46 AM, LIC mesh licm...@gmail.com wrote:

 It's always running less than an hour.
 
 It usually starts at around 300,000h estimate(at 1m in), goes up to an
 estimate in the millions(about 30mins in) and restarts.
 
 Never gets past 0.00% completion, and K resilvered on any LUN.
 
 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs.
 
 
 
 
 On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke
 scott.meili...@craneaerospace.com wrote:
 Has it been running long? Initially the numbers are way off. After a while it
 settles down into something reasonable.
 
 How many disks, and what size, are in your raidz2?  
 
 -Scott
 
 
 On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote:
 
 Is there any way to stop a resilver?
 
 We gotta stop this thing - at minimum, completion time is 300,000 hours, and
 maximum is in the millions.
 
 Raidz2 array, so it has the redundancy, we just need to get data off.





Re: [zfs-discuss] Fwd: Is there any way to stop a resilver?

2010-09-29 Thread Scott Meilicke
(I left the list off last time - sorry)

No, the resilver should only be happening if there was a spare available. Is
the whole thing scrubbing? It looks like it. Can you stop it with

zpool scrub -s pool

So... Word of warning, I am no expert at this stuff. Think about what I am
suggesting before you do it :). Although stopping a scrub is pretty
innocuous.

-Scott

On 9/29/10 9:22 AM, LIC mesh licm...@gmail.com wrote:

 You almost have it - each iSCSI target is made up of 4 of the raidz vdevs - 4
 * 6 = 24 disks.
 
 16 targets total.
 
 We have one LUN with status of UNAVAIL but didn't know if removing it
 outright would help - it's actually available and well as far as the target is
 concerned, so we thought it went UNAVAIL as a result of iSCSI timeouts - we've
 since fixed the switches buffers, etc.
 
 See:
 http://pastebin.com/pan9DBBS
 
 
 
 On Wed, Sep 29, 2010 at 12:17 PM, Scott Meilicke
 scott.meili...@craneaerospace.com wrote:
 OK, let me see if I have this right:
 
 8 shelves, 1T disks, 24 disks per shelf = 192 disks
 8 shelves, 2T disks, 24 disks per shelf = 192 disks
 Each raidz is six disks.
 64 raidz vdevs
 Each iSCSI target is made up of 8 of these raidz vdevs (8 x 6 disks = 48
 disks)
 Then the head takes these eight targets, and makes a raidz2. So the raidz2
 depends upon all 384 disks. So when a failure occurs, the resilver is
 accessing all 384 disks.
 
 If I have this right, which I seriously doubt :), then that will either
 take an enormous amount of time to complete, or never. It looks like never.
 
 Recovery:
 
 From the head, can you see which vdev has failed? If so, can you remove it to
 stop the resilver?
 
 
 
 On 9/29/10 8:57 AM, LIC mesh licm...@gmail.com wrote:
 
 This is an iSCSI/COMSTAR array.
 
 The head was running 2009.06 stable with version 14 ZFS, but we updated that
 to build 134 (kept the old OS drives) - did not, however, update the zpool -
 it's still version 14.
 
 The targets are all running 2009.06 stable, exporting 4 raidz1 LUNs each of
 6 drives - 8 shelves have 1TB drives, the other 8 have 2TB drives.
 
 The head sees the filesystem as comprised of 8 vdevs of 8 iSCSI LUNs each,
 with SSD ZIL and SSD L2ARC.
 
 
 
 
 


--
Scott Meilicke | Enterprise Systems Administrator | Crane Aerospace & 
Electronics | +1 425-743-8153 | M: +1 206-406-2670




[zfs-discuss] Resliver making the system unresponsive

2010-09-29 Thread Scott Meilicke
This must be resilver day :)

I just had a drive failure. The hot spare kicked in, and access to the pool 
over NFS was effectively zero for about 45 minutes. Currently the pool is still 
resilvering, but for some reason I can access the file system now. 

Resilver speed has been beaten to death, I know, but is there a way to avoid 
this? For example, is more enterprisey hardware less susceptible to resilvers? 
This box is used for development VMs, but there is no way I would consider this 
for production with this kind of performance hit during a resilver.

My hardware:
Dell 2950
16G ram
16 disk SAS chassis
LSI 3801 (I think) SAS card (1068e chip)
Intel x25-e SLOG off of the internal PERC 5/i RAID controller
Seagate 750G disks (7200.11)

I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc i386 
i86pc Solaris)

  pool: data01
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Sep 29 14:03:52 2010
        1.12T scanned out of 5.00T at 311M/s, 3h37m to go
        82.0G resilvered, 22.42% done
config:

        NAME           STATE     READ WRITE CKSUM
        data01         DEGRADED     0     0     0
          raidz2-0     ONLINE       0     0     0
            c1t8d0     ONLINE       0     0     0
            c1t9d0     ONLINE       0     0     0
            c1t10d0    ONLINE       0     0     0
            c1t11d0    ONLINE       0     0     0
            c1t12d0    ONLINE       0     0     0
            c1t13d0    ONLINE       0     0     0
            c1t14d0    ONLINE       0     0     0
          raidz2-1     DEGRADED     0     0     0
            c1t22d0    ONLINE       0     0     0
            c1t15d0    ONLINE       0     0     0
            c1t16d0    ONLINE       0     0     0
            c1t17d0    ONLINE       0     0     0
            c1t23d0    ONLINE       0     0     0
            spare-5    REMOVED      0     0     0
              c1t20d0  REMOVED      0     0     0
              c8t18d0  ONLINE       0     0     0  (resilvering)
            c1t21d0    ONLINE       0     0     0
        logs
          c0t1d0       ONLINE       0     0     0
        spares
          c8t18d0      INUSE     currently in use

errors: No known data errors
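
As a sanity check on the "to go" figure in the scan lines above, here is a small sketch that recomputes it from the scanned, total and rate values. It assumes the suffixes are binary units (K/M/G/T as KiB/MiB/GiB/TiB) and that the scan rate stays constant, which it rarely does in practice. The helper names are illustrative, not part of any ZFS tool.

```python
def parse_size_mib(text: str) -> float:
    """Convert a zpool-style size such as '1.12T' or '311M' to MiB."""
    units = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 ** 2}
    return float(text[:-1]) * units[text[-1]]


def eta_hours(scanned: str, total: str, rate_per_s: str) -> float:
    """Hours left if the remaining data is scanned at a constant rate."""
    remaining = parse_size_mib(total) - parse_size_mib(scanned)
    return remaining / parse_size_mib(rate_per_s) / 3600


hours = eta_hours("1.12T", "5.00T", "311M")
print(f"about {hours:.2f} hours to go")
```

The result lands within a minute or two of the 3h37m that zpool status reports, so the estimate is simply remaining data divided by the current scan rate.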

Thanks for any insights.

-Scott


Re: [zfs-discuss] Resliver making the system unresponsive

2010-09-29 Thread Scott Meilicke
I should add I have 477 snapshots across all file systems. Most of them are 
hourly snaps (225 of them anyway).


Scott Meilicke





Re: [zfs-discuss] When Zpool has no space left and no snapshots

2010-09-28 Thread Scott Meilicke
Preemptively use quotas?


On 9/22/10 7:25 PM, Aleksandr Levchuk alevc...@gmail.com wrote:

 Dear ZFS Discussion,
 
 I ran out of space, and consequently could not rm or truncate files. (It
 makes sense because it's copy-on-write and any transaction needs to
 be written to disk. It worked out really well - all I had to do was
 destroy some snapshots.)
 
 If there are no snapshots to destroy, how do I prepare for a situation
 where a ZFS pool loses its last free byte?
 
 Alex


--
Scott Meilicke | Enterprise Systems Administrator | Crane Aerospace 
Electronics | +1 425-743-8153 | M: +1 206-406-2670





Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?

2010-09-27 Thread Scott Meilicke
I just realized that the email I sent to David and the list did not make the 
list (at least as far as Jive can see it), so here is what I sent on the 23rd:

Brilliant. I set those parameters via /etc/system, rebooted, and the pool 
imported with just the -f switch. I had seen this as an option earlier, 
although not in that thread, but was not sure it applied to my case.

Scrub is running now. Thank you very much! 

-Scott

Update: The scrub finished with zero errors.


[zfs-discuss] My filesystem turned from a directory into a special character device

2010-09-27 Thread Scott Meilicke
I am running Nexenta CE 3.0.3. 

I have a file system that at some point in the last week went from a directory 
(per 'ls -l') to a special character device. As a result I cannot get into the 
file system. Here is my file system, scott2, along with a new file system I 
just created, as seen by ls -l:

drwxr-xr-x 4 root root    4 Sep 27 09:14 scott
crwxr-xr-x 9 root root 0, 0 Sep 20 11:51 scott2

Notice the 'c' vs. 'd' at the beginning of the permissions list. I had been 
fiddling with permissions last week, then had problems with a kernel panic. 
Perhaps this is related?

Any ideas how to get access to my file system? 

Thanks,
-Scott


Re: [zfs-discuss] My filesystem turned from a directory into a special character device

2010-09-27 Thread Scott Meilicke

On 9/27/10 9:56 AM, Victor Latushkin victor.latush...@oracle.com wrote:

 
 On Sep 27, 2010, at 8:30 PM, Scott Meilicke wrote:
 
 I am running nexenta CE 3.0.3.
 
 I have a file system that at some point in the last week went from a
 directory per 'ls -l' to a  special character device. This results in not
 being able to get into the file system. Here is my file system, scott2, along
 with a new file system I  just created, as seen by ls -l:
 
 drwxr-xr-x 4 root root    4 Sep 27 09:14 scott
 crwxr-xr-x 9 root root 0, 0 Sep 20 11:51 scott2
 
 Notice the 'c' vs. 'd' at the beginning of the permissions list. I had been
 fiddling with permissions last week, then had problems with a kernel panic.
 
 Are you still running with aok/zfs_recover being set? Have you seen this issue
 before panic? 

Yes. Well, I have removed those entries in /etc/system, but have not yet
rebooted the box.

 
 Perhaps this is related?
 
 May be.
 
 Any ideas how to get access to my file system?
 
 This can be fixed, but it is a bit more complicated and error-prone than
 setting a couple of variables.

OK. Sounds like restoring from my backup would be best?

What causes this? I saw this exact same behavior on my home box, and had to
restore about two weeks ago. Not very encouraging. :(

Is there anything I can provide to help people who know more than me solve
this problem?

 
 Regards
 Victor

Thanks Victor.

-Scott





Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-25 Thread Scott Meilicke
When I do the calculations, assuming 300 bytes per block to be conservative, 
with 128K blocks, I get 2.34G of cache (RAM, L2ARC) per terabyte of deduped 
data. But block size is dynamic, and smaller blocks mean more entries, so you 
will need more than this.
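
The calculation above can be sketched as follows. The 300 bytes per entry is the conservative assumption stated in the post, not a measured value, and the function name is illustrative only.

```python
def ddt_bytes_per_tib(block_size: int = 128 * 1024, entry_bytes: int = 300) -> int:
    """Bytes of dedup-table entries needed per TiB of unique deduped data."""
    blocks_per_tib = 1024 ** 4 // block_size
    return blocks_per_tib * entry_bytes


# Smaller average block sizes need proportionally more table space.
for kib in (128, 64, 8):
    gib = ddt_bytes_per_tib(kib * 1024) / 1024 ** 3
    print(f"{kib:>3}K blocks: {gib:.2f} GiB per TiB")
```

With the default 128K block size this reproduces the 2.34G figure. Halving the block size doubles the requirement, which is why the real number depends so heavily on the data.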

Scott


Re: [zfs-discuss] Data transfer taking a longer time than expected (Possibly dedup related)

2010-09-24 Thread Scott Meilicke
Can I disable dedup on the dataset while the transfer is going on?
Yes. Only blocks written after you disable dedupe will skip deduplication. The 
data you have already copied stays deduped. 

Can I simply Ctrl-C the process to stop it?
Yes, you can do that to a mv process. 

Maybe stop the process, delete the deduped file system (your copy target), and 
create a new file system without dedupe to see if that is any better?

Scott


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Scott Meilicke
Hi Peter,

Dedupe is pool-wide. File systems can opt in or out of dedupe. So if multiple 
file systems are set to dedupe, then they all benefit from using the same pool 
of deduped blocks. In this way, if two files share some of the same blocks, 
even if they are in different file systems, they will dedupe.

I am not sure why reporting is not done at the file system level. It may be an 
accounting issue, i.e. which file system owns the deduped blocks. But it seems 
some fair estimate could be made. Maybe the overhead to keep a file system 
updated with these stats is too high?

-Scott


Re: [zfs-discuss] Configuration questions for Home File Server (CPU cores, dedup, checksum)?

2010-09-07 Thread Scott Meilicke
Craig,

3. I do not think you will get much dedupe on video, music and photos. I would 
not bother. If you really wanted to know at some later stage, you could create 
a new file system, enable dedupe, and copy your data (or a subset) into it just 
to see. In my experience there is a significant CPU penalty as well. My 
four-core box (1.86GHz Xeons, 4 yrs old) nearly maxes out when putting a lot of 
data into a deduped file system.

-Scott


Re: [zfs-discuss] ZFS development moving behind closed doors

2010-08-16 Thread Scott Meilicke
I had already begun the process of migrating my 134 boxes over to Nexenta 
before Oracle's cunning plans became known. This just reaffirms my decision. 

Us too. :)


Re: [zfs-discuss] snapshot space - miscalculation?

2010-08-04 Thread Scott Meilicke
Are there other file systems underneath daten/backups that have snapshots?


Re: [zfs-discuss] slog/L2ARC on a hard drive and not SSD?

2010-07-21 Thread Scott Meilicke
Another data point - I used three 15K disks striped using my RAID controller as 
a slog for the ZIL, and performance went down. I had three raidz SATA vdevs 
holding the data, and my load was VMs, i.e. a fair amount of small, random IO 
(60% random, 50% write, ~16k in size). 

Scott


Re: [zfs-discuss] Deleting large amounts of files

2010-07-19 Thread Scott Meilicke
If these files are deduped, and there is not a lot of RAM on the machine, it 
can take a long, long time to work through the dedupe portion. I don't know 
enough to know if that is what you are experiencing, but it could be the 
problem.

How much RAM do you have?

Scott


Re: [zfs-discuss] COMSTAR iSCSI and two Windows computers

2010-06-23 Thread Scott Meilicke
Look again at how XenServer does storage. I think you will find it already has 
a solution, both for iSCSI and NFS.


Re: [zfs-discuss] raid-z - not even iops distribution

2010-06-23 Thread Scott Meilicke
Reaching into the dusty regions of my brain, I seem to recall that because RAIDZ 
does not work like a traditional RAID 5, particularly because of its variably 
sized stripes, the data may not hit all of the disks, but it will always be 
redundant. 

I apologize for not having a reference for this assertion, so I may be 
completely wrong.

I assume your hardware is recent, the controllers are on PCIe x4 buses, etc.

-Scott


Re: [zfs-discuss] OCZ Devena line of enterprise SSD

2010-06-15 Thread Scott Meilicke
Price? I cannot find it.


Re: [zfs-discuss] combining series of snapshots

2010-06-08 Thread Scott Meilicke
You might bring over all of your old data and snaps, then clone that into a new 
volume. Bring your recent stuff into the clone. Since the clone only updates 
blocks that are different than the underlying snap, you may see a significant 
storage savings.

Two clones could even be made - one for your live data, another to access the 
historical data.

-Scott


Re: [zfs-discuss] iScsi slow

2010-05-26 Thread Scott Meilicke
iSCSI writes require a sync to disk for every write. SMB writes get cached in 
memory, and are therefore much faster.

I am not sure why it is so slow for reads.

Have you tried COMSTAR iSCSI? I have read in these forums that it is faster.

-Scott


Re: [zfs-discuss] iSCSI confusion

2010-05-24 Thread Scott Meilicke
VMware will properly handle sharing a single iSCSI volume across multiple ESX 
hosts. We have six ESX hosts sharing the same iSCSI volumes - no problems.

-Scott


Re: [zfs-discuss] ZFS for ISCSI ntfs backing store.

2010-04-23 Thread Scott Meilicke
At the time we had it set up as 3 x 5 disk raidz, plus a hot spare. These 16 
disks were in a SAS cabinet, and the slog was on the server itself. We are 
now running 2 x 7 raidz2 plus a hot spare and slog, all inside the cabinet. 
Since the disks are 1.5T, I was concerned about resilver times for a failed 
disk.
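
As a rough sketch of the capacity trade-off between the two layouts described 
here (not a measurement, and ignoring ZFS metadata, padding, and TB vs TiB 
differences), the data-disk arithmetic looks like this in Python. The function 
name is hypothetical; the disk counts and the 1.5T size come from this message:

```python
def usable_tb(vdevs, disks_per_vdev, parity, disk_tb):
    """Approximate usable capacity: number of data disks times disk size."""
    data_disks = vdevs * (disks_per_vdev - parity)
    return data_disks * disk_tb

# 3 x 5 disk raidz (one parity disk per vdev)
old_layout = usable_tb(vdevs=3, disks_per_vdev=5, parity=1, disk_tb=1.5)
# 2 x 7 disk raidz2 (two parity disks per vdev)
new_layout = usable_tb(vdevs=2, disks_per_vdev=7, parity=2, disk_tb=1.5)

print(old_layout, new_layout)
```

Under these assumptions the raidz2 layout gives up two data disks' worth of 
space in exchange for surviving a second failure while a resilver is running.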

About the only thing I would consider at this point is getting an SSD for the 
l2arc for dedupe performance.


Re: [zfs-discuss] Benchmarking Methodologies

2010-04-23 Thread Scott Meilicke
My use case for opensolaris is as a storage server for a VM environment (we 
also use EqualLogic, and soon an EMC CX4-120). To that end, I use iometer 
within a VM, simulating my VM IO activity, with some balance given to easy 
benchmarking. We have about 110 VMs across eight ESX hosts. Here is what I do:

* Attach a 100G vmdk to one Windows 2003 R2 VM
* Create a 32G test file (my opensolaris box has 16G of RAM)
* export/import the pool on the solaris box, and reboot my guest to clear 
caches all around
* Run a disk queue depth of 32 outstanding IOs
* 60% read, 65% random, 8k block size
* Run for five minutes spool up, then run the test for five minutes

My actual workload is closer to 50% read, 16k block size, so I adjust my 
interpretation of the results accordingly. 

Probably I should run a lot more iometer daemons.

Performance will increase as the benchmark runs due to the l2arc filling up, so 
I found that running the benchmark starting at 5 minutes into the work load was 
a happy medium. Things will get a bit faster the longer the benchmark runs, but 
this is good as far as benchmarking goes.

Only occasionally do I get wacko results, which I happily toss out the window.

Scott


Re: [zfs-discuss] ZFS for ISCSI ntfs backing store.

2010-04-16 Thread Scott Meilicke
I have used build 124 in this capacity, although I did zero tuning. I had about 
4T of data on a single 5T iSCSI volume over gigabit. The windows server was a 
VM, and the opensolaris box is on a Dell 2950, 16G of RAM, x25e for the zil, no 
l2arc cache device. I used comstar. 

It was being used as a target for Doubletake, so it only saw write IO, with 
very little read. My load testing using iometer was very positive, and I would 
not have hesitated to use it as the primary node serving about 1000 users, 
maybe 200-300 active at a time. 

Scott


Re: [zfs-discuss] Rethinking my zpool

2010-03-19 Thread Scott Meilicke
You will get much better random IO with mirrors, and better reliability when a 
disk fails with raidz2. Six sets of mirrors are fine for a pool. From what I 
have read, a hot spare can be shared across pools. I think the correct term 
would be load balanced mirrors, vs RAID 10.

What kind of performance do you need? Maybe raidz2 will give you the 
performance you need. Maybe not. Measure the performance of each configuration 
and decide for yourself. I am a big fan of iometer for this type of work.

-Scott


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-19 Thread Scott Meilicke
 One of the reasons I am investigating solaris for
 this is sparse volumes and dedupe could really help
 here.  Currently we use direct attached storage on
 the dom0s and allocate an LVM to the domU on
 creation.  Just like your example above, we have lots
 of those 80G to start with please volumes with 10's
 of GB unused.  I also think this data set would
 dedupe quite well since there are a great many
 identical OS files across the domUs.  Is that
 assumption correct?

This is one reason I like NFS - thin by default, and no wasted space within a 
zvol. zvols can be thin as well, but opensolaris will not know the inside 
format of the zvol, and you may still have a lot of wasted space after a while 
as files inside of the zvol come and go. In theory dedupe should work well, but 
I would be careful about a possible speed hit. 


 I've not seen an example of that before.  Do you mean
 having two 'head units' connected to an external JBOD
 enclosure or a proper HA cluster type configuration
 where the entire thing, disks and all, are
 duplicated?

I have not done any type of cluster work myself, but from what I have read on 
Sun's site, yes, you could connect the same jbod to two head units, 
active/passive, in an HA cluster, but no duplicate disks/jbod. When the active 
goes down, passive detects this and takes over the pool by doing an import. 
During the import, any outstanding transactions on the zil are replayed, 
whether they are on a slog or not. I believe this is how Sun does it on their 
open storage boxes (7000 series). Note - two jbods could be used, one for each 
head unit, making an active/active setup. Each jbod is active on one node, 
passive on the other.


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-18 Thread Scott Meilicke
It is hard, as you note, to recommend a box without knowing the load. How many 
linux boxes are you talking about?

I think having a lot of space for your L2ARC is a great idea.

Will you mirror your SLOG, or load balance them? I ask because perhaps one will 
be enough, IO wise. My box has one SLOG (X25-E) and can support about 2600 IOPS 
using an iometer profile that closely approximates my work load. My ~100 VMs on 
8 ESX boxes average around 1000 IOPS, but can peak 2-3x that during backups.
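
A back-of-the-envelope way to look at those numbers, as a sketch only: the 
function name is made up for illustration, and the figures are the ones quoted 
in this message (about 2600 IOPS measured, about 1000 IOPS average, peaks of 
2-3x the average).

```python
def iops_headroom(capacity_iops, average_iops, peak_multiplier):
    """Return spare IOPS at peak; a negative value means the peak exceeds capacity."""
    peak_iops = average_iops * peak_multiplier
    return capacity_iops - peak_iops

for multiplier in (2, 3):
    spare = iops_headroom(2600, 1000, multiplier)
    print(f"peak at {multiplier}x average: headroom {spare} IOPS")
```

With these inputs a 2x peak still fits, while a 3x peak would not, which is 
one more reason to load test against your own workload.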

Don't discount NFS. I absolutely love NFS for management and thin provisioning 
reasons. Much easier (to me) than managing iSCSI, and performance is similar. I 
highly recommend load testing both iSCSI and NFS before you go live. Crash 
consistent backups of your VMs are possible using NFS, and recovering a VM from 
a snapshot is a little easier using NFS, I find.

Why not larger capacity disks?

Hopefully your switches support NIC aggregation?

The only issue I have had on 2009.06 using iSCSI (I had a windows VM directly 
attaching to an iSCSI 4T volume) was solved and back ported to 2009.06 (bug 
6794994).

-Scott


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-18 Thread Scott Meilicke
I was planning to mirror them - mainly in the hope that I could hot swap a new 
one in the event that an existing one started to degrade. I suppose I could 
start with one of each and convert to a mirror later although the prospect of 
losing either disk fills me with dread.

You do not need to mirror the L2ARC devices, as the system will just hit disk 
as necessary. Mirroring sounds like a good idea on the SLOG, but this has been 
much discussed on the forums.

 Why not larger capacity disks?

We will run out of iops before we run out of space.

Interesting. I find IOPS is more proportional to the number of VMs vs disk 
space. 

User: I need a VM that will consume up to 80G in two years, so give me an 80G 
disk.
Me: OK, but recall we can expand disks and filesystems on the fly, without 
downtime.
User: Well, that is cool, but 80G to start with please.
Me: sigh 

I also believe the SLOG and L2ARC will make using high RPM disks not as 
necessary. But, from what I have read, higher RPM disks will greatly help with 
scrubs and resilvers. Maybe two pools - one with fast mirrored SAS, another 
with big SATA. Or all SATA, but one pool with mirrors, another with raidz2. 
Many options. But measure to see what works for you. iometer is great for that, 
I find. 

Any opinions on the use of battery backed SAS adapters?

Surely these will help with performance in write back mode, but I have not done 
any hard measurements. Anecdotally my PERC5i in a Dell 2950 seemed to greatly 
help with IOPS on a five disk raidz. There are pros and cons. Search the 
forums, but off the top of my head: 1) SLOGs are much larger than controller 
caches; 2) only synced write activity is cached in a ZIL, whereas a controller 
cache will cache everything, needed or not, thus running out of space sooner; 
3) SLOGs and L2ARC devices are specialized caches for write and read loads 
respectively, vs. the all in one cache of a controller; 4) a controller *may* 
be faster, since it uses RAM for the cache.

One of the benefits of a SLOG on the SAS/SATA bus is for a cluster. If one node 
goes down, the other can bring up the pool, check the ZIL for any necessary 
transactions, and apply them. To do this with battery backed cache, you would 
need fancy interconnects between the nodes, cache mirroring, etc. All of those 
things that SAN array products do. 

Sounds like you have a fun project.


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread Scott Meilicke
Apple users have different expectations regarding data loss than Solaris and 
Linux users do.

Come on, no Apple user bashing. Not true, not fair.

Scott


Re: [zfs-discuss] Can we get some documentation on iSCSI sharing after comstar took over?

2010-03-16 Thread Scott Meilicke
This is what I used:
http://wikis.sun.com/display/OpenSolarisInfo200906/How+to+Configure+iSCSI+Target+Ports

I distilled that to:

disable the old, enable the new (comstar)

* sudo svcadm disable iscsitgt
* sudo svcadm enable stmf

Then four steps (using my zfs/zpool info - substitute for yours):

* sudo zfs create -s -V 5t data01/san/gallardo/g (the -s makes it thin, -V 
specifies a block volume)
* sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g
* sudo itadm create-target
* sudo stmfadm add-view 600144F0E24785004A80910A0001

This should allow any initiator to connect to your volume, no security.

Not quite a one liner. After you create the target once (step 3), you do not 
have to do that again for the next volume. So three lines.
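
If you want to script this, here is a minimal Python sketch that only builds 
the command lines listed above as strings; it does not run anything. The 
function name and its parameters are hypothetical, the GUID has to be copied 
from the output of 'sbdadm create-lu', and the commands still need to be run 
with suitable privileges (sudo or pfexec).

```python
def comstar_commands(zvol, size, lu_guid, create_target=True):
    """Build the command lines for a thin zvol exported via COMSTAR.

    zvol    -- dataset name, e.g. "data01/san/gallardo/g"
    size    -- volume size as accepted by 'zfs create -V', e.g. "5t"
    lu_guid -- GUID reported by 'sbdadm create-lu' for this zvol
    """
    commands = [
        f"zfs create -s -V {size} {zvol}",          # -s thin, -V block volume
        f"sbdadm create-lu /dev/zvol/rdsk/{zvol}",  # prints the LU GUID
    ]
    if create_target:
        # Only needed once; skip for the second and later volumes.
        commands.append("itadm create-target")
    commands.append(f"stmfadm add-view {lu_guid}")  # no host/target group: open to all
    return commands

for line in comstar_commands("data01/san/gallardo/g", "5t",
                             "600144F0E24785004A80910A0001"):
    print(line)
```

Passing create_target=False gives the three-line version for additional 
volumes.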

-Scott


Re: [zfs-discuss] backup zpool to tape

2010-03-15 Thread Scott Meilicke
Greg, I am using NetBackup 6.5.3.1 (7.x is out) with fine results. Nice and 
fast.

-Scott


Re: [zfs-discuss] [osol-discuss] Moving Storage to opensolaris+zfs. What a

2010-03-04 Thread Scott Meilicke
To be clear, you can do what you want with the following items (besides 
your server):

(1) OpenSolaris LiveCD
(1) 8GB USB Flash drive
As many tapes as you need to store your data pools on.

Make sure the USB drive has a saved stream from your rpool. It should 
also have a downloaded copy of whichever main backup software you use.

That's it. You backup data using Amanda/Bacula/et al onto tape. You 
backup your boot/root filesystem using 'zfs send' onto the USB key.

Erik, great! I never thought of the USB key to store an rpool copy. I will give 
it a go on my test box.

Scott


Re: [zfs-discuss] raidz2 array FAULTED with only 1 drive down

2010-02-25 Thread Scott Meilicke
You might have to force the import with -f.

Scott


Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread Scott Meilicke
I don't think adding an SSD mirror to an existing pool will do much for 
performance. Some of your data will surely go to those SSDs, but I don't think 
Solaris will know they are SSDs and move blocks in and out according to 
usage patterns to give you an all around boost. They will just be used to store 
data, nothing more.

Perhaps it will be more useful to add the SSDs as either an L2ARC or SLOG for 
the ZIL, but that will depend upon your work load. If you do NFS or iSCSI 
access, then putting the ZIL onto the SSD drive(s) will speed up writes. Adding 
them as L2ARC will speed up reads.

Here is the ZFS best practices guide, which should help with this decision:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Read that, then come back with more questions.

Best,
Scott


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
Thanks Dan.

When I try the clone then import:

pfexec zfs clone 
data01/san/gallardo/g...@zfs-auto-snap:monthly-2009-12-01-00:00 
data01/san/gallardo/g-testandlab
pfexec sbdadm import-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab

The sbdadm import-lu gives me:

sbdadm: guid in use

which makes sense, now that I see it. The man pages make it look like I cannot 
give it another GUID during the import. Any other thoughts? I *could* delete 
the current lu, import, get my data off and reverse the process, but that would 
take the current volume off line, which is not what I want to do.

Thanks,
Scott


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
Sure, but that will put me back into the original situation.

-Scott


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
That is likely it. I created the volume using 2009.06, then later upgraded to 
124. I just now created a new zvol, connected it to my windows server, 
formatted, and added some data. Then I snapped the zvol, cloned the snap, and 
used 'pfexec sbdadm create-lu'. When presented to the windows server, it 
behaved as expected. I could see the data I created prior to the snapshot.

Thank you very much Dave (and everyone else).

Now,


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
I plan on filing a support request with Sun, and will try to post back with any 
results.

Scott


[zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-04 Thread Scott Meilicke
I have a single zfs volume, shared out using COMSTAR and connected to a Windows 
VM. I am taking snapshots of the volume regularly. I now want to mount a 
previous snapshot, but when I go through the process, Windows sees the new 
volume, but thinks it is blank and wants to initialize it. Any ideas how to get 
Windows to see that it has data on it?

Steps I took after the snap:

zfs clone snapshot data01/san/gallardo/g-recovery
sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-recovery
stmfadm add-view -h HG-Gallardo -t TG-Gallardo -n 1 
600144F0EAE40A004B6B59090003

At this point, my server Gallardo can see the LUN, but like I said, it looks 
blank to the OS. I suspect the 'sbdadm create-lu' phase.

Any help to get Windows to see it as a LUN with NTFS data would be appreciated.

Thanks,
Scott


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-29 Thread Scott Meilicke
Link aggregation can use different algorithms to load balance. Using L4 (IP 
plus originating port, I think) with a single client computer and the same 
protocol (NFS), but different originating ports, has allowed me to saturate both 
NICs in my LAG. So yes, you just need more than one 'conversation', but the LAG 
setup will determine how a conversation is defined.
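
To illustrate the idea only, here is a toy Python model of how a hash policy 
decides which link a conversation lands on. This is not the algorithm any 
particular switch or OS uses; the CRC32 hash, the addresses, and the port 
numbers are all assumptions made up for the example.

```python
import zlib

def pick_link(fields, n_links):
    """Map a tuple of header fields to a link index with a simple hash."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % n_links

client, server = "10.0.0.5", "10.0.0.9"      # hypothetical addresses
source_ports = [50001, 50002, 50003, 50004]  # hypothetical client-side ports

# Address-only policy: every conversation between the two hosts hashes alike.
l3_links = [pick_link((client, server), 2) for _ in source_ports]

# Address-plus-port policy: the source port is part of the hash input, so
# different connections from the same client can land on different links.
l4_links = [pick_link((client, server, port, 2049), 2) for port in source_ports]

print("L3 policy:", l3_links)
print("L4 policy:", l4_links)
```

With the address-only policy a single client is always pinned to one link; 
adding the ports to the hash is what gives one client a chance to use both.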

Scott


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Scott Meilicke
It looks like there is not a free slot for a hot spare? If that is the case, 
then it is one more factor to push towards raidz2, as you will need time to 
remove the failed disk and insert a new one. During that time you don't want to 
be left unprotected.


Re: [zfs-discuss] ZIL to disk

2010-01-15 Thread Scott Meilicke
I think Y is such a variable and complex number it would be difficult to give a 
rule of thumb, other than to 'test with your workload'. 

My server, with three five-disk raidz vdevs (striped) and an Intel X25-E as a 
zil, can fill my two gigabit Ethernet pipes over NFS (~200MBps) during mostly 
sequential writes. That same server can only consume about 22 MBps using an 
artificial load designed to simulate my VM activity (using iometer). So it 
varies greatly depending upon Y.
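
To put the two numbers on the same footing, throughput can be converted to 
operations per second once a block size is fixed. The sketch below assumes an 
8 KiB block for the simulated VM load and 1 MB = 1,000,000 bytes; both are 
assumptions for illustration, not figures stated in this message, and the 
function name is hypothetical.

```python
def iops_from_throughput(mb_per_s, block_bytes):
    """Convert MB/s (decimal megabytes) into operations per second."""
    return mb_per_s * 1_000_000 / block_bytes

# ~22 MBps of small random IO, assuming 8 KiB blocks
vm_like_iops = iops_from_throughput(22, 8 * 1024)
print(round(vm_like_iops))
```

The same helper shows why sequential loads look so much better in MB/s: with 
large blocks, far fewer operations are needed to move the same amount of data.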

-Scott


Re: [zfs-discuss] raidz data loss stories?

2009-12-21 Thread Scott Meilicke
Yes, a coworker lost a second disk during a rebuild of a raid5 and lost all 
data. I have not had a failure, however when migrating EqualLogic arrays in and 
out of pools, I lost a disk on an array. No data loss, but it concerns me 
because during the moves, you are essentially reading and writing all of the 
data on the disk. Did I have a latent problem on that particular disk that only 
exposed itself when doing such a large read/write? What if another disk had 
failed, and during the rebuild this latent problem was exposed? Trouble, 
trouble.

They say security is an onion. So is data protection.

Scott


Re: [zfs-discuss] Using iSCSI on ZFS with non-native FS - How to backup.

2009-12-07 Thread Scott Meilicke
It does 'just work'; however, you may have some file and/or file system 
corruption if the snapshot was taken at the moment your Mac was updating 
some files. So use the time slider function and take a lot of snaps. :)


Re: [zfs-discuss] mirroring ZIL device

2009-11-23 Thread Scott Meilicke
#1. It may help to use 15k disks as the zil. When I tested using three 15k 
disks striped as my zil, it made my workload go slower, even though it seems 
like it should have been faster. My suggestion is to test it out, and see if it 
helps.

#3. You may get good performance with an inexpensive SSD because the SSD should 
have fast random writes, but probably not fast sequential writes. But I would 
test it first against your anticipated workload. :) An Intel 32G X25-E runs 
just shy of $400, and they are pretty speedy. I don't know if that would fit 
your budget. There is also some concern about losing power and having the X25 
RAM cache disappear during a write. 

-Scott


Re: [zfs-discuss] X45xx storage vs 7xxx Unified storage

2009-11-23 Thread Scott Meilicke
If the 7310s can meet your performance expectations, they sound much better 
than a pair of x4540s. Auto-fail over, SSD performance (although these can be 
added to the 4540s), ease of management, and a great front end. 

I haven't seen if you can use your backup software with the 7310s, but from 
what I have read in this thread, that may be the only downside (a big one). 
Everything else points to the 7310s.


Re: [zfs-discuss] ZFS ZIL/log on SSD weirdness

2009-11-18 Thread Scott Meilicke
I second the use of zilstat - very useful, especially if you don't want to mess 
around with adding a log device and then having to destroy the pool if you 
don't want the log device any longer.

On Nov 18, 2009, at 2:20 AM, Dushyanth wrote:

 Just to clarify : Does iSCSI traffic from a Solaris iSCSI initiator 
 to a third party target go through ZIL ?

It depends on whether the application requires a sync or not. dd does not, but 
databases (in general) do. As Richard said, ZFS treats the iSCSI volume just 
like any other vdev (pool of disks), so the fact that it is an iSCSI volume has 
nothing to do with ZFS' zil usage. 

-Scott


Re: [zfs-discuss] ZFS ZIL/log on SSD weirdness

2009-11-17 Thread Scott Meilicke
I am sorry that I don't have any links, but here is what I observe on my 
system. dd does not do sync writes, so the ZIL is not used. iSCSI traffic does 
sync writes (as of 2009.06, but not 2008.05), so if you repeat your test using 
an iSCSI target from your system, you should see log activity. Same for NFS. I 
see no ZIL activity using rsync, for an example of a network file transfer that 
does not require sync.
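
For anyone who wants to generate the two kinds of traffic by hand, here is a 
small Python sketch that writes the same data with and without an fsync after 
each block. The function name, file names, and block counts are made up for the 
example; the assumption is that on a ZFS file system the fsync variant is the 
one that should show up as log activity, while the plain variant behaves like 
dd.

```python
import os
import tempfile

def write_blocks(path, blocks, block_size, sync):
    """Write `blocks` zero-filled blocks; fsync after each one if sync is True."""
    payload = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(blocks):
            os.write(fd, payload)
            if sync:
                os.fsync(fd)  # ask for the data to be on stable storage now
    finally:
        os.close(fd)
    return os.path.getsize(path)

with tempfile.TemporaryDirectory() as workdir:
    async_size = write_blocks(os.path.join(workdir, "async.bin"), 16, 8192, sync=False)
    sync_size = write_blocks(os.path.join(workdir, "sync.bin"), 16, 8192, sync=True)

print(async_size, sync_size)
```

Both files end up the same size; the difference is only in when the data is 
forced to disk, which is what matters for the ZIL.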

Scott


Re: [zfs-discuss] Difficulty testing an SSD as a ZIL

2009-10-30 Thread Scott Meilicke
Excellent! That worked just fine. Thank you Victor.

-Scott


[zfs-discuss] Difficulty testing an SSD as a ZIL

2009-10-29 Thread Scott Meilicke
Hi all,

I received my SSD, and wanted to test it out using fake zpools with files as 
backing stores before attaching it to my production pool. However, when I 
exported the test pool and imported, I get an error. Here is what I did:

I created a file to use as a backing store for my new pool:
mkfile 1g /data01/test2/1gtest

Created a new pool:
zpool create ziltest2 /data01/test2/1gtest 

Added the SSD as a log:
zpool add -f ziltest2 log c7t1d0

(c7t1d0 is my SSD. I used the -f option since I had done this before with a 
pool called 'ziltest', same results)

A 'zpool status' returned no errors.

Exported:
zpool export ziltest2

Imported:
zpool import -d /data01/test2 ziltest2
cannot import 'ziltest2': one or more devices is currently unavailable

This happened twice with two different test pools using file-based backing 
stores.

I am nervous about adding the SSD to my production pool. Any ideas why I am 
getting the import error?

Thanks,
Scott


Re: [zfs-discuss] File level cloning

2009-10-28 Thread Scott Meilicke
I don't think so. But, you can clone at the ZFS level, and then just use the 
vmdk(s) that you need. As long as you don't muck about with the other stuff in 
the clone, the space usage should be the same.

-Scott


Re: [zfs-discuss] zpool getting in a stuck state?

2009-10-28 Thread Scott Meilicke
Hi Jeremy,

I had a loosely similar problem with my 2009.06 box. In my case (which may not 
be yours), working with support we found a bug that was causing my pool to 
hang. I also got erroneous errors when I did a scrub (3 x 5 disk raidz). I am 
using the same LSI controller. A sure fire way to kill the box was to set up a 
file system as an iSCSI target, and write a lot of data to it, around 1-2MB/s. 
It would usually die inside of a few hours. NFS writing was not as bad, but 
within a day it would panic there too.

The solution for me was to upgrade to 124. Since the upgrade three weeks ago, I 
have had no problems.

Again, I don't know if this would fix your problem, but it may be worth a try. 
Just don't upgrade your ZFS version, and you will be able to roll back to 
2009.06 at any time.

-Scott


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-21 Thread Scott Meilicke
sigh

Thanks Frédéric, that is a very interesting read. 

So my options as I see them now:

1. Keep the x25-e, and disable the cache. Performance should still be improved, 
but not by a *whole* lot, right? I will google for an expectation, but if 
anyone knows off the top of their head, I would be appreciative.
2. Buy a ZEUS or similar SSD with a cap backed cache. Pricing is a little hard 
to come by, based on my quick google, but I am seeing $2-3k for an 8G model. Is 
that right? Yowch.
3. Wait for the x25-e g2, which is rumored to have cap backed cache, and may or 
may not work well (but probably will).
4. Put the x25-e with disabled cache behind my PERC with the PERC cache enabled.

My budget is tight. I want better performance now. #4 sounds good. Thoughts?

Regarding mirrored SSDs for the ZIL, it was my understanding that if the SSD 
backed ZIL failed, ZFS would fall back to using the regular pool for the ZIL, 
correct? Assuming this is correct, a mirror would be to preserve performance 
during a failure?

Thanks everyone, this has been really helpful.

-Scott


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-21 Thread Scott Meilicke
Ed, your comment:

If solaris is able to install at all, I would have to acknowledge, I
have to shutdown anytime I need to change the Perc configuration, including
replacing failed disks.

Replacing failed disks is easy when PERC is doing the RAID. Just remove the 
failed drive and replace with a good one, and the PERC will rebuild 
automatically. But are you talking about OpenSolaris managed RAID? I am pretty 
sure, though I have not tested it, that in pseudo JBOD mode (each disk a raid 0 or 1), the 
PERC would still present a replaced disk to the OS without reconfiguring the 
PERC BIOS.

Scott


[zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-20 Thread Scott Meilicke
I have an Intel X25-E 32G in the mail (actually the kingston version), and 
wanted to get a sanity check before I start.

System:
Dell 2950
16G RAM
16 1.5T SATA disks in a SAS chassis hanging off of an LSI 3801e, no extra drive 
slots, a single zpool.
svn_124, but with my zpool still running at the 2009.06 version (14).

I will likely get another chassis and 16 disks for another pool in the 3-18 
month time frame.

My plan is to put the SSD into an open disk slot on the 2950, but will have to 
configure it as a RAID 0, since the onboard PERC5 controller does not have a 
JBOD mode.

Options I am considering:

A. Use all 32G for the ZIL
B. Use 8G for the ZIL, 24G for an L2ARC. Any issues with slicing up an SSD like 
this?
C. Use 8G for the ZIL, 16G for an L2ARC, and reserve 8G to be used as a ZIL for 
the future zpool.

Since my future zpool would just be used as a backup to disk target, I am 
leaning towards option C. Any gotchas I should be aware of?  

Thanks,
Scott


Re: [zfs-discuss] Incremental snapshot size

2009-09-30 Thread Scott Meilicke
It is more cost, but a WAN Accelerator (Cisco WAAS, Riverbed, etc.) would be a 
big help.

Scott


Re: [zfs-discuss] poor man's Drobo on FreeNAS

2009-09-30 Thread Scott Meilicke
Requires a login...


[zfs-discuss] How to verify if the ZIL is disabled

2009-09-23 Thread Scott Meilicke
How can I verify if the ZIL has been disabled or not? 

I am trying to see how much benefit I might get by using an SSD as a ZIL. I 
disabled the ZIL via the ZFS Evil Tuning Guide:

echo zil_disable/W0t1 | mdb -kw

and then rebooted. However, I do not see any benefits for my NFS workload.

Thanks,
Scott


Re: [zfs-discuss] How to verify if the ZIL is disabled

2009-09-23 Thread Scott Meilicke
Thank you both, much appreciated.

I ended up having to put the flag into /etc/system. When I disabled the ZIL and 
umount/mounted without a reboot, my ESX host would not see the NFS export, nor 
could I create a new NFS connection from my ESX host. I could get into the file 
system from the host itself of course.

-Scott


Re: [zfs-discuss] How to verify if the ZIL is disabled

2009-09-23 Thread Scott Meilicke
 zfs share -a

Ah-ha! Thanks.

FYI, I got between 2.5x and 10x improvement in performance, depending on the 
test. So tempting :)

-Scott


Re: [zfs-discuss] RAIDZ versus mirrroed

2009-09-16 Thread Scott Meilicke
I think in theory the ZIL/L2ARC should make things nice and fast if your 
workload includes sync requests (database, iSCSI, NFS, etc.), regardless of the 
backend disks. But the only sure way to know is to test with your workload.

-Scott


Re: [zfs-discuss] Pulsing write performance

2009-09-08 Thread Scott Meilicke
True, this setup is not designed for high random I/O, but rather lots of 
storage with fair performance. This box is for our dev/test backend storage. 
Our production VI runs in the 500-700 IOPS (80+ VMs, production plus dev/test) 
on average, so for our development VI, we are expecting half of that at most, 
on average. Testing with parameters that match the observed behavior of the 
production VI gets us about 750 IOPS with compression (NFS, 2009.06), so I am 
happy with the performance and very happy with the amount of available space.

Striped mirrors are much faster, ~2200 IOPS with 16 disks (but alas, tested 
with iSCSI on 2008.11, compression on. We got about 1,000 IOPS with the 3x5 
raidz setup with compression to compare iSCSI and 2008.11 vs NFS and 2009.06), 
but again we are shooting for available space, with performance being a 
secondary goal. And yes, we would likely get much better performance using SSDs 
for the ZIL and L2ARC. 

This has been an interesting thread! Sorry for the bit of hijacking...


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
Roch Bourbonnais Wrote:
100% random writes produce around 200 IOPS with a 4-6 second pause
around every 10 seconds. 

This indicates that the bandwidth you're able to transfer
through the protocol is about 50% greater than the bandwidth
the pool can offer to ZFS. Since this is not sustainable, you
see here ZFS trying to balance the 2 numbers.

When I have tested using 50% reads, 60% random using iometer over NFS, I can 
see the data going straight to disk due to the sync nature of NFS. But I also 
see writes coming to a standstill every 10 seconds or so, which I have 
attributed to the ZIL dumping to disk. Therefore I conclude that it is the 
process of dumping the ZIL to disk that (mostly?) blocks writes during the 
dumping. I do agree with Bob and others that suggest making the size of the 
dump smaller will mask this behavior, and that seems like a good idea, although 
I have not yet tried and tested it myself.

-Scott


Re: [zfs-discuss] Understanding when (and how) ZFS will use spare disks

2009-09-04 Thread Scott Meilicke
This sounds like the same behavior as opensolaris 2009.06. I had several disks 
recently go UNAVAIL, and the spares did not take over. But as soon as I 
physically removed a disk, the spare started replacing the removed disk. It 
seems UNAVAIL is not the same as the disk not being there. I wish the spare 
*would* take over in these cases, since the pool is degraded.

-Scott


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
So what happens during the txg commit?

For example, if the ZIL is a separate device, SSD for this example, does it not 
work like:

1. A sync operation commits the data to the SSD
2. A txg commit happens, and the data from the SSD are written to the spinning 
disk

So this is two writes, correct?

-Scott


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
Doh! I knew that, but then forgot...

So, for the case of no separate device for the ZIL, the ZIL lives on the disk 
pool. In which case, the data are written to the pool twice during a sync:

1. To the ZIL (on disk) 
2. From RAM to disk during the txg commit

If this is correct (and my history in this thread is not so good, so...), would 
that then explain some sort of pulsing write behavior for sync write operations?


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
So, I just re-read the thread, and you can forget my last post. I had thought 
the argument was that the data were not being written to disk twice (assuming 
no separate device for the ZIL), but it was just explaining to me that the data 
are not read from the ZIL to disk, but rather from memory to disk. I need more 
coffee...
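
For anyone else who gets tangled up in this, here is a toy Python model of the 
write path as I now understand it from this thread. It is only a sketch, not 
ZFS code: the class and method names are made up, one "block" stands for one 
sync write, and it ignores batching, compression and everything else real.

```python
class ToyPool:
    """Toy model of the sync-write path discussed above; not real ZFS code."""

    def __init__(self, separate_log=False):
        self.separate_log = separate_log
        self.ram = []          # dirty data waiting for the next txg commit
        self.pool_writes = 0   # blocks written to the main pool disks
        self.slog_writes = 0   # blocks written to a separate log device

    def sync_write(self, block):
        # The data stay in RAM and are also logged so they survive a crash.
        self.ram.append(block)
        if self.separate_log:
            self.slog_writes += 1   # intent log record goes to the SSD
        else:
            self.pool_writes += 1   # intent log record goes to the pool disks

    def txg_commit(self):
        # Data are written from RAM; the log is not read back in normal operation.
        self.pool_writes += len(self.ram)
        self.ram.clear()


def demo(separate_log):
    pool = ToyPool(separate_log=separate_log)
    for block in range(100):
        pool.sync_write(block)
    pool.txg_commit()
    return pool.pool_writes, pool.slog_writes


print("no separate log:", demo(False))
print("separate log:   ", demo(True))
```

Without a separate log device the pool disks see each sync block twice (once 
for the log, once at the txg commit). With one, the pool disks see it once and 
the log device sees it once.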


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
Yes, I was getting confused. Thanks to you (and everyone else) for clarifying.

Sync or async, I see the txg flushing to disk starve read IO.

Scott


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
I only see the blocking while load testing, not during regular usage, so I am 
not so worried. I will try the kernel settings to see if that helps if/when I 
see the issue in production. 

For what it is worth, here is the pattern I see when load testing NFS (iometer, 
60% random, 65% read, 8k chunks, 32 outstanding I/Os):

data01      59.6G  20.4T     46     24   757K  3.09M
data01      59.6G  20.4T     39     24   593K  3.09M
data01      59.6G  20.4T     45     25   687K  3.22M
data01      59.6G  20.4T     45     23   683K  2.97M
data01      59.6G  20.4T     33     23   492K  2.97M
data01      59.6G  20.4T     16     41   214K  1.71M
data01      59.6G  20.4T      3  2.36K  53.4K  30.4M
data01      59.6G  20.4T      1  2.23K  20.3K  29.2M
data01      59.6G  20.4T      0  2.24K  30.2K  28.9M
data01      59.6G  20.4T      0  1.93K  30.2K  25.1M
data01      59.6G  20.4T      0  2.22K      0  28.4M
data01      59.7G  20.4T     21    295   317K  4.48M
data01      59.7G  20.4T     32     12   495K  1.61M
data01      59.7G  20.4T     35     25   515K  3.22M
data01      59.7G  20.4T     36     11   522K  1.49M
data01      59.7G  20.4T     33     24   508K  3.09M

LSI SAS HBA, 3 x 5 disk raidz, Dell 2950, 16GB RAM.
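
If you want to pick these bursts out of a long capture instead of eyeballing 
it, something like the following Python sketch might help. It assumes the 
seven-column pool-level lines printed by zpool iostat, treats the K/M suffixes 
as powers of 1024, and uses an arbitrary threshold of 1000 write ops per 
interval to call something a flush. The function names and the threshold are 
my own choices, so adjust to taste.

```python
def parse_count(field):
    """Convert a zpool iostat field such as '24' or '2.36K' to a float."""
    scale = {"K": 1024.0, "M": 1024.0 ** 2, "G": 1024.0 ** 3}
    if field and field[-1] in scale:
        return float(field[:-1]) * scale[field[-1]]
    return float(field)


def flag_flushes(lines, write_threshold=1000):
    """Return (read_ops, write_ops, is_flush) for every parsable data line."""
    rows = []
    for line in lines:
        cols = line.split()
        if len(cols) != 7:
            continue  # not a pool-level data line
        try:
            reads, writes = parse_count(cols[3]), parse_count(cols[4])
        except ValueError:
            continue  # header or separator line
        rows.append((reads, writes, writes >= write_threshold))
    return rows


# A few lines taken from the listing above.
SAMPLE = """\
data01  59.6G  20.4T  33  23  492K  2.97M
data01  59.6G  20.4T  16  41  214K  1.71M
data01  59.6G  20.4T  3  2.36K  53.4K  30.4M
data01  59.6G  20.4T  0  2.22K  0  28.4M
data01  59.7G  20.4T  21  295  317K  4.48M
""".splitlines()

for reads, writes, is_flush in flag_flushes(SAMPLE):
    marker = "FLUSH" if is_flush else ""
    print(f"reads={reads:7.0f} writes={writes:7.0f} {marker}")
```

On the sample, the two intervals with roughly 2K write ops are flagged, and 
those are also the ones where reads fall to nearly zero.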

-Scott


Re: [zfs-discuss] ZFS iSCSI Clustered for VMware Host use

2009-09-01 Thread Scott Meilicke
You are completely off your rocker :)

No, just kidding. Assuming the virtual front-end servers are running on 
different hosts, and you are doing some sort of raid, you should be fine. 
Performance may be poor due to the inexpensive targets on the back end, but you 
probably know that. A while back I thought of doing similar stuff using local 
storage on my ESX hosts, and abstracting that with an OpenSolaris VM and 
iSCSI/NFS.

Perhaps consider inexpensive but decent NAS/SAN devices from Synology. They are 
not expensive, offer NFS and iSCSI, and you can also replicate/backup between 
two of them using rsync. Yes, you would be 'wasting' the storage space by 
having two, but like I said, they are inexpensive. Then you would not have the 
two layer architecture.  

I just tested a two disk model, using ESXi 3.5u4 and a Windows VM. I used 
iometer, realworld test, and IOs were about what you would expect from mirrored 
7200 RPM SATA drives - 138 IOPS, about 1.1 MB/s. The internal CPU was around 20%, 
RAM usage was 128MB out of the 512MB on board, so it was disk limited. 

The Dell 2950 that I have 2009.06 installed on (16GB of RAM and an LSI HBA with 
an external SAS enclosure) with a single mirror using two 7200 drives gave me 
about 200 IOPS using the same test, presumably because of the large amount of 
RAM available to the ARC.

-Scott


Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6

2009-08-31 Thread Scott Meilicke
As I understand it, when you expand a pool, the data do not automatically 
migrate to the other disks. You will have to rewrite the data somehow, usually 
a backup/restore.

-Scott


Re: [zfs-discuss] Connect couple of SATA JBODs to one storage server

2009-08-27 Thread Scott Meilicke
Roman, are you saying you want to install OpenSolaris on your old servers, or 
make the servers look like an external JBOD array, that another server will 
then connect to?


Re: [zfs-discuss] How to find poor performing disks

2009-08-26 Thread Scott Meilicke
You can try:

zpool iostat pool_name -v 1

This will show you IO on each vdev at one second intervals. Perhaps you will 
see different IO behavior on any suspect drive.

-Scott


Re: [zfs-discuss] NFS load balancing / was: ZFS, ESX , and NFS. oh my!

2009-08-12 Thread Scott Meilicke
Yes! That would be icing on the cake.


Re: [zfs-discuss] Live resize/grow of iscsi shared ZVOL

2009-08-12 Thread Scott Meilicke
My EqualLogic arrays do not disconnect when resizing volumes.

When I need to resize, on the Windows side I open the iSCSI control panel, and 
get ready to click the 'logon' button. I then resize the volume on the 
OpenSolaris box, and immediately after that is complete, on the Windows side, 
re-login to the target. Since the Windows initiator can tolerate brief 
disconnects, IO is not stopped or adversely affected, just paused for those few 
seconds. It works fine. Multi-path is a little more complicated as you would 
have to re-logon to all of your paths, but if you have at least one path 
active, you should be fine.

-Scott


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-07 Thread Scott Meilicke
Note - this has a mini PCIe interface, not PCIe.

I had the 64GB version in a Dell Mini 9. While it was great for its small 
size, low power and low heat characteristics (no fan on the Mini 9!), it was 
only faster than the striped SATA drives in my Mac Pro when it came to random 
reads. Everything else was slower, sometimes by a lot, as measured by XBench. 
Unfortunately I no longer have the numbers to share. I see the sustained writes 
listed as up to 25 MB/s, and bursts up to 51 MB/s.

That said, I have read of people having good luck with fast CF cards (no ref, 
sorry). So maybe this will be just fine :) 

-Scott


Re: [zfs-discuss] zfs fragmentation

2009-08-07 Thread Scott Meilicke
 ZFS absolutely observes synchronous write requests (e.g. by NFS or a 
 database). The synchronous write requests do not benefit from the 
 long write aggregation delay so the result may not be written as 
 ideally as ordinary write requests. Recently zfs has added support 
 for using a SSD as a synchronous write log, and this allows zfs to 
 turn synchronous writes into more ordinary writes which can be written 
 more intelligently while returning to the user with minimal latency.

Bob, since the ZIL is used always, whether a separate device or not, won't 
writes to a system without a separate ZIL also be written as intelligently as 
with a separate ZIL?

Thanks,
Scott


Re: [zfs-discuss] Can I setting 'zil_disable' to increase ZFS/iscsi performance ?

2009-08-06 Thread Scott Meilicke
You can use a separate SSD ZIL.


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-20 Thread Scott Meilicke
 which gap?
 
 'RAID-Z should mind the gap on writes' ?
 
 Message was edited by: thometal

I believe this is in reference to the raid 5 write hole, described here:
http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_performance

RAIDZ should avoid this via its copy-on-write model:
http://en.wikipedia.org/wiki/Zfs#Copy-on-write_transactional_model

So I'm not sure what the 'RAID-Z should mind the gap on writes' comment is 
getting at either. 

Clarification?

-Scott


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Scott Meilicke
For what it is worth, I too have seen this behavior when load testing our zfs 
box. I used iometer and the RealLife profile (1 worker, 1 target, 65% reads, 
60% random, 8k, 32 IOs in the queue). When writes are being dumped, reads drop 
close to zero, from 600-700 read IOPS to 15-30 read IOPS.

zpool iostat data01 1

Where data01 is my pool name

pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      55.5G  20.4T    691      0  4.21M      0
data01      55.5G  20.4T    632      0  3.80M      0
data01      55.5G  20.4T    657      0  3.93M      0
data01      55.5G  20.4T    669      0  4.12M      0
data01      55.5G  20.4T    689      0  4.09M      0
data01      55.5G  20.4T    488  1.77K  2.94M  9.56M
data01      55.5G  20.4T     29  4.28K   176K  23.5M
data01      55.5G  20.4T     25  4.26K   165K  23.7M
data01      55.5G  20.4T     20  3.97K   133K  22.0M
data01      55.6G  20.4T    170  2.26K  1.01M  11.8M
data01      55.6G  20.4T    678      0  4.05M      0
data01      55.6G  20.4T    625      0  3.74M      0
data01      55.6G  20.4T    685      0  4.17M      0
data01      55.6G  20.4T    690      0  4.04M      0
data01      55.6G  20.4T    679      0  4.02M      0
data01      55.6G  20.4T    664      0  4.03M      0
data01      55.6G  20.4T    699      0  4.27M      0
data01      55.6G  20.4T    423  1.73K  2.66M  9.32M
data01      55.6G  20.4T     26  3.97K   151K  21.8M
data01      55.6G  20.4T     34  4.23K   223K  23.2M
data01      55.6G  20.4T     13  4.37K  87.1K  23.9M
data01      55.6G  20.4T     21  3.33K   136K  18.6M
data01      55.6G  20.4T    468    496  2.89M  1.82M
data01      55.6G  20.4T    687      0  4.13M      0

-Scott


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Scott Meilicke
 On Tue, 30 Jun 2009, Bob Friesenhahn wrote:
 
 Note that this issue does not apply at all to NFS
 service, database 
 service, or any other usage which does synchronous
 writes.

I see read starvation with NFS. I was using iometer on a Windows VM, connecting 
to an NFS mount on a 2008.11 physical box. iometer params: 65% read, 60% 
random, 8k blocks, 32 outstanding IO requests, 1 worker, 1 target.

NFS Testing
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      59.6G  20.4T     46     24   757K  3.09M
data01      59.6G  20.4T     39     24   593K  3.09M
data01      59.6G  20.4T     45     25   687K  3.22M
data01      59.6G  20.4T     45     23   683K  2.97M
data01      59.6G  20.4T     33     23   492K  2.97M
data01      59.6G  20.4T     16     41   214K  1.71M
data01      59.6G  20.4T      3  2.36K  53.4K  30.4M
data01      59.6G  20.4T      1  2.23K  20.3K  29.2M
data01      59.6G  20.4T      0  2.24K  30.2K  28.9M
data01      59.6G  20.4T      0  1.93K  30.2K  25.1M
data01      59.6G  20.4T      0  2.22K      0  28.4M
data01      59.7G  20.4T     21    295   317K  4.48M
data01      59.7G  20.4T     32     12   495K  1.61M
data01      59.7G  20.4T     35     25   515K  3.22M
data01      59.7G  20.4T     36     11   522K  1.49M
data01      59.7G  20.4T     33     24   508K  3.09M
data01      59.7G  20.4T     35     23   536K  2.97M
data01      59.7G  20.4T     32     23   483K  2.97M
data01      59.7G  20.4T     37     37   538K  4.70M

While writes are being committed to the ZIL all the time, periodic dumping to 
the pool still occurs, and during those times reads are starved. Maybe this 
doesn't happen in the 'real world' ?

-Scott


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-26 Thread Scott Meilicke
I ran the RealLife iometer profile on NFS based storage (vs. SW iSCSI), and got 
nearly identical results to having the disks on iSCSI:

iSCSI
IOPS: 1003.8
MB/s: 7.8
Avg Latency (ms): 27.9

NFS
IOPS: 1005.9
MB/s: 7.9
Avg Latency (ms): 29.7

Interesting!
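
As a sanity check on those numbers, here is a small Python sketch. It assumes 
the RealLife profile settings mentioned elsewhere in this thread (8k transfers, 
32 outstanding I/Os) and applies Little's law, latency = outstanding I/Os / 
IOPS. The function names are mine, and the latency it prints is only an upper 
estimate, since the queue is not necessarily full all the time.

```python
def throughput_mib_s(iops, block_kib=8):
    """Throughput implied by an IOPS figure at a fixed transfer size."""
    return iops * block_kib / 1024.0


def expected_latency_ms(outstanding_ios, iops):
    """Little's law: average time in system = items in system / arrival rate."""
    return outstanding_ios / iops * 1000.0


for name, iops in (("iSCSI", 1003.8), ("NFS", 1005.9)):
    print(name,
          "MiB/s:", round(throughput_mib_s(iops), 1),
          "latency (ms), 32 outstanding:", round(expected_latency_ms(32, iops), 1))
```

The throughput works out to the 7.8 and 7.9 MB/s reported above, and the 
latency estimate lands around 32 ms, which is in the same range as the 
reported latencies if they are read as milliseconds.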

Here is how the pool was behaving during the testing. Again this is NFS backed 
storage:

data01       122G  20.3T    166     63  2.80M  4.49M
data01       122G  20.3T    145     59  2.28M  3.35M
data01       122G  20.3T    168     58  2.89M  4.38M
data01       122G  20.3T    169     59  2.79M  3.69M
data01       122G  20.3T     54    935   856K  18.1M
data01       122G  20.3T      9  7.96K   183K   134M
data01       122G  20.3T     49  3.82K   900K  61.8M
data01       122G  20.3T    160     61  2.73M  4.23M
data01       122G  20.3T    166     63  2.62M  4.01M
data01       122G  20.3T    162     64  2.55M  4.24M
data01       122G  20.3T    163     61  2.63M  4.14M
data01       122G  20.3T    145     54  2.37M  3.89M
data01       122G  20.3T    163     63  2.69M  4.35M
data01       122G  20.3T    171     64  2.80M  3.97M
data01       122G  20.3T    153     67  2.68M  4.65M
data01       122G  20.3T    164     66  2.63M  4.10M
data01       122G  20.3T    171     66  2.75M  4.51M
data01       122G  20.3T    175     53  3.02M  3.83M
data01       122G  20.3T    157     59  2.64M  3.80M
data01       122G  20.3T    172     59  2.85M  4.11M
data01       122G  20.3T    173     68  2.99M  4.11M
data01       122G  20.3T     97     35  1.66M  2.61M
data01       122G  20.3T    170     58  2.87M  3.62M
data01       122G  20.3T    160     64  2.72M  4.17M
data01       122G  20.3T    163     63  2.68M  3.77M
data01       122G  20.3T    160     60  2.67M  4.29M
data01       122G  20.3T    165     65  2.66M  4.05M
data01       122G  20.3T    191     59  3.25M  3.97M
data01       122G  20.3T    159     65  2.76M  4.18M
data01       122G  20.3T    154     52  2.64M  3.50M
data01       122G  20.3T    164     61  2.76M  4.38M
data01       122G  20.3T    154     62  2.66M  4.08M
data01       122G  20.3T    160     58  2.71M  3.95M
data01       122G  20.3T     84     34  1.48M  2.37M
data01       122G  20.3T      9  7.27K   156K   125M
data01       122G  20.3T     25  5.20K   422K  84.3M
data01       122G  20.3T    170     60  2.77M  3.64M
data01       122G  20.3T    170     63  2.85M  3.85M
 
So it appears NFS is doing syncs, while iSCSI is not (See my earlier zpool 
iostat data for iSCSI). Isn't this what we expect, because NFS does syncs, 
while iSCSI does not (assumed)?

-Scott


Re: [zfs-discuss] slow ls or slow zfs

2009-06-26 Thread Scott Meilicke
Hi,

When you have a lot of random read/writes, raidz/raidz2 can be fairly slow.
http://blogs.sun.com/roch/entry/when_to_and_not_to

The recommendation is to break the disks into smaller raidz/z2 stripes, thereby 
improving IO.

From the ZFS Best Practices Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#RAID-Z_Configuration_Requirements_and_Recommendations

The recommended number of disks per group is between 3 and 9. If you have more 
disks, use multiple groups.
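
To see what that recommendation allows for a given disk count, here is a 
minimal Python sketch. It only lists layouts where the disks divide evenly 
into equal-width groups of 3 to 9, it ignores hot spares, and the function 
name and the "data disks" figure (group width minus parity, times the number 
of groups) are my own simplifications.

```python
def raidz_layouts(total_disks, parity=1, min_width=3, max_width=9):
    """List (groups, disks_per_group, data_disks) for even splits of the disks."""
    layouts = []
    for width in range(min_width, max_width + 1):
        if width <= parity:
            continue  # a group needs at least one data disk
        if total_disks % width == 0:
            groups = total_disks // width
            layouts.append((groups, width, groups * (width - parity)))
    return layouts


for groups, width, data_disks in raidz_layouts(15, parity=1):
    print(f"{groups} x {width}-disk raidz: {data_disks} data disks")
```

For 15 disks with single parity this prints a 5 x 3 layout with 10 data disks 
and a 3 x 5 layout with 12 data disks. More groups generally means more random 
IOPS, fewer groups means more usable space.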

-Scott


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-25 Thread Scott Meilicke
 if those servers are on physical boxes right now i'd do some perfmon
 caps and add up the iops.

Using perfmon to get a sense of what is required is a good idea. Use the 95th 
percentile to be conservative. The counters I have used are in the PhysicalDisk 
object. Don't ignore the latency counters either. In my book, anything 
consistently over 20ms or so is excessive.
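
If you export the perfmon samples, the 95th percentile is easy to compute 
yourself. A minimal Python sketch using the nearest-rank method follows. The 
sample values are invented for illustration (they are not measurements from 
my systems), and the function expects a whole-number percentile.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


# Hypothetical per-interval IOPS samples, including two short spikes.
iops_samples = [120, 135, 150, 110, 900, 140, 160, 130, 125, 145,
                155, 115, 138, 142, 128, 133, 147, 152, 118, 450]

print("average:", sum(iops_samples) / len(iops_samples))
print("95th percentile:", percentile(iops_samples, 95))
print("max:", max(iops_samples))
```

Sizing to the 95th percentile covers the typical busy periods without 
designing for the single worst spike, and the average alone would understate 
what the storage needs to deliver.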

I run 30+ VMs on an Equallogic array with 14 sata disks, broken up as two 
striped 6 disk raid5 sets (raid 50) with 2 hot spares. That array is, on 
average, about 25% loaded from an IO stand point. Obviously my VMs are pretty 
light. And the EQL gear is *fast*, which makes me feel better about spending 
all of that money :).

 Regarding ZIL usage, from what I have read you will only see 
 benefits if you are using NFS backed storage, but that it can be 
 significant.

 link?

From the ZFS Evil Tuning Guide 
(http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide):
ZIL stands for ZFS Intent Log. It is used during synchronous writes 
operations.

further down:

If you've noticed terrible NFS or database performance on SAN storage array, 
the problem is not with ZFS, but with the way the disk drivers interact with 
the storage devices.
ZFS is designed to work with storage devices that manage a disk-level cache. 
ZFS commonly asks the storage device to ensure that data is safely placed on 
stable storage by requesting a cache flush. For JBOD storage, this works as 
designed and without problems. For many NVRAM-based storage arrays, a problem 
might come up if the array takes the cache flush request and actually does 
something rather than ignoring it. Some storage will flush their caches despite 
the fact that the NVRAM protection makes those caches as good as stable storage.
ZFS issues infrequent flushes (every 5 second or so) after the uberblock 
updates. The problem here is fairly inconsequential. No tuning is warranted 
here.
ZFS also issues a flush every time an application requests a synchronous write 
(O_DSYNC, fsync, NFS commit, and so on). The completion of this type of flush 
is waited upon by the application and impacts performance. Greatly so, in fact. 
From a performance standpoint, this neutralizes the benefits of having an 
NVRAM-based storage.

When I was testing iSCSI vs. NFS, it was clear iSCSI was not doing sync, NFS 
was. Here are some zpool iostat numbers:

iSCSI testing using iometer with the RealLife work load (65% read, 60% random, 
8k transfers - see the link in my previous post) - it is clear that writes are 
being cached in RAM, and then spun off to disk.

# zpool iostat data01 1

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      55.5G  20.4T    691      0  4.21M      0
data01      55.5G  20.4T    632      0  3.80M      0
data01      55.5G  20.4T    657      0  3.93M      0
data01      55.5G  20.4T    669      0  4.12M      0
data01      55.5G  20.4T    689      0  4.09M      0
data01      55.5G  20.4T    488  1.77K  2.94M  9.56M
data01      55.5G  20.4T     29  4.28K   176K  23.5M
data01      55.5G  20.4T     25  4.26K   165K  23.7M
data01      55.5G  20.4T     20  3.97K   133K  22.0M
data01      55.6G  20.4T    170  2.26K  1.01M  11.8M
data01      55.6G  20.4T    678      0  4.05M      0
data01      55.6G  20.4T    625      0  3.74M      0
data01      55.6G  20.4T    685      0  4.17M      0
data01      55.6G  20.4T    690      0  4.04M      0
data01      55.6G  20.4T    679      0  4.02M      0
data01      55.6G  20.4T    664      0  4.03M      0
data01      55.6G  20.4T    699      0  4.27M      0
data01      55.6G  20.4T    423  1.73K  2.66M  9.32M
data01      55.6G  20.4T     26  3.97K   151K  21.8M
data01      55.6G  20.4T     34  4.23K   223K  23.2M
data01      55.6G  20.4T     13  4.37K  87.1K  23.9M
data01      55.6G  20.4T     21  3.33K   136K  18.6M
data01      55.6G  20.4T    468    496  2.89M  1.82M
data01      55.6G  20.4T    687      0  4.13M      0

Testing against NFS shows writes to disk continuously.

NFS Testing
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01      59.6G  20.4T     57    216   352K  1.74M
data01      59.6G  20.4T     41     21   660K  2.74M
data01      59.6G  20.4T     44     24   655K  3.09M
data01      59.6G  20.4T     41     23   598K  2.97M
data01      59.6G  20.4T     34     33   552K  4.21M
data01      59.6G  20.4T     46     24   757K  3.09M
data01      59.6G  20.4T     39     24   593K  3.09M
data01      59.6G  20.4T     45     25   687K  3.22M
data01      59.6G  20.4T     45     23   683K  2.97M
data01      59.6G  20.4T     33     23   492K  2.97M
data01      59.6G  20.4T     16     41   214K  1.71M
data01      59.6G  20.4T      3  2.36K  53.4K  30.4M
data01      59.6G  20.4T      1  2.23K  20.3K  29.2M
data01  

Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-25 Thread Scott Meilicke
 Isn't that section of the evil tuning guide you're quoting actually about
 checking if the NVRAM/driver connection is working right or not?

Miles, yes, you are correct. I just thought it was interesting reading about 
how syncs and such work within ZFS.

Regarding my NFS test, you remind me that my test was flawed, in that my iSCSI 
numbers were using the ESXi iSCSI SW initiator, while the NFS tests were 
performed with the VM as the guest, not ESX. I'll give ESX as the NFS client, 
vmdks on NFS, a go and get back to you. Thanks!

Scott


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread Scott Meilicke
See this thread for information on load testing for vmware:
http://communities.vmware.com/thread/73745?tstart=0start=0

Within the thread there are instructions for using iometer to load test your 
storage. You should test out your solution before going live, and compare what 
you get with what you need. Just because striping 3 mirrors *will* give you 
more performance than raidz2 doesn't always mean that is the best solution. 
Choose the best solution for your use case.

You should have at least two NICs per connection to storage and LAN (4 total in 
this simple example), for redundancy if nothing else. Performance-wise, vSphere 
can now have multiple SW iSCSI connections to a single LUN. 

My testing showed compression increased iSCSI performance by 1.7x, so I like 
compression. But again, these are my tests in my situation. Your results may 
differ from mine.

Regarding ZIL usage, from what I have read you will only see benefits if you 
are using NFS backed storage, but that it can be significant. Remove the ZIL 
for testing to see the max benefit you could get. Don't do this in production!

-Scott


Re: [zfs-discuss] SAN server

2009-06-23 Thread Scott Meilicke
For ~100 people, I like Bob's answer. RAID 10 will get you lots of speed. 
Perhaps RAID50 would be just fine for you as well and give you more space, but 
without measuring, you won't be sure. Don't forget a hot spare (or two)!

Your MySQL database - will that generate a lot of IO?

Also, to ensure you can recover from failures, consider separate pools for your 
database files and log files, both for MySQL and Exchange. 

Good luck!

-Scott


Re: [zfs-discuss] ZFS, ESX ,and NFS. oh my!

2009-06-19 Thread Scott Meilicke
So how are folks getting around the NFS speed hit? Using SSD or battery backed 
RAM ZILs?

Regarding limited NFS mounts, underneath a single NFS mount, would it work to:

* Create a new VM
* Remove the VM from inventory
* Create a new ZFS file system underneath the original
* Copy the VM to that file system
* Add to inventory

At this point the VM is running underneath its own file system. I don't know 
if ESX would see this?

To create another VM:

* Snap the original VM
* Create a clone underneath the original NFS FS, along side the original VM ZFS.

Laborious to be sure.


Re: [zfs-discuss] Is the PROPERTY compression will increase the ZFS I/O th

2009-06-19 Thread Scott Meilicke
Generally, yes. Test it with your workload and see how it works out for you.

-Scott


Re: [zfs-discuss] 7110 questions

2009-06-18 Thread Scott Meilicke
Both iSCSI and NFS are slow? I would expect NFS to be slow, but in my iSCSI 
testing with OpenSolaris 2008.11, performance was reasonable, about 2x NFS. 

Setup: Dell 2950 with a SAS HBA and SATA 3x5 raidz (15 disks, no separate ZIL), 
iSCSI using vmware ESXi 3.5 software initiator.

Scott


Re: [zfs-discuss] Asymmetric mirroring

2009-06-10 Thread Scott Meilicke
The SATA drive will be your bottleneck, and you will lose any speed advantages 
of the SAS drives, especially using 3 vdevs on a single SATA disk.

I am with Richard, figure out what performance you need, and build accordingly.

