Re: [zfs-discuss] VDI iops with caching

2013-01-04 Thread Geoff Nordli

On 13-01-04 02:08 PM, Richard Elling wrote:



All of these IOPS-per-VDI-user guidelines are wrong. The problem is that 
the variability of response time is too great for a HDD. The only hope we 
have of getting the back-of-the-napkin calculations to work is to reduce 
the variability by using a device that is more consistent in its response 
(e.g. SSDs).


For sure there is going to be a lot of variability, but it seems we 
aren't even close.


Have you seen any back-of-the-napkin calculations which take into 
consideration SSDs for cache usage?


Yes. I've written a white paper on the subject, somewhere on the 
nexenta.com website (if it is still available).

But more current info is in the presentation at ZFSday:
http://www.youtube.com/watch?v=A4yrSfaskwI
http://www.slideshare.net/relling



Great presentation Richard.

Our system is designed to provide hands-on labs for education.  We use a 
saved-state file for our VMs, which eliminates cold boot/login and shutdown 
issues and reduces the need for random IO.  As well, in this scenario we 
don't need to worry about software updates or AV scans, because the labs 
are completely sandboxed.  We need to use HDDs because we have a large 
number of labs which can be stored for an extended period.


I have been asked to adapt the platform to deliver a VDI solution so I 
need to make a few more tweaks.


thanks,

Geoff



Re: [zfs-discuss] VDI iops with caching

2013-01-03 Thread Geoff Nordli

Thanks Richard, Happy New Year.

On 13-01-03 09:45 AM, Richard Elling wrote:
On Jan 2, 2013, at 8:45 PM, Geoff Nordli geo...@gnaa.net wrote:



I am looking at the performance numbers for the Oracle VDI admin guide.

http://docs.oracle.com/html/E26214_02/performance-storage.html

From my calculations for 200 desktops running Windows 7 knowledge 
user (15 iops) with a 30-70 read/write split it comes to 5100 iops. 
Using 7200 rpm disks the requirement will be 68 disks.


This doesn't seem right, because if you are using clones with 
caching, you should be able to easily satisfy your reads from ARC and 
L2ARC.  As well, Oracle VDI by default caches writes; therefore the 
writes will be coalesced and there will be no ZIL activity.


All of these IOPS-per-VDI-user guidelines are wrong. The problem is that 
the variability of response time is too great for a HDD. The only hope we 
have of getting the back-of-the-napkin calculations to work is to reduce 
the variability by using a device that is more consistent in its response 
(e.g. SSDs).


For sure there is going to be a lot of variability, but it seems we 
aren't even close.


Have you seen any back-of-the-napkin calculations which take into 
consideration SSDs for cache usage?




Anyone have other guidelines on what they are seeing for iops with vdi?



The successful VDI implementations I've seen have relatively small 
space requirements for
the performance-critical work. So there are a bunch  of companies 
offering SSD-based arrays
for that market. If you're stuck with HDDs, then effective use of 
snapshots+clones with a few

GB of RAM and slog can support quite a few desktops.
 -- richard



Yes, I would like to stick with HDDs.

I am just not quite sure what "quite a few desktops" means.

I thought for sure there would be lots of people around that have done 
small deployments using a standard ZFS setup.


thanks,

Geoff






[zfs-discuss] VDI iops with caching

2013-01-02 Thread Geoff Nordli

I am looking at the performance numbers for the Oracle VDI admin guide.

http://docs.oracle.com/html/E26214_02/performance-storage.html

From my calculations, 200 desktops running a Windows 7 knowledge-user 
workload (15 IOPS each) with a 30/70 read/write split comes to 5,100 IOPS.  
Using 7,200 rpm disks, the requirement would be 68 disks.
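
For anyone checking the math, the guide's numbers appear to work out like this 
(a back-of-the-napkin reconstruction, assuming a 2x mirror write penalty and 
roughly 75 IOPS per 7,200 rpm spindle):

200 desktops x 15 IOPS                       = 3,000 front-end IOPS
reads:  30% of 3,000                         =   900 IOPS
writes: 70% of 3,000                         = 2,100 IOPS
back-end (mirror, 2x write penalty)          = 900 + (2 x 2,100) = 5,100 IOPS
disks:  5,100 / ~75 IOPS per 7,200 rpm disk  =~ 68 disks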


This doesn't seem right, because if you are using clones with caching, 
you should be able to easily satisfy your reads from ARC and L2ARC.  As 
well, Oracle VDI by default caches writes; therefore the writes will be 
coalesced and there will be no ZIL activity.


Anyone have other guidelines on what they are seeing for IOPS with VDI?

Happy New Year!

Geoff


Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot

2012-11-16 Thread Geoff Nordli

On 12-11-16 03:02 AM, Jim Klimov wrote:

On 2012-11-15 21:43, Geoff Nordli wrote:

Instead of using vdi, I use comstar targets and then use vbox built-in
scsi initiator.


Out of curiosity: in this case are there any devices whose ownership
might get similarly botched, or you've tested that this approach also
works well for non-root VMs?

Did you measure any overheads of initiator-target vs. zvol, both being
on the local system? Is there any significant performance difference
worth thinking and talking about?


Hi Jim.

This works for non-root VMs.

I haven't measured the difference between them, but it has been working 
fine.  These aren't high-performance VMs.  The design was to replicate 
the entire infrastructure for a small office every night to an off-site 
location.  I have two of these in production right now and it has been 
working really well.


I still need to work on some scripts to rebuild the VMs on the fly.  One 
thing that I have done in the past is store the LUN and LU GUID in zfs 
user-defined properties to keep track of them.  I love zfs user-defined 
properties; they are one of the killer features of ZFS.  Really, there is no 
reason not to be able to store the entire VM configuration as zfs 
properties.  That could be interesting with your vboxsvc smf project.
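
Just as a rough sketch of what I mean (the property names, GUID and dataset 
here are made-up examples, not what we actually ship):

zfs set org.grokworx:lu-guid=600144f000000000000000004f1e2d3c tank/vms/student01
zfs set org.grokworx:lun=12 tank/vms/student01
zfs get -H -o value org.grokworx:lu-guid tank/vms/student01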


I work for another company that uses vbox for a lab management solution 
for education.  We use the same architecture (vbox iscsi initiator - 
comstar target) but separate out the virtual machines from the storage.  
It is a very slick system.


have a great day!

Geoff


Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot

2012-11-15 Thread Geoff Nordli
On 12-11-15 11:57 AM, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:


When I google around for anyone else who cares and may have already 
solved the problem before I came along - it seems we're all doing the 
same thing for the same reason.  If by any chance you are running 
VirtualBox on a solaris / opensolaris / openindiana / whatever ZFS 
host, you could of course use .vdi files for the VM virtual disks, but 
a lot of us are using zvol instead, for various reasons.  To do the 
zvol, you first create the zvol (sudo zfs create -V) and then chown it 
to the user who runs VBox (sudo chown someuser /dev/zvol/rdsk/...) and 
then create a rawvmdk that references it (VBoxManage internalcommands 
createrawvmdk -filename /home/someuser/somedisk.vmdk -rawdisk 
/dev/zvol/rdsk/...)


The problem is - during boot / reboot, or anytime the zpool or zfs 
filesystem is mounted or remounted, export, import... the zvol 
ownership reverts back to root:root.  So you have to repeat your sudo 
chown before the guest VM can start.


And the question is ... Obviously I can make an SMF service which will 
chown those devices automatically, but that's kind of a crappy solution.


Is there any good way to assign the access rights, or persistently 
assign ownership of zvols?





Instead of using vdi files, I use comstar targets and then use VBox's 
built-in iSCSI initiator.
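
Roughly, the flow looks like this (a minimal sketch only; the pool, VM, 
addresses and names are made up, the GUID/IQN come from the command output, 
and the VBoxManage syntax is for recent 4.x builds):

zfs create -V 20g tank/vms/vm01
stmfadm create-lu /dev/zvol/rdsk/tank/vms/vm01      # prints the LU GUID
stmfadm add-view 600144f000000000000000004f1e2d3c   # GUID from create-lu
itadm create-target                                 # prints the target IQN
VBoxManage storageattach vm01 --storagectl "SATA" --port 0 --device 0 \
    --type hdd --medium iscsi --server 192.168.1.10 \
    --target iqn.2010-08.org.example:02:target0 --lun 0

Because the VM talks to COMSTAR over iSCSI instead of opening a device node, 
there is no /dev/zvol ownership to fix up for non-root VMs.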



Geoff


Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?

2012-11-08 Thread Geoff Nordli
Dan,

If you are going to do the all in one with vbox, you probably want to look
at:

http://sourceforge.net/projects/vboxsvc/

It manages the starting/stopping of vbox vms via smf.

Kudos to Jim Klimov for creating and maintaining it.

Geoff


On Thu, Nov 8, 2012 at 7:32 PM, Dan Swartzendruber dswa...@druber.com wrote:


 I have to admit Ned's (what do I call you?) idea is interesting.  I may give
 it a try...





[zfs-discuss] defer_destroy property set on snapshot creation

2012-09-13 Thread Geoff Nordli

I am running NexentaOS_134f

This is really weird, but for some reason the defer_destroy property 
is being set on new snapshots and I can't turn it off.  Normally it 
should only be set when using the zfs destroy -d command.  The property 
doesn't seem to be inherited from anywhere.


It seems to have just started happening.

Here are the steps showing how it works.  Really, it is working as 
expected, but the property shouldn't be set on creation.


Create snapshot:
root@grok-zfs1:~# zfs snapshot groklab/ws08r2-U2037@5
root@grok-zfs1:~# zfs get defer_destroy | grep U2037\@5
groklab/ws08r2-U2037@5   defer_destroy on -

Create a clone:
root@grok-zfs1:~# zfs clone groklab/ws08r2-U2037@5 groklab/test2
root@grok-zfs1:~# zfs list -t all | grep test2
groklab/test2                 0   886G  11.6G  -

The snapshot is still there:
root@grok-zfs1:~# zfs list -t all | grep U2037\@5
groklab/ws08r2-U2037@5   0  -  11.6G  -

Destroy the clone:
root@grok-zfs1:~# zfs destroy groklab/test2

Snapshot is gone:
root@grok-zfs1:~# zfs list -t all | grep U2037\@5
root@grok-zfs1:~#

thanks,

Geoff


[zfs-discuss] SAS world-wide name

2012-03-01 Thread Geoff Nordli
Trying to figure out a reliable way to identify drives to make sure I pull the 
right drive when there is a failure.  These will be smaller installations 
(16 drives).

I am pretty sure the WWN on a SAS device is preassigned like a MAC 
address, but I just want to make sure.  Is there any scenario where the WWN 
changes?

So ideally, as long as I label the disk with the correct WWN, I should be 
able to identify it as failed and be able to pull it?


  NAME   STATE READ WRITE CKSUM
tank   ONLINE   0 0 0
  mirror-0 ONLINE   0 0 0
c2t5000C50033F5BD7Fd0  ONLINE   0 0 0
c2t5000C50033F5BE3Bd0  ONLINE   0 0 0
  mirror-1 ONLINE   0 0 0
c2t5000C50033F5BF9Fd0  ONLINE   0 0 0
c2t5000C50033F5BFBBd0  ONLINE   0 0 0
spares
  c2t5000C50033F5D607d0    AVAIL
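
For what it's worth, the WWN is already embedded in the device name: 
c2t5000C50033F5BD7Fd0 is WWN 5000C50033F5BD7F. One hedged way to cross-check 
a device against the label on the physical drive, assuming your build reports 
the serial/device id, is something like:

iostat -En c2t5000C50033F5BD7Fd0    # vendor/product/serial for that device
format                              # lists the same cXtWWNdN device names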


thanks,

Geoff 


Re: [zfs-discuss] Using Solaris iSCSI target in VirtualBox iSCSI Initiator

2011-02-25 Thread Geoff Nordli


-Original Message-
From: Thierry Delaitre
Sent: Wednesday, February 23, 2011 4:42 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Using Solaris iSCSI target in VirtualBox iSCSI
Initiator

Hello,

I’m using ZFS to export some iscsi targets for the virtual box iscsi
initiator.

It works ok if I try to install the guest OS manually.

However, I’d like to be able to import my already prepared guest os vdi
images
into the iscsi devices but I can’t figure out how to do it.

Each time I tried, I cannot boot.

It only works if I save the manually installed guest os and re-instate the
same as
follows:

dd if=/dev/dsk/c3t600144F07551C2004D619D170002d0p0 of=debian.vdi
dd if=debian.vdi of=/dev/dsk/c3t600144F07551C2004D619D170002d0p0

fdisk /dev/dsk/c3t600144F07551C2004D619D170002d0p0

   Total disk size is 512 cylinders
 Cylinder size is 4096 (512 byte) blocks

   Cylinders
  Partition   Status    Type  Start   End   Length    %
  =   ==      =   ===   ==   ===
  1   Active    Linux native  0   463 464 91
  2 EXT-DOS 464   511  48  9

I’m wondering whether there is an issue with the disk geometry hardcoded in
the
vdi file container ?

Does the VDI disk geometry need to match the LUN size ?

Thanks,

Thierry.

Hi Thierry.

You need to convert the VDI image into a raw image before you import it into 
a zvol.

Something like:  vboxmanage internalcommands converttoraw debian.vdi 
debian.raw

Then I run dd directly against the zvol device (not the iSCSI LUN), e.g.
/dev/zvol/rdsk/zpool_name/debian
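
Putting it together, something like this (pool and file names are only 
examples):

vboxmanage internalcommands converttoraw debian.vdi debian.raw
dd if=debian.raw of=/dev/zvol/rdsk/tank/debian bs=1M

The zvol just has to be at least as large as the raw image; writing to the 
rdsk device bypasses the iSCSI layer entirely.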

Geoff 







Re: [zfs-discuss] a single nfs file system shared out twice with different permissions

2010-12-20 Thread Geoff Nordli
From: Edward Ned Harvey
Sent: Monday, December 20, 2010 9:25 AM
Subject: RE: [zfs-discuss] a single nfs file system shared out twice with
different
permissions

 From: Richard Elling

  zfs create tank/snapshots
  zfs set sharenfs=on tank/snapshots

 on by default sets the NFS share parameters to: rw
 You can set specific NFS share parameters by using a string that
 contains the parameters.  For example,

  zfs set sharenfs=rw=192.168.12.13,ro=192.168.12.14 my/file/system

 sets readonly access for host 192.168.12.14 and read/write access for
 192.168.12.13.

Yeah, but for some reason, the OP didn't want to make it readonly for
different
clients ... He wanted a single client to have it mounted twice on two
different
directories, one with readonly, and the other with read-write.

I guess he has some application he can imprison into a specific read-only
subdirectory, while some other application should be able to read/write or
something like that, using the same username, on the same machine.

It is the same application, but for some functions it needs to use read-only
access or it will modify the files when I don't want it to. 

Have a great day!

Geoff 

   




Re: [zfs-discuss] a single nfs file system shared out twice with different permissions

2010-12-20 Thread Geoff Nordli
From: Richard Elling 
Sent: Monday, December 20, 2010 8:14 PM
Subject: Re: [zfs-discuss] a single nfs file system shared out twice with
different
permissions

On Dec 20, 2010, at 11:26 AM, Geoff Nordli geo...@gnaa.net wrote:

 From: Edward Ned Harvey
 Sent: Monday, December 20, 2010 9:25 AM
 Subject: RE: [zfs-discuss] a single nfs file system shared out twice
 with
 different
 permissions

 From: Richard Elling

 zfs create tank/snapshots
 zfs set sharenfs=on tank/snapshots

 on by default sets the NFS share parameters to: rw
 You can set specific NFS share parameters by using a string that
 contains the parameters.  For example,

zfs set sharenfs=rw=192.168.12.13,ro=192.168.12.14 my/file/system

 sets readonly access for host 192.168.12.14 and read/write access
 for 192.168.12.13.

 Yeah, but for some reason, the OP didn't want to make it readonly for
 different
 clients ... He wanted a single client to have it mounted twice on two
 different
 directories, one with readonly, and the other with read-write.

Is someone suggesting my solution won't work? Or are they just not up to
the
challenge? :-)


It won't work :) 

The challenge is exporting two shares from the same folder.  Linux has a
bind command which will make this work, but from what I can see there
isn't an equivalent on OpenSolaris.  

This isn't a big deal though; I can make it work using CIFS.   It isn't
something that has to be NFS, but I thought I would ask to see if there was
a simple solution I was missing.   

 I guess he has some application he can imprison into a specific
 read-only subdirectory, while some other application should be able
 to read/write or something like that, using the same username, on the
same
machine.

 It is the same application, but for some functions it needs to use
 read-only access or it will modify the files when I don't want it to.

Sounds like a simple dtrace script should do the trick, too.

Unfortunately, there isn't anything I can do about the application, and it
really isn't a big deal.  There is a pretty straight forward workaround.


Have a great day!

Geoff 




Re: [zfs-discuss] a single nfs file system shared out twice with different permissions

2010-12-20 Thread Geoff Nordli
From: Darren J Moffat 
Sent: Monday, December 20, 2010 4:15 AM
Subject: Re: [zfs-discuss] a single nfs file system shared out twice with
different
permissions

On 18/12/2010 07:09, Geoff Nordli wrote:
 I am trying to configure a system where I have two different NFS
 shares which point to the same directory.  The idea is if you come in
 via one path, you will have read-only access and can't delete any
 files, if you come in the 2nd path, then you will have read/write access.

That sounds very similar to what you would do with Trusted Extensions.
The read/write label would be a higher classification than the read-only
one -
since you can read down, can't see higher and need to be equal to modify.

For more information on Trusted Extensions start with these resources:


Oracle Solaris 11 Express Trusted Extensions Collection

   http://docs.sun.com/app/docs/coll/2580.1?l=en

OpenSolaris Security Community pages on TX:

http://hub.opensolaris.org/bin/view/Community+Group+security/tx


Darren, thanks for the suggestion.  I think I am going to go back to using
CIFS.   It seems to be quite a bit simpler than what I am looking at with
NFS.

Have a great day!

Geoff  




Re: [zfs-discuss] a single nfs file system shared out twice with different permissions

2010-12-18 Thread Geoff Nordli


-Original Message-
From: Edward Ned Harvey
[mailto:opensolarisisdeadlongliveopensola...@nedharvey.com]
Sent: Saturday, December 18, 2010 6:13 AM
To: 'Geoff Nordli'; zfs-discuss@opensolaris.org
Subject: RE: [zfs-discuss] a single nfs file system shared out twice with
different
permissions

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Geoff Nordli

 I am trying to configure a system where I have two different NFS
 shares which point to the same directory.  The idea is if you come in
 via one
path,
 you will have read-only access and can't delete any files, if you come
 in the 2nd path, then you will have read/write access.

I think you can do this client-side.

mkdir /foo1
mkdir /foo2
mount nfsserver:/exports/bar /foo1
mount -o ro nfsserver:/exports/bar /foo2

Thanks Edward.

The client side solution works great. 

Happy holidays!!

Geoff 



[zfs-discuss] a single nfs file system shared out twice with different permissions

2010-12-17 Thread Geoff Nordli
I am trying to configure a system where I have two different NFS shares
which point to the same directory.  The idea is if you come in via one path,
you will have read-only access and can't delete any files, if you come in
the 2nd path, then you will have read/write access.

For example, create the read/write nfs share:

zfs create tank/snapshots
zfs set sharenfs=on tank/snapshots

r...@grok-zfs1:/# sharemgr show -vp
default nfs=()
zfs
zfs/tank/snapshots nfs=()
  /tank/snapshots


I have had some luck doing it with Samba. 

Any pointers to making it work with NFS? 

Thanks,

Geoff 







[zfs-discuss] zfs list takes a long time to return

2010-11-02 Thread Geoff Nordli
I am running the latest version of Nexenta Core 3.0 (b134 + extra
backports).  

 

The time to run zfs list is increasing as the number of datasets 
grows; it now takes almost 30 seconds to return roughly 1,500 datasets.

 

r...@zfs1:/etc# time zfs list -t all | wc -l

1491

 

real0m29.614s

user0m0.382s

sys 0m6.329s

 

 

This machine has plenty of room on the memory side of things:

 

ARC Size:

 Current Size: 1090 MB (arcsize)

 Target Size (Adaptive):   7159 MB (c)

 Min Size (Hard Limit):894 MB (zfs_arc_min)

 Max Size (Hard Limit):7159 MB (zfs_arc_max)

 

 

Are there other things I can look at which may improve the performance of
the zfs list command or is this as good as it is going to get?   
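
One thing I may try is narrowing the request to see where the time goes; a 
hedged example (dataset name made up):

time zfs list -H -o name -t filesystem -r tank/labs > /dev/null
time zfs list -H -o name -t snapshot -r tank/labs > /dev/null

Comparing those against the full zfs list -t all should at least show whether 
snapshots or property collection account for most of the 30 seconds.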

 

Thanks,

 

Geoff 

 

 

 



[zfs-discuss] way to find out of a dataset has children

2010-09-27 Thread Geoff Nordli
Is there a way to find out if a dataset has children or not using zfs
properties or other scriptable method?  

I am looking for a more efficient way to delete datasets after they are
finished being used.  Right now I use a custom property to set delete=1 on a
dataset, and then I have a script that runs async to clean them up.  If
there are children then the delete will fail. 

This method works, but I would rather filter it again so it only tries to
delete a dataset which can actually be deleted. 

I have looked at the usedbychildren and usedbysnapshots, but they are just
showing 0 in the columns.
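
For context, the cleanup pass is basically doing something like this today 
(a bash sketch; the pool and property names are examples), and the extra 
filter I am after would go where the child check is:

#!/bin/bash
for ds in $(zfs list -H -o name -r tank/labs); do
  [ "$(zfs get -H -o value org.grokworx:delete "$ds")" = "1" ] || continue
  # only destroy if the dataset has no direct children or snapshots
  kids=$(zfs list -H -o name -t all -d 1 "$ds" | grep -v "^${ds}$")
  [ -z "$kids" ] && zfs destroy "$ds"
done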

Thanks,

Geoff 









Re: [zfs-discuss] way to find out of a dataset has children

2010-09-27 Thread Geoff Nordli
From: Darren J Moffat 
Sent: Monday, September 27, 2010 11:03 AM


On 27/09/2010 18:14, Geoff Nordli wrote:
 Is there a way to find out if a dataset has children or not using zfs
 properties or other scriptable method?

 I am looking for a more efficient way to delete datasets after they
 are finished being used.  Right now I use custom property to set
 delete=1 on a dataset, and then I have a script that runs async to
 clean them up.  If there are children then the delete will fail.

 This method works, but I would rather filter it again so it only tries
 to delete a dataset which can actually be deleted.

This sounds very like what 'zfs hold' and 'zfs destroy -d' were designed
for.  When
using 'zfs send' holds will automatically be taken out for pool versions 18
and
higher.


Darren, thanks for this tip.  This looks like it will work well for 
snapshots, but I can't apply the property to a clone.

Are there any properties I can set on the clone side?

I could definitely do a zfs list and look for the same name as the clone 
I am trying to delete, but I am looking for a better way. 

Geoff 



Re: [zfs-discuss] way to find out of a dataset has children

2010-09-27 Thread Geoff Nordli


From: Richard Elling  
Sent: Monday, September 27, 2010 1:01 PM

On Sep 27, 2010, at 11:54 AM, Geoff Nordli wrote:

 Are there any properties I can set on the clone side?

Each clone records its origin snapshot in the origin property.

$ zfs get origin syspool/rootfs-nmu-001
NAME                    PROPERTY  VALUE                           SOURCE
syspool/rootfs-nmu-001  origin    syspool/rootfs-nmu-...@nmu-001  -

Enjoy
 -- richard

Hi Richard.

Yes, we can use the origin, but that tells me where it came from, not how
many snapshots are built from it.  

Before I can delete it, the clone can't have any snapshots under it.  

This isn't hard to solve; I can just do a regex on the clone name looking 
for snapshot names, but I was hoping there was a simple zfs property I could 
query.

Thanks,

Geoff 



[zfs-discuss] stmf corruption and dealing with dynamic lun mapping

2010-09-01 Thread Geoff Nordli
I am running Nexenta NCP 3.0 (134f).

My stmf configuration was corrupted.  I was getting errors like in
/var/adm/messages:

Sep  1 10:32:04 llift-zfs1 svc-stmf[378]: [ID 130283 user.error] get
property view_entry-0/all_hosts failed - entity not found
Sep  1 10:32:04 llift-zfs1 svc.startd[9]: [ID 652011 daemon.warning]
svc:/system/stmf:default: Method /lib/svc/method/svc-stmf start failed
with exit status 1

In the /var/adm/system-stmf\:default.log

[ Sep  1 10:32:05 Executing start method (/lib/svc/method/svc-stmf start).
]
svc-stmf: Unable to load the configuration. See /var/adm/messages for
details
svc-stmf: For information on reverting the stmf:default instance to a
previously running configuration see the man page for svccfg(1M)
svc-stmf: After reverting the instance you must clear the service
maintenance state. See the man page for svcadm(1M)


I fixed it by going into svccfg and reverting to the previous running 
snapshot. 

We have a lab management system which continuously creates and deletes LUNs
as virtual machines are built and destroyed.  When I recovered to the
previous running state we had a mismatch between what the LUNs should be and
what they were.  

Is there a backup configuration somewhere, or a way to re-read the LUN
configuration? 

If not, I do set the LUN for each volume in custom zfs properties, so I may 
just need to build a script to rebuild the LUN mappings in the event of a 
catastrophic failure.
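
The rebuild would look roughly like this (a sketch only; the property names 
are our own convention, and the GUID/LUN values come from the saved 
properties):

for vol in $(zfs list -H -o name -t volume -r tank); do
  guid=$(zfs get -H -o value org.grokworx:lu-guid "$vol")
  lun=$(zfs get -H -o value org.grokworx:lun "$vol")
  [ "$guid" = "-" ] && continue                # nothing saved for this volume
  stmfadm create-lu -p guid="$guid" /dev/zvol/rdsk/"$vol"
  stmfadm add-view -n "$lun" "$guid"
done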

BTW, I am running this system inside a VMWare Server vm, which has caused
some instability, but I guess it is good to be prepared. 

Thanks,

Geoff  









Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS

2010-08-08 Thread Geoff Nordli
From: Edward Ned Harvey [mailto:sh...@nedharvey.com]
Sent: Sunday, August 08, 2010 8:34 PM

 boun...@opensolaris.org] On Behalf Of Geoff Nordli

 Anyone have any experience with a R510 with the PERC H200/H700
 controller with ZFS?

 My perception is that Dell doesn't play well with OpenSolaris.

I have an R710...  Not quite the same, but similar.

When you say doesn't play nice with opensolaris, you're hitting a grain
of truth
but blowing it out of proportion.  It's true that some stuff might not
brainlessly
work the way you'd like.  And it's true that you'll be lacking some
functionality
that you'd probably like.  But it's easy to quantify and describe, and easy
to
describe the options of dealing with it, too:

#1  Optical drive, by default, in AHCI mode.  For some reason, apparently
using a
chipset that osol can't handle.  So put the optical drive into ATA or
Legacy mode
before booting.  Or use an external optical drive for installation.

#2  On my R710, the perc driver wasn't included in the sol10 DVD by
default, so I
had to download a special one from support.dell.com, and hit 5 - insert
driver
disk or whatever, during sol10 installation.  This was not an issue for
osol
installation.

#3  OMSA provides the web-based gui to manage your raid card.  Not
available in
sol10/osol.  Instead, find the appropriate MegaCLI for your device ...
which is really tough to do.  This is needed if you want the ability to
replace a
failed drive without rebooting.

Once you overcome these things, it works great.

Thanks Edward.

What did you end up using for the L2ARC?   The SSDs shown in the online
configurator are SLC based.  

Did you use the Broadcom or the optional Intel NIC?

Geoff 







[zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS

2010-08-07 Thread Geoff Nordli
Anyone have any experience with a R510 with the PERC H200/H700 controller
with ZFS?

My perception is that Dell doesn't play well with OpenSolaris. 

Thanks,

Geoff 





Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS

2010-08-07 Thread Geoff Nordli


-Original Message-
From: Brian Hechinger [mailto:wo...@4amlunch.net]
Sent: Saturday, August 07, 2010 8:10 AM
To: Geoff Nordli
Subject: Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS

On Sat, Aug 07, 2010 at 08:00:11AM -0700, Geoff Nordli wrote:
 Anyone have any experience with a R510 with the PERC H200/H700
 controller with ZFS?

Not that particular setup, but I do run Solaris on a Precision 690 with
PERC 6i
controllers.

 My perception is that Dell doesn't play well with OpenSolaris.

What makes you say that?  I've run Solaris on quite a few Dell boxes and
have
never had any issues.

-brian
--
 
Hi Brian.

I am glad to hear that, because I would prefer to use a dell box.  

Is there a JBOD mode with the PERC 6i? 

It is funny how sometimes one forms these views as one gathers information.  

Geoff   
 





[zfs-discuss] block align SSD for use as a l2arc cache

2010-07-09 Thread Geoff Nordli
I have an Intel X25-M 80GB SSD.

 

For optimum performance, I need to block align the SSD device, but I am not
sure exactly how I should do it.  

 

If I run format -> fdisk, it allows me to partition based on a cylinder,
but I don't think that is sufficient.

 

Can someone tell me how they block aligned an SSD device for use in l2arc.

 

Thanks,

 

Geoff 

 

 



Re: [zfs-discuss] block align SSD for use as a l2arc cache

2010-07-09 Thread Geoff Nordli


-Original Message-
From: Erik Trimble  
Sent: Friday, July 09, 2010 6:45 PM

Subject: Re: [zfs-discuss] block align SSD for use as a l2arc cache

On 7/9/2010 5:55 PM, Geoff Nordli wrote:
I have an Intel X25-M 80GB SSD.

For optimum performance, I need to block align the SSD device, but I am not
sure
exactly how I should to it.

If I run the format - fdisk it allows me to partition based on a cylinder,
but I don't
think that is sufficient enough.

Can someone tell me how they block aligned an SSD device for use in l2arc.

Thanks,

Geoff


(a) what makes you think you need to do block alignment for an L2ARC usage
(particularly if you give the entire device to ZFS)?

(b) what makes you think that even if (a) is needed, that ZFS will respect
4k block
boundaries? That is, why do you think that ZFS would put any effort into
doing
block alignment with its L2ARC writes?


Thanks Erik.

So obviously what you are saying is you don't need to worry about doing 
block alignment with an l2arc cache device, because it will randomly 
read/write at the device block level instead of doing larger writes like a 
file system does.

Have a great weekend!

Geoff 






Re: [zfs-discuss] Dedup RAM requirements, vs. L2ARC?

2010-07-01 Thread Geoff Nordli
 Actually, I think the rule-of-thumb is 270 bytes/DDT entry.  It's 200
 bytes of ARC for every L2ARC entry.

 DDT doesn't count for this ARC space usage.

 E.g.: I have 1TB of 4k files that are to be deduped, and it turns out
 that I have about a 5:1 dedup ratio.  I'd also like to see how much
 ARC usage I eat up with a 160GB L2ARC.

 (1)  How many entries are there in the DDT:
      1TB of 4k files means there are 2^30 files (about 1 billion).
      However, at a 5:1 dedup ratio, I'm only actually storing
      20% of that, so I have about 214 million blocks.
      Thus, I need a DDT of about 270 * 214 million  =~  58GB in size.
 (2)  My L2ARC is 160GB in size, but I'm using 58GB for the DDT.  Thus,
      I have 102GB free for use as a data cache.
      102GB / 4k =~ 27 million blocks can be stored in the
      remaining L2ARC space.
      However, 26 million files takes up:  200 * 27 million =~ 5.5GB
      of space in ARC.
      Thus, I'd better have at least 5.5GB of RAM allocated
      solely for L2ARC reference pointers, and no other use.
 
 

Hi Erik.

Are you saying the DDT will automatically look to be stored in an L2ARC device 
if one exists in the pool, instead of using ARC? 

Or is there some sort of memory pressure point where the DDT gets moved from 
ARC to L2ARC?

Thanks,

Geoff


Re: [zfs-discuss] OCZ Vertex 2 Pro performance numbers

2010-06-25 Thread Geoff Nordli


From: Arne Jansen
Sent: Friday, June 25, 2010 3:21 AM

Now the test for the Vertex 2 Pro. This was fun.
For more explanation please see the thread Crucial RealSSD C300 and cache
flush?
This time I made sure the device is attached via 3GBit SATA. This is also
only a
short test. I'll retest after some weeks of usage.

cache enabled, 32 buffers, 64k blocks
linear write, random data: 96 MB/s
linear read, random data: 206 MB/s
linear write, zero data: 234 MB/s
linear read, zero data: 255 MB/s
random write, random data: 84 MB/s
random read, random data: 180 MB/s
random write, zero data: 224 MB/s
random read, zero data: 190 MB/s

cache enabled, 32 buffers, 4k blocks
linear write, random data: 93 MB/s
linear read, random data: 138 MB/s
linear write, zero data: 113 MB/s
linear read, zero data: 141 MB/s
random write, random data: 41 MB/s (10300 ops/s)
random read, random data: 76 MB/s (19000 ops/s)
random write, zero data: 54 MB/s (13800 ops/s)
random read, zero data: 91 MB/s (22800 ops/s)


cache enabled, 1 buffer, 4k blocks
linear write, random data: 62 MB/s (15700 ops/s)
linear read, random data: 32 MB/s (8000 ops/s)
linear write, zero data: 64 MB/s (16100 ops/s)
linear read, zero data: 45 MB/s (11300 ops/s)
random write, random data: 14 MB/s (3400 ops/s)
random read, random data: 22 MB/s (5600 ops/s)
random write, zero data: 19 MB/s (4500 ops/s)
random read, zero data: 21 MB/s (5100 ops/s)

cache enabled, 1 buffer, 4k blocks, with cache flushes:
linear write, random data, flush after every write: 5700 ops/s
linear write, zero data, flush after every write: 5700 ops/s
linear write, random data, flush after every 4th write: 8500 ops/s
linear write, zero data, flush after every 4th write: 8500 ops/s


Some remarks:

The random op numbers have to be read with care:
 - reading occurs in the same order as the writing before
 - the ops are not aligned to any specific boundary

The device also passed the write-loss-test: after 5 repeats no data has
been lost.

It doesn't make any difference if the cache is enabled or disabled, so it
might be
worth to tune zfs to not issue cache flushes.

Conclusion: This device will make an excellent slog device. I'll order them
today ;)

--Arne

Arne, thanks for doing these tests, they are great to see.  

Is this the one with the built-in supercap?
http://www.ocztechnology.com/products/solid-state-drives/2-5--sata-ii/maximum-performance-enterprise-solid-state-drives/ocz-vertex-2-pro-series-sata-ii-2-5--ssd-.html

Geoff 







Re: [zfs-discuss] Please trim posts

2010-06-18 Thread Geoff Nordli
-Original Message-
From: Linder, Doug
Sent: Friday, June 18, 2010 12:53 PM

Try doing inline quoting/response with Outlook, where you quote one
section,
reply, quote again, etc.  It's impossible.  You can't split up the quoted
section to
add new text - no way, no how.  Very infuriating.  It's like Outlook was
*designed* to force people to top post.   

Hi Doug.

I use Outlook too, and you are right, it is a major PITA.  

I was hoping that OL2010 was going to solve the problem, but it doesn't :(

The only way I can get it to sort of work is by editing the HTML message,
and saving it as plain text, then replying to that.  If you try to reply to
an HTML formatted message, it is awful. 

I also manually clean up some of the header information below Original
Message.  

Have a great weekend!

Geoff 




Re: [zfs-discuss] Dedup... still in beta status

2010-06-15 Thread Geoff Nordli
From: Fco Javier Garcia
Sent: Tuesday, June 15, 2010 11:21 AM

 Realistically, I think people are overtly-enamored with dedup as a
 feature - I would generally only consider it worth-while in cases
 where you get significant savings. And by significant, I'm talking an
 order of magnitude space savings.  A 2x savings isn't really enough to
 counteract the down sides.  Especially when even enterprise disk space
 is
 (relatively) cheap.



I think dedup may have its greatest appeal in VDI environments (think about 
an environment with 85% of the data that the virtual machine needs in ARC or 
L2ARC... it is like a dream... almost instantaneous response... and you can 
boot a new machine in a few seconds)...


Does dedup provide a benefit in the ARC/L2ARC space?  

For some reason, I have it in my head that each time a block is requested 
from storage it will be copied into the cache; therefore, if I had 10 VMs 
requesting the same dedup'd block, there would be 10 copies of the same block 
in ARC/L2ARC.

Geoff 




Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks

2010-06-09 Thread Geoff Nordli


 On Behalf Of Joe Auty
Sent: Tuesday, June 08, 2010 11:27 AM


I'd love to use Virtualbox, but right now it (3.2.2 commercial which I'm
evaluating, I haven't been able to compile OSE on the CentOS 5.5 host yet)
is
giving me kernel panics on the host while starting up VMs which are
obviously
bothersome, so I'm exploring continuing to use VMWare Server and seeing
what I
can do on the Solaris/ZFS side of things. I've also read this on a VMWare
forum,
although I don't know if this correct? This is in context to me questioning
why I
don't seem to have these same load average problems running Virtualbox:



Hi Joe.

One thing about Vbox is they are rapidly adding new features which cause
some instability and regressions.  Unless there is a real need for one of
the new features in the 3.2 branch, I would recommend working with the 3.0
branch in a production environment.  They will announce when they feel that
3.2 becomes production ready.  

VirtualBox is a great type 2 hypervisor, and I can't believe how much it has
improved over the last year. 

Have a great day!

Geoff 
 



Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks

2010-06-09 Thread Geoff Nordli


Brandon High wrote:
On Tue, Jun 8, 2010 at 10:33 AM, besson3c j...@netmusician.org wrote:


What VM software are you using? There are a few knobs you can turn in VBox
which will help with slow storage. See
http://www.virtualbox.org/manual/ch12.html#id2662300 for instructions on
reducing the flush interval.

-B

Hi Brandon.

Have you played with the flush interval? 

I am using iscsi-based zvols, and I am thinking about not using the caching
in vbox and instead relying on the comstar/zfs side.  

What do you think? 
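
For what it's worth, the knob Brandon's link describes is set through 
extradata; a hedged example for an IDE-attached disk (the VM name and byte 
count are placeholders, and the device path changes for SATA/AHCI 
controllers):

VBoxManage setextradata "WinXP-Lab" \
    "VBoxInternal/Devices/piix3ide/0/LUN#[0]/Config/FlushInterval" 1000000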

Geoff 






Re: [zfs-discuss] iScsi slow

2010-05-26 Thread Geoff Nordli


-Original Message-
From: Matt Connolly
Sent: Wednesday, May 26, 2010 5:08 AM

I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:

sh-4.0# zfs create rpool/iscsi
sh-4.0# zfs set shareiscsi=on rpool/iscsi
sh-4.0# zfs create -s -V 10g rpool/iscsi/test

The underlying zpool is a mirror of two SATA drives. I'm connecting from a
Mac client with global SAN initiator software, connected via Gigabit LAN. It
connects fine, and I've initialiased a mac format volume on that iScsi
volume.

Performance, however, is terribly slow, about 10 times slower than an SMB
share on the same pool. I expected it would be very similar, if not faster
than SMB.

Here's my test results copying 3GB data:

iScsi:  44m01s  1.185MB/s
SMB share:  4m27s   11.73MB/s

Reading (the same 3GB) is also worse than SMB, but only by a factor of about
3:

iScsi:      4m36s   11.34MB/s
SMB share:  1m45s   29.81MB/s


Is there something obvious I've missed here?
--

Hi Matt, here is a decent post on how to set up COMSTAR and disable the 
old iscsitgt service.

http://toic.org/2009/11/08/opensolaris-server-with-comstar-and-zfs/ 
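
The short version from that post, as I remember it (hedged; the service names 
and the dataset are examples, and details vary by build):

svcadm disable iscsitgt                              # old shareiscsi daemon
svcadm enable stmf
svcadm enable -r svc:/network/iscsi/target:default   # COMSTAR target
stmfadm create-lu /dev/zvol/rdsk/rpool/iscsi/test    # note the GUID it prints
stmfadm add-view <GUID-from-create-lu>
itadm create-target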

Have a great day!

Geoff 





Re: [zfs-discuss] can you recover a pool if you lose the zil (b134+)

2010-05-17 Thread Geoff Nordli


-Original Message-
From: Edward Ned Harvey [mailto:solar...@nedharvey.com]
Sent: Monday, May 17, 2010 6:29 AM

 I was messing around with a ramdisk on a pool and I forgot to remove
 it before I shut down the server.  Now I am not able to mount the
 pool.  I am not concerned with the data in this pool, but I would like
 to try to figure out how to recover it.

 I am running Nexenta 3.0 NCP (b134+).

Try this:
   zpool upgrade
By default, it will just tell you the current versions of zpools, without
actually
doing any upgrades.  If your zpool is 19 or greater, then the loss of a ZIL
is not
fatal to the pool.  You should be able to zpool import and then you'll
see a
message about zpool import -F

If you have zpool < 19, then it's lost.

BTW, just to make sure you know ... Having a ZIL in RAM makes no sense
whatsoever, except for academic purposes.  For a system in actual usage,
you
should either implement nonvolatile ZIL device, or disable ZIL (to be used
with
caution.)


Thanks Edward.

The syspool is sitting at level 18 so I assume the old pool is toast.  I was
more curious why nothing was working because there are reports that you can
do it, but it wasn't working for me.  

This system isn't in production, I was just testing to see if the zil was
being used or not.  

Have a great day!

Geoff 




[zfs-discuss] can you recover a pool if you lose the zil (b134+)

2010-05-16 Thread Geoff Nordli
I was messing around with a ramdisk on a pool and I forgot to remove it
before I shut down the server.  Now I am not able to mount the pool.  I am
not concerned with the data in this pool, but I would like to try to figure
out how to recover it.  

I am running Nexenta 3.0 NCP (b134+).

I have tried a couple of the commands (zpool import -f and zpool import -FX
llift) 

r...@zfs1:/export/home/gnordli# zpool import -f
  pool: llift
id: 15946357767934802606
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

llift        UNAVAIL  missing device
  mirror-0   ONLINE
c4t8d0   ONLINE
c4t9d0   ONLINE
  mirror-1   ONLINE
c4t10d0  ONLINE
c4t11d0  ONLINE

Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.


r...@zfs1:/export/home/gnordli# zpool import -FX llift
cannot import 'llift': no such pool or dataset
Destroy and re-create the pool from
a backup source.



I do not have a copy of the zpool.cache file.

Any other commands I could try to recover it or is it just unrecoverable?  

Thanks,

Geoff 





Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?

2010-05-12 Thread Geoff Nordli


From: James C. McPherson [mailto:james.mcpher...@oracle.com]
Sent: Wednesday, May 12, 2010 2:28 AM

On 12/05/10 03:18 PM, Geoff Nordli wrote:

 I have been wondering what the compatibility is like on OpenSolaris.
 My perception is basic network driver support is decent, but storage
 controllers are more difficult for driver support.

Now wait just a minute. You're casting aspersions on stuff here without saying
what you're talking about, still less where you're getting your info from.

Be specific - put up, or shut up.

 My perception is if you are using external cards which you know work
 for networking and storage, then you should be alright.
 Am I out in left-field on this?

I believe you are talking through your hat.

 
James, it was not my intention to cast aspersions in this thread.  I should 
have worded my reply differently instead of posting my perception, but I really 
didn't think I would get piled on for it.  This subject interests me because we 
are going to have customers deploy OpenSolaris on their own equipment and I 
have been concerned about compatibility.  
  
Is the MOBO/Chipset/CPU actually something to be worried about with OpenSolaris 
compatibility?   

I know this is not an all-encompassing list, but I got my hardware info from 
the Nexenta site 
(http://www.nexenta.com/corp/supported-hardware/hardware-supported-list) 
because that is the distro I started with.  

Have a great day!

Geoff 

 
 




Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size

2010-05-11 Thread Geoff Nordli


-Original Message-
From: Brandon High [mailto:bh...@freaks.com]
Sent: Monday, May 10, 2010 5:56 PM

On Mon, May 10, 2010 at 3:53 PM, Geoff Nordli geo...@gnaa.net wrote:
 Doesn't this alignment have more to do with aligning writes to the
 stripe/segment size of a traditional storage array?  The articles I am

It is a lot like a stripe / segment size. If you want to think of it in
those terms,
you've got a segment of 512b (the iscsi block size) and a width of 16,
giving you
an 8k stripe size. Any write that is less than 8k will require a RMW cycle,
and any
write in multiples of 8k will do full stripe writes. If the write doesn't
start on an
8k boundary, you risk having writes span multiple underlying zvol blocks.


When using a zvol, you've essentially got $volblocksize sized physical
sectors, but
the initiator sees the 512b block size that the LUN is reporting. If you
don't block
align, you risk having a write straddle two zfs blocks. There may be some
benefit
to using a 4k volblocksize, but you'll use more time and space on block
checksums
and, etc in your zpool. I think 8k is a reasonable trade off.


If you're using the whole disk with zfs, you don't need to worry about it.
If you're
using fdisk partitions or slices, you need be a little more careful.


So...  as long as you use whole disks and set the volblocksize to a multiple 
of the virtual machine's file system allocation size, you don't have to 
worry about alignment/optimization with ZFS.  
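
Concretely, what I am planning looks something like this (names and sizes are 
hypothetical): give the zvol a volblocksize that is a multiple of the guest's 
cluster size, and have the LU report a block size equal to the guest cluster 
size.

zfs create -s -V 20g -o volblocksize=8k tank/vms/win7-01
stmfadm create-lu -p blk=4096 /dev/zvol/rdsk/tank/vms/win7-01

That way a 4k NTFS cluster always maps onto whole LU blocks and never 
straddles an 8k zvol block, as long as the guest partition itself starts on 
a 4k boundary.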

Thanks again!! 

Geoff 
 



Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?

2010-05-11 Thread Geoff Nordli


On Behalf Of James C. McPherson
Sent: Tuesday, May 11, 2010 5:41 PM

On 12/05/10 10:32 AM, Michael DeMan wrote:
 I agree on the motherboard and peripheral chipset issue.

 This, and the last generation AMD quad/six core motherboards
  all seem to use the AMD SP56x0/SP5100 chipset, which I can't   find
much
information about support on for either OpenSolaris or FreeBSD.

If you can get the device driver detection utility to run on it, that will
give you a
reasonable idea.

 Another issue is the LSI SAS2008 chipset for SAS controller
  which is frequently offered as an onboard option for many motherboards
 as
well and still seems to be somewhat of a work in progress in   regards to
being
'production ready'.

What metric are you using for production ready ?
Are there features missing which you expect to see in the driver, or is it
just oh
noes, I haven't seen enough big customers with it ?


 
I have been wondering what the compatibility is like on OpenSolaris.  My
perception is basic network driver support is decent, but storage
controllers are more difficult for driver support.  

My perception is if you are using external cards which you know work for
networking and storage, then you should be alright.  

Am I out in left-field on this?

Thanks,

Geoff 




Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size

2010-05-10 Thread Geoff Nordli


-Original Message-
From: Brandon High [mailto:bh...@freaks.com]
Sent: Monday, May 10, 2010 9:55 AM

On Sun, May 9, 2010 at 9:42 PM, Geoff Nordli geo...@gnaa.net wrote:
 I am looking at using 8K block size on the zfs volume.

8k is the default for zvols.


You are right, I didn't look at that property, and instead I was focused on
the record size property.  

 I was looking at the comstar iscsi settings and there is also a blk
 size configuration, which defaults as 512 bytes. That would make me
 believe that all of the IO will be broken down into 512 bytes which
 seems very inefficient.

I haven't done any tuning on my comstar volumes, and they're using 8k
blocks.
The setting is in the dataset's volblocksize parameter.

When I look at the stmfadm list-lu -v output, it shows me a block size of
512.  I am running NexentaCore 3.0 (b134+).  I wonder if the default size
has changed with different versions.  


 It seems this value should match the file system allocation/cluster
 size in the VM, maybe 4K if you are using an ntfs file system.

You'll have more overhead using smaller volblocksize values, and get worse
compression (since compression is done on the block). If you have dedup
enabled, you'll create more entries in the DDT which can have pretty
disastrous
consequences on write performance.

Ensuring that your VM is block-aligned to 4k (or the guest OS's block
size) boundaries will help performance and dedup as well.

This is where I am probably the most confused and need to get things 
straightened out in my mind.  I thought dedup and compression are done at 
the record level.  

As long as you are using a multiple of the file system block size, 
alignment shouldn't be a problem with iscsi-based zvols.  When using a zvol, 
comstar stores the metadata in a zvol object instead of in the first part of 
the volume. 

As Roy pointed out, you have to be careful with the record size because the 
DDT and L2ARC lists can consume lots of RAM.  

But it seems you have four things to look at:

file system block size -> iSCSI blk size -> zvol block size -> zvol record 
size.  

What is the relationship between iscsi blk size and zvol block size?

What is the relationship between zvol block size and zvol record size?

Thanks,

Geoff 










[zfs-discuss] ZFS and Comstar iSCSI BLK size

2010-05-09 Thread Geoff Nordli
I am using ZFS as the backing store for an iscsi target running a virtual
machine.

 

I am looking at using 8K block size on the zfs volume.  

 

I was looking at the comstar iscsi settings and there is also a blk size 
configuration, which defaults to 512 bytes.  That would make me believe that 
all of the IO will be broken down into 512-byte chunks, which seems very 
inefficient.  

 

It seems this value should match the file system allocation/cluster size in
the VM, maybe 4K if you are using an ntfs file system. 

 

Does anyone have any input on this?

 

Thanks,

 

Geoff 

 

 



Re: [zfs-discuss] Snapshots and Data Loss

2010-04-23 Thread Geoff Nordli
-Original Message-
From: Ross Walker [mailto:rswwal...@gmail.com]
Sent: Friday, April 23, 2010 7:08 AM

 We are currently porting over our existing Learning Lab Infrastructure
 platform from MS Virtual Server to VBox + ZFS.  When students
 connect into
 their lab environment it dynamically creates their VMs and load
 balances
 them across physical servers.

You can also check out OpenSolaris' Xen implementation, which if you
use Linux VMs will allow PV VMs as well as hardware assisted full
virtualized Windows VMs. There are public domain Windows Xen drivers
out there.

The advantage of using Xen is it's VM live migration and XMLRPC
management API. As it runs as a bare metal hypervisor it also allows
fine granularity of CPU schedules, between guests and the host VM, but
unfortunately it's remote display technology leaves something to be
desired. For Windows VMs I use the built-in remote desktop, and for
Linux VMs I use XDM and use something like 'thinstation' on the client
side.

-Ross

Hi Ross.

We decided to use a hosted hypervisor like VirtualBox because our customers
use a variety of different platforms and they don't run high end workloads.
We want a lot of flexibility on configuration and OS support (both host and
guest).

Remote control is a challenge.   In our scenario students are going to spin
up exact copies of a lab environment and we need to isolate their machines
in separate networks so you can't directly connect to the VMs.  We don't
know what Guest OS they are going to run so we can't rely on the guest OS
remote control tools.  We want students to be able to have console access
and they need to be able to share it out with an instructor.  We want
students to be able to connect from any type of device.  We don't want to
rely on other connection broker software to coordinate access.   VirtualBox
is great because it provides console level access via RDP.  RDP performs
well enough and is pretty much on everything. 

This is probably getting a bit off topic now :) 

Geoff 



Re: [zfs-discuss] Snapshots and Data Loss

2010-04-22 Thread Geoff Nordli
From: Ross Walker [mailto:rswwal...@gmail.com]
Sent: Thursday, April 22, 2010 6:34 AM

On Apr 20, 2010, at 4:44 PM, Geoff Nordli geo...@grokworx.com wrote:


If you combine the hypervisor and storage server and have students
connect to the VMs via RDP or VNC or XDM then you will have the
performance of local storage and even script VirtualBox to take a
snapshot right after a save state.

A lot less difficult to configure on the client side, and allows you
to deploy thin clients instead of full desktops where you can get away
with it.

It also allows you to abstract the hypervisor from the client.

Need a bigger storage server with lots of memory, CPU and storage
though.

Later, if need be, you can break out the disks to a storage appliance
with an 8GB FC or 10Gbe iSCSI interconnect.


Right, I am in the process now of trying to figure out what the load looks
like with a central storage box and how ZFS needs to be configured to
support that load.  So far what I am seeing is very exciting :)   

We are currently porting over our existing Learning Lab Infrastructure
platform from MS Virtual Server to VBox + ZFS.  When students connect into
their lab environment it dynamically creates their VMs and load balances
them across physical servers.  

Geoff 



  






Re: [zfs-discuss] Snapshots and Data Loss

2010-04-21 Thread Geoff Nordli
From: matthew patton [mailto:patto...@yahoo.com]
Sent: Tuesday, April 20, 2010 12:54 PM

Geoff Nordli geo...@grokworx.com wrote:

 With our particular use case we are going to do a save
 state on their
 virtual machines, which is going to write  100-400 MB
 per VM via CIFS or
 NFS, then we take a snapshot of the volume, which
 guarantees we get a
 consistent copy of their VM.

maybe you left out a detail or two but I can't see how your ZFS snapshot
is going to be consistent UNLESS every VM on that ZFS volume is
prevented from doing any and all I/O from the time it finishes save
state and you take your ZFS snapshot.

If by save state you mean something akin to VMWare's disk snapshot,
why would you even bother with a ZFS snapshot in addition?


We are using VirtualBox as our hypervisor.  When it does a save state it
generates a memory file.  The memory file plus the volume snapshot creates a
consistent state.  

In our platform each student's VM points to a unique backend volume via
iscsi using VBox's built-in iscsi initiator.  So there is a one-to-one
relationship between VM and Volume.  Just for clarity, a single VM could
have multiple disks attached to it.  In that scenario, then a VM would have
multiple volumes.  


 end we could have
 maybe 20-30 VMs getting saved at the same time, which could
 mean several GB
 of data would need to get written in a short time frame and
 would need to
 get committed to disk.

 So it seems the best case would be to get those save
 state writes as sync
 and get them into a ZIL.

That I/O pattern is vastly >32kb and so will hit the 'rust' ZIL (which
ALWAYS exists) and if you were thinking an SSD would help you, I don't
see any/much evidence it will buy you anything.



If I set the logbias (b122) to latency, then it will direct all sync IO to
the log device, even if it exceeds the zfs_immediate_write_sz threshold.  
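
So for the save-state datasets the plan would be roughly (device and dataset 
names hypothetical):

zpool add tank log c4t12d0           # dedicated slog device
zfs set logbias=latency tank/saves   # latency is the default; throughput
                                     # would bypass the slog for large writes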


Have great day!

Geoff 



Re: [zfs-discuss] Snapshots and Data Loss

2010-04-20 Thread Geoff Nordli
From: Richard Elling [mailto:richard.ell...@gmail.com]
Sent: Monday, April 19, 2010 10:17 PM

Hi Geoff,
The Canucks have already won their last game of the season :-)
more below...


Hi Richard, 
I didn't watch the game last night, but obviously Vancouver better pick up
their socks or they will be joining San Jose on the sidelines.  With Ottawa,
Montreal on the way out too, it could be a tough spring for Canadian hockey
fans.  


On Apr 18, 2010, at 11:21 PM, Geoff Nordli wrote:

 Hi Richard.

 Can you explain in a little bit more detail how this process works?

 Let's say you are writing from a remote virtual machine via an iscsi
target
 set for async writes and I take a snapshot of that volume.

 Are you saying any outstanding writes for that volume will need to be
 written to disk before the snapshot happens?

Yes.

That is interesting, so if your system is under write load and you are doing
snapshots it could lead to problems.  I was thinking writes wouldn't be an
issue because they would be lazily written. 

With our particular use case we are going to do a save state on their
virtual machines, which is going to write  100-400 MB per VM via CIFS or
NFS, then we take a snapshot of the volume, which guarantees we get a
consistent copy of their VM.  When a class comes to an end we could have
maybe 20-30 VMs getting saved at the same time, which could mean several GB
of data would need to get written in a short time frame and would need to
get committed to disk.  

So it seems the best case would be to get those save state writes as sync
and get them into a ZIL.  Would you agree with that? 


I'm glad you enjoyed it.  I'm looking forward to Vegas next week and
there
are some seats still open.
 -- richard

I would love to go to Vegas, but I need to work on getting our new product
out the door.

Enjoy yourself in Vegas next week!

Geoff  




Re: [zfs-discuss] Snapshots and Data Loss

2010-04-19 Thread Geoff Nordli
On Apr 13, 2010, at 5:22 AM, Tony MacDoodle wrote:

 I was wondering if any data was lost while doing a snapshot on a
running system?

ZFS will not lose data during a snapshot.

 Does it flush everything to disk or would some stuff be lost?

Yes, all ZFS data will be committed to disk and then the snapshot
is taken.


Hi Richard.

Can you explain in a little bit more detail how this process works?  

Let's say you are writing from a remote virtual machine via an iscsi target
set for async writes and I take a snapshot of that volume.  

Are you saying any outstanding writes for that volume will need to be
written to disk before the snapshot happens?  

Setting the target to sync writes and using a ZIL might have better
performance, if you were doing a lot of snapshots. 

I know there is a potential to lose data with async set target, but the
virtual machines running on the system are just lab machines using
non-production data.

BTW, great Nexenta / ZFS class in Atlanta.  Thanks for getting me on the
right track!!

Geoff 
