Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread erik.ableson

On 19 nov. 2010, at 03:53, Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 
 SAS Controller
 and all ZFS Disks/ Pools are passed-through to Nexenta to have full
 ZFS-Disk
 control like on real hardware. 
 
 This is precisely the thing I'm interested in.  How do you do that?  On my
 ESXi (test) server, I have a solaris ZFS VM.  When I configure it... and add
 disk ... my options are (a) create a new virtual disk (b) use an existing
 virtual disk, or (c) (grayed out) raw device mapping.  There is a comment,
 "Give your virtual machine direct access to a SAN."  So I guess it is only
 available if you have some iscsi target available...
 
 But you seem to be saying ... don't add the disks individually to the ZFS
 VM.  You seem to be saying...  Ensure the bulk storage is on a separate
 sas/scsi/sata controller from the ESXi OS...  And then add the sas/scsi/sata
 PCI device to the guest, which will implicitly get all of the disks.  Right?
 
 Or maybe ... the disks have to be scsi (sas)?  And then you can add the scsi
 device directly pass-thru?

As mentioned by Will, you'll need to use VMDirectPath, which allows you to
map a hardware device (the disk controller) directly to the VM without passing
through the VMware-managed storage stack. Note that you are presenting the
hardware directly, so it needs to be a compatible controller.

You'll need two controllers in the server, since ESXi needs at least one disk
that it controls to be formatted as VMFS to hold some of its files as well as
the .vmx configuration files for the VM that will host the storage (and the
swap file, so it's got to be at least as large as the memory you plan to assign
to the VM). Caveat: while you can install ESXi onto a USB drive, you can't
manually format a USB drive as VMFS, so for best performance you'll want at
least one SATA or SAS controller that you can leave controlled by ESXi, and a
second controller, where the bulk of the storage is attached, for the ZFS VM.

As far as the eggs-in-one-basket issue goes, you can either use a clustering
solution like the Nexenta HA between two servers, which gives you a highly
available storage solution based on two servers that can also run your VMs, or,
for a more manual failover, just use zfs send|recv to replicate the data.
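(For the manual route, the replication is just the usual send/receive pair; a
minimal sketch, with made-up pool, dataset and host names:)

# initial full copy to the standby box
zfs snapshot tank/vmstore@rep1
zfs send tank/vmstore@rep1 | ssh standby zfs receive -F tank/vmstore

# afterwards, ship only the changes since the last replicated snapshot
zfs snapshot tank/vmstore@rep2
zfs send -i tank/vmstore@rep1 tank/vmstore@rep2 | ssh standby zfs receive tank/vmstore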

You can also accomplish something similar, if you have only the one controller,
by manually creating local Raw Device Mappings of the local disks and presenting
them individually to the ZFS VM. But you don't have direct access to the
controller, so I don't think stuff like blinking a drive will work in this
configuration, since you're not talking directly to the hardware. There's no UI
for creating RDMs for local drives, but there's a good procedure over at
http://www.vm-help.com/esx40i/SATA_RDMs.php which explains the technique.
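(Roughly, that procedure boils down to creating the RDM pointer file by hand
from the ESXi console and attaching it to the VM as an existing disk; a sketch,
with an illustrative device name and datastore path:)

# identify the local disk
ls /vmfs/devices/disks/

# create a physical-mode RDM pointer file on an existing VMFS datastore,
# then add the resulting .vmdk to the ZFS VM as an existing disk
vmkfstools -z /vmfs/devices/disks/t10.ATA_EXAMPLE_DISK_ID /vmfs/volumes/datastore1/zfsvm/disk1-rdm.vmdk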

From a performance standpoint it works really well - I have NFS-hosted VMs in
this configuration getting 396 MB/s throughput on simple dd tests, backed by 10
mirrored ZFS disks, all protected with hourly send|recv to a second box.
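(A simple dd test of that sort might look like the following, run from a guest
on the NFS datastore; the path and sizes are arbitrary:)

# sequential write of ~4 GB in 1 MB blocks, then read it back
dd if=/dev/zero of=/mnt/test/ddfile bs=1M count=4096
dd if=/mnt/test/ddfile of=/dev/null bs=1M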

Cheers,

Erik


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Günther
Hmmm,
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide says:

"Disabling the ZIL (Don't)
Caution: Disabling the ZIL on an NFS server can lead to client side corruption.
The ZFS pool integrity itself is not compromised by this tuning."

So especially with NFS I won't disable it.

It's better to add SSD read/write caches or use SSD-only pools. We use spindles
for backups or test servers; our main VMs are all on SSD pools (a striped raid1
built of 120 GB SandForce-based MLC drives, about 190 euro each).

We do not use SLC; I suppose MLC is good enough for the next three years (the
warranty period). We will replace them after that.
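(As a rough illustration of the two approaches, with made-up pool and device
names:)

# add SSDs to an existing spindle pool as a mirrored write log plus read cache
zpool add tank log mirror c2t0d0 c2t1d0
zpool add tank cache c2t2d0

# or build an SSD-only pool as a stripe of mirrors ("striped raid1")
zpool create ssdpool mirror c3t0d0 c3t1d0 mirror c3t2d0 c3t3d0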

About integrated storage in VMware:
I have some info on my homepage about our solution:
http://www.napp-it.org/napp-it/all-in-one/index_en.html

gea
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express

2010-11-19 Thread Darren J Moffat

On 19/11/2010 00:39, David Magda wrote:

On Nov 16, 2010, at 05:09, Darren J Moffat wrote:


Both CCM[1] and GCM[2] are provided so that if one turns out to have
flaws hopefully the other will still be available for use safely even
though they are roughly similar styles of modes.

On systems without hardware/cpu support for Galois multiplication
(present in Intel Westmere and later, and SPARC T3 and later), GCM will be
slower because the Galois field multiplication has to happen in software
without any hardware/cpu assist. However, depending on your workload
you might not even notice the difference.


Both modes of operation are authenticating. At one point the design of
ZFS crypto had the checksum automatically go to SHA-256 when encryption was
enabled. [1] Is that automatic SHA-256 activation still the case, or are the
two modes of operation simply used by themselves to verify data integrity?


That is still the case; the block pointer contains the IV, the (truncated)
SHA-256 checksum, and the MAC from CCM or GCM.
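(For example, on Solaris 11 Express something along these lines should let you
create an encrypted dataset, pick the mode explicitly, and inspect the
resulting properties; the dataset names are arbitrary:)

# default mode (prompts for a passphrase), or an explicit CCM/GCM choice
zfs create -o encryption=on tank/secret
zfs create -o encryption=aes-256-gcm tank/secret2

# check which encryption and checksum settings ended up in effect
zfs get encryption,checksum tank/secret tank/secret2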



Also, are slog and cache devices encrypted at this time? Given a pool,
and the fact that only particular data sets on it could be encrypted,
would these special devices be entirely encrypted, or only data from the
particular encrypted data set/s? I would also assume the in-memory ARC
would be clear-text.


The ZIL, whether it is in the pool or on a slog, is always encrypted for an
encrypted dataset; it is encrypted in exactly the same way in either case.


Data from encrypted datasets does not currently go to the L2ARC cache 
devices.


The in-memory ARC is in the clear, and it has to be, because those buffers
can be shared via zero-copy means with other parts of the system, including
other filesystems like NFS and CIFS.

--
Darren J Moffat


Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express

2010-11-19 Thread Darren J Moffat
The design for ZFS crypto was done in the open via opensolaris.org and 
versions of the source (though not the final version at this time) are 
available on opensolaris.org.


It was reviewed by people internal and external to Sun/Oracle who have
considerable crypto experience.  Important parts of the cryptography
design were also discussed on other archived public forums as well as
zfs-crypto-discuss.


The design was also presented at IEEE 1619 SISWG and at SNIA.

--
Darren J Moffat


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Edward Ned Harvey
 From: Saxon, Will [mailto:will.sa...@sage.com]
 
 In order to do this, you need to configure passthrough for the device at the
 host level (host -> configuration -> hardware -> advanced settings). This

Awesome.  :-)
The only problem is that once a device is configured to pass-thru to the
guest VM, then that device isn't available for the host anymore.  So you
have to have your boot disks on a separate controller from the primary
storage disks that are pass-thru to the guest ZFS server.

For a typical ... let's say Dell server ... that could be a problem.  The
boot disks would need to hold ESXi plus a ZFS server, and then you can
pass-thru the primary hotswappable storage HBA to the ZFS guest.  Then the
ZFS guest can export its storage back to the ESXi host via NFS or iSCSI...
So all the remaining VMs can be backed by ZFS.  Of course you have to
configure ESXi to boot the ZFS guest before any of the other guests.
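(Sketch of the loopback wiring, assuming NFS and made-up names; the export is
done in the ZFS guest and then mounted as a datastore from the ESXi side:)

# inside the ZFS guest: create and export a filesystem for the host
zfs create tank/vmstore
zfs set sharenfs=on tank/vmstore

# on the ESXi host (tech support mode or vCLI): mount it as an NFS datastore
esxcfg-nas -a -o 192.168.0.10 -s /tank/vmstore zfsstore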

The problem is just the boot device.  One option is to boot from a USB
dongle, but that's unattractive for a lot of reasons.  Another option would
be a PCIe storage device, which isn't too bad an idea.  Anyone using PXE to
boot ESXi?

Got any other suggestions?  In a typical Dell server, there is no place to
put a disk that isn't attached via the primary hotswappable storage HBA.
I suppose you could use a 1U rackmount server with only 2 internal disks,
and add a 2nd HBA with an external storage tray, to use as pass-thru to the
ZFS guest.



Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of VO
 
 How to accomplish ESXi 4 raw device mapping with SATA at least:
 http://www.vm-help.com/forum/viewtopic.php?f=14t=1025

It says: "You can pass-thru individual disks, if you have SCSI, but you can't
pass-thru individual SATA disks."
I don't have any way to verify this, but it seems unlikely... since SAS and
SATA are interchangeable.  (Sort of.)  I know I have a dell server, with a
few SAS disks plugged in, and a few SATA disks plugged in.  Maybe the
backplane is doing some kind of magic?  But they're all presented to the OS
by the HBA, and the OS has no way of knowing if the disks are actually SAS
or SATA...  As far as I know.

It also says: "You can pass-thru a PCI SATA controller, but the entire
controller must be given to the guest."
This I have confirmed.  I have an ESXi server with an eSATA controller and an
external disk attached.  One reboot was required in order to configure the
pass-thru.




Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of VO

 This sounds interesting as I have been thinking something similar but
never
 implemented it because all the eggs would be in the same basket. If you
 don't mind me asking for more information:
 Since you use Mapped Raw LUNs don't you lose HA/fault tolerance on the
 storage servers as they cannot be moved to another host?

There is at least one situation I can imagine, where you wouldn't care.

At present, I have a bunch of Linux servers with local attached disk.  I
often wish I could run ZFS on Linux.  You could install ESXi, Linux, and a
ZFS server all on the same machine.  You could export the ZFS filesystem
to the Linux system via NFS.  Since the network interfaces are all virtual,
you should be able to achieve near-disk speed from the Linux client, and you
should have no problem doing snapshots, zfs send, and all the other features
of ZFS.
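(A rough sketch of that wiring, with hypothetical names; the ZFS guest exports
over the virtual switch and the Linux guest mounts it like any other NFS share:)

# on the ZFS guest
zfs set sharenfs=rw tank/linuxdata

# on the Linux guest - the traffic never leaves the host's vSwitch
mount -t nfs 192.168.10.5:/tank/linuxdata /srv/data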

I'd love to do a proof of concept... Or hear that somebody has.  ;-)



Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Edward Ned Harvey
 From: Gil Vidals [mailto:gvid...@gmail.com]
 
 connected to my ESXi hosts using 1 gigabit switches and network cards: The
 speed is very good as can be seen by IOZONE tests:
 
       KB  reclen   write  rewrite    read   reread
   512000      32   71789    76155   94382   101022
   512000    1024   75104    69860   64282    58181
  1024000    1024   66226    60451   65974    61884
 
 These speeds were achieved by:
 
 1) Turning OFF ZIL Cache (write cache)
 2) Using SSD drives for L2ARC (read cache)
 3) Use NFSv3 as NFSv4 isn't supported by ESXi version 4.0.

I have the following results using local disk.  ZIL enabled, no SSD, HBA
writeback enabled.
      KB  reclen   write  rewrite     read   reread
  524288      64  189783   200303  2827021  2847086
  524288    1024  201472   201837  3094348  3100793
 1048576    1024  201883   201154  3076932  3087206


So ... I think your results were good relative to a 1Gb interface, but I
think you're severely limited by the 1Gb as compared to local disk.
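(For reference, numbers in that format come from an iozone run along these
lines; the file size, record size and target path are illustrative:)

# write/rewrite and read/reread tests, 512 MB file, 1 MB records
iozone -i 0 -i 1 -r 1024k -s 512m -f /tank/test/iozone.tmp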



Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Günther
 
 Disabling the ZIL (Don't)

This is relative.  There are indeed situations where it's acceptable to
disable ZIL.  To make your choice, you need to understand a few things...

#1  In the event of an ungraceful reboot with your ZIL disabled, after
reboot your filesystem will be in a valid state, but not at the latest
point in time before the crash.  Your filesystem will be valid, but you will
lose up to 30 seconds of the latest writes leading up to the crash.
#2  Even if you have the ZIL enabled, all of the above statements still apply
to async writes.  The ZIL only provides nonvolatile storage for sync writes.

Given these facts, it quickly becomes much less scary to disable the ZIL,
depending on what you use your server for.
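(If you do decide to disable it, the knob depends on the release; both lines
below are illustrative, and the per-dataset sync property only exists on newer
builds, roughly b140 and later:)

# older releases: pool-wide, via /etc/system, takes effect after a reboot
echo 'set zfs:zil_disable = 1' >> /etc/system

# newer builds: per dataset, takes effect immediately
zfs set sync=disabled tank/nfs-vms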



Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread erik.ableson

On 19 nov. 2010, at 15:04, Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Günther
 
  Disabling the ZIL (Don't)
 
 This is relative.  There are indeed situations where it's acceptable to
 disable ZIL.  To make your choice, you need to understand a few things...
 
 #1  In the event of an ungraceful reboot, with your ZIL disabled, after
 reboot, your filesystem will be in a valid state, which is not the latest
 point of time before the crash.  Your filesystem will be valid, but you will
 lose up to 30 seconds of the latest writes leading up to the crash.
 #2  Even if you have ZIL enabled, all of the above statements still apply to
 async writes.  The ZIL only provides nonvolatile storage for sync writes.
 
 Given these facts, it quickly becomes much less scary to disable the ZIL,
 depending on what you use your server for.

Not to mention that in this particular scenario (local storage, local VM,
loopback to ESXi), where the NFS server is only publishing to the local host,
if the local host crashes there are no other NFS clients involved whose local
caches could be out of sync with the storage.

Cheers,

Erik


[zfs-discuss] Replacing log devices takes ages

2010-11-19 Thread Bryan Horstmann-Allen
Disclaimer: Solaris 10 U8.

I had an SSD die this morning and am in the process of replacing the 1GB
partition which was part of a log mirror. The SSDs do nothing else.

The resilver has been running for ~30m, and suggests it will finish sometime
before Elvis returns from Andromeda, though perhaps only just barely (we'll
probably have to run to the airport to meet him at security).

 scrub: resilver in progress for 0h25m, 3.15% done, 13h8m to go
 scrub: resilver in progress for 0h26m, 3.17% done, 13h36m to go
 scrub: resilver in progress for 0h27m, 3.18% done, 14h4m to go
 scrub: resilver in progress for 0h28m, 3.19% done, 14h32m to go
 scrub: resilver in progress for 0h29m, 3.20% done, 15h0m to go
 scrub: resilver in progress for 0h30m, 3.23% done, 15h25m to go
 scrub: resilver in progress for 0h31m, 3.25% done, 15h50m to go
 scrub: resilver in progress for 0h32m, 3.30% done, 16h7m to go
 scrub: resilver in progress for 0h33m, 3.34% done, 16h24m to go
 scrub: resilver in progress for 0h35m, 3.37% done, 16h43m to go
 scrub: resilver in progress for 0h36m, 3.39% done, 17h5m to go

According to zpool iostat -v, the log has ~900k of data on it.
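(For reference, a replacement like this is normally kicked off and watched with
commands along these lines; the failed device name below is hypothetical:)

# swap the dead SSD slice out of the log mirror and watch the resilver
zpool replace tank c0t2d0s0 c0t3d0s0
zpool status -v tank
zpool iostat -v tank 5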

The disks are not particularly busy (c0t3d0 is the replacing disk):

# iostat -xne c0t3d0 c0t5d0 5
                            extended device statistics       ---- errors ----
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.2    0.1    0.6    5.0  0.0  0.0    0.0    5.7   0   0   0   0   0   0 c0t3d0
    5.3   52.3   68.2 1694.1  0.0  0.2    0.0    4.2   0   2   2   0   0   2 c0t5d0
                            extended device statistics       ---- errors ----
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    3.0  112.6    0.9 6064.0  0.0  0.1    0.0    0.8   0   9   0   0   0   0 c0t3d0
    6.4  118.8   39.5 6519.7  0.0  0.0    0.0    0.3   0   3   2   0   0   2 c0t5d0
                            extended device statistics       ---- errors ----
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    1.0   50.2    0.3 5068.8  0.0  1.4    0.0   27.5   0   6   0   0   0   0 c0t3d0
   36.0   61.8  534.1 5921.6  0.0  0.5    0.0    5.5   0   6   2   0   0   2 c0t5d0
                            extended device statistics       ---- errors ----
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   58.0    0.0 1590.4  0.0  0.0    0.0    0.8   0   3   0   0   0   0 c0t3d0
   39.2   67.0  651.3 1884.9  0.0  0.0    0.0    0.5   0   3   2   0   0   2 c0t5d0
                            extended device statistics       ---- errors ----
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   23.4    0.0  678.3  0.0  0.0    0.0    0.4   0   1   0   0   0   0 c0t3d0
   11.8   30.6  135.0 1025.4  0.0  0.0    0.0    0.3   0   1   2   0   0   2 c0t5d0
                            extended device statistics       ---- errors ----
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   20.2    0.0 1045.0  0.0  0.0    0.0    1.2   0   1   0   0   0   0 c0t3d0
   14.8   25.8  131.9 1335.7  0.0  0.0    0.0    0.4   0   1   2   0   0   2 c0t5d0
                            extended device statistics       ---- errors ----
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   33.0    0.0 2029.6  0.0  0.1    0.0    1.9   0   2   0   0   0   0 c0t3d0
    1.8   37.6   37.9 2107.0  0.0  0.0    0.0    0.6   0   1   2   0   0   2 c0t5d0
                            extended device statistics       ---- errors ----
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   21.2    0.0  797.6  0.0  0.0    0.0    0.7   0   1   0   0   0   0 c0t3d0
   12.2   22.8  111.9  823.2  0.0  0.0    0.0    0.4   0   1   2   0   0   2 c0t5d0

My question is twofold:

Why do log mirrors need to resilver at all?

Why does this seem like it's going to take a full day, if I'm lucky?

(If the answer is: Shut up and upgrade, that's fine.)

Cheers.
-- 
bdha
cyberpunk is dead. long live cyberpunk.


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Günther
I have the same problem with my 2U Supermicro server (24x 2.5", connected via
6x mini-SAS 8087) and no additional mounting possibilities for 2.5" or 3.5"
drives.

On those machines I use one SAS port (4 drives) of an old Adaptec 3805 (I had
used them in my pre-ZFS times) to build a RAID-1 + hot spare for ESXi to boot
from. The other 20 slots are connected to 3 LSI SAS controllers for
pass-through, so I have 4 SAS controllers in these machines.

Maybe the new PCIe-mounted SSDs (e.g. the OCZ RevoDrive) could be an
alternative. Has anyone used them with ESXi already?
gea
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Mark Little


On Fri, 19 Nov 2010 07:16:20 PST, Günther wrote:

i have the same problem with my 2HE supermicro server (24x2,5,
connected via 6x mini SAS 8087) and no additional mounting
possibilities for 2,5 or 3,5 drives.

on those machines i use one sas port (4 drives) of an old adaptec
3805 (i have used them in my pre zfs-times) to build a raid-1 + hotfix
for esxi to boot from. the other 20 slots are connected to 3 lsi sas
controller for pass-through - so i have 4 sas controller in these
machines.

maybee the new ssd-drives mounted on a pci-e (ex ocz revo drive) may
be an alternative. have anyone used them already with esxi?

gea



Hey - just as a side note..

Depending on what motherboard you use, you may be able to use this:
MCP-220-82603-0N - Dual 2.5" fixed HDD tray kit for SC826 (for E-ATX X8
DP MB)


I haven't used one yet myself but am currently planning an SMC build and
contacted their support as I really did not want to have my system
drives hanging off the controller.  As far as I can tell from a picture
they sent, it mounts on top of the motherboard itself somewhere where
there is normally open space, and it can hold two 2.5" drives.  So maybe
get in touch with their support and see if you can use something
similar.



Cheers,
Mark




Re: [zfs-discuss] Replacing log devices takes ages

2010-11-19 Thread Khushil Dep
I'm not sure that leaving the ZIL enabled whilst replacing the log devices
is a good idea?

Also - I had no idea Elvis was coming back tomorrow! Sweet. ;-)

---
W. A. Khushil Dep - khushil@gmail.com -  07905374843

Visit my blog at http://www.khushil.com/






On 19 November 2010 14:57, Bryan Horstmann-Allen b...@mirrorshades.net wrote:

 [...]

 My question is twofold:

 Why do log mirrors need to resilver at all?

 Why does this seem like it's going to take a full day, if I'm lucky?

 (If the answer is: Shut up and upgrade, that's fine.)



Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Saxon, Will

 -----Original Message-----
 From: Edward Ned Harvey [mailto:sh...@nedharvey.com]
 Sent: Friday, November 19, 2010 8:03 AM
 To: Saxon, Will; 'Günther'; zfs-discuss@opensolaris.org
 Subject: RE: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

  From: Saxon, Will [mailto:will.sa...@sage.com]

  In order to do this, you need to configure passthrough for the device at the
  host level (host -> configuration -> hardware -> advanced settings). This

 Awesome.  :-)
 The only problem is that once a device is configured to pass-thru to the
 guest VM, then that device isn't available for the host anymore.  So you
 have to have your boot disks on a separate controller from the primary
 storage disks that are pass-thru to the guest ZFS server.

 For a typical ... let's say Dell server ... that could be a problem.  The
 boot disks would need to hold ESXi plus a ZFS server, and then you can
 pass-thru the primary hotswappable storage HBA to the ZFS guest.  Then the
 ZFS guest can export its storage back to the ESXi host via NFS or iSCSI...
 So all the remaining VMs can be backed by ZFS.  Of course you have to
 configure ESXi to boot the ZFS guest before any of the other guests.

 The problem is just the boot device.  One option is to boot from a USB
 dongle, but that's unattractive for a lot of reasons.  Another option would
 be a PCIe storage device, which isn't too bad an idea.  Anyone using PXE to
 boot ESXi?

 Got any other suggestions?  In a typical Dell server, there is no place to
 put a disk that isn't attached via the primary hotswappable storage HBA.
 I suppose you could use a 1U rackmount server with only 2 internal disks,
 and add a 2nd HBA with an external storage tray, to use as pass-thru to the
 ZFS guest.

Well, with 4.1 ESXi does support boot from SAN. I guess that still presents a
chicken-and-egg problem in this scenario, but maybe you have another SAN
somewhere you can boot from.

Also, most of the big name vendors have a USB or SD option for booting ESXi. I
believe this is the 'ESXi Embedded' flavor vs. the typical 'ESXi Installable'
that we're used to. I don't think it's a bad idea at all. I've got a
not-quite-production system I'm booting off USB right now, and while it takes a
really long time to boot, it does work. I think I like the SD card option better
though.

What I am wondering is whether this is really worth it. Are you planning to 
share the storage out to other VM hosts, or are all the VMs running on the host 
using the 'local' storage? I know we like ZFS vs. traditional RAID and volume 
management, and I get that being able to boot any ZFS-capable OS is good for 
disaster recovery, but what I don't get is how this ends up working better than 
a larger dedicated ZFS system and a storage network. Is it cheaper over several 
hosts? Are you getting better performance through e.g. the vmxnet3 adapter and 
NFS than you would just using the disks directly?

-Will


[zfs-discuss] Deduped zfs streams broken in post b134 ?

2010-11-19 Thread evaldas
Hi,

Here is a small script to test deduped zfs send stream:

=
#!/bin/bash
ZFSPOOL=rpool
ZFSDATASET=zfs-send-dedup-test
# create a small test file and copy it twice so the stream has duplicate blocks
dd if=/dev/random of=/var/tmp/testfile1 bs=512 count=10
zfs create $ZFSPOOL/$ZFSDATASET
cp /var/tmp/testfile1 /$ZFSPOOL/$ZFSDATASET/testfile1
zfs snapshot $ZFSPOOL/$ZFSDATASET@snap1
cp /var/tmp/testfile1 /$ZFSPOOL/$ZFSDATASET/testfile2
zfs snapshot $ZFSPOOL/$ZFSDATASET@snap2
# write a deduplicated replication stream to a file, then try to restore it
zfs send -D -R $ZFSPOOL/$ZFSDATASET@snap2 > /var/tmp/ddtest-snap2.zfs
zfs destroy -r $ZFSPOOL/$ZFSDATASET
zfs receive -Fv $ZFSPOOL/$ZFSDATASET < /var/tmp/ddtest-snap2.zfs
=

It works in OpenSolaris b134, but not in OpenIndiana b147 or Solaris Express
11, where zfs receive exits on the second incremental snapshot with the error
message:

cannot receive incremental stream: invalid backup stream

Does it look like a bug?
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Deduped zfs streams broken in post b134 ?

2010-11-19 Thread evaldas
Sorry, the script was cut off; the ending part is:

mp/ddtest-snap2.zfs
=

-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread Günther
 Also, most of the big name vendors have a USB or SD
 option for booting ESXi. I believe this is the 'ESXi
 Embedded' flavor vs. the typical 'ESXi Installable'
 that we're used to. I don't think it's a bad idea at
 all. I've got a not-quite-production system I'm
 booting off USB right now, and while it takes a
 really long time to boot it does work. I think I like
 the SD card option better though.

I need 4 GB of extra space for the Nexenta ZFS storage server,
and it should not be as slow as a USB stick, or management
via the web GUI is painfully slow.

 
 What I am wondering is whether this is really worth
 it. Are you planning to share the storage out to
 other VM hosts, or are all the VMs running on the
 host using the 'local' storage? I know we like ZFS
 vs. traditional RAID and volume management, and I get
 that being able to boot any ZFS-capable OS is good
 for disaster recovery, but what I don't get is how
 this ends up working better than a larger dedicated
 ZFS system and a storage network. Is it cheaper over
 several hosts? Are you getting better performance
 through e.g. the vmxnet3 adapter and NFS than you
 would just using the disks directly?
 

Mainly the storage is used via NFS for local VMs, but we also share
the NFS datastores via CIFS to allow simple move/clone/copy
or backup. We also replicate the datastores at least once per day to a second
machine via incremental zfs send.
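(Sketch of publishing the same filesystem both ways, with example names; on
NexentaStor the equivalent is usually done through its own management
interface:)

# NFS for the ESXi datastore, CIFS for simple copy/clone/backup access
zfs set sharenfs=on tank/nfs-datastore1
zfs set sharesmb=name=datastore1 tank/nfs-datastore1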

We have, or plan, the same setup on all of our ESXi
machines: each ESXi host has its own local SAN-like storage
server. (I do not like having one big SAN storage box as
a single point of failure, plus the high-speed SAN cabling, so we have
4 ESXi servers, each with its own virtualized ZFS storage server, plus
three commonly used backup systems - all connected via a 10GbE VLAN.)

We formerly had separate storage and ESXi servers, but with pass-through
we could integrate the two and reduce the hardware that could fail, and the
cabling, by about 50%.

 
gea
-- 
This message posted from opensolaris.org